Tuesday, February 3, 2009

[week 5] Reading Notes

Two questions came up this week:

1. The binary independence model assumes that terms occur in documents independently of one another. The authors acknowledge that this assumption is not correct, yet they say that in practice the models sometimes perform satisfactorily. Is there an explanation for this result? Is the practical evidence limited to English, or does it also hold for languages such as Chinese and Arabic?

2. In chapter 12, the authors say that most of the time the Stop and (1 - Stop) probabilities are omitted from the language model. If omitting them means the model no longer describes a well-formed language (according to Equation 12.1), why do the authors do this?
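
To make question 2 concrete, here is a minimal sketch of what the Stop probability does to the total probability mass. The two-term vocabulary, its probabilities, and the value of `p_stop` are made-up assumptions for illustration, not numbers from the book, and I am assuming every string has at least one term and that the model continues with probability (1 - Stop) after every term except the last.

```python
from itertools import product

# Hypothetical toy unigram model: terms and probabilities are invented.
term_probs = {"frog": 0.6, "toad": 0.4}
p_stop = 0.2  # assumed probability of stopping after any term


def prob_without_stop(terms):
    """Product of unigram term probabilities only."""
    p = 1.0
    for t in terms:
        p *= term_probs[t]
    return p


def prob_with_stop(terms):
    """Same product, times (1 - Stop) after each non-final term and Stop at the end."""
    n = len(terms)
    return prob_without_stop(terms) * (1 - p_stop) ** (n - 1) * p_stop


def mass_for_length(n, prob_fn):
    """Total probability assigned to all strings of exactly n terms."""
    return sum(prob_fn(s) for s in product(term_probs, repeat=n))


def mass_up_to(max_len, prob_fn):
    """Total probability assigned to all strings of 1..max_len terms."""
    return sum(mass_for_length(n, prob_fn) for n in range(1, max_len + 1))


if __name__ == "__main__":
    for max_len in (1, 5, 10):
        print(max_len,
              round(mass_up_to(max_len, prob_without_stop), 4),
              round(mass_up_to(max_len, prob_with_stop), 4))
```

Without Stop, every length contributes a total mass of 1, so the sum over all strings keeps growing with the maximum length. With Stop, the total approaches 1 as longer strings are included. For a fixed string, the Stop factor depends only on its length, which may be related to why it is often dropped, but that is my guess rather than something the sketch shows.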

See you on Thursday in class!
