Tuesday, January 20, 2009

[week 3] Reading Notes

This week we read about index construction and index compression. On the first topic, I have one question:
  • At the end of page 72, talking about dynamic indexing, the authors say "where n is the size of the auxiliary index". Maybe I just missed that part of the reading, but I don't see how they measure the size of the index: number of terms in the dictionary, number of postings, the sum of both, or something else.
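To see why the unit of n matters, here is a minimal sketch of the scheme as I understand it: new postings go into an auxiliary index and, whenever it fills up, it is merged into a single main index. I am assuming that n counts postings (the book may measure it differently), and the function and variable names are my own.

```python
def count_touched_postings(total_postings: int, aux_capacity: int) -> int:
    """Count postings read across all merges of a full auxiliary index
    into one main index.

    Assumption: aux_capacity (the "n" in the book's sentence) is measured
    in postings, and a merge reads every posting of both indexes.
    """
    main_size = 0   # postings currently in the main index
    aux_size = 0    # postings currently in the auxiliary index
    touched = 0     # total postings read by all merges so far

    for _ in range(total_postings):
        aux_size += 1
        if aux_size == aux_capacity:
            # The merge produces a main index holding old main + auxiliary,
            # and it has to read all of those postings to do so.
            main_size += aux_size
            touched += main_size
            aux_size = 0
    return touched


if __name__ == "__main__":
    total = 1000
    for capacity in (10, 100):
        work = count_touched_postings(total, capacity)
        print(capacity, work, total * total // (2 * capacity))
```

Under that assumption, the work grows roughly like T^2/(2n) for T postings, so a ten times larger auxiliary index means about ten times less merging work. If n were counted in dictionary terms instead, the same reasoning would not translate so directly, which is why I would like to know the unit.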
On the second topic:
  • On page 79 the authors say "in this chapter we define a posting as a docID in a posting list"... Does this imply that they did not use the same definition in previous chapters?
  • On page 81, Heaps' law suggests that "(i) the dictionary size continues to increase with more documents in the collection, rather than a maximum vocabulary size being reached". Does this always hold, even for specific (and sometimes large) domains?
  • In the description on pages 84-85 of a dictionary stored as one long string, as a replacement for fixed-width entries, it is clear that a good amount of space is saved. However, I think the operations for maintaining the index (deleting terms from the dictionary, adding or deleting postings) can be more expensive in the second case, making this solution not so good for dynamic document collections.
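To make my last worry concrete, here is a minimal sketch of a dictionary stored as one long string, assuming one start offset per term and lookup by binary search over those offsets. The class and method names are hypothetical, and it leaves out the frequency and postings pointers that a real entry would carry.

```python
class StringDictionary:
    """Toy dictionary-as-a-string: sorted terms are concatenated into one
    string, and a parallel list stores where each term starts."""

    def __init__(self, terms):
        ordered = sorted(set(terms))
        self.string = "".join(ordered)
        self.offsets = []
        position = 0
        for term in ordered:
            self.offsets.append(position)
            position += len(term)

    def term_at(self, i):
        # A term ends where the next one starts (or at the end of the string).
        start = self.offsets[i]
        end = self.offsets[i + 1] if i + 1 < len(self.offsets) else len(self.string)
        return self.string[start:end]

    def find(self, term):
        """Binary search over the offset table; returns the entry index or -1."""
        lo, hi = 0, len(self.offsets) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            candidate = self.term_at(mid)
            if candidate == term:
                return mid
            if candidate < term:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1

    def delete(self, term):
        """Remove a term; returns how many offsets had to be rewritten."""
        i = self.find(term)
        if i < 0:
            return 0
        start = self.offsets[i]
        length = len(self.term_at(i))
        # Cut the term out of the string (this copies the rest of the string).
        self.string = self.string[:start] + self.string[start + length:]
        del self.offsets[i]
        # Every later term moved left, so its offset must be adjusted.
        for j in range(i, len(self.offsets)):
            self.offsets[j] -= length
        return len(self.offsets) - i


if __name__ == "__main__":
    d = StringDictionary(["systile", "syzygetic", "syzygial", "syzygy", "szaibelyite"])
    print(d.delete("syzygetic"), d.find("syzygy"))
```

In this sketch, deleting one term near the front of the dictionary rewrites the offset of every term after it, whereas a fixed-width entry could just be overwritten or marked as free. That is the extra maintenance cost I have in mind.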
That's all this week, folks...
