Document Compaction for Efficient Query Biased Snippet Generation [chapter]

Yohannes Tsegay, Simon J. Puglisi, Andrew Turpin, Justin Zobel
2009 Lecture Notes in Computer Science  
Current web search engines return query-biased snippets for each document they list in a result set. For efficiency, search engines operating on large collections need to cache snippets for common queries, and to cache documents to allow fast generation of snippets for uncached queries. To improve the hit rate on a document cache during snippet generation, we propose and evaluate several schemes for reducing document size, hence increasing the number of documents in the cache. In particular, we argue against further improvements to document compression, and argue for schemes that prune documents based on the a priori likelihood that a sentence will be used as part of a snippet for a given document. Our experiments show that if documents are reduced to less than half their original size, 80% of snippets generated are identical to those generated from the original documents. Moreover, as the pruned, compressed surrogates are smaller, 3-4 times as many documents can be cached.
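Pruning by a priori sentence likelihood can be illustrated with a small sketch. The abstract does not say how the likelihood is estimated or how sentences are selected, so this example makes two assumptions. First, each sentence already carries a precomputed `score`. Second, sentences are chosen greedily by score until a size budget is met, here half the original length. The names `Sentence`, `prune_document`, and `budget_fraction` are hypothetical, and the greedy selection is a stand-in, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    position: int  # index of the sentence in the original document
    text: str
    score: float   # ASSUMED: precomputed a priori likelihood of appearing in a snippet


def prune_document(sentences, budget_fraction=0.5):
    """Hypothetical pruning sketch: keep the highest-scoring sentences whose
    total length fits within budget_fraction of the original document length,
    then return them in their original document order."""
    total = sum(len(s.text) for s in sentences)
    budget = budget_fraction * total
    kept, used = [], 0
    # Greedily consider sentences from most to least likely to be used.
    for s in sorted(sentences, key=lambda s: s.score, reverse=True):
        if used + len(s.text) <= budget:
            kept.append(s)
            used += len(s.text)
    # Restore document order so the surrogate reads like the original.
    kept.sort(key=lambda s: s.position)
    return kept


doc = [
    Sentence(0, "Intro sentence here.", 0.9),
    Sentence(1, "Some long filler text that is rarely shown in any snippet at all.", 0.1),
    Sentence(2, "Key facts about the topic.", 0.7),
]
surrogate = prune_document(doc, budget_fraction=0.5)
print([s.position for s in surrogate])  # [0, 2]
```

In this toy document the low-scoring filler sentence is dropped, and the surrogate comes in under half the original length. In the setting the abstract describes, such a surrogate would then be compressed and cached in place of the full document, and snippets for uncached queries would be generated from it.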
doi:10.1007/978-3-642-00958-7_45 fatcat:l57ewhdi3vfbfkrhokfg3jcn3m