Index structures for efficiently searching natural language text

Pirooz Chubak, Davood Rafiei
2010 Proceedings of the 19th ACM international conference on Information and knowledge management - CIKM '10  
Many existing text indexes work at document granularity and are not effective for queries whose desired answer is only a term or a phrase. In this paper, we study several index structures capable of answering this class of queries, referred to here as wild card queries, and analyze their performance. Our experimental results on a large class of queries from different sources (including query logs and parse trees) and with various ... reveal some of the performance barriers of these indexes. We then present the Word Permuterm Index (WPI), an adaptation of the permuterm index for natural language text applications, and show that this index supports a wide range of wild card queries, is quick to construct, and is highly scalable. Our experimental results comparing WPI to alternative methods on a wide range of wild card queries show performance improvements of a few orders of magnitude for WPI, while memory usage is kept the same for all compared systems.
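
To illustrate the idea of a permuterm index applied to words rather than characters, here is a minimal Python sketch. It is not the authors' implementation of WPI, and the abstract does not describe the actual data structure. The names `END`, `build_word_permuterm`, and `search` are hypothetical, and the sketch assumes whitespace tokenization, sentence-anchored matching, and exactly one wildcard per query.

```python
from bisect import bisect_left

END = "$"  # hypothetical end-of-sentence marker, assumed absent from the text


def build_word_permuterm(sentences):
    """Return a sorted list of (rotation, sentence_id) pairs.

    Each sentence is split on whitespace, terminated with END, and every
    word-level rotation of the result is stored as a tuple of words.
    """
    entries = []
    for sid, sentence in enumerate(sentences):
        words = sentence.split() + [END]
        for i in range(len(words)):
            entries.append((tuple(words[i:] + words[:i]), sid))
    entries.sort()
    return entries


def search(index, query, wildcard="*"):
    """Answer a query of the form 'X * Y' containing exactly one wildcard.

    A sentence matches if it starts with the words X and ends with the
    words Y; the wildcard stands for zero or more words in between.
    Returns (sentence_id, matched_words) pairs in index order.
    """
    left, right = query.split(wildcard)  # raises ValueError unless one wildcard
    # Rotate the query so the wildcard falls at the end: Y END X
    prefix = tuple(right.split() + [END] + left.split())
    results = []
    # (prefix,) sorts at or before every entry whose rotation starts with prefix
    pos = bisect_left(index, (prefix,))
    while pos < len(index) and index[pos][0][:len(prefix)] == prefix:
        rotation, sid = index[pos]
        results.append((sid, " ".join(rotation[len(prefix):])))
        pos += 1
    return results


sentences = [
    "the permuterm index supports wild card queries",
    "the word permuterm index supports wild card queries",
    "suffix trees support substring queries",
]
index = build_word_permuterm(sentences)
print(search(index, "the * index supports wild card queries"))
```

Storing every rotation explicitly keeps the sketch short but multiplies the space used by the sentence length, so it is only suitable for small examples.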
doi:10.1145/1871437.1871527 dblp:conf/cikm/ChubakR10 fatcat:tttoakw3czf4xphh7dsx5j77ei