Low-cost management of inverted files for online full-text search

Giorgos Margaritis, Stergios V. Anastasiadis
2009 Proceeding of the 18th ACM conference on Information and knowledge management - CIKM '09  
In dynamic environments with frequent content updates, we require online full-text search that scales to large data collections and achieves low search latency. Several recent methods that support fast incremental indexing of documents typically keep on disk multiple partial index structures that they continuously update as new documents are added. However, spreading indexing information across multiple locations on disk tends to considerably decrease the search responsiveness of the system. In the present paper, we take a fresh look at the problem of online full-text search with consideration of the architectural features of modern systems. Selective Range Flush is a greedy method that we introduce to manage the index in the system by using fixed-size blocks to organize the data on disk and dynamically keep low the cost of data transfer between memory and disk. As we experimentally demonstrate with the Proteus prototype implementation that we developed, we retrieve indexing information at latency that matches the lowest achieved by existing methods. Additionally, we reduce the total building cost by 30% in comparison to methods with similar retrieval time.
doi:10.1145/1645953.1646012 dblp:conf/cikm/MargaritisA09 fatcat:qtyfmzdcvnawnm33ksxvsej7wm