Effective top-k computation in retrieving structured documents with term-proximity support

Mingjie Zhu, Shuming Shi, Mingjing Li, Ji-Rong Wen
2007 Proceedings of the sixteenth ACM conference on information and knowledge management - CIKM '07
Modern web search engines are expected to return the top-k results for a query efficiently. Although many dynamic index pruning strategies have been proposed for efficient top-k computation, most of them tend to ignore some especially important factors in ranking functions, such as term proximity (the distance relationship between query terms in a document). Including term proximity breaks the monotonicity of ranking functions and therefore poses additional challenges for efficient query processing. This paper studies the performance of some existing top-k computation approaches using term-proximity-enabled ranking functions. Our investigation demonstrates that, when term proximity is incorporated into ranking functions, most existing index structures and top-k strategies become quite inefficient. Based on our analysis and experimental results, we propose two index structures with corresponding index pruning strategies, Structured and Hybrid, which perform much better in the new setting. Moreover, the two approaches do not significantly affect the efficiency of index building and maintenance.
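
To illustrate why term proximity breaks monotonicity, here is a minimal sketch. It is not the ranking function or index structure from the paper: the additive score with a proximity bonus, the weight `alpha`, and the names `min_cover_span` and `score` are all assumptions made for illustration. The sketch measures proximity as the shortest window containing every query term, then shows two documents where the one with higher per-term scores still ranks lower.

```python
from typing import Dict, List, Optional


def min_cover_span(positions: Dict[str, List[int]]) -> Optional[int]:
    """Length of the shortest window holding at least one occurrence of every term.

    `positions` maps each query term to its word offsets in one document.
    Returns None if some term never occurs in the document.
    """
    # Merge all occurrences into one sorted list and slide a window over it.
    events = sorted((p, t) for t, ps in positions.items() for p in ps)
    need = len(positions)
    counts: Dict[str, int] = {}
    best: Optional[int] = None
    left = 0
    for pos, term in events:
        counts[term] = counts.get(term, 0) + 1
        # Shrink from the left while the window still covers every term.
        while len(counts) == need:
            start, start_term = events[left]
            span = pos - start + 1
            if best is None or span < best:
                best = span
            counts[start_term] -= 1
            if counts[start_term] == 0:
                del counts[start_term]
            left += 1
    return best


def score(term_scores: Dict[str, float],
          positions: Dict[str, List[int]],
          alpha: float = 1.0) -> float:
    """Hypothetical score: sum of per-term scores plus a proximity bonus."""
    base = sum(term_scores.values())
    span = min_cover_span(positions)
    if span is None:
        return base
    # The bonus grows as the query terms appear closer together.
    return base + alpha * len(positions) / span


# Document B is at least as good as A on every per-term score,
# yet A wins because its query terms are adjacent.
doc_a = score({"x": 1.0, "y": 1.0}, {"x": [1], "y": [2]})
doc_b = score({"x": 1.2, "y": 1.0}, {"x": [1], "y": [101]})
print(doc_a > doc_b)
```

Under this assumed scoring, an upper bound computed from per-term scores alone no longer bounds the final score, so pruning rules that depend on a monotone aggregation cannot be applied without change.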
doi:10.1145/1321440.1321547 dblp:conf/cikm/ZhuSLW07 fatcat:eg4w2cacz5chfadxbqngfkjkle