Processing Long Queries Against Short Text

Dongxiang Zhang, Yuchen Li, Ju Fan, Lianli Gao, Fumin Shen, Heng Tao Shen
2017 ACM Transactions on Information Systems  
Many real-time news stream advertising applications call for efficient processing of long queries against short text. In such applications, dynamic news feeds are treated as queries that are matched against an advertisement (ad) database to retrieve the k most relevant ads. Existing approaches to keyword retrieval do not work well in this search scenario when queries are triggered at a very high frequency. To address the problem, we introduce new techniques to
significantly improve search performance. First, we devise a two-level partitioning for tight upper bound estimation and a lazy evaluation scheme that delays full evaluation of unpromising candidates, which yields a three- to fourfold performance improvement on a database with 7 million ads. Second, we propose a novel rank-aware block-oriented inverted index to further improve performance. In this index scheme, each entry in an inverted list is assigned a rank according to its importance in the ad. We then introduce a block-at-a-time search strategy based on this index scheme that supports a much tighter upper bound estimation and much earlier termination. We have conducted experiments with real datasets, and the results show that the rank-aware method can further improve performance by an order of magnitude.

2017. Processing long queries against short text: Top-k advertisement matching in news stream applications. ACM Trans. Inf.
doi:10.1145/3052772 fatcat:i5sii7dr4nfhfhvi6n3mz5otna
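
The abstract describes a rank-aware block-oriented inverted index searched one block at a time, with an upper bound that allows early termination. The sketch below is a much-simplified illustration of that general idea, not the paper's algorithm. It omits the two-level partitioning and the lazy evaluation scheme, and it makes several assumptions: ads and queries are bags of terms with non-negative weights, relevance is a plain weighted dot product, and a term's rank is its position when the ad's terms are sorted by descending weight. All names (build_index, top_k, suffix_max, score) are hypothetical.

```python
import heapq
from collections import defaultdict


def build_index(ads):
    """Build a toy rank-aware index from ads: ad_id -> {term: weight}.

    Assumption: rank 0 is the heaviest term of an ad, rank 1 the next, etc.
    """
    # blocks[term][rank] lists the (ad_id, weight) postings of that rank.
    blocks = defaultdict(lambda: defaultdict(list))
    for ad_id, terms in ads.items():
        ranked = sorted(terms.items(), key=lambda kv: -kv[1])
        for rank, (term, weight) in enumerate(ranked):
            blocks[term][rank].append((ad_id, weight))
    max_rank = max((len(t) for t in ads.values()), default=0)
    # suffix_max[term][r] = largest weight of term in any block of rank >= r.
    suffix_max = {}
    for term, by_rank in blocks.items():
        sm = [0] * (max_rank + 1)
        for r in range(max_rank - 1, -1, -1):
            block_max = max((w for _, w in by_rank.get(r, [])), default=0)
            sm[r] = max(sm[r + 1], block_max)
        suffix_max[term] = sm
    return blocks, suffix_max, max_rank


def score(query, ad_terms):
    """Weighted dot product over the terms shared by query and ad."""
    return sum(qw * ad_terms[t] for t, qw in query.items() if t in ad_terms)


def top_k(query, ads, index, k):
    """Return (top-k list of (score, ad_id), number of rank levels scanned)."""
    blocks, suffix_max, max_rank = index
    heap = []      # min-heap holding the best k (score, ad_id) pairs so far
    seen = set()   # ads that have already been fully scored
    scanned = 0
    for r in range(max_rank):
        # Visit the rank-r block of every query term.
        for term in query:
            for ad_id, _ in blocks.get(term, {}).get(r, []):
                if ad_id in seen:
                    continue
                seen.add(ad_id)
                s = score(query, ads[ad_id])
                if len(heap) < k:
                    heapq.heappush(heap, (s, ad_id))
                elif s > heap[0][0]:
                    heapq.heapreplace(heap, (s, ad_id))
        scanned = r + 1
        # An unseen ad can only match query terms at ranks > r, so each
        # matched weight is at most suffix_max[term][r + 1].
        bound = sum(qw * suffix_max[t][r + 1]
                    for t, qw in query.items() if t in suffix_max)
        if len(heap) == k and bound <= heap[0][0]:
            break  # no unseen ad can beat the current k-th score
    return sorted(heap, reverse=True), scanned


# Toy data, made up for illustration only.
ads = {
    1: {"car": 5, "loan": 3, "bank": 1},
    2: {"phone": 6, "plan": 2},
    3: {"car": 2, "insurance": 7},
    4: {"bank": 4, "loan": 4, "rate": 1},
    5: {"travel": 8, "hotel": 3},
}
query = {"car": 2, "bank": 1, "loan": 1, "rate": 3, "market": 1, "phone": 1}
result, scanned = top_k(query, ads, build_index(ads), k=2)
print(result, scanned)
```

The bound used here is deliberately loose: it sums over every query term, whereas a short ad can only match a few of them. Integer weights are used in the toy data so that the bound comparison is not affected by floating-point rounding.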