Query-Biased Partitioning for Selective Search

Zhuyun Dai, Chenyan Xiong, Jamie Callan
2016 Proceedings of the 25th ACM International on Conference on Information and Knowledge Management - CIKM '16  
Selective search is a cluster-based distributed retrieval architecture that reduces computational costs by partitioning a corpus into topical shards, and selectively searching them. Prior research formed topical shards by clustering the corpus based on the documents' contents. This content-based partitioning strategy reveals common topics in a corpus. However, the topic distribution produced by clustering may not match the distribution of topics in search traffic, which may reduce the effectiveness of selective search. This paper presents a query-biased partitioning strategy that aligns document partitions with topics from query logs. It focuses on two parts of the partitioning process: clustering initialization and document similarity calculation. A query-driven clustering initialization algorithm uses topics from query logs to form cluster seeds. A query-biased similarity metric favors terms that are important in query logs. Both methods boost retrieval effectiveness, reduce variance, and produce a more balanced distribution of shard sizes.
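The idea of a similarity metric that favors terms important in query logs can be illustrated with a minimal sketch. The abstract does not give the paper's actual formula, so everything below is an assumption made for illustration: the function names `query_term_weights` and `biased_cosine`, the log-scaled weighting scheme, and the use of a weighted cosine are hypothetical and are not the authors' method.

```python
import math
from collections import Counter


def query_term_weights(query_log):
    """Map each query-log term to a weight above 1.0 (hypothetical scheme).

    Assumption: importance grows with the log of the term's frequency
    in the query log. Terms absent from the log are not in the result.
    """
    counts = Counter(term for query in query_log for term in query.lower().split())
    return {term: 1.0 + math.log1p(count) for term, count in counts.items()}


def biased_cosine(doc_a, doc_b, weights, default_weight=1.0):
    """Weighted cosine similarity between two term-frequency dicts.

    Each term's contribution is scaled by its weight, so terms that are
    frequent in the query log influence the similarity more strongly.
    """
    def w(term):
        return weights.get(term, default_weight)

    dot = sum(w(t) * doc_a[t] * doc_b[t] for t in doc_a.keys() & doc_b.keys())
    norm_a = math.sqrt(sum(w(t) * tf ** 2 for t, tf in doc_a.items()))
    norm_b = math.sqrt(sum(w(t) * tf ** 2 for t, tf in doc_b.items()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy data, invented for this example only.
query_log = ["cheap flights", "flights to paris", "paris hotels"]
weights = query_term_weights(query_log)

doc_a = {"flights": 2, "the": 5}
doc_b = {"flights": 1, "a": 5}

unbiased = biased_cosine(doc_a, doc_b, weights={})
biased = biased_cosine(doc_a, doc_b, weights=weights)
```

In this toy case the two documents share only the term "flights", which appears in the query log, so `biased` comes out higher than `unbiased`. A clustering algorithm that used such a metric would tend to group documents by the vocabulary that search traffic actually uses.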
doi:10.1145/2983323.2983706 dblp:conf/cikm/DaiXC16 fatcat:vegfhdzrdzcjlaksilho7z7mbu