Query performance prediction in web search environments

Yun Zhou, W. Bruce Croft
2007 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '07  
Current prediction techniques, which are generally designed for content-based queries and are typically evaluated on relatively homogeneous test collections of small sizes, face serious challenges in web search environments, where collections are significantly more heterogeneous and different types of retrieval tasks exist. In this paper, we present three techniques to address these challenges. We focus on performance prediction for two types of queries in web search environments: content-based and Named-Page (NP) finding. Our evaluation is mainly performed on the GOV2 collection. In addition to evaluating our models for the two types of queries separately, we consider a more challenging and realistic situation in which the two types of queries are mixed together without prior information on query types. To assist prediction under the mixed-query situation, a novel query classifier is adopted. Results show that our prediction of web query performance is substantially more accurate than the current state-of-the-art prediction techniques. Consequently, our paper provides a practical approach to performance prediction in real-world web settings. Our main contributions include: (1) considerably improved prediction accuracy for web content-based queries over several state-of-the-art techniques; (2) new techniques for successfully predicting NP-query performance; (3) a practical and fully automatic solution to predicting mixed-query performance. In addition, one minor contribution is our finding that the robustness score [1], which was originally proposed for performance prediction, is helpful for query classification. Related work is discussed in Section 2. We detail our prediction models in Section 3. Experimental results are presented in Section 4, and Section 5 concludes the paper.
doi:10.1145/1277741.1277835 dblp:conf/sigir/ZhouC07 fatcat:ecxyackbu5e3hc4v6gn6dnygee