Pooled Evaluation Over Query Variations

Alistair Moffat, Falk Scholer, Paul Thomas, Peter Bailey
2015 Proceedings of the 24th ACM International on Conference on Information and Knowledge Management - CIKM '15  
Evaluation of information retrieval systems with test collections makes use of a suite of fixed resources: a document corpus; a set of topics; and associated judgments of the relevance of each document to each topic. With large modern collections, exhaustive judging is not feasible. Therefore an approach called pooling is typically used where, for example, the documents to be judged are determined by taking the union of the documents appearing in the top positions of the answer lists returned by a range of systems. Conventionally, pooling uses system variations to provide diverse documents to be judged for a topic; different user queries are not considered. We explore the ramifications of user query variability on pooling, and demonstrate that conventional test collections do not cover this source of variation. The effect of user query variation on the size of the judging pool is just as strong as the effect of retrieval system variation. We conclude that user query variation should be incorporated early in test collection construction, and cannot be considered effectively post hoc.
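The depth-k pooling idea described above can be sketched briefly: the judging pool is the union of the top-k documents from each contributing run, and runs may come from different systems, different user query variations, or both. The data and document identifiers below are hypothetical, chosen only to illustrate how adding query variations enlarges the pool.

```python
def pool(runs, k=3):
    """Depth-k pool: union of the top-k documents across ranked result lists."""
    pooled = set()
    for ranking in runs:
        pooled.update(ranking[:k])
    return pooled

# Hypothetical runs: two systems answering the same query for one topic.
system_runs = [
    ["d1", "d2", "d3", "d4"],
    ["d2", "d1", "d5", "d6"],
]

# The same two systems answering a second user query variation of the topic.
variant_runs = [
    ["d7", "d2", "d8", "d1"],
    ["d9", "d7", "d3", "d2"],
]

p_systems = pool(system_runs)                # pool from system variation alone
p_both = pool(system_runs + variant_runs)    # pool once query variation is added

print(len(p_systems), len(p_both))           # prints: 4 7
```

Here the second query variation contributes three documents (d7, d8, d9) that no run for the first query surfaced, which is the effect the paper measures: query variation widens the pool just as system variation does.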
doi:10.1145/2806416.2806606 dblp:conf/cikm/MoffatSTB15