When query expansion fails

Bodo Billerbeck, Justin Zobel
2003 Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '03
The effectiveness of queries in information retrieval can be improved through query expansion. This technique automatically introduces additional query terms that are statistically likely to match documents on the intended topic. However, query expansion techniques rely on fixed parameters. Our investigation of the effect of varying these parameters shows that the strategy of using fixed values is questionable.

Introduction

Query expansion has been widely investigated as a method for improving the performance of information retrieval [1, 2, 5, 7, 8, 10]. It is the only effective automatic method for solving the problem of vocabulary mismatch: queries are often not well-formulated, but may be ambiguous, insufficiently precise, or use terminology that is specific to a country; consider, for example, the US "wrench" versus the UK "spanner". Alternatives to query expansion, such as thesaurus-based techniques, have not been as successful [5].

Query expansion or QE (also known as pseudo-relevance feedback or automatic query expansion) is based on the observation that the top-ranked documents have a reasonable probability of being relevant. It can heuristically be assumed that the first 10 (say) matches to a query are relevant; terms from these documents can be used to form a new query.

Several alternative approaches to QE have been described. We have focused on a method shown by Robertson and Walker to be successful at TREC 8 [8], where an average of about 10% improvement in effectiveness was demonstrated through query expansion. In this approach, documents are initially ranked using the Okapi BM25 measure [7, 9] applied to the original query. (For a discussion of this formulation, see Sparck Jones, Walker, and Robertson [11].) In common with all query expansion methods, the Okapi approach requires several parameters, with values determined in experiments on a particular test data set. Expansion terms are
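
The general loop described above (rank with BM25, assume the top few matches are relevant, draw new terms from them) can be illustrated with a minimal Python sketch. It is not the Robertson and Walker procedure itself: the function names `bm25_rank` and `expand_query` are hypothetical, the constants `k1=1.2` and `b=0.75` are commonly used defaults rather than values taken from this paper, the IDF uses a smoothed variant that stays non-negative, and the term-scoring rule (number of feedback documents containing the term, times IDF) is a simplified stand-in for the Okapi term selection value. The fixed `top_docs` and `num_terms` arguments are examples of the kind of fixed parameters the paper calls into question.

```python
import math
from collections import Counter


def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Return document indices ordered by a BM25-style score, best first.

    `docs` is a list of token lists. k1 and b are assumed defaults.
    """
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs for t in set(d))
    scored = []
    for i, d in enumerate(docs):
        tf = Counter(d)
        score = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            # Smoothed IDF (an assumption; keeps weights non-negative).
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            norm = tf[t] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scored.append((score, i))
    # Highest score first; ties broken by document index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored]


def expand_query(query, docs, top_docs=10, num_terms=5):
    """Append up to `num_terms` terms drawn from the `top_docs` best matches."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    feedback = [docs[i] for i in bm25_rank(query, docs)[:top_docs]]
    # r[t]: number of assumed-relevant documents that contain term t.
    r = Counter(t for d in feedback for t in set(d) if t not in query)

    def weight(t):
        # Simplified scoring rule, not the Okapi term selection value.
        return r[t] * math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1.0)

    candidates = sorted(r, key=lambda t: (-weight(t), t))
    return list(query) + candidates[:num_terms]


if __name__ == "__main__":
    corpus = [
        "wrench spanner tool bolt".split(),
        "spanner tool nut bolt".split(),
        "wrench tool garage".split(),
        "cat dog pet".split(),
    ]
    print(expand_query(["wrench"], corpus, top_docs=2, num_terms=2))
```

Varying `top_docs` and `num_terms` per query, rather than fixing them once for a collection, is the kind of experiment the abstract refers to.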
doi:10.1145/860500.860514 fatcat:zri5vxidlzc5jb4xb6grqoih6m