Where do good query terms come from?

Gheorghe Muresan, Dmitri Roussinov
2007 Proceedings of the American Society for Information Science and Technology  
This paper describes a framework for investigating the quality of different query expansion approaches, and applies it in the HARD TREC experimental setting. The intuition behind our approach is that each topic has an optimal term-based representation, i.e. a set of terms that best describe it, and that the effectiveness of any other representation is correlated with the overlap that it has with the optimal representation. Indeed, we find that, for a wide number of candidate topic
more » ... s, obtained through various query-expansion approaches, there is a high correlation between standard effectiveness measures (R-P, P@10, MAP) and term overlap with what is estimated to be the optimal representation. An important conclusion of comparing different query expansion approaches is that machines are better than humans at doing statistical calculations and at estimating which query terms are more likely to discriminate documents relevant for a given topic. This explains why, in the HARD track of TREC 2005, the overall conclusion was that interaction with the searcher and elicitation of additional information could not over-perform automatic procedures for query improvement. However, the best results are obtained from hybrid approaches, in which human relevance judgments are used by algorithms for deriving terms representations. This result suggest that the best approach in improving retrieval performance is probably to focus on implicit relevance feedback and novel interaction models based on ostention or mediation, which have shown great potential.
doi:10.1002/meet.14504301197 fatcat:wwxrzsrmdbhnde2bx6g5r27x4a