Predicting query difficulty on the web by learning visual clues

Eric C. Jensen, Steven M. Beitzel, David Grossman, Ophir Frieder, Abdur Chowdhury
Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '05), 2005
We describe a method for predicting query difficulty in a precision-oriented web search task. Our approach uses visual features from retrieved surrogate document representations (titles, snippets, etc.) to predict retrieval effectiveness for a query. By training a supervised machine learning algorithm with manually evaluated queries, we discover visual clues indicative of relevance. We show that this approach has a moderate correlation of 0.57 with precision-at-10 (P@10) scores computed from manual judgments of the top ten documents retrieved by ten web search engines over 896 queries. Our findings indicate that difficulty predictors that have been successful in recall-oriented ad-hoc search, such as clarity metrics, correlate far less with engine performance in precision-oriented tasks such as this one, reaching a maximum correlation of only 0.3. Additionally, relying only on visual clues avoids the need for the collection statistics required by these prior approaches. This allows our approach to be used in environments where such statistics are unavailable or costly to retrieve, such as metasearch.
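The overall pipeline can be sketched as below, purely as an illustration. The abstract does not list the paper's actual visual features or name its learning algorithm, so everything here is a hypothetical stand-in: the three features (rate of query-term presence in the titles, snippets, and URLs of the top ten surrogates), the `Surrogate`, `visual_features`, `fit_linear`, `predict`, and `pearson` helpers, and the choice of a ridge-regularized linear regression as the learner. The sketch maps each query's retrieved surrogates to a feature vector, fits a regressor against manually judged P@10 values, and scores the predictions with Pearson's r. The abstract does not state which correlation coefficient the reported 0.57 refers to.

```python
import math
from dataclasses import dataclass


@dataclass
class Surrogate:
    """One result as displayed by an engine (hypothetical structure)."""
    title: str
    snippet: str
    url: str


def visual_features(query: str, surrogates: list[Surrogate]) -> list[float]:
    """Hypothetical visual clues computed from the top ten surrogates only."""
    terms = query.lower().split()
    top = surrogates[:10]
    n = len(top) or 1  # avoid dividing by zero when nothing was retrieved

    def has_all_terms(text: str) -> bool:
        lowered = text.lower()
        return all(term in lowered for term in terms)  # simple substring match

    title_rate = sum(has_all_terms(s.title) for s in top) / n
    snippet_rate = sum(has_all_terms(s.snippet) for s in top) / n
    url_rate = sum(any(t in s.url.lower() for t in terms) for s in top) / n
    return [title_rate, snippet_rate, url_rate]


def fit_linear(X: list[list[float]], y: list[float], ridge: float = 1e-6) -> list[float]:
    """Stand-in learner: ridge least squares solved via the normal equations.

    Returns weights [bias, w1, ..., wk].
    """
    rows = [[1.0, *x] for x in X]  # prepend a bias column
    d = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    # Gaussian elimination with partial pivoting.
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution on the upper-triangular system.
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w


def predict(w: list[float], x: list[float]) -> float:
    """Predicted P@10 for one query's feature vector."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between predicted and judged effectiveness."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)
```

Nothing in this sketch consults collection statistics such as index-wide term frequencies; it needs only the surrogates an engine returns, which is the property the abstract highlights for settings like metasearch. A clarity-style predictor, by contrast, would also need a collection language model to compare against.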
doi:10.1145/1076034.1076155 dblp:conf/sigir/JensenBGFC05 fatcat:ck6jsgos4bf6jfkbycjv5kvi4u