A model to estimate intrinsic document relevance from the clickthrough logs of a web search engine
Proceedings of the third ACM international conference on Web search and data mining - WSDM '10
We propose a new model to interpret the clickthrough logs of a web search engine. This model is based on explicit assumptions on the user behavior. In particular, we draw conclusions on a document relevance by observing the user behavior after he examined the document and not based on whether a user clicks or not a document url. This results in a model based on intrinsic relevance, as opposed to perceived relevance. We use the model to predict document relevance and then use this as feature for
... this as feature for a "Learning to Rank" machine learning algorithm. Comparing the ranking functions obtained by training the algorithm with and without the new feature we observe surprisingly good results. This is particularly notable given that the baseline we use is the heavily optimized ranking function of a leading commercial search engine. A deeper analysis shows that the new feature is particularly helpful for non navigational queries and queries with a large abandonment rate or a large average number of queries per session. This is important because these types of query is considered to be the most difficult to solve.