Enhancing ad-hoc relevance weighting using probability density estimation

Xiaofeng Zhou, Jimmy Xiangji Huang, Ben He
2011 Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval - SIGIR '11
Classical probabilistic information retrieval (IR) models, e.g. BM25, handle document length through a trade-off between the Verbosity hypothesis, which assumes that a document's relevance is independent of its length, and the Scope hypothesis, which assumes the opposite. Despite the effectiveness of the classical probabilistic models, the potential relationship between document length and relevance has not been fully explored as a means of improving retrieval performance. In this paper, we conduct a study of this relationship based on the Scope hypothesis that document length does have an impact on relevance. We study a list of probability density functions and examine which of them best fits the actual distribution of document length. Based on the studied probability density functions, we propose a length-based BM25 relevance weighting model, called BM25L, which incorporates document length as a substantial weighting factor. Extensive experiments conducted on standard TREC collections show that our proposed BM25L markedly outperforms the original BM25 model, even when the latter is optimized.
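For readers unfamiliar with the length trade-off described above, the following is a minimal sketch of the classical BM25 term weight, not the BM25L model proposed in the paper (the abstract does not give its formula). The function name, argument names, default values k1 = 1.2 and b = 0.75, and the "+1" smoothed IDF variant are assumptions chosen for illustration. The parameter b controls how strongly term frequency is normalized by document length: b = 1 applies full normalization, in the spirit of the Verbosity hypothesis, while b = 0 ignores length entirely.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Classical BM25 weight of one query term in one document (illustrative sketch).

    tf          -- frequency of the term in the document
    doc_len     -- length of the document (in tokens)
    avg_doc_len -- average document length in the collection
    n_docs      -- number of documents in the collection
    doc_freq    -- number of documents containing the term
    k1, b       -- assumed default parameter values, not taken from the paper
    """
    # Smoothed inverse document frequency; the "+ 1.0" keeps it non-negative.
    idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
    # Length normalization: b interpolates between no normalization (b = 0)
    # and full normalization by relative document length (b = 1).
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # Saturating term-frequency component.
    return idf * tf * (k1 + 1.0) / (tf + k1 * norm)


if __name__ == "__main__":
    short_doc = bm25_term_score(tf=3, doc_len=100, avg_doc_len=200, n_docs=1000, doc_freq=50)
    long_doc = bm25_term_score(tf=3, doc_len=400, avg_doc_len=200, n_docs=1000, doc_freq=50)
    # With b > 0, the same term frequency counts for less in the longer document.
    print(short_doc > long_doc)
```

In this form, document length only rescales term frequency. The abstract's proposal is to go further and treat length as a weighting factor in its own right.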
doi:10.1145/2009916.2009943 dblp:conf/sigir/ZhouHH11 fatcat:tgg6j2egmvd4doyrhbukboo2ny