On event space and rank equivalence between probabilistic retrieval models

Robert W. P. Luk
2008 Information retrieval (Boston)  
This paper discusses various issues about the rank equivalence of Lafferty and Zhai between the log-odds ratio and the query likelihood of probabilistic retrieval models. It highlights that Robertson's concerns about this equivalence may arise when multiple probability distributions are assumed to be uniformly distributed, after assuming that the marginal probability logically follows from Kolmogorov's probability axioms. It also clarifies that there are two types of rank equivalence relations between probabilistic models, namely strict and weak rank equivalence. This paper focuses on the strict rank equivalence, which requires the event spaces of the participating probabilistic models to be identical. It is possible that two probabilistic models are strict rank equivalent when they use different probability estimation methods. This paper shows that the query likelihood, p(q|d, r), is strict rank equivalent to p(q|d) of the language model of Ponte and Croft by applying assumptions 1 and 2 of Lafferty and Zhai. In addition, some statistical component language model may be strict rank equivalent to the log-odds ratio, and some statistical component model using the log-odds ratio may be strict rank equivalent to the query likelihood. Finally, we suggest adding a random variable for the user information need to the probabilistic retrieval models for clarification when these models deal with multiple requests.
Keywords Probabilistic models · Event space · Information retrieval
Introduction
Robertson's (2005) instructive paper cautions against some of the mathematical derivations of language models for information retrieval using marginal probabilities (in Eq. 1 of Robertson 2005) because the event space of these probabilities is different from the event space of the conditional probability. He further questions the event spaces of the earlier language model by Ponte and Croft (
doi:10.1007/s10791-008-9062-z fatcat:fb44fm6tiveyldqolsbpjqsa6u