Improving Retrieval Performance for Verbose Queries via Axiomatic Analysis of Term Discrimination Heuristic

Mozhdeh Ariannezhad, Ali Montazeralghaem, Hamed Zamani, Azadeh Shakery
2017. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '17). ACM Press.
The number of terms in a query is a query-specific constant that is typically ignored in retrieval functions. However, previous studies have shown that the performance of retrieval models varies with query length and usually degrades as query length increases. One possible reason is that the extraneous terms in longer queries make it difficult for retrieval models to distinguish between the key and complementary concepts of the query. Inverse document frequency (IDF), a signal of term importance, can be used to discriminate among query terms. In this paper, we propose a constraint to model the interaction between query length and IDF. Our theoretical analysis shows that current state-of-the-art retrieval models, such as BM25, do not satisfy the proposed constraint. We further analyze the BM25 model and suggest a modification that makes BM25 adhere to the new constraint. Our experiments on three TREC collections demonstrate that the proposed modification outperforms the baselines, especially for verbose queries.
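
For orientation, the following is a minimal sketch of the standard Okapi BM25 scoring function that the abstract refers to. It is not the modified model proposed in the paper. The toy corpus, the function names, the smoothed IDF variant, and the defaults `k1=1.2` and `b=0.75` are all assumptions chosen for illustration. The sketch shows that the number of query terms never enters the per-term weights, which is the property the abstract points to.

```python
import math
from collections import Counter

# Toy tokenized corpus (hypothetical, for illustration only).
docs = [
    ["cheap", "flights", "to", "paris"],
    ["paris", "hotel", "deals"],
    ["how", "to", "book", "cheap", "hotel"],
    ["weather", "in", "paris", "today"],
]

def idf(term, docs):
    """Smoothed IDF; the +1 inside the log keeps the value non-negative."""
    n = len(docs)
    df = sum(1 for d in docs if term in d)  # document frequency of the term
    return math.log(1 + (n - df + 0.5) / (df + 0.5))

def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Standard BM25 score of one document for a tokenized query."""
    avgdl = sum(len(d) for d in docs) / len(docs)  # average document length
    tf = Counter(doc)
    score = 0.0
    for term in query:
        f = tf[term]
        if f == 0:
            continue  # terms absent from the document contribute nothing
        length_norm = k1 * (1 - b + b * len(doc) / avgdl)
        score += idf(term, docs) * f * (k1 + 1) / (f + length_norm)
    return score

short_query = ["flights"]
verbose_query = ["flights", "please"]  # one extra term that matches nothing
print(bm25_score(short_query, docs[0], docs))
print(bm25_score(verbose_query, docs[0], docs))
```

In this sketch each term's weight depends only on its IDF, its frequency in the document, and the document length, so lengthening the query leaves the weight of every existing term unchanged.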
