3,251 Hits in 3.7 sec

Are There New BM25 Expectations?

Emanuele Di Buccio, Giorgio Maria Di Nunzio
2013 Italian Information Retrieval Workshop  
In this paper, we present some ideas about possible directions for a new interpretation of the Okapi BM25 ranking formula.  ...  This approach has been tested on a visual data mining tool and the initial results are encouraging.  ...  Conclusions: This paper presents a new direction for the study of the Okapi BM25 model.  ... 
dblp:conf/iir/BuccioN13 fatcat:ysuqcoozezb2das44sl2nzlaoi
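As background for the reinterpretation this entry discusses, the standard Okapi BM25 per-term score can be sketched as follows. This is a minimal sketch using conventional defaults (k1=1.2, b=0.75) and a common non-negative IDF variant; it is not taken from the paper itself.

```python
import math

def bm25_score(tf, doc_len, avg_doc_len, df, num_docs, k1=1.2, b=0.75):
    """Okapi BM25 score contribution of a single query term.

    tf: term frequency in the document
    df: number of documents containing the term
    """
    # Non-negative IDF variant (the +1 inside the log avoids negative weights).
    idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
    # Length-normalized saturation denominator.
    norm = tf + k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * (k1 + 1.0) * tf / norm
```

The score grows sublinearly in term frequency and shrinks as the document grows longer than the collection average.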

A Quantum Expectation Value Based Language Model with Application to Question Answering

Qin Zhao, Chenguang Hou, Changjian Liu, Peng Zhang, Ruifeng Xu
2020 Entropy  
Words and sentences are viewed as different observables in this quantum model.  ...  In this paper, we propose a novel Quantum Expectation Value based Language Model (QEV-LM). A unique shared density matrix is constructed for the Semantic Hilbert Space.  ...  BLSTM + BM25) [34] .  ... 
doi:10.3390/e22050533 pmid:33286305 fatcat:npusps54nbdv3ic7glcv7tvmze

Great Expectations: Unsupervised Inference of Suspense, Surprise and Salience in Storytelling [article]

David Wilmot
2022 arXiv   pre-print
Extensions add memory and external knowledge from story plots and from Wikipedia to infer salience on novels such as Great Expectations and plays such as Macbeth.  ...  Stories interest us not because they are a sequence of mundane and predictable events but because they have drama and tension.  ...  There are also genres, such as the hero's journey, where although the characters and circumstances are new, the plots can be fairly formulaic.  ... 
arXiv:2206.09708v1 fatcat:k4oefywyxvgn5gdtedyvr5mbpi

A probabilistic model to exploit user expectations in XML information retrieval

Fouad Dahak, Mohand Boughanem, Amar Balla
2017 Information Processing & Management  
The main objective of this paper is to exploit a new source of evidence derived from the document hierarchical structure for XML information retrieval.  ...  We therefore exploited a new source of evidence, the structural context importance, in order to quantify the user expectations.  ...  We note that there is not a direct relation between the two features.  ... 
doi:10.1016/j.ipm.2016.06.008 fatcat:uyvkylwranhbvornilmdbhqwny

Diverse retrieval via greedy optimization of expected 1-call@k in a latent subtopic relevance model

Scott Sanner, Shengbo Guo, Thore Graepel, Sadegh Kharazmi, Sarvnaz Karimi
2011 Proceedings of the 20th ACM international conference on Information and knowledge management - CIKM '11  
This new result is complementary to a variety of diverse retrieval algorithms derived from alternate rank-based relevance criteria such as average precision and reciprocal rank.  ...  In this paper, we proceed one step further and show theoretically that greedily optimizing expected 1-call@k w.r.t. a latent subtopic model of binary relevance leads to a diverse retrieval algorithm sharing  ...  Both MMR and Exp-1-call@k are used to rank the top-20 documents from the top-100 Okapi BM25 results.  ... 
doi:10.1145/2063576.2063869 dblp:conf/cikm/SannerGGKK11 fatcat:aq3mrlkm4jf3fjvsvi7hkbejka
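The MMR baseline this entry compares against can be sketched roughly as follows. This is the standard greedy Maximal Marginal Relevance heuristic, not the paper's latent-subtopic Exp-1-call@k derivation; the function and parameter names are illustrative.

```python
def mmr_rerank(candidates, relevance, similarity, lam=0.5, k=10):
    """Greedy Maximal Marginal Relevance re-ranking.

    candidates: list of doc ids
    relevance: dict doc -> query relevance score
    similarity: dict (doc_a, doc_b) -> inter-document similarity in [0, 1]
    lam: trade-off between relevance and novelty
    """
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def marginal(d):
            # Redundancy is the max similarity to anything already chosen.
            redundancy = max((similarity[(d, s)] for s in selected), default=0.0)
            return lam * relevance[d] - (1.0 - lam) * redundancy
        best = max(pool, key=marginal)
        selected.append(best)
        pool.remove(best)
    return selected
```

Each greedy step trades query relevance against redundancy with documents already selected, which is the same structural form the paper derives for Exp-1-call@k.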

Relevance Feedback for Best Match Term Weighting Algorithms in Information Retrieval

Djoerd Hiemstra, Stephen E. Robertson
2001 DELOS Workshops / Conferences  
The paper shows that there are no significant differences between simple and sophisticated approaches to relevance feedback.  ...  The paper shows the resemblance of the approaches to relevance feedback of these models, introduces new approaches to relevance feedback for both models, and evaluates the new relevance feedback algorithms  ...  For long queries, however, the results are slightly worse than the ordinary ... (Table 3, new relevance feedback algorithms) Pairwise comparison of the new relevance feedback and the ad-hoc experiments shows  ... 
dblp:conf/delos/HiemstraR01 fatcat:4444cfrclzbrdevsi77iipm4ve
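In the best-match (BM25) family, the "sophisticated" probabilistic feedback referred to here typically means re-estimating term weights with the Robertson/Sparck-Jones relevance weight. A sketch of that standard weight with 0.5 smoothing follows; the exact variant evaluated in the paper may differ.

```python
import math

def rsj_weight(r, R, df, N):
    """Robertson/Sparck-Jones term relevance weight with 0.5 smoothing.

    r:  number of judged-relevant documents containing the term
    R:  total number of judged-relevant documents
    df: number of documents in the collection containing the term
    N:  collection size
    """
    return math.log(((r + 0.5) * (N - df - R + r + 0.5)) /
                    ((df - r + 0.5) * (R - r + 0.5)))
```

With no relevance information (r = R = 0) the weight collapses to an IDF-like quantity, which is why feedback and ad-hoc weighting can be compared in one framework.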

BM25-FIC: Information Content-based Field Weighting for BM25F

Tuomas Ketola, Thomas Roelleke
2020 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval  
This paper tackles the challenge by introducing a new analytical method for the automatic estimation of these weights.  ...  The field weights are applied to each document separately rather than to the entire field, as normally done by BM25F, where the field weights are constant across documents.  ...  BM25-FIC (Information Content-based BM25F): There are two common ways in which multiple fields are considered in the BM25 context.  ... 
dblp:conf/sigir/KetolaR20 fatcat:ltrgnuccdrbpjneuj4bj2w2tny
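For context, the constant-weight BM25F that BM25-FIC revises is commonly computed by folding per-field term frequencies into a single pseudo-frequency before saturation. A rough sketch under the usual formulation; the names and defaults are illustrative, not taken from the paper.

```python
import math

def bm25f_tf(field_tfs, field_lens, avg_field_lens, field_weights, b_per_field):
    """Combine per-field term frequencies into one BM25F pseudo-frequency.

    Each field's tf is length-normalized, scaled by a constant field weight,
    and summed; BM25-FIC instead estimates the weights per document.
    """
    total = 0.0
    for f, tf in field_tfs.items():
        norm = (1.0 - b_per_field[f]
                + b_per_field[f] * field_lens[f] / avg_field_lens[f])
        total += field_weights[f] * tf / norm
    return total

def bm25f_term_score(pseudo_tf, df, num_docs, k1=1.2):
    """Saturate the combined pseudo-frequency once, at document level."""
    idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
    return idf * pseudo_tf / (k1 + pseudo_tf)
```

Applying saturation after the weighted combination (rather than per field) is what distinguishes BM25F from simply summing per-field BM25 scores.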

When documents are very long, BM25 fails!

Yuanhua Lv, ChengXiang Zhai
2011 Proceedings of the 34th international ACM SIGIR conference on Research and development in Information - SIGIR '11  
We reveal that the Okapi BM25 retrieval function tends to overly penalize very long documents.  ...  Our experiments show that BM25L, with the same computation cost, is more effective and robust than the standard BM25.  ...  This is likely because there are generally more very long documents in web collections, where the problem of BM25, i.e., overly-penalizing very long documents, would presumably be more severe.  ... 
doi:10.1145/2009916.2010070 dblp:conf/sigir/LvZ11 fatcat:37bbkwgtdbcfhihkdxzkqkjpjq
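The BM25L remedy described in this entry shifts the length-normalized term frequency by a small delta so that very long documents keep a non-vanishing score. A minimal sketch of the published formulation, assuming the commonly used defaults k1=1.2, b=0.75, delta=0.5:

```python
import math

def bm25l_score(tf, doc_len, avg_doc_len, df, num_docs,
                k1=1.2, b=0.75, delta=0.5):
    """BM25L per-term score (after Lv & Zhai, SIGIR 2011).

    The delta shift keeps the normalized tf away from zero for very long
    documents, countering BM25's over-penalization of them.
    """
    if tf == 0:
        return 0.0
    idf = math.log((num_docs + 1.0) / (df + 0.5))
    # Length-normalized term frequency.
    ctd = tf / (1.0 - b + b * doc_len / avg_doc_len)
    return idf * (k1 + 1.0) * (ctd + delta) / (k1 + ctd + delta)
```

As doc_len grows, ctd shrinks toward zero, but the score floor set by delta prevents long documents from being driven to near-zero, which is the paper's key fix.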

RealSakaiLab at the TREC 2020 Health Misinformation Track

Sijie Tao, Tetsuya Sakai
2020 Text Retrieval Conference  
The results from a language identification model, a news category classifier and a majority score calculation were used to modify the BM25 scores of the baseline ranking.  ...  To address both relevance and credibility, we combined several techniques to re-rank a BM25 baseline ranking.  ...  There are 41 categories, and we chose four (politics, science & tech, wellness, and world news) as "relevant categories" that could contain health-related information about COVID-19.  ... 
dblp:conf/trec/TaoS20 fatcat:nti6b3753vha3fbmghnpwnsbo4

Optimal Structure Weighted Retrieval

Andrew Trotman
2004 Australasian Document Computing Symposium  
Weights are learned for vector space inner product, naïve probability and BM25 ranking functions and a performance upper bound is calculated.  ...  The upper bound, using a different set of weights for each query, gives mean average precision improvements of about 15% for BM25 and naïve probability; about 30% for inner product.  ...  From Table 3, there is no cross-function correlation in how much each topic is improved. Using this result, it is possible to ask a new question.  ... 
dblp:conf/adcs/Trotman04 fatcat:zc2tx6lpsrfnfbvubsyabykvam

Ad hoc IR

Andrew Trotman, David Keeler
2011 Proceedings of the 34th international ACM SIGIR conference on Research and development in Information - SIGIR '11  
The conclusion is that there isn't much room for ranking-function improvement.  ...  First, the performance of BM25 is measured as the proportion of queries satisfied on the first page of 10 results; it performs well. The performance is then compared to human performance.  ...  The results are presented in Figure 3, from which it can be seen that when BM25 was introduced (circa 1994) there was room for improvement on TF.IDF.  ... 
doi:10.1145/2009916.2010066 dblp:conf/sigir/TrotmanK11 fatcat:l2dt3mykjrfa7orv5ok3bgokmu

BioASQ Synergy: A strong and simple baseline rooted in relevance feedback

Tiago Almeida, Sérgio Matos
2021 Conference and Labs of the Evaluation Forum  
Code to reproduce our submissions is available on  ...  Then, the revised query is processed by our BioASQ-8b pipeline consisting of BM25 followed by a lightweight neural reranking model.  ...  This new query is then processed by BM25, hopefully returning a new list of documents that are more similar to the positive documents.  ... 
dblp:conf/clef/AlmeidaM21 fatcat:f627qsbsivetfmy23ai5liv7lu

A Standard Document Score for Information Retrieval

Ronan Cummins
2013 Proceedings of the 2013 Conference on the Theory of Information Retrieval - ICTIR '13  
With experiments on a number of different TREC collections, we show that the standard document score model is comparable with BM25.  ...  However, we show that an advantage of the standard document score model is that the document scores output from the model are dimensionless quantities, and therefore are comparable across different queries  ...  The expected value and variance of the population are then known quantities.  ... 
doi:10.1145/2499178.2499183 dblp:conf/ictir/Cummins13 fatcat:j5dm4kxvwfai7gemwkl6jhgrby

Can Old TREC Collections Reliably Evaluate Modern Neural Retrieval Models? [article]

Ellen M. Voorhees, Ian Soboroff, Jimmy Lin
2022 arXiv   pre-print
These new search results consisted of five new runs (one each from three transformer-based models and two baseline runs that use BM25) plus the set of TREC-8 submissions that did not previously contribute  ...  To test the reusability claim, we asked TREC assessors to judge new pools created from new search results for the TREC-8 ad hoc collection.  ...  Qualitatively, the effectiveness of the runs and their rank positions are generally within expectations, although there are a few surprises.  ... 
arXiv:2201.11086v1 fatcat:3i7qnoivqjbnnfdsodimpe2wdi

Improvements to BM25 and Language Models Examined

Andrew Trotman, Antti Puurula, Blake Burgess
2014 Proceedings of the 2014 Australasian Document Computing Symposium on - ADCS '14  
In this investigation 9 recent ranking functions (BM25, BM25+, BM25T, BM25-adpt, BM25L, TF_l∘δ∘p×IDF, LM-DS, LM-PYP, and LM-PYP-TFIDF) are compared by training on the INEX 2009 Wikipedia collection and testing  ...  We find that once trained (using particle swarm optimization) there is very little difference in performance between these functions, that relevance feedback is effective, that stemming is effective, and  ...  For brevity the optimal parameters (of which there are up to 8) are omitted.  ... 
doi:10.1145/2682862.2682863 dblp:conf/adcs/TrotmanPB14 fatcat:jvio2xnkjrd2dcc2shnqh5jl4q
Showing results 1 — 15 out of 3,251 results