A comparison of the optimality of statistical significance tests for information retrieval evaluation

Julián Urbano, Mónica Marrero, Diego Martín
2013 Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '13  
Previous research has suggested the permutation test as the theoretically optimal statistical significance test for IR evaluation, and advocated for the discontinuation of the Wilcoxon and sign tests.  ...  We present a large-scale study comprising nearly 60 million system comparisons showing that in practice the bootstrap, t-test and Wilcoxon test outperform the permutation test under different optimality  ...  INTRODUCTION An Information Retrieval (IR) researcher is often faced with the question of which of two IR systems, A and B, performs better.  ... 
doi:10.1145/2484028.2484163 dblp:conf/sigir/UrbanoMM13a fatcat:ti27ac7x2nadla2td6mx3n7sfu

Experimental methods for information retrieval

Donald Metzler, Oren Kurland
2012 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '12  
Acknowledgments • We thank Fiana Raiber for running experiments and generating the tables used in this tutorial  ...  -Find optimal free-parameter values for a train set (of queries), and apply these for the test set of queries -As the number of queries is often small, use cross validation 'i' and 'r' mark statistically  ...  train-test split of the query set • Statistical significance testing -Paired, two-tailed, tests; p-value=0.01, 0.05, … -Recommended tests: paired t-test, permutation test (Smucker et al. '09) -Approach  ... 
doi:10.1145/2348283.2348534 dblp:conf/sigir/MetzlerK12 fatcat:k3j7brhwmzahdpomupi7z26m4i

Learning more powerful test statistics for click-based retrieval evaluation

Yisong Yue, Yue Gao, Oliver Chapelle, Ya Zhang, Thorsten Joachims
2010 Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '10  
We present new methods for learning how to score different types of clicks so that the resulting test statistic optimizes the statistical power of the experiment.  ...  Designed as a blind and unbiased test for eliciting a preference between two retrieval functions, an interleaved ranking of the results of two retrieval functions is presented to the users.  ...  Acknowledgements The work is funded by NSF Awards IIS-0812091 and IIS-0905467. The first author is also supported in part by a Microsoft Research Graduate Fellowship and a Yahoo!  ... 
doi:10.1145/1835449.1835534 dblp:conf/sigir/YueGCZJ10 fatcat:6wj2sinpj5bldhau4b24uvnna4

PubMed related articles: a probabilistic topic-based model for content similarity

Jimmy Lin, W. John Wilbur
2007 BMC Bioinformatics  
Experiments using the test collection from the TREC 2005 genomics track shows a small but statistically significant improvement of pmra over bm25 in terms of precision.  ...  We also describe a novel technique for estimating parameters that does not require human relevance judgments; instead, the process is based on the existence of MeSH® in MEDLINE®.  ...  Acknowledgements For this work, JL was funded in part by the National Library of Medicine, where he was a visiting research scientist during the summer of 2006.  ... 
doi:10.1186/1471-2105-8-423 pmid:17971238 pmcid:PMC2212667 fatcat:rnwzjiaofjdmniqqw2edjjrqfu

A Comparison of Deep Learning Based Query Expansion with Pseudo-Relevance Feedback and Mutual Information [chapter]

Mohannad ALMasri, Catherine Berrut, Jean-Pierre Chevallet
2016 Lecture Notes in Computer Science  
We use language models for information retrieval to evaluate expansion methods.  ...  The experiments are conducted on four CLEF collections show a statistically significant improvement over the language models and other expansion models.  ...  We use two tests for statistical significance: † indicates a statistical significant improvement over NEXP, and * indicates a statistical significant improvement over PRF.  ... 
doi:10.1007/978-3-319-30671-1_57 fatcat:qqe45dm3cbfeleyh5jpw2bzjyu

Query-Dependent Feature Weighting [chapter]

Donald Metzler
2011 A Feature-Centric View of Information Retrieval  
Metzler, A Feature-Centric View of Information Retrieval, The Information Retrieval Series 27,  ...  The extension is a generic framework for learning the importance of query term concepts in a way that directly optimizes an underlying retrieval metric.  ...  Training and evaluation of the retrieval models is done using this set of judged Web pages, which is a common evaluation practice for this type of test collection.  ... 
doi:10.1007/978-3-642-22898-8_5 fatcat:3l4sid3novbvbarxruveete3sq

ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons [article]

Margaret Li, Jason Weston, Stephen Roller
2019 arXiv   pre-print
The questions themselves are optimized to maximize the robustness of judgments across different annotators, resulting in better tests.  ...  While dialogue remains an important end-goal of natural language research, the difficulty of evaluation is an oft-quoted reason why it remains troublesome to make real progress towards its solution.  ...  Statistical significance can be computed using a binomial test.  ... 
arXiv:1909.03087v1 fatcat:qobsgsp4y5e4zonbyatv634c3a

An adaptive evidence weighting method for medical record search

Dongqing Zhu, Ben Carterette
2013 Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '13  
the same department) of a patient.  ...  Furthermore, we explore several informative features for learning our retrieval model.  ...  To assess the statistical significance of differences in the performance of two systems, we perform one-tailed paired t-test for MAP (since we train systems on MAP).  ... 
doi:10.1145/2484028.2484175 dblp:conf/sigir/ZhuC13 fatcat:pcd5rdvf7fab7dxij5yvnn5iqu

Parameterized concept weighting in verbose queries

Michael Bendersky, Donald Metzler, W. Bruce Croft
2011 Proceedings of the 34th international ACM SIGIR conference on Research and development in Information - SIGIR '11  
The majority of the current information retrieval models weight the query concepts (e.g., terms or phrases) in an unsupervised manner, based solely on the collection statistics.  ...  The experimental results on both newswire and web TREC corpora show that our model consistently and significantly outperforms a wide range of state-of-the-art retrieval models.  ...  We then use a held-out set of test queries to evaluate the performance of the optimized weights WΦ (Fig. 1(b)).  ... 
doi:10.1145/2009916.2009998 dblp:conf/sigir/BenderskyMC11 fatcat:oqmuyiqpafbttavj5wtmuj6wai

A Comparison of Retrieval Models using Term Dependencies

Samuel Huston, W. Bruce Croft
2014 Proceedings of the 23rd ACM International Conference on Information and Knowledge Management - CIKM '14  
A number of retrieval models incorporating term dependencies have recently been introduced.  ...  In this paper, we compare the effectiveness of recent bi-term dependency models over a range of TREC collections, for both short (title) and long (description) queries.  ...  Acknowledgments This work was supported in part by the Center for Intelligent Information Retrieval and in part by NSF grant #CNS-0934322.  ... 
doi:10.1145/2661829.2661894 dblp:conf/cikm/HustonC14 fatcat:s7w3xwp6bvaali7kuy2pmxr6ym

Face Retrieval Based on Robust Local Features and Statistical-Structural Learning Approach

Daidi Zhong, Irek Defée
2008 EURASIP Journal on Advances in Signal Processing  
A framework for the unification of statistical and structural information for pattern retrieval based on local feature sets is presented.  ...  We show how a pattern retrieval system based on the feature histograms can be optimized in a training process for the best performance.  ...  The authors would like to thank NIST for providing the FERET data. Support of first author by TISE scholarship is gratefully acknowledged.  ... 
doi:10.1155/2008/631297 fatcat:fahcxksyyzfvnkuxikxalbpxua

Learning concept importance using a weighted dependence model

Michael Bendersky, Donald Metzler, W. Bruce Croft
2010 Proceedings of the third ACM international conference on Web search and data mining - WSDM '10  
Modeling query concepts through term dependencies has been shown to have a significant positive effect on retrieval performance, especially for tasks such as web search, where relevance at high ranks is  ...  To test the weighted dependence model, we perform experiments on both publicly available TREC corpora and a proprietary web corpus.  ...  Training and evaluation of the retrieval models is done using this set of judged web pages, which is a common evaluation practice for this type of test collection.  ... 
doi:10.1145/1718487.1718492 dblp:conf/wsdm/BenderskyMC10 fatcat:eofm5bl77bbahjujc6ase5afvu

Exploration and visualization of OLAP cubes with statistical tests

Carlos Ordonez, Zhibo Chen
2009 Proceedings of the ACM SIGKDD Workshop on Visual Analytics and Knowledge Discovery Integrating Automated Analysis with Interactive Exploration - KDD '09  
A parametric statistical test allows pair-wise comparison of neighboring cells in cuboids, providing statistical evidence about the validity of findings.  ...  We introduce a two-dimensional checkerboard visualization of the cube that allows interactive exploration to understand significant measure differences between two cuboids differing in one dimension along  ...  Acknowledgments We would like to thank the Emory University Hospital for providing the medical data set used in this work.  ... 
doi:10.1145/1562849.1562855 dblp:conf/kdd/OrdonezC09 fatcat:4b3kk6mkm5fdjmmtonpsstl7sq

Employing Search Engine Optimization (SEO) Techniques for Improving the Discovery of Geospatial Resources on the Web

Samy Katumba, Serena Coetzee
2017 ISPRS International Journal of Geo-Information  
The statistical results were significant in most of the tests performed.  ...  With the increasing use of geographical information and technology in a variety of knowledge domains and disciplines, the need to discover and access suitable geospatial data is imperative.  ...  guidance they provided with regards to the choice of appropriate statistical analysis tests to perform in this research.  ... 
doi:10.3390/ijgi6090284 fatcat:ir7b2tmvrzgtxjeo3sxqs5m224

Integrating clusters created offline with query-specific clusters for document retrieval

Lior Meister, Oren Kurland, Inna Gelfer Kalmanovich
2009 Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '09  
We present the potential merit of integrating these two types of clusters.  ...  Previous work on cluster-based document retrieval has used either static document clusters that are created offline, or query-specific (dynamic) document clusters that are created from top-retrieved documents  ...  Acknowledgments We thank the reviewers for their comments, and Lillian Lee for discussions of ideas presented in this paper.  ... 
doi:10.1145/1571941.1572088 dblp:conf/sigir/MeisterKK09 fatcat:zzyknmmk35babmllmb6kqeeaye