Using score distributions to compare statistical significance tests for information retrieval evaluation

Javier Parapar, David E. Losada, Manuel A. Presedo‐Quindimil, Alvaro Barreiro
2019 Journal of the Association for Information Science and Technology  
Using Score Distributions, we model the output of multiple search systems, produce simulated search results from such models, and compare them using various significance tests.  ...  This new method for studying the power of significance tests in Information Retrieval evaluation is formal and innovative.  ...  We also thank the anonymous reviewers for their really useful suggestions and comments.  ... 
doi:10.1002/asi.24203 fatcat:jgohta6wmvfhbm3owh4zdoq42u

Using statistical testing in the evaluation of retrieval experiments

David Hull
1993 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '93  
A number of different statistical tests are described for determining if differences in performance between retrieval methods are significant.  ...  However, one can test this assumption using simple diagnostic plots, and if it is a poor approximation, there are a number of non-parametric alternatives.  ...  The value of this statistic can be compared to the F-distribution, which is the distribution of these scores if all retrieval methods are equally effective.  ... 
doi:10.1145/160688.160758 dblp:conf/sigir/Hull93 fatcat:4tiviqm5pfbpdpulvcadd67f3q

By the power of Grayskull

Laurence A. F. Park, Glenn Stone
2014 Proceedings of the 2014 Australasian Document Computing Symposium on - ADCS '14  
Information Retrieval evaluation is typically performed using a sample of queries and a statistical hypothesis test is used to make inferences about the systems accuracy on the population of queries.  ...  Research has shown that the t test is one of a set of tests that provides the greatest statistical power while maintaining acceptable type I error rates, when evaluating with a large sample of queries.  ...  Acknowledgement The authors thank Falk Scholer for his comments and advice on Information Retrieval evaluation methods.  ... 
doi:10.1145/2682862.2682878 dblp:conf/adcs/ParkS14 fatcat:jyvegogm25bcxd2qnoap2eonru

Evaluation Metrics and Evaluation [chapter]

Hercules Dalianis
2018 Clinical Text Mining  
First the scientific base for evaluation of all information retrieval systems, called the Cranfield paradigm will be described.  ...  Statistical significance testing will be presented. This chapter will also discuss manual annotation and inter-annotator agreement, annotation tools such as BRAT and the gold standard.  ...  topics are used for the evaluation of information retrieval.  ... 
doi:10.1007/978-3-319-78503-5_6 fatcat:v5mykkmvhrf4xlzcrpwmoi4sdy

Evaluating the Interest of Revamping Past Search Results [chapter]

Claudio Gutiérrez-Soto, Gilles Hubert
2013 Lecture Notes in Computer Science  
Exponential and Zipf distribution as well as Bradford's law are applied to construct simulated document collections suitable for information retrieval evaluation.  ...  In this paper we present two contributions: a method to construct simulated document collections suitable for information retrieval evaluation as well as an approach of information retrieval using past  ...  Then, we applied the Student's paired sample t-test to test if the difference between the two compared approaches with regards to P@10 was statistically significant. Experiment 1.  ... 
doi:10.1007/978-3-642-40173-2_9 fatcat:r5eyqwcncfhppnblas4qn4w2cy

Measuring the Variability in Effectiveness of a Retrieval System [chapter]

Mehdi Hosseini, Ingemar J. Cox, Natasa Milic-Frayling, Vishwa Vinay
2010 Lecture Notes in Computer Science  
A typical evaluation of a retrieval system involves computing an effectiveness metric, e.g. average precision, for each topic of a test collection and then using the average of the metric, e.g. mean average  ...  However, averages do not capture all the important aspects of effectiveness and, used alone, may not be an informative measure of systems' effectiveness.  ...  Acknowledgements The authors thank Jun Wang and Jianhan Zhu of UCL and Stephan Robertson of Microsoft Research Cambridge for useful discussion on earlier drafts of this paper. Bibliography  ... 
doi:10.1007/978-3-642-13084-7_7 fatcat:bmz6ae4ui5aq3m4n32s2xr6igu

A comparison of statistical significance tests for information retrieval evaluation

Mark D. Smucker, James Allan, Ben Carterette
2007 Proceedings of the sixteenth ACM conference on information and knowledge management - CIKM '07  
Information retrieval (IR) researchers commonly use three tests of statistical significance: the Student's paired t-test, the Wilcoxon signed rank test, and the sign test.  ...  For each of these five tests, we took the ad-hoc retrieval runs submitted to TRECs 3 and 5-8, and for each pair of runs, we measured the statistical significance of the difference in their mean average  ...  The IR researcher should select a significance test that uses the same test statistic as the researcher is using to compare systems.  ... 
doi:10.1145/1321440.1321528 dblp:conf/cikm/SmuckerAC07 fatcat:yysoebqcxvaxjcuegw3mtdetdm

Combining Multiple Strategies for Effective Monolingual and Cross-Language Retrieval

Jacques Savoy
2004 Information retrieval (Boston)  
This paper describes and evaluates different retrieval strategies that are useful for search operations on document collections written in various European languages, namely French, Italian, Spanish and  ...  In order to cross language barriers, we propose a combined query translation approach that has resulted in interesting retrieval effectiveness.  ...  The author would like to thank the three anonymous referees for their helpful suggestions and remarks.  ... 
doi:10.1023/b:inrt.0000009443.51912.e7 fatcat:vpdkttg7y5bijmnzlmh5wkar7u

How Significant Is Statistically Significant? The Case Of Audio Music Similarity And Retrieval

Julián Urbano, J. Stephen Downie, Brian McFee, Markus Schedl
2012 Zenodo (13th International Society for Music Information Retrieval Conference, ISMIR 2012)  
Evaluation experiments are the main research tool in Information Retrieval (IR) to determine which systems perform well and which perform poorly for a given task [1].  ...  Thus, observing a statistically significant difference does not mean that the systems really are different, in fact  ... 
doi:10.5281/zenodo.1418054 fatcat:m5c5dettxbaq3ktscpnp3xsgom


Sadanandam Manchala
2012 IOSR Journal of Engineering  
Significance tests are often used to estimate the reliability of such comparisons. In this research paper, we revisit the question of how such significance tests should be used.  ...  The efficiency of retrieval system is precise by comparing performance on a regular set of queries in Information Retrieval (IR) and MLIR systems.  ...  INTRODUCTION Test collections are the principal tool used for comparison and evaluation of retrieval systems.  ... 
doi:10.9790/3021-0204794802 fatcat:furywk2wzzbdbhyyuqzzenq5hu

Investigating the exhaustivity dimension in content-oriented XML element retrieval evaluation

Paul Ogilvie, Mounia Lalmas
2006 Proceedings of the 15th ACM international conference on Information and knowledge management - CIKM '06  
This paper attempts to answer this question through extensive statistical tests to compare the conclusions about system performance that could be made under different assessment scenarios.  ...  INEX, the evaluation initiative for content-oriented XML retrieval, has since its establishment defined the relevance of an element according to two graded dimensions, exhaustivity and specificity.  ...  Acknowledgements The INEX initiative is an activity of DELOS, a network of excellence for digital libraries. Paul Ogilvie was funded in part by NSF grant IIS-0534345.  ... 
doi:10.1145/1183614.1183631 dblp:conf/cikm/OgilvieL06 fatcat:fj2ik7me3rdzpm74nzycrdwsha

Within-Document Term-Based Index Pruning with Statistical Hypothesis Testing [chapter]

Sree Lekha Thota, Ben Carterette
2011 Lecture Notes in Computer Science  
Our method is based on the statistical significance of term frequency ratios: using the two-sample two-proportion (2P2N) test, we statistically compare the frequency of occurrence of a word within a given  ...  Furthermore, we give a formal statistical justification for such methods.  ...  Our goal is to test whether retrieval speed and effectiveness are substantially affected by pruning using the 2N2P tests, and to compare those tests to the baseline.  ... 
doi:10.1007/978-3-642-20161-5_54 fatcat:s27nko6ravhdlm2r23bbharfsy

Does degree of work task completion influence retrieval performance?

Peter Ingwersen, Toine Bogers, Marianne Lykke
2010 Proceedings of the American Society for Information Science and Technology  
Also, with the exception of full text records and across all document types, both measured at rank 10, no statistically significant correlation is observed with respect to retrieval performance influenced  ...  In this contribution we investigate the potential influence between assessors' perceived completion of their work task at hand and their actual assessment of usefulness of the retrieved information.  ...  up to rank 30, and statistically significant at nDCG10: when work tasks are perceived 'Not Complete' the usefulness score of the retrieved documents is indeed lower than for tasks felt 'Somewhat Complete  ... 
doi:10.1002/meet.14504701321 fatcat:qd4poaxlireqffcz45qzhvyjci

User-Centered Measures Vs. System Effectiveness In Finding Similar Songs

Xiao Hu, Noriko Kando
2012 Zenodo  
We also thank the IMIRSEL in the University of Illinois for providing the MIREX AMS data.  ...  Many of these studies used TREC (Text Retrieval Conference) evaluation results to select systems to be evaluated by users and to obtain data on system effectiveness.  ...  does not assume normal distribution of tested variables.  ... 
doi:10.5281/zenodo.1416868 fatcat:lsuwxvvdlra6robk6j6uc7xhiq

A Comparative User Study of Web Search Interfaces: HotMap, Concept Highlighter, and Google

Orland Hoeber, Xue Yang
2006 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings)(WI'06)  
We suggest the use of information visualization and interactive visual manipulation as methods for improving the ability of users to evaluate the results of a web search.  ...  Users of traditional web search engines commonly find it difficult to evaluate the results of their web searches.  ...  For Task 1, the differences in the perceived precision scores proved to be statistically significant; for Task 2, the differences proved to not be statistically significant.  ... 
doi:10.1109/wi.2006.6 dblp:conf/webi/HoeberY06 fatcat:nfwuiqpx7bhk3pps3a3iyjyq24