A comparison of statistical significance tests for information retrieval evaluation

Mark D. Smucker, James Allan, Ben Carterette
2007 Proceedings of the sixteenth ACM conference on Conference on information and knowledge management - CIKM '07  
Information retrieval (IR) researchers commonly use three tests of statistical significance: the Student's paired t-test, the Wilcoxon signed rank test, and the sign test.  ...  For each of these five tests, we took the ad-hoc retrieval runs submitted to TRECs 3 and 5-8, and for each pair of runs, we measured the statistical significance of the difference in their mean average  ...  Sakai used bootstrap significance tests to evaluate evaluation metrics [16] , while our emphasis was on the comparison of significance tests.  ... 
doi:10.1145/1321440.1321528 dblp:conf/cikm/SmuckerAC07 fatcat:yysoebqcxvaxjcuegw3mtdetdm
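The tests named in this abstract all operate on per-topic score differences between two runs. The sketch below is only a rough illustration using the Python standard library. It computes Student's paired t statistic and an exact two-sided sign-test p-value, and it leaves out the Wilcoxon test. The function names and the per-topic average precision values are hypothetical. A statistics library (for example SciPy's `ttest_rel`, `wilcoxon` and `binomtest`) would also supply the t-test p-value, which this sketch does not compute.

```python
import math

def paired_t_statistic(a, b):
    """Student's paired t statistic for per-topic scores of runs A and B (hypothetical helper)."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance of the differences
    # Undefined (division by zero) if every per-topic difference is identical.
    return mean / math.sqrt(var / n)

def sign_test_p(a, b):
    """Exact two-sided sign test; topics with zero difference are dropped (hypothetical helper)."""
    wins = sum(1 for x, y in zip(a, b) if x > y)
    losses = sum(1 for x, y in zip(a, b) if x < y)
    n = wins + losses
    k = min(wins, losses)
    # P(X <= k) under Binomial(n, 0.5), doubled for a two-sided test and capped at 1.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical per-topic average precision for five topics.
run_a = [0.30, 0.42, 0.25, 0.51, 0.38]
run_b = [0.28, 0.40, 0.26, 0.45, 0.30]
print(paired_t_statistic(run_a, run_b), sign_test_p(run_a, run_b))
```

On these made-up scores, t is approximately 2.125. The sign test gives p = 0.375 because A wins on only 4 of the 5 topics, which shows why the sign test is the least powerful of the three.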

A comparison of the optimality of statistical significance tests for information retrieval evaluation

Julián Urbano, Mónica Marrero, Diego Martín
2013 Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '13  
Previous research has suggested the permutation test as the theoretically optimal statistical significance test for IR evaluation, and advocated for the discontinuation of the Wilcoxon and sign tests.  ...  We present a large-scale study comprising nearly 60 million system comparisons showing that in practice the bootstrap, t-test and Wilcoxon test outperform the permutation test under different optimality  ...  INTRODUCTION An Information Retrieval (IR) researcher is often faced with the question of which of two IR systems, A and B, performs better.  ... 
doi:10.1145/2484028.2484163 dblp:conf/sigir/UrbanoMM13a fatcat:ti27ac7x2nadla2td6mx3n7sfu
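As an assumption-laden illustration, the bootstrap test compared in this study can be sketched as a paired bootstrap on per-topic differences. The differences are shifted to zero mean to model the null hypothesis. Topics are then resampled with replacement, and the sketch counts how often the resampled mean is at least as extreme as the observed one. This is one common variant (the "shift" method) and not necessarily the exact procedure Urbano et al. used. `paired_bootstrap_p`, the default trial count and the seed are illustrative choices.

```python
import random

def paired_bootstrap_p(a, b, trials=10000, seed=0):
    """Two-sided paired bootstrap test on the mean per-topic difference (hypothetical helper).

    Differences are shifted to zero mean so that resampling simulates the
    null hypothesis of no difference between runs A and B.
    """
    rng = random.Random(seed)  # fixed seed for reproducible p-values
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    observed = sum(d) / n
    centered = [x - observed for x in d]
    extreme = 0
    for _ in range(trials):
        sample = rng.choices(centered, k=n)  # resample topics with replacement
        if abs(sum(sample) / n) >= abs(observed):
            extreme += 1
    return extreme / trials
```

The returned value is a Monte Carlo estimate, so it varies slightly with `trials` and `seed`. Larger trial counts narrow that variation at the cost of run time.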

STATISTICAL SIGNIFICANCE IN MULTILINGUAL INFORMATION RETRIEVAL (MLIR) SYSTEM

Sadanandam Manchala
2012 IOSR Journal of Engineering  
The efficiency of retrieval system is precise by comparing performance on a regular set of queries in Information Retrieval (IR) and MLIR systems.  ...  Significance tests are often used to estimate the reliability of such comparisons. In this research paper, we revisit the question of how such significance tests should be used.  ...  INTRODUCTION Test collections are the principal tool used for comparison and evaluation of retrieval systems.  ... 
doi:10.9790/3021-0204794802 fatcat:furywk2wzzbdbhyyuqzzenq5hu

Audio Music Similarity And Retrieval: Evaluation Power And Stability

Julián Urbano, Diego Martín, Mónica Marrero, Jorge Morato
2011 Zenodo  
The grand results of the evaluation are thus 105 pairwise comparisons between systems, some of which are statistically significant. The rest of the paper is organized as follows.  ...  INTRODUCTION One of the most important tasks in Music Information Retrieval is Audio Music Similarity and Retrieval (AMS).  ... 
doi:10.5281/zenodo.1417268 fatcat:veb2db4dmnecle7bn2cy4f56a4

Using statistical testing in the evaluation of retrieval experiments

David Hull
1993 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '93  
A number of different statistical tests are described for determining if differences in performance between retrieval methods are significant.  ...  These tests have often been ignored in the past because most are based on an assumption of normality which is not strictly valid for the standard performance measures.  ...  While the two-sample significance tests are generally accepted by the information retrieval community, large sample comparison tests are far less commonly done.  ... 
doi:10.1145/160688.160758 dblp:conf/sigir/Hull93 fatcat:4tiviqm5pfbpdpulvcadd67f3q

Verbal Memory Retrieval Deficits Associated With Untreated Hypothyroidism

K. J. Miller, T. D. Parsons, P. C. Whybrow, K. Van Herle, N. Rasgon, A. Van Herle, D. Martinez, D. H. Silverman, M. Bauer
2007 The Journal of Neuropsychiatry and Clinical Neurosciences  
Significant differences between groups were limited to verbal memory retrieval as measured by the California Verbal Learning Test (CVLT).  ...  On short delay free recall, long delay free recall, and long delay cued recall, significant differences remained between groups despite the limited statistical power of this study.  ...  TSH levels of both groups are shown in Table 1 . The descriptive statistics and comparisons for all neuropsychological tests are shown in Table 2 .  ... 
doi:10.1176/appi.neuropsych.19.2.132 pmid:17431058 fatcat:bbwgkfl5w5ehzklrk3p7w5pavu

Verbal Memory Retrieval Deficits Associated With Untreated Hypothyroidism

Karen J. Miller, Thomas D. Parsons, Peter C. Whybrow, Katja Van Herle, Natalie Rasgon, Andre Van Herle, Dorothy Martinez, Dan H. Silverman, Michael Bauer
2007 The Journal of Neuropsychiatry and Clinical Neurosciences  
Significant differences between groups were limited to verbal memory retrieval as measured by the California Verbal Learning Test (CVLT).  ...  On short delay free recall, long delay free recall, and long delay cued recall, significant differences remained between groups despite the limited statistical power of this study.  ...  TSH levels of both groups are shown in Table 1 . The descriptive statistics and comparisons for all neuropsychological tests are shown in Table 2 .  ... 
doi:10.1176/jnp.2007.19.2.132 pmid:17431058 fatcat:6y2tvfh2ajaafpucpevuoz2yi4

Evaluation of web search for the information practitioner

A. MacFarlane, David Bawden
2007 ASLIB Proceedings  
Keywords Web search evaluation precision diagnostic measures Abstract Purpose: The aim of the paper is to put forward a structured mechanism for web search evaluation.  ...  We point to useful scientific research and show how information practitioners can use these methods in evaluation of search on the web for their users.  ...  Stephen Robertson for advice on both what measures to use and how to interpret statistical significance on the experiments described in this paper.  ... 
doi:10.1108/00012530710817573 fatcat:wb5gonqulnbjxibiollyj7xiqe

Experimental methods for information retrieval

Donald Metzler, Oren Kurland
2012 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '12  
Acknowledgments • We thank Fiana Raiber for running experiments and generating the tables used in this tutorial  ...  train-test split of the query set • Statistical significance testing -Paired, two-tailed, tests; p-value=0.01, 0.05, … -Recommended tests: paired t-test, permutation test (Smucker et al. '09) -Approach  ...  over queries for setting free-parameter values • Experimental results -Our method outperforms the reference comparisons in a consistent, substantial, and statistically significant manner?  ... 
doi:10.1145/2348283.2348534 dblp:conf/sigir/MetzlerK12 fatcat:k3j7brhwmzahdpomupi7z26m4i

Comparison of Different Lemmatization Approaches through the Means of Information Retrieval Performance [chapter]

Jakub Kanis, Lucie Skorkovská
2010 Lecture Notes in Computer Science  
The comparison is done by evaluating the mean Generalized Average Precision (mGAP) measure of the lemmatized documents and search queries in the set of information retrieval (IR) experiments.  ...  Moreover, the proposed indirect comparison of the lemmatizers circumvents the need for manually lemmatized test data which are hard to obtain and also face the problem of incompatible sets of lemmas across  ...  Results evaluation For the confirmation of our hypotheses, we ran several statistical significance tests.  ... 
doi:10.1007/978-3-642-15760-8_13 fatcat:ejrncrfxubggjhht2hrfjgi5mq

On the Performance of Latent Semantic Indexing-based Information Retrieval

Cherukuri Aswani Kumar, Suripeddi Srinivas
2009 Journal of Computing and Information Technology  
However, statistical significance tests are required to evaluate the reliability of such comparisons. Focus of this paper is to address this issue.  ...  Then we analyze the statistical significance of these performance differences.  ...  Statistical Significance Tests The next step for the evaluation is to analyze values of the interpolated precision obtained by different models.  ... 
doi:10.2498/cit.1001268 fatcat:k7b7hj7h3vbzfbjmttt2jwyzky

Agreement among statistical significance tests for information retrieval evaluation at varying sample sizes

Mark D. Smucker, James Allan, Ben Carterette
2009 Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '09  
Research has shown that little practical difference exists between the randomization, Student's paired t, and bootstrap tests of statistical significance for TREC ad-hoc retrieval experiments with 50 topics  ...  At smaller numbers of topics, the randomization test tended to produce smaller p-values than the t-test for p-values less than 0.1.  ...  INTRODUCTION Information retrieval (IR) researchers rely on statistical significance tests to allow them to accurately detect and report significant improvements in performance.  ... 
doi:10.1145/1571941.1572050 dblp:conf/sigir/SmuckerAC09 fatcat:kl57vfh5v5ex7g47bfp6v77iuq
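The randomization test discussed here relies on a property of the null hypothesis: the A/B labels are exchangeable within each topic, so the sign of every per-topic difference may be flipped. With a small number of topics, all 2^n sign assignments can be enumerated exactly instead of sampled. The following is a hypothetical sketch, not the authors' code. The function name and the floating-point tolerance are assumptions, and the sum of differences is used as the test statistic because it is equivalent to the mean when n is fixed.

```python
from itertools import product

def exact_randomization_p(a, b, tol=1e-12):
    """Exact two-sided paired randomization test (hypothetical helper).

    Enumerates all 2**n sign flips of the per-topic differences, so it is
    only practical for a small number of topics n.
    """
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    observed = abs(sum(d))
    extreme = 0
    for signs in product((1, -1), repeat=n):
        # tol guards against floating-point ties being missed
        if abs(sum(s * x for s, x in zip(signs, d))) >= observed - tol:
            extreme += 1
    return extreme / 2 ** n
```

For example, with differences 2, 3 and 3 only the all-positive and all-negative assignments reach a sum of magnitude 8, giving p = 2/8 = 0.25.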

Comparing gene annotation enrichment tools for functional modeling of agricultural microarray data

Bart Berg, Chamali Thanthiriwatte, Prashanti Manda, Susan M Bridges
2009 BMC Bioinformatics  
The experimental set was also used to evaluate identification of enriched functional groups. Comparison of the tools shows that they produce different sets of annotations for the two datasets and different  ...  Annotation was assessed using a randomly selected test gene annotation set and an experimental differentially expressed gene-set, both from chicken.  ...  Huaijun Zhou (Texas A&M University) for their contribution of the microarray data used in this work. Also, we would like to acknowledge Dr. Shane C. Burgess for his support of BVDB.  ... 
doi:10.1186/1471-2105-10-s11-s9 pmid:19811693 pmcid:PMC3226198 fatcat:b3zcldicfjbv3a7ccehihtxype

Search Strategies and the Relevance of Retrieved Information in Persian Articles Database: Survey of M.A Students of Shiraz University

Hassan Moghaddaszadeh
2019 International journal of information science and management  
So the main objective of this study was to evaluate the effect of search strategies on the relevance of retrieved information in domestic article databases.  ...  To test the hypotheses, one-way analysis of variance (ANOVA) and Tukey's post-hoc test were computed using SPSS statistical software version 22.  ...  In the section of inferential statistics, one-way ANOVA and Tukey follow-up tests were conducted for comparison of relevance of retrieved data based on search strategies and comparison of databases.  ... 
doaj:06462a327d0e431bab27b157a8c91839 fatcat:5rq7xbezfvgaloqrtnv5vuvolu
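For readers without SPSS, the one-way ANOVA F statistic mentioned in this abstract is the ratio of between-group to within-group mean squares. The minimal sketch below is illustrative only, and the function name and example groups are hypothetical. It returns F without a p-value and does not perform Tukey's post-hoc test. SciPy's `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd` cover both.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over k independent groups (hypothetical helper)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within)
```

With the toy groups [1, 2, 3], [4, 5, 6] and [7, 8, 9], the between-group sum of squares is 54 on 2 degrees of freedom and the within-group sum is 6 on 6, so F = 27 / 1 = 27.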

A Comparison of Deep Learning Based Query Expansion with Pseudo-Relevance Feedback and Mutual Information [chapter]

Mohannad ALMasri, Catherine Berrut, Jean-Pierre Chevallet
2016 Lecture Notes in Computer Science  
We use language models for information retrieval to evaluate expansion methods.  ...  Automatic query expansion techniques are widely applied for improving text retrieval performance, using a variety of approaches that exploit several data sources for finding expansion terms.  ...  We use two tests for statistical significance: † indicates a statistical significant improvement over NEXP, and ⇤ indicates a statistical significant improvement over PRF.  ... 
doi:10.1007/978-3-319-30671-1_57 fatcat:qqe45dm3cbfeleyh5jpw2bzjyu
Showing results 1-15 of 418,639.