A comparison of statistical significance tests for information retrieval evaluation
2007
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management - CIKM '07
Information retrieval (IR) researchers commonly use three tests of statistical significance: the Student's paired t-test, the Wilcoxon signed rank test, and the sign test. ...
For each of these five tests, we took the ad-hoc retrieval runs submitted to TRECs 3 and 5-8, and for each pair of runs, we measured the statistical significance of the difference in their mean average ...
Sakai used bootstrap significance tests to evaluate evaluation metrics [16], while our emphasis was on the comparison of significance tests. ...
doi:10.1145/1321440.1321528
dblp:conf/cikm/SmuckerAC07
fatcat:yysoebqcxvaxjcuegw3mtdetdm
A comparison of the optimality of statistical significance tests for information retrieval evaluation
2013
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '13
Previous research has suggested the permutation test as the theoretically optimal statistical significance test for IR evaluation, and advocated for the discontinuation of the Wilcoxon and sign tests. ...
We present a large-scale study comprising nearly 60 million system comparisons showing that in practice the bootstrap, t-test and Wilcoxon test outperform the permutation test under different optimality ...
INTRODUCTION An Information Retrieval (IR) researcher is often faced with the question of which of two IR systems, A and B, performs better. ...
doi:10.1145/2484028.2484163
dblp:conf/sigir/UrbanoMM13a
fatcat:ti27ac7x2nadla2td6mx3n7sfu
STATISTICAL SIGNIFICANCE IN MULTILINGUAL INFORMATION RETRIEVAL (MLIR) SYSTEM
2012
IOSR Journal of Engineering
The efficiency of retrieval system is precise by comparing performance on a regular set of queries in Information Retrieval (IR) and MLIR systems. ...
Significance tests are often used to estimate the reliability of such comparisons. In this research paper, we revisit the question of how such significance tests should be used. ...
INTRODUCTION Test collections are the principal tool used for comparison and evaluation of retrieval systems. ...
doi:10.9790/3021-0204794802
fatcat:furywk2wzzbdbhyyuqzzenq5hu
Audio Music Similarity And Retrieval: Evaluation Power And Stability
2011
Zenodo
The grand results of the evaluation are thus 105 pairwise comparisons between systems, some of which are statistically significant. The rest of the paper is organized as follows. ...
INTRODUCTION One of the most important tasks in Music Information Retrieval is Audio Music Similarity and Retrieval (AMS). ...
doi:10.5281/zenodo.1417268
fatcat:veb2db4dmnecle7bn2cy4f56a4
Using statistical testing in the evaluation of retrieval experiments
1993
Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '93
A number of different statistical tests are described for determining if differences in performance between retrieval methods are significant. ...
These tests have often been ignored in the past because most are based on an assumption of normality which is not strictly valid for the standard performance measures. ...
While the two-sample significance tests are generally accepted by the information retrieval community, large sample comparison tests are far less commonly done. ...
doi:10.1145/160688.160758
dblp:conf/sigir/Hull93
fatcat:4tiviqm5pfbpdpulvcadd67f3q
Verbal Memory Retrieval Deficits Associated With Untreated Hypothyroidism
2007
The Journal of Neuropsychiatry and Clinical Neurosciences
Significant differences between groups were limited to verbal memory retrieval as measured by the California Verbal Learning Test (CVLT). ...
On short delay free recall, long delay free recall, and long delay cued recall, significant differences remained between groups despite the limited statistical power of this study. ...
TSH levels of both groups are shown in Table 1. The descriptive statistics and comparisons for all neuropsychological tests are shown in Table 2. ...
doi:10.1176/appi.neuropsych.19.2.132
pmid:17431058
fatcat:bbwgkfl5w5ehzklrk3p7w5pavu
Verbal Memory Retrieval Deficits Associated With Untreated Hypothyroidism
2007
The Journal of Neuropsychiatry and Clinical Neurosciences
Significant differences between groups were limited to verbal memory retrieval as measured by the California Verbal Learning Test (CVLT). ...
On short delay free recall, long delay free recall, and long delay cued recall, significant differences remained between groups despite the limited statistical power of this study. ...
TSH levels of both groups are shown in Table 1. The descriptive statistics and comparisons for all neuropsychological tests are shown in Table 2. ...
doi:10.1176/jnp.2007.19.2.132
pmid:17431058
fatcat:6y2tvfh2ajaafpucpevuoz2yi4
Evaluation of web search for the information practitioner
2007
ASLIB Proceedings
Keywords Web search evaluation precision diagnostic measures Abstract Purpose: The aim of the paper is to put forward a structured mechanism for web search evaluation. ...
We point to useful scientific research and show how information practitioners can use these methods in evaluation of search on the web for their users. ...
Stephen Robertson for advice on both what measures to use and how to interpret statistical significance on the experiments described in this paper. ...
doi:10.1108/00012530710817573
fatcat:wb5gonqulnbjxibiollyj7xiqe
Experimental methods for information retrieval
2012
Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '12
Acknowledgments • We thank Fiana Raiber for running experiments and generating the tables used in this tutorial ...
train-test split of the query set • Statistical significance testing - Paired, two-tailed, tests; p-value = 0.01, 0.05, … - Recommended tests: paired t-test, permutation test (Smucker et al. '09) - Approach ...
over queries for setting free-parameter values • Experimental results - Our method outperforms the reference comparisons in a consistent, substantial, and statistically significant manner? ...
doi:10.1145/2348283.2348534
dblp:conf/sigir/MetzlerK12
fatcat:k3j7brhwmzahdpomupi7z26m4i
Comparison of Different Lemmatization Approaches through the Means of Information Retrieval Performance
[chapter]
2010
Lecture Notes in Computer Science
The comparison is done by evaluating the mean Generalized Average Precision (mGAP) measure of the lemmatized documents and search queries in the set of information retrieval (IR) experiments. ...
Moreover, the proposed indirect comparison of the lemmatizers circumvents the need for manually lemmatized test data which are hard to obtain and also face the problem of incompatible sets of lemmas across ...
Results evaluation For the confirmation of our hypotheses, we ran several statistical significance tests. ...
doi:10.1007/978-3-642-15760-8_13
fatcat:ejrncrfxubggjhht2hrfjgi5mq
On the Performance of Latent Semantic Indexing-based Information Retrieval
2009
Journal of Computing and Information Technology
However, statistical significance tests are required to evaluate the reliability of such comparisons. Focus of this paper is to address this issue. ...
Then we analyze the statistical significance of these performance differences. ...
Statistical Significance Tests The next step for the evaluation is to analyze values of the interpolated precision obtained by different models. ...
doi:10.2498/cit.1001268
fatcat:k7b7hj7h3vbzfbjmttt2jwyzky
Agreement among statistical significance tests for information retrieval evaluation at varying sample sizes
2009
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '09
Research has shown that little practical difference exists between the randomization, Student's paired t, and bootstrap tests of statistical significance for TREC ad-hoc retrieval experiments with 50 topics ...
At smaller numbers of topics, the randomization test tended to produce smaller p-values than the t-test for p-values less than 0.1. ...
INTRODUCTION Information retrieval (IR) researchers rely on statistical significance tests to allow them to accurately detect and report significant improvements in performance. ...
doi:10.1145/1571941.1572050
dblp:conf/sigir/SmuckerAC09
fatcat:kl57vfh5v5ex7g47bfp6v77iuq
Comparing gene annotation enrichment tools for functional modeling of agricultural microarray data
2009
BMC Bioinformatics
The experimental set was also used to evaluate identification of enriched functional groups. Comparison of the tools shows that they produce different sets of annotations for the two datasets and different ...
Annotation was assessed using a randomly selected test gene annotation set and an experimental differentially expressed gene-set--both from chicken. ...
Huaijun Zhou (Texas A&M University) for their contribution of the microarray data used in this work. Also, we would like to acknowledge Dr. Shane C. Burgess for his support of BVDB. ...
doi:10.1186/1471-2105-10-s11-s9
pmid:19811693
pmcid:PMC3226198
fatcat:b3zcldicfjbv3a7ccehihtxype
Search Strategies and the Relevance of Retrieved Information in Persian Articles Database: Survey of M.A Students of Shiraz University
2019
International journal of information science and management
So the main objective of this study was to evaluate the effect of search strategies on the relevance of retrieved information in domestic article databases. ...
To test the hypotheses, one-way analysis of variance (ANOVA) and Tukey's post-hoc test were computed using SPSS statistical software version 22. ...
In the section of inferential statistics, one-way ANOVA and Tukey follow-up tests were conducted for comparison of relevance of retrieved data based on search strategies and comparison of databases. ...
doaj:06462a327d0e431bab27b157a8c91839
fatcat:5rq7xbezfvgaloqrtnv5vuvolu
A Comparison of Deep Learning Based Query Expansion with Pseudo-Relevance Feedback and Mutual Information
[chapter]
2016
Lecture Notes in Computer Science
We use language models for information retrieval to evaluate expansion methods. ...
Automatic query expansion techniques are widely applied for improving text retrieval performance, using a variety of approaches that exploit several data sources for finding expansion terms. ...
We use two tests for statistical significance: † indicates a statistical significant improvement over NEXP, and ∗ indicates a statistical significant improvement over PRF. ...
doi:10.1007/978-3-319-30671-1_57
fatcat:qqe45dm3cbfeleyh5jpw2bzjyu