A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2017.
The file type is application/pdf.
A comparison of statistical significance tests for information retrieval evaluation
2007
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management - CIKM '07
Information retrieval (IR) researchers commonly use three tests of statistical significance: the Student's paired t-test, the Wilcoxon signed rank test, and the sign test. Other researchers have previously proposed using both the bootstrap and Fisher's randomization (permutation) test as nonparametric significance tests for IR, but these tests have seen little use. For each of these five tests, we took the ad-hoc retrieval runs submitted to TRECs 3 and 5–8, and for each pair of runs, we measured
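The abstract names the tests but does not spell out how they operate on a pair of runs. The following is a minimal sketch of two of them, assuming each run is summarized by one per-topic effectiveness score (for example, average precision) over the same topics in the same order. The function names, the trial count, and the seed are illustrative choices, not the authors' implementation.

```python
import math
import random
from typing import Sequence


def paired_randomization_test(a: Sequence[float], b: Sequence[float],
                              trials: int = 100_000, seed: int = 0) -> float:
    """Two-sided Monte Carlo p-value for the difference in mean per-topic score.

    Under the null hypothesis the two runs are interchangeable, so each topic's
    pair of scores may be swapped with probability 1/2 (Fisher's randomization).
    """
    if len(a) != len(b) or not a:
        raise ValueError("need equal-length, non-empty score lists")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    rng = random.Random(seed)  # seeded so results are reproducible
    count = 0
    for _ in range(trials):
        # Swapping a topic's two scores flips the sign of its difference.
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        # Small tolerance so floating-point rounding does not break exact ties.
        if abs(total) / n >= observed - 1e-12:
            count += 1
    return count / trials


def sign_test(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact two-sided sign test; topics with equal scores are discarded."""
    wins = sum(1 for x, y in zip(a, b) if x > y)
    losses = sum(1 for x, y in zip(a, b) if x < y)
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # Binomial(n, 1/2) probability of k or fewer "successes", doubled.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    run_a = [0.3, 0.4, 0.5, 0.6, 0.7]  # hypothetical per-topic scores
    run_b = [0.2, 0.3, 0.4, 0.5, 0.6]
    print(sign_test(run_a, run_b))
    print(paired_randomization_test(run_a, run_b))
```

With these five hypothetical topics, where one run wins every topic, the sign test gives exactly 2/32 = 0.0625, and the randomization estimate lands close to the same value. The design difference is that the sign test looks only at the direction of each per-topic difference, while the randomization test also uses its magnitude.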
doi:10.1145/1321440.1321528
dblp:conf/cikm/SmuckerAC07
fatcat:yysoebqcxvaxjcuegw3mtdetdm