Topic Set Size Design with Variance Estimates from Two-Way ANOVA

Tetsuya Sakai
2014 NTCIR Conference on Evaluation of Information Access Technologies  
Recently, Sakai proposed two methods for determining the topic set size n for a new test collection based on variance estimates from past data: the first method determines the minimum n to ensure high statistical power [22], while the second method determines the minimum n to ensure tight confidence intervals [23]. These methods are based on statistical techniques described by Nagata [15]. While Sakai [22] used variance estimates based on one-way ANOVA, Sakai [23] used the 95% percentile ... proposed by Webber, Moffat and Zobel [38]. This paper reruns the experiments reported by Sakai [22, 23] using variance estimates based on two-way ANOVA [17], which turn out to be slightly larger than their one-way ANOVA counterparts and substantially larger than the percentile-based ones. If researchers should choose to "err on the side of over-sampling" as recommended by Ellis [10], the variance estimation method based on two-way ANOVA and the results reported in this paper are probably the ones researchers should adopt. We also establish empirical relationships between the two topic set size design methods, and discuss the balance between n and the pool depth pd using both methods.
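As a rough illustration of the kind of variance estimate the abstract refers to, the sketch below computes the residual variance of a two-way ANOVA without replication from a topic-by-system score matrix. It uses only the textbook formula; the function name, the matrix layout, and the toy scores are assumptions made for illustration and are not taken from the paper, whose exact estimation procedure may differ.

```python
def two_way_residual_variance(scores):
    """Residual variance from two-way ANOVA without replication.

    scores[i][j] is the score of system j on topic i
    (n topics x m systems); this layout is an assumption.
    """
    n = len(scores)        # number of topics
    m = len(scores[0])     # number of systems
    if n < 2 or m < 2:
        raise ValueError("need at least 2 topics and 2 systems")
    if any(len(row) != m for row in scores):
        raise ValueError("score matrix must be rectangular")

    grand_mean = sum(sum(row) for row in scores) / (n * m)
    topic_means = [sum(row) / m for row in scores]
    system_means = [sum(scores[i][j] for i in range(n)) / n for j in range(m)]

    # Residual sum of squares: what remains after removing
    # the topic effect and the system effect from each score.
    s_e = sum(
        (scores[i][j] - topic_means[i] - system_means[j] + grand_mean) ** 2
        for i in range(n)
        for j in range(m)
    )
    # Residual degrees of freedom for an n x m layout.
    return s_e / ((n - 1) * (m - 1))


if __name__ == "__main__":
    # Hypothetical scores: 4 topics (rows) x 3 systems (columns).
    toy_scores = [
        [0.42, 0.38, 0.51],
        [0.10, 0.15, 0.12],
        [0.66, 0.59, 0.70],
        [0.25, 0.31, 0.22],
    ]
    print(two_way_residual_variance(toy_scores))
```

An estimate of this kind, obtained from past data, is the sort of input that a topic set size design method takes alongside the desired statistical power or confidence interval width.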
dblp:conf/ntcir/Sakai14 fatcat:4aqkx36gxzcnjnjl722snjeqee