On Estimating Variances for Topic Set Size Design

Tetsuya Sakai, Lifeng Shang
2016 NTCIR Conference on Evaluation of Information Access Technologies  
Topic set size design is a suite of statistical techniques for determining the appropriate number of topics when constructing a new test collection.  ...  Recently, we ran an IR task at NTCIR-12 where the number of topics was actually determined using topic set size design with an initial pilot data set based on only five similar runs; a test collection  ...  CONCLUSIONS Topic set size design for a new test collection requires a variance estimate, which in turn requires a topic-by-run matrix with some pilot data.  ... 
dblp:conf/ntcir/SakaiS16 fatcat:jf3xsr6kzfg6rde2n6y6s6cbha

Topic Set Size Design with Variance Estimates from Two-Way ANOVA

Tetsuya Sakai
2014 NTCIR Conference on Evaluation of Information Access Technologies  
Recently, Sakai proposed two methods for determining the topic set size n for a new test collection based on variance estimates from past data: the first method determines the minimum n to ensure high  ...  We also establish empirical relationships between the two topic set size design methods, and discuss the balance between n and the pool depth pd using both methods.  ...  Acknowledgements This research was supported by Waseda University Grants for Special Research Projects (2014A-026, 2014B-181, 2014S-077) and by Microsoft Research (Waseda University's project name: "Taxonomising  ... 
dblp:conf/ntcir/Sakai14 fatcat:4aqkx36gxzcnjnjl722snjeqee

Topic set size design

Tetsuya Sakai
2015 Information retrieval (Boston)  
These topic set size design methods require topic-by-run score matrices from past test collections for the purpose of estimating the within-system population variance for a particular evaluation measure  ...  While the previous work of Sakai incorrectly used estimates of the total variances, here we use the correct estimates of the within-system variances, which yield slightly smaller topic set sizes than those  ...  Acknowledgments I would like to thank Professor Yasushi Nagata of Waseda University for his valuable advice, and to the guest editors and reviewers for their constructive feedback.  ... 
doi:10.1007/s10791-015-9273-z fatcat:io7hhbty7zhrfhtdclsbvotkji

Evaluating Evaluation Measures with Worst-Case Confidence Interval Widths

Tetsuya Sakai
2017 NTCIR Conference on Evaluation of Information Access Technologies  
First, we prove that Sakai's ANOVA-based topic set size design tool can be used for discussing WCW instead of his CI-based tool that cannot handle large topic set sizes.  ...  WCW is the worst-case width of a confidence interval (CI) for the difference between any two systems, given a topic set size.  ...  Sakai [6] observed that, for the data he considered, "the topic set size required based on the CI-based design with α = 0.05 and δ = c is almost the same as the topic set size required based on the ANOVA-based  ... 
dblp:conf/ntcir/Sakai17 fatcat:gul3wm7conheppwz5sfcy2zcba

Designing Test Collections for Comparing Many Systems

Tetsuya Sakai
2014 Proceedings of the 23rd ACM International Conference on Information and Knowledge Management - CIKM '14  
We demonstrate that, as different evaluation measures have different variances across topics, they inevitably require different topic set sizes.  ...  Using our simple Excel tools and some pooled variance estimates from past data, researchers can design statistically well-designed test collections.  ...  Similarly, for our topic set size design based on one-way ANOVA, we need an estimate of σ² to compute minΔ (Section 4.2).  ... 
doi:10.1145/2661829.2661893 dblp:conf/cikm/Sakai14 fatcat:mm5jqfqwpraldhaz7rk23v7yci

Design of information retrieval experiments: the sufficient topic set size for providing an adequate level of confidence

Bekir Taner DİNÇER
2013 Turkish Journal of Electrical Engineering and Computer Sciences  
This article presents the detailed and formal explanation of how the second fundamental theorem of probability, the central limit theorem, can be used for the estimation of the sufficient size of a topic  ...  Thus, for the design of IR experiments, it agrees with the common view that relying on average figures as a rule of thumb may well be misleading.  ...  When δ is set to 0.0192, Inequality 2 yields a topic sample size estimate of [(0.1479 × 1.96)/0.0192]² = 228.  ... 
doi:10.3906/elk-1203-20 fatcat:4ez2lcaxnjeiljvqq32mugejfy
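
The arithmetic quoted in the snippet above can be checked with a short sketch. The snippet does not show Inequality 2 itself, so the reading used here is an assumption: 0.1479 is treated as a standard deviation estimate, 1.96 as the two-sided 95% normal critical value, and δ = 0.0192 as the tolerated margin. The function name `clt_topic_sample_size` is hypothetical and does not come from the article.

```python
def clt_topic_sample_size(sigma: float, delta: float, z: float = 1.96) -> float:
    """Return the unrounded estimate (sigma * z / delta) ** 2.

    Assumed reading of the quoted expression (not confirmed by the snippet):
    sigma is a standard deviation estimate, z a normal critical value,
    and delta the tolerated margin.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    return (sigma * z / delta) ** 2


# Figures taken verbatim from the quoted snippet.
estimate = clt_topic_sample_size(sigma=0.1479, delta=0.0192)
print(round(estimate))  # rounds to 228, the value quoted in the snippet
```

Because the estimate grows with 1/δ², halving δ under this reading multiplies the required number of topics by four.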

Topic Set Size Design with the Evaluation Measures for Short Text Conversation [chapter]

Tetsuya Sakai, Lifeng Shang, Zhengdong Lu, Hang Li
2015 Lecture Notes in Computer Science  
In this study, we apply the topic set size design technique of Sakai to decide on the number of test topics, using variance estimates of the above evaluation measures.  ...  Our main conclusion is to create 100 test topics, but what distinguishes our work from other tasks with similar topic set sizes is that we know what this topic set size means from a statistical viewpoint  ...  In this study, we apply the topic set size design technique of Sakai [13, 14] to decide on the number of test topics, using variance estimates of the above evaluation measures.  ... 
doi:10.1007/978-3-319-28940-3_25 fatcat:mgvbw72u4na5laoudvv2fqj5ca

Revisiting the effect of topic set size on retrieval error

Wei-Hao Lin, Alexander Hauptmann
2005 Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '05  
successful retrieval experiments: Sufficient number of topics By increasing the topic set size, i.e.  ...  They have successfully shown how the topic set sizes affect the retrieval experiment reliability.  ... 
doi:10.1145/1076034.1076166 dblp:conf/sigir/LinH05 fatcat:qjz4amn6xjby3cl6rvek6bb274

A General Linear Mixed Models Approach to Study System Component Effects

Nicola Ferro, Gianmaria Silvello
2016 Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval - SIGIR '16  
Topic variance has a greater effect on performances than system variance but it cannot be controlled by system developers who can only try to cope with it.  ...  Finally, we extend the analysis to different evaluation measures, showing how they impact on the sources of variance.  ...  [17] applied GLMM to the study of per-topic variance by using simulated data to generate more replicates for each (topic, system) pair in order to estimate also the topic/system interaction effect;  ... 
doi:10.1145/2911451.2911530 dblp:conf/sigir/FerroS16 fatcat:cxnagepg5vbijfxmyanx3nt4w4

Book Reviews : John Neter and William Wasserman. Applied Linear Statistical Models: Regression, Analysis of Variance, and Experimental Designs. Homewood, Illinois; Richard D. Irwin, Inc., Pp. xvii - 842. 1974. $16.95

John L. Wasik
1976 Educational and Psychological Measurement  
tables for determining the power of and optimum sample sizes for a desired analysis of variance.  ...  Chapter 14 considers experimental design aspects of the implementation of the methods of analysis of variance for the completely randomized design and includes topics on the determination of sample required  ... 
doi:10.1177/001316447603600138 fatcat:jwk5bx75cnfh5prxmiw4l4aac4

Adaptive Design Theory and Implementation Using SAS and R, Second Edition

Thomas M. Braun
2015 International Statistical Review  
In fact, the main objective of the book is that the reader gets interested in the topic and plays with the presented models and R codes in an active way.  ...  To this end, the R codes presented in this book can be found on github, although some codes are still missing. Moreover, the package that contains all the datasets can be found on  ...  This book focuses on sample size estimation for diverse study designs and endpoints.  ... 
doi:10.1111/insr.12143 fatcat:prwxfuohmbf4hnk5chku4lqphe

Page 1655 of Mathematical Reviews Vol. 53, Issue 5 [page]

1977 Mathematical Reviews  
is the estimator which attains minimum variance within this design.  ...  A note on estimating the variance of the sample mean in stratified sampling. (French summary) Canad. J. Statist. 1 (1973), no. 2, 267-274.  ... 

Test collection reliability: a study of bias and robustness to statistical assumptions via stochastic simulation

Julián Urbano
2015 Information retrieval (Boston)  
The number of topics that a test collection contains has a direct impact on how well the evaluation results reflect the true performance of systems.  ...  with a certain number of topics.  ...  Thanks also to Rafa Nadal for convincing El Gran Guasch to stop shouting "¡La Décima!"...that was definitely it.  ... 
doi:10.1007/s10791-015-9274-y fatcat:t2ukbg7ekbhrdhskvkpikfsyxu

Evaluation over thousands of queries

Ben Carterette, Virgil Pavlu, Evangelos Kanoulas, Javed A. Aslam, James Allan
2008 Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '08  
There has been a great deal of recent work on evaluation over much smaller judgment sets: how to select the best set of documents to judge and how to estimate evaluation measures when few judgments are  ...  The Million Query Track at TREC 2007 used two document selection algorithms to acquire relevance judgments for more than 1,800 queries.  ...  Acknowledgments This work was supported in part by the Center for Intelligent Information Retrieval, in part by the Defense Advanced Research Projects Agency (DARPA) under contract number HR0011-06-C-0023  ... 
doi:10.1145/1390334.1390445 dblp:conf/sigir/CarterettePKAA08 fatcat:odo3xgvydrcpdpzg7wjmpwhylm

When to Stop Reviewing in Technology-Assisted Reviews

Dan Li, Evangelos Kanoulas
2020 ACM Transactions on Information Systems (TOIS; Formerly: ACM Transactions on Office Information Systems)  
One of the key challenges for CAL algorithms is deciding when to stop displaying documents to reviewers.  ...  We prove the unbiasedness of the proposed estimators under a with-replacement sampling design, while experimental results demonstrate that the proposed approach, similar to CAL, effectively retrieves relevant  ...  Estimating R This experiment is designed to answer RQ2. We examine whether the estimator R is unbiased with low variance.  ... 
doi:10.1145/3411755 fatcat:wwlb5tubazhhtnit3bgf55rtvq