On the Relation Between Assessor's Agreement and Accuracy in Gamified Relevance Assessment

Olga Megorskaya, Vladimir Kukushkin, Pavel Serdyukov
2015 Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '15  
Expert judgments (labels) are widely used in Information Retrieval for the purposes of search quality evaluation and machine learning. Setting up the process of collecting such judgments is a challenge of its own, and the maintenance of judgments quality is an extremely important part of the process. One of the possible ways of controlling the quality is monitoring inter-assessor agreement level. But does the agreement level really reflect the quality of assessor's judgments? Indeed, if a group
more » ... of assessors comes to a consensus, to what extent should we trust their collective opinion? In this paper, we investigate, whether the agreement level can be used as a metric for estimating the quality of assessor's judgments, and provide recommendations for the design of judgments collection workflow. Namely, we estimate the correlation between assessors' accuracy and agreement in the scope of several workflow designs and investigate which specific workflow features influence the accuracy of judgments the most.
doi:10.1145/2766462.2767727 dblp:conf/sigir/MegorskayaKS15 fatcat:vfl33ksuajb7td2xfafjxrnx5i