Weighted Krippendorff's alpha is a more reliable metrics for multi-coders ordinal annotations: experimental studies on emotion, opinion and coreference annotation

Jean-Yves Antoine, Jeanne Villaneau, Anaïs Lefeuvre
2014 Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics  
The question of data reliability is of first importance to assess the quality of manually annotated corpora. Although Cohen's κ is the prevailing reliability measure used in NLP, alternative statistics have been proposed. This paper presents an experimental study with four measures (Cohen's κ, Scott's π, binary and weighted Krippendorff's α) on three tasks: emotion, opinion and coreference annotation. The reported studies investigate the factors of influence (annotator bias, category ...nce, number of coders, number of categories) that should affect reliability estimation. Results show that the use of a weighted measure restricts this influence on ordinal annotations. They suggest that weighted α is the most reliable metric for such an annotation scheme.
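To make the contrast between binary and weighted α concrete, here is a minimal Python sketch of Krippendorff's α computed from a coincidence matrix. It is an illustration only, not the authors' implementation: the abstract does not state which weighting the paper uses, so the squared-difference distance below (Krippendorff's interval metric) is an assumption, and the function names and toy labels are hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha(units, distance):
    """Krippendorff's alpha for a list of items.

    units: list of lists; each inner list holds the labels that the coders
           assigned to one item (coders who skipped the item are omitted).
    distance: function (a, b) -> disagreement weight, 0 when a == b.
    """
    # Coincidence matrix: each ordered pair of labels given to the same
    # item by two different coders counts 1 / (m - 1).
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single label carries no agreement information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per category and overall number of pairable labels.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w * distance(a, b) for (a, b), w in coincidences.items()) / n
    expected = sum(
        totals[a] * totals[b] * distance(a, b) for a in totals for b in totals
    ) / (n * (n - 1))
    if expected == 0:
        return float("nan")  # no variation in the labels: alpha is undefined
    return 1.0 - observed / expected


def binary_distance(a, b):
    # Unweighted case: every disagreement costs the same.
    return 0.0 if a == b else 1.0


def squared_distance(a, b):
    # ASSUMPTION: squared difference between numeric category ranks,
    # chosen for illustration; the paper's own weights may differ.
    return float((a - b) ** 2)


# Hypothetical toy data: four items, two coders, a 3-point ordinal scale.
toy_units = [[1, 1], [2, 2], [1, 2], [3, 3]]

binary_alpha = krippendorff_alpha(toy_units, binary_distance)
weighted_alpha = krippendorff_alpha(toy_units, squared_distance)
print(f"binary alpha:   {binary_alpha:.4f}")
print(f"weighted alpha: {weighted_alpha:.4f}")
```

On this toy data the only disagreement is between adjacent categories (1 versus 2). The binary distance treats it like any other mismatch, while the squared-difference distance penalizes it less than a 1 versus 3 mismatch, so the weighted α comes out higher (about 0.82 against about 0.67).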
doi:10.3115/v1/e14-1058 dblp:conf/eacl/AntoineVL14 fatcat:jkolwvkbz5cehcvvwsgtqbbyzy