A copy of this work was available on the public web and has been preserved in the Wayback Machine; the capture dates from 2017.
Do dependency parsing metrics correlate with human judgments?
Proceedings of the Nineteenth Conference on Computational Natural Language Learning
Using automatic measures such as labeled and unlabeled attachment scores is common practice in dependency parser evaluation. In this paper, we examine whether these measures correlate with human judgments of overall parse quality. We ask linguists with experience in dependency annotation to judge system outputs, and we measure the correlation between their judgments and a range of parse evaluation metrics across five languages. The human-metric correlation is lower for dependency parsing than for …

doi:10.18653/v1/k15-1033
dblp:conf/conll/PlankAAMS15
fatcat:epnmosaw2ba43jnrwy5v3vhi3m
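The labeled and unlabeled attachment scores mentioned in the abstract can be illustrated with a minimal sketch. The token representation (a list of `(head_index, dependency_label)` pairs) and the toy sentences below are assumptions for illustration, not the paper's code or data:

```python
# Minimal sketch of unlabeled (UAS) and labeled (LAS) attachment scores.
# Each parse is a list of (head_index, dependency_label) pairs, one per token;
# this representation is an assumption for illustration, not the paper's code.

def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as fractions of tokens attached correctly."""
    assert len(gold) == len(predicted), "parses must cover the same tokens"
    # UAS: the predicted head index matches the gold head index.
    uas_hits = sum(g_head == p_head
                   for (g_head, _), (p_head, _) in zip(gold, predicted))
    # LAS: both the head index and the dependency label match.
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    n = len(gold)
    return uas_hits / n, las_hits / n

# Toy 4-token sentence; heads are 1-based token indices, 0 = root.
gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "det"), (2, "dobj")]

uas, las = attachment_scores(gold, pred)
print(uas, las)  # → 0.75 0.5
```

LAS is always at most UAS, since a labeled hit requires the head to be correct as well; the gap between the two is one of the differences a human judge may weigh differently than the metrics do.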