A copy of this work was available on the public web and has been preserved in the Wayback Machine; the capture dates from 2017. File type: application/pdf.
Do dependency parsing metrics correlate with human judgments?
2015
Proceedings of the Nineteenth Conference on Computational Natural Language Learning
Using automatic measures such as labeled and unlabeled attachment scores is common practice in dependency parser evaluation. In this paper, we examine whether these measures correlate with human judgments of overall parse quality. We ask linguists with experience in dependency annotation to judge system outputs. We measure the correlation between their judgments and a range of parse evaluation metrics across five languages. The human-metric correlation is lower for dependency parsing than for […]
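The abstract describes correlating automatic parse metrics (labeled and unlabeled attachment scores) with human judgments of parse quality. As a rough illustration only, the sketch below computes per-sentence UAS/LAS and a Spearman rank correlation between metric scores and human ratings; the function names, the toy data, and the choice of Spearman's rho are assumptions for illustration, not details taken from the paper.

```python
from scipy.stats import spearmanr

def attachment_scores(gold, predicted):
    """Compute unlabeled (UAS) and labeled (LAS) attachment scores.

    Each parse is a list of (head_index, dependency_label) tuples,
    one entry per token. This is a generic definition, not the
    paper's exact evaluation code.
    """
    assert len(gold) == len(predicted)
    n = len(gold)
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    correct_labeled = sum(g == p for g, p in zip(gold, predicted))
    return correct_heads / n, correct_labeled / n

# Hypothetical per-sentence metric scores and human quality ratings.
metric_scores = [0.92, 0.85, 0.78, 0.95, 0.60]
human_ratings = [4, 3, 3, 5, 2]

# Rank correlation between the metric and the human judgments
# (Spearman's rho is one plausible choice of correlation measure).
rho, p_value = spearmanr(metric_scores, human_ratings)
print(f"Spearman correlation: {rho:.2f} (p = {p_value:.3f})")
```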
doi:10.18653/v1/k15-1033
dblp:conf/conll/PlankAAMS15
fatcat:epnmosaw2ba43jnrwy5v3vhi3m