There's No Comparison: Reference-less Evaluation Metrics in Grammatical Error Correction

Courtney Napoles, Keisuke Sakaguchi, Joel Tetreault
2016, arXiv preprint
Current methods for automatically evaluating grammatical error correction (GEC) systems rely on gold-standard references. However, these methods penalize grammatical edits that are correct but absent from the gold standard. We show that reference-less grammaticality metrics correlate very strongly with human judgments and are competitive with the leading reference-based evaluation metrics. By interpolating both methods, we achieve state-of-the-art correlation with human judgments. Finally, we show that GEC metrics are much more reliable when they are calculated at the sentence level instead of the corpus level. We have set up a CodaLab site for benchmarking GEC output using a common dataset and different evaluation metrics.
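The interpolation of a reference-less grammaticality score with a reference-based metric score can be sketched as a simple weighted combination, applied per sentence and then averaged. This is an illustrative sketch only: the weight `alpha`, the score values, and the function names are assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of combining a reference-less grammaticality
# score with a reference-based metric score per corrected sentence.
# `alpha`, the function names, and the example scores are illustrative.

def interpolate(grammaticality_score: float,
                reference_score: float,
                alpha: float = 0.5) -> float:
    """Linearly combine the two metric scores with a tunable weight."""
    return alpha * grammaticality_score + (1 - alpha) * reference_score

# Sentence-level evaluation: score each corrected sentence individually,
# then average, rather than computing one corpus-level score.
sentence_scores = [interpolate(0.9, 0.7), interpolate(0.6, 0.8)]
average_score = sum(sentence_scores) / len(sentence_scores)
```

In this sketch, `alpha` would be tuned on held-out human judgments; setting `alpha=1.0` recovers the pure reference-less metric and `alpha=0.0` the pure reference-based one.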
arXiv:1610.02124v1