A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2019; you can also visit the original URL.
Adaptations of ROUGE and BLEU to Better Evaluate Machine Reading Comprehension Task
[article] · 2018 · arXiv pre-print
Current evaluation metrics for question-answering-based machine reading comprehension (MRC) systems, such as ROUGE and BLEU, generally focus on the lexical overlap between the candidate and reference answers. However, bias may appear when these metrics are applied to specific question types, especially questions asking for yes-no opinions or entity lists. In this paper, we adapt the metrics to better correlate n-gram overlap with human judgment for answers to these two question types.
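For context, the lexical n-gram overlap that ROUGE and BLEU build on can be sketched as below. This is a minimal illustration of clipped n-gram matching with an F1 combination, not the paper's adapted metrics; the function names are hypothetical:

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def overlap_f1(candidate, reference, n=1):
    """F1 of clipped n-gram overlap between a candidate and a reference answer.

    Clipping (via Counter intersection) caps each n-gram's credit at its
    count in the reference, as in BLEU's modified precision.
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    match = sum((Counter(cand) & Counter(ref)).values())
    if not cand or not ref or match == 0:
        return 0.0
    precision = match / len(cand)   # BLEU-style view
    recall = match / len(ref)       # ROUGE-style view
    return 2 * precision * recall / (precision + recall)
```

On a yes-no question, such a score can reward a fluent but opposite-polarity answer that shares most tokens with the reference, which is the kind of bias the paper targets.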
arXiv:1806.03578v1