Exploring gap filling as a cheaper alternative to reading comprehension questionnaires when evaluating machine translation for gisting

Mikel L. Forcada, Carolina Scarton, Lucia Specia, Barry Haddow, Alexandra Birch
2018 Proceedings of the Third Conference on Machine Translation: Research Papers  
A popular application of machine translation (MT) is gisting: MT is consumed as is to make sense of text in a foreign language. Evaluation of the usefulness of MT for gisting is surprisingly uncommon. The classical method uses reading comprehension questionnaires (RCQ), in which informants are asked to answer professionally-written questions in their language about a foreign text that has been machine-translated into their language. Recently, gap-filling (GF), a form of cloze testing, has been proposed as a cheaper alternative to RCQ. In GF, certain words are removed from reference translations and readers are asked to fill the gaps left using the machine-translated text as a hint. This paper reports, for the first time, a comparative evaluation, using both RCQ and GF, of translations from multiple MT systems for the same foreign texts, and a systematic study on the effect of variables such as gap density, gap-selection strategies, and document context in GF. The main findings of the study are: (a) both RCQ and GF clearly identify MT to be useful; (b) global RCQ and GF rankings for the MT systems are mostly in agreement; (c) GF scores vary very widely across informants, making comparisons among MT systems hard; and (d) unlike RCQ, which is framed around documents, GF evaluation can be framed at the sentence level. These findings support the use of GF as a cheaper alternative to RCQ.
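
The gap-filling setup described in the abstract (words removed from a reference translation, then filled in by a reader) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `make_gaps` and `score`, the uniformly random gap selection, the `density` parameter, and exact-match scoring are all assumptions made for illustration. The paper studies several gap-selection strategies and gap densities that this sketch does not reproduce.

```python
import random


def make_gaps(reference, density=0.2, seed=0, gap_token="_____"):
    """Blank out roughly `density` of the words in a reference translation.

    Hypothetical helper: gaps are chosen uniformly at random, which is only
    one possible gap-selection strategy.
    """
    words = reference.split()
    rng = random.Random(seed)
    # At least one gap for non-empty input, never more gaps than words.
    n_gaps = min(len(words), max(1, round(len(words) * density)))
    positions = set(rng.sample(range(len(words)), n_gaps))
    # Removed words, in sentence order; tokens keep any attached punctuation.
    answers = [w for i, w in enumerate(words) if i in positions]
    gapped = [gap_token if i in positions else w for i, w in enumerate(words)]
    return " ".join(gapped), answers


def score(answers, responses):
    """Fraction of gaps filled with exactly the removed word.

    Assumption: case-insensitive exact match; other matching criteria
    (for example, accepting synonyms) are equally plausible.
    """
    if not answers:
        return 0.0
    hits = sum(a.lower() == r.strip().lower() for a, r in zip(answers, responses))
    return hits / len(answers)
```

In use, an informant would see the gapped reference next to the machine-translated text as a hint, and `score` would then be averaged per MT system to compare systems.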
doi:10.18653/v1/w18-6320 dblp:conf/wmt/ForcadaSSHB18 fatcat:7z2wz35y7jfftf6zkhoqcsauea