Coreference and Coherence in Neural Machine Translation: A Study Using Oracle Experiments

Dario Stojanovski, Alexander Fraser
2018 Proceedings of the Third Conference on Machine Translation: Research Papers  
Cross-sentence context can provide valuable information in Machine Translation and is critical for translation of anaphoric pronouns and for providing consistent translations. In this paper, we devise simple oracle experiments targeting coreference and coherence. Oracles are an easy way to evaluate the effect of different discourse-level phenomena in NMT using BLEU and eliminate the necessity to manually define challenge sets for this purpose. We propose two context-aware NMT models and compare
more » ... them against models working on a concatenation of consecutive sentences. Concatenation models perform better, but are computationally expensive. We show that NMT models taking advantage of context oracle signals can achieve considerable gains in BLEU, of up to 7.02 BLEU for coreference and 1.89 BLEU for coherence on subtitles translation. Access to strong signals allows us to make clear comparisons between context-aware models.
doi:10.18653/v1/w18-6306 dblp:conf/wmt/StojanovskiF18 fatcat:3glwodxidfe5bougllo5pouq7a