Search and training with joint translation and reordering models for statistical machine translation [thesis]

Vlad Andreas Guta, Alexander M. Fraser, Hermann Ney
Statistical machine translation describes the task of automatically translating a written text from a natural language into another. This is done by means of statistical models, which implies defining suitable models, searching the most likely translation of the given text using them and training their parameters on given bilingual sentence pairs. Phrase-based machine translation emerged two decades ago—and it became the state of the art throughout the following years. Nevertheless, the
more » ... heless, the breakthrough of neural machine translation in 2014 triggered an abrupt conversion towards neural models. A fundamental drawback of the traditional approach is the phrases themselves. They are extracted from word-aligned bilingual data via hand-crafted heuristics. The phrase translation models are estimated using the extraction counts resulting from the applied phrase extraction heuristics. Moreover, the translation models exclude any phrase-external information, which in turn limits the context used to generate a target word during search. To complement the restricted models, a variety of additional models and heuristics are used. However, the potentially largest downside is that the word alignments required for the phrase extraction are trained with IBM and hidden Markov models. This results in a discrepancy between the models applied in training and those that are actually used in search. Although the neural approach clearly outperforms the phrasal one, it remains to be answered whether it is the complexity of neural models, which capture dependencies between whole source sentences and their translations, or the coherent application of the same models in both, training and decoding that leads to the superior performance of neural machine translation. We aim at answering this research question by developing a coherent modelling pipeline that improves over the phrasal approach by relying on fewer but stronger models, discarding dependencies on phrasal heuristics and applying the same word-level models in training and search. First [...]
doi:10.18154/rwth-2020-09970 fatcat:gzidfjuo4jdqhj6iya32diekoq