Unsupervised Discriminative Language Model Training for Machine Translation using Simulated Confusion Sets

Zhifei Li, Ziyuan Wang, Sanjeev Khudanpur, Jason Eisner
2010 International Conference on Computational Linguistics  
An unsupervised discriminative training procedure is proposed for estimating a language model (LM) for machine translation (MT). An English-to-English synchronous context-free grammar is derived from a baseline MT system to capture translation alternatives: pairs of words, phrases or other sentence fragments that potentially compete to be the translation of the same source-language fragment. Using this grammar, a set of impostor sentences is then created for each English sentence to simulate confusions that would arise if the system were to process an (unavailable) input whose correct English translation is that sentence. An LM is then trained to discriminate between the original sentences and the impostors. The procedure is applied to the IWSLT Chinese-to-English translation task, and promising improvements on a state-of-the-art MT system are demonstrated.
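The procedure summarized in the abstract can be illustrated with a toy sketch. This is not the authors' implementation and it rests on several assumptions: the hypothetical `CONFUSIONS` table stands in for the English-to-English synchronous context-free grammar, bigram counts stand in for the LM features, and a structured perceptron stands in for the discriminative trainer, which the abstract does not specify. All function names are illustrative.

```python
from collections import Counter

# Hypothetical confusion table: a stand-in for the English-to-English
# grammar that the paper derives from a baseline MT system.
CONFUSIONS = {
    "the house": ["house", "a home"],
    "is": ["are", "be"],
    "big": ["large of", "bigly"],
}


def make_impostors(sentence, confusions):
    """Rewrite one fragment at a time to simulate confusable outputs."""
    padded = f" {sentence} "
    impostors = set()
    for fragment, alternatives in confusions.items():
        needle = f" {fragment} "
        if needle in padded:
            for alt in alternatives:
                rewritten = padded.replace(needle, f" {alt} ", 1)
                impostors.add(rewritten.strip())
    impostors.discard(sentence)
    return sorted(impostors)


def bigram_features(sentence):
    """Count bigrams, including sentence-boundary markers."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return Counter(zip(tokens, tokens[1:]))


def score(weights, feats):
    """Linear model score: dot product of weights and feature counts."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train_perceptron(sentences, confusions, epochs=5):
    """Assumed trainer: prefer each original sentence over its impostors."""
    weights = {}
    for _ in range(epochs):
        for sent in sentences:
            # The original goes last so that ties count as mistakes.
            candidates = make_impostors(sent, confusions) + [sent]
            best = max(
                candidates,
                key=lambda c: score(weights, bigram_features(c)),
            )
            if best != sent:
                for f, v in bigram_features(sent).items():
                    weights[f] = weights.get(f, 0.0) + v
                for f, v in bigram_features(best).items():
                    weights[f] = weights.get(f, 0.0) - v
    return weights


original = "the house is big"
weights = train_perceptron([original], CONFUSIONS)
impostors = make_impostors(original, CONFUSIONS)
```

Only monolingual English text is needed to run this loop, which is the sense in which the training is unsupervised: the impostors are simulated rather than taken from real system output on parallel data.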
dblp:conf/coling/LiWKE10 fatcat:o5ghue4syndlbcerzq5w7x5ea4