Domain adaptation for translation models in statistical machine translation

Rico Sennrich
2013
We investigate methods to adapt translation models in statistical machine translation (SMT) to a specific target domain. We discuss two major problems: unknown words caused by data sparseness in the (in-domain) training data, and ambiguities arising from out-of-domain parallel texts that contain different, domain-specific translations. We propose novel solutions to both problems. The main contributions of this thesis are as follows:

* We present a novel translation model architecture that supports domain adaptation at decoding time from a vector of component models. The combination is implemented through instance weighting, and all statistics necessary for the computation of translation probabilities are stored in the models.
* We present an architecture to combine multiple MT systems, using techniques and ideas from domain adaptation. The hypotheses of external MT systems are treated as out-of-domain knowledge and combined with in-domain data through instance weighting.
* We introduce a sentence alignment algorithm that robustly aligns even noisy parallel texts. We found that higher-quality sentence alignment of the in-domain parallel text has a significant effect on translation quality in our target domain.
* We propose new translation model features that express how flexible, or general, translation units are, in order to prevent translations that occur only in the context of multiword expressions from being overgeneralised.
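The abstract does not spell out the combination formula. As a minimal sketch, assuming each component model stores raw phrase-pair counts c_i(s,t) and source-phrase counts c_i(s), instance weighting can be read as scaling each component's counts by a weight λ_i before normalising: p(t|s) = Σ_i λ_i c_i(s,t) / Σ_i λ_i c_i(s). The names `ComponentModel` and `combined_prob`, and the toy counts, are hypothetical and serve only as illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Phrase = str
PhrasePair = Tuple[Phrase, Phrase]


class ComponentModel:
    """Hypothetical component model that stores counts, not probabilities."""

    def __init__(self, pair_counts: Dict[PhrasePair, float]):
        self.joint = Counter(pair_counts)  # c_i(s, t)
        self.source = Counter()            # c_i(s), derived from the joint counts
        for (src, _tgt), count in self.joint.items():
            self.source[src] += count


def combined_prob(models: List[ComponentModel], weights: List[float],
                  src: Phrase, tgt: Phrase) -> float:
    """p(tgt | src) after scaling each component's counts by its weight."""
    num = sum(w * m.joint[(src, tgt)] for m, w in zip(models, weights))
    den = sum(w * m.source[src] for m, w in zip(models, weights))
    return num / den if den > 0 else 0.0


# Toy data (invented): a small in-domain model and a large out-of-domain one.
in_domain = ComponentModel({("bank", "Ufer"): 8, ("bank", "Bank"): 2})
out_domain = ComponentModel({("bank", "Bank"): 90, ("bank", "Ufer"): 10})
models = [in_domain, out_domain]
print(combined_prob(models, [1.0, 1.0], "bank", "Ufer"))   # 18/110, about 0.164
print(combined_prob(models, [10.0, 1.0], "bank", "Ufer"))  # 90/200 = 0.45
```

Because only counts are stored, the weight vector can be changed at decoding time without retraining. Unlike a linear interpolation of already-normalised probabilities, each component here contributes in proportion to how much evidence it has for the source phrase.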
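The abstract does not describe how the sentence alignment algorithm works. The sketch below is a generic monotone dynamic-programming aligner, not the thesis's algorithm, that shows one way of staying robust to noise: sentences without a good counterpart are simply skipped. It assumes the source sentences have already been machine-translated into the target language, and it uses a Dice token overlap as a crude stand-in for a proper similarity measure. `align`, `token_overlap` and `min_sim` are hypothetical names, and only 1-1 links are modelled (no 1-2 or 2-1 merges).

```python
from typing import Callable, List, Optional, Tuple


def token_overlap(a: str, b: str) -> float:
    """Crude similarity: Dice coefficient over lower-cased token sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def align(src: List[str], tgt: List[str],
          sim: Callable[[str, str], float] = token_overlap,
          min_sim: float = 0.1) -> List[Tuple[int, int]]:
    """Monotone DP alignment with 1-1 links and free 1-0 / 0-1 skips.

    Only pairs with similarity >= min_sim may be linked, so unmatched
    (noisy) sentences on either side are dropped instead of forced.
    """
    n, m = len(src), len(tgt)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[str]]] = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0:
                cands.append((best[i - 1][j], "skip_src"))
            if j > 0:
                cands.append((best[i][j - 1], "skip_tgt"))
            if i > 0 and j > 0:
                s = sim(src[i - 1], tgt[j - 1])
                if s >= min_sim:
                    cands.append((best[i - 1][j - 1] + s, "link"))
            best[i][j], back[i][j] = max(cands, key=lambda c: c[0])
    # Trace back from the bottom-right cell to recover the linked pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "link":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "skip_src":
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


# Invented example: "page 3" is extraction noise with no counterpart.
src_mt = ["the house is red", "page 3", "he sleeps now"]
tgt = ["the house is red .", "he sleeps now ."]
print(align(src_mt, tgt))  # [(0, 0), (2, 1)]
```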
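The abstract states the goal of the flexibility features but not their definition. One hypothetical way to express how general a translation unit is: divide the number of distinct source-side contexts in which a phrase pair was observed by its total number of occurrences. A pair that is frequent but always appears next to the same words, for example only inside one idiom, then scores low. The function name, the choice of one word of context on each side, and the toy data are all assumptions for illustration, not the thesis's features.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One observed occurrence: (source phrase, target phrase, left word, right word).
Occurrence = Tuple[str, str, str, str]


def flexibility_scores(occurrences: Iterable[Occurrence]) -> Dict[Tuple[str, str], float]:
    """Hypothetical feature: distinct contexts / total occurrences per phrase pair."""
    contexts = defaultdict(set)
    counts = defaultdict(int)
    for src, tgt, left, right in occurrences:
        contexts[(src, tgt)].add((left, right))
        counts[(src, tgt)] += 1
    return {pair: len(contexts[pair]) / counts[pair] for pair in counts}


# Invented toy data: "bucket" -> "Löffel" is only seen inside "kick the bucket",
# whereas "bucket" -> "Eimer" is seen in varied contexts.
occurrences = [("bucket", "Löffel", "the", ".")] * 4 + [
    ("bucket", "Eimer", "a", "of"),
    ("bucket", "Eimer", "the", "."),
    ("bucket", "Eimer", "a", "with"),
    ("bucket", "Eimer", "this", "is"),
]
scores = flexibility_scores(occurrences)
print(scores[("bucket", "Löffel")])  # 0.25
print(scores[("bucket", "Eimer")])   # 1.0
```

A pair seen only once trivially scores 1.0 under this definition, so a practical feature would probably need smoothing or a minimum count before it could reliably flag overgeneralised translations.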
doi:10.5167/uzh-88574 fatcat:2acbmu34mncublortey4rmuezi