Synthesizing Compound Words for Machine Translation

Austin Matthews, Eva Schlinger, Alon Lavie, Chris Dyer
2016 Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)  
Most machine translation systems construct translations from a closed vocabulary of target word forms, posing problems for translating into languages that have productive compounding processes. We present a simple and effective approach that deals with this problem in two phases. First, we build a classifier that identifies spans of the input text that can be translated into a single compound word in the target language. Then, for each identified span, we generate a pool of possible compounds which are added to the translation model as "synthetic" phrase translations. Experiments reveal that (i) we can effectively predict what spans can be compounded; (ii) our compound generation model produces good compounds; and (iii) modest improvements are possible in end-to-end English-German and English-Finnish translation tasks. We additionally introduce KomposEval, a new multi-reference dataset of English phrases and their translations into German compounds.
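
The two-phase idea in the abstract can be illustrated with a minimal toy sketch. It is not the authors' implementation: the lexicon, the linking elements, and all function names below are hypothetical, and the lexicon-lookup test merely stands in for the trained span classifier the abstract describes.

```python
from itertools import product

# Hypothetical toy lexicon: English word -> candidate German stems (illustrative only).
TOY_LEXICON = {
    "market": ["Markt"],
    "share": ["Anteil"],
}
# Hypothetical linking elements that may be inserted between compound parts.
LINKERS = ["", "s"]


def candidate_spans(tokens, max_len=3):
    """Enumerate contiguous spans of 2..max_len tokens as (start, end) pairs."""
    n = len(tokens)
    return [(i, j) for i in range(n) for j in range(i + 2, min(i + max_len, n) + 1)]


def is_compoundable(span_tokens, lexicon):
    """Stand-in for the phase-one classifier (assumption): accept a span
    only if every word in it has an entry in the toy lexicon."""
    return all(tok in lexicon for tok in span_tokens)


def generate_compounds(span_tokens, lexicon, linkers):
    """Phase two: build a pool of candidate compounds by joining stems,
    trying every linker between adjacent parts."""
    pool = set()
    stem_choices = [lexicon[tok] for tok in span_tokens]
    for stems in product(*stem_choices):
        for links in product(linkers, repeat=len(stems) - 1):
            word = stems[0]
            for link, stem in zip(links, stems[1:]):
                word += link + stem.lower()
            pool.add(word)
    return sorted(pool)


def synthetic_phrases(tokens, lexicon=TOY_LEXICON, linkers=LINKERS):
    """Return (source phrase, candidate compound) pairs that could be added
    to a phrase table as synthetic entries."""
    table = []
    for start, end in candidate_spans(tokens):
        span = tokens[start:end]
        if is_compoundable(span, lexicon):
            for compound in generate_compounds(span, lexicon, linkers):
                table.append((" ".join(span), compound))
    return table
```

Calling `synthetic_phrases("the market share grew".split())` yields two candidate entries for the span "market share". A real system would score and filter such a pool rather than keep every candidate.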
doi:10.18653/v1/p16-1103 dblp:conf/acl/MatthewsSLD16 fatcat:7dycl7lwcndurh6k6ec5cy2d2e