Designing an Improved Discriminative Word Aligner

Nadi Tomeh, Alexandre Allauzen, Thomas Lavergne, François Yvon
2011 International Journal of Computational Linguistics and Applications  
The quality of statistical machine translation systems depends on the quality of the word alignments computed during the translation model training phase. IBM generative alignment models, despite their poor quality compared to a gold standard, perform well in practice. In this paper, we propose an improved word aligner based on a maximum entropy alignment combination model, which employs better feature engineering, ℓ1 regularization, and an enhanced search space to improve the quality of both alignment and translation. For the Arabic-English language pair, we are able to reduce the Alignment Error Rate by 43.4% and achieve a gain of ≈ 1 BLEU point over the IBM model 4 symmetrized alignments. These improvements are attainable at a lower computational cost, using only easy-to-estimate HMM and IBM model 1 features. An analysis of the obtained results shows that a good balance between several alignment characteristics should be maintained in order to deliver good translation quality.
dblp:journals/ijcla/TomehALY11 fatcat:dax5foisajflbhz5wvf3fyaygm