Approaches to improving corpus quality for statistical machine translation
2010
2010 International Conference on Machine Learning and Cybernetics
The performance of a statistical machine translation (SMT) system depends heavily on both the quantity and the quality of its bilingual language resources. However, previous work focuses mainly on quantity and tries to collect more bilingual data. In this paper, to optimize the bilingual corpus and improve the performance of the translation system, we propose approaches to processing the training corpus: filtering noise and selecting the more informative sentences. Also, to
doi:10.1109/icmlc.2010.5580699
dblp:conf/icmlc/LiuZZ10
fatcat:4r66z64zh5ao3eq5j6np2t757m