Detecting Translation Direction: A Cross-Domain Study

Sauleh Eetemadi, Kristina Toutanova
2015 Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop  
Parallel corpora are constructed by taking a document authored in one language and translating it into another language. However, the information about the authored and translated sides of the corpus is usually not preserved. When available, this information can be used to improve statistical machine translation. Existing statistical methods for translation direction detection have low accuracy when applied to the realistic out-of-domain setting, especially when the input texts are short. Our contributions in this work are threefold: 1) we develop a multi-corpus parallel dataset with translation direction labels at the sentence level, 2) we perform a comparative evaluation of previously introduced features for translation direction detection in a cross-domain setting, and 3) we generalize a previously introduced type of features to outperform the best previously proposed features in detecting translation direction and achieve 0.80 precision with 0.85 recall.
doi:10.3115/v1/n15-2014 dblp:conf/naacl/EetemadiT15 fatcat:psmfwbh26rb4vilhxelphrpbqu