Unlabeled Dependency Parsing Based Pre-reordering for Chinese-to-Japanese SMT

Dan Han, Pascual Martínez-Gómez, Yusuke Miyao, Katsuhito Sudoh, Masaaki Nagata
2014 Journal of Natural Language Processing  
In statistical machine translation, Chinese and Japanese are a well-known long-distance language pair that causes difficulties for word alignment techniques. Pre-reordering methods have proven efficient and effective; however, they need reliable parsers to extract the syntactic structure of the source sentences. On one hand, we propose a framework in which only part-of-speech (POS) tags and unlabeled dependency parse trees are used to minimize the influence of parse errors, and linguistic knowledge on structural differences is encoded in the form of reordering rules. We show significant improvements in translation quality of sentences in the news domain over state-of-the-art reordering methods. On the other hand, we explore the relationship between dependency parsing and our pre-reordering method from two aspects: POS tags and dependencies. We observe the effects of different parse errors on reordering performance by combining empirical and descriptive approaches. In the empirical approach, we quantify the distribution of general parse errors along with reordering quality. In the descriptive approach, we extract seven influential error patterns and examine their correlations with reordering errors.

Introduction

The statistical machine translation (SMT) community has developed methods to obtain useful translations in some domains. The language pairs that have yielded the highest translation quality are those with similar sentence structure, such as English, French, and Spanish. These are typical subject-verb-object (SVO) languages, for which unsupervised word alignment methods (Brown, Pietra, Pietra, and Mercer 1993; Yamada and Knight 2001; Zens, Och, and Ney 2002; Koehn, Och, and Marcu 2003) have performed reasonably well. Although these languages have similar sentence structures, they may show slight local differences in word order, making word alignments nonmonotonic. There have been efforts to address the issue of local nonmonotonic word alignments; lexicalized reordering models are an example (Tillmann 2004;
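The framework the abstract describes (reordering rules driven only by POS tags and an unlabeled dependency tree) can be illustrated with a minimal sketch. This is not the paper's actual rule set; the single hypothetical rule below (make verb heads head-final with respect to their right-side dependents, approximating Chinese SVO to Japanese SOV order) and the `Token` representation are assumptions for illustration only.

```python
# Illustrative sketch of POS-driven pre-reordering over an unlabeled
# dependency tree. The rule here is hypothetical, not the paper's.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Token:
    index: int   # 0-based surface position
    word: str
    pos: str     # coarse POS tag, e.g. "VV" (verb), "NN" (noun)
    head: int    # index of head token; -1 marks the root

def reorder(tokens: List[Token]) -> List[str]:
    """Recursively linearize each head with its dependents; verb heads
    are emitted after their right-side dependents (head-final)."""
    children: Dict[int, List[Token]] = {t.index: [] for t in tokens}
    root = tokens[0]
    for t in tokens:
        if t.head == -1:
            root = t
        else:
            children[t.head].append(t)

    def linearize(h: Token) -> List[str]:
        left = [c for c in children[h.index] if c.index < h.index]
        right = [c for c in children[h.index] if c.index > h.index]
        out: List[str] = []
        for c in left:
            out.extend(linearize(c))
        if h.pos.startswith("V"):
            # hypothetical rule: verb becomes head-final, so its
            # right-side dependents (e.g. the object) move before it
            for c in right:
                out.extend(linearize(c))
            out.append(h.word)
        else:
            out.append(h.word)
            for c in right:
                out.extend(linearize(c))
        return out

    return linearize(root)

# Chinese "他 吃 苹果" ("he eats apples"), SVO; the verb heads both arguments.
sent = [Token(0, "他", "PN", 1), Token(1, "吃", "VV", -1), Token(2, "苹果", "NN", 1)]
print(reorder(sent))  # SOV order: ['他', '苹果', '吃']
```

Because the rule consults only the POS tag of the head and the tree topology, it never needs dependency labels, which is the property the abstract highlights for robustness to parse errors.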
doi:10.5715/jnlp.21.485