A dependency treelet string correspondence model for statistical machine translation

Deyi Xiong, Qun Liu, Shouxun Lin
2007 Proceedings of the Second Workshop on Statistical Machine Translation - StatMT '07   unpublished
This paper describes a novel model using dependency structures on the source side for syntax-based statistical machine translation: Dependency Treelet String Correspondence Model (DTSC). The DTSC model maps source dependency structures to target strings. In this model translation pairs of source treelets and target strings with their word alignments are learned automatically from the parsed and aligned corpus. The DTSC model allows source treelets and target strings with variables so that the
more » ... del can generalize to handle dependency structures with the same head word but with different modifiers and arguments. Additionally, target strings can be also discontinuous by using gaps which are corresponding to the uncovered nodes which are not included in the source treelets. A chart-style decoding algorithm with two basic operationssubstituting and attaching-is designed for the DTSC model. We argue that the DTSC model proposed here is capable of lexicalization, generalization, and handling discontinuous phrases which are very desirable for machine translation. We finally evaluate our current implementation of a simplified version of DTSC for statistical machine translation. X
doi:10.3115/1626355.1626361 fatcat:uasfycksxjahfaoqc7d2agqosm