The HDU Discriminative SMT System for Constrained Data PatentMT at NTCIR10

Patrick Simianer, Gesa Stupperich, Laura Jehl, Katharina Wäschle, Artem Sokolov, Stefan Riezler
2013 NTCIR Conference on Evaluation of Information Access Technologies  
We describe the statistical machine translation (SMT) systems developed at Heidelberg University for the Chinese-to-English and Japanese-to-English PatentMT subtasks at the NTCIR10 workshop. The core system used in both subtasks is a combination of hierarchical phrase-based translation and discriminative training using either large feature sets and ℓ1/ℓ2 regularization (for Japanese-to-English) or variants of soft syntactic constraints (for Chinese-to-English). Our goal is to address the twofold nature of patents by exploiting the repetitive nature of patents through feature sharing in a multi-task learning setup (used in the Japanese-to-English translation subtask), and by countersteering complex word order differences with syntactic features (used in the Chinese-to-English translation subtask).
dblp:conf/ntcir/SimianerSJWSR13 fatcat:kw35kai7fzfjbkalrjql2gy3r4