Pinyin as Subword Unit for Chinese-Sourced Neural Machine Translation
2017
Irish Conference on Artificial Intelligence and Cognitive Science
The unknown word (UNK), or open vocabulary, problem is a challenge for neural machine translation (NMT). For alphabetic languages such as English, German and French, splitting a word into subwords, for example with the Byte Pair Encoding (BPE) algorithm, is an effective way to alleviate the UNK problem. However, for stroke-based languages such as Chinese, the aforementioned method is not effective enough in terms of translation quality. In this paper, we propose to utilize Pinyin, a romanization system for
dblp:conf/aics/DuW17
fatcat:qyurg3y6o5fuxf67p3nynouwia