A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022.
The file type is application/pdf.
Do We Need Chinese Word Segmentation for Statistical Machine Translation?
2004
Workshop on Chinese Language Processing
In Chinese texts, words are not separated by white spaces. This is problematic for many natural language processing tasks. The standard approach is to segment the Chinese character sequence into words. Here, we investigate Chinese word segmentation for statistical machine translation. We pursue two goals: the first one is the maximization of the final translation quality; the second is the minimization of the manual effort for building a translation system. The commonly used method for getting
dblp:conf/acl-sighan/XuZN04
fatcat:3apl7obowve55aoqhii772crva