Punctuation as Implicit Annotations for Chinese Word Segmentation
2009
Computational Linguistics
We present a Chinese word segmentation model learned from punctuation marks, which are perfect word delimiters. The learning is aided by a manually segmented corpus. Our method is considerably more effective than previous methods in unknown word recognition. This is a step toward addressing one of the toughest problems in Chinese word segmentation.
Segmentation as Tagging
We call the first character of a Chinese word its left boundary L, and the last character its right boundary R. If we regard
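The L and R boundary definitions above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `boundary_labels` and `implicit_labels`, the `PUNCT` set, and the choice to treat the start and end of the text as delimiters are all assumptions made for illustration. The first function marks L and R for each character of an already segmented sentence; the second shows how punctuation alone could supply such labels for the characters next to it.

```python
# Hypothetical punctuation inventory; a real system would use a fuller set.
PUNCT = set("，。！？；：、")


def boundary_labels(words):
    """Label each character of a segmented sentence.

    Returns a list of (char, is_L, is_R) tuples, where is_L is True for
    the first character of a word and is_R is True for the last one.
    A single-character word is both L and R.
    """
    labels = []
    for word in words:
        last = len(word) - 1
        for i, ch in enumerate(word):
            labels.append((ch, i == 0, i == last))
    return labels


def implicit_labels(text):
    """Derive boundary labels from punctuation in unsegmented text.

    A character directly after a punctuation mark must start a word (L),
    and a character directly before one must end a word (R).
    Assumption: the start and end of the text also act as delimiters.
    """
    out = []
    for i, ch in enumerate(text):
        if ch in PUNCT:
            continue
        if i == 0 or text[i - 1] in PUNCT:
            out.append((ch, "L"))
        if i == len(text) - 1 or text[i + 1] in PUNCT:
            out.append((ch, "R"))
    return out


if __name__ == "__main__":
    print(boundary_labels(["中国", "人"]))
    print(implicit_labels("你好，世界。"))
```

Characters that are not adjacent to punctuation receive no label from `implicit_labels`; in this sketch they are simply left unannotated.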
doi:10.1162/coli.2009.35.4.35403
fatcat:6inl26biejcqjioa6bh6bpp5xm