Proceedings of the workshop on Human Language Technology - HLT '94
The processing of Japanese text is complicated by the absence of word delimiters. To segment Japanese text, systems typically rely on knowledge-based methods and large lexicons. This paper presents a novel approach to Japanese word segmentation that avoids the need for Japanese word lexicons and explicit rule bases. The algorithm uses a hidden Markov model, a stochastic process, to determine word boundaries, and achieved 91% accuracy in segmenting words in a test corpus.

doi:10.3115/1075812.1075875
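To illustrate the general idea, the sketch below shows HMM-style segmentation with Viterbi decoding over per-character states. This is an illustrative toy, not the paper's actual model: the two states (B = character begins a word, I = character continues one), the transition table, and the emission probabilities are all invented stand-ins, not trained values from the paper.

```python
import math

# Hypothetical transition probabilities between B/I states (not from the paper).
TRANS = {("B", "B"): 0.4, ("B", "I"): 0.6,
         ("I", "B"): 0.5, ("I", "I"): 0.5}

def viterbi_segment(text, emit):
    """Split `text` into words along the most likely B/I state path.

    `emit[ch][state]` is a toy emission probability for character `ch`
    appearing word-initially ("B") or word-internally ("I").
    """
    states = ("B", "I")
    # Log-probability of the best path ending in each state; a text starts a word.
    score = {"B": math.log(emit[text[0]]["B"]), "I": -math.inf}
    back = []  # backpointers: back[t][state] = best previous state
    for ch in text[1:]:
        new_score, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + math.log(TRANS[(p, s)]))
            new_score[s] = (score[prev] + math.log(TRANS[(prev, s)])
                            + math.log(emit[ch][s]))
            ptr[s] = prev
        score = new_score
        back.append(ptr)
    # Trace the best state sequence backwards, then cut a word at every B.
    state = max(states, key=score.get)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    words, cur = [], ""
    for ch, s in zip(text, path):
        if s == "B" and cur:
            words.append(cur)
            cur = ""
        cur += ch
    words.append(cur)
    return words
```

With emission statistics that make "a" look word-initial and "b" word-internal, `viterbi_segment("abab", ...)` recovers the boundaries and returns `["ab", "ab"]`; a real system would estimate such probabilities from a training corpus.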