A Maximum Entropy Approach for Vietnamese Word Segmentation

Dinh Dien, Vu Thuy
2006 International Conference on Research, Innovation and Vision for the Future
In this paper, we introduce a new approach to Vietnamese word segmentation. The word segmentation problem is restated as a morpho-syllable position-in-word (PIW) tagging problem. We use a Maximum Entropy model, trained with Generalized Iterative Scaling (GIS) on annotated corpora. The trained model is then used to tag every morpho-syllable of the input sentence, and the tagged output is converted into a segmented sentence for evaluation. The results on ... lot of tagged-corpora show that this approach is suitable for Vietnamese word segmentation. It achieves precision and recall of 94.87% and 94.08% respectively, and an F-measure of 94.44%.
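
The restatement of segmentation as PIW tagging described in the abstract can be illustrated with a minimal sketch. The abstract does not list the paper's tag set, so the two tags used here (`B` for a syllable that begins a word, `I` for one that continues it), the function names, and the example sentence are all assumptions made for illustration; the Maximum Entropy tagger itself is not shown.

```python
# Hypothetical two-tag PIW scheme (an assumption, not the paper's tag set):
#   "B" = syllable begins a word, "I" = syllable continues the current word.

def words_to_tags(words):
    """Turn a segmented sentence (list of words, each a list of syllables)
    into a flat list of (syllable, tag) pairs, as used for training data."""
    tagged = []
    for word in words:
        for i, syllable in enumerate(word):
            tagged.append((syllable, "B" if i == 0 else "I"))
    return tagged


def tags_to_words(tagged):
    """Turn tagger output, a list of (syllable, tag) pairs, back into words."""
    words = []
    for syllable, tag in tagged:
        # Start a new word on "B", or if the sequence wrongly starts with "I".
        if tag == "B" or not words:
            words.append([syllable])
        else:
            words[-1].append(syllable)
    return words


# Illustrative sentence: "sinh viên" / "học" / "tiếng Việt".
sentence = [["sinh", "viên"], ["học"], ["tiếng", "Việt"]]
tagged = words_to_tags(sentence)
segmented = " ".join("_".join(word) for word in tags_to_words(tagged))
```

In this sketch a tagger only has to predict one label per syllable; the segmented sentence is recovered deterministically from the labels, which is what makes the output directly comparable with a gold segmentation when computing precision and recall.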
doi:10.1109/rivf.2006.1696447 dblp:conf/rivf/DienT06 fatcat:s2xwtgtmnvgmdo6xl3iqcqmqxe