Voting between Dictionary-Based and Subword Tagging Models for Chinese Word Segmentation

Dong Song, Anoop Sarkar
2006 Workshop on Chinese Language Processing  
This paper describes a Chinese word segmentation system based on majority voting among three models: a forward maximum matching model, a conditional random field (CRF) model using maximum subword-based tagging, and a CRF model using minimum subword-based tagging. The system also includes a post-processing component that handles inconsistencies. Tested on the closed track of the CityU, MSRA and UPUC corpora in the third SIGHAN Chinese Word Segmentation Bakeoff, the system achieves F-scores of 0.961, 0.953 and 0.919, respectively.
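The two mechanisms named in the abstract, forward maximum matching and majority voting over three segmenters, can be illustrated with a small Python sketch. It makes several assumptions that do not come from the paper. Voting here happens per character on a binary "a word starts here" decision. With three voters and two labels this always yields a strict majority, so no tie-break rule is needed. The lexicon, `max_len=4`, and the function names are hypothetical. The two "CRF" outputs are hard-coded stand-ins, not the authors' subword-tagging models, and the post-processing step is not modeled.

```python
from typing import List, Sequence, Set


def forward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right segmentation: at each position take the longest
    lexicon word of at most max_len characters, else a single character."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def to_boundaries(words: Sequence[str]) -> List[bool]:
    """One flag per character: True if a word starts at that character."""
    flags: List[bool] = []
    for w in words:
        flags.append(True)
        flags.extend([False] * (len(w) - 1))
    return flags


def majority_vote(text: str, segmentations: Sequence[Sequence[str]]) -> List[str]:
    """Hypothetical per-character vote: start a new word wherever more than
    half of the segmenters place a word boundary."""
    votes = [to_boundaries(seg) for seg in segmentations]
    if any(len(v) != len(text) for v in votes):
        raise ValueError("every segmentation must cover the text exactly")
    words: List[str] = []
    current = ""
    for i, ch in enumerate(text):
        starts = sum(v[i] for v in votes)
        is_start = i == 0 or 2 * starts > len(votes)
        if is_start and current:
            words.append(current)
            current = ""
        current += ch
    if current:
        words.append(current)
    return words


if __name__ == "__main__":
    lexicon = {"研究", "研究生", "生命", "起源"}  # hypothetical toy lexicon
    text = "研究生命起源"
    fmm = forward_max_match(text, lexicon)  # greedy match picks 研究生 / 命 / 起源
    crf_a = ["研究", "生命", "起源"]  # stand-in for one CRF model's output
    crf_b = ["研究", "生命起源"]      # stand-in for the other CRF model's output
    print(fmm)
    print(majority_vote(text, [fmm, crf_a, crf_b]))
```

In this toy input, greedy matching commits to the longer entry 研究生 and strands 命. The vote recovers 研究 / 生命 / 起源 by taking each boundary decision from whichever two models agree at that position.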
dblp:conf/acl-sighan/SongS06 fatcat:iyv3yy2p4nactf4nkmsuaa42w4