Word class discovery for postprocessing Chinese handwriting recognition

Chao-Huang Chang
1994 Proceedings of the 15th conference on Computational linguistics -   unpublished
This article presents a novel Chinese class n-gram model for contextual postprocessing of handwriting recognition results. The word classes in the model are automatically discovered by a corpus-based simulated annealing procedure. Three other language models, least-word, word-frequency, and the powerful interword character bigram model, have been constructed for comparison. Extensive experiments on large text corpora show that the discovered class bigram model outperforms the other three competing models.
doi:10.3115/991250.991350 fatcat:vzbxgde5kbbyrncbr5cq2iftje