Word class discovery for postprocessing Chinese handwriting recognition

Chao-Huang Chang
1994 Proceedings of the 15th conference on Computational linguistics - unpublished
This article presents a novel Chinese class n-gram model for contextual postprocessing of handwriting recognition results. The word classes in the model are automatically discovered by a corpus-based simulated annealing procedure. Three other language models, least-word, word-frequency, and the powerful interword character bigram model, have been constructed for comparison. Extensive experiments on large text corpora show that the discovered class bigram model outperforms the other three competing models.
doi:10.3115/991250.991350 fatcat:vzbxgde5kbbyrncbr5cq2iftje