Statistical Augmentation of a Chinese Machine-Readable Dictionary [chapter]

P. Fung, D. Wu
1999 Text, Speech and Language Technology  
We describe a method of using statistically collected Chinese character groups from a corpus to augment a Chinese dictionary. The method is particularly useful for extracting domain-specific and regional words not readily available in machine-readable dictionaries. Output was evaluated both by human evaluators and against a previously available dictionary. We also evaluated performance improvement in automatic Chinese tokenization. Results show that our method outputs legitimate words, acronymic constructions, idioms, names and titles, as well as technical compounds, many of which were lacking from the original dictionary.
doi:10.1007/978-94-017-2390-9_9 fatcat:dhg6wkc26rgrblchpysoranhse