Building a Chinese Collocation Bank

RUIFENG XU, QIN LU, KAM-FAI WONG, WENJIE LI
2009 International Journal of Computer Processing of Languages
This paper presents the design and construction of an annotated Chinese collocation bank as a resource to support systematic research on Chinese collocations. The definition and properties of collocations are first studied. Based on a combination of different properties, a classification scheme is proposed to categorize Chinese collocations into four types. With the help of computational tools, bigram collocations and n-gram collocations of 3,643 headwords are manually identified in a 5-million-word corpus.
Furthermore, for each identified bigram collocation, its dependency relation, chunking information and classification are annotated to produce a collocation bank. Currently, the Chinese collocation bank contains 23,581 bigram collocations and 2,752 n-gram collocations. The Chinese collocation bank is a valuable resource for Chinese collocation-related research. Through statistical analysis of the collocation bank, some interesting characteristics of Chinese bigram collocations are presented in this paper.
doi:10.1142/s1793840609002019 fatcat:56osx54wprg4dcjdarwvo373ri