A Corpus-based Case Study on the POS Tagging of Self-referential Lexemes in the Contemporary Chinese Dictionary

Jun Zhang, Heng Zhang
2020 Theory and Practice in Language Studies  
The POS tagging of the 5th edition of the Contemporary Chinese Dictionary (CCD) was revised in the 6th and 7th editions. The noun POS of most sports and science lexemes has been deleted, and their noun senses (self-referential senses) have been merged into the verb entries. However, most of these lexemes can intuitively be used as nouns, so their noun POS and senses should be kept. Based on the grammatical functions of words (Xv & Tang, 2006) and the two-level word class categorization theory (Wang, 2014), this study conducts a corpus-based case study of the science lexeme "guina". The results show that "guina" not only has self-referential usage but also has a high token frequency, with 133 occurrences accounting for 42.8% of total usages, and a rich type frequency, widely distributed across the patterns "guina + (of) + NP", "NP + (of) + guina" and "VP + guina", which meets the criterion of conventionalization. Therefore, it is necessary to tag "guina" with the noun POS and to set up a self-referential sense for it. This research has implications for solving the POS tagging problem of self-referential lexemes in the CCD.
doi:10.17507/tpls.1008.05 fatcat:o2s6y4zvtnh2hi6jwshzvvhnlu