Korean Compound Noun Term Analysis Based on a Chart Parsing Technique [chapter]

Kyongho Min, William H. Wilson, Yoo-Jin Moon
2003 Lecture Notes in Computer Science  
Unlike compound noun terms in English and French, where words are separated by white space, Korean compound noun terms are not separated by white space. In addition, some compound noun terms in the real world result from a spacing error. Thus the analysis of compound noun terms is a difficult task in Korean NLP. Systems based on probabilistic and statistical information extracted from a corpus have shown good performance on Korean compound noun analysis. However, if the domain of the actual
... em is expanded beyond that of the training system, then performance on compound noun analysis would not be consistent. In this paper, we describe the analysis of Korean compound noun terms based on a longest substring algorithm and an agenda-based chart parsing technique, with a simple heuristic method to resolve ambiguities among the analyses. The system successfully analysed 95.6% of the test data (6024 compound noun terms), which ranged from 2 to 11 syllables. The average number of ambiguities ranged from 1 to 33 per compound noun term.
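The combination of longest-substring lookup and an agenda-driven chart can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `segment` and `best`, the toy lexicon, and the "fewest components" tie-breaking rule are all assumptions made for illustration, since the abstract does not say which heuristic the system uses.

```python
from collections import deque

def segment(term, lexicon):
    """Return every split of `term` into entries of `lexicon` (hypothetical helper)."""
    n = len(term)
    # chart[i] lists end positions j such that term[i:j] is a lexicon entry.
    chart = {i: [] for i in range(n)}
    agenda = deque([0])   # start positions still waiting to be expanded
    seen = {0}
    while agenda:
        i = agenda.popleft()
        # Try the longest substring first, then progressively shorter ones.
        for j in range(n, i, -1):
            if term[i:j] in lexicon:
                chart[i].append(j)
                if j < n and j not in seen:
                    seen.add(j)
                    agenda.append(j)

    def paths(i):
        # Read complete analyses off the chart, starting at position i.
        if i == n:
            return [[]]
        return [[term[i:j]] + rest for j in chart[i] for rest in paths(j)]

    return paths(0)

def best(analyses):
    """Assumed heuristic: prefer the analysis with the fewest components."""
    return min(analyses, key=len) if analyses else None

# Toy lexicon (an assumption, not data from the paper).
lexicon = {"정보", "검색", "시스템", "정보검색"}
analyses = segment("정보검색시스템", lexicon)
print(analyses)
print(best(analyses))
```

With this toy lexicon the term has two analyses, which shows how ambiguity arises even for short terms and why some disambiguation step is needed after chart construction.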
doi:10.1007/978-3-540-24581-0_16 fatcat:j2k3bgntebcw3o5n2gcfw5yilu