The C-value/NC-value domain-independent method for multi-word term extraction

Katerina T. Frantzi, Sophia Ananiadou
<span title="">1999</span> <i title="Association for Natural Language Processing"> <a target="_blank" rel="noopener" href="" style="color: black;">Journal of Natural Language Processing</a> </i> &nbsp;
In this paper we present a domain-independent method for the automatic extraction of multi-word (technical) terms from machine-readable special language corpora. The method (C-value/NC-value) combines linguistic and statistical information. The first part, C-value, enhances the common statistical measure of frequency of occurrence for term extraction, making it sensitive to a particular type of multi-word term, the nested terms. Nested terms are those which also exist as substrings of other terms. The second part, NC-value, gives two things: 1) a method for the extraction of term context words (words that tend to appear with terms), and 2) the incorporation of information from term context words into the extraction of terms. We apply the method to a medical corpus and compare the results with those produced by frequency of occurrence applied to the same corpus. Frequency of occurrence was chosen for the comparison since it is the most commonly used statistical method for automatic term extraction to date. We show that using C-value we improve the extraction of nested multi-word terms, while using context information (NC-value) we improve the extraction of multi-word terms in general. In the evaluation sections, we give directions for the further improvement of the method.
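
The abstract does not state the C-value formula, so the following is only a minimal sketch of the form in which it is commonly cited: a candidate's frequency is reduced by the average frequency of the longer candidates that contain it, and the result is weighted by log2 of the candidate's length in words. The function names (`contains`, `c_value`) and the toy frequencies are hypothetical and chosen for illustration; the linguistic filtering and the NC-value context step described above are not covered.

```python
import math


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous word sequence inside the strictly longer `longer`."""
    n, m = len(longer), len(shorter)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_value(freq):
    """Simplified C-value scores.

    `freq` maps each candidate term (a tuple of words) to its corpus frequency.
    Assumption: every key of `freq` counts as a candidate term, so the set of
    "longer terms containing a" is taken directly from the keys.
    """
    scores = {}
    for a, f_a in freq.items():
        # Frequencies of the longer candidates in which `a` is nested.
        containing = [f_b for b, f_b in freq.items() if contains(b, a)]
        if containing:
            # Nested candidate: subtract the mean frequency of its containers.
            adjusted = f_a - sum(containing) / len(containing)
        else:
            # Not nested: plain frequency.
            adjusted = f_a
        scores[a] = math.log2(len(a)) * adjusted
    return scores


# Toy frequencies, invented for illustration only (not taken from the paper).
freq = {
    ("soft", "contact", "lens"): 5,
    ("contact", "lens"): 8,
    ("basal", "cell", "carcinoma"): 4,
}
scores = c_value(freq)
print(scores[("contact", "lens")])  # 3.0
```

In this toy input, "contact lens" occurs 8 times, but 5 of those occurrences fall inside "soft contact lens", so its score drops to 3.0, while the non-nested three-word candidates keep their full frequency scaled by log2(3).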
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="">doi:10.5715/jnlp.6.3_145</a> <a target="_blank" rel="external noopener" href="">fatcat:mpq3k7ydlnasbal3d36ihv2djm</a> </span>
