A study on automatically extracted keywords in text categorization

Anette Hulth, Beáta B. Megyesi
2006 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the ACL - ACL '06  
This paper presents a study on whether and how automatically extracted keywords can be used to improve text categorization. In summary, we show that higher performance, as measured by micro-averaged F-measure on a standard text categorization collection, is achieved when the full-text representation is combined with the automatically extracted keywords. The combination is obtained by giving higher weights to words in the full texts that are also extracted as keywords. We also present results for experiments in which the keywords are the only input to the categorizer, either represented as unigrams or intact. Of these two experiments, the unigrams perform better, although neither performs as well as headlines only.
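
As a rough illustration of the combination described in the abstract (higher weights for full-text words that were also extracted as keywords), here is a minimal Python sketch. The abstract does not state the actual weighting scheme, so everything specific here is an assumption made for illustration: the function name `boosted_term_weights`, the use of raw term frequency as the base weight, the multiplicative `boost` factor of 2.0, and the splitting of multi-word keywords into unigrams.

```python
from collections import Counter


def boosted_term_weights(tokens, keywords, boost=2.0):
    """Return term weights for one document.

    Base weight is raw term frequency (an assumption; the abstract
    does not name the scheme). Terms that also occur in an extracted
    keyword have their weight multiplied by `boost` (hypothetical value).
    """
    # Split multi-word keywords into lowercase unigrams.
    keyword_unigrams = {w for kw in keywords for w in kw.lower().split()}
    counts = Counter(t.lower() for t in tokens)
    return {
        term: tf * (boost if term in keyword_unigrams else 1.0)
        for term, tf in counts.items()
    }


# Toy document and toy keywords, invented for this example.
doc = "stock markets fell as oil prices rose and markets reacted".split()
weights = boosted_term_weights(doc, keywords=["oil prices", "markets"])
```

In this toy input, "markets" occurs twice and is a keyword, so its weight is doubled relative to its term frequency, while non-keyword terms such as "stock" keep their base weight. The resulting dictionary could then be fed to any categorizer that accepts weighted bag-of-words features.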
doi:10.3115/1220175.1220243 dblp:conf/acl/HulthM06 fatcat:lgelsjrxtja5rhymmdrkyrrbg4