Improving Rocchio with Weakly Supervised Clustering [chapter]

Romain Vinot, François Yvon
2003 Lecture Notes in Computer Science  
This paper presents a novel approach for adapting the complexity of a text categorization system to the difficulty of the task. In this study, we adapt a simple text classifier (Rocchio) using weakly supervised clustering techniques. The idea is to identify sub-topics within the original classes that can help improve categorization. To this end, we propose several clustering algorithms and report the results of various evaluations on standard benchmark corpora, such as the Newsgroups corpus.
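The abstract does not say which clustering algorithms the authors use. The sketch below rests on two simplifying assumptions. First, it uses a centroid-only Rocchio with no negative-example term. Second, it uses plain per-class spherical k-means as a stand-in for the paper's weakly supervised clustering. Each class is split into `k` sub-topic centroids, and a document is assigned to the class of its nearest sub-centroid by cosine similarity. All function names (`spherical_kmeans`, `train`, `predict`) and the parameter `k` are hypothetical illustrations, not the authors' code.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse bag-of-words: term -> weight


def normalize(v: Vector) -> Vector:
    # L2-normalize so that a dot product equals cosine similarity.
    norm = math.sqrt(sum(w * w for w in v.values()))
    return {t: w / norm for t, w in v.items()} if norm else dict(v)


def cosine(a: Vector, b: Vector) -> float:
    # Assumes both inputs are already L2-normalized; iterate the smaller one.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(vectors: List[Vector]) -> Vector:
    # Sum, then normalize: only the direction matters for cosine scoring.
    total: Dict[str, float] = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return normalize(dict(total))


def spherical_kmeans(vectors: List[Vector], k: int,
                     iters: int = 10, seed: int = 0) -> List[Vector]:
    # Hypothetical stand-in for the paper's clustering step: split one
    # class's documents into at most k sub-topic centroids.
    rng = random.Random(seed)
    centers = rng.sample(vectors, min(k, len(vectors)))
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in centers]
        for v in vectors:
            best = max(range(len(centers)), key=lambda i: cosine(v, centers[i]))
            groups[best].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return centers


def train(docs: List[Vector], labels: List[str],
          k: int = 2) -> List[Tuple[str, Vector]]:
    # One (label, sub-centroid) prototype per discovered sub-topic.
    by_class: Dict[str, List[Vector]] = defaultdict(list)
    for doc, label in zip(docs, labels):
        by_class[label].append(normalize(doc))
    return [(label, c) for label, vs in by_class.items()
            for c in spherical_kmeans(vs, k)]


def predict(prototypes: List[Tuple[str, Vector]], doc: Vector) -> str:
    # Nearest sub-centroid wins; its parent class is the prediction.
    d = normalize(doc)
    return max(prototypes, key=lambda p: cosine(d, p[1]))[0]


if __name__ == "__main__":
    docs = [
        {"nasa": 1, "orbit": 1}, {"shuttle": 1, "launch": 1},    # "space": two sub-topics
        {"goal": 1, "striker": 1}, {"pitcher": 1, "inning": 1},  # "sport": two sub-topics
    ]
    labels = ["space", "space", "sport", "sport"]
    model = train(docs, labels, k=2)
    print(predict(model, {"inning": 1, "pitcher": 1}))  # sport
```

With `k=1`, each class collapses back to a single centroid, which gives the plain nearest-centroid Rocchio baseline. Raising `k` is one simple way to let the classifier's complexity grow with the task. The paper itself adapts this choice through its clustering procedures.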
doi:10.1007/978-3-540-39857-8_41 fatcat:dubpon6iwbc33kanynmq3jndbu