Text filtering by boosting naive Bayes classifiers

Yu-Hwan Kim, Shang-Yoon Hahn, Byoung-Tak Zhang
2000 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '00  
Several machine learning algorithms have recently been used for text categorization and filtering. In particular, boosting methods such as AdaBoost have shown good performance applied to real text data. However, most of existing boosting algorithms are based on classifiers that use binary-valued features. Thus, they do not fully make use of the weight information provided by standard term weighting methods. In this paper, we present a boosting-based learning method for text filtering that uses
more » ... aive Bayes classifiers as a weak learner. The use of naive Bayes allows the boosting algorithm to utilize term frequency information while maintaining probabilisti-caUy accurate confidence ratio. Applied to TREC-7 and TREC-8 filtering track documents, the proposed method obtained a significant improvement in LF1, LF2, F1 and F3 measures compared to the best results submitted by other TREC entries.
doi:10.1145/345508.345572 dblp:conf/sigir/KimHZ00 fatcat:7djpbpqrbza5pacggbgvytzzqi