FACT: Fast Algorithm for Categorizing Text

Saket S.R. Mengle, Nazli Goharian, Alana Platt
2007 IEEE Intelligence and Security Informatics
With the ever-increasing number of digital documents, the ability to classify those documents automatically, both quickly and accurately, is becoming more critical and more difficult. We present the Fast Algorithm for Categorizing Text (FACT), a statistics-based multi-way classifier with our proposed feature selection method, the Ambiguity Measure (AM), which uses only the most unambiguous keywords to predict the category of a document. Our empirical results show that FACT outperforms the best results on the best-performing feature selection for the Naïve Bayes classifier, namely Odds Ratio. We empirically show the effectiveness of our approach in outperforming Odds Ratio on four benchmark datasets, with statistical significance at the 99% confidence level. Furthermore, the performance of FACT is comparable to or better than that of current non-statistical classifiers.
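
The abstract does not define the Ambiguity Measure, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a term's score is the share of its occurrences that fall in its single most frequent category, that terms below a threshold are discarded, and that a document is assigned the category with the largest sum of scores from its surviving terms. The function names, the 0.8 threshold, and the voting rule are all hypothetical.

```python
from collections import Counter, defaultdict


def ambiguity_scores(docs):
    """Score each term by how concentrated it is in one category.

    docs: iterable of (tokens, category) pairs.
    Returns {term: (score, top_category)}, where score is the fraction of
    the term's occurrences found in its most frequent category
    (an ASSUMED definition; 1.0 means the term appears in one category only).
    """
    per_term = defaultdict(Counter)
    for tokens, category in docs:
        for tok in tokens:
            per_term[tok][category] += 1

    scores = {}
    for term, counts in per_term.items():
        # Ties are broken by category name so the result is deterministic.
        top_category, top_count = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
        scores[term] = (top_count / sum(counts.values()), top_category)
    return scores


def select_features(scores, threshold=0.8):
    """Keep only terms whose score meets the (hypothetical) threshold."""
    return {term: v for term, v in scores.items() if v[0] >= threshold}


def classify(tokens, features):
    """Pick the category with the highest summed score over selected terms.

    Returns None when no selected term occurs in the document.
    """
    votes = Counter()
    for tok in tokens:
        if tok in features:
            score, category = features[tok]
            votes[category] += score
    if not votes:
        return None
    return max(votes.items(), key=lambda kv: (kv[1], kv[0]))[0]
```

Under these assumptions, a term that occurs in several categories (low score) is dropped before classification, which is one plausible reading of "uses only the most unambiguous keywords".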
doi:10.1109/isi.2007.379490 dblp:conf/isi/MengleGP07 fatcat:cimze7elpndmrnpaj36deacsji