Information-theoretic feature selection algorithms for text classification

J. Novovicova, A. Malik
Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005.  
A major characteristic of the text document classification problem is the extremely high dimensionality of text data. In this paper we present four new algorithms for feature/word selection for the purpose of text classification. We use sequential forward selection methods based on improved mutual information criterion functions. We discuss the performance of the proposed evaluation functions compared with information gain, which evaluates features individually. We present experimental results using a naive Bayes classifier based on the multinomial model, a linear support vector machine, and k-nearest neighbor classifiers on the Reuters data set. Finally, we analyze the experimental results from various perspectives, including precision, recall and F1-measure. Preliminary experimental results indicate the effectiveness of the proposed feature selection algorithms in text classification.
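
The abstract contrasts sequential forward selection with scoring each word individually. The sketch below illustrates that general idea only; it is not the paper's method. The scoring rule (relevance to the class minus mean mutual information with already selected words) is a generic assumption, and the function names `mutual_information` and `forward_select` and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def forward_select(columns, labels, k):
    """Greedy sequential forward selection of k words.

    columns maps each word to a 0/1 presence list over the documents.
    Assumed criterion (not taken from the paper): relevance I(word; class)
    minus the mean mutual information with the words already selected.
    """
    relevance = {w: mutual_information(col, labels) for w, col in columns.items()}
    selected = []
    remaining = set(columns)
    while remaining and len(selected) < k:
        def score(w):
            if not selected:
                return relevance[w]
            redundancy = sum(
                mutual_information(columns[w], columns[s]) for s in selected
            ) / len(selected)
            return relevance[w] - redundancy

        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: 8 documents, binary class label, 4 candidate words.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "oil":    [1, 1, 1, 0, 0, 0, 0, 0],
    "petrol": [1, 1, 1, 0, 0, 0, 0, 0],  # exact duplicate of "oil"
    "opec":   [0, 0, 0, 1, 0, 0, 0, 0],
    "the":    [1, 1, 1, 1, 1, 1, 1, 1],  # carries no class information
}

# Ranking each word on its own, as information gain does.
individual = sorted(
    columns, key=lambda w: (-mutual_information(columns[w], labels), w)
)[:2]
selected = forward_select(columns, labels, 2)
print("individual ranking:", individual)
print("forward selection: ", selected)
```

On this toy data the individual ranking keeps both "oil" and its duplicate "petrol", while the greedy criterion picks "oil" and then the less redundant "opec". This is the kind of difference between individual and sequential evaluation that the abstract refers to.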
doi:10.1109/ijcnn.2005.1556452 fatcat:qtynbdewbjebnj66mtvbuuivxy