Medical Datamining with a New Algorithm for Feature Selection and Naive Bayesian Classifier

Ranjit Abraham, Jay B. Simha, S. S. Iyengar
2007 10th International Conference on Information Technology (ICIT 2007)  
Much research in data mining has gone into improving the predictive accuracy of statistical classifiers by applying discretization and feature selection. As a probability-based statistical classification method, the Naïve Bayesian classifier has gained wide popularity despite its assumption that attributes are conditionally mutually independent given the class label. In this paper we propose a new feature selection algorithm to improve the classification accuracy of Naïve Bayes on medical datasets. Our experimental results with 17 medical datasets suggest that, on average, the new CHI-WSS algorithm gave the best results. The proposed algorithm utilizes discretization and simplifies the 'wrapper' approach to feature selection by reducing the feature dimensionality through the elimination of irrelevant and least relevant features using chi-square statistics. For our experiments we use two established measures to compare the performance of statistical classifiers, namely classification accuracy (or error rate) and the area under the ROC curve, to demonstrate that the proposed algorithm using the generative Naïve Bayesian classifier is on average more efficient than the discriminative models Logistic Regression and Support Vector Machine.
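The abstract describes a two-stage idea: a chi-square filter first discards irrelevant and least relevant features, and a wrapper search with Naïve Bayes then runs on the reduced set. The sketch below illustrates that general idea only; it is not the paper's CHI-WSS procedure, whose details are not given here. It assumes the features are already discretized, and the forward-selection search, the leave-one-out accuracy estimate, the Laplace smoothing, the `keep` cutoff, and all function names are assumptions made for illustration.

```python
from collections import Counter
import math

def chi_square(feature, labels):
    """Pearson chi-square statistic between one discrete feature and the class."""
    n = len(labels)
    f_counts, c_counts = Counter(feature), Counter(labels)
    joint = Counter(zip(feature, labels))
    stat = 0.0
    for f, nf in f_counts.items():
        for c, nc in c_counts.items():
            expected = nf * nc / n
            stat += (joint[(f, c)] - expected) ** 2 / expected
    return stat

def nb_predict(train_X, train_y, row, cols):
    """Categorical Naive Bayes with Laplace smoothing, using only columns `cols`."""
    best_class, best_lp = None, -math.inf
    for c, nc in Counter(train_y).items():
        lp = math.log(nc / len(train_y))  # log prior
        for j in cols:
            n_values = len({r[j] for r in train_X})
            match = sum(1 for r, y in zip(train_X, train_y)
                        if y == c and r[j] == row[j])
            lp += math.log((match + 1) / (nc + n_values))  # smoothed likelihood
        if lp > best_lp:
            best_class, best_lp = c, lp
    return best_class

def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy of Naive Bayes restricted to `cols`."""
    hits = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += nb_predict(train_X, train_y, X[i], cols) == y[i]
    return hits / len(X)

def chi_then_wrapper(X, y, keep):
    """Keep the `keep` top chi-square features, then forward-select among them."""
    ranked = sorted(range(len(X[0])),
                    key=lambda j: chi_square([r[j] for r in X], y),
                    reverse=True)
    candidates = ranked[:keep]          # filter stage (hypothetical cutoff)
    selected, best = [], loo_accuracy(X, y, [])
    improved = True
    while improved:                     # wrapper stage: greedy forward search
        improved = False
        for j in candidates:
            if j in selected:
                continue
            acc = loo_accuracy(X, y, selected + [j])
            if acc > best:
                best, pick, improved = acc, j, True
        if improved:
            selected.append(pick)
    return selected, best

# Toy data (made up): column 0 tracks the class, column 1 is noise, column 2 is constant.
X = [("hi", "a", "x"), ("hi", "b", "x"), ("hi", "a", "x"), ("hi", "b", "x"),
     ("lo", "a", "x"), ("lo", "b", "x"), ("lo", "a", "x"), ("lo", "b", "x")]
y = [1, 1, 1, 1, 0, 0, 0, 0]
selected, accuracy = chi_then_wrapper(X, y, keep=2)
print(selected, accuracy)
```

The filter stage is what makes the wrapper cheaper: every candidate subset costs a full round of classifier evaluations, so shrinking the candidate pool first reduces the number of subsets the greedy search has to score.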
doi:10.1109/icit.2007.41 dblp:conf/cit/AbrahamSI07 fatcat:qjwicggccfeijgalsn4yjuul6a