Effective Discretization and Hybrid feature selection using Naïve Bayesian classifier for Medical datamining

Ranjit Abraham, Jay B. Simha, S. Sitharama Iyengar
2009 International Journal of Computational Intelligence Research  
As a probability-based statistical classification method, the Naïve Bayesian classifier has gained wide popularity despite its assumption that attributes are conditionally independent of one another given the class label. Improving predictive accuracy and achieving dimensionality reduction for statistical classifiers have been active research areas in data mining. Our experimental results suggest that, on average, the Naïve Bayes classifier with Minimum Description Length (MDL) discretization seems to be the best performer compared to popular variants of Naïve Bayes as well as some popular non-Naïve Bayesian statistical classifiers. We propose a hybrid feature selection algorithm (CHI-WSS) that helps achieve dimensionality reduction by removing irrelevant data, increasing learning accuracy, and improving result comprehensibility. Experimental results suggest that, on average, the hybrid feature selector gave the best results compared to the individual techniques, including popular filter-based and wrapper-based feature selection methods. The proposed algorithm is a multi-step process: it applies discretization, filters out irrelevant and least relevant features, and finally applies a greedy algorithm such as best-first search or a wrapper subset selector. For experimental validation we used two established measures for comparing the performance of statistical classifiers, namely classification accuracy (or error rate) and the area under the ROC curve. Our work demonstrates that the proposed algorithm using the generative Naïve Bayesian classifier is, on average, more efficient than using discriminative models, namely Logistic Regression and Support Vector Machine. This work, based on empirical evaluation on publicly available datasets, validates our hypothesis that parsimonious models can be developed from our generalized approach.
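
The multi-step process described in the abstract (filter first, then a greedy wrapper search around the classifier) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the features are already discretized (MDL discretization is not shown), reads "CHI" as a chi-square filter, and uses leave-one-out accuracy of a Laplace-smoothed Naïve Bayes as the wrapper criterion. The function names, the `min_chi2` cut-off, and the toy data are all hypothetical.

```python
import math
from collections import Counter

def chi2_score(column, labels):
    """Pearson chi-square statistic between one discrete feature and the class."""
    n = len(labels)
    observed = Counter(zip(column, labels))
    value_totals, class_totals = Counter(column), Counter(labels)
    score = 0.0
    for v, nv in value_totals.items():
        for c, nc in class_totals.items():
            expected = nv * nc / n
            score += (observed[(v, c)] - expected) ** 2 / expected
    return score

def nb_predict(train_X, train_y, row, feats):
    """Categorical Naive Bayes with Laplace smoothing, using only `feats`."""
    n = len(train_y)
    best_class, best_logp = None, -math.inf
    for c, nc in sorted(Counter(train_y).items()):
        logp = math.log(nc / n)  # class prior
        for j in feats:
            n_values = len({r[j] for r in train_X})
            matches = sum(1 for r, t in zip(train_X, train_y)
                          if t == c and r[j] == row[j])
            logp += math.log((matches + 1) / (nc + n_values))
        if logp > best_logp:
            best_class, best_logp = c, logp
    return best_class

def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of Naive Bayes restricted to `feats`."""
    hits = 0
    for i in range(len(X)):
        pred = nb_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], feats)
        hits += (pred == y[i])
    return hits / len(X)

def hybrid_select(X, y, min_chi2):
    """Filter step (chi-square cut-off), then greedy forward wrapper search."""
    scores = [chi2_score(col, y) for col in zip(*X)]
    candidates = [j for j, s in enumerate(scores) if s >= min_chi2]
    selected, best_acc = [], loo_accuracy(X, y, [])
    while True:
        trials = [(loo_accuracy(X, y, selected + [j]), j)
                  for j in candidates if j not in selected]
        if not trials:
            break
        acc, j = max(trials, key=lambda t: t[0])
        if acc <= best_acc:  # stop when no candidate improves accuracy
            break
        selected.append(j)
        best_acc = acc
    return selected, best_acc

# Toy data (hypothetical): f0 determines the class, f1 is pure noise,
# f2 is only weakly related to the class.
f0 = [0] * 6 + [1] * 6
f1 = [0, 1] * 6
f2 = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
X = [list(r) for r in zip(f0, f1, f2)]
y = ["neg"] * 6 + ["pos"] * 6

selected, acc = hybrid_select(X, y, min_chi2=1.0)
print(selected, acc)
```

On this toy data the filter drops the noise feature, and the wrapper keeps only the feature that improves leave-one-out accuracy, which is the kind of parsimonious model the abstract refers to. A forward greedy search is used here for brevity; best-first search, which the abstract also mentions, would additionally allow backtracking.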
doi:10.5019/j.ijcir.2009.175 fatcat:k6r7ibls2jce3mqmji5a2uuuli