A Survey on Probabilistic Computational Model for Microarray Data Classification
International Journal of Advanced Research in Computer Science and Software Engineering
For medical classification problems, it is often desirable to have a probability associated with each class when predicting a disease. Various probabilistic computational models are currently used to classify microarray data, including Naïve Bayes, logistic regression, and the probabilistic neural network. Probabilistic classifiers have nevertheless received relatively little attention in the literature, because microarray data combine a small number of samples with a large number of genes and exhibit a high degree of noise. In most cases a probabilistic model on its own does not adequately address the problems of dimensionality and noise, and therefore does not achieve good accuracy. To improve accuracy, feature selection techniques are used to reduce the dimensionality of the data and to remove noisy features. This paper presents various probabilistic computational models for microarray data classification and reviews the state of the art by grouping the literature into three categories: Naïve Bayes, probabilistic neural network, and logistic regression.

Fig. 3 Classification accuracy for different microarray data sets

VII. SUMMARY

This survey presented different probabilistic classification approaches to the problem of microarray data classification. The literature surveyed covers the probabilistic computational models used for this task: the naïve Bayes classifier, the Probabilistic Neural Network (PNN), and logistic regression. The paper does not aim to provide a numerical evaluation of which probabilistic classifier is best, but to gather as much domain knowledge as possible on this topic. Several feature selection methods for reducing high-dimensional data, along with different validation techniques, are also covered. These methods and algorithms have given good accuracy on high-dimensional data.
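To make the combination of feature selection and probabilistic classification concrete, the following is a minimal sketch in plain Python. It is not taken from any of the surveyed papers. The data are synthetic, the signal-to-noise gene ranking is an assumed stand-in for the feature selection methods discussed in the literature, and the function names (`select_top_genes`, `fit_gaussian_nb`, `predict_proba`) are hypothetical. It shows how a Gaussian naïve Bayes model can return a probability for each class after the gene set has been reduced.

```python
import math
import random
from statistics import mean, pvariance


def select_top_genes(X, y, k):
    """Rank genes by a simple two-class signal-to-noise score; keep the top k."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        spread = math.sqrt(pvariance(a)) + math.sqrt(pvariance(b)) + 1e-9
        scores.append((abs(mean(a) - mean(b)) / spread, g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def fit_gaussian_nb(X, y, genes):
    """Estimate a class prior plus per-gene mean and variance for each class."""
    model = {}
    for c in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == c]
        stats = []
        for g in genes:
            vals = [r[g] for r in rows]
            # Small constant keeps the variance strictly positive.
            stats.append((mean(vals), pvariance(vals) + 1e-6))
        model[c] = (len(rows) / len(X), stats)
    return model


def predict_proba(model, genes, row):
    """Return posterior class probabilities, computed in log space."""
    log_post = {}
    for c, (prior, stats) in model.items():
        lp = math.log(prior)
        for g, (mu, var) in zip(genes, stats):
            lp += -0.5 * math.log(2 * math.pi * var) - (row[g] - mu) ** 2 / (2 * var)
        log_post[c] = lp
    top = max(log_post.values())
    norm = sum(math.exp(v - top) for v in log_post.values())
    return {c: math.exp(v - top) / norm for c, v in log_post.items()}


# Synthetic stand-in for microarray data (assumption): few samples, many genes,
# with only three genes carrying class information.
rng = random.Random(0)
n_samples, n_genes, informative = 40, 200, [3, 17, 42]
y = [i % 2 for i in range(n_samples)]
X = []
for lab in y:
    row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
    for g in informative:
        row[g] += 3.0 * lab
    X.append(row)

genes = select_top_genes(X, y, k=5)
model = fit_gaussian_nb(X, y, genes)
probs = predict_proba(model, genes, X[0])
print("selected genes:", genes)
print("class probabilities for first sample:", probs)
```

In this sketch the classifier is fitted and evaluated on the same samples for brevity. A real study would use a validation scheme such as cross-validation and would perform the gene selection inside each training fold to avoid optimistic accuracy estimates.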