Categorical Variable Selection in Naïve Bayes Classification
단순 베이즈 분류에서의 범주형 변수의 선택

Min-Sun Kim, Hosik Choi, Changyi Park
2015 Korean Journal of Applied Statistics  
Naïve Bayes classification is based on the assumption that the input variables are conditionally independent given the output variable. This assumption is unrealistic, but it reduces the problem of high-dimensional joint probability estimation to a series of univariate probability estimations. The Naïve Bayes classifier is therefore often adopted in the analysis of massive data sets, such as in spam e-mail filtering and recommendation systems. In this paper, we propose a variable selection method based on the χ² statistic on input and output variables. The proposed method retains the simplicity of the Naïve Bayes classifier in terms of data processing and computation, while still selecting relevant variables. We expect the method to be useful in classification problems for ultra-high-dimensional or big data, such as the classification of diseases based on single nucleotide polymorphisms (SNPs).
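As an illustration only, the idea of screening categorical inputs by a χ² statistic computed between each input and the output might be sketched as follows. The abstract does not say how the statistic is turned into a selection rule, so the top-k ranking used here is an assumption, and the function names (`chi2_statistic`, `select_variables`) and the toy data are hypothetical. Ranking by the raw statistic also ignores differences in degrees of freedom between variables with different numbers of categories.

```python
from collections import Counter


def chi2_statistic(x, y):
    """Pearson chi-squared statistic of independence between two
    categorical sequences of equal length."""
    n = len(x)
    joint = Counter(zip(x, y))   # observed cell counts
    row = Counter(x)             # marginal counts of the input variable
    col = Counter(y)             # marginal counts of the output variable
    stat = 0.0
    for a, row_count in row.items():
        for b, col_count in col.items():
            expected = row_count * col_count / n
            observed = joint.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def select_variables(columns, y, k):
    """Assumed selection rule: keep the k inputs with the largest statistic.
    Each statistic needs only one pass over a single column, so the
    screening step stays univariate, like Naive Bayes estimation itself."""
    scores = {name: chi2_statistic(values, y) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: x1 determines the class, x2 is unrelated to it.
y = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "x1": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "x2": ["a", "b", "a", "b", "a", "b", "a", "b"],
}
selected, scores = select_variables(columns, y, k=1)
print(selected, scores)
```

On this toy data the informative variable `x1` gets the larger statistic and is the one retained. In practice the selected columns would then be passed to a Naïve Bayes classifier for categorical inputs.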
doi:10.5351/kjas.2015.28.3.407 fatcat:6naksvo4zbht7lboy2l2igmvey