Study on a Novel Data Classification Method Based on Improved GA and SVM Model

Jing Huo, Yuxiang Zhao
2016 International Journal of Smart Home  
Support vector machine (SVM) can effectively solve classification problems with small samples, nonlinearity and high dimensionality, but it suffers from weak generalization ability and low classification accuracy. Therefore, an improved genetic algorithm (IGA) is introduced in order to propose a new classification method (IGASVM) that combines the improved GA with the SVM model. In the proposed IGASVM method, a self-adaptive control-parameter strategy and a convergence-speed improvement strategy are introduced into
the GA to maintain population diversity, promptly detect premature convergence of individuals and escape from local optima, thereby improving search performance. The improved GA is then used to optimize and determine the parameters of the SVM model, improving the learning ability and generalization ability of the SVM model and yielding the new classification (IGASVM) method. Finally, experimental data are selected to test the effectiveness of the proposed IGASVM method. The experimental results show that the improved GA can effectively optimize and determine the parameters of the SVM model, and that the IGASVM method achieves better learning ability, generalization ability and classification accuracy.

Many researchers have proposed methods for improving the SVM model's classification results. Fung and Mangasarian [3] proposed a concave minimization approach for classifying unlabeled data based on a small representative percentage and a linear support vector machine. Park and Zhang [4] proposed an approach for classifying large-scale unstructured documents by incorporating both the lexical and the syntactic information of documents. Liu [5] proposed an active learning algorithm with support vector machine and applied it to gene expression profiles of colon cancer, lung cancer and prostate cancer samples. Jayadeva et al. [6] proposed a multi-category extension of fuzzy proximal support vector machines, where a fuzzy membership is assigned to each data point. Jin et al. [7] proposed a genetic fuzzy feature transformation method for support vector machines (SVMs) to achieve more accurate data classification. Yang et al. [8] proposed a weighted support vector machine (WSVM) to address the outlier sensitivity problem of the standard SVM for two-class data classification.
The basic idea is to assign different weights to different data points so that the WSVM training algorithm learns the decision surface according to the relative importance of data points in the training set. Li et al. [9] proposed a clustering algorithm for efficient learning; the method categorizes data into clusters and finds critical data in the clusters as a substitute for the original data to reduce the computational complexity. Cervantes [10] proposed a novel SVM classification approach for large data sets using minimum enclosing ball clustering. Mathur and Foody [11] proposed a crop classification method based on support vector machine with intelligently selected training data for an operational application. Essam [12] proposed a new accurate classifier based on signal-to-noise ratio, support vector machine, Bayesian neural network and AdaBoost for data mining and classification. Li and Liu [13] proposed a new kernel generating method dependent on the classification-related properties of the data structure itself; the new kernel concentrates on the similarity of paired data within classes, where the calculation of similarity is based on fuzzy theory. Ji et al. [14] introduced a support vector machine whose training examples are fuzzy inputs and gave a solving procedure for the support vector machine with fuzzy training data. Jordi et al. [15] proposed two semisupervised one-class support vector machine (OC-SVM) classifiers for remote sensing applications. The first algorithm modifies the OC-SVM kernel by modeling the data marginal distribution with the graph Laplacian built from both labeled and unlabeled samples. The second is based on a simple modification of the standard SVM cost function that more heavily penalizes errors made when classifying samples of the target class. Al-Ataby et al.
[16] proposed several multi-resolution approaches employing the wavelet transform and texture analysis for de-noising and enhancing data quality to help in the automatic detection and classification of defects. Hwang et al. [17] proposed a new weighted approach based on the Lagrangian support vector machine for the imbalanced data classification problem, where the weight parameters are embedded in the Lagrangian SVM formulation. Jan et al. [18] proposed a mixed-effects least squares support vector machine model to extend the standard LS-SVM classifier for handling longitudinal data. Tian et al. [19] proposed a new method based on support vector machine (SVM) and genetic algorithm (GA) to analyze signals for wound infection detection. Ahmed et al. [20] proposed applying an ensemble of SVMs coupled with feature-subset selection methods to alleviate the curse of dimensionality associated with expression-based classification of DNA data, in order to achieve stable and reliable results. Li et al. [21] proposed a novel SVM classification approach based on random selection and a de-clustering technique for large data sets. Chau et al. [22] proposed a novel method for SVM classification based on the convex-concave hull and support vector machine, called convex-concave hull SVM (CCH-SVM). Li et al. [23] proposed a probabilistic support vector machine (PSVM) to capture the probabilistic information of the separating margin and formulate the decision function in such a noisy environment. Wei et al. [24] proposed a least squares support vector machine-based method.

As can be seen from Table 2, the proposed IGASVM algorithm obtains the best classification results on the given data from the UCI database. The average classification accuracies for Wine, Pittsburgh Bridges, Balance Scale, Libras Movement, Arrhythmia and Spectrometer are 93.65%, 94.79%, 90.34%, 91.48%, 90.65% and 91.13%, respectively.
Moreover, the improved GA can better obtain the kernel-function parameters and penalty factor of the SVM, improving the speed and efficiency of parameter selection as well as the learning ability and generalization ability of the model. In general, the classification results of the IGASVM algorithm are better, with higher optimization ability and classification accuracy.
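To illustrate the general idea of using a GA to select the SVM kernel parameter and penalty factor, the following is a minimal sketch, not the authors' IGASVM algorithm: a toy genetic algorithm with a generation-decaying mutation rate (a loose stand-in for the paper's self-adaptive control-parameter strategy) searches the C and gamma parameters of an RBF-kernel SVM on the UCI Wine data, using cross-validation accuracy as fitness. All GA settings, search ranges and helper names here are illustrative assumptions; it requires scikit-learn.

```python
# Sketch: GA search over SVM (C, gamma), fitness = cross-validated accuracy.
# Illustrative only; population size, ranges and operators are assumptions.
import random
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

random.seed(0)

X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)  # scaling matters for RBF SVMs

def fitness(ind):
    """3-fold cross-validation accuracy of an RBF SVM with these parameters."""
    C, gamma = ind
    return cross_val_score(SVC(C=C, gamma=gamma, kernel="rbf"), X, y, cv=3).mean()

def random_individual():
    # log-uniform over assumed plausible ranges for C and gamma
    return (10 ** random.uniform(-2, 3), 10 ** random.uniform(-4, 1))

def crossover(a, b):
    # geometric-mean recombination of the two parents' parameters
    return tuple((u * v) ** 0.5 for u, v in zip(a, b))

def mutate(ind, rate):
    # multiplicative log-normal perturbation; `rate` shrinks over generations
    return tuple(v * 10 ** random.gauss(0, rate) for v in ind)

pop = [random_individual() for _ in range(10)]
for gen in range(5):
    elite = sorted(pop, key=fitness, reverse=True)[:4]   # keep the best 4
    rate = 0.5 * (1 - gen / 5)                           # decaying mutation rate
    children = [mutate(crossover(random.choice(elite), random.choice(elite)), rate)
                for _ in range(len(pop) - len(elite))]
    pop = elite + children

best = max(pop, key=fitness)
print(f"best C={best[0]:.3g}, gamma={best[1]:.3g}, CV accuracy={fitness(best):.3f}")
```

A real implementation would also cache fitness evaluations and use the paper's adaptive crossover/mutation probabilities rather than a fixed decay schedule.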
doi:10.14257/ijsh.2016.10.5.12