A Survey on Methods for Solving Data Imbalance Problem for Classification

Arpit Singh, Anuradha Purohit
2015 International Journal of Computer Applications  
The term "data imbalance" in classification refers to a well-established phenomenon in which a data set contains unbalanced class distributions. A data set is called unbalanced if it contains at least one class that is represented by very few examples. A range of solutions has been proposed for the problem of data imbalance, including data sampling, cost evaluation of the model, bagging, boosting, and Genetic Programming (GP) based methods. This paper presents a survey of various methods introduced by ... s to handle the data imbalance problem in order to improve classification performance, and further compares the methods on the basis of their advantages and disadvantages.
General Terms: Survey on methods for data imbalance.
doi:10.5120/ijca2015906677 fatcat:nslebujkgfhylks57jbmexvxoy