adabag: An R Package for Classification with Boosting and Bagging

Esteban Alfaro, Matías Gámez, Noelia García
2013 Journal of Statistical Software  
Boosting and bagging are two widely used ensemble methods for classification. Their common goal is to improve the accuracy of a classifier by combining single classifiers that are slightly better than random guessing. Among the family of boosting algorithms, AdaBoost (adaptive boosting) is the best known, although it is suitable only for dichotomous tasks. AdaBoost.M1 and SAMME (stagewise additive modeling using a multi-class exponential loss function) are two easy and natural extensions to the general case of two or more classes. In this paper, the adabag R package is introduced. This version implements the AdaBoost.M1, SAMME and bagging algorithms with classification trees as base classifiers. Once the ensembles have been trained, they can be used to predict the class of new samples. The accuracy of these classifiers can be estimated on a separate data set or through cross validation. Moreover, the evolution of the error as the ensemble grows can be analysed and the ensemble can be pruned. In addition, the margin in the class prediction and the probability of each class for the observations can be calculated. Finally, several classic examples in classification literature are shown to illustrate the use of this package.

in bootstrap replicates of the training set. Boosting is a family of algorithms and two of them are implemented here: AdaBoost.M1 (Freund and Schapire 1996) and SAMME (Zhu, Zou, Rosset, and Hastie 2009). To the best of our knowledge, the SAMME algorithm is not available in any other R package. The package adabag 3.2, available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package=adabag, is the current update of adabag. It changes the measure of relative importance of the predictor variables, which now uses the gain of the Gini index given by a variable in a tree and, in the case of the boosting function, the weight of this tree. For this goal, the varImp function of the caret package (Kuhn
doi:10.18637/jss.v054.i02 fatcat:hi46pro6wvdc7msbotyf5k3bni
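
The difference between AdaBoost.M1 and SAMME mentioned in the abstract can be illustrated with a minimal sketch. It is written in Python rather than R and is not the adabag implementation: the function names tree_weight and update_weights are hypothetical, and the formulas follow the textbook forms of the two algorithms (the weight of a base classifier is ln((1 - err) / err), to which SAMME adds ln(K - 1) for K classes). Scaling conventions for this weight vary between implementations, so the exact constants should be read as an assumption.

```python
import math


def tree_weight(err, n_classes, method="samme"):
    """Voting weight (alpha) of a base classifier with weighted error `err`.

    Hypothetical helper for illustration only, not part of adabag.
    method="m1"    -> ln((1 - err) / err)              (AdaBoost.M1 form)
    method="samme" -> ln((1 - err) / err) + ln(K - 1)  (SAMME form)
    """
    if not 0.0 < err < 1.0:
        raise ValueError("err must lie strictly between 0 and 1")
    if method not in ("m1", "samme"):
        raise ValueError("method must be 'm1' or 'samme'")
    alpha = math.log((1.0 - err) / err)
    if method == "samme":
        # Extra term that makes alpha positive whenever err < (K - 1) / K,
        # i.e. whenever the classifier beats random guessing among K classes.
        alpha += math.log(n_classes - 1)
    return alpha


def update_weights(weights, misclassified, alpha):
    """Increase the weight of misclassified observations, then renormalize."""
    raw = [
        w * math.exp(alpha) if miss else w
        for w, miss in zip(weights, misclassified)
    ]
    total = sum(raw)
    return [w / total for w in raw]


if __name__ == "__main__":
    # Three classes, base classifier with 60% weighted error:
    # worse than 50%, but still better than random guessing (error 2/3).
    for method in ("m1", "samme"):
        print(method, tree_weight(0.6, n_classes=3, method=method))

    # One of four equally weighted observations was misclassified.
    alpha = tree_weight(0.25, n_classes=3, method="samme")
    print(update_weights([0.25] * 4, [True, False, False, False], alpha))
```

With three classes and an error of 0.6, the AdaBoost.M1 form gives a negative weight while the SAMME form stays positive, which is why SAMME only requires base classifiers to beat random guessing. With two classes the extra term is ln(1) = 0 and both forms coincide.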