Consensus Decision Trees: Using Consensus Hierarchical Clustering for Data Relabelling and Reduction [chapter]

Branko Kavšek, Nada Lavrač, Anuška Ferligoj
2001 Lecture Notes in Computer Science  
In data analysis, induction of decision trees serves two main goals: first, induced decision trees can be used for classification/prediction of new instances, and second, they represent an easy-to-interpret model of the problem domain that can be used for explanation. The accuracy of the induced classifier is usually estimated using N-fold cross-validation, whereas for explanation purposes a decision tree induced from all the available data is used. Decision tree learning is relatively [...]: a small change in the training set may significantly change the structure of the induced decision tree. This paper presents a decision tree construction method in which the domain model is constructed by consensus clustering of N decision trees induced in N-fold cross-validation. Experimental results show that consensus decision trees are simpler than C4.5 decision trees, indicating that they may be a more stable approximation of the intended domain model than a decision tree constructed from the entire set of training instances.
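The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea, not the procedure from the chapter. It assumes that each of the N trees from N-fold cross-validation assigns a class label to every instance, and that agreement between trees is summarised in a co-association matrix that a hierarchical clustering routine could then consume. The function names (`k_fold_indices`, `co_association`, `consensus_relabel`), the majority-vote relabelling, and the `min_agreement` threshold are hypothetical choices made for illustration.

```python
from collections import Counter


def k_fold_indices(n_items, n_folds):
    """Yield (train, test) index lists for a plain, unshuffled N-fold split."""
    folds = [list(range(i, n_items, n_folds)) for i in range(n_folds)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n_items) if i not in held]
        yield train, held_out


def co_association(labelings):
    """Fraction of labelings in which each pair of instances shares a label.

    `labelings` holds one list of labels per tree (assumed input format).
    """
    n_trees = len(labelings)
    n_items = len(labelings[0])
    return [
        [sum(lab[i] == lab[j] for lab in labelings) / n_trees
         for j in range(n_items)]
        for i in range(n_items)
    ]


def consensus_relabel(labelings, min_agreement=0.5):
    """Majority label per instance; None marks instances dropped (reduction).

    The threshold rule is an illustrative assumption, not the paper's rule.
    """
    n_trees = len(labelings)
    result = []
    for votes in zip(*labelings):
        label, count = Counter(votes).most_common(1)[0]
        result.append(label if count / n_trees >= min_agreement else None)
    return result


# Hypothetical predictions of three trees for four instances.
tree_labels = [
    ["a", "a", "b", "b"],
    ["a", "a", "b", "a"],
    ["a", "b", "b", "b"],
]
agreement = co_association(tree_labels)
relabelled = consensus_relabel(tree_labels, min_agreement=0.7)
```

In this toy input, instances on which the three hypothetical trees disagree fall below the 0.7 threshold and are marked `None`, which stands in for data reduction; a final tree would then be induced from the remaining relabelled instances.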
doi:10.1007/3-540-44795-4_22 fatcat:eux7wfwi25eevaxgoxkee2ksoi