A Splitting Criteria Based on Similarity in Decision Tree Learning

Xinmeng Zhang, Shengyi Jiang
2012 Journal of Software  
Decision trees are considered to be the most effective and widely used data mining technique for classification; their representation is intuitive and generally easy for humans to comprehend. The most critical issue in the learning process of decision trees is the splitting criterion. In this paper, we first define a similarity computation of the kind usually used in data clustering and apply it to the learning process of decision trees. Then, we propose a novel splitting criterion which chooses the split with maximum similarity; the resulting decision tree is called mstree. We also suggest a pruning methodology. Empirical experiments conducted on benchmark datasets verify that the algorithm outperforms some classic algorithms, such as ID3 and C4.5, in classification precision, and that it is less affected by the size of the training set.
doi:10.4304/jsw.7.8.1775-1782 fatcat:u5rnilp5ajcixleuqtvtkcw5am