Contribution to Decision Tree Induction with Python: A Review [chapter]

Bouchra Lamrini
2020 Data Mining - Methods, Applications and Systems [Working Title]  
Among learning algorithms, decision tree induction is one of the most popular and easiest to understand. Its popularity stems from three characteristics: interpretability, efficiency, and flexibility. Decision trees can be used for both classification and regression problems. Automatic learning of a decision tree is characterised by the fact that it uses logic and mathematics to generate rules instead of selecting them based on intuition and subjectivity. In this review, we present the essential steps for understanding the fundamental concepts and mathematics behind decision trees, from training to building. We study the criteria and pruning algorithms that have been proposed to control complexity and optimize decision tree performance. We then discuss several works and tools in order to analyze variance reduction techniques, which do not improve or change the representation bias of decision trees. We use the Pima Indians Diabetes dataset to address the essential questions needed to understand the pruning process. The paper's original contribution is an up-to-date overview focused entirely on implemented algorithms for building and optimizing decision trees, which should support future developments in decision tree induction.
doi:10.5772/intechopen.92438 fatcat:pb5gmmp54zf27kkedhiwxkfvui