Exploring error-sensitive attributes and branches of decision trees
2013 Ninth International Conference on Natural Computation (ICNC)
Decision trees have a reputation for being efficient and illustrative in classification learning, and the majority of the research effort has been focused on improving classification in a head-on style across a wide range of topics, such as tree algorithm development and refinement, attribute selection and prioritization, sampling technique improvement, and the addition of cost matrices and other performance-enhancing factors. One less commonly studied topic is the characteristics of classification errors: how they may be associated with specific attributes through correlation or causation, and within what value ranges on such attributes error patterns are most likely to occur. This research intends to study this under-explored area in a reverse, forensic style as part of post-classification analysis: to analyze the patterns and relationships between errors and attributes, and to explore how an attribute's risk level for error may play a role in producing riskier, more error-prone decision tree branches or decision paths. Possible benefits of this study include raising data stakeholders' awareness of such specific error-sensitive attributes and decision paths, facilitating a better understanding of the possible causes and impact of errors, and supporting the development of more effective error-reduction measures customized to the specific patterns of individual datasets. This emphasis on highlighting the specific error-sensitive attributes and decision branches within individual datasets reflects our observation, shared by others, that "additional domain-specific knowledge, external to the training set, must be employed to estimate the noise level (... and) the underlying model's complexity ... (because) knowledge-poor tree induction algorithms do not exploit such information."

Keywords: decision tree, error-sensitive attribute, error-sensitive tree branch, feature selection, post-classification analysis
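The kind of post-classification, forensic error analysis described above could be sketched as follows. This is a minimal illustration using scikit-learn, not the paper's actual method or experimental setup; the dataset, tree depth, and "riskier than the overall error rate" threshold are all assumptions chosen for demonstration:

```python
# Hypothetical sketch: flag decision-tree leaves (branch endpoints) whose
# test-set error rate exceeds the tree's overall error rate, as candidate
# "error-sensitive" branches for forensic inspection.
from collections import Counter

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
pred = tree.predict(X_te)
overall_error = (pred != y_te).mean()

# Map each test instance to the leaf it falls into, then count errors per leaf.
leaves = tree.apply(X_te)
total = Counter(leaves)
errors = Counter(leaves[pred != y_te])

# Leaves with above-average error rates mark error-prone decision paths;
# tracing them back up the tree reveals the attributes involved.
risky = {leaf: errors[leaf] / total[leaf]
         for leaf in total
         if errors[leaf] / total[leaf] > overall_error}
print(sorted(risky))
```

A fuller analysis in the spirit of the paper would then walk each flagged leaf's path back to the root to identify which attributes, and which value ranges on them, define the error-sensitive branch.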