An extended Chi2 algorithm for discretization of real value attributes

Chao-Ton Su, Jyh-Hwa Hsu
2005 IEEE Transactions on Knowledge and Data Engineering  
The Variable Precision Rough Sets (VPRS) model is a powerful tool for data mining, as it has been widely applied to acquire knowledge. Despite its diverse applications in many domains, the VPRS model unfortunately cannot be applied to real-world classification tasks involving continuous attributes. This requires a discretization method to preprocess the data. Discretization is an effective technique to deal with continuous attributes for data mining, especially for the classification problem.
more » ... e modified Chi2 algorithm is one of the modifications to the Chi2 algorithm, replacing the inconsistency check in the Chi2 algorithm by using the quality of approximation, coined from the Rough Sets Theory (RST), in which it takes into account the effect of degrees of freedom. However, the classification with a controlled degree of uncertainty, or a misclassification error, is outside the realm of RST. This algorithm also ignores the effect of variance in the two merged intervals. In this study, we propose a new algorithm, named the extended Chi2 algorithm, to overcome these two drawbacks. By running the software of See5, our proposed algorithm possesses a better performance than the original and modified Chi2 algorithms.
doi:10.1109/tkde.2005.39 fatcat:iccp42mfebbttkngw27ekuab7u