A Statistical Decision Tree Algorithm for Data Stream Classification

Mirela Teixeira Cazzolato, Marcela Xavier Ribeiro, Cristiane A. Yaguinuma, Marilde Terezinha Prado Santos
2013 Proceedings of the 15th International Conference on Enterprise Information Systems  
A large amount of data is generated daily. Credit card transactions, monitoring networks, sensors and telecommunications are a few of the many applications that generate large volumes of data automatically. Storage and knowledge extraction techniques for data streams differ from those used on traditional data. In the context of data stream classification, many incremental techniques have been proposed. In this paper we present an incremental decision tree algorithm called StARMiner Tree (ST), based on the Very Fast Decision Tree (VFDT) system. ST deals with numerical data and uses a statistics-based heuristic both to decide when to split a node and to choose the best attribute for the test at that node. We applied ST to four datasets, two synthetic and two real-world, comparing its performance to that of VFDT. In all experiments ST achieved better accuracy, dealing well with noisy data and describing the data well from the earliest examples. However, in three of the four experiments ST created a larger tree. The results indicate that ST is a good classifier on both large and small datasets, maintaining good accuracy and execution time.
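For context on the split decision mentioned in the abstract, the following is a minimal Python sketch of the test used by the VFDT baseline, which splits a leaf once the gap between the two best attributes exceeds the Hoeffding bound. It is not the ST heuristic: the abstract does not give the details of ST's statistical test, so none is attempted here. The function and parameter names (`hoeffding_bound`, `should_split`, `value_range`, `delta`) and the `RunningStats` helper are illustrative assumptions, not names from the paper.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability 1 - delta, the true mean of a
    variable with the given range lies within this epsilon of the mean
    observed over n examples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """VFDT-style test: split when the best attribute beats the runner-up
    by more than the Hoeffding bound (tie-breaking is omitted here)."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


class RunningStats:
    """Welford's online mean and variance for one numerical attribute.
    Assumption: one plausible per-leaf summary for an incremental learner,
    updated one example at a time without storing the stream."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self._m2 += d * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 until two examples have been seen.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0
```

In this sketch a leaf would call `should_split` periodically as examples arrive. Because the bound shrinks as `n` grows, a small but consistent advantage for one attribute eventually triggers a split.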
doi:10.5220/0004447202170223 dblp:conf/iceis/CazzolatoRYS13 fatcat:i6ycgcjoxbd3bgbkbvqqdzw3su