Temporal Structure Learning for Clustering Massive Data Streams in Real-Time [chapter]

Michael Hahsler, Margaret H. Dunham
2011 Proceedings of the 2011 SIAM International Conference on Data Mining  
This paper describes one of the first attempts to model the temporal structure of massive data streams in real-time using data stream clustering. Recently, many data stream clustering algorithms have been developed that efficiently find a partition of the data points in a data stream. However, these algorithms disregard the information represented by the temporal order of the data points in the stream, which for many applications is an important part of the data stream. In this paper we propose a new framework called Temporal Relationships Among Clusters for Data Streams (TRACDS), which allows us to learn the temporal structure while clustering a data stream. We identify, organize, and describe the clustering operations used by state-of-the-art data stream clustering algorithms. Then we show that by defining a set of new operations to dynamically transform Markov Chains whose states represent clusters, we can efficiently capture temporal ordering information. This framework allows us to preserve temporal relationships among clusters for any state-of-the-art data stream clustering algorithm with only minimal overhead. To investigate the usefulness of TRACDS, we evaluate the improvement of TRACDS over pure data stream clustering for anomaly detection using several synthetic and real-world data sets. The experiments show that TRACDS considerably improves the results even if we introduce a high rate of incorrect time stamps, which is typical for real-world data streams.
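
The idea of a Markov Chain whose states track clusters can be illustrated with a rough sketch. The code below is not the authors' implementation and does not reproduce the TRACDS operation set; the class `TransitionTracker` and its methods `observe`, `merge`, and `probability` are hypothetical names chosen for illustration. It assumes some external stream clustering algorithm supplies a cluster id for each arriving point, and it only counts transitions between consecutive cluster assignments and folds two states together when their clusters are merged.

```python
from collections import defaultdict


class TransitionTracker:
    """Hypothetical helper: counts transitions between cluster ids in arrival order."""

    def __init__(self):
        # counts[i][j] = how often a point in cluster i was followed by one in cluster j
        self.counts = defaultdict(lambda: defaultdict(int))
        self.current = None  # cluster id of the most recent point, if any

    def observe(self, cluster_id):
        """Record that the next point in the stream was assigned to cluster_id."""
        if self.current is not None:
            self.counts[self.current][cluster_id] += 1
        self.current = cluster_id

    def merge(self, keep, drop):
        """Fold state `drop` into state `keep`, mirroring a merge of two clusters."""
        if keep == drop:
            return
        # Move outgoing transitions of `drop` to `keep`.
        for dst, n in self.counts.pop(drop, {}).items():
            self.counts[keep][dst] += n
        # Redirect incoming transitions that pointed at `drop`.
        for src in list(self.counts):
            row = self.counts[src]
            if drop in row:
                row[keep] += row.pop(drop)
        if self.current == drop:
            self.current = keep

    def probability(self, src, dst):
        """Estimated probability of moving from cluster src to cluster dst."""
        row = self.counts.get(src)
        total = sum(row.values()) if row else 0
        return row.get(dst, 0) / total if total else 0.0


tracker = TransitionTracker()
for cluster_id in [0, 1, 0, 1, 2]:  # cluster ids from an assumed external clusterer
    tracker.observe(cluster_id)
print(tracker.probability(1, 0))
```

Under these assumptions, a transition with a low estimated probability could serve as one possible signal when looking for unusual sequences, which is the kind of use the abstract's anomaly detection evaluation points to.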
doi:10.1137/1.9781611972818.57 dblp:conf/sdm/HahslerD11 fatcat:qtq3jk7yezg2xlcni4nj2xxeqq