Clustering from Data Streams [chapter]

João Gama
2017 Encyclopedia of Machine Learning and Data Mining  
Clustering is one of the most popular data mining techniques. In this article, we review the relevant methods and algorithms for designing clustering algorithms under the data stream computational model, and discuss research directions in tracking evolving clusters.
Definition
Clustering is the process of grouping objects into different groups, such that objects in the same group share many common properties and objects in different groups share few. The data stream clustering problem is defined as maintaining a continuously consistent, good clustering of the sequence observed so far, using a small amount of memory and time. The difficulties arise from the continuous arrival of data points and the need to analyze them in real time. These characteristics require incremental clustering, maintaining cluster structures that evolve over time. Moreover, the data stream itself may evolve over time: new clusters may appear and others disappear, reflecting the dynamics of the stream.
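To make the idea of incremental clustering under bounded memory concrete, the following is a minimal sketch of sequential (online) k-means in Python. It is an illustration chosen here, not a method taken from the chapter itself. The class name `OnlineKMeans`, its `update` method, and the optional `alpha` parameter are hypothetical names introduced for this example. Each arriving point is processed once and then discarded, so memory stays at k centers regardless of stream length.

```python
import math


class OnlineKMeans:
    """Sequential k-means sketch (hypothetical helper, not from the chapter).

    Keeps only k centers and k counters, so memory is O(k * d)
    no matter how many points the stream delivers.
    """

    def __init__(self, k, alpha=None):
        self.k = k
        # Assumption: alpha=None uses a 1/n step (exact running mean);
        # a constant alpha in (0, 1] forgets old data, which may help
        # when the clusters drift over time.
        self.alpha = alpha
        self.centers = []
        self.counts = []

    def update(self, point):
        """Consume one point and return the index of its cluster."""
        point = [float(x) for x in point]
        # The first k points seed the centers.
        if len(self.centers) < self.k:
            self.centers.append(point)
            self.counts.append(1)
            return len(self.centers) - 1
        # Assign the point to the nearest center (Euclidean distance).
        j = min(range(self.k), key=lambda i: math.dist(point, self.centers[i]))
        self.counts[j] += 1
        step = self.alpha if self.alpha is not None else 1.0 / self.counts[j]
        # Move the chosen center toward the point; the point is then dropped.
        self.centers[j] = [c + step * (x - c) for c, x in zip(self.centers[j], point)]
        return j


# Usage: feed points one at a time, as they would arrive from a stream.
model = OnlineKMeans(k=2)
for p in [(0, 0), (10, 10), (1, 0), (11, 10), (0, 1), (10, 11)]:
    model.update(p)
```

With the default 1/n step, each center is the exact mean of the points assigned to it so far. That step size never forgets old data, so a constant `alpha` is one simple (assumed) way to let centers follow a stream whose clusters move. Detecting clusters that appear or disappear needs additional machinery that this sketch does not attempt.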
doi:10.1007/978-1-4899-7687-1_41 fatcat:blemwt2r4bhqhch4mk3rfls5ve