Resource-aware kernel density estimators over streaming data

Christoph Heinz, Bernhard Seeger
2006 Proceedings of the 15th ACM international conference on Information and knowledge management - CIKM '06  
A variety of real-world applications rely heavily on the analysis of transient data streams. Due to the rigid processing requirements of data streams, common analysis techniques known from data mining are not applicable. A fundamental building block of many data mining and analysis approaches is density estimation. It provides a well-defined estimate of a continuous data distribution, which makes its adaptation to data streams desirable. A convenient method for density estimation utilizes kernels. However, its computational complexity conflicts with the rigid processing requirements of data streams. In this work, we present a new approach to this problem that combines linear processing cost with a constant amount of allocated memory. We even support dynamic adaptation of the allocated memory to changing system resources. Our kernel density estimators over streaming data are related to M-Kernels, a previously proposed technique, but substantially improve on them in terms of accuracy as well as processing time. The results of an experimental study with synthetic as well as real-world data streams substantiate the efficiency of our approach and its superiority to M-Kernels with respect to estimation quality and processing time. With respect to these requirements, we specifically aim to provide kernel density estimators over data streams.
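To make the idea of a kernel density estimator with constant memory and linear processing cost more concrete, the following is a minimal sketch in Python. It is not the algorithm of the paper and not M-Kernels; it only illustrates the general principle under stated assumptions: one-dimensional data, Gaussian kernels, a fixed user-chosen bandwidth, and a merge rule that combines the two neighbouring kernels with the closest means by moment matching. The class name BoundedStreamKDE and all of its parameters are hypothetical.

```python
import bisect
import math


class BoundedStreamKDE:
    """Illustrative fixed-memory Gaussian KDE over a 1-D stream.

    Assumption: this is a generic sketch, not the method of the paper.
    """

    def __init__(self, max_kernels=32, bandwidth=0.5):
        if max_kernels < 1 or bandwidth <= 0:
            raise ValueError("max_kernels must be >= 1 and bandwidth > 0")
        self.max_kernels = max_kernels
        self.bandwidth = bandwidth
        self.kernels = []  # (mean, variance, weight) tuples, ordered by mean
        self.count = 0     # number of stream elements seen so far

    def update(self, x):
        """Consume one stream element in O(max_kernels) time."""
        bisect.insort(self.kernels, (float(x), 0.0, 1.0))
        self.count += 1
        if len(self.kernels) > self.max_kernels:
            self._merge_closest_pair()

    def _merge_closest_pair(self):
        # Find the neighbouring pair whose means are closest together.
        i = min(
            range(len(self.kernels) - 1),
            key=lambda j: self.kernels[j + 1][0] - self.kernels[j][0],
        )
        (m1, v1, w1), (m2, v2, w2) = self.kernels[i], self.kernels[i + 1]
        # Moment matching: keep total weight, mean, and variance of the pair.
        w = w1 + w2
        m = (w1 * m1 + w2 * m2) / w
        v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
        self.kernels[i:i + 2] = [(m, v, w)]

    def density(self, x):
        """Evaluate the current density estimate at x."""
        if self.count == 0:
            return 0.0
        total = 0.0
        for m, v, w in self.kernels:
            s2 = v + self.bandwidth ** 2  # merged spread plus kernel bandwidth
            total += w * math.exp(-0.5 * (x - m) ** 2 / s2) / math.sqrt(2 * math.pi * s2)
        return total / self.count
```

Each call to update touches at most max_kernels entries, so the total cost grows linearly with the stream length while the memory footprint stays bounded. Lowering max_kernels at run time and merging until the new budget is met would be one simple way to react to shrinking resources in this sketch.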
doi:10.1145/1183614.1183772 dblp:conf/cikm/HeinzS06 fatcat:kose47v4jfbdtghincdne7obka