Active Mining of Data Streams [chapter]

Wei Fan, Yi-an Huang, Haixun Wang, Philip S. Yu
2004 Proceedings of the 2004 SIAM International Conference on Data Mining  
Most previously proposed methods for mining data streams make the unrealistic assumption that a "labelled" data stream is readily available and can be mined at any time. However, in most real-world problems, labelled data streams are rarely available immediately. For this reason, models are refreshed periodically, usually in step with the data availability schedule. There are several undesirable consequences of this "passive periodic refresh". In this paper, we propose a new concept of ... demand-driven active data mining. It estimates the error of the model on the new data stream without knowing the true class labels. When a significantly higher error is suspected, it investigates the true class labels of a selected number of examples in the most recent data stream to verify the suspected higher error.
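The two-step idea in the abstract (a label-free error estimate, then verification on a small labelled sample) can be illustrated with a minimal Python sketch. This record does not give the statistics the paper actually uses, so the sketch swaps in simple stand-ins: the mean of one minus the predicted-class confidence as the label-free proxy, and a normal-approximation bound on the sampled error rate for verification. All function names, the `margin` and `sample_size` parameters, and the `ask_label` callback are hypothetical.

```python
import math
import random


def suspected_error(confidences):
    """Label-free proxy (an assumption, not the paper's statistic):
    mean of 1 - confidence in the predicted class."""
    return sum(1.0 - c for c in confidences) / len(confidences)


def verify_by_sampling(predictions, ask_label, sample_size, rng):
    """Request true labels for a random sample of examples.

    Returns the sampled error rate and its standard error
    (normal approximation to the binomial)."""
    indices = rng.sample(range(len(predictions)), sample_size)
    wrong = sum(1 for i in indices if ask_label(i) != predictions[i])
    rate = wrong / sample_size
    std_err = math.sqrt(rate * (1.0 - rate) / sample_size)
    return rate, std_err


def needs_refresh(confidences, predictions, ask_label, baseline_error,
                  margin=0.05, sample_size=50, seed=0):
    """Return True only if a suspected error increase is confirmed by labels."""
    # Step 1: no labels are requested unless the proxy looks suspicious.
    if suspected_error(confidences) <= baseline_error + margin:
        return False
    # Step 2: spend a small labelling budget to verify the suspicion.
    size = min(sample_size, len(predictions))
    rate, std_err = verify_by_sampling(
        predictions, ask_label, size, random.Random(seed))
    # Confirm only if the lower bound of the sampled error exceeds the baseline.
    return rate - 2.0 * std_err > baseline_error
```

In this sketch, labels are requested only on demand: `ask_label` is never called while the proxy stays within `margin` of the baseline error, which is the contrast with a passive periodic refresh.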
doi:10.1137/1.9781611972740.46 dblp:conf/sdm/FanHWY04 fatcat:rfe3kgzdubgtlf6xeabi6h3u2u