Incremental Data Mining Using Concurrent Online Refresh of Materialized Data Mining Views [chapter]

Mikołaj Morzy, Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz
2005 Lecture Notes in Computer Science  
Data mining is an iterative process. Users issue series of similar data mining queries, in each consecutive run slightly modifying either the definition of the mined dataset, or the parameters of the mining algorithm. This model of processing is most suitable for incremental mining algorithms that reuse the results of previous queries when answering a given query. Incremental mining algorithms require the results of previous queries to be available. One way to preserve those results is to use
more » ... terialized data mining views. Materialized data mining views store the mined patterns and refresh them as the underlying data change. Data mining and knowledge discovery often take place in a data warehouse environment. There can be many relatively small materialized data mining views defined over the data warehouse. Separate refresh of each materialized view can be expensive, if the refresh process has to re-discover patterns in the original database. In this paper we present a novel approach to materialized data mining view refresh process. We show that the concurrent on-line refresh of a set of materialized data mining views is more efficient than the sequential refresh of individual views. We present the framework for the integration of data warehouse refresh process with the maintenance of materialized data mining views. Finally, we prove the feasibility of our approach by conducting several experiments on synthetic data sets. The patterns discovered as the result of the execution of a data mining algorithm can be regarded as an answer to a sophisticated database query. A user defines the set of mined data using standard SQL commands and determines the parameters governing a given data mining algorithm. In response, relevant patterns are returned to the user for evaluation. Users usually do not achieve satisfying results immediately. It is an iterative process, where in each consecutive step the user evaluates the patterns and, suitably to the needs, expectations, and experience, modifies either the mined dataset, or algorithm parameters, or both. Because of this iterative and repetitive nature of mining processing, a data mining system must efficiently exploit the results of previous queries when fulfilling user requests. Data warehouses facing similar requirements in on-line analytical processing are materializing the results of queries as snapshots and rewrite incoming queries to use the materialized data. The same principle applies to data mining systems, where previously discovered patterns are stored in materialized data mining views and used to efficiently answer user queries.
doi:10.1007/11546849_29 fatcat:i72raoykcjac5o2yibc2k4oy4i