A survey of Knowledge Discovery and Data Mining process models
LUKASZ A. KURGAN, PETR MUSILEK
2006
Knowledge engineering review (Print)
Knowledge Discovery and Data Mining is a very dynamic research and development area that is reaching maturity. As such, it requires stable and well-defined foundations, which are well understood and popularized throughout the community. This survey presents a historical overview, description and future directions concerning a standard for a Knowledge Discovery and Data Mining process model. It presents a motivation for use and a comprehensive comparison of several leading process models, and discusses their applications to both academic and industrial problems. The main goal of this review is the consolidation of the research in this area. The survey also proposes to enhance existing models by embedding other current standards to enable automation and interoperability of the entire process.
Introduction
'... Knowledge Discovery is the most desirable end-product of computing. Finding new phenomena or enhancing our knowledge about them has a greater long-range value than optimizing production processes or inventories, and is second only to tasks that preserve our world and our environment. It is not surprising that it is also one of the most difficult computing challenges to do well ...' Gio Wiederhold (1996).
Current technological progress permits the storage and access of large amounts of data at virtually no cost. Although it has been said many times, the main problem in the current information-centric world remains how to put the collected raw data to proper use. The true value is not in storing the data, but rather in our ability to extract useful reports and to find interesting trends and correlations, through the use of statistical analysis and inference, to support decisions and policies made by scientists and businesses (Fayyad et al., 1996c). Before any attempt can be made to extract this useful knowledge, an overall approach that describes how to extract knowledge needs to be established. Therefore, the focus of this paper is not on describing the methods that can be used to extract knowledge from data, but rather on discussing the methodology that supports the process that leads to finding this knowledge.
The main reason for establishing and using process models is to organize Knowledge Discovery and Data Mining (KDDM) projects within a common framework. The models help organizations to understand the Knowledge Discovery process and provide a road map to follow while planning and carrying out the projects.
This in turn results in time and cost savings, and in a better understanding and acceptance of such projects. The first step is to understand that such processes are not trivial, but rather involve multiple steps, reviews and iterations. To date, there have been several attempts to develop such models, with varying degrees of success. This paper summarizes the state of the art in this subject area and discusses future research directions.
The main motivation for this paper is the lack of a comprehensive overview and comparison of KDDM models. Although several models have been developed that have received broad attention from both the research and industrial communities, they have usually been discussed separately, making their comparison and the selection of the most suitable model a daunting task.
This survey is organized as follows. First, basic definitions concerning the Knowledge Discovery domain, motivation for the existence of process models, and a historical overview are provided in Section 2. Next, in Section 3, several leading models are reviewed and discussed. A formal comparison of the models and their applications in both research and industrial contexts are presented in Section 4. Finally, future trends in this area are discussed and conclusions are provided in Sections 5 and 6, respectively.
doi:10.1017/s0269888906000737
fatcat:pzqrmrntp5g23i4avzqkgculry