GPR: A Data Mining Tool Using Genetic Programming
Communications of the Association for Information Systems
This paper proposes an inductive data mining technique (named GPR) based on genetic programming. Unlike other mining systems, our technique can discover business rules that satisfy multiple (and possibly conflicting) decision or search criteria simultaneously. We present a step-by-step method to implement GPR, and introduce a prototype that generates production rules from real-life data. We also report on the use of GPR in an organization that seeks to understand how its employees make decisions in a "voluntary separation" program. Using a personnel database of 12,787 employees with 35 descriptive variables, our technique is able to discover employees' hidden decision-making patterns in the form of production rules. As our approach does not require any domain-specific knowledge, it can be used without any major modification in different domains.
Most large organizations possess tremendous amounts of data stored in databases, including financial information, personnel records, manufacturing data, inventory information, and customer information. These data are accessed to produce reports, statistics, and business queries. Corporate managers who find themselves in possession of large and rapidly growing databases are beginning to suspect that, despite the large amount of available output, the information in their databases is not used to its fullest potential. Given the limits of human cognition, they are unlikely to discover any but the most obvious and uninteresting patterns in the massive data.
Mechanisms to find underlying patterns of behavior hidden in databases in critical business areas, such as market intelligence, manufacturing process control, purchasing, and inventory management, can provide an invaluable competitive advantage to the organization that uses them [Chung and Gray, 1999; Dhar, 1998]. The use of automated systems to find new knowledge is necessary and worthwhile because it is neither feasible nor cost-effective to examine, analyze, and interpret the typically large corporate database manually in this pursuit [Smyth and Goodman, 1992].
In this paper, we present a novel approach to data mining that uses the principles of genetic programming to generate production rules from databases.
Our approach is unique in that it easily accommodates knowledge discovery satisfying any user-specified criteria and is generic enough to apply to a large number of data mining applications. We present GPR, an inductive data-mining system we developed that uses genetic programming to discover rules. In the following section, we briefly define terminology and concepts related to knowledge discovery and explain the reasons for our focus on discovering production rules. In Section III, we discuss the application of genetic programming to data mining. Section IV provides a detailed description of GPR, our prototype data mining tool. In Section V, we illustrate the use of genetic programming for data mining with a detailed case study of a real-life application in military manpower management; in that section we also present and discuss the significance of the results from GPR. The last two sections discuss related work and present conclusions.
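To make concrete the idea of a production rule judged against several, possibly conflicting, criteria, the following minimal sketch may help. It is an illustration under stated assumptions, not GPR's implementation: the attribute names (`years_of_service`, `age`, `accepted`), the four toy records, the two criteria (support and confidence), and the weighted-sum `fitness` function are all hypothetical and are not taken from the case study data or from the system described in Section IV.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
# A condition is (attribute, comparison, threshold); all names are illustrative.
Condition = Tuple[str, Callable[[float, float], bool], float]


@dataclass
class Rule:
    """IF every condition holds THEN the outcome attribute is expected to be 1."""
    conditions: List[Condition]
    outcome: str

    def matches(self, record: Record) -> bool:
        return all(op(record[attr], value) for attr, op, value in self.conditions)


def support(rule: Rule, records: List[Record]) -> float:
    """Fraction of all records that satisfy the rule's conditions."""
    if not records:
        return 0.0
    return sum(rule.matches(r) for r in records) / len(records)


def confidence(rule: Rule, records: List[Record]) -> float:
    """Among records satisfying the conditions, fraction with outcome == 1."""
    matched = [r for r in records if rule.matches(r)]
    if not matched:
        return 0.0
    return sum(r[rule.outcome] == 1 for r in matched) / len(matched)


def fitness(rule: Rule, records: List[Record],
            w_support: float = 0.5, w_confidence: float = 0.5) -> float:
    """Assumed scoring scheme: a weighted sum of the two criteria."""
    return w_support * support(rule, records) + w_confidence * confidence(rule, records)


# Hypothetical toy data, invented for this sketch only.
records = [
    {"years_of_service": 22, "age": 45, "accepted": 1},
    {"years_of_service": 25, "age": 50, "accepted": 1},
    {"years_of_service": 5, "age": 28, "accepted": 0},
    {"years_of_service": 21, "age": 44, "accepted": 0},
]

broad_rule = Rule([("years_of_service", operator.ge, 20)], "accepted")
narrow_rule = Rule([("years_of_service", operator.ge, 20),
                    ("age", operator.ge, 45)], "accepted")

for name, rule in [("broad", broad_rule), ("narrow", narrow_rule)]:
    print(name, support(rule, records), confidence(rule, records),
          fitness(rule, records))
```

On this toy data the narrower rule is more accurate on the records it covers but covers fewer of them, which is the kind of tension between criteria that a multi-criteria search has to balance. A weighted sum is only one simple way to combine them.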