Automating Knowledge Discovery Workflow Composition Through Ontology-Based Planning

Monika Zakova, Petr Kremen, Filip Zelezny, Nada Lavrac
2011 IEEE Transactions on Automation Science and Engineering  
The problem addressed in this paper is the automated construction of knowledge discovery workflows, given the types of inputs and the required outputs of the knowledge discovery process. Our methodology consists of two main ingredients. The first is a formal conceptualization of knowledge types and data mining algorithms by means of a knowledge discovery ontology. The second is workflow composition formalized as a planning task using the ontology of domain and task descriptions. Two versions of a forward chaining planning algorithm were developed. The baseline version demonstrates the suitability of the knowledge discovery ontology for planning and uses Planning Domain Definition Language (PDDL) descriptions of algorithms; to this end, a procedure for converting data mining algorithm descriptions into PDDL was developed. The second version directly queries the ontology using a reasoner. The proposed approach was tested in two use cases, one from scientific discovery in genomics and another from advanced engineering. The results show the feasibility of automated workflow construction achieved by tight integration of planning and ontological reasoning.

Note to Practitioners: The use of advanced knowledge engineering techniques is becoming popular not only in bioinformatics, but also in engineering. One of the main challenges is therefore to efficiently extract relevant information from large amounts of data from different sources. For example, in product engineering, the focus of project SEVENPRO, efficient reuse of knowledge can be significantly enhanced by discovering implicit knowledge in past designs, which are described by product structures, CAD designs and technical specifications. Fusion of relevant data requires the interplay of diverse specialized algorithms. Therefore, traditional data mining techniques are not straightforwardly applicable. Rather, complex knowledge discovery workflows are required. Knowledge about the algorithms and principles of their applicability cannot be expected from the end user, e.g., a product engineer. A formal capture of this knowledge is thus needed, to serve as a basis for intelligent computational support of workflow composition. Therefore we developed a knowledge discovery (KD) ontology describing knowledge types and algorithms required for complex knowledge discovery tasks.
A planning algorithm was implemented and employed to assemble workflows for the task specified by the user's input-output task requirements. Two versions of the planning algorithm were developed. The first one uses standard PDDL descriptions of algorithms, accessible to third party planning algorithms. A procedure for converting algorithm descriptions into PDDL was developed. The second directly queries the ontology using a reasoner. The proposed approach was tested in two use cases, one from genomics and another from product engineering. The results show the feasibility of automated workflow construction achieved by tight integration of planning and ontological reasoning. The generated workflows can be executed on the SEVENPRO platform; however, since they are annotated using the KD ontology, the planner can be integrated into other workflow execution environments.

Index Terms: Data mining, knowledge management.
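
The idea of composing a workflow by forward chaining from the available input types to the required output types can be illustrated with a small sketch. This is not the planner described in the paper: it uses no ontology, reasoner, or PDDL, and it assumes a simple breadth-first search in which each algorithm is described only by the sets of knowledge types it consumes and produces. The algorithm names and knowledge types (`DiscretizeAttributes`, `AssociationRules`, and so on) are hypothetical and chosen purely for illustration.

```python
from collections import deque
from typing import NamedTuple, Optional


class Algorithm(NamedTuple):
    """Hypothetical algorithm description: name plus required and produced types."""
    name: str
    inputs: frozenset
    outputs: frozenset


def plan_workflow(available, goal, algorithms) -> Optional[list]:
    """Breadth-first forward chaining over sets of available knowledge types.

    Returns a shortest list of algorithm names that makes every goal type
    available, or None if no such sequence exists.
    """
    start = frozenset(available)
    goal = frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:
            return steps
        for alg in algorithms:
            # An algorithm is applicable when all its inputs are available
            # and it would add at least one new knowledge type.
            if alg.inputs <= state and not alg.outputs <= state:
                successor = state | alg.outputs
                if successor not in seen:
                    seen.add(successor)
                    queue.append((successor, steps + [alg.name]))
    return None


# Illustrative (assumed) algorithm descriptions.
ALGORITHMS = [
    Algorithm("DiscretizeAttributes",
              frozenset({"Dataset"}), frozenset({"DiscretizedDataset"})),
    Algorithm("AprioriRules",
              frozenset({"DiscretizedDataset"}), frozenset({"AssociationRules"})),
    Algorithm("DecisionTreeLearner",
              frozenset({"Dataset"}), frozenset({"DecisionTree"})),
]

if __name__ == "__main__":
    print(plan_workflow({"Dataset"}, {"AssociationRules"}, ALGORITHMS))
```

In this toy setting the user states only what is available (`Dataset`) and what is wanted (`AssociationRules`), and the search assembles the intermediate steps. A planner that queries an ontology would additionally have to decide applicability by subsumption between knowledge types rather than by exact set membership, which this sketch does not attempt.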
doi:10.1109/tase.2010.2070838 fatcat:rxfbnxy4d5aurgknuslsjj32ja