Automated Knowledge Discovery from Simulators [chapter]

M.C. Burl, D. DeCoste, B.L. Enke, D. Mazzoni, W.J. Merline, L. Scharenbroich
2006 Proceedings of the 2006 SIAM International Conference on Data Mining  
In this paper, we explore one aspect of knowledge discovery from simulators, the landscape characterization problem, where the aim is to identify regions in the input/parameter/model space that lead to a particular output behavior. Large-scale numerical simulators are in widespread use by scientists and engineers across a range of government agencies, academia, and industry; in many cases, simulators provide the only means to examine processes that are infeasible or impossible to study [...]. However, the cost of simulation studies can be quite high, both in terms of the time and computational resources required to conduct the trials and the manpower needed to sift through the resulting output. Thus, there is strong motivation to develop automated methods that enable more efficient knowledge extraction. Unlike traditional data mining, knowledge discovery from simulators is not limited to a static, pre-determined dataset; instead, the simulator itself can be used as an oracle to generate new data of our own choosing. We exploit this opportunity by employing active learning and support vector machines (SVMs) to choose the most valuable simulation trials to run next. On two real-world scientific simulators, one for asteroid collisions and one for magnetospheric modeling, we demonstrate twofold and sixfold reductions, respectively, in the number of simulator trials required to achieve a particular level of fidelity in landscape characterization as compared with a standard grid-based sampling approach.
doi:10.1137/1.9781611972764.8 dblp:conf/sdm/BurlDEMMS06 fatcat:pezj2xl67zhkxdugluiuc6muty
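
The abstract's central idea, using the simulator as an oracle and letting an SVM decide which trial to run next, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy `simulator` (a disc-shaped region standing in for an expensive simulation), the quadratic feature map, the Pegasos-style linear SVM trainer, and the selection rule (query the candidate closest to the current decision boundary, a common margin-based heuristic that may differ from the authors' criterion).

```python
import random


def simulator(x):
    """Hypothetical stand-in for an expensive simulator: +1 inside a disc, else -1."""
    return 1 if x[0] ** 2 + x[1] ** 2 < 0.5 else -1


def features(x):
    # Explicit quadratic feature map so a linear SVM can represent a curved boundary.
    return [x[0], x[1], x[0] ** 2, x[1] ** 2, x[0] * x[1], 1.0]


def decision(w, x):
    """Signed score of point x under weight vector w (sign gives the predicted class)."""
    return sum(wi * fi for wi, fi in zip(w, features(x)))


def train_linear_svm(points, labels, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularised hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * 6
    t = 0
    for _ in range(epochs):
        order = list(range(len(points)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = labels[i] * decision(w, points[i])
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink (regulariser)
            if margin < 1.0:  # hinge loss is active for this sample
                f = features(points[i])
                w = [wi + eta * labels[i] * fi for wi, fi in zip(w, f)]
    return w


def active_learning(pool, n_init=6, n_queries=20, seed=0):
    """Label n_init random points, then repeatedly query the most ambiguous candidate."""
    rng = random.Random(seed)
    unlabeled = list(pool)
    rng.shuffle(unlabeled)
    labeled = [unlabeled.pop() for _ in range(n_init)]
    labels = [simulator(p) for p in labeled]
    w = train_linear_svm(labeled, labels)
    for _ in range(n_queries):
        # Assumed selection rule: the candidate closest to the decision boundary.
        best = min(unlabeled, key=lambda p: abs(decision(w, p)))
        unlabeled.remove(best)
        labeled.append(best)
        labels.append(simulator(best))  # one "simulator trial"
        w = train_linear_svm(labeled, labels)
    return w, labeled, labels


if __name__ == "__main__":
    # Candidate trials: a 21 x 21 grid over [-1, 1] x [-1, 1].
    grid = [(-1 + 0.1 * i, -1 + 0.1 * j) for i in range(21) for j in range(21)]
    w, labeled, labels = active_learning(grid)
    correct = sum(1 for p in grid if (1 if decision(w, p) > 0 else -1) == simulator(p))
    print(f"trials run: {len(labeled)}, grid accuracy: {correct / len(grid):.2f}")
```

In this sketch each loop iteration costs one simulator call, so the number of labeled points is the quantity one would compare against a grid-based sampling baseline of the kind the abstract mentions.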