WOWMON: A Machine Learning-based Profiler for Self-adaptive Instrumentation of Scientific Workflows

Xuechen Zhang, Hasan Abbasi, Kevin Huck, Allen D. Malony
2016 Procedia Computer Science  
Performance debugging of scientific workflows using program profiling and tracing can be extremely difficult for two reasons. 1) Existing performance tools cannot automatically produce global performance data from the local information of the coupled scientific applications in a workflow, particularly at runtime. 2) Profiling and tracing with static instrumentation may incur high overhead and significantly slow down science-critical tasks. To gain more insight into workflows, we introduce a lightweight workflow monitoring infrastructure, WOWMON (WOrkfloW MONitor), which gives users access not only to cross-application performance data, such as end-to-end latency and the execution time of individual workflow components at runtime, but also to customized performance events. To reduce profiling overhead, WOWMON uses machine learning algorithms to adaptively select performance metrics, guiding profilers to collect only the metrics that have the most impact on workflow performance. Through the study of real scientific workflows (e.g., LAMMPS) with the help of WOWMON, we found that workflow performance can be significantly affected by both software and hardware factors, such as the process-mapping policy and the in-situ buffer size. Moreover, we show experimentally that WOWMON can reduce data movement for profiling by up to 54% without missing the key metrics for performance debugging.
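The idea of adaptive metric selection can be illustrated with a minimal sketch. The abstract does not say which machine learning algorithm WOWMON uses, so the example below swaps in a plain Pearson-correlation ranking as a stand-in. The function names, metric names, and sample values are all hypothetical and chosen only for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_metrics(samples, latency, keep):
    """Rank metrics by |correlation| with end-to-end latency and keep the top ones.

    Assumption: correlation ranking is an illustrative stand-in, not the
    selection algorithm described in the paper.
    """
    scores = {name: abs(pearson(values, latency)) for name, values in samples.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]


# Hypothetical per-run measurements for three candidate metrics.
samples = {
    "buffer_wait_s": [0.1, 0.4, 0.9, 1.6],
    "cache_misses": [5.0, 5.0, 5.0, 5.0],
    "msgs_sent": [10.0, 12.0, 9.0, 11.0],
}
latency = [1.0, 2.0, 3.0, 4.0]  # hypothetical end-to-end latency per run

selected = select_metrics(samples, latency, keep=1)
print(selected)
```

In this toy data only the buffer wait time tracks latency, so a profiler following the ranking would stop collecting the other two metrics and move less profiling data.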
doi:10.1016/j.procs.2016.05.474 fatcat:jr7uyavgcfelncbfwnkz5n66ma