Keeping track of user steering actions in dynamic workflows

Renan Souza, Vítor Silva, Jose J. Camata, Alvaro L.G.A. Coutinho, Patrick Valduriez, Marta Mattoso
2019, Future Generation Computer Systems
In long-lasting scientific workflow executions on HPC machines, computational scientists (the users in this work) often need to fine-tune several workflow parameters. These tunings are done through user steering actions that may significantly improve performance (e.g., reduce execution time) or improve the overall results. However, in executions that last for weeks, users can lose track of what has been adapted if the tunings are not properly registered. In this work, we build on provenance data management to address the problem of tracking online parameter fine-tuning in dynamic workflows steered by users. We propose a lightweight solution to capture and manage provenance of the steering actions online with negligible overhead. The resulting provenance database relates tuning data with domain, dataflow provenance, execution, and performance data, and is available for analysis at runtime. We show how users may get a detailed view of the execution, providing insights to determine when and how to tune. We discuss the applicability of our solution in different domains and validate its ability to allow for online capture and analysis of parameter fine-tunings in a real workflow in the Oil and Gas industry. In this experiment, the user could determine which tuned parameters influenced simulation accuracy and performance. The observed overhead for keeping track of user steering actions at runtime is less than 1% of total execution time.

* Vítor Silva is currently at Dell EMC.

Typical steering actions include defining steering points, checkpointing and rolling back, refining loop conditions, reducing datasets, modifying filter conditions, and parameter tuning [3]. Parameter tuning is by far the one most supported by computational steering solutions [1, 2, 4–10]. Due to the large number of parameters and combinations of values, uncontrolled parameter fine-tunings may lead to rework and difficulties in overall data analysis. In iterative workflows, where several parameters drive each iteration, analyzing results from initial executions may suggest better parameter settings for the following ones. For example, training deep neural networks on large datasets is complex, time-consuming, and demands parallel computation and user steering.
Often, machine learning experts fine-tune the training hyperparameters (e.g., learning rate, batch size, number of training iterations) based on the evolution of the model's performance and the training time [11]. In Astronomy applications, users may set up data and input parameters to assemble custom mosaics of the sky. During the execution, data analyses may reveal that certain input parameters produced images with poor resolution or quality, making it harder to identify an interesting celestial object. Such parameters can be modified at runtime. In Computational Fluid Dynamics applications, users tune several parameters of the underlying numerical methods [12]. As a result, fine-tunings can yield major improvements in performance, resource consumption, and quality of results [13]. Despite current initiatives to support computational steering in large-scale scientific computing, such as those surveyed in [1, 2], it remains an open problem [13, 14]. Computational steering solutions [1, 2, 4–10] allow for steering actions. Capturing and registering user steering data (e.g., why the user decided to tune, what the values were before and after the tuning, who tuned and when), relating them to other relevant data (e.g., domain-specific strategic values, the execution state of the simulation when the tuning happened, performance data), and allowing all these data to be efficiently integrated and queried at runtime delivers important advantages to the user: it supports online data analysis and data-driven decisions. Conversely, failing to capture steering data has several disadvantages. It may compromise experiment reproducibility and the reliability of results, as users hardly remember what dataflow elements were modified and how (especially modifications in early stages), or what happened to the execution because of a specific adaptation. This is even more critical when users adapt several times in long experiments, which may last for weeks.
In addition to losing track of changes, one misses opportunities to learn from the adaptation data (i.e., the data generated when humans adapt a certain dataflow element) together with the associated dataflow. For example, by registering adaptation data, one may query the data and discover that when parameters are changed to a certain range of values, the output result improves by a defined amount. Moreover, opportunities to use the data for AI-based assistants that recommend what to adapt next, based on a database of adaptations, are lost. In this work, we build on provenance data management to address the problem of keeping track of online parameter fine-tuning in dynamic workflows steered by users. In two recent surveys [13, 14], the authors report that solutions for online provenance management and human-in-the-loop support in workflows are lacking. To capture and manage provenance of the steering actions online, we consider three steps and their challenges.

Challenge 1: Online data analysis. Online data analysis is essential for monitoring, debugging, and user steering. In workflows with several parameters to be set up, the user needs to inspect the evolution of results, correlate them with specific input parameter values, and determine which input value is influencing specific outputs [2]. Otherwise, the user will hardly know what should be tuned or when. According to a recent report [13], current online data analysis solutions are not aware of parameter combinations and their relations with output values.

Challenge 2: Register the steering action. Several systems support user steering and parameter fine-tuning [1, 2, 4–10], but none of them tracks the steering actions. Not tracking the steering actions jeopardizes experiment reproducibility. In [13], the authors also state that it remains a challenge to develop a sufficiently descriptive and detailed provenance model for steering that enables processing, optimization, validation, interpretation, and reproducibility.
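A steering-action record of the kind described here (who tuned what, when, and the values before and after) could be sketched as below. The `SteeringAction` class and its field names are illustrative assumptions for this sketch, not DfAdapter's actual provenance model.

```python
import json
import time
from dataclasses import dataclass, field, asdict

@dataclass
class SteeringAction:
    """One user steering action: who tuned what, when, and the old/new values."""
    user: str        # who performed the tuning
    parameter: str   # which dataflow parameter was adapted
    old_value: float # value before the tuning
    new_value: float # value after the tuning
    iteration: int   # workflow iteration at which the tuning occurred
    reason: str = "" # optional free-text justification
    timestamp: float = field(default_factory=time.time)  # when it happened

# Capture a tuning and serialize it for storage in a provenance database.
action = SteeringAction(user="alice", parameter="learning_rate",
                        old_value=0.01, new_value=0.001,
                        iteration=42, reason="loss plateaued")
record = json.dumps(asdict(action))
```

Storing the record alongside the iteration number is what later allows a tuning to be joined with the execution and performance data captured at that same iteration.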
Challenge 3: Evaluate the steering action. Enabling online data analyses that are aware of human adaptations supports data-driven decisions, retrieval of recorded human actions, and understanding of how they relate to the workflow execution status (e.g., how a user action impacts processing time). To evaluate the adaptations, the user needs online query support to find out who adapted what and when, and how the steering action relates to other data. In previous works [15, 16], we showed how applications can benefit from online analysis for steering, addressing Challenge 1, but we are not aware of other works that have addressed the latter two challenges. In [17], we presented an abstract with preliminary ideas on the potential of registering steering actions. In this paper, we formalize steering actions and propose DfAdapter, a lightweight solution to capture and analyze online steering actions in workflows. To evidence the benefits of keeping track of online parameter fine-tuning, we explore a motivating real case study in the Oil and Gas industry. There are over 50 configuration parameters, and their values have a direct impact on the simulation. With the aid of online data analysis, the user can understand which parameters need to be tuned and make the adjustments, often several times. For example, the user may identify online the regions of interest, which should receive more iterations and higher resolution, and the regions that can be processed at a lower resolution. This requires adapting several times rather than choosing one single best configuration for the whole workflow execution. There are several advantages in using DfAdapter on an HPC machine to control the fine-tuning of the workflow online. First, users can evaluate which specific parameters, and which ranges of values modified at runtime, led to a reduction in memory consumption.
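The kind of runtime evaluation query discussed above, relating each tuning to the execution state at the iteration where it happened, could look like the following in a relational provenance database. The table and column names here are hypothetical stand-ins, not the schema used by DfAdapter.

```python
import sqlite3

# In-memory stand-in for a runtime provenance database (hypothetical schema).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE steering_action(id INTEGER PRIMARY KEY, user TEXT, parameter TEXT,
                             old_value REAL, new_value REAL, iteration INTEGER);
CREATE TABLE execution(iteration INTEGER, elapsed_s REAL, memory_mb REAL);
""")
con.executemany("INSERT INTO steering_action VALUES (?,?,?,?,?,?)",
                [(1, "alice", "mesh_resolution", 0.5, 0.25, 10),
                 (2, "alice", "time_step", 0.01, 0.02, 20)])
con.executemany("INSERT INTO execution VALUES (?,?,?)",
                [(10, 120.0, 900.0), (20, 95.0, 700.0)])

# Relate each tuning (who, what, old/new values) to the execution and
# performance data captured at the iteration where the tuning occurred.
rows = con.execute("""
    SELECT s.user, s.parameter, s.old_value, s.new_value,
           e.elapsed_s, e.memory_mb
    FROM steering_action s
    JOIN execution e ON e.iteration = s.iteration
    ORDER BY s.iteration
""").fetchall()
```

With such a join, the user can see at a glance which tunings coincided with drops in execution time or memory consumption, which is exactly the kind of data-driven evaluation Challenge 3 calls for.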
Second, DfAdapter provides the user with more data, and with ways to query these data, allowing for better data-driven decisions. More specifically, using the data captured by DfAdapter, users can verify which parameters were modified, at which iteration in the loop, and when their steering actions caused the simulation execution time to be reduced by a certain amount, leading their simulation to finish faster with results they found satisfactory. Finally, we observe that the overhead added by DfAdapter for provenance and steering-action tracking accounts for less than 1% of the total execution time.

Paper organization. Section 2 presents our motivating case study. Section 3 presents related work. Section 4 presents our approach for tracking online steering in dataflows. Section 5 presents DfAdapter. In Section 6, we discuss our approach applied to two real-world scientific workflows in the Astronomy and Oil and Gas domains. Section 7 shows the experiments. Section 8 concludes.
doi:10.1016/j.future.2019.05.011