Ground-Truth Prediction to Accelerate Soft-Error Impact Analysis for Iterative Methods

Burcu O. Mutlu, Gokcen Kestor, Adrian Cristal, Osman Unsal, Sriram Krishnamoorthy
2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC)
Understanding the impact of soft errors on applications can be expensive. Often, it requires an extensive error injection campaign involving numerous runs of the full application in the presence of errors. In this paper, we present a novel approach to arrive at the ground truth (the true impact of an error on the final output) for iterative methods by observing a small number of iterations to learn deviations between normal and error-impacted execution. We develop a machine-learning-based ... r for three iterative methods to generate ground-truth results without running them to completion for every error injected. We demonstrate that this approach achieves greater accuracy than alternative prediction strategies, including three existing soft error detection strategies. We also demonstrate that the ground-truth prediction model is effective for evaluating vulnerability and for assessing soft error detection strategies in the context of iterative methods.
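
The idea of observing a few iterations after an injected error can be illustrated with a minimal sketch. It is not the authors' implementation: the Jacobi solver, the single mantissa bit flip, the window length, and all names (`flip_bit`, `jacobi`, `max_diff`, `window`) are assumptions chosen for illustration, and no model is trained here. The sketch only shows how a short window of deviations between a clean and an error-impacted run could serve as features, with the deviation in the final output as the ground-truth label.

```python
import struct

def flip_bit(x, bit):
    """Flip one bit (0..63) in the IEEE 754 binary64 encoding of x."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return y

def jacobi(A, b, iters, inject=None):
    """Run Jacobi iterations from x = 0 and return every iterate.

    inject is None or a tuple (iteration, index, bit): after that
    iteration, the given bit of x[index] is flipped (hypothetical
    error model, assumed for this sketch).
    """
    n = len(b)
    x = [0.0] * n
    trace = []
    for k in range(iters):
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        if inject is not None and inject[0] == k:
            x[inject[1]] = flip_bit(x[inject[1]], inject[2])
        trace.append(x)
    return trace

def max_diff(u, v):
    """Largest absolute componentwise difference between two vectors."""
    return max(abs(a - c) for a, c in zip(u, v))

# Small diagonally dominant system, so Jacobi converges.
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [1.0, 2.0, 3.0]

inject_iter, window = 5, 4
clean = jacobi(A, b, 50)
faulty = jacobi(A, b, 50, inject=(inject_iter, 1, 50))  # mantissa bit 50

# Features: deviations over a short window right after the injection.
features = [
    max_diff(clean[k], faulty[k])
    for k in range(inject_iter, inject_iter + window)
]
# Label: deviation in the final output, obtained here by a full run.
label = max_diff(clean[-1], faulty[-1])
```

A predictor trained on many such `(features, label)` pairs could then estimate the label from the window alone, which is what would make running each injected execution to completion unnecessary.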
doi:10.1109/hipc.2019.00048 dblp:conf/hipc/MutluKCUK19 fatcat:wyiuvdpi5jgcvhaj4gcn5jtacq