Learning universal probabilistic models for fault localization
Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering - PASTE '10
Recently there has been significant interest in employing probabilistic techniques for fault localization. Using dynamic dependence information from multiple passing runs, learning techniques construct a probabilistic graph model for a given program. Then, given a failing run, the model is used to rank the executed statements by their likelihood of being faulty. In this paper we present a novel probabilistic approach in which universal probabilistic models are learned to characterize the behaviors of the various instruction types used by all programs. The universal probabilistic model for an instruction type takes the form of a probability distribution that represents how errors in the input (operand) values propagate to errors in the output (result) of an instruction of that type. Once these models have been constructed, they can be used in the analysis of any program as follows. Given a set of runs of a program, including at least one passing and one failing run, a Bayesian network called the Error Flow Graph (EFG) is constructed from the dynamic dependence graphs of the runs and the universal probabilistic models. Standard inference algorithms are then employed to compute the probability of each executed statement being faulty. We also present optimizations that reduce the runtime cost of inference on the EFG. Our experiments demonstrate that our approach is highly effective in fault localization even when very few passing runs are available. It also performs well in the presence of multiple faults.
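To make the idea of per-instruction-type error propagation concrete, the following is a minimal sketch and not the paper's implementation. It assumes binary "erroneous" flags on values, a hypothetical table `UNIVERSAL` of invented probabilities for two instruction types, an assumed fault prior, and a three-statement trace of a single failing run. Inference is done by brute-force enumeration rather than by the standard inference algorithms or optimizations the abstract mentions.

```python
from itertools import product

# Hypothetical universal models: P(result erroneous | operand error flags),
# keyed by instruction type. All numbers are invented for illustration.
UNIVERSAL = {
    "add": {(0, 0): 0.0, (0, 1): 0.95, (1, 0): 0.95, (1, 1): 0.9},
    "mul": {(0, 0): 0.0, (0, 1): 0.8, (1, 0): 0.8, (1, 1): 0.85},
}
FAULT_PRIOR = 0.1        # assumed prior that any single statement is faulty
FAULTY_OUTPUT_ERR = 0.9  # assumed P(result erroneous | statement is faulty)

# Toy dynamic dependence graph of one failing run:
# (statement id, instruction type, operand values, result value)
TRACE = [
    ("s1", "add", ("a", "b"), "t1"),
    ("s2", "mul", ("t1", "c"), "t2"),
    ("s3", "add", ("t2", "a"), "out"),
]
# Evidence: program inputs are correct (0), the observed output is wrong (1).
EVIDENCE = {"a": 0, "b": 0, "c": 0, "out": 1}


def joint(faults, errors):
    """Joint probability of one full assignment of fault and error flags."""
    p = 1.0
    for (_, kind, operands, result), faulty in zip(TRACE, faults):
        p *= FAULT_PRIOR if faulty else 1 - FAULT_PRIOR
        if faulty:
            p_err = FAULTY_OUTPUT_ERR
        else:
            p_err = UNIVERSAL[kind][tuple(errors[o] for o in operands)]
        p *= p_err if errors[result] else 1 - p_err
    return p


def fault_posteriors():
    """P(statement faulty | evidence), by enumerating all hidden variables."""
    hidden = [result for (_, _, _, result) in TRACE if result not in EVIDENCE]
    totals = {stmt: 0.0 for (stmt, _, _, _) in TRACE}
    z = 0.0
    for faults in product((0, 1), repeat=len(TRACE)):
        for values in product((0, 1), repeat=len(hidden)):
            errors = {**EVIDENCE, **dict(zip(hidden, values))}
            p = joint(faults, errors)
            z += p
            for (stmt, _, _, _), faulty in zip(TRACE, faults):
                if faulty:
                    totals[stmt] += p
    return {stmt: total / z for stmt, total in totals.items()}


if __name__ == "__main__":
    posteriors = fault_posteriors()
    for stmt in sorted(posteriors, key=posteriors.get, reverse=True):
        print(stmt, round(posteriors[stmt], 3))
```

Enumeration is exponential in the number of statements and intermediate values, so it is only usable on toy traces like this one; it serves here to show how per-type tables and a dependence graph combine into a ranking of statements.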