The Effect of Coincidental Correctness on Defect Detection: an Empirical Study
According to the PIE model, three conditions must be met for a failure to be observed: 1) the defect is executed, 2) the program is infected, and 3) the infection propagates to the output. Weak coincidental correctness (CC) occurs when the program produces the correct output while condition 1) is satisfied but conditions 2) and 3) are not. Strong coincidental correctness occurs when the output is correct while conditions 1) and 2) are both satisfied but condition 3) is not. In prior work, we analytically demonstrated that CC is a safety-reducing factor for coverage-based fault localization (CBFL). However, we did not validate that result experimentally, which we do in this paper. Specifically, we compared the performance of CBFL using ten different suspiciousness metrics when: a) both weak and strong CC tests are present; b) neither weak nor strong CC tests are present; c) only weak CC tests are present; d) only strong CC tests are present. Our experiments showed that when the CC tests are discarded, in most cases the suspiciousness score of the defective statement increased and its EXAM ranking score also improved. The metrics that benefited most from discarding CC tests are Tarantula, Ample, Ochiai, Dstar2, and Dstar3. In contrast, discarding CC tests had no effect on Russel, Wong1, and Binary; however, these three metrics were the worst performers with respect to the EXAM score.
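To illustrate why discarding CC tests can raise the suspiciousness of the defective statement, the following minimal sketch computes three of the metrics named above using their commonly cited definitions. The test counts are hypothetical and are not taken from the study; the function names and the `scores` helper are likewise assumptions made for illustration. A CC test is a passing test that executes the defect, so discarding one lowers both the number of passing tests that cover the statement (`ep`) and the total number of passing tests (`total_p`).

```python
import math


def tarantula(ef, ep, total_f, total_p):
    """Tarantula: failing-coverage ratio over the sum of both coverage ratios."""
    fail_ratio = ef / total_f if total_f else 0.0
    pass_ratio = ep / total_p if total_p else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom else 0.0


def ochiai(ef, ep, total_f, total_p):
    """Ochiai: ef / sqrt(total_f * (ef + ep))."""
    denom = math.sqrt(total_f * (ef + ep))
    return ef / denom if denom else 0.0


def dstar(ef, ep, total_f, total_p, star=2):
    """D*: ef**star / (ep + nf), where nf counts failing tests that miss the statement."""
    nf = total_f - ef
    denom = ep + nf
    return ef ** star / denom if denom else math.inf


def scores(ef, ep, total_f, total_p):
    """Suspiciousness of one statement under each metric."""
    return {
        "Tarantula": tarantula(ef, ep, total_f, total_p),
        "Ochiai": ochiai(ef, ep, total_f, total_p),
        "DStar2": dstar(ef, ep, total_f, total_p, star=2),
    }


# Hypothetical defective statement: covered by both failing tests and by
# 6 of 8 passing tests, 4 of which are coincidentally correct.
before = scores(ef=2, ep=6, total_f=2, total_p=8)

# Discarding the 4 CC tests removes them from ep and from total_p.
after = scores(ef=2, ep=2, total_f=2, total_p=4)

for name in before:
    print(f"{name}: {before[name]:.4f} -> {after[name]:.4f}")
```

With these assumed counts, all three scores increase once the CC tests are removed. This shows only the direction of the effect on a single statement; whether the statement's rank improves also depends on how the scores of the other statements change.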