How Detrimental is Coincidental Correctness to Defect Detection?

Wes Masri
2018 Figshare  
According to the PIE and RIP models, three conditions must be satisfied for a program failure to occur: 1) the defect's location must execute or be reached; 2) the program's state must become infected; and 3) the infection must propagate to the output. Weak coincidental correctness (or weak CC) occurs when the program produces the correct output while condition 1) is satisfied but 2) and 3) are not. Strong coincidental correctness (or strong CC) occurs when the output is correct while both conditions 1) and 2) are satisfied, but not 3). In the literature, coincidental correctness (CC) typically refers to strong CC. Researchers have recognized the presence of CC and analytically demonstrated that it is a safety-reducing factor for spectrum-based fault localization (SBFL). However, they did not empirically validate that fact, which we do in this paper. Specifically, using the Defects4J benchmark, we comparatively evaluated the performance of SBFL using 52 different suspiciousness metrics when: a) both weak and strong CC tests are present (TwsCC); b) neither weak nor strong CC tests are present (TnoCC); c) weak CC tests are present (TwCC); and d) strong CC tests are present (TsCC). Similarly, using five multi-fault Java programs, we evaluated the performance of greedy Test Suite Reduction (TSR) in the presence and absence of CC. That is, we empirically studied the impact of CC on defect detection using two commonly used techniques. Using 49 out of the 52 metrics, our results showed with statistical significance that SBFL performs better when using TwCC, TsCC, and TnoCC than when using TwsCC. They also showed that TnoCC yields the best performance, followed by TsCC, and then TwCC [...]
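The weak/strong CC definitions above can be illustrated with a minimal Python sketch. It assumes each test execution has already been labeled with three booleans (reached, infected, propagated); the function names `classify` and `ochiai` are hypothetical. Ochiai is used only as one well-known suspiciousness metric to show why CC tests can hurt SBFL; the abstract does not say which 52 metrics were studied.

```python
from math import sqrt

def classify(reached: bool, infected: bool, propagated: bool) -> str:
    """Label one test run against a known defect using the PIE/RIP conditions."""
    if not reached:
        return "pass"        # defect never executed: a true passing test
    if not infected:
        return "weak CC"     # condition 1 holds, conditions 2 and 3 do not
    if not propagated:
        return "strong CC"   # conditions 1 and 2 hold, condition 3 does not
    return "fail"            # all three conditions hold: observable failure

def ochiai(ef: int, ep: int, total_failed: int) -> float:
    """Ochiai suspiciousness of a statement.

    ef: failing tests that execute the statement
    ep: passing tests that execute the statement (CC tests are counted here)
    total_failed: all failing tests in the suite
    """
    denom = sqrt(total_failed * (ef + ep))
    return ef / denom if denom else 0.0

# Hypothetical suite: 2 failing tests and 3 CC tests all execute the faulty statement.
with_cc = ochiai(ef=2, ep=3, total_failed=2)     # CC tests count as passing executions
without_cc = ochiai(ef=2, ep=0, total_failed=2)  # CC tests removed from the suite
print(classify(True, True, False), with_cc, without_cc)
```

In this toy case, removing the CC tests raises the faulty statement's score, which is consistent with the direction of the reported result that TnoCC outperforms TwsCC.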
doi:10.6084/m9.figshare.7077203 fatcat:mr3ywqttvvf6jmsfym62syejfm