Automatically Repairing Programs Using Both Tests and Bug Reports
As industry begins deploying automated program repair, concerns remain about repair quality. This paper develops a new fault localization (FL) technique that combines information from bug reports and test executions and uses it to improve the quality of program repair. We develop Blues, the first information-retrieval-based FL technique that localizes defects at the statement level without requiring training data. We further develop RAFL, the first unsupervised approach for combining multiple
... techniques (supervised techniques exist), and use it to create SBIR, which combines Blues with a spectrum-based FL (SBFL) technique. On a dataset of 815 real-world defects, SBIR consistently outperforms SBFL and Blues. For example, SBIR identifies a buggy statement as the most suspicious for 18.3% of the defects, whereas SBFL does so for 11.0% and Blues for 4.0%. Note that 18.3% is greater than 11.0% + 4.0%: SBIR ranks a buggy statement first for more defects than SBFL and Blues do put together. Next, we compare the repair performance of three state-of-the-art repair tools, Arja, SequenceR, and SimFix, on 689 real-world defects using SBIR, SBFL, and Blues. Arja and SequenceR benefit significantly from SBIR: Arja using SBIR correctly repairs 10 (47.6%) more defects than when using SBFL and 13 (72.2%) more defects than when using Blues; for SequenceR, the corresponding gains are 4 (44.4%) and 8 (160.0%) more defects. SimFix, which employs tricks to overcome poor FL, correctly repairs the same number of defects using SBIR as when using SBFL, and 13 (81.3%) more defects than when using Blues. Our findings, that FL can be improved by combining multiple sources of information and that program repair benefits greatly from improved FL, suggest a fruitful direction for research into both FL and program repair.
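The repair gains above are stated as "k (p%) more defects", so the absolute counts behind them can be recovered by simple arithmetic. The following is a minimal sketch of that calculation, assuming each percentage is relative to the number of defects correctly repaired with the baseline FL technique. The function name `implied_counts` and the `reported` table are illustrative, and the resulting counts are inferred from the figures quoted above, not reported values.

```python
def implied_counts(extra, pct_gain):
    """Infer (baseline, with_sbir) repair counts from 'extra (pct_gain%) more'.

    Assumption: pct_gain is relative to the baseline count, i.e.
    pct_gain / 100 == extra / baseline.
    """
    baseline = round(extra / (pct_gain / 100.0))
    return baseline, baseline + extra


# (tool, baseline FL technique) -> (extra correct repairs, relative gain in %)
# Values are copied from the abstract; the layout is illustrative.
reported = {
    ("Arja", "SBFL"): (10, 47.6),
    ("Arja", "Blues"): (13, 72.2),
    ("SequenceR", "SBFL"): (4, 44.4),
    ("SequenceR", "Blues"): (8, 160.0),
    ("SimFix", "Blues"): (13, 81.3),
}

for (tool, fl), (extra, pct) in reported.items():
    baseline, with_sbir = implied_counts(extra, pct)
    print(f"{tool}: {baseline} with {fl}, {with_sbir} with SBIR")
```

Under this assumption, both Arja comparisons imply the same SBIR total (31 correct repairs), and both SequenceR comparisons imply 13, so the quoted figures are internally consistent.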