Automatically Repairing Programs Using Both Tests and Bug Reports [article]

Manish Motwani, Yuriy Brun
2022 arXiv pre-print
As industry begins deploying automated program repair, concerns remain about repair quality. This paper develops a new fault localization (FL) technique that combines information from bug reports and test executions, and uses it to improve the quality of program repair. We develop Blues, the first information-retrieval-based FL technique that localizes defects at the statement level without requiring training data. We further develop RAFL, the first unsupervised approach for combining multiple ... techniques (supervised techniques exist), and use it to create SBIR, which combines Blues with a spectrum-based FL (SBFL) technique. On a dataset of 815 real-world defects, SBIR consistently outperforms SBFL and Blues. For example, SBIR identifies a buggy statement as the most suspicious for 18.3% of the defects, compared with 11.0% for SBFL and 4.0% for Blues. Note that 18.3% is greater than 11.0% + 4.0%! Next, we compare the repair performance of three state-of-the-art repair tools, Arja, SequenceR, and SimFix, on 689 real-world defects using SBIR, SBFL, and Blues. Arja and SequenceR benefit significantly from SBIR: Arja using SBIR correctly repairs 10 (47.6%) more defects than using SBFL and 13 (72.2%) more defects than using Blues; for SequenceR, the gains are 4 (44.4%) and 8 (160.0%) more defects, respectively. SimFix, which employs tricks to overcome poor FL, correctly repairs the same number of defects using SBIR as using SBFL, and 13 (81.3%) more defects than using Blues. Our findings, that FL can be improved by combining multiple sources of information and that program repair benefits greatly from improved FL, suggest a fruitful direction for research into both FL and program repair.
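To make the idea of combining two FL rankings without training data concrete, here is a minimal Python sketch. It uses a plain Borda count as a stand-in; the abstract does not say how RAFL aggregates rankings, so this should not be read as the authors' algorithm. The function name `borda_combine` and the statement identifiers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def borda_combine(rankings, top_k=None):
    """Combine ranked lists of statements using a Borda count.

    rankings: list of lists, each ordered from most to least suspicious.
    A statement absent from a list receives no points from that list.
    NOTE: Borda count is an assumed stand-in, not the RAFL algorithm.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for position, stmt in enumerate(ranking):
            # The top entry earns n points, the next n - 1, and so on.
            scores[stmt] += n - position
    # Highest score first; ties are broken by name for determinism.
    combined = sorted(scores, key=lambda s: (-scores[s], s))
    return combined[:top_k] if top_k else combined


# Hypothetical rankings from a spectrum-based and an IR-based technique.
sbfl_ranking = ["Foo.java:42", "Foo.java:17", "Bar.java:8"]
ir_ranking = ["Bar.java:8", "Foo.java:42", "Baz.java:3"]

combined = borda_combine([sbfl_ranking, ir_ranking])
print(combined)
```

In this toy input, the statement that both techniques rank highly ends up first, even though only one technique placed it at the top. That is the intuition behind combining test-based and bug-report-based evidence.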
arXiv:2011.08340v3 fatcat:zh4m7ywu5jc5vpzf4uk32h4flu