Characterizing Duplicate Bugs: An Empirical Analysis

Berfin Kucuk, Eray Tuzun
2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)
Bug handling is an essential part of the software development process. Ideally, in a bug tracking system, bugs are reported, fixed, verified, and closed. In some cases, bugs have to be reopened, mostly due to an incorrect fix. However, instead of reopening the existing bug report, users may submit a new report on a previously reported bug, which causes duplicate bug reports. Additionally, users might report duplicate bugs if they are unable to reopen the previously reported bugs due to the bug being unresolved (i.e., in progress), or when they miss previously reported bug reports. These duplicate bug reports may cost extra maintenance effort in triaging and fixing bugs. There have been several studies on characterizing reopened bugs and duplicate bug reports; however, to the best of our knowledge, there has been no prior work on understanding the dynamics of their intersection, namely missed reopen bugs. Our study is based on analyzing the differences between duplicate and non-duplicate bugs, and on further categorizing the duplicate bugs. In this regard, we categorize duplicate bugs according to their creation time with respect to their master's resolution status, as Master-Unresolved bugs and Master-Resolved bugs (Missed Reopen bugs), to distinguish their properties. We compare these two types of bugs in terms of various aspects such as their relationships to their master bugs, bug surface time, bug fix time, bug severity, and the number of users involved. We perform case studies using the Eclipse and Mozilla projects' bug repositories, which include more than 165,500 and 394,000 bug reports respectively.
Index Terms: duplicate bug reports; reopened bugs; characterization study; bug management
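The categorization described in the abstract can be illustrated with a minimal sketch. It assumes each bug report carries a creation timestamp and an optional resolution timestamp; the `BugReport` class, its field names, and the `categorize_duplicate` function are hypothetical and are not taken from the paper or from any bug tracker's API. Treating a duplicate filed at the exact moment of resolution as Master-Unresolved is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class BugReport:
    # Hypothetical record layout, for illustration only.
    bug_id: int
    created: datetime
    resolved: Optional[datetime] = None  # None while the bug is still open


def categorize_duplicate(duplicate: BugReport, master: BugReport) -> str:
    """Label a duplicate by the master's status when the duplicate was filed."""
    # Assumption: the master counts as resolved only if its resolution
    # timestamp is strictly earlier than the duplicate's creation time.
    if master.resolved is not None and duplicate.created > master.resolved:
        return "Master-Resolved"  # a missed reopen
    return "Master-Unresolved"


master = BugReport(1, datetime(2020, 1, 1), resolved=datetime(2020, 2, 1))
early_dup = BugReport(2, datetime(2020, 1, 15))  # filed while master was open
late_dup = BugReport(3, datetime(2020, 3, 1))    # filed after master was resolved

print(categorize_duplicate(early_dup, master))
print(categorize_duplicate(late_dup, master))
```

A real analysis would also need to handle masters that were resolved and later reopened, which this sketch ignores.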
doi:10.1109/saner50967.2021.00084 fatcat:gdglxy2durfnzfpytnuvvvqasi