When process data quality affects the number of bugs: Correlations in software engineering datasets

Adrian Bachmann, Abraham Bernstein
2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010)
Software engineering process information extracted from version control systems and bug tracking databases is widely used in empirical software engineering. In prior work, we showed that these data are plagued by quality deficiencies, which vary in their characteristics across projects. In addition, we showed that those deficiencies, in the form of bias, do impact the results of studies in empirical software engineering. While these findings affect software engineering researchers, the impact on practitioners has not yet been substantiated. In this paper we, therefore, explore (i) whether the process data quality and characteristics have an influence on the bug fixing process and (ii) whether the process quality as measured by the process data has an influence on the product (i.e., software) quality. Specifically, we analyze six Open Source as well as two Closed Source projects and show that process data quality and characteristics have an impact on the bug fixing process: the high rate of empty commit messages in Eclipse, for example, correlates with the bug report quality. We also show that the product quality, measured by the number of bugs reported, is affected by process data quality measures. These findings have the potential to prompt practitioners to increase the quality of their software process and its associated data quality.
doi:10.1109/msr.2010.5463286 dblp:conf/msr/BachmannB10 fatcat:fpztjcnsffbe3dyyjrfksnm3va