Integrating conceptual and logical couplings for change impact analysis in software

Huzefa Kagdi, Malcom Gethers, Denys Poshyvanyk
2012 Empirical Software Engineering  
The paper presents an approach that combines conceptual and evolutionary techniques to support change impact analysis in source code. Conceptual couplings capture the extent to which domain concepts and software artifacts are related to each other. This information is derived using Information Retrieval based analysis of textual software artifacts that are found in a single version of software (e.g., comments and identifiers in a single snapshot of source code). Evolutionary couplings capture the extent to which software artifacts were co-changed. This information is derived from analyzing patterns, relationships, and relevant information of source code changes mined from multiple versions in software repositories. The premise is that such combined methods provide improvements to the accuracy of impact sets compared to the two individual approaches. A rigorous empirical assessment on the changes of the open source systems Apache httpd, ArgoUML, iBatis, KOffice, and jEdit is also reported. The impact sets are evaluated at the file and method levels of granularity for all the software systems considered in the empirical evaluation. The results show that a combination of conceptual and evolutionary techniques, across several cutoff points and periods of history, provides statistically significant improvements in accuracy over either of the two techniques used independently. Improvements in F-measure values of up to 14% (from 3% to 17%) over the conceptual technique in ArgoUML at the method granularity, and up to 21% over the evolutionary technique in iBatis (from 9% to 30%) at the file granularity were reported.

Zimmermann et al. 2005; Canfora et al. 2010; Kagdi et al. 2010). Although ample progress has been made, there still remains much work to be done in further improving the effectiveness (e.g., accuracy) of the state-of-the-art IA techniques. Our goal is to develop a new and improved IA approach by utilizing some of the existing solutions. Central to our approach are the information sources that are developer/human centric (e.g., comments and identifiers, and commit practices), rather than (formal) language/artifact centric (e.g., static and dynamic dependencies such as call graphs). In this paper, we present an approach that combines conceptual and evolutionary couplings to support IA in source code. Conceptual couplings capture the extent to which domain concepts and software artifacts are related to each other. 
This information is derived using Information Retrieval based analysis of textual software artifacts that are found in a single version of software (e.g., comments and identifiers in a single snapshot of source code). This analysis focused on a single version is consistent with its previous usages in IA (Antoniol et al. 2000; Poshyvanyk et al. 2009). Evolutionary couplings capture the extent to which software artifacts were co-changed. This information is derived from analyzing patterns, relationships, and relevant information of source code changes mined from multiple versions in software repositories. The core research philosophy behind our approach is that present+past of software systems leads to better IA. For IA, both single (present) and multiple versions (past) analysis methods have been utilized independently, but their combined use has not been previously investigated. Our larger research objective is focused on the investigation of these combinations of IR and MSR techniques for IA. The combinations presented in this paper are a fundamental and necessary baseline step in this direction. We investigate two different combinations, i.e., disjunctive and conjunctive, and compute impact sets at varying source code granularity levels (e.g., files and methods). Our primary research hypothesis is that such combined methods provide improvements to the accuracy of impact sets. An extensive empirical study on hundreds of changes from open source systems, such as Apache httpd, ArgoUML, iBatis, KOffice, and jEdit, was conducted to test the research hypothesis. The results of the study show that the disjunctive combination of IR and MSR techniques, across several cut-off points (impact set sizes), provides statistically significant improvements in accuracy over either of the two standalone techniques. 
For example, the disjunctive method reported improvements in F-measure values of up to 14% (from 3% to 17%) over the conceptual technique in ArgoUML at the method granularity, and up to 21% over the evolutionary technique in iBatis (from 9% to 30%) at the file granularity. Also, we found that using larger history periods for computing evolutionary couplings improves impact analysis results for the combined technique. These results are encouraging considering that the combinations do not require an overly complex blending of two standalone approaches. This paper significantly extends our previous work (Kagdi et al. 2010). In particular, we present detailed analysis results at the method level of granularity for all the studied software systems: Apache httpd, ArgoUML, iBatis, and KOffice. These results were not available in (Kagdi et al. 2010). We added and analyzed data from another software system (jEdit) for the file and method granularity levels. Also, we extended the statistical tests to all the systems for both file and method levels of granularity. Finally, we investigated an additional research question (RQ3) in our empirical evaluation that studies the impact of history on the accuracy of our approach on the Apache httpd, ArgoUML, iBatis, and KOffice software systems. The rest of the paper is organized as follows. Section 2 provides a brief discussion of the related work, whereas Section 3 presents our combined approach. The empirical assessment is presented in Section 4. We conclude in Section 5.
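The disjunctive and conjunctive combinations and the F-measure evaluation mentioned above can be illustrated with a minimal sketch. It assumes that each technique yields a ranked list of candidate artifacts, that the cut-off is applied to each list separately, and that the disjunctive combination is a set union and the conjunctive one a set intersection. The function names and the example file names are hypothetical, and the exact combination procedure used in the paper may differ.

```python
# Illustrative sketch only: names and the per-list cut-off are assumptions,
# not the procedure defined in the paper.

def disjunctive(conceptual_ranked, evolutionary_ranked, cutoff):
    """Union of the top-`cutoff` artifacts suggested by each technique."""
    return set(conceptual_ranked[:cutoff]) | set(evolutionary_ranked[:cutoff])


def conjunctive(conceptual_ranked, evolutionary_ranked, cutoff):
    """Intersection of the top-`cutoff` artifacts suggested by each technique."""
    return set(conceptual_ranked[:cutoff]) & set(evolutionary_ranked[:cutoff])


def f_measure(predicted, actual):
    """Harmonic mean of precision and recall of an estimated impact set."""
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ranked suggestions for one change request.
conceptual = ["a.c", "b.c", "c.c"]      # e.g., ranked by textual similarity
evolutionary = ["b.c", "d.c", "a.c"]    # e.g., ranked by co-change strength
actually_changed = {"b.c", "d.c"}       # files modified in the real commit

union_set = disjunctive(conceptual, evolutionary, cutoff=2)
common_set = conjunctive(conceptual, evolutionary, cutoff=2)
print(sorted(union_set), f_measure(union_set, actually_changed))
print(sorted(common_set), f_measure(common_set, actually_changed))
```

In this toy input the union recovers both changed files at the cost of one false positive, while the intersection is fully precise but misses one file, which is the kind of precision/recall trade-off the F-measure summarizes.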
doi:10.1007/s10664-012-9233-9 fatcat:526cysn5l5f2pnriupkzdtfjvy