Early Conflict Detection with Mined Models

Leonardo Mariani, Daniela Micucci, Fabrizio Pastore
2014 IEEE International Symposium on Software Reliability Engineering Workshops
Developers are increasingly adopting Source Code Management (SCM) systems with extensive support for branching, parallel development, and merging, such as Git and Mercurial. For example, 62% of the Debian projects use modern SCM environments [1], and 40% of the medium and large enterprises surveyed in [2] use Git. The branching logic provided by modern SCM systems gives developers flexibility, but it also produces issues when multiple branches of development have to be merged together. For example, Brun et al. report that 24% of merge operations in open source projects generate textual conflicts, build problems, and test failures [3]; Microsoft developers report that most of the time dedicated to merge operations is spent on resolving conflicts and verifying correctness [4]. In a nutshell, merging multiple branches is a painful, expensive, and error-prone process that requires specific techniques to be handled efficiently.
Techniques for the early detection of conflicts and unexpected interactions among changes on multiple branches can reduce the effort required to cope with software evolution and concurrent development by a significant factor. So far, Brun et al. [3] and Guimaraes et al. [5] have investigated the idea of anticipating conflict detection by merging, locally in the developers' workspaces, the code in the working copy with the code extracted from other branches. The analysis executed on the developers' machines can detect textual conflicts, build problems, and test failures before the merge actually takes place in the SCM system.
These approaches, although useful to anticipate the discovery of some conflicts, suffer from several practical limitations: the conflict identification mechanism is limited to textual conflicts and test case execution, which may miss several subtle faults that are hard to discover and fix [4]; the analysis is executed on the developer's machine, while the verification of evolving software systems should take place on the SCM server, without bothering developers; and finally, the behavior of software systems is not limited to functional behavior but also includes other dimensions, such as the temporal behavior, that may evolve across branches and that are ignored by these techniques. The key idea introduced in this paper consists of running a multi-branch server-side dynamic analysis at every commit operation.
The analysis will execute the test cases available in the SCM system to trace the behavior of the application and automatically derive models that capture how the program behaves according to multiple dimensions. For instance, the functional behavior of the program can be represented with method pre- and post-conditions, API usage protocols, and precedence rules among method invocations. The temporal behavior of a program can be represented with models that capture aspects such as deadlines, periodicity, and constraints on the timing of the tasks. These models are used on the server side to run automated conflict detection. Models are derived for every version in every branch, and automatically compared every time a change is introduced. Comparing models allows identifying behavioral conflicts regardless of the presence of textual conflicts, which do not need to be resolved to run the analysis. We call this analysis Behavioral Driven Continuous Integration (BDCI).
By raising the analysis to the behavioral level, BDCI can dramatically improve the rate of conflicts that are detected and resolved early in the process, as well as the rate of the bugs due to the concurrent modifications of the software that are revealed and fixed as soon as they are introduced. As a consequence, the cost and the effort required to complete merge operations will drastically decrease, and the capability to timely evolve software will significantly improve.
II. BDCI
The key idea of BDCI is to automatically derive models that represent the behavior of a program under evolution at multiple program locations, either recently changed or not, and raise the detection of conflicts from the source code level to the level of behavioral models. Although in the following we present an example referring to the analysis of the functional behavior of a program, the same kind of analysis can be instantiated to address other dimensions, such as resource consumption and timing of the operations.
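The idea of deriving a model per branch from test executions and comparing the models can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each test run yields a trace (an ordered list of operation names), that the only model mined is a set of precedence rules, and that a rule holding in one branch but not in the other is a candidate behavioral conflict. The function names and the event names (`write_config`, `open_socket`, `send_data`) are hypothetical.

```python
def mine_precedence_rules(traces):
    """Mine rules (a, b) meaning: in every trace where b occurs,
    a occurs before the first occurrence of b.

    Assumption: a trace is a list of operation names recorded
    while one test case runs.
    """
    events = {event for trace in traces for event in trace}
    rules = set()
    for a in events:
        for b in events:
            if a == b:
                continue
            # Only traces that actually contain b can support or refute the rule.
            relevant = [trace for trace in traces if b in trace]
            if relevant and all(a in trace[: trace.index(b)] for trace in relevant):
                rules.add((a, b))
    return rules


def diff_models(rules_a, rules_b):
    """Return the rules that hold in only one of the two branch models.

    Assumption: any such rule is reported as a candidate behavioral
    conflict; a real analysis would need to filter and rank them.
    """
    return sorted(rules_a - rules_b), sorted(rules_b - rules_a)


# Hypothetical traces collected from the test suites of two branches.
main_traces = [
    ["write_config", "open_socket", "send_data"],
    ["write_config", "open_socket", "send_data", "send_data"],
]
feature_traces = [
    ["open_socket", "write_config", "send_data"],
]

only_main, only_feature = diff_models(
    mine_precedence_rules(main_traces),
    mine_precedence_rules(feature_traces),
)
print(only_main)     # [('write_config', 'open_socket')]
print(only_feature)  # [('open_socket', 'write_config')]
```

In this sketch the two branches could merge without any textual conflict, yet their models disagree on whether the configuration is written before the socket is opened, which is the kind of difference a model comparison is meant to surface.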
Figure 1 shows some of the functional models that could be automatically mined with BDCI for a simple program. The method pre- and post-conditions capture information about the values that can be assigned to program variables. In the example, port must be positive when a socket is created, and open must be true for the returned socket. Finite state automata (FSA) indicate how components interact. In the example, the automaton shows how the function sendData uses the socket library. Precedence rules indicate the execution order between operations. In the example, the creation of the conf.xml file must always precede the creation of the socket. While in a traditional SCM only the test cases and the project source files are stored for each change, in a BDCI environment each program version and its behavioral models
doi:10.1109/issrew.2014.33 dblp:conf/issre/MarianiMP14 fatcat:lqhzj5la2jdqfpwy3oykgyx2hy