A systematic literature review of how mutation testing supports quality assurance processes

Qianqian Zhu, Annibale Panichella, Andy Zaidman
2018, Software Testing, Verification & Reliability
Mutation testing has been very actively investigated by researchers since the 1970s, and remarkable advances have been achieved in its concepts, theory, technology and empirical evidence. While the most influential realisations have been summarised by existing literature reviews, we lack insight into how mutation testing is actually applied. Our goal is to identify and classify the main applications of mutation testing and to analyse the level of replicability of empirical studies related to mutation testing. To this end, this paper provides a systematic literature review on the application perspective of mutation testing, based on a collection of 191 papers published between 1981 and 2015. In particular, we analysed in which quality assurance processes mutation testing is used, which mutation tools are employed and which mutation operators are used. We also investigated how the inherent core problems of mutation testing, i.e., the equivalent mutant problem and the high computational cost, are addressed in actual usage. The results show that most studies use mutation testing as an assessment tool targeting unit tests, and that many of the supporting techniques for making mutation testing applicable in practice are still underdeveloped. Based on our observations, we make nine recommendations for future work, including an important suggestion on how to report mutation testing in testing experiments appropriately.

... survey of more than 390 papers on mutation testing that Jia and Harman published in 2011 [1]. Jia and Harman's survey highlights the research achievements made over the years, including the development of tools for a variety of languages and the empirical studies performed [1]. Additionally, they highlight some of the actual and inherent problems of mutation testing, amongst others: (1) the high computational cost caused by generating and executing the numerous mutants and (2) the considerable, time-consuming human investigation required by the test oracle problem and by equivalent mutant detection. While existing surveys (e.g., [1, 2, 6]) provide a great overview of the most influential realisations in research, we lack insight into how mutation testing is actually applied.
Specifically, we are interested in analysing in which quality assurance processes mutation testing is used, which mutation tools are employed and which mutation operators are used. Additionally, we want to investigate how the aforementioned problems, the high computational cost and the considerable human effort required, are dealt with when applying mutation testing. To steer our research, we aim to fulfil the following objectives:
• to identify and classify the applications of mutation testing in quality assurance processes;
• to analyse how the main problems are coped with when applying mutation testing;
• to provide guidelines for applying mutation testing in testing experiments;
• to identify gaps in current research and to provide recommendations for future work.

As systematic literature reviews have been shown to be good tools for summarising existing evidence concerning a technology and identifying gaps in current research [7], we follow this approach to reach our objectives. We only consider articles that provide sufficient detail on how mutation testing is used in their studies, i.e., we require at least a brief specification of the adopted mutation tool, mutation operators or mutation score. Moreover, we selected only papers that use mutation testing as a tool for evaluating or improving other quality assurance processes, rather than papers focusing on the development of mutation tools or operators, or on challenges and open issues for mutation testing. This resulted in a collection of 191 papers published from 1981 to 2015. We analysed this collection to answer the following two research questions:

RQ1: How is mutation testing used in quality assurance processes? This research question aims to identify and classify the main software testing tasks where mutation testing is applied.
In particular, we are interested in the following key aspects: (1) in which circumstances mutation testing is used (e.g., as an assessment tool), (2) which quality assurance processes are involved (e.g., test data generation, test case prioritisation), (3) which test level it targets (e.g., unit level) and (4) which testing strategies it supports (e.g., structural testing). These four aspects characterise the essential features of how mutation testing is used and of the quality assurance processes involved, which allows us to provide an in-depth analysis of the applications of mutation testing.

RQ2: How are empirical studies related to mutation testing designed and reported? The objective of this question is to synthesise empirical evidence related to mutation testing. Case studies and experiments play an indispensable role in a research study, and the design and presentation of the evaluation methods should ensure replicability. By replicability, we mean that the subject, the basic methodology and the results should be clearly stated in the article. In particular, we are interested in how the articles report the following information related to mutation testing: (1) mutation tools, (2) mutation operators, (3) the equivalent mutant problem, (4) techniques for reducing the computational cost and (5) the subject programs used in the case studies. After gathering this information, we can draw conclusions from the distribution of the techniques adopted under these five facets and thereby provide guidelines for applying mutation testing and for reporting the settings and tools used.

The remainder of this review is organised as follows: Section 2 provides an overview of background notions on mutation testing.
Section 3 details the main procedures we followed to conduct the systematic literature review and describes our inclusion and exclusion criteria. Section 4 presents the discussion of our findings; in particular, Section 4.3 summarises the answers to the research questions, while Section 4.4 provides recommendations for future research. Section 5 discusses the threats to validity, and Section 6 concludes the paper.

2. BACKGROUND

To level the playing field, we first introduce the basic concepts related to mutation testing, i.e., its fundamental hypotheses and generic process, including the Competent Programmer Hypothesis, the Coupling Effect, mutation operators and the mutation score. Subsequently, we discuss the benefits and limitations of mutation testing. After that, we present a historical overview of mutation testing, focusing mainly on the studies that concern the application of mutation testing.

2.1. Basic Concepts

2.1.1. Fundamental Hypothesis. Mutation testing starts from the assumption of the Competent Programmer Hypothesis (introduced by DeMillo et al. [4] in 1978): "The competent programmers create programs that are close to being correct." This hypothesis implies that the potential faults in programs delivered by competent programmers are just very simple mistakes, which can be corrected by a few simple syntactical changes. Inspired by this hypothesis, mutation testing typically applies small syntactical changes to original programs, thus implying that the seeded faults resemble faults made by "competent programmers". At first glance, it seems that programs with complex errors cannot be explicitly generated by mutation testing. However, the Coupling Effect, which was coined by DeMillo et al. [4], states that "Test data that distinguishes all programs differing from a correct one by only simple errors is so sensitive that it also implicitly distinguishes more complex errors".
This means that complex faults are coupled to simple faults. This hypothesis was later supported by Offutt [8, 9] through empirical investigations in the domain of mutation testing. In his experiments, he used first-order mutants, which are created by applying a mutation operator to the original program once, to represent simple faults. Conversely, higher-order mutants, which are created by applying mutation operators to the original program more than once, stand for complex faults. The results showed that the test data generated for first-order mutants killed a higher percentage of mutants when applied to higher-order mutants, thus yielding positive empirical evidence for the Coupling Effect. Besides, considerable effort has been devoted to validating the coupling effect hypothesis, including the theoretical studies of Wah [10, 11, 12] and Kapoor [13].

2.1.2. The Generic Mutation Testing Process. Having introduced the fundamental hypotheses of mutation testing, we now describe its generic process. Given a program $P$ and a test suite $T$, a mutation engine makes syntactic changes to $P$: the rule that specifies a syntactic variation is called a mutation operator, and the result of applying a mutation operator to $P$ is a set of mutants $M$. Each mutant $P_m \in M$ is then executed against $T$ to check whether the test cases in $T$ fail. For example, the mutation operator Arithmetic Operator Replacement (AOR), applied to the statement X=a+b, produces the mutants X=a-b, X=a×b and X=a÷b.
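The distinction between first-order and higher-order mutants, together with the AOR example, can be made concrete with a small Python sketch that treats a single assignment as a string and rewrites its operators. The operator set, the function names and the use of ASCII * and / in place of × and ÷ are assumptions for illustration; actual mutation tools typically operate on parse trees or compiled code rather than on raw text.

```python
from itertools import combinations, product

# Assumed AOR operator set for this sketch; ASCII "*" and "/" stand in for × and ÷.
AOR_OPERATORS = ["+", "-", "*", "/"]


def operator_positions(stmt: str) -> list[int]:
    """Indices of arithmetic operators on the right-hand side of an assignment such as 'X=a+b'."""
    rhs_start = stmt.index("=") + 1
    return [i for i in range(rhs_start, len(stmt)) if stmt[i] in AOR_OPERATORS]


def first_order_mutants(stmt: str) -> list[str]:
    """Apply AOR exactly once: replace one operator occurrence by each alternative operator."""
    mutants = []
    for pos in operator_positions(stmt):
        for op in AOR_OPERATORS:
            if op != stmt[pos]:
                mutants.append(stmt[:pos] + op + stmt[pos + 1:])
    return mutants


def second_order_mutants(stmt: str) -> list[str]:
    """Apply AOR twice, at two distinct operator positions (a simple higher-order mutant)."""
    mutants = []
    for i, j in combinations(operator_positions(stmt), 2):
        for op_i, op_j in product(AOR_OPERATORS, repeat=2):
            if op_i != stmt[i] and op_j != stmt[j]:
                chars = list(stmt)
                chars[i], chars[j] = op_i, op_j
                mutants.append("".join(chars))
    return mutants


print(first_order_mutants("X=a+b"))  # ['X=a-b', 'X=a*b', 'X=a/b']
print(len(first_order_mutants("X=a+b-c")), len(second_order_mutants("X=a+b-c")))  # 6 9
```

For X=a+b-c, each of the two operator sites admits three replacements, giving 6 first-order mutants and 3 × 3 = 9 second-order mutants. The count grows multiplicatively with the order, which is one source of the high computational cost mentioned earlier.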
The execution results of $T$ on each $P_m \in M$ are compared with those on $P$: (1) if the output of $P_m$ differs from that of $P$, then $P_m$ is killed by $T$; (2) otherwise, i.e., if the output of $P_m$ is the same as that of $P$, either (2.1) $P_m$ is equivalent to $P$, meaning that the two are syntactically different but functionally equivalent, or (2.2) $T$ is not adequate to detect the mutant, which calls for test case augmentation.
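The comparison step, and the reason a surviving mutant is ambiguous, can be sketched in Python by continuing the AOR example. This is an illustrative sketch under assumptions not taken from the text: programs are modelled as plain functions of the inputs a and b, a test case is an input pair, an arithmetic crash is treated as an output that differs from that of $P$, and all helper names are hypothetical. The score follows the commonly used definition of the mutation score: killed mutants divided by non-equivalent mutants.

```python
from typing import Callable

# A "program" here is a function of the inputs a and b; a test case is an input pair.
Program = Callable[[float, float], float]
TestCase = tuple[float, float]


def original(a: float, b: float) -> float:
    """The program P: X = a + b."""
    return a + b


# First-order AOR mutants of X=a+b (ASCII * and / stand in for × and ÷).
MUTANTS: dict[str, Program] = {
    "X=a-b": lambda a, b: a - b,
    "X=a*b": lambda a, b: a * b,
    "X=a/b": lambda a, b: a / b,
}


def run(program: Program, test: TestCase) -> object:
    """Execute one test; an arithmetic crash is recorded as an observable output."""
    try:
        return program(*test)
    except ArithmeticError as exc:
        return type(exc).__name__


def is_killed(mutant: Program, tests: list[TestCase]) -> bool:
    """Case (1): some test observes an output that differs from the original's."""
    return any(run(mutant, t) != run(original, t) for t in tests)


def surviving(mutants: dict[str, Program], tests: list[TestCase]) -> list[str]:
    """Live mutants: each is either equivalent (2.1) or undetected by T (2.2)."""
    return [name for name, m in mutants.items() if not is_killed(m, tests)]


def mutation_score(mutants: dict[str, Program], tests: list[TestCase],
                   equivalent: frozenset[str] = frozenset()) -> float:
    """Killed mutants divided by non-equivalent mutants."""
    candidates = [name for name in mutants if name not in equivalent]
    if not candidates:
        raise ValueError("no non-equivalent mutants to score")
    killed = [name for name in candidates if is_killed(mutants[name], tests)]
    return len(killed) / len(candidates)


print(surviving(MUTANTS, [(2, 2)]))               # ['X=a*b']
print(mutation_score(MUTANTS, [(2, 2)]))          # 0.6666666666666666
print(mutation_score(MUTANTS, [(2, 2), (1, 3)]))  # 1.0
```

With the single test (2, 2), the mutant X=a*b survives because 2×2 equals 2+2. It is not equivalent to P, so this is case (2.2), and augmenting T with a test such as (1, 3) kills it. Execution alone cannot tell case (2.1) from case (2.2), which is why, in this sketch, equivalent mutants have to be supplied by hand through the equivalent parameter.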
doi:10.1002/stvr.1675