Finding compiler bugs via live code mutation
Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications - OOPSLA 2016
Validating optimizing compilers is challenging because it is hard to generate valid test programs (i.e., those that do not expose any undefined behavior). Equivalence Modulo Inputs (EMI) is an effective, promising methodology to tackle this problem. Given a test program with some inputs, EMI mutates the program to derive variants that are semantically equivalent w.r.t. these inputs. The state-of-the-art instantiations of EMI are Orion and Athena, both of which rely on deleting code from or
... g code from or inserting code into code regions that are not executed under the inputs. Although both have demonstrated their ability in finding many bugs in GCC and LLVM, they are still limited due to their mutation strategies that operate only on dead code regions. This paper presents a novel EMI technique that allows mutation in the entire program (i.e., both live and dead regions). By removing the restriction of mutating only the dead regions, our technique significantly increases the EMI variant space. It also helps to more thoroughly stress test compilers as compilers must optimize mutated live code, whereas mutated dead code might be eliminated. Finally, our technique also makes compiler bugs more noticeable as miscompilations on mutated dead code may not be observable. We have realized the proposed technique in Hermes.