RUGRAT: Evaluating program analysis and testing tools and compilers with large generated random benchmark applications

Ishtiaque Hussain, Christoph Csallner, Mark Grechanik, Qing Xie, Sangmin Park, Kunal Taneja, B. M. Mainul Hossain
2014 Software, Practice & Experience  
Benchmarks are heavily used in different areas of computer science to evaluate algorithms and tools. In program analysis and testing, open-source and commercial programs are routinely used as benchmarks to evaluate different aspects of algorithms and tools. Unfortunately, many of these programs are written by programmers who introduce different biases, not to mention that it is very difficult to find programs that can serve as benchmarks with high reproducibility of results. We propose a novel approach for generating random benchmarks for evaluating program analysis and testing tools and compilers. Our approach uses stochastic parse trees, where language grammar production rules are assigned probabilities that specify the frequencies with which instantiations of these rules will appear in the generated programs. We implemented our tool for Java and applied it to generate a set of large benchmark programs of up to 5M LOC each with which we evaluated different program analysis and testing tools and compilers. The generated benchmarks let us independently rediscover several issues in the evaluated tools.

This article makes the following contributions:

• We apply stochastic parse trees for generating random application benchmarks. In stochastic parse trees, language grammar production rules are assigned probabilities that specify the frequencies with which instantiations of these rules will appear in the generated programs.
• We implemented RUGRAT for Java and used it to generate dozens of applications, ranging from 300 LOC to 5M LOC, to benchmark several versions of a popular Java source to bytecode compiler as well as popular program analysis and testing tools. This version of RUGRAT is open source software and available for download from the RUGRAT tool web site.

OUR APPROACH: LEVERAGING STOCHASTIC GRAMMARS TO GENERATE LARGE RANDOM BENCHMARK APPLICATIONS

In this section we present a model for RUGRAT and discuss our goals and approach for generating random object-oriented benchmark applications. Specifically, we review the stochastic grammar model [16, 17] that is at the core of our technique and discuss how we can apply the stochastic model to generating large-scale benchmark applications that resemble real-world applications. Then, we list the benefits of our approach over handwritten programs.

Background: Stochastic Grammar Model

Consider that every program is an instance of the grammar of the language in which this program is written.
Typically, grammars are used in compiler construction to write parsers that check the syntactic validity of a program and transform its source code into a parse tree [18]. An opposite use of the grammar is to generate branches of a parse tree for different production rules, where each rule is assigned the probability with which it is instantiated in a program. These grammars and parse trees are called stochastic, and they are widely used in natural language processing, speech recognition, information retrieval [19], and also in generating SQL statements for testing database engines [15].

We use a stochastic grammar model to generate large random object-oriented programs. Random programs are constructed based on the stochastic grammar model, and the construction process can be described as follows. Starting with the top production rules of the grammar, each nonterminal is recursively replaced with its corresponding production rule. When more than one production rule is available to replace a nonterminal, a rule is randomly chosen based on the rules' probabilities. Terminals are replaced with randomly generated identifiers and values that preserve syntax rules of the given language. Termination conditions for this process of generating programs include the limit on the size of the program or selected complexity metrics.

In addition to the rules that are found in a typical context-free grammar of a programming language, our approach takes into account additional rules and constraints that are imposed by the programming language specification. For example, a variable has to be defined before it can be used and a non-abstract class in an object-oriented program has to implement all abstract methods it inherits from its super-types. Such an enhanced stochastic grammar model ensures that the generated program is syntactically correct and compiles.
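The construction process described above can be illustrated with a minimal sketch. It is not RUGRAT's implementation: the toy grammar (integer declarations whose initializers are literals, variables, or sums), the class name StochasticGrammarSketch, its method names, and the rule probabilities are all assumptions chosen for illustration. The sketch shows three ideas from the text: picking a production rule according to its probability, stopping recursion at a depth limit as a termination condition, and enforcing the define-before-use constraint that a context-free grammar alone cannot express.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Toy stochastic-grammar generator; a sketch, not RUGRAT's actual code. */
class StochasticGrammarSketch {
    // Hypothetical rule probabilities for the Expr nonterminal.
    static final double P_LITERAL = 0.4;   // Expr -> integer literal
    static final double P_VARIABLE = 0.3;  // Expr -> previously declared variable
    // The remaining 0.3 goes to: Expr -> ( Expr + Expr )

    private final Random random;
    private final int maxDepth;
    private final List<String> declared = new ArrayList<>();

    StochasticGrammarSketch(long seed, int maxDepth) {
        this.random = new Random(seed); // fixed seed makes the output reproducible
        this.maxDepth = maxDepth;
    }

    /** Expands Expr by choosing one production according to its probability. */
    String expr(int depth) {
        double roll = random.nextDouble();
        // Termination condition: at the depth limit, fall back to a literal.
        if (depth >= maxDepth || roll < P_LITERAL) {
            return Integer.toString(random.nextInt(100));
        }
        if (roll < P_LITERAL + P_VARIABLE) {
            // Constraint beyond the context-free grammar: only declared variables.
            if (declared.isEmpty()) {
                return Integer.toString(random.nextInt(100));
            }
            return declared.get(random.nextInt(declared.size()));
        }
        return "(" + expr(depth + 1) + " + " + expr(depth + 1) + ")";
    }

    /** Expands Stmt -> int Ident = Expr ; and records the new variable. */
    String stmt() {
        String init = expr(0);               // expanded first, so no self-reference
        String name = "v" + declared.size(); // generated identifier
        declared.add(name);
        return "int " + name + " = " + init + ";";
    }

    /** Generates the requested number of statements, one per line. */
    static String generate(long seed, int maxDepth, int statements) {
        StochasticGrammarSketch g = new StochasticGrammarSketch(seed, maxDepth);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < statements; i++) {
            out.append(g.stmt()).append('\n');
        }
        return out.toString();
    }
}
```

Calling StochasticGrammarSketch.generate(42L, 3, 5) yields five declarations in which every variable reference points to an earlier declaration, and the same seed always reproduces the same text. Raising the probability of the recursive rule or the depth limit would produce larger, more deeply nested expressions.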
The construction process can be fine-tuned by varying the ranges of different configuration parameter values and limiting the grammar to a subset of the production rules that are important for evaluating specific RAT tools (e.g., recursion, use of arrays, or use of different data types can be turned off if a RAT approach does not address these).

Our Goal and Approach

We address one main goal: to allow experimenters to automatically generate benchmark applications that have desired properties for evaluating RAT approaches and tools. We do not see §
doi:10.1002/spe.2290 fatcat:wwqskrvs2rgxfmpxq5xxyshktu