Improving the static analysis of embedded languages via partial evaluation

David Herman, Philippe Meunier
2004 SIGPLAN Notices
Programs in embedded languages contain invariants that are not automatically detected or enforced by their host language. We show how to use macros to easily implement partial evaluation of embedded interpreters in order to capture invariants encoded in embedded programs and render them explicit in the terms of their host language. We demonstrate the effectiveness of this technique in improving the results of a value flow analysis.

Every practical programming language contains small programming languages. For example, C's printf [18] supports a string-based output formatting language, and Java [3] supports a declarative sub-language for laying out GUI elements in a window. PLT Scheme [9] offers at least five such languages: one for formatting console output; two for regular expression matching; one for sending queries to a SQL server; and one for laying out HTML pages. In many cases, though not always, programs in these embedded special-purpose programming languages are encoded as strings. Library functions consume these strings and interpret them. Often the interpreters consume additional arguments, which they use as inputs to the little programs.

Take a look at this expression in PLT Scheme:

    (regexp-match "http://([a-z.]*)/([a-z]*)/" line)

The function regexp-match is an interpreter for the regular expression language. It consumes two arguments: a string in the regular expression language, which we consider a program, and another string, which is that program's input. A typical use looks like the example above. The first string is actually specified at the call site, while the second string is often given by a variable or an expression that reads from an input port. The interpreter attempts to match the regular expression and the second string.

In PLT Scheme, the regular expression language allows programmers to specify subpatterns via parentheses. Our running example contains two such subexpressions: ([a-z.]*) and ([a-z]*). If the regular expression interpreter fails to match the regular expression and the string, it produces false (#f); otherwise it produces a list with n + 1 elements: the first one for the overall match plus one per subexpression. Say line stands for

    "http://aaa.bbb.edu/zzz/"

In this case, the regular expression matches the string, and regexp-match produces the list

    (list "http://aaa.bbb.edu/zzz/" "aaa.bbb.edu" "zzz")

The rest of the Scheme program extracts the pieces from this list and computes with them.
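
The behaviour described above can be sketched in Python rather than PLT Scheme. This is only an illustrative analogue, not the authors' macro-based implementation: the helper names regexp_match and specialize are hypothetical, and Python's re module stands in for the PLT Scheme regular expression interpreter. The first function mirrors the "false or a list of n + 1 elements" result; the second shows, under the assumption that the pattern is known ahead of time, how the result length can be derived from the pattern alone before any input is seen.

```python
import re
from typing import Callable, List, Optional, Tuple

MatchResult = Optional[List[Optional[str]]]


def regexp_match(pattern: str, text: str) -> MatchResult:
    """Hypothetical analogue of regexp-match: None plays the role of #f;
    otherwise the whole match is followed by one entry per subpattern."""
    m = re.search(pattern, text)
    if m is None:
        return None
    return [m.group(0), *m.groups()]


def specialize(pattern: str) -> Tuple[int, Callable[[str], MatchResult]]:
    """Hypothetical specializer: process the static pattern once and expose
    the invariant it encodes (the result length n + 1) before seeing input."""
    compiled = re.compile(pattern)
    result_length = compiled.groups + 1  # n subpatterns plus the overall match

    def matcher(text: str) -> MatchResult:
        m = compiled.search(text)
        return None if m is None else [m.group(0), *m.groups()]

    return result_length, matcher


PATTERN = "http://([a-z.]*)/([a-z]*)/"
line = "http://aaa.bbb.edu/zzz/"

result = regexp_match(PATTERN, line)
length, match_url = specialize(PATTERN)
```

Here result holds the three strings from the running example, and length is known to be 3 from the pattern alone, which is the kind of invariant an analysis of the host program cannot see while the pattern remains an opaque string.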
The regexp-match expression above is a simplified excerpt from the PLT Web Server [12]. Here is a slightly larger fragment:

    (let ([r (regexp-match "http://([a-z.]*)/([a-z]*)/" line)])
      (if r
          (process-url (third r) (dispatch (second r)))
          (log-error line)))

Notice how the then-clause of the if-expression extracts the second
doi:10.1145/1016848.1016857 fatcat:tvnmm74lezhmpi4o3ie4ziag7a