ANTLRWorks: an ANTLR grammar development environment

Jean Bovet, Terence Parr
2008 Software, Practice & Experience  
Programmers tend to avoid using language tools, resorting to ad hoc methods, because tools can be hard to use, their parsing strategies can be difficult to understand and debug, and their generated parsers can be opaque black-boxes. In particular, there are two very common difficulties encountered by grammar developers: understanding why a grammar fragment results in a parser non-determinism and determining why a generated parser incorrectly interprets an input sentence. This paper describes ANTLRWorks, a complete development environment for ANTLR grammars that attempts to resolve these difficulties and, in general, make grammar development more accessible to the average programmer. The main components are a grammar editor with refactoring and navigation features, a grammar interpreter, and a domain-specific grammar debugger. ANTLRWorks' primary contributions are a parser non-determinism visualizer based on syntax diagrams and a time-traveling debugger that pays special attention to parser decision-making by visualizing lookahead usage and speculative parsing during backtracking.
[...] are written entirely by hand without the use of automated language tools such as parser generators, tree walker generators, and other code generators. Programmers tend to avoid using language tools, resorting to ad hoc methods, partly because of the raw and low-level interface to these tools. The threat of having to contort grammars to resolve parser non-determinisms is enough to induce many programmers to build recursive-descent parsers by hand; some readers are familiar with LALR reduce-reduce warnings from YACC [2] or LL warnings from other parser generators. Programmers commonly resort to hand-built parsers despite the fact that grammars offer a more natural, high-fidelity, robust, and maintainable means of encoding a language-related problem.
The ANTLR parser generator [3] attempts to make grammars more accessible to the average programmer by accepting a larger class of grammars than LL(k) and generating recursive-descent parsers that are very similar to what a programmer would build by hand. Still, developing grammars is a non-trivial task. Just as developers use integrated development environments to dramatically improve their productivity, programmers need a sophisticated development environment for building, understanding, and debugging grammars. Unfortunately, most grammar development is done today with a simple text editor.
This paper introduces ANTLRWorks, a domain-specific development environment for ANTLR version 3 grammars that we built in order to: [...]
[...] immediate feedback about their correctness. Developers tend to test methods, rather than waste time mentally checking the functionality of a method, because it is so quick and easy. Similarly, being able to dynamically test rules as they are written can dramatically reduce development time.
Once a grammar is more-or-less complete and the generated parser has been integrated into a larger application, the grammar interpreter is less useful primarily because it cannot execute embedded semantic actions. ANTLRWorks has a domain-specific debugger that attaches to language applications running natively via a network socket using a custom text-based protocol. Parsers generated by ANTLR with the -debug command-line option trigger debugging events that are passed over the socket to ANTLRWorks, which then visually represents the data structures and state of the parser. Because ANTLRWorks merely sees a stream of events, it can rewind and replay the parse multiple times by re-executing the events without having to restart the actual parser. This domain-specific time-travel debugging mechanism is similar to the more general framework of Bhansali et al. [4]. The primary advantage of the socket connection, however, is that the debugger can debug parsers generated in any programming language that has a socket library. Because ANTLR can generate parsers in many different target languages, we needed a mechanism capable of supporting more than just Java (ANTLR's implementation language).
The debugger dynamically displays a parser's input stream, parse tree, generated abstract syntax tree (AST), rule invocation stack, and event stream as the user traces through the parser execution.
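
The rewind-and-replay idea described above can be illustrated with a small sketch. It is not ANTLRWorks' implementation: the tab-separated event format, the `ParserView` and `EventReplayer` names, and the choice to track only a rule stack and consumed tokens are all assumptions made for illustration. The point it shows is that, once events are recorded, stepping backwards needs no cooperation from the running parser; the viewer simply rebuilds its state from the start of the log.

```python
from dataclasses import dataclass, field


@dataclass
class ParserView:
    """State a debugger front end could rebuild purely from recorded events."""
    rule_stack: list = field(default_factory=list)
    consumed: list = field(default_factory=list)


def apply_event(view, event):
    """Apply one event. The 'name<TAB>argument' format is hypothetical."""
    name, _, arg = event.partition("\t")
    if name == "enterRule":
        view.rule_stack.append(arg)
    elif name == "exitRule":
        view.rule_stack.pop()
    elif name == "consumeToken":
        view.consumed.append(arg)
    # Any other event is ignored in this sketch.


class EventReplayer:
    """Steps forwards and backwards over a recorded event log."""

    def __init__(self, events):
        self.events = list(events)
        self.position = 0          # number of events applied so far
        self.view = ParserView()

    def step_forward(self):
        if self.position < len(self.events):
            apply_event(self.view, self.events[self.position])
            self.position += 1

    def goto(self, target):
        # Rewind by discarding the view and replaying from the first event.
        target = max(0, min(target, len(self.events)))
        self.view = ParserView()
        self.position = 0
        while self.position < target:
            self.step_forward()

    def step_backward(self):
        self.goto(self.position - 1)


# Usage: replay three events, then step back one.
log = ["enterRule\tprog", "enterRule\tstat", "consumeToken\tID",
       "exitRule\tstat", "exitRule\tprog"]
replayer = EventReplayer(log)
replayer.goto(3)
replayer.step_backward()
print(replayer.position, replayer.view.rule_stack, replayer.view.consumed)
```

Replaying from the beginning on every backward step is the simplest design; a real tool working with long event streams might instead keep periodic snapshots, but that is beyond what the text states.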
The grammar, input, and tree display panes are always kept in sync so that clicking on, for example, an AST node shows the grammar element that created it and the token within the input stream from which it was created. ANTLRWorks has breakpoints and single-step facilities that allow programmers to stop the parser when it reaches a grammar location of interest or even an input phrase of interest. Sometimes it is useful to jump to a particular event (such as a syntax error) within the parse and then back up to examine the state of the parser before that event. To accommodate this, ANTLRWorks has a 'step backwards' facility.
Complex language problems are often broken down into multiple phases with the first phase parsing the input and building an intermediate-form AST. This AST is then passed between multiple tree walkers to glean information or modify the AST. ANTLR accepts tree grammars and can automatically generate tree walkers, again in the form of a recursive-descent parser. The ANTLRWorks debugger graphically illustrates the node-by-node construction of ASTs as the parser being debugged constructs these nodes. ASTs grow and shrink as the developer steps forward and backwards in the parse. ANTLR treats tree grammars just like parser grammars except that the input is a tree instead of a flat token sequence. In the ANTLRWorks debugger, programmers can set breakpoints in the input tree and single step through tree grammars to detect errors just like when debugging a token stream parser.
The following (second) section provides an overview of ANTLR syntax and LL(*) parsing concepts required to understand the operation and appreciate the utility of ANTLRWorks. The third section describes the grammar interpreter feature, which is useful for rapid prototyping. The fourth section describes ANTLRWorks' debugger including information on its socket protocol, single stepping and breakpoints, dynamic AST display, and tree parser debugging. The fifth section describes some of the miscellaneous features found in ANTLRWorks. Finally, we discuss related work and then a few of the planned features. This paper is illustrated throughout with screen snapshots from ANTLRWorks.
doi:10.1002/spe.872 fatcat:uxibu66p6vhadoow7git3oy26y