A new parsing method for non-LR(1) grammars

A. Gayler Harford, Vincent P. Heuring, Michael G. Main
1992 Software, Practice & Experience  
One of the difficult problems that faces a compiler writer is to devise a grammar that is suitable for both efficient parsing and semantic attribution. This paper describes a system that resolves conflicts in LR(1) parsing by taking advantage of information in the parse tree. The system, which functions as part of a compiler generator, rewrites the user's grammar to remove parsing conflicts. It then places code into the generated compiler that rewrites the parse tree during parsing so as to produce the tree of the original grammar. The compiler writer can then write the semantic attribution to fit his or her original grammar without any knowledge of the changes made. The method is expected to be efficient in most cases, even in parsing systems that do not explicitly build the entire parse tree. The method complements previous work in its capabilities and advantages. The system has been implemented and integrated into a compiler generator system.
As an example, consider a grammar fragment in which one symbol of lookahead is insufficient to distinguish between 〈leftside〉 and 〈expression〉 (the grammar may not even be LR(k) if expressions may be arbitrarily long strings). Removing the conflict requires not only rewriting the grammar but rethinking the semantic attribution as well. We have devised an enhancement of LR parsing for compiler generators which loosens the parsing constraints while leaving the user's semantic attribution intact.
A grammar may be viewed as describing a set of parse trees in which each internal node represents a production of the grammar and each leaf corresponds to a basic symbol in the language. The leaves of the tree, taken left-to-right, correspond to a sentence in the language. The parse tree contains information that goes beyond the information inherent in each node: information about the relations between productions. Some parsers explicitly build the parse tree in memory so that this information can be used in the semantic computations. We propose that this information can also be used during the parsing process itself to resolve parsing conflicts.
This paper describes a tree rewriting algorithm called tris (tree rewriting system) which solves parsing conflicts by rewriting the grammar.
Tris then puts code into the compiler that rewrites appropriate fragments of the parse tree during parsing so that the final tree corresponds to the original unmodified grammar. In a parser that does not explicitly build the parse tree during parsing, the code generated by tris would construct just those parts of the tree needed to ascertain the correct parse. This scheme relieves the compiler writer of much of the burden of constructing a grammar suitable for the mechanics of parsing and leaves the writer free to tailor the grammar to the needs of the semantic attribution. The semantic attribution can be written for the original grammar without reference to the changes that tris has made. The system can be integrated into a compiler writing system so that the compiler writer need not even be aware of the changes that have been made to the grammar.
An experimental implementation of tris has been integrated into the Eli compiler generating system (Reference 9). We will show that tris can be used to generate a working compiler of reasonable efficiency. In addition, we will show that the approach is suitable for other compiler generator systems and that the rewriting of fragments of the parse tree should not unduly affect the parsing efficiency in systems that do not explicitly build the parse tree. The following sections informally describe and discuss the method. Formal proofs will be found in Reference 10.
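The idea of parsing with a modified grammar and then rewriting the tree back to the original grammar can be illustrated with a minimal Python sketch. This is not the tris algorithm. It assumes a hypothetical original grammar `stmt -> leftside ':=' expression | expression` and a hypothetical conflict-free replacement `stmt -> common ':=' common | common`. The names `Node`, `leaves`, `restore`, and `common`, and both grammars, are illustrative assumptions that do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A parse-tree node: a grammar symbol plus its ordered children."""
    symbol: str
    children: list = field(default_factory=list)


def leaves(node):
    """Return the leaf symbols left to right, i.e. the parsed sentence."""
    if not node.children:
        return [node.symbol]
    out = []
    for child in node.children:
        out.extend(leaves(child))
    return out


def restore(stmt):
    """Rewrite a tree of the hypothetical modified grammar into a tree
    of the hypothetical original grammar.

    The merged nonterminal `common` is renamed once the shape of the
    enclosing `stmt` node shows which role it plays.
    """
    kids = stmt.children
    if len(kids) == 3 and kids[1].symbol == ":=":
        left = Node("leftside", kids[0].children)
        right = Node("expression", kids[2].children)
        return Node("stmt", [left, kids[1], right])
    # No assignment operator: the single `common` child is an expression.
    return Node("stmt", [Node("expression", kids[0].children)])


# Tree produced by a parser for the modified grammar on the input: a := b
modified = Node("stmt", [
    Node("common", [Node("a")]),
    Node(":="),
    Node("common", [Node("b")]),
])
original = restore(modified)
```

In this sketch the rewrite changes only the labels of internal nodes. The leaves, and therefore the sentence, are unchanged, so semantic attribution written against the original grammar would see the tree it expects.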
doi:10.1002/spe.4380220505 fatcat:ke5peqds35czzb7oh56wz43at4