Adaptive LL(*) parsing

Terence Parr, Sam Harwell, Kathleen Fisher
2014 Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications - OOPSLA '14  
Despite the advances made by modern parsing strategies such as PEG, LL(*), GLR, and GLL, parsing is not a solved problem. Existing approaches suffer from a number of weaknesses, including difficulties supporting side-effecting embedded actions, slow and/or unpredictable performance, and counterintuitive matching strategies. This paper introduces the ALL(*) parsing strategy that combines the simplicity, efficiency, and predictability of conventional top-down LL(k) parsers with the power of a ... like mechanism to make parsing decisions. The critical innovation is to move grammar analysis to parse-time, which lets ALL(*) handle any non-left-recursive context-free grammar. ALL(*) is O(n^4) in theory but consistently performs linearly on grammars used in practice, outperforming general strategies such as GLL and GLR by orders of magnitude. ANTLR 4 generates ALL(*) parsers and supports direct left-recursion through grammar rewriting. Widespread ANTLR 4 use (5000 downloads/month in 2013) provides evidence that ALL(*) is effective for a wide variety of applications.
doi:10.1145/2660193.2660202 dblp:conf/oopsla/ParrHF14 fatcat:qst6nncejvbcjbk7bjopzltfx4