Tolerant parsing using modified LR(1) and LL(1) algorithms with embedded "Any" symbol

A.V. Goloveshkin
2019, Proceedings of the Institute for System Programming of RAS (Institute for System Programming of the Russian Academy of Sciences)
Tolerant parsing is a form of syntax analysis aimed at capturing the structure of certain points of interest present in source code. While these points should be described in detail in a tolerant grammar of the language, other parts of the program may be described coarsely, so the parser remains tolerant to possible variations in the irrelevant areas. Island grammars are one of the basic tolerant parsing techniques: the term "islands" refers to the relevant code, while the irrelevant code is called "water". The effort required to write water rules should be as small as possible. Previously, we extended the theory of island grammars and introduced a novel formal concept of a simplified grammar, based on the idea of eliminating the description of water by replacing it with a special "Any" symbol. To work with this concept, the standard LL(1) parsing algorithm was modified and the LanD parser generator was developed. In this paper, an "Any"-based modification of the LR(1) parsing algorithm is described. Compared with LL(1) tolerant grammars, LR(1) tolerant grammars are easier to develop and explore thanks to solid island rules. Supplementary "Any" processing techniques are introduced to make this symbol easier to use while staying within the bounds of the simplified grammar definition. Specific error recovery algorithms are presented for both LL and LR tolerant parsing. They further minimize the number and complexity of water rules and make tolerant grammars extensible. The experiments section presents the results of large-scale testing of LL and LR tolerant parsers on 9 open-source project repositories.
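The idea of replacing water rules with a single "Any" symbol can be illustrated with a small hand-written LL(1)-style recursive-descent parser. This is a minimal sketch under stated assumptions, not the LanD implementation or the paper's exact algorithm: the toy island grammar (function headers of the form `def NAME ( Any ) :`, with everything else treated as water), the rule that Any consumes tokens until it reaches one that may follow it in the current rule, the optional bracket balancing, and the backtracking recovery are all illustrative choices. `TolerantParser`, `match_any`, `parse_func`, and the other names are hypothetical.

```python
import re
from collections.abc import Callable
from dataclasses import dataclass

EOF = "<EOF>"
KEYWORDS = {"def"}
OPEN, CLOSE = "(", ")"

def tokenize(text: str) -> list[str]:
    # Identifiers, integer literals, or any single non-space character.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", text) + [EOF]

def is_name(tok: str) -> bool:
    return re.fullmatch(r"[A-Za-z_]\w*", tok) is not None and tok not in KEYWORDS

class ParseError(Exception):
    pass

@dataclass
class Func:
    name: str
    params: list[str]  # raw tokens matched by Any inside the parentheses

class TolerantParser:
    def __init__(self, tokens: list[str]):
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos]

    def expect(self, predicate: Callable[[str], bool], what: str) -> str:
        tok = self.peek()
        if not predicate(tok):
            raise ParseError(f"expected {what}, got {tok!r} at {self.pos}")
        self.pos += 1
        return tok

    def match_any(self, stop: set[str], balanced: bool = False) -> list[str]:
        # Assumed Any semantics: consume tokens until one in `stop` (the tokens
        # that may follow Any here) appears; with balanced=True, stop tokens
        # inside nested parentheses do not end the match.
        start, depth = self.pos, 0
        while True:
            tok = self.peek()
            if tok == EOF or (depth == 0 and tok in stop):
                break
            if balanced and tok == OPEN:
                depth += 1
            elif balanced and tok == CLOSE and depth > 0:
                depth -= 1
            self.pos += 1
        return self.tokens[start:self.pos]

    def parse_func(self) -> Func:
        # Island rule: 'def' NAME '(' Any ')' ':'
        self.expect(lambda t: t == "def", "'def'")
        name = self.expect(is_name, "name")
        self.expect(lambda t: t == OPEN, "'('")
        params = self.match_any(stop={CLOSE}, balanced=True)
        self.expect(lambda t: t == CLOSE, "')'")
        self.expect(lambda t: t == ":", "':'")
        return Func(name, params)

    def parse_file(self) -> list[Func]:
        # File rule: { func | Any } EOF, where Any stops at the next 'def'.
        funcs = []
        while self.peek() != EOF:
            if self.peek() != "def":
                self.match_any(stop={"def"})  # water up to the next possible island
                continue
            saved = self.pos
            try:
                funcs.append(self.parse_func())
            except ParseError:
                # Simplified recovery (assumption): treat the broken island as water.
                self.pos = saved + 1
        return funcs

if __name__ == "__main__":
    src = "x = f(1, (2, 3))\ndef area(w, h=(1, 2)):\n    return w * h\ndef broken(:\ndef main():\n    pass\n"
    print([f.name for f in TolerantParser(tokenize(src)).parse_file()])
```

In this sketch the grammar author never writes a water rule: the two occurrences of Any stand in for everything between islands and for the parameter list. The malformed header `def broken(:` is skipped by the naive backtracking step, which is much cruder than the error recovery algorithms the paper describes.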
doi:10.15514/ispras-2019-31(3)-1