Fast packrat parsing in a live programming environment
- Language developers who design domain-specific languages or new language features need a way to make fast changes to language definitions. Those fast changes require immediate feedback. Also, it should be possible to parse the developed languages quickly to handle extensive sets of code. Parsing expression grammars provides an easy to understand method for language definitions. Packrat parsing is a method to parse grammars of this kind, but this method is unable to handle left-recursion properly. Existing solutions either partially rewrite left-recursive rules and partly forbid them, or use complex extensions to packrat parsing that are hard to understand and cost-intensive. We investigated methods to make parsing as fast as possible, using easy to follow algorithms while not losing the ability to make fast changes to grammars. We focused our efforts on two approaches. One is to start from an existing technique for limited left-recursion rewriting and enhance it to work for general left-recursive grammars. The second approachLanguage developers who design domain-specific languages or new language features need a way to make fast changes to language definitions. Those fast changes require immediate feedback. Also, it should be possible to parse the developed languages quickly to handle extensive sets of code. Parsing expression grammars provides an easy to understand method for language definitions. Packrat parsing is a method to parse grammars of this kind, but this method is unable to handle left-recursion properly. Existing solutions either partially rewrite left-recursive rules and partly forbid them, or use complex extensions to packrat parsing that are hard to understand and cost-intensive. We investigated methods to make parsing as fast as possible, using easy to follow algorithms while not losing the ability to make fast changes to grammars. We focused our efforts on two approaches. One is to start from an existing technique for limited left-recursion rewriting and enhance it to work for general left-recursive grammars. The second approach is to design a grammar compilation process to find left-recursion before parsing, and in this way, reduce computational costs wherever possible and generate ready to use parser classes. Rewriting parsing expression grammars is a task that, if done in a general way, unveils a large number of cases such that any rewriting algorithm surpasses the complexity of other left-recursive parsing algorithms. Lookahead operators introduce this complexity. However, most languages have only little portions that are left-recursive and in virtually all cases, have no indirect or hidden left-recursion. This means that the distinction of left-recursive parts of grammars from components that are non-left-recursive holds great improvement potential for existing parsers. In this report, we list all the required steps for grammar rewriting to handle left-recursion, including grammar analysis, grammar rewriting itself, and syntax tree restructuring. Also, we describe the implementation of a parsing expression grammar framework in Squeak/Smalltalk and the possible interactions with the already existing parser Ohm/S. We quantitatively benchmarked this framework directing our focus on parsing time and the ability to use it in a live programming context. Compared with Ohm, we achieved massive parsing time improvements while preserving the ability to use our parser it as a live programming tool. The work is essential because, for one, we outlined the difficulties and complexity that come with grammar rewriting. Also, we removed the existing limitations that came with left-recursion by eliminating them before parsing.…
Author details: | Friedrich Eichenroth, Patrick ReinORCiD, Robert HirschfeldORCiDGND |
---|---|
URN: | urn:nbn:de:kobv:517-opus4-491242 |
DOI: | https://doi.org/10.25932/publishup-49124 |
ISBN: | 978-3-86956-503-3 |
ISSN: | 1613-5652 |
ISSN: | 2191-1665 |
Title of parent work (German): | Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam |
Subtitle (English): | improving left-recursion in parsing expression grammars |
Publication series (Volume number): | Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam (135) |
Publisher: | Universitätsverlag Potsdam |
Place of publishing: | Potsdam |
Publication type: | Monograph/Edited Volume |
Language: | English |
Year of first publication: | 2022 |
Publication year: | 2022 |
Publishing institution: | Universität Potsdam |
Publishing institution: | Universitätsverlag Potsdam |
Release date: | 2022/05/16 |
Tag: | left recursion; packrat parsing; parsing expression grammars |
Issue: | 135 |
Number of pages: | 79 |
RVK - Regensburg classification: | ST 230 |
Organizational units: | Digital Engineering Fakultät / Hasso-Plattner-Institut für Digital Engineering GmbH |
DDC classification: | 0 Informatik, Informationswissenschaft, allgemeine Werke / 00 Informatik, Wissen, Systeme / 004 Datenverarbeitung; Informatik |
Publishing method: | Universitätsverlag Potsdam |
Open Access / Gold Open-Access | |
Peer review: | Nicht ermittelbar |
License (German): | Keine öffentliche Lizenz: Unter Urheberrechtsschutz |