Scaladays '15: Meerkat Parsers
Meerkat Parsers
An expression language
It’s common to formulate grammars in BNF-like formats and let the combinator generate the parser out of it.
Parsing as a search problem.
Parsing is like trying to get through a a maze: there is dead ends.
Head recursion
- Natural grammars are head-recursive (as you want to have a tree)
Methods: Backtracking schemes
- deterministic
- fast (like a ferrari)
- local backtracking (PEG, Paper in 2005)
- suffers from false positive recognitions
- full backtracking
- more expressivity
- you can go everywhere (like a jeep)
Nondeterminism and Ambiguity
Being non ambiguous does not always require being deterministic
Objective: Near-linear performance on near-deterministic grammars
Separation of Lexing and Parsing
- Lexer
- cuts character-stream into tokens
- save efforts for
- whitespace/comment
- longest match (mathing whole identifiers)
- keyword reservation
- context sensitivity required:
- indentation-sensitive languages
Meerkats
Combines the power of general parsing with disambiguational parsing and combinational filters. The best of both worlds: parser generators and parser combinators.
Paper: Faster, Practical GLL Parsing, CC'15
Shared packed parse forest
Build parsing trees and share subtrees. We need to make it binary.
Contributions
- implicits enable switching the layout depending on context
- enforce longest match by “must not follow”-combinator
- data-dependent parsers: parse-time evaluation mechanism and “~>>”-combinator takes a closure to evaluate those data
- library supports associativity groups
- library supports character-level filters
- library supports EBNF-constructs