A Universal Dynamic LL(*) Parser
written in and for the Go programming language.
romshark/llparser is a dynamic recursive-descent top-down parser which parses any given input stream by trying to recursively match the root rule. It's universal in that it supports any LL(*) grammar.
This library allows building parsers in Go with relatively good error messages and flexible, even dynamic LL(*) grammars which may mutate at runtime. It parses the input stream into a typed parse-tree and allows action hooks to be executed when a particular rule is matched.
A grammar always begins with a root rule. A rule is a non-terminal symbol. Non-terminals are nodes of the parse-tree that consist of other non-terminals or terminals while terminals are leaf-nodes. A rule consists of a Designation
, a Pattern
, a Kind
and an Action
:
mainRule := &llparser.Rule{
Designation: "name of the rule",
Kind: 100,
Pattern: &llparser.Exact{
Kind: 101,
Expectation: []rune("string"),
},
Action: func(f llparser.Fragment) error {
log.Print("the rule was successfuly matched!")
return nil
},
}
Designation
defines the optional logical name of the rule and is used for debugging and error reporting purposes.Kind
defines the type identifier of the rule. If this field isn't set then zero (untyped) is used by default.Pattern
defines the expected pattern of the rule. This field is required.Action
defines the optional callback which is executed when this rule is matched. The action callback may return an error which will make the parser stop and fail immediately.
Rules can be nested:
ruleTwo := &llparser.Rule{
Pattern: &llparser.Exact{
Kind: 101,
Expectation: []rune("string"),
},
}
ruleOne := &llparser.Rule{
Pattern: ruleTwo,
}
Rules can also recurse:
rule := &llparser.Rule{Kind: 1}
rule.Pattern = llparser.Sequence{
&llparser.Exact{Expectation: []rune("=")},
&llparser.Repeated{
Min: 0,
Max: 1,
Pattern: rule,
}, // potential recursion
}
Exact
expects a particular sequence of characters to be lexed:
Pattern: &llparser.Exact{
Kind: SomeKindConstant,
Expectation: []rune("some string"),
},
Lexed
tries to lex an arbitrary sequence of characters according to Fn
:
Pattern: &llparser.Lexed{
Designation: "some lexed terminal",
Kind: SomeKindConstant,
MinLen: 1,
Fn: func(index uint, cursor llparser.Cursor) bool {
if index > 0 && cursor.File.Src[cursor.Index] == '|' {
// Stop lexing on `|` when it's not the first rune
return false
}
// Continue lexing
return true
},
},
The lexing is aborted when Fn
returns false
, otherwise the lexer
advances by 1 rune. The first parameter index
defines the index of the
lexed sequence. MinLen
defines the minimum number of runes required
to be matched.
Sequence
expects a specific sequence of patterns:
Pattern: llparser.Sequence{
somePattern,
llparser.Term(SomeKindConstant),
&llparser.Repeated{
Min: 0,
Max: 1,
Pattern: &llparser.Exact{
Kind: SomeKindConstant,
Expectation: []rune("foo"),
},
},
},
Repeated
tries to match a number of repetitions of a single pattern:
Pattern: &llparser.Repeated{
Pattern: somePattern,
},
- Setting
Min
to a greater number thanMax
whenMax
is greater0
is illegal and will cause a panic. - Setting
Min
to0
andMax
to1
is equivalent to declaring an optional. - Setting both
Min
andMax
to0
is equivalent to declaring an unlimited number of repetitions. - Setting
Min
to a positive number will require at leastMin
number of repetitions. - Setting
Max
to a positive number will matchMax
number of occurrences and stop matching the pattern. Min
andMax
are0
by default.
Either
expects either of the given patterns selecting the first match:
Pattern: llparser.Either{
somePattern,
anotherPattern,
},
Not
expects the given pattern to not match. It'll make the parser return an ErrUnexpectedToken
error if the given pattern is matched successfully.
Pattern: llparser.Not{
Pattern: somePattern,
},
A parse-tree defines the serialized representation of the parsed input stream and consists of Fragment
interfaces represented by the main fragment returned by llparser.Parse
. A fragment is a typed chunk of the source code pointing to a start and end position in the source file, defining the kind of the chunk and referring to its child-fragments.
Normally, when the parser fails to match the provided grammar it returns an
ErrUnexpectedToken
error which is rather generic and doesn't reflect the actual
mistake with a comprehensive error message. To improve the quality of the returned
error messages an error-rule can be provided which the parser tries to match at
the position of an unexpected token. If the error-rule is matched successfully
the error returned by the matched Action
callback is returned. If the error-rule
doesn't match then the default ErrUnexpectedToken
error is returned as usual.
grammar := &parser.Rule{
Pattern: parser.Sequence{
&parser.Exact{Expectation: []rune("foo")},
&parser.Exact{Expectation: []rune("...")},
},
}
errRule := &parser.Rule{
Pattern: parser.Either{
parser.OneOrMore{
Pattern: &parser.Exact{Expectation: []rune(";")},
},
parser.OneOrMore{
Pattern: &parser.Exact{Expectation: []rune(".")},
},
},
Action: func(fr parser.Fragment) error {
// Return a convenient error message instead of a generic one
return fmt.Errorf("expected 3 dots, got %d", len(fr.Src()))
},
}
mainFrag, err := pr.Parse(src, grammar, errRule)
if err != nil {
log.Fatal("Parser error: ", err)
}
Since rules can be recursive it often makes sense to specify a recursion level limit by setting Parser.MaxRecursionLevel
:
pr := newParser(t, grammar, errGrammar)
// Don't recurse over more than 1000 levels
pr.MaxRecursionLevel = 1000
This will prevent the parser from falling into endless recursion and panicking when the goroutine stack exceeds its size limit and instead make it abort parsing when the specified limit is reached.
Setting Parser.MaxRecursionLevel
to 0
will disable recursion tracking, which is disabled by default.
Parser.MaxRecursionLevel
shouldn't be altered while the parser is parsing!