[section {PEG serialization format}]
Here we specify the format used by the Parser Tools to serialize
Parsing Expression Grammars as immutable values for transport,
comparison, etc.
[para]
We distinguish between [term regular] and [term canonical]
serializations.
While a PEG may have more than one regular serialization, exactly
one of them is [term canonical].
[list_begin definitions][comment {-- serializations --}]
[def {regular serialization}]
[list_begin enumerated][comment {-- regular points --}]
[enum]
The serialization of any PEG is a nested Tcl dictionary.
[enum]
This dictionary holds a single key, [const pt::grammar::peg], and its
value. This value holds the contents of the grammar.
[enum]
The contents of the grammar are a Tcl dictionary holding the set of
nonterminal symbols and the starting expression. The relevant keys and
their values are
[list_begin definitions][comment {-- grammar keywords --}]
[def [const rules]]
The value is a Tcl dictionary whose keys are the names of the
nonterminal symbols known to the grammar.
[list_begin enumerated][comment {-- nonterminals --}]
[enum]
Each nonterminal symbol may occur only once.
[enum]
The empty string is not a legal nonterminal symbol.
[enum]
The value for each symbol is a Tcl dictionary itself. The relevant
keys and their values in this dictionary are
[list_begin definitions][comment {-- nonterminal keywords --}]
[def [const is]]
The value is the serialization of the parsing expression describing
the symbol's sentential structure, as specified in the section
[sectref {PE serialization format}].
[def [const mode]]
The value can be one of three values specifying how a parser should
handle the semantic value produced by the symbol.
[include ../modes.inc]
[list_end][comment {-- nonterminal keywords --}]
[list_end][comment {-- nonterminals --}]
[def [const start]]
The value is the serialization of the start parsing expression of the
grammar, as specified in the section [sectref {PE serialization format}].
[list_end][comment {-- grammar keywords --}]
[enum]
The terminal symbols of the grammar are specified implicitly as the
set of all terminal symbols used in the start expression and on the
RHS of the grammar rules.
[list_end][comment {-- regular points --}]
[def {canonical serialization}]
The canonical serialization of a grammar has the format specified
in the previous item, and additionally satisfies the constraints
below, which make it unique among all the possible serializations of
this grammar.
[list_begin enumerated][comment {-- canonical points --}]
[enum]
The keys found in all the nested Tcl dictionaries are sorted in
ascending dictionary order, as generated by Tcl's builtin command
[cmd {lsort -increasing -dict}].
[enum]
The string representation of the value is the canonical representation
of a Tcl dictionary. I.e. it does not contain superfluous whitespace.
[list_end][comment {-- canonical points --}]
[list_end][comment {-- serializations --}]
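[para]
As a rough illustration of the two canonical constraints, the following
Tcl sketch re-sorts the keys of a regular serialization and rebuilds the
value with [cmd list], which generates a string representation without
superfluous whitespace. The command [cmd canonicalize] and the sample
grammar are hypothetical and not part of the Parser Tools. The sketch
copies the parsing expressions unchanged, i.e. it assumes that they are
already in the canonical form described in the section
[sectref {PE serialization format}].
[example {
package require Tcl 8.5

# Hypothetical helper, for illustration only.
# Takes a regular serialization and returns one whose dictionary keys
# are sorted and whose string representation is freshly generated.
proc canonicalize {serial} {
    set g [dict get $serial pt::grammar::peg]
    set rules {}
    foreach sym [lsort -increasing -dict [dict keys [dict get $g rules]]] {
        set def [dict get $g rules $sym]
        # "is" sorts before "mode". The parsing expression itself is
        # assumed to be canonical already and is copied as is.
        dict set rules $sym [list is [dict get $def is] mode [dict get $def mode]]
    }
    # "rules" sorts before "start".
    return [list pt::grammar::peg [list rules $rules start [dict get $g start]]]
}

# A regular, but not canonical, serialization of a made-up grammar.
# The n, t, x and / forms are assumed from the PE serialization format.
set regular {
    pt::grammar::peg {
        start {n Sum}
        rules {
            Sum   {mode value is {x {n Digit} {t +} {n Digit}}}
            Digit {mode leaf  is {/ {t 0} {t 1}}}
        }
    }
}

puts [canonicalize $regular]
}]
[para]
Two regular serializations of the same grammar should then compare equal
as strings after both have passed through such a normalization, which is
what makes the canonical form suitable for comparison.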
[subsection Example]
Assuming the following PEG for simple mathematical expressions
[para]
[include ../example/expr_peg.inc]
[para]
then its canonical serialization (except for whitespace) is
[para]
[include ../example/expr_serial.inc]
[para]
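A serialization like the one above can be checked against the rules for
regular serializations with a few [cmd dict] commands. The following is
a minimal sketch only. The command [cmd check_regular] is hypothetical,
the three mode names are an assumption about the contents of the mode
list included earlier in this section, and the values of
[const is] and [const start] are not inspected at all.
[example {
package require Tcl 8.5

# Hypothetical checker, not part of the Parser Tools API.
# Returns an empty string if the value looks like a regular
# serialization, otherwise a description of the first problem found.
# Assumes that all nested values are at least well-formed Tcl lists.
proc check_regular {serial} {
    if {[catch {dict size $serial} n] || $n != 1
        || ![dict exists $serial pt::grammar::peg]} {
        return "expected a dictionary with the single key pt::grammar::peg"
    }
    set g [dict get $serial pt::grammar::peg]
    foreach key {rules start} {
        if {![dict exists $g $key]} { return "missing key: $key" }
    }
    set rules [dict get $g rules]
    # A dictionary merges repeated keys silently, so compare the raw
    # list length with the number of distinct keys.
    if {[llength $rules] != 2 * [dict size $rules]} {
        return "a nonterminal symbol occurs more than once"
    }
    dict for {sym def} $rules {
        if {$sym eq ""} { return "empty nonterminal symbol" }
        foreach key {is mode} {
            if {![dict exists $def $key]} { return "${sym}: missing key $key" }
        }
        # Assumed mode names, see the lead-in.
        if {[dict get $def mode] ni {value leaf void}} {
            return "${sym}: unknown mode"
        }
    }
    return ""
}

# Usage with a made-up one-rule grammar.
puts [check_regular {pt::grammar::peg {rules {A {is {t a} mode value}} start {n A}}}]
}]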