1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138
|
Normalized PE Grammar Tree
==========================
This file documents the tree generated by the transformational package
'pg::peg::normalize'. The input to the transformation is assumed to be
a 'Raw PE Grammar AS Tree', as generated by the PEG frontend, and
described in 'doc_grammar.txt'.
General information
-------------------
* The tree is implemented using 'struct::tree'.
* The tree is used as a higher data structures keeping the various
parts of a grammar together, which are: Start parsing expression,
definitions, and their parsing expressions. The tree nature of the
parsing expressions map especially nicely to this data structure.
Structure
---------
* The root node represents the overall grammar. It has one child node
for the start expression, and one child per definition of a
nonterminal symbol in the grammar. No other nodes are possible. The
order of the children is not specified and an implementation
detail. Attributes in the root provide quick access, and the nodes
can also be distinguished by the attributes they have and/or their
values.
* A definition node represents a single nonterminal definition from
the grammar. Most of the information describing the definition is
stored in attributes of the node. Sole exception is the parsing
expression associated with the defined nonterminal symbol. This is
represented by an expression subtree, the root of which is the
single child of the definition node.
* All other nodes represent a parsing expression with the operator
stored in the node and its arguments represented by the children of
the node. For operators allowing more than one argument the children
will be in the same order as specified in the grammar. I.e. the
first child represents the first argument to the operator, and so
on.
Attributes
----------
Name Type Details
---- ---- -------
name string Root only. The name of the grammar represented by the
tree.
---- ---- -------
start noderef Root only. Id of the tree node which is the root of
the start expression. A child of the root node. Does
not intersect with the set of definition nodes. Can be
empty, representing a grammar without start expression.
---- ---- -------
definitions Root only. Maps the names (strings) of nonterminal
dict symbols to the ids of the tree nodes (noderef) which
represents the definition of that symbol. The nodes
are all immediate children of the root node. None of
them can be the root of the start expression
however. The dictionary can be empty, representing a
grammar which has no nonterminal symbols.
---- ---- -------
undefined Root only. Maps the name (string) of a nonterminal
dict symbol which has no definition in the grammar to a
list containting the ids of the tree nodes (noderef)
which use the symbol despite that. I.e. if this value
is not empty the grammar is invalid and has 'holes'.
==== ==== =======
symbol string Root and definition nodes only. The name of the
nonterminal symbol whose definition the node is
representing. For root this is '<StartExpression>'.
It is defined for root so that some algorithms on
expressions can use it as a sentinel.
---- ---- -------
label string Definition nodes only. The name of the input grammar
level nonterminal symbol represented by the node. This
is normally identical to 'symbol', but can differ for
helper definitions introduced by transformations. For
such 'symbol' will refer to the generated name of the
symbol, and 'label' to the name of the symbol in the
grammar the helper belongs to.
---- ---- -------
mode enum Definition nodes only. Values in {value, discard,
leaf, match}. Specifies how the defined nonterminal
handles the generation of its semantic value during
matching.
---- ---- -------
users list Definition nodes only. A list containing the ids of
the tree nodes which reference this definition. These
nodes are always expression nodes, with operator
'n'. The list can be empty, representing a nonterminal
symbol which is defined, but not used anywhere in
grammar.
==== ==== =======
op enum All expression nodes. Values in {n, t, .., epsilon,
alpha, alnum, x, /, ?, *, +, !, &}. Specifies the
operator part of the expression represented by the
node.
---- ---- -------
char char Expression nodes with operator 't' (t-Nodes)
only. Value is the single character from the grammar
to match, as represented by Tcl. I.e. any quoting from
the input has been resolved.
---- ---- -------
begin char ..-Nodes only. Values are like 'char' above, the first
end char and last character in the range to match.
---- ---- -------
sym string n-Nodes only. The name of the nonterminal symbol to
match.
---- ---- -------
def noderef n-Nodes only. The id of the definition node for the
nonterminal symbol to match. Can be empty. In that
case the node repesents a try to match an undefined
nonterminal symbol. The value of 'sym' will be a key
in the dictionary of root->undefined, and the id of
this node an element in the list associated with that
key.
==== ==== =======
at*, to* See 'doc_grammar.txt' for the general definition.
All nodes except root.
Definition nodes: The span of input covered by the
definition.
Expression nodes: The span of input covered by the
expression.
The nodes for the operators
dot, alpha, alnum, epsilon
have no location information right now. Nodes based
on them may have only partial or no information as
well.
---- ---- -------
|