1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175
|
# Semantic analysis - Visitors
This section explains how to transform parse tree to a more usable structure.
---
You will surely always want to extract some information from the parse tree or
to transform it in some more usable form. The process of parse tree
transformation to other forms is referred to as *semantic analysis*. You could
do that using parse tree navigation etc., but it is better to use some standard
mechanism.
In Arpeggio a visitor pattern is used for semantic analysis. You write a python
class that inherits `PTNodeVisitor` and has a methods of the form
`visit_<rule name>(self, node, children)` where rule name is a rule name from
the grammar.
class CalcVisitor(PTNodeVisitor):
def visit_number(self, node, children):
return float(node.value)
def visit_factor(self, node, children):
if len(children) == 1:
return children[0]
sign = -1 if children[0] == '-' else 1
return sign * children[-1]
...
During a semantic analysis a parse tree is walked in the depth-first manner and
for each node a proper visitor method is called to transform it to some other
form. The results are then fed to the parent node visitor method. This is
repeated until the final, top level parse tree node is processed (its visitor is
called). The result of the top level node is the final output of the semantic
analysis.
To run semantic analysis apply your visitor class to the parse tree using
`visit_parse_tree` function.
```python
result = visit_parse_tree(parse_tree, CalcVisitor(debug=True))
```
The first parameter is a parse tree you get from the `parser.parse` call while
the second parameter is an instance of your visitor class. Semantic analysis can
be run in debug mode if you set `debug` parameter to `True` during visitor
construction. You can use this flag to print your own debug information from
visitor methods.
class MyLanguageVisitor(PTNodeVisitor):
def visit_somerule(self, node, children):
if self.debug:
print("Visiting some rule!")
During semantic analysis, each `visitor_xxx` method gets current parse tree node
as the `node` parameter and the evaluated children nodes as the `children`
parameter.
For example, if you have `expression` rule in your grammar then the
transformation of the non-terminal matched by this rule can be done as:
def visitor_expression(self, node, children):
... # transform node using 'node' and 'children' parameter
return transformed_node
`node` is the current `NonTerminal` or `Terminal` from the parse tree while the
`children` is an instance of `SemanticActionResults` class. This class is a
list-like structure that holds the results of semantic evaluation from the
children parse tree nodes (analysis is done bottom-up).
To suppress node completely return `None` from visitor method. In this case
the parent visitor method will not get this node in its `children` parameter.
In the [calc.py
example](https://github.com/textX/Arpeggio/blob/master/examples/calc/calc.py)
a semantic analysis
([CalcVisitor](https://github.com/textX/Arpeggio/blob/master/examples/calc/calc.py#L31)
class) will evaluate the result of arithmetic expression. The parse tree is thus
transformed to a single numeric value that represent the result of the
expression.
In the [robot.py
example](https://github.com/textX/Arpeggio/tree/master/examples/robot) a
semantic analysis
([RobotVisitor](https://github.com/textX/Arpeggio/blob/master/examples/robot/robot.py#L36)
class) will evaluate robot program (transform its parse tree) to the final robot
location.
Semantic analysis can do a complex stuff. For example, see
[peg_peg.py](https://github.com/textX/Arpeggio/blob/master/examples/peg_peg/peg_peg.py)
and
[PEGVisitor](https://github.com/textX/Arpeggio/blob/master/arpeggio/peg.py#L53)
class where the PEG parser for the given language is built using semantic
analysis.
## SemanticActionResults
`SemanticActionResults` is the class of object returned from the parse tree nodes evaluation.
This class is used for filtering and navigation over evaluation results on children nodes.
Instance of this class is given as `children` parameter of `visitor_xxx`
methods. This class inherits `list` so index access as well as iteration is
available.
Furthermore, child nodes can be filtered by rule name using attribute access.
```python
def visit_bar(self, node, children):
# Index access
child = children[2]
# Iteration
for child in children:
...
# Returns a list of all rules created by PEG rule 'baz'
baz_created = children.baz
```
## Post-processing in second calls
Visitor may define method with the `second_<rule_name>` name form. If this
method exists it will be called after all parse tree node are processed and it
will be given the results of the `visit_<rule_name>` call.
This is usually used when some additional post-processing is needed (e.g.
reference resolving).
## Default actions
For each parse tree node that does not have an appropriate `visit_xxx` method a
default action is performed. If the node is created by a plain string match,
action will return `None` and thus suppress this node. This is handy for all
those syntax noise tokens (brackets, braces, keywords etc.).
For example, if your grammar is:
```python
number_in_brackets = "(" number ")"
number = r'\d+'
```
then the default action for `number` will return number node converted to
a string and the default action for `(` and `)` will return `None` and thus
suppress these nodes so the visitor method for `number_in_brackets` rule will
only see one child (from the `number` rule reference).
If the node is a non-terminal and there is only one child the default action
will return that child effectively passing it to the parent node visitor.
Default actions can be disabled by setting parameter `defaults` to `False` on
visitor construction.
```python
result = visit_parse_tree(parse_tree, CalcVisitor(defaults=False))
```
If you want to call this default behaviour from your visitor method, call
`visit__default__(node, children)` on superclass (`PTNodeVisitor`).
```python
def visitor_myrule(self, node, children):
if some_condition:
...
else:
return super(MyVisitor, self).visit__default__(node, children)
```
|