.. Copyright (c) 2017-2026 Juancarlo Añez (apalala@gmail.com)
.. SPDX-License-Identifier: BSD-4-Clause
.. include:: links.rst
Models
------
Building Models
~~~~~~~~~~~~~~~
Naming elements in grammar rules makes the parser discard the
uninteresting parts of the input, like punctuation, from the output. With
naming, |TatSu| produces an *Abstract Syntax Tree* (`AST`_) that reflects the
semantic structure of what was parsed. But an `AST`_ doesn't carry
information about the rule that generated it, so navigating the trees
may be difficult.
|TatSu| defines the ``tatsu.semantics.ModelBuilderSemantics`` semantics
class which helps construct object models from abstract syntax trees:
.. code:: python

    from tatsu.semantics import ModelBuilderSemantics

    parser = MyParser(semantics=ModelBuilderSemantics())
Builder semantics are also enabled by passing ``asmodel=True`` to the
``tatsu.compile()`` or ``tatsu.parse()`` functions.

Then add the desired node type as the first parameter to each grammar
rule:
.. code:: ocaml

    addition::AddOperator = left:mulexpre '+' right:addition ;
``ModelBuilderSemantics`` will synthesize a ``class AddOperator(Node):``
class and use it to construct the node. The synthesized class will have
one attribute with the same name as each of the named elements in the rule.
You can also use `Python`_'s built-in types as node types, and
``ModelBuilderSemantics`` will do the right thing:
.. code:: ocaml

    integer::int = /[0-9]+/ ;
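One way to picture how a type name in a rule could be resolved, as a minimal
sketch only: look the name up among the known model classes first, then fall
back to `Python`_'s built-ins. The helper ``resolve_constructor`` and its
``constructors`` argument are hypothetical illustrations and make no claim
about how |TatSu| implements the lookup.

```python
import builtins


def resolve_constructor(typename, constructors=None):
    """Return a callable for ``typename`` (hypothetical helper).

    User-supplied constructors win; otherwise fall back to a built-in
    type of the same name, such as ``int`` or ``float``.
    """
    constructors = constructors or {}
    if typename in constructors:
        return constructors[typename]
    candidate = getattr(builtins, typename, None)
    if isinstance(candidate, type):
        return candidate
    raise LookupError(f'no constructor known for {typename!r}')


# The text matched by ``integer::int = /[0-9]+/ ;`` is a string,
# and calling ``int`` on it yields a Python integer.
value = resolve_constructor('int')('42')
```

With this shape, a rule typed as ``int`` needs no model class at all: the
matched text is simply passed to the built-in type.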
``ModelBuilderSemantics`` acts as any other semantics class, so its
default behavior can be overridden by defining a method to handle the
result of any particular grammar rule.
Generating Models
~~~~~~~~~~~~~~~~~
To see what the classes for the grammar look like, use the ``tatsu``
command-line tool to generate a module definition with the required classes:
.. code:: bash

    $ tatsu --object-model mygrammar.tatsu
You can capture the output, or specify the module filename with the
``--object-model-outfile`` option to ``tatsu``.
.. code:: bash

    $ tatsu --object-model-outfile mymodel.py mygrammar.tatsu
|TatSu| will generate a ``mymodel.MyModelBuilderSemantics`` that can be
passed as semantics to the ``parse()`` function to make it generate objects
from the model according to rule declarations:
.. code:: python

    import tatsu
    import mymodel  # the module generated with --object-model-outfile

    model = tatsu.parse(
        mygrammar_str,
        text,
        semantics=mymodel.MyModelBuilderSemantics(),
    )
Defining Custom Models
~~~~~~~~~~~~~~~~~~~~~~
|TatSu| allows any definition of model classes:
.. code:: python

    class Expression:
        ...

    class Addition(Expression):
        ...
There's loss of functionality if model classes are not subclasses of
``objectmodel.Node`` (no ``node.children()``, ``node.parseinfo``,
``node.parent``, ``...``). For complete functionality it's better if custom
model classes inherit from ``objectmodel.Node`` and are defined as
``@tatsudataclass`` so they are configured the |TatSu| way:
.. code:: python

    from tatsu.objectmodel import Node, tatsudataclass

    @tatsudataclass
    class Expression(Node):
        ...

    @tatsudataclass
    class Addition(Expression):
        ...
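Once the custom model classes are defined, |TatSu|'s entry points need to
know about them. The options are an explicit ``constructors`` mapping, a
mapping collected with ``types_defined_in()`` from a namespace or from a
module, or a module passed as ``typedefs`` to either ``tatsu.parse()`` or
``tatsu.compile()``: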
.. code:: python

    import tatsu
    from . import model

    # explicit mapping from type names to classes
    ct = {
        'Expression': model.Expression,
        'Addition': model.Addition,
    }
    result = tatsu.parse(grammar_str, text, constructors=ct)

.. code:: python

    import tatsu
    from tatsu.builder import types_defined_in

    # collect the types from the current namespace
    ct = types_defined_in(globals())
    result = tatsu.parse(grammar_str, text, constructors=ct)

.. code:: python

    import tatsu
    from tatsu.builder import types_defined_in
    from . import model

    # collect the types from a module
    ct = types_defined_in(model)
    result = tatsu.parse(grammar_str, text, constructors=ct)

.. code:: python

    import tatsu
    from . import model

    result = tatsu.parse(grammar_str, text, typedefs=model)

.. code:: python

    import tatsu
    from . import model

    grammar_model = tatsu.compile(grammar_str, typedefs=model)
    result = grammar_model.parse(text)
Passing ``constructors=`` or ``typedefs=`` to the |TatSu| API implies that
a model instead of an AST_ is being requested (``asmodel=True``).
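An explicit ``constructors`` mapping and one collected from a namespace have
the same shape: class names as keys and classes as values. The plain-Python
sketch below shows that shape. ``collect_types`` is a hypothetical stand-in
written for illustration and is not how ``types_defined_in()`` is necessarily
implemented.

```python
import inspect


class Expression:
    ...


class Addition(Expression):
    ...


def collect_types(namespace):
    """Map class names to classes found in a dict or module (hypothetical)."""
    if inspect.ismodule(namespace):
        namespace = vars(namespace)
    return {
        name: value
        for name, value in namespace.items()
        if inspect.isclass(value) and not name.startswith('_')
    }


# Non-class entries such as ``x`` are ignored.
ct = collect_types({'Expression': Expression, 'Addition': Addition, 'x': 1})
```

The hand-written dictionary is more explicit, while collecting from a module
avoids having to update the mapping every time a class is added.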
To see what ``@tatsudataclass`` does, look at
``objectmodel.TatSuDataclassParams`` for the ``dataclass`` parameters it uses.
Viewing Models as JSON
~~~~~~~~~~~~~~~~~~~~~~
Models generated by |TatSu| can be viewed by converting them to a
JSON-compatible structure with the help of ``tatsu.util.asjson()``.
The protocol tries to provide the best representation for common types,
and can handle any type using ``repr()``. Back references are handled to
prevent infinite recursion.
.. code:: python

    import json

    from tatsu.util import asjson

    print(json.dumps(asjson(model), indent=2))
The ``model``, with richer semantics, remains unaltered.
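The handling of back references can be pictured with a small plain-Python
sketch. The function ``to_jsonable`` and the ``<ref ...>`` marker are
hypothetical and only illustrate the idea of tracking objects already being
converted. They are not the output format or the implementation of
``asjson()``.

```python
import json


def to_jsonable(obj, seen=None):
    """Convert ``obj`` to JSON-compatible data (hypothetical sketch)."""
    seen = set() if seen is None else seen
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    if id(obj) in seen:
        # Back reference: emit a marker instead of recursing forever.
        return f'<ref {type(obj).__name__}>'
    # Track only the objects on the current path, so shared (non-cyclic)
    # values are still rendered in full wherever they appear.
    seen = seen | {id(obj)}
    if isinstance(obj, dict):
        return {str(k): to_jsonable(v, seen) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [to_jsonable(v, seen) for v in obj]
    if hasattr(obj, '__dict__'):
        return {k: to_jsonable(v, seen) for k, v in vars(obj).items()}
    return repr(obj)  # last resort for unknown types


class Child:
    def __init__(self, parent):
        self.parent = parent  # points back at the owner


class Parent:
    def __init__(self):
        self.name = 'root'
        self.child = Child(self)


data = to_jsonable(Parent())
text = json.dumps(data, indent=2)
```

The original objects are only read, never modified, which matches the
statement that the model remains unaltered.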
Conversion to a JSON-compatible structure relies on the protocol defined by
``tatsu.util.asjson.AsJSONMixin``. The mixin defines a ``__json__()``
method that allows classes to define their best translation.
You can use ``AsJSONMixin`` as a base class in your own models to take advantage
of ``asjson()``, and you can specialize the conversion by overriding ``AsJSONMixin.__json__()``.
.. code:: python

    def __json__(self, seen: set[int] | None = None) -> Any:
        return None  # should not be rendered as JSON
The ``AsJSONMixin`` implementation of ``__json__()`` decides what goes into
the JSON representation by calling the ``__pub__()`` method. The default
implementation of ``__pub__()`` returns the contents of ``vars(self)``
filtering out ``(name, value)`` items when:
* ``name`` starts with an underscore
* ``value`` is a method that is not also a ``property``
An easy way to restrict what goes into the JSON output is to override
the ``__pub__()`` method in classes that inherit from ``AsJSONMixin``.
.. code:: python

    def __pub__(self) -> dict[str, Any]:
        # drop attributes whose names start with an uppercase letter
        return {
            name: value for name, value in super().__pub__().items()
            if not name[0].isupper()
        }
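The filtering rule and an override can be tried out without |TatSu| using a
stand-in class. ``PubMixin`` and ``Token`` are hypothetical names, and the
default rule here only approximates the description above (it drops
underscore-prefixed names and routine values found in ``vars(self)``).

```python
import inspect
from typing import Any


class PubMixin:
    """Hypothetical stand-in that mimics the described default rule."""

    def __pub__(self) -> dict[str, Any]:
        return {
            name: value
            for name, value in vars(self).items()
            if not name.startswith('_') and not inspect.isroutine(value)
        }


class Token(PubMixin):
    def __init__(self, text, line):
        self.text = text
        self.line = line
        self._cache = {}  # private: dropped by the default rule
        self.RAW = text   # uppercase: dropped by the override below

    def __pub__(self) -> dict[str, Any]:
        # Start from the default selection and restrict it further.
        return {
            name: value
            for name, value in super().__pub__().items()
            if not name[0].isupper()
        }


public = Token('x', 3).__pub__()
```

Building on ``super().__pub__()`` keeps the default exclusions in place, so
the override only has to express the extra restriction.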
You can also write your own version of ``asjson()`` to handle special cases that are recurrent
in your context.
Walking Models
~~~~~~~~~~~~~~
The class ``tatsu.walkers.NodeWalker`` allows for the easy traversal
(*walk*) of a model constructed with a ``ModelBuilderSemantics`` instance:
.. code:: python

    from tatsu.semantics import ModelBuilderSemantics
    from tatsu.walkers import NodeWalker

    class MyNodeWalker(NodeWalker):

        def walk_AddOperator(self, node):
            left = self.walk(node.left)
            right = self.walk(node.right)

            print('ADDED', left, right)

    model = MyParser(semantics=ModelBuilderSemantics()).parse(input)

    walker = MyNodeWalker()
    walker.walk(model)
When a method with a name like ``walk_AddOperator()`` is defined, it
will be called when a node of that type is *walked*. The *pythonic*
version of the class name may also be used for the *walk* method:
``walk__add_operator()`` (note the double underscore).
If a *walk* method for a node class is not found, then a method for the
class's bases is searched. That makes it possible to write *catch-all*
methods such as:
.. code:: python

    def walk_Node(self, node):
        print('Reached Node', node)

    def walk_str(self, s):
        return s

    def walk_object(self, o):
        raise Exception(f'Unexpected type {type(o).__name__} walked')
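The search through base classes can be sketched in plain Python by following
the method resolution order of the node's type. ``MiniWalker`` and the node
classes below are hypothetical and exist only to show the dispatch idea. The
real ``NodeWalker`` may resolve methods differently, and this sketch does not
cover the *pythonic* method names.

```python
class MiniWalker:
    """Hypothetical sketch of name-based dispatch; not ``NodeWalker``."""

    def walk(self, node):
        # Try the node's own class first, then each base class in MRO order.
        for cls in type(node).__mro__:
            method = getattr(self, f'walk_{cls.__name__}', None)
            if method is not None:
                return method(node)
        return None


class Node:
    pass


class Operator(Node):
    def __init__(self, left, right):
        self.left = left
        self.right = right


class AddOperator(Operator):
    pass


class Evaluator(MiniWalker):
    def walk_AddOperator(self, node):
        return self.walk(node.left) + self.walk(node.right)

    def walk_int(self, n):
        return n

    def walk_object(self, o):
        # Catch-all: every class has ``object`` at the end of its MRO.
        raise TypeError(f'Unexpected type {type(o).__name__} walked')


total = Evaluator().walk(AddOperator(1, AddOperator(2, 3)))
```

Because ``object`` is the last entry in every MRO, a ``walk_object()`` method
only runs when nothing more specific matched.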
Which nodes get *walked* is up to the ``NodeWalker`` implementation. Some
strategies for walking *all* or *most* nodes are implemented as classes
in ``tatsu.walkers``, such as ``PreOrderWalker`` and ``DepthFirstWalker``.
Sometimes nodes must be walked more than once for the purpose at hand, and it's
up to the walker how and when to do that.
Take a look at ``tatsu.ngcodegen.PythonParserGenerator`` for the walker that
generates a parser in Python from the model of a parsed grammar.
Model Class Hierarchies
~~~~~~~~~~~~~~~~~~~~~~~
It's possible to specify a base class for generated model nodes:
.. code:: ocaml

    additive
        =
        | addition
        | substraction
        ;

    addition::AddOperator::Operator
        =
        left:mulexpre op:'+' right:additive
        ;

    substraction::SubstractOperator::Operator
        =
        left:mulexpre op:'-' right:additive
        ;
|TatSu| will generate the base class if it's not already known.
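Written by hand, the hierarchy those rules declare might look like the sketch
below. Plain dataclasses are used so the example runs without |TatSu|. Real
model classes would normally derive from ``objectmodel.Node``, and the field
types and defaults shown here are assumptions.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Operator:
    # One attribute per named element in the rules.
    left: Any = None
    op: Any = None
    right: Any = None


@dataclass
class AddOperator(Operator):
    pass


@dataclass
class SubstractOperator(Operator):
    pass


nodes = [AddOperator(1, '+', 2), SubstractOperator(5, '-', 3)]

# Both node types can be handled through their shared base class.
ops = [node.op for node in nodes if isinstance(node, Operator)]
```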
Base classes can be used as the target class in *walkers*, and in *code
generators*:
.. code:: python

    class MyNodeWalker(NodeWalker):
        def walk_Operator(self, node):
            left = self.walk(node.left)
            right = self.walk(node.right)
            op = self.walk(node.op)

            print(type(node).__name__, op, left, right)