.. Copyright (c) 2017-2026 Juancarlo AƱez (apalala@gmail.com)
.. SPDX-License-Identifier: BSD-4-Clause

.. include:: links.rst


Models
------


Building Models
~~~~~~~~~~~~~~~

Naming elements in grammar rules makes the parser discard uninteresting
parts of the input, such as punctuation, from the output. With named
elements, |TatSu| produces an *Abstract Syntax Tree* (`AST`_) that reflects
the semantic structure of what was parsed. But an `AST`_ doesn't carry
information about the rule that generated it, so navigating the trees
may be difficult.

|TatSu| defines the ``tatsu.semantics.ModelBuilderSemantics`` semantics
class, which helps construct object models from abstract syntax trees:

.. code:: python

    from tatsu.semantics import ModelBuilderSemantics

    parser = MyParser(semantics=ModelBuilderSemantics())

Builder semantics can also be enabled by passing ``asmodel=True`` to the
``tatsu.compile()`` or ``tatsu.parse()`` functions.

Then add the desired node type as the first parameter of each grammar
rule:

.. code:: ocaml

    addition::AddOperator = left:mulexpre '+' right:addition ;

``ModelBuilderSemantics`` will synthesize an ``AddOperator(Node)`` class
and use it to construct the node. The synthesized class will have one
attribute for each named element in the rule, with the same name.
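
The idea of synthesizing a class from a rule's named elements can be
sketched in plain Python with the three-argument form of ``type()``. This
is a minimal, hypothetical illustration, not |TatSu|'s implementation:
``Node`` here is a stand-in base class, and ``synthesize_node_class()`` is a
made-up helper.

.. code:: python

    from typing import Any


    class Node:
        """Hypothetical stand-in for tatsu.objectmodel.Node."""

        def __init__(self, **attributes: Any) -> None:
            # one attribute per named element of the rule
            for name, value in attributes.items():
                setattr(self, name, value)


    def synthesize_node_class(typename: str, base: type = Node) -> type:
        # type(name, bases, namespace) creates a new class at runtime
        return type(typename, (base,), {})


    # named elements of `addition::AddOperator = left:mulexpre '+' right:addition ;`
    # (plain strings instead of nested nodes, to keep the sketch small)
    named_elements = {'left': '1', 'right': '2'}

    AddOperator = synthesize_node_class('AddOperator')
    node = AddOperator(**named_elements)
    print(type(node).__name__, node.left, node.right)  # AddOperator 1 2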

You can also use `Python`_'s built-in types as node types, and
``ModelBuilderSemantics`` will build a value of that type from the rule's
result (here, an ``int`` from the matched digits):


.. code:: ocaml

    integer::int = /[0-9]+/ ;

``ModelBuilderSemantics`` acts as any other semantics class, so its
default behavior can be overridden by defining a method to handle the
result of any particular grammar rule.
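
To illustrate how a method named after a rule can take precedence over a
default behavior, here is a toy dispatcher in plain Python. It is a
hypothetical sketch, not |TatSu|'s code: ``ToyModelBuilder``,
``MySemantics``, ``apply_semantics()``, and ``_default()`` are made-up
names, and the default simply converts the result with a built-in type when
the rule declares one, as with ``integer::int`` above.

.. code:: python

    import builtins
    from typing import Any


    class ToyModelBuilder:
        """Hypothetical stand-in for a model-building semantics class."""

        def _default(self, ast: Any, typename: str | None = None) -> Any:
            # convert with a built-in type such as int when the rule names one
            constructor = getattr(builtins, typename, None) if typename else None
            if isinstance(constructor, type):
                return constructor(ast)
            return ast


    class MySemantics(ToyModelBuilder):
        # a method named after the `integer` rule overrides the default
        def integer(self, ast: Any, typename: str | None = None) -> Any:
            return ('integer', int(ast))


    def apply_semantics(semantics: Any, rule: str, ast: Any, typename: str | None = None) -> Any:
        # a method named after the rule wins; otherwise use the default
        method = getattr(semantics, rule, None) or semantics._default
        return method(ast, typename)


    print(apply_semantics(ToyModelBuilder(), 'integer', '42', 'int'))  # 42
    print(apply_semantics(MySemantics(), 'integer', '42', 'int'))      # ('integer', 42)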


Generating Models
~~~~~~~~~~~~~~~~~

To see what the model classes for a grammar look like, use the ``tatsu``
command-line tool to generate a module definition with the required classes:

.. code:: bash

    $ tatsu --object-model mygrammar.tatsu


You can capture the output, or specify the module filename with the
``--object-model-outfile`` option to ``tatsu``.

.. code:: bash

    $ tatsu --object-model-outfile mymodel.py mygrammar.tatsu


|TatSu| will generate a ``mymodel.MyModelBuilderSemantics`` class that can
be passed as semantics to the ``parse()`` function so that it builds objects
from the model according to the rule declarations:

.. code:: python

    import tatsu

    import mymodel  # the module generated with --object-model-outfile

    model = tatsu.parse(
        mygrammar_str,
        text,
        semantics=mymodel.MyModelBuilderSemantics(),
    )



Defining Custom Models
~~~~~~~~~~~~~~~~~~~~~~

|TatSu| accepts any user-defined classes as model classes:

.. code:: python

    class Expression:
        ...

    class Addition(Expression):
        ...

Some functionality is lost if model classes are not subclasses of
``objectmodel.Node`` (there is no ``node.children()``, ``node.parseinfo``,
``node.parent``, and so on). For complete functionality, custom model
classes should inherit from ``objectmodel.Node`` and be decorated with
``@tatsudataclass`` so they are configured the |TatSu| way:

.. code:: python

    from tatsu.objectmodel import Node, tatsudataclass

    @tatsudataclass
    class Expression(Node):
        ...

    @tatsudataclass
    class Addition(Expression):
        ...

Once the custom model classes are defined, |TatSu|'s entry points need to
know about them. There are several ways to do that. The most explicit is to
pass a mapping from node type names to classes with ``constructors=``:


.. code:: python

    from . import model

    ct = {
        'Expression': model.Expression,
        'Addition': model.Addition,
    }
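    result = tatsu.parse(grammar_str, text, constructors=ct)

Alternatively, ``types_defined_in()`` from ``tatsu.builder`` collects the
types defined in a namespace, such as the current module's ``globals()``:

.. code:: python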

    from tatsu.builder import types_defined_in

    ct = types_defined_in(globals())
    result = tatsu.parse(grammar_str, text, constructors=ct)

``types_defined_in()`` also accepts an imported module:

.. code:: python

    from tatsu.builder import types_defined_in
    from . import model

    ct = types_defined_in(model)
    result = tatsu.parse(grammar_str, text, constructors=ct)

The module can also be passed directly with ``typedefs=``:

.. code:: python

    from . import model

    result = tatsu.parse(grammar_str, text, typedefs=model)

``tatsu.compile()`` accepts ``typedefs=`` as well:

.. code:: python

    from . import model

    grammar_model = tatsu.compile(grammar_str, typedefs=model)
    result = grammar_model.parse(text)

Passing ``constructors=`` or ``typedefs=`` to the |TatSu| API implies that
a model instead of an AST_ is being requested (``asmodel=True``).
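
The ``types_defined_in()`` examples above collect classes from a namespace.
As a rough mental model only, that collection can be approximated by
filtering a namespace for the classes defined in it. ``collect_types()`` is
a hypothetical helper, not the actual ``types_defined_in()``, whose
filtering rules and return type may differ; a throwaway module built with
``types.ModuleType`` stands in for ``model``.

.. code:: python

    import types
    from typing import Any


    def collect_types(namespace: dict[str, Any]) -> dict[str, type]:
        """Hypothetical helper: map class names to the classes defined in `namespace`."""
        modname = namespace.get('__name__')
        return {
            name: value
            for name, value in namespace.items()
            # keep classes, but skip the ones imported from other modules
            if isinstance(value, type) and value.__module__ == modname
        }


    # a throwaway module standing in for a `model.py` file
    model = types.ModuleType('model')
    exec(
        'class Expression: ...\n'
        'class Addition(Expression): ...\n'
        'from collections import OrderedDict\n',
        model.__dict__,
    )

    print(sorted(collect_types(vars(model))))  # ['Addition', 'Expression']

Checking ``__module__`` is what keeps ``OrderedDict`` out of the result even
though it is a class visible in the module's namespace.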

To see which ``dataclass`` parameters ``@tatsudataclass`` applies, take a
look at ``objectmodel.TatSuDataclassParams``.


Viewing Models as JSON
~~~~~~~~~~~~~~~~~~~~~~

Models generated by |TatSu| can be viewed by converting them to a
JSON-compatible structure with the help of ``tatsu.util.asjson()``.
The conversion tries to produce the best representation for common types,
and falls back to ``repr()`` for any other type. Back references are
handled to prevent infinite recursion.
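
The behavior described above (natural representations for common types, a
``repr()`` fallback, and protection against back references) can be
sketched in a few lines. ``to_jsonable()`` and ``Pair`` are hypothetical
names written for illustration; the real ``asjson()`` handles more cases,
and its output format may differ.

.. code:: python

    import json
    from typing import Any


    def to_jsonable(obj: Any, seen: set[int] | None = None) -> Any:
        """Hypothetical sketch of an asjson()-style conversion."""
        seen = set() if seen is None else seen
        if obj is None or isinstance(obj, (bool, int, float, str)):
            return obj
        if id(obj) in seen:
            # a back reference: emit a marker instead of recursing forever
            return f'<recursion: {type(obj).__name__}>'
        seen.add(id(obj))
        try:
            if isinstance(obj, dict):
                return {str(k): to_jsonable(v, seen) for k, v in obj.items()}
            if isinstance(obj, (list, tuple, set)):
                return [to_jsonable(v, seen) for v in obj]
            if hasattr(obj, '__dict__'):
                return {k: to_jsonable(v, seen) for k, v in vars(obj).items()}
            return repr(obj)  # fallback for any other type
        finally:
            # only ancestors count as back references, so forget obj on the way out
            seen.discard(id(obj))


    class Pair:
        def __init__(self, left: Any, right: Any) -> None:
            self.left = left
            self.right = right
            self.parent: Any = None


    root = Pair(1, Pair(2, complex(3, 4)))
    root.right.parent = root  # a back reference to an ancestor

    print(json.dumps(to_jsonable(root), indent=2))

Tracking only the objects on the current path means the same object may
appear in several sibling branches without being mistaken for a cycle.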

.. code:: python

    import json

    from tatsu.util import asjson

    print(json.dumps(asjson(model), indent=2))

The conversion leaves ``model``, with its richer semantics, unaltered.

Conversion to a JSON-compatible structure relies on the protocol defined by
``tatsu.utils.asjson.AsJSONMixin``. The mixin defines a ``__json__()``
method that lets classes define their own best translation.

You can use ``AsJSONMixin`` as a base class in your own models to take advantage
of ``asjson()``, and you can specialize the conversion by overriding ``AsJSONMixin.__json__()``.

.. code:: python

    from typing import Any

    # in a class that inherits from AsJSONMixin
    def __json__(self, seen: set[int] | None = None) -> Any:
        return None  # should not be rendered as JSON

The ``AsJSONMixin`` implementation of ``__json__()`` decides what goes into
the JSON representation by calling the ``__pub__()`` method. The default
implementation of ``__pub__()`` returns the contents of ``vars(self)``,
filtering out ``(name, value)`` items when:

*   ``name`` starts with an underscore
*   ``value`` is a method that is not also a ``property``
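
The two filtering rules above can be approximated in plain Python.
``PublicAttrsMixin`` and ``Addition`` are hypothetical names written for
illustration, not ``AsJSONMixin`` itself: the sketch uses
``inspect.ismethod()`` to drop bound methods stored as instance attributes,
and because it only looks at ``vars(self)``, class-level properties never
reach it.

.. code:: python

    import inspect
    from typing import Any


    class PublicAttrsMixin:
        """Hypothetical stand-in for the default __pub__() filtering."""

        def __pub__(self) -> dict[str, Any]:
            return {
                name: value
                for name, value in vars(self).items()
                # skip private names and bound methods stored on the instance
                if not name.startswith('_') and not inspect.ismethod(value)
            }


    class Addition(PublicAttrsMixin):
        def __init__(self, left: Any, right: Any) -> None:
            self.left = left
            self.right = right
            self._cache: dict[str, Any] = {}  # private name: filtered out
            self.on_walk = self.describe       # bound method: filtered out

        def describe(self) -> str:
            return f'{self.left} + {self.right}'


    print(Addition(1, 2).__pub__())  # {'left': 1, 'right': 2}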

An easy way to restrict what goes into the JSON output is to override
the ``__pub__()`` method in classes that inherit from ``AsJSONMixin``.


.. code:: python

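    from typing import Any

    # in a class that inherits from AsJSONMixin
    def __pub__(self) -> dict[str, Any]:
        # drop attributes whose names start with an uppercase letter
        return {
            name: value for name, value in super().__pub__().items()
            if not name[0].isupper()
        }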

You can also write your own version of ``asjson()`` to handle special cases
that recur in your context.

Walking Models
~~~~~~~~~~~~~~

The ``tatsu.walkers.NodeWalker`` class makes it easy to traverse (*walk*)
a model constructed with a ``ModelBuilderSemantics`` instance:

.. code:: python

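    from tatsu.semantics import ModelBuilderSemantics
    from tatsu.walkers import NodeWalker

    class MyNodeWalker(NodeWalker):

        def walk_AddOperator(self, node):
            # walk the operands first, then act on this node
            left = self.walk(node.left)
            right = self.walk(node.right)

            print('ADDED', left, right)

    # MyParser stands for a parser generated from your grammar
    model = MyParser(semantics=ModelBuilderSemantics()).parse(text)

    walker = MyNodeWalker()
    walker.walk(model)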

When a method with a name like ``walk_AddOperator()`` is defined, it
will be called when a node of that type is *walked*. The *pythonic*
version of the class name may also be used for the *walk* method:
``walk__add_operator()`` (note the double underscore).
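
One way to derive such a *pythonic* name is to prefix every uppercase
letter with an underscore and lowercase the result, which accounts for the
double underscore after ``walk_``. This is a hypothetical sketch;
``pythonic_name()`` and ``walk_method_names()`` are made-up helpers, and
|TatSu|'s own conversion may treat edge cases such as acronyms differently.

.. code:: python

    import re


    def pythonic_name(classname: str) -> str:
        # 'AddOperator' -> '_Add_Operator' -> '_add_operator'
        return re.sub(r'([A-Z])', r'_\1', classname).lower()


    def walk_method_names(classname: str) -> list[str]:
        # candidate method names, in the order a walker might try them
        return [f'walk_{classname}', f'walk_{pythonic_name(classname)}']


    print(walk_method_names('AddOperator'))
    # ['walk_AddOperator', 'walk__add_operator']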

If a *walk* method for a node class is not found, then a method for one of
the class's bases is searched for. That makes it possible to write
*catch-all* methods such as:

.. code:: python

    def walk_Node(self, node):
        print('Reached Node', node)

    def walk_str(self, s):
        return s

    def walk_object(self, o):
        raise Exception(f'Unexpected type {type(o).__name__} walked')
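
The fallback to base classes can be modeled by going through the node's
method resolution order (``type(node).__mro__``) and calling the first
``walk_`` method found. ``ToyWalker`` is a hypothetical sketch, not
``tatsu.walkers.NodeWalker``; since every class's MRO ends with ``object``,
a ``walk_object()`` method catches whatever nothing else handled.

.. code:: python

    from typing import Any


    class Node: ...
    class Operator(Node): ...
    class AddOperator(Operator): ...


    class ToyWalker:
        """Hypothetical walker that dispatches on the node's class and its bases."""

        def walk(self, node: Any) -> Any:
            for cls in type(node).__mro__:
                method = getattr(self, f'walk_{cls.__name__}', None)
                if method is not None:
                    return method(node)
            return None  # no walk method found


    class MyWalker(ToyWalker):
        def walk_Node(self, node: Node) -> str:
            return f'Reached Node {type(node).__name__}'

        def walk_str(self, s: str) -> str:
            return s

        def walk_object(self, o: object) -> str:
            raise TypeError(f'Unexpected type {type(o).__name__} walked')


    walker = MyWalker()
    print(walker.walk(AddOperator()))  # Reached Node AddOperator
    print(walker.walk('text'))         # text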

Which nodes get *walked* is up to the ``NodeWalker`` implementation. Some
strategies for walking *all* or *most* nodes are implemented as classes
in ``tatsu.walkers``, such as ``PreOrderWalker`` and ``DepthFirstWalker``.
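
For comparison, here are two generic depth-first orders over a toy tree:
pre-order (a node before its children) and post-order (children before
their parent). ``ToyNode``, ``preorder()``, and ``postorder()`` are
hypothetical illustrations; they are not the implementations of
``PreOrderWalker`` or ``DepthFirstWalker``.

.. code:: python

    from collections.abc import Iterator
    from dataclasses import dataclass, field


    @dataclass
    class ToyNode:
        name: str
        children: list['ToyNode'] = field(default_factory=list)


    def preorder(node: ToyNode) -> Iterator[str]:
        yield node.name  # visit the node first
        for child in node.children:
            yield from preorder(child)


    def postorder(node: ToyNode) -> Iterator[str]:
        for child in node.children:
            yield from postorder(child)
        yield node.name  # visit the node after its children


    # the tree for `1 + 2 * 3`
    tree = ToyNode('add', [ToyNode('1'), ToyNode('mul', [ToyNode('2'), ToyNode('3')])])

    print(list(preorder(tree)))   # ['add', '1', 'mul', '2', '3']
    print(list(postorder(tree)))  # ['1', '2', '3', 'mul', 'add']

Post-order suits computations that need the results of the children first,
such as evaluating an expression.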

Sometimes nodes must be walked more than once for the purpose at hand, and it's
up to the walker how and when to do that.

Take a look at ``tatsu.ngcodegen.PythonParserGenerator`` for the walker that
generates a parser in Python from the model of a parsed grammar.


Model Class Hierarchies
~~~~~~~~~~~~~~~~~~~~~~~

It's possible to specify a base class for generated model nodes:

.. code:: ocaml

    additive
        =
        | addition
        | subtraction
        ;

    addition::AddOperator::Operator
        =
        left:mulexpre op:'+' right:additive
        ;

    subtraction::SubtractOperator::Operator
        =
        left:mulexpre op:'-' right:additive
        ;

|TatSu| will generate the base class if it's not already known.

Base classes can be used as the target class in *walkers*, and in *code
generators*:

.. code:: python

    from tatsu.walkers import NodeWalker

    class MyNodeWalker(NodeWalker):
        def walk_Operator(self, node):
            left = self.walk(node.left)
            right = self.walk(node.right)
            op = self.walk(node.op)

            print(type(node).__name__, op, left, right)