File: translation.rst

package info (click to toggle)
python-tatsu 5.13.1%2Bds-2
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 892 kB
  • sloc: python: 10,202; makefile: 54
file content (148 lines) | stat: -rw-r--r-- 5,419 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
.. include:: links.rst

.. _mini-tutorial: mini-tutorial.rst

.. _pegen: https://github.com/we-like-parsers/pegen
.. _PEG parser: https://peps.python.org/pep-0617/

Translation
-----------

Translation is one of the most common tasks in language processing.
Analysis often sumarizes the parsed input, and *walkers* are good for that.


|TatSu| doesn't impose a way to create translators, but it
exposes the facilities it uses to generate the `Python`_ source code for
parsers.


Print Translation
~~~~~~~~~~~~~~~~~

Translation in |TatSu| is based on subclasses of ``NodeWalker``. Print-based translation
relies on classes that inherit from ``IndentPrintMixin``, a strategy copied from
the new PEG_ parser in Python_ (see `PEP 617`_).

``IndentPrintMixin`` provides an ``indent()`` method, which is a context manager,
and should be used thus:

.. code:: python

    class MyTranslationWalker(NodeWalker, IndentPrintMixin):

        def walk_SomeNodeType(self, node: NodeType):
            self.print('some preamble')
            with self.indent():
                # continue walking the tree
                self.print('something else')


The ``self.print()`` method takes note of the current level of indentation, so
output will be indented by the `indent` passed to
the ``IndentPrintMixin`` constructor, or to the ``indent(amount: int)`` method.
The mixin keeps as stack of the indent ammounts so it can go back to where it
was after each ``with indent(amount=n):`` statement:


.. code:: python

    def walk_SomeNodeType(self, node: NodeType):
        with self.indent(amount=2):
            self.print(node.exp)

The printed code can be retrieved using the ``printed_text()`` method, but other
posibilities are available by assigning a stream-like object to
``self.output_stream`` in the ``__init__()`` method.

A good example of how to do code generation with a ``NodeWalker`` and ``IndentPrintMixin``
is |TatSu|'s own code generator, which can be
found in ``tatsu/ngcodegen/python.py``, or the model
generation found in ``tatsu/ngcodegen/objectomdel.py``.


.. _PEP 617: https://peps.python.org/pep-0617/


Declarative Translation (deprecated)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


|TatSu| provides support for template-based code generation ("translation", see below)
in the ``tatsu.codegen`` module.
Code generation works by defining a translation class for each class in the model specified by the grammar.

Nowadays the preferred code generation strategy is to walk down the AST_ and `print()` the desired output,
with the help of the ``NodWalker`` class, and the ``IndentPrintMixin`` mixin. That's the strategy used
by pegen_, the precursor to the new `PEG parser`_ in Python_. Please take a lookt at the
`mini-tutorial`_ for an example.

Basically, the code generation strategy changed from declarative with library support, to procedural,
breadth or depth first, using only standard Python_. The procedural code must know the AST_ structure
to navigate it, although other strategies are available with ``PreOrderWalker``, ``DepthFirstWalker``,
and ``ContextWalker``.

|TatSu| doesn't impose a way to create translators with it, but it
exposes the facilities it uses to generate the `Python`_ source code for
parsers.

Translation in |TatSu| was *template-based*, but instead of defining or
using a complex templating engine (yet another language), it relies on
the simple but powerful ``string.Formatter`` of the `Python`_ standard
library. The templates are simple strings that, in |TatSu|'s style,
are inlined with the code.

To generate a parser, |TatSu| constructs an object model of the parsed
grammar. A ``tatsu.codegen.CodeGenerator`` instance matches model
objects to classes that descend from ``tatsu.codegen.ModelRenderer`` and
implement the translation and rendering using string templates.
Templates are left-trimmed on whitespace, like `Python`_ *doc-comments*
are. This is an example taken from |TatSu|'s source code:

.. code:: python

    class Lookahead(ModelRenderer):
        template = '''\
                    with self._if():
                    {exp:1::}\
                    '''

Every *attribute* of the object that doesn't start with an underscore
(``_``) may be used as a template field, and fields can be added or
modified by overriding the ``render_fields(fields)`` method. Fields
themselves are *lazily rendered* before being expanded by the template,
so a field may be an instance of a ``ModelRenderer`` descendant.

The ``rendering`` module defines a ``Formatter`` enhanced to support the
rendering of items in an *iterable* one by one. The syntax to achieve
that is:

.. code:: python

        '''
        {fieldname:ind:sep:fmt}
        '''

All of ``ind``, ``sep``, and ``fmt`` are optional, but the three
*colons* are not. A field specified that way will be rendered using:

.. code:: python

    indent(sep.join(fmt % render(v) for v in value), ind)

The extended format can also be used with non-iterables, in which case
the rendering will be:

.. code:: python

    indent(fmt % render(value), ind)

The default multiplier for ``ind`` is ``4``, but that can be overridden
using ``n*m`` (for example ``3*1``) in the format.

**note**
    Using a newline character (``\n``) as separator will interfere with
    left trimming and indentation of templates. To use a newline as
    separator, specify it as ``\\n``, and the renderer will understand
    the intention.