1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148
|
.. include:: links.rst
.. _mini-tutorial: mini-tutorial.rst
.. _pegen: https://github.com/we-like-parsers/pegen
.. _PEG parser: https://peps.python.org/pep-0617/
Translation
-----------
Translation is one of the most common tasks in language processing.
Analysis often sumarizes the parsed input, and *walkers* are good for that.
|TatSu| doesn't impose a way to create translators, but it
exposes the facilities it uses to generate the `Python`_ source code for
parsers.
Print Translation
~~~~~~~~~~~~~~~~~
Translation in |TatSu| is based on subclasses of ``NodeWalker``. Print-based translation
relies on classes that inherit from ``IndentPrintMixin``, a strategy copied from
the new PEG_ parser in Python_ (see `PEP 617`_).
``IndentPrintMixin`` provides an ``indent()`` method, which is a context manager,
and should be used thus:
.. code:: python
class MyTranslationWalker(NodeWalker, IndentPrintMixin):
def walk_SomeNodeType(self, node: NodeType):
self.print('some preamble')
with self.indent():
# continue walking the tree
self.print('something else')
The ``self.print()`` method takes note of the current level of indentation, so
output will be indented by the `indent` passed to
the ``IndentPrintMixin`` constructor, or to the ``indent(amount: int)`` method.
The mixin keeps as stack of the indent ammounts so it can go back to where it
was after each ``with indent(amount=n):`` statement:
.. code:: python
def walk_SomeNodeType(self, node: NodeType):
with self.indent(amount=2):
self.print(node.exp)
The printed code can be retrieved using the ``printed_text()`` method, but other
posibilities are available by assigning a stream-like object to
``self.output_stream`` in the ``__init__()`` method.
A good example of how to do code generation with a ``NodeWalker`` and ``IndentPrintMixin``
is |TatSu|'s own code generator, which can be
found in ``tatsu/ngcodegen/python.py``, or the model
generation found in ``tatsu/ngcodegen/objectomdel.py``.
.. _PEP 617: https://peps.python.org/pep-0617/
Declarative Translation (deprecated)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|TatSu| provides support for template-based code generation ("translation", see below)
in the ``tatsu.codegen`` module.
Code generation works by defining a translation class for each class in the model specified by the grammar.
Nowadays the preferred code generation strategy is to walk down the AST_ and `print()` the desired output,
with the help of the ``NodWalker`` class, and the ``IndentPrintMixin`` mixin. That's the strategy used
by pegen_, the precursor to the new `PEG parser`_ in Python_. Please take a lookt at the
`mini-tutorial`_ for an example.
Basically, the code generation strategy changed from declarative with library support, to procedural,
breadth or depth first, using only standard Python_. The procedural code must know the AST_ structure
to navigate it, although other strategies are available with ``PreOrderWalker``, ``DepthFirstWalker``,
and ``ContextWalker``.
|TatSu| doesn't impose a way to create translators with it, but it
exposes the facilities it uses to generate the `Python`_ source code for
parsers.
Translation in |TatSu| was *template-based*, but instead of defining or
using a complex templating engine (yet another language), it relies on
the simple but powerful ``string.Formatter`` of the `Python`_ standard
library. The templates are simple strings that, in |TatSu|'s style,
are inlined with the code.
To generate a parser, |TatSu| constructs an object model of the parsed
grammar. A ``tatsu.codegen.CodeGenerator`` instance matches model
objects to classes that descend from ``tatsu.codegen.ModelRenderer`` and
implement the translation and rendering using string templates.
Templates are left-trimmed on whitespace, like `Python`_ *doc-comments*
are. This is an example taken from |TatSu|'s source code:
.. code:: python
class Lookahead(ModelRenderer):
template = '''\
with self._if():
{exp:1::}\
'''
Every *attribute* of the object that doesn't start with an underscore
(``_``) may be used as a template field, and fields can be added or
modified by overriding the ``render_fields(fields)`` method. Fields
themselves are *lazily rendered* before being expanded by the template,
so a field may be an instance of a ``ModelRenderer`` descendant.
The ``rendering`` module defines a ``Formatter`` enhanced to support the
rendering of items in an *iterable* one by one. The syntax to achieve
that is:
.. code:: python
'''
{fieldname:ind:sep:fmt}
'''
All of ``ind``, ``sep``, and ``fmt`` are optional, but the three
*colons* are not. A field specified that way will be rendered using:
.. code:: python
indent(sep.join(fmt % render(v) for v in value), ind)
The extended format can also be used with non-iterables, in which case
the rendering will be:
.. code:: python
indent(fmt % render(value), ind)
The default multiplier for ``ind`` is ``4``, but that can be overridden
using ``n*m`` (for example ``3*1``) in the format.
**note**
Using a newline character (``\n``) as separator will interfere with
left trimming and indentation of templates. To use a newline as
separator, specify it as ``\\n``, and the renderer will understand
the intention.
|