File: parserconfig.rst

.. include:: links.rst
.. highlight:: none

Parser Configuration
--------------------

|TatSu| has many configuration options. They are all defined in
``tatsu.parserconfig.ParserConfig``. With the introduction of ``ParserConfig``
there's no need to declare every configuration parameter as an optional named
argument in entry points and internal methods. The defaults set in
``ParserConfig`` are suitable for most cases, and they are easy to override.

.. code:: python

    @dataclass
    class ParserConfig:
        name: str | None = 'Test'
        filename: str = ''
        encoding: str = 'utf-8'

        start: str | None = None  # FIXME
        start_rule: str | None = None  # FIXME
        rule_name: str | None = None  # Backward compatibility

        comments_re: re.Pattern | str | None = None  # WARNING: deprecated
        eol_comments_re: re.Pattern | str | None = None  # WARNING: deprecated

        tokenizercls: type[Tokenizer] | None = None  # FIXME
        semantics: type | None = None

        comment_recovery: bool = False   # warning: not implemented

        memoization: bool = True
        memoize_lookaheads: bool = True
        memo_cache_size: int = MEMO_CACHE_SIZE

        colorize: bool = True  # INFO: requires the colorama library
        trace: bool = False
        trace_filename: bool = False
        trace_length: int = 72
        trace_separator: str = C_DERIVE

        grammar: str | None = None
        left_recursion: bool = True

        comments: str | None = None
        eol_comments: str | None = None
        keywords: Collection[str] = field(default_factory=set)

        ignorecase: bool | None = False
        namechars: str = ''
        nameguard: bool | None = None  # implied by namechars
        whitespace: str | None = _undefined_str

        parseinfo: bool = False

Entry points and internal methods in |TatSu| have an optional
``config: ParserConfig | None = None`` argument.

.. code:: Python

    def parse(
        grammar,
        input,
        start=None,
        name=None,
        semantics=None,
        asmodel=False,
        config: ParserConfig | None = None,
        **settings,
    ):

If no ``ParserConfig`` is passed, a default one is created. Configuration
attributes may be overridden by relevant arguments in ``**settings``.

There are several ways to apply a configuration setting:

.. code:: Python

    config = tatsu.parserconfig.ParserConfig()
    config.left_recursion = False
    ast = tatsu.parse(grammar, text, config=config)

    config = tatsu.parserconfig.ParserConfig(left_recursion=False)
    ast = tatsu.parse(grammar, text, config=config)

    ast = tatsu.parse(grammar, text, left_recursion=False)
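
The forms above all end with the same setting applied. The following is a
minimal sketch of that override behaviour, using a hypothetical stand-in
dataclass (``MiniConfig`` and ``effective_config`` are illustrative names, not
part of |TatSu|) and the standard ``dataclasses.replace``. It is not the
library's actual implementation.

.. code:: Python

    from __future__ import annotations

    from dataclasses import dataclass, replace


    @dataclass
    class MiniConfig:
        # Hypothetical stand-in with two of the fields listed above.
        left_recursion: bool = True
        trace: bool = False


    def effective_config(config: MiniConfig | None = None, **settings) -> MiniConfig:
        # Create a default configuration when none is passed.
        if config is None:
            config = MiniConfig()
        # Keyword settings override the attributes of the configuration.
        # replace() returns a new object, so the caller's config is untouched.
        return replace(config, **settings)


    base = MiniConfig(trace=True)
    merged = effective_config(base, left_recursion=False)
    print(merged)  # MiniConfig(left_recursion=False, trace=True)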


name
~~~~

.. code:: Python

    name: str | None = 'Test'

The name of the grammar. It's used in generated Python parsers and may be
used in error reporting.


filename
~~~~~~~~

.. code:: Python

    filename: str = ''

The file name from which the grammar was read. It may be used in error reporting.


encoding
~~~~~~~~

.. code:: Python

    encoding: str = 'utf-8'

The encoding for any text input or output performed by the library.


start
~~~~~

.. code:: Python

    start: str | None = None  # FIXME

The name of the rule on which to start parsing. It may be used to invoke
only a specific part of the parser.

.. code:: Python

    ast = parse(grammar, '(2+2)*2', start='expression')


tokenizercls
~~~~~~~~~~~~

.. code:: Python

    tokenizercls: type[Tokenizer] | None = None  # FIXME

The class that implements tokenization for the parser. If it's not defined
then the parsing modules will default to ``buffering.Buffer``.

This option was used in the prototype PEG parser for Python so that it could
use the native Python tokenizer.


semantics
~~~~~~~~~

.. code:: Python

    semantics: type | None = None

The class implementing parser semantics. See the other sections of the
documentation for the meaning and implementation of semantics, and for the
default and generated semantic classes and objects.

memoization
~~~~~~~~~~~

.. code:: Python

    memoization: bool = True

Enable or disable memoization in the parser. Only very specific input languages
require this to be ``False``.


memoize_lookaheads
~~~~~~~~~~~~~~~~~~

.. code:: Python

    memoize_lookaheads: bool = True

Enables or disables memoization for lookaheads. Only very specific input languages
require this to be ``False``.

memo_cache_size
~~~~~~~~~~~~~~~

.. code:: Python

    memo_cache_size: int = MEMO_CACHE_SIZE

The size of the cache for memos. As parsing progresses, previous memos
are rarely needed, so there's a bound to the number of memos saved
(currently 1024).
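
As an illustration of what a bounded cache means here, the sketch below keeps
at most ``maxsize`` entries and discards the oldest one when full.
``BoundedMemo`` is a hypothetical name, and the oldest-first eviction policy is
an assumption made for the example, not a description of |TatSu|'s internals.

.. code:: Python

    from collections import OrderedDict


    class BoundedMemo(OrderedDict):
        """A dict that holds at most ``maxsize`` entries (hypothetical helper)."""

        def __init__(self, maxsize):
            super().__init__()
            self.maxsize = maxsize

        def __setitem__(self, key, value):
            super().__setitem__(key, value)
            # Drop the oldest entries once the bound is exceeded.
            while len(self) > self.maxsize:
                self.popitem(last=False)


    memos = BoundedMemo(maxsize=2)
    for pos in range(4):
        memos[(pos, 'expression')] = f'result at {pos}'

    print(list(memos))  # [(2, 'expression'), (3, 'expression')]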

colorize
~~~~~~~~

.. code:: Python

    colorize: bool = True

Colorize trace output. Colorization requires that the ``colorama`` library
is available.

trace
~~~~~

.. code:: Python

    trace: bool = False

Produce a trace of the parsing process. See the `Traces <traces.html>`_
section for more information.


trace_filename
~~~~~~~~~~~~~~

.. code:: Python

    trace_filename: bool = False

Include the input text's filename in trace output.

trace_length
~~~~~~~~~~~~

.. code:: Python

    trace_length: int = 72

The maximum width of a line in a trace.

trace_separator
~~~~~~~~~~~~~~~

.. code:: Python

    trace_separator: str = C_DERIVE

The separator to use between lines in a trace.

grammar
~~~~~~~

.. code:: Python

    grammar: str | None = None

An alias for the `name <#name>`_ option.

left_recursion
~~~~~~~~~~~~~~

.. code:: Python

    left_recursion: bool = True

Enable or disable left recursion in analysis and parsing.

comments
~~~~~~~~

.. code:: Python

    comments: str | None = None

A regular expression describing comments in the input. Comments are skipped
during parsing.

eol_comments
~~~~~~~~~~~~

.. code:: Python

    eol_comments: str | None = None

A regular expression describing end-of-line comments in the input.
Comments are skipped during parsing.
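
For illustration, the patterns below are the kind of values the ``comments``
and ``eol_comments`` options take. The specific regular expressions
(Pascal-style ``(* ... *)`` block comments and ``#`` end-of-line comments) are
assumptions chosen for the example, and they are checked here with the standard
``re`` module rather than through |TatSu| itself.

.. code:: Python

    import re

    # Hypothetical patterns for a language with (* block *) and # line comments.
    comments = r'\(\*(?:.|\n)*?\*\)'
    eol_comments = r'#[^\n]*'

    text = 'a = 1 (* first\nvalue *) # trailing note\nb = 2'

    # Removing both kinds of comment leaves only the significant input.
    stripped = re.sub(comments, '', text)
    stripped = re.sub(eol_comments, '', stripped)
    print(repr(stripped))

Strings like these would be supplied as ``comments=...`` and
``eol_comments=...``, either on a ``ParserConfig`` or as keyword settings.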


keywords
~~~~~~~~

.. code:: Python

    keywords: Collection[str] = field(default_factory=set)

The collection of keywords in the input language. See
`Reserved Words and Keywords <syntax.html#reserved-words-and-keywords>`_
for more information.

ignorecase
~~~~~~~~~~

.. code:: Python

    ignorecase: bool | None = False

When ``True``, the parser does not consider case when matching tokens.

namechars
~~~~~~~~~

.. code:: Python

    namechars: str = ''

Additional characters that can be part of an identifier
(for example ``namechars='$@'``).

nameguard
~~~~~~~~~

.. code:: Python

    nameguard: bool | None = None  # implied by namechars

When set to ``True``, the parser avoids matching a token when the next
character in the input is alphanumeric or one of the ``namechars``. The default
is ``None``, in which case the setting is implied by ``namechars``, as the
comment in the dataclass listing notes.
See `token expression <syntax.html#text-or-text>`_ for an explanation.
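
The rule can be sketched with plain string operations. ``match_token`` below is
a hypothetical helper written for this example. It is a simplified reading of
the description above and is not |TatSu|'s implementation.

.. code:: Python

    def match_token(text, pos, token, nameguard=False, namechars=''):
        """Return the end position of ``token`` at ``pos``, or None."""
        if not text.startswith(token, pos):
            return None
        end = pos + len(token)
        if nameguard and end < len(text):
            # Reject the match when the token runs into more name characters.
            nxt = text[end]
            if nxt.isalnum() or nxt in namechars:
                return None
        return end


    print(match_token('ifx', 0, 'if'))                    # 2
    print(match_token('ifx', 0, 'if', nameguard=True))    # None
    print(match_token('if x', 0, 'if', nameguard=True))   # 2
    print(match_token('if$', 0, 'if', nameguard=True, namechars='$'))  # None

With the guard on, the keyword ``if`` is not mistaken for the start of the
identifier ``ifx``.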

whitespace
~~~~~~~~~~

.. code:: Python

    whitespace: str | None = _undefined_str

Provides a regular expression for the whitespace to be ignored by the parser.
See the `@@whitespace <directives.html#whitespace-regexp>`_ section for more
information.
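
As a small illustration, a pattern such as ``r'[\t ]+'`` (an assumption made
for this example) treats spaces and tabs as ignorable while leaving newlines
significant. The behaviour of the pattern itself can be checked with the
standard ``re`` module:

.. code:: Python

    import re

    # Hypothetical whitespace setting: skip spaces and tabs, keep newlines.
    whitespace = r'[\t ]+'

    text = ' \t x\n y'
    skipped = re.match(whitespace, text)
    print(skipped.end())  # 3

    # A newline is not matched by this pattern, so it would not be skipped.
    print(re.match(whitespace, '\n y'))  # None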


parseinfo
~~~~~~~~~

.. code:: Python

    parseinfo: bool = False

When ``parseinfo==True``, a ``parseinfo`` entry is added to `AST`_ nodes
that are dict-like. The entry provides information about what was parsed and
where. See `Abstract Syntax Trees <ast.html>`_ for more information.


.. code:: Python

    class ParseInfo(NamedTuple):
        tokenizer: Any
        rule: str
        pos: int
        endpos: int
        line: int
        endline: int
        alerts: list[Alert] = []  # noqa: RUF012

        def text_lines(self):
            return self.tokenizer.get_lines(self.line, self.endline)

        def line_index(self):
            return self.tokenizer.line_index(self.line, self.endline)

        @property
        def buffer(self):
            return self.tokenizer