Developing your own parsers
===========================

Sybil :term:`parsers <Parser>` are callables that take a
:term:`document` and yield a sequence of :term:`regions <region>`. A :term:`region` contains
the character positions of the :attr:`~sybil.Region.start` and :attr:`~sybil.Region.end`
of the example in the document's
:attr:`~sybil.Document.text`, along with a :attr:`~sybil.Region.parsed` version of the
example and a callable :attr:`~sybil.Region.evaluator`.
Parsers are free to access any documented attribute of the :class:`~sybil.Document`, although
they will most likely only need to work with :attr:`~sybil.Document.text`.
The :attr:`~sybil.Document.namespace` attribute should **not** be modified.

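To make this shape concrete, here is a minimal, stdlib-only sketch of a parser. It assumes a hypothetical ``sum: 1 2 3 = 6`` example syntax, takes the document text as a plain string and yields instances of a hypothetical ``SketchRegion`` named tuple standing in for :class:`~sybil.Region`, so that it runs without Sybil. A real parser would take a :class:`~sybil.Document` and yield :class:`~sybil.Region` instances instead.

.. code-block:: python

    import re
    from typing import Any, Callable, Iterator, NamedTuple


    class SketchRegion(NamedTuple):
        # Hypothetical stand-in for sybil.Region.
        start: int  # character offset where the example starts in the text
        end: int  # character offset just past the end of the example
        parsed: Any  # any form the evaluator understands
        evaluator: Callable[[Any], Any]


    # Hypothetical example syntax: "sum: <numbers> = <total>" on a line of its own.
    SUM_LINE = re.compile(r'^sum:(?P<numbers>[\d ]+)= *(?P<total>\d+)$', re.MULTILINE)


    def evaluate_sum_line(example: Any) -> Any:
        numbers, total = example.parsed
        # A false value (None) means the example is as expected.
        return None if sum(numbers) == total else f'{sum(numbers)} != {total}'


    def parse_sum_lines(text: str) -> Iterator[SketchRegion]:
        for match in SUM_LINE.finditer(text):
            # The parsed form here is a (list of ints, int) tuple.
            numbers = [int(n) for n in match.group('numbers').split()]
            parsed = (numbers, int(match.group('total')))
            yield SketchRegion(match.start(), match.end(), parsed, evaluate_sum_line)

Slicing the text with ``text[region.start:region.end]`` gives back exactly the matched example line.
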
The :attr:`~sybil.Region.parsed` version can take any form and only needs to be understood by the
:attr:`~sybil.Region.evaluator`.

That :term:`evaluator` will be called with an :term:`example` constructed from the
:term:`document` and the :term:`region` and should return a :ref:`false value <truth>`
if the example is as expected. Otherwise, it should
either raise an exception or return a textual description of how the
example differs from what was expected. Evaluators may also
modify the document's :attr:`~sybil.Document.namespace`
or :any:`push <sybil.Document.push_evaluator>` and
:any:`pop <sybil.Document.pop_evaluator>` evaluators.

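The contract above can be illustrated with a small, hypothetical evaluator. It assumes a parser produced a ``(numbers, expected_total)`` tuple and uses :class:`types.SimpleNamespace` as a stand-in for :class:`~sybil.Example`, so it runs without Sybil:

.. code-block:: python

    from types import SimpleNamespace
    from typing import Any, Optional


    def evaluate_total(example: Any) -> Optional[str]:
        numbers, expected_total = example.parsed
        if not all(isinstance(n, int) for n in numbers):
            # Raising an exception also marks the example as failed.
            raise TypeError(f'expected integers, got {numbers!r}')
        actual = sum(numbers)
        if actual != expected_total:
            # A non-empty string is a true value and describes the failure.
            return f'{actual} != {expected_total}'
        # None is a false value: the example is as expected.
        return None


    # Stand-ins for Example objects; only the parsed attribute is used here.
    passing_example = SimpleNamespace(parsed=([1, 2, 3], 6))
    failing_example = SimpleNamespace(parsed=([1, 2, 3], 7))

Here ``evaluate_total(passing_example)`` returns ``None`` and ``evaluate_total(failing_example)`` returns ``'6 != 7'``.
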
:class:`~sybil.Example` instances are used to wrap up
all the attributes you're likely to need when writing an evaluator, and all
documented attributes are fine to use. In particular,
:attr:`~sybil.Example.parsed` is the parsed value provided by the parser
when instantiating the :class:`~sybil.Region` and
:attr:`~sybil.Example.namespace` is a reference to the document's
namespace. Evaluators **are** free to modify the
:attr:`~sybil.Document.namespace` if they need to.

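For instance, a hypothetical evaluator might assume its parser produced a ``(name, expression)`` tuple, evaluate the expression against the document's namespace and store the result there so later examples in the same document can use it. This is only a sketch: :class:`types.SimpleNamespace` again stands in for :class:`~sybil.Example`, and :func:`eval` is only reasonable here because documentation examples are trusted input.

.. code-block:: python

    from types import SimpleNamespace
    from typing import Any, Dict


    def evaluate_assignment(example: Any) -> None:
        name, expression = example.parsed
        # Reading from the namespace makes earlier results visible to this example...
        value = eval(expression, example.namespace)
        # ...and writing to it makes this result visible to later examples.
        example.namespace[name] = value
        # Returning None (a false value) reports the example as passing.


    shared_namespace: Dict[str, Any] = {}
    evaluate_assignment(SimpleNamespace(parsed=('x', '1 + 1'), namespace=shared_namespace))
    evaluate_assignment(SimpleNamespace(parsed=('y', 'x * 10'), namespace=shared_namespace))

After both calls, ``shared_namespace['x']`` is ``2`` and ``shared_namespace['y']`` is ``20``.
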
If you need to write your own parser, you should consult the :doc:`api` to see if suitable
:term:`lexers <Lexer>` already exist for the source language containing your examples.

Worked example
~~~~~~~~~~~~~~

As an example, let's look at a parser suitable for evaluating Bash commands
in a subprocess and checking that the output is as expected::

  .. code-block:: bash

     $ echo hi there
     hi there

.. -> bash_document_text

Since this is a reStructuredText code block, the simplest approach is to use
the existing support for :ref:`other languages <codeblock-other>`:

.. code-block:: python

    from subprocess import check_output
    from sybil import Sybil
    from sybil.parsers.rest import CodeBlockParser

    def evaluate_bash_block(example):
        command, expected = example.parsed.strip().split('\n')
        assert command.startswith('$ ')
        command = command[2:].split()
        actual = check_output(command).strip().decode('ascii')
        assert actual == expected, repr(actual) + ' != ' + repr(expected)

    bash_parser = CodeBlockParser(language='bash', evaluator=evaluate_bash_block)

    sybil = Sybil(parsers=[bash_parser], pattern='*.rst')


.. invisible-code-block: python

  from sybil.testing import check_sybil
  check_sybil(sybil, bash_document_text)

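The evaluators in this section split the command with :meth:`str.split`, so an argument containing quoted spaces, such as ``echo "hi  there"``, is broken into separate pieces with the quote characters still attached. If your examples need shell-style quoting, one option (an assumption here, not something Sybil provides) is :func:`shlex.split` from the standard library, as in this hypothetical helper:

.. code-block:: python

    import shlex
    from typing import List, Tuple


    def split_bash_example(source: str) -> Tuple[List[str], str]:
        # Hypothetical helper: source is a "$ " command line followed by
        # one line of expected output, as in the example above.
        command, expected = source.strip().split('\n')
        if not command.startswith('$ '):
            raise ValueError(f'expected a line starting with "$ ", got {command!r}')
        # shlex.split honours shell-style quoting, unlike str.split.
        return shlex.split(command[2:]), expected

In the :class:`~sybil.parsers.rest.CodeBlockParser` version above, ``example.parsed`` is the block's source text, so ``command, expected = split_bash_example(example.parsed)`` could replace the first three lines of ``evaluate_bash_block``.
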
Alternatively, we could start with the
:class:`lexer for ReST directives <sybil.parsers.rest.lexers.DirectiveLexer>`.
Here, the parsed version consists of a tuple of the command to run and the expected output:

.. code-block:: python

    from subprocess import check_output
    from typing import Iterable
    from sybil import Sybil, Document, Region, Example
    from sybil.parsers.rest.lexers import DirectiveLexer

    def evaluate_bash_block(example: Example):
        command, expected = example.parsed
        actual = check_output(command).strip().decode('ascii')
        assert actual == expected, repr(actual) + ' != ' + repr(expected)

    def parse_bash_blocks(document: Document) -> Iterable[Region]:
        lexer = DirectiveLexer(directive='code-block', arguments='bash')
        for lexed in lexer(document):
            command, output = lexed.lexemes['source'].strip().split('\n')
            assert command.startswith('$ ')
            parsed = command[2:].split(), output
            yield Region(lexed.start, lexed.end, parsed, evaluate_bash_block)

    sybil = Sybil(parsers=[parse_bash_blocks], pattern='*.rst')

.. invisible-code-block: python

  from sybil.testing import check_sybil
  check_sybil(sybil, bash_document_text)

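:func:`subprocess.check_output` raises :class:`subprocess.CalledProcessError` when a command exits with a non-zero status, so with the evaluators above a failing command surfaces as that exception rather than as a comparison of output. If you would prefer to return a textual description, as the evaluator contract allows, one possible sketch uses :func:`subprocess.run` instead; ``run_and_compare`` is a hypothetical helper, not part of Sybil, and the usage line runs the current Python interpreter as a portable stand-in for ``echo``:

.. code-block:: python

    import subprocess
    import sys
    from typing import List, Optional


    def run_and_compare(command: List[str], expected: str) -> Optional[str]:
        # capture_output collects stdout and stderr; unlike check_output,
        # run() does not raise when the command exits with a non-zero status.
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            stderr = result.stderr.strip()
            return f'{command!r} exited with status {result.returncode}: {stderr}'
        actual = result.stdout.strip()
        if actual != expected:
            # Same description format as the assertion message used above.
            return repr(actual) + ' != ' + repr(expected)
        # None (a false value) means the output matched.
        return None


    greeting_check = run_and_compare([sys.executable, '-c', 'print("hi there")'], 'hi there')

Here ``greeting_check`` is ``None``. Because the parser above produces a ``(command, output)`` tuple, an evaluator built on this helper could be as short as ``lambda example: run_and_compare(*example.parsed)``.
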
.. _parser-from-scratch:

Finally, the parser could be implemented from scratch, with the parsed version again consisting of
a tuple of the command to run and the expected output:

.. code-block:: python

    from subprocess import check_output
    import re
    from sybil import Sybil, Region
    from sybil.parsers.abstract.lexers import BlockLexer

    # re.MULTILINE lets ^ match at the start of any line, not only at the start of the text.
    BASHBLOCK_START = re.compile(r'^\.\.\s*code-block::\s*bash', re.MULTILINE)
    BASHBLOCK_END = r'(\n\Z|\n(?=\S))'

    def evaluate_bash_block(example):
        command, expected = example.parsed
        actual = check_output(command).strip().decode('ascii')
        assert actual == expected, repr(actual) + ' != ' + repr(expected)

    def parse_bash_blocks(document):
        lexer = BlockLexer(BASHBLOCK_START, BASHBLOCK_END)
        for region in lexer(document):
            command, output = region.lexemes['source'].strip().split('\n')
            assert command.startswith('$ ')
            region.parsed = command[2:].split(), output
            region.evaluator = evaluate_bash_block
            yield region

    sybil = Sybil(parsers=[parse_bash_blocks], pattern='*.rst')

.. invisible-code-block: python

  from sybil.testing import check_sybil
  check_sybil(sybil, bash_document_text)

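To see what ``BASHBLOCK_END`` matches, the sketch below re-creates the block-finding step with the :mod:`re` module alone; it is an illustration with hypothetical names, not how :class:`~sybil.parsers.abstract.lexers.BlockLexer` is implemented. A block ends at the final newline of the text (``\n\Z``) or at a newline followed by a non-whitespace character (``\n(?=\S)``), that is, at the next unindented line, so blank lines inside the block do not end it. The start pattern uses :data:`re.MULTILINE` so that ``^`` can match at the start of any line rather than only at the start of the text.

.. code-block:: python

    import re
    import textwrap
    from typing import List

    # Hypothetical names: these mirror BASHBLOCK_START and BASHBLOCK_END,
    # with re.MULTILINE on the start pattern.
    SKETCH_BASH_START = re.compile(r'^\.\.\s*code-block::\s*bash', re.MULTILINE)
    SKETCH_BASH_END = re.compile(r'(\n\Z|\n(?=\S))')


    def find_bash_block_bodies(text: str) -> List[str]:
        bodies = []
        for start_match in SKETCH_BASH_START.finditer(text):
            # The first newline that is either the last character of the text or is
            # followed by a non-whitespace character; blank lines ("\n\n") don't match.
            end_match = SKETCH_BASH_END.search(text, start_match.end())
            stop = end_match.start() if end_match else len(text)
            # Remove the common indentation and surrounding blank lines.
            bodies.append(textwrap.dedent(text[start_match.end():stop]).strip())
        return bodies

Given some prose, the Bash block from the start of this section and then more prose, ``find_bash_block_bodies`` returns ``['$ echo hi there\nhi there']``, whereas the start pattern compiled without :data:`re.MULTILINE` would not match at all, because the block is not at the very start of the text.
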
Of course, you should also write tests for your parser, showing it both succeeding and failing.
Here are tests for ``bash_parser``, the first Bash parser implementation in this section, making use
of :func:`~sybil.testing.check_parser` to check a single example in a string against the supplied
:data:`~sybil.typing.Parser`:

.. code-block:: python

    from sybil.testing import check_parser
    from testfixtures import ShouldAssert

    def test_bash_success() -> None:
        check_parser(
            bash_parser,
            text="""
                .. code-block:: bash

                    $ echo hi there
                    hi there
            """,
        )

    def test_bash_failure() -> None:
        with ShouldAssert("'this is wrong' != 'hi there'"):
            check_parser(
                bash_parser,
                text="""
                    .. code-block:: bash

                        $ echo this is wrong
                        hi there
                """,
            )

.. invisible-code-block: python

  test_bash_success()
  test_bash_failure()

Developing with Lexers
~~~~~~~~~~~~~~~~~~~~~~

Sybil has a fairly rich selection of :term:`parsers <Parser>` and :term:`lexers <Lexer>`, so
even if your source format isn't directly supported, adding support for it may not take much
work.

Take `Docusaurus code blocks`__, which add parameters to Markdown fenced code blocks. Suppose we
want to implement a parser which will execute Python code blocks in this format:

.. code-block:: markdown

    ```python title="hello.py"
    print("hello")
    ```

__ https://docusaurus.io/docs/markdown-features/code-blocks

First, let's implement a lexer that understands this extension to the Markdown format:

.. code-block:: python

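    import re
    from typing import Iterable
    from sybil import Document, Region
    from sybil.parsers.markdown.lexers import RawFencedCodeBlockLexer

    class DocusaurusCodeBlockLexer(RawFencedCodeBlockLexer):

        def __init__(self) -> None:
            super().__init__(
                info_pattern=re.compile(
                    # [ \t]+ rather than \s+, as \s also matches a newline and would
                    # let the params group run on into the next line.
                    r'^(?P<language>\w+)(?:[ \t]+(?P<params>.+))?$\n', re.MULTILINE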
                ),
            )

        def __call__(self, document: Document) -> Iterable[Region]:
            for lexed in super().__call__(document):
                lexemes = lexed.lexemes
                raw_params = lexemes.pop('params', None)
                params = lexemes['params'] = {}
                if raw_params:
                    for match in re.finditer(r'(?P<key>\w+)="(?P<value>[^"]*)"', raw_params):
                        params[match.group('key')] = match.group('value')
                yield lexed

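The two regular expressions in this lexer can be tried out on their own with nothing but :mod:`re`. In the sketch below, ``parse_info_line`` and the pattern names are hypothetical; the parameter pattern is the one used above, and the info pattern is written with ``[ \t]+`` rather than ``\s+`` because ``\s`` also matches a newline, so a ``\s+`` version can treat the first line of code as parameters when a block has none. Note also that only ``key="value"`` pairs are picked up, so a bare word such as ``showLineNumbers`` is silently ignored.

.. code-block:: python

    import re
    from typing import Dict, Optional, Tuple

    SKETCH_INFO_PATTERN = re.compile(
        r'^(?P<language>\w+)(?:[ \t]+(?P<params>.+))?$\n', re.MULTILINE
    )
    SKETCH_PARAM_PATTERN = re.compile(r'(?P<key>\w+)="(?P<value>[^"]*)"')


    def parse_info_line(content: str) -> Optional[Tuple[str, Dict[str, str]]]:
        # content is assumed to be the text immediately after the opening fence.
        info = SKETCH_INFO_PATTERN.match(content)
        if info is None:
            return None
        params: Dict[str, str] = {}
        for match in SKETCH_PARAM_PATTERN.finditer(info.group('params') or ''):
            params[match.group('key')] = match.group('value')
        return info.group('language'), params

Given ``'python title="hello.py"\n'``, this returns ``('python', {'title': 'hello.py'})``; given ``'python\nx = 1\n'``, it returns ``('python', {})``.
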
We can write a unit test that verifies this lexer works as follows:

.. code-block:: python

    from sybil import Region
    from sybil.testing import check_lexer

    def test_docusaurus_lexing() -> None:
        regions = check_lexer(
            lexer=DocusaurusCodeBlockLexer(),
            source_text="""
                ```jsx title="/src/components/HelloCodeTitle.js"
                function HelloCodeTitle(props) {
                  return <h1>Hello, {props.name}</h1>;
                }
                ```
            """,
            expected_text=(
                '            ```jsx title="/src/components/HelloCodeTitle.js"\n'
                '            function HelloCodeTitle(props) {\n'
                '              return <h1>Hello, {props.name}</h1>;\n'
                '            }\n            ```'
            ),
            expected_lexemes={
                'language': 'jsx',
                'params': {'title': '/src/components/HelloCodeTitle.js'},
                'source': (
                    'function HelloCodeTitle(props) {\n'
                    '  return <h1>Hello, {props.name}</h1>;\n}'
                    '\n'
                ),
            }
        )

.. invisible-code-block: python

  test_docusaurus_lexing()

Once we're confident that the lexer is working as required, we can use it with the existing
:class:`~sybil.parsers.abstract.codeblock.AbstractCodeBlockParser` as follows:

.. code-block:: python

    from sybil.evaluators.python import PythonEvaluator
    from sybil.parsers.abstract.codeblock import AbstractCodeBlockParser

    class DocusaurusCodeBlockParser(AbstractCodeBlockParser):
        def __init__(self) -> None:
            super().__init__(
                lexers=[DocusaurusCodeBlockLexer()],
                language='python',
                evaluator=PythonEvaluator(),
                language_lexeme_name='language',
            )

This can then be tested as follows:

.. code-block:: python

    from sybil.testing import check_parser

    def test_docusaurus_parsing() -> None:
        document = check_parser(
            DocusaurusCodeBlockParser(),
            text="""
                ```python title="hello.py"
                x = 1
                ```
            """,
        )
        assert document.namespace['x'] == 1

.. invisible-code-block: python

  test_docusaurus_parsing()