Developing your own parsers
===========================
Sybil :term:`parsers <Parser>` are callables that take a
:term:`document` and yield a sequence of :term:`regions <region>`. A :term:`region` contains
the character position of the :attr:`~sybil.Region.start` and :attr:`~sybil.Region.end`
of the example in the document's
:attr:`~sybil.Document.text`, along with a :attr:`~sybil.Region.parsed` version of the
example and a callable :attr:`~sybil.Region.evaluator`.
Parsers are free to access any documented attribute of the :class:`~sybil.Document`, although
they will most likely only need to work with :attr:`~sybil.Document.text`.
The :attr:`~sybil.Document.namespace` attribute should **not** be modified.
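
The contract described above can be sketched without Sybil installed. This is only an illustration: ``StandInRegion``, ``check_sum`` and ``parse_sums`` are hypothetical names, the stand-in merely mirrors the four region fields named above, and a real parser would receive a :class:`~sybil.Document` and yield :class:`~sybil.Region` objects instead.

```python
import re
from typing import Any, Callable, Iterator, NamedTuple


class StandInRegion(NamedTuple):
    """Hypothetical stand-in mirroring the four region fields described above."""
    start: int
    end: int
    parsed: Any
    evaluator: Callable


def check_sum(example) -> None:
    """Placeholder evaluator; a real one would receive a sybil.Example."""


SUM_PATTERN = re.compile(r'^(\d+) \+ (\d+) = (\d+)$', re.MULTILINE)


def parse_sums(text: str) -> Iterator[StandInRegion]:
    # A real parser would be passed a Document and read document.text.
    for match in SUM_PATTERN.finditer(text):
        parsed = tuple(int(group) for group in match.groups())
        # start and end are character positions within the text.
        yield StandInRegion(match.start(), match.end(), parsed, check_sum)


text = 'Intro.\n2 + 2 = 4\nMore prose.\n'
regions = list(parse_sums(text))
```

Slicing ``text`` with a region's ``start`` and ``end`` recovers the example exactly as it appears in the document, which is the property the character positions are there to provide.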
The :attr:`~sybil.Region.parsed` version can take any form and only needs to be understood by the
:attr:`~sybil.Region.evaluator`.
That :term:`evaluator` will be called with an :term:`example` constructed from the
:term:`document` and the :term:`region`. It should return a :ref:`false value <truth>`
if the example is as expected. If the example is not as expected, it should
either raise an exception or return a textual description of the problem. Evaluators may also
modify the document's :attr:`~sybil.Document.namespace`
or :any:`push <sybil.Document.push_evaluator>` and
:any:`pop <sybil.Document.pop_evaluator>` evaluators.
:class:`~sybil.Example` instances are used to wrap up
all the attributes you're likely to need when writing an evaluator and all
documented attributes are fine to use. In particular,
:attr:`~sybil.Example.parsed` is the parsed value provided by the parser
when instantiating the :class:`~sybil.Region` and
:attr:`~sybil.Example.namespace` is a reference to the document's
namespace. Evaluators **are** free to modify the
:attr:`~sybil.Document.namespace` if they need to.
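
As a rough sketch of that evaluator contract, the function below returns ``None`` when the example is as expected, returns a description when it is not, and stores results in the namespace for later examples. ``evaluate_assignment`` and the tuple layout of ``parsed`` are assumptions made for illustration, and ``SimpleNamespace`` objects stand in for real :class:`~sybil.Example` instances.

```python
from types import SimpleNamespace


def evaluate_assignment(example):
    """Return None on success, or a description of the mismatch."""
    name, expression, expected = example.parsed
    # Names stored by earlier examples are visible to later expressions.
    actual = eval(expression, {}, dict(example.namespace))
    if actual != expected:
        return f'{name} = {expression}: got {actual!r}, expected {expected!r}'
    # Evaluators may store results for later examples in the same document.
    example.namespace[name] = actual
    return None


# SimpleNamespace is a stand-in for sybil.Example (an assumption for this sketch).
namespace = {}
first = SimpleNamespace(parsed=('x', '1 + 1', 2), namespace=namespace)
second = SimpleNamespace(parsed=('y', 'x * 10', 20), namespace=namespace)
wrong = SimpleNamespace(parsed=('z', 'x + y', 0), namespace=namespace)

results = [evaluate_assignment(example) for example in (first, second, wrong)]
```

Raising an exception, as the ``assert`` statements in the worked example do, is the other way an evaluator can signal that an example is not as expected.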
If you need to write your own parser, you should consult the :doc:`api` to see if suitable
:term:`lexers <Lexer>` already exist for the source language containing your examples.
Worked example
~~~~~~~~~~~~~~
As an example, let's look at a parser suitable for evaluating bash commands
in a subprocess and checking the output is as expected::

    .. code-block:: bash

       $ echo hi there
       hi there

.. -> bash_document_text
Since this is a reStructuredText code block, the simplest thing we could do would be to use
the existing support for :ref:`other languages <codeblock-other>`:
.. code-block:: python

    from subprocess import check_output

    from sybil import Sybil
    from sybil.parsers.rest import CodeBlockParser

    def evaluate_bash_block(example):
        command, expected = example.parsed.strip().split('\n')
        assert command.startswith('$ ')
        command = command[2:].split()
        actual = check_output(command).strip().decode('ascii')
        assert actual == expected, repr(actual) + ' != ' + repr(expected)

    bash_parser = CodeBlockParser(language='bash', evaluator=evaluate_bash_block)

    sybil = Sybil(parsers=[bash_parser], pattern='*.rst')

.. invisible-code-block: python

    from sybil.testing import check_sybil
    check_sybil(sybil, bash_document_text)
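
The evaluator above turns the command line into arguments with ``str.split()``, which is enough for ``echo hi there`` but ignores shell quoting. The sketch below compares it with :func:`shlex.split` from the standard library. The ``run_echo_stand_in`` helper is hypothetical and uses the Python interpreter in place of ``echo`` so that the comparison does not depend on the platform.

```python
import shlex
import sys
from subprocess import check_output

line = '$ echo "hi   there"'
assert line.startswith('$ ')

naive = line[2:].split()        # splits on whitespace and keeps the quote characters
quoted = shlex.split(line[2:])  # honours shell-style quoting


def run_echo_stand_in(arguments):
    # sys.executable stands in for the echo binary so the sketch is portable.
    script = 'import sys; print(*sys.argv[1:])'
    output = check_output([sys.executable, '-c', script, *arguments])
    return output.strip().decode('ascii')


naive_output = run_echo_stand_in(naive[1:])
quoted_output = run_echo_stand_in(quoted[1:])
```

Whether this matters depends on the commands your documentation contains. For unquoted commands like the one in this section the two approaches give the same arguments.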
Another alternative would be to start with the
:class:`lexer for ReST directives <sybil.parsers.rest.lexers.DirectiveLexer>`.
Here, the parsed version consists of a tuple of the command to run and the expected output:
.. code-block:: python

    from subprocess import check_output
    from typing import Iterable

    from sybil import Sybil, Document, Region, Example
    from sybil.parsers.rest.lexers import DirectiveLexer

    def evaluate_bash_block(example: Example):
        command, expected = example.parsed
        actual = check_output(command).strip().decode('ascii')
        assert actual == expected, repr(actual) + ' != ' + repr(expected)

    def parse_bash_blocks(document: Document) -> Iterable[Region]:
        lexer = DirectiveLexer(directive='code-block', arguments='bash')
        for lexed in lexer(document):
            command, output = lexed.lexemes['source'].strip().split('\n')
            assert command.startswith('$ ')
            parsed = command[2:].split(), output
            yield Region(lexed.start, lexed.end, parsed, evaluate_bash_block)

    sybil = Sybil(parsers=[parse_bash_blocks], pattern='*.rst')

.. invisible-code-block: python

    from sybil.testing import check_sybil
    check_sybil(sybil, bash_document_text)
.. _parser-from-scratch:
Finally, the parser could be implemented from scratch, with the parsed version again consisting of
a tuple of the command to run and the expected output:
.. code-block:: python

    from subprocess import check_output
    import re

    from sybil import Sybil
    from sybil.parsers.abstract.lexers import BlockLexer

    BASHBLOCK_START = re.compile(r'^\.\.\s*code-block::\s*bash')
    BASHBLOCK_END = r'(\n\Z|\n(?=\S))'

    def evaluate_bash_block(example):
        command, expected = example.parsed
        actual = check_output(command).strip().decode('ascii')
        assert actual == expected, repr(actual) + ' != ' + repr(expected)

    def parse_bash_blocks(document):
        lexer = BlockLexer(BASHBLOCK_START, BASHBLOCK_END)
        for region in lexer(document):
            command, output = region.lexemes['source'].strip().split('\n')
            assert command.startswith('$ ')
            # Fill in the region produced by the lexer rather than building a new one.
            region.parsed = command[2:].split(), output
            region.evaluator = evaluate_bash_block
            yield region

    sybil = Sybil(parsers=[parse_bash_blocks], pattern='*.rst')

.. invisible-code-block: python

    from sybil.testing import check_sybil
    check_sybil(sybil, bash_document_text)
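
To see what the two patterns above match, they can be tried directly with :mod:`re`. This sketch makes no claim about how ``BlockLexer`` applies them internally. It compiles the end pattern as-is and uses :func:`textwrap.dedent` to strip indentation, both of which are assumptions made for illustration. One thing it does show is that the start pattern, compiled without ``re.MULTILINE``, only matches when the directive is the very first thing in the text.

```python
import re
import textwrap

START = re.compile(r'^\.\.\s*code-block::\s*bash')
START_MULTILINE = re.compile(START.pattern, re.MULTILINE)
END = re.compile(r'(\n\Z|\n(?=\S))')

text = (
    'Some prose first.\n'
    '\n'
    '.. code-block:: bash\n'
    '\n'
    '   $ echo hi there\n'
    '   hi there\n'
    '\n'
    'More prose.\n'
)

# Without re.MULTILINE, ^ only matches at the very start of the text.
match_without_flag = START.search(text)

start = START_MULTILINE.search(text)
# The block ends at the first newline followed by non-indented text,
# or at the final newline of the document.
end = END.search(text, start.end())
source = textwrap.dedent(text[start.end():end.start()])
command, output = source.strip().split('\n')
```

If the directives in your documents can appear after other content, compiling the start pattern with ``re.MULTILINE`` is one option worth testing against your own documents.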
Of course, you should also write tests for your parser, showing it both succeeding and failing.
Here are examples for the Bash parser implementation at the start of this section, making use
of :func:`~sybil.testing.check_parser` to check a single example in a string against the supplied
:data:`~sybil.typing.Parser`:
.. code-block:: python

    from sybil.testing import check_parser
    from testfixtures import ShouldAssert

    def test_bash_success() -> None:
        check_parser(
            bash_parser,
            text="""
                .. code-block:: bash

                   $ echo hi there
                   hi there
            """,
        )

    def test_bash_failure() -> None:
        with ShouldAssert("'this is wrong' != 'hi there'"):
            check_parser(
                bash_parser,
                text="""
                    .. code-block:: bash

                       $ echo this is wrong
                       hi there
                """,
            )

.. invisible-code-block: python

    test_bash_success()
    test_bash_failure()
Developing with Lexers
~~~~~~~~~~~~~~~~~~~~~~
Sybil has a fairly rich selection of :term:`parsers <Parser>` and :term:`lexers <Lexer>`, so
even if your source format isn't directly supported, you may not have much work to do in order
to support it.
Take `Docusaurus code blocks`__, which add parameters to Markdown fenced code blocks. Suppose we
want to implement a parser which will execute Python code blocks in this format:
.. code-block:: markdown

    ```python title="hello.py"
    print("hello")
    ```

__ https://docusaurus.io/docs/markdown-features/code-blocks
First, let's implement a lexer that understands this extension to the Markdown format:

.. code-block:: python

    import re
    from typing import Iterable

    from sybil import Document, Region
    from sybil.parsers.markdown.lexers import RawFencedCodeBlockLexer

    class DocusaurusCodeBlockLexer(RawFencedCodeBlockLexer):

        def __init__(self) -> None:
            super().__init__(
                info_pattern=re.compile(
                    r'^(?P<language>\w+)(?:\s+(?P<params>.+))?$\n', re.MULTILINE
                ),
            )

        def __call__(self, document: Document) -> Iterable[Region]:
            for lexed in super().__call__(document):
                lexemes = lexed.lexemes
                # Replace the raw parameter string with a dict of key="value" pairs.
                raw_params = lexemes.pop('params', None)
                params = lexemes['params'] = {}
                if raw_params:
                    for match in re.finditer(r'(?P<key>\w+)="(?P<value>[^"]*)"', raw_params):
                        params[match.group('key')] = match.group('value')
                yield lexed
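
The two regular expressions used by the lexer can be exercised on their own, which is a quick way to check them before wiring them into Sybil. The ``parse_info`` helper below is hypothetical and applies the info pattern directly to an info line. How the lexer combines the pattern with the fence itself is not shown here, so treat this as a sketch of the regular expressions only.

```python
import re

INFO = re.compile(r'^(?P<language>\w+)(?:\s+(?P<params>.+))?$\n', re.MULTILINE)
PARAM = re.compile(r'(?P<key>\w+)="(?P<value>[^"]*)"')


def parse_info(text):
    """Hypothetical helper: return (language, params) for an info line, or None."""
    match = INFO.match(text)
    if match is None:
        return None
    raw_params = match.group('params') or ''
    params = {m.group('key'): m.group('value') for m in PARAM.finditer(raw_params)}
    return match.group('language'), params


with_params = parse_info('jsx title="/src/Hello.js" id="demo"\n')
without_params = parse_info('python\n')

# Used in isolation, \s+ can also match a newline, so a bare language that is
# followed directly by source has that first source line captured as params.
spill = INFO.match('python\nprint("hello")\n').group('params')
```

If that last behaviour turns out to matter in your documents, restricting the separator to spaces and tabs (``[ \t]+``) is one possible adjustment. Check it with ``check_lexer`` before relying on it.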
We can write a unit test that verifies this lexer works as follows:
.. code-block:: python

    from sybil import Region
    from sybil.testing import check_lexer

    def test_docusaurus_lexing() -> None:
        regions = check_lexer(
            lexer=DocusaurusCodeBlockLexer(),
            source_text="""
            ```jsx title="/src/components/HelloCodeTitle.js"
            function HelloCodeTitle(props) {
              return <h1>Hello, {props.name}</h1>;
            }
            ```
            """,
            expected_text=(
                '        ```jsx title="/src/components/HelloCodeTitle.js"\n'
                '        function HelloCodeTitle(props) {\n'
                '          return <h1>Hello, {props.name}</h1>;\n'
                '        }\n        ```'
            ),
            expected_lexemes={
                'language': 'jsx',
                'params': {'title': '/src/components/HelloCodeTitle.js'},
                'source': (
                    'function HelloCodeTitle(props) {\n'
                    '  return <h1>Hello, {props.name}</h1>;\n}'
                    '\n'
                ),
            }
        )

.. invisible-code-block: python

    test_docusaurus_lexing()
Once we're confident that the lexer is working as required, we can use it with the existing
:class:`~sybil.parsers.abstract.codeblock.AbstractCodeBlockParser` as follows:
.. code-block:: python

    from sybil.evaluators.python import PythonEvaluator
    from sybil.parsers.abstract.codeblock import AbstractCodeBlockParser

    class DocusaurusCodeBlockParser(AbstractCodeBlockParser):

        def __init__(self) -> None:
            super().__init__(
                lexers=[DocusaurusCodeBlockLexer()],
                language='python',
                evaluator=PythonEvaluator(),
                language_lexeme_name='language',
            )
This can then be tested as follows:
.. code-block:: python

    from sybil.testing import check_parser

    def test_docusaurus_parsing() -> None:
        document = check_parser(
            DocusaurusCodeBlockParser(),
            text="""
            ```python title="hello.py"
            x = 1
            ```
            """,
        )
        assert document.namespace['x'] == 1

.. invisible-code-block: python

    test_docusaurus_parsing()