1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453
|
.. image:: https://img.shields.io/pypi/v/flexparser.svg
:target: https://pypi.python.org/pypi/flexparser
:alt: Latest Version
.. image:: https://img.shields.io/pypi/l/flexparser.svg
:target: https://pypi.python.org/pypi/flexparser
:alt: License
.. image:: https://img.shields.io/pypi/pyversions/flexparser.svg
:target: https://pypi.python.org/pypi/flexparser
:alt: Python Versions
.. image:: https://github.com/hgrecco/flexparser/workflows/CI/badge.svg
:target: https://github.com/hgrecco/flexparser/actions?query=workflow%3ACI
:alt: CI
.. image:: https://github.com/hgrecco/flexparser/workflows/Lint/badge.svg
:target: https://github.com/hgrecco/flexparser/actions?query=workflow%3ALint
:alt: LINTER
.. image:: https://coveralls.io/repos/github/hgrecco/flexparser/badge.svg?branch=main
:target: https://coveralls.io/github/hgrecco/flexparser?branch=main
:alt: Coverage
flexparser
==========
Why write another parser? I have asked myself the same question while
working on this project. It is clear that there are excellent parsers out
there but I wanted to experiment with another way of writing them.
The idea is quite simple. You write a class for every type of content
(called here ``ParsedStatement``) you need to parse. Each class should
have a ``from_string`` constructor. We used extensively the ``typing``
module to make the output structure easy to use and less error prone.
For example:
.. code-block:: python
from dataclasses import dataclass
import flexparser as fp
@dataclass(frozen=True)
class Assigment(fp.ParsedStatement):
"""Parses the following `this <- other`
"""
lhs: str
rhs: str
@classmethod
def from_string(cls, s):
lhs, rhs = s.split("<-")
return cls(lhs.strip(), rhs.strip())
(using a frozen dataclass is not necessary but it convenient. Being a
dataclass you get the init, str, repr, etc for free. Being frozen, sort
of immutable, makes them easier to reason around)
In certain cases you might want to signal the parser
that his class is not appropriate to parse the statement.
.. code-block:: python
@dataclass(frozen=True)
class Assigment(fp.ParsedStatement):
"""Parses the following `this <- other`
"""
lhs: str
rhs: str
@classmethod
def from_string(cls, s):
if "<-" not in s:
# This means: I do not know how to parse it
# try with another ParsedStatement class.
return None
lhs, rhs = s.split("<-")
return cls(lhs.strip(), rhs.strip())
You might also want to indicate that this is the right ``ParsedStatement``
but something is not right:
.. code-block:: python
@dataclass(frozen=True)
class InvalidIdentifier(fp.ParsingError):
value: str
@dataclass(frozen=True)
class Assigment(fp.ParsedStatement):
"""Parses the following `this <- other`
"""
lhs: str
rhs: str
@classmethod
def from_string(cls, s):
if "<-" not in s:
# This means: I do not know how to parse it
# try with another ParsedStatement class.
return None
lhs, rhs = (p.strip() for p in s.split("<-"))
if not str.isidentifier(lhs):
return InvalidIdentifier(lhs)
return cls(lhs, rhs)
Put this into ``source.txt``
.. code-block:: text
one <- other
2two <- new
three <- newvalue
one == three
and then run the following code:
.. code-block:: python
parsed = fp.parse("source.txt", Assigment)
for el in parsed.iter_statements():
print(repr(el))
will produce the following output:
.. code-block:: text
BOF(start_line=0, start_col=0, end_line=0, end_col=0, raw=None, content_hash=Hash(algorithm_name='blake2b', hexdigest='37bc23cde7cad3ece96b7abf64906c84decc116de1e0486679eb6ca696f233a403f756e2e431063c82abed4f0e342294c2fe71af69111faea3765b78cb90c03f'), path=PosixPath('/Users/grecco/Documents/code/flexparser/examples/in_readme/source1.txt'), mtime=1658550284.9419456)
Assigment(start_line=1, start_col=0, end_line=1, end_col=12, raw='one <- other', lhs='one', rhs='other')
InvalidIdentifier(start_line=2, start_col=0, end_line=2, end_col=11, raw='2two <- new', value='2two')
Assigment(start_line=3, start_col=0, end_line=3, end_col=17, raw='three <- newvalue', lhs='three', rhs='newvalue')
UnknownStatement(start_line=4, start_col=0, end_line=4, end_col=12, raw='one == three')
EOS(start_line=5, start_col=0, end_line=5, end_col=0, raw=None)
The result is a collection of ``ParsedStatement`` or ``ParsingError`` (flanked by
``BOF`` and ``EOS`` indicating beginning of file and ending of stream respectively
Alternative, it can beginning with ``BOR`` with means beginning of resource and it
is used when parsing a Python Resource provided with a package).
Notice that there are two correctly parsed statements (``Assigment``), one
error found (``InvalidIdentifier``) and one unknown (``UnknownStatement``).
Cool, right? Just writing a ``from_string`` method that outputs a datastructure
produces a usable structure of parsed objects.
Now what? Let's say we want to support equality comparison. Simply do:
.. code-block:: python
@dataclass(frozen=True)
class EqualityComparison(fp.ParsedStatement):
"""Parses the following `this == other`
"""
lhs: str
rhs: str
@classmethod
def from_string(cls, s):
if "==" not in s:
return None
lhs, rhs = (p.strip() for p in s.split("=="))
return cls(lhs, rhs)
parsed = fp.parse("source.txt", (Assigment, Equality))
for el in parsed.iter_statements():
print(repr(el))
and run it again:
.. code-block:: text
BOF(start_line=0, start_col=0, end_line=0, end_col=0, raw=None, content_hash=Hash(algorithm_name='blake2b', hexdigest='37bc23cde7cad3ece96b7abf64906c84decc116de1e0486679eb6ca696f233a403f756e2e431063c82abed4f0e342294c2fe71af69111faea3765b78cb90c03f'), path=PosixPath('/Users/grecco/Documents/code/flexparser/examples/in_readme/source1.txt'), mtime=1658550284.9419456)
Assigment(start_line=1, start_col=0, end_line=1, end_col=12, raw='one <- other', lhs='one', rhs='other')
InvalidIdentifier(start_line=2, start_col=0, end_line=2, end_col=11, raw='2two <- new', value='2two')
Assigment(start_line=3, start_col=0, end_line=3, end_col=17, raw='three <- newvalue', lhs='three', rhs='newvalue')
EqualityComparison(start_line=4, start_col=0, end_line=4, end_col=12, raw='one == three', lhs='one', rhs='three')
EOS(start_line=5, start_col=0, end_line=5, end_col=0, raw=None)
You need to group certain statements together: welcome to ``Block``
This construct allows you to group
.. code-block:: python
class Begin(fp.ParsedStatement):
@classmethod
def from_string(cls, s):
if s == "begin":
return cls()
return None
class End(fp.ParsedStatement):
@classmethod
def from_string(cls, s):
if s == "end":
return cls()
return None
class ParserConfig:
pass
class AssigmentBlock(fp.Block[Begin, Assigment, End, ParserConfig]):
pass
parsed = fp.parse("source.txt", (AssigmentBlock, Equality))
Run the code:
.. code-block:: text
BOF(start_line=0, start_col=0, end_line=0, end_col=0, raw=None, content_hash=Hash(algorithm_name='blake2b', hexdigest='37bc23cde7cad3ece96b7abf64906c84decc116de1e0486679eb6ca696f233a403f756e2e431063c82abed4f0e342294c2fe71af69111faea3765b78cb90c03f'), path=PosixPath('/Users/grecco/Documents/code/flexparser/examples/in_readme/source1.txt'), mtime=1658550284.9419456)
UnknownStatement(start_line=1, start_col=0, end_line=1, end_col=12, raw='one <- other')
UnknownStatement(start_line=2, start_col=0, end_line=2, end_col=11, raw='2two <- new')
UnknownStatement(start_line=3, start_col=0, end_line=3, end_col=17, raw='three <- newvalue')
UnknownStatement(start_line=4, start_col=0, end_line=4, end_col=12, raw='one == three')
EOS(start_line=5, start_col=0, end_line=5, end_col=0, raw=None)
Notice that there are a lot of ``UnknownStatement`` now, because we instructed
the parser to only look for assignment within a block. So change your text file to:
.. code-block:: text
begin
one <- other
2two <- new
three <- newvalue
end
one == three
and try again:
.. code-block:: text
BOF(start_line=0, start_col=0, end_line=0, end_col=0, raw=None, content_hash=Hash(algorithm_name='blake2b', hexdigest='3d8ce0051dcdd6f0f80ef789a0df179509d927874f242005ac41ed886ae0b71a30b845b9bfcb30194461c0ef6a3ca324c36f411dfafc7e588611f1eb0269bb5a'), path=PosixPath('/Users/grecco/Documents/code/flexparser/examples/in_readme/source2.txt'), mtime=1658550707.1248093)
Begin(start_line=1, start_col=0, end_line=1, end_col=5, raw='begin')
Assigment(start_line=2, start_col=0, end_line=2, end_col=12, raw='one <- other', lhs='one', rhs='other')
InvalidIdentifier(start_line=3, start_col=0, end_line=3, end_col=11, raw='2two <- new', value='2two')
Assigment(start_line=4, start_col=0, end_line=4, end_col=17, raw='three <- newvalue', lhs='three', rhs='newvalue')
End(start_line=5, start_col=0, end_line=5, end_col=3, raw='end')
EqualityComparison(start_line=6, start_col=0, end_line=6, end_col=12, raw='one == three', lhs='one', rhs='three')
EOS(start_line=7, start_col=0, end_line=7, end_col=0, raw=None)
Until now we have used ``parsed.iter_statements`` to iterate over all parsed statements.
But let's look inside ``parsed``, an object of ``ParsedProject`` type. It is a thin wrapper
over a dictionary mapping files to parsed content. Because we have provided a single file
and this does not contain a link another, our ``parsed`` object contains a single element.
The key is ``None`` indicating that the file 'source.txt' was loaded from the root location
(None). The content is a ``ParsedSourceFile`` object with the following attributes:
- **path**: full path of the source file
- **mtime**: modification file of the source file
- **content_hash**: hash of the pickled content
- **config**: extra parameters that can be given to the parser (see below).
.. code-block:: text
ParsedSource(
parsed_source=parse.<locals>.CustomRootBlock(
opening=BOF(start_line=0, start_col=0, end_line=0, end_col=0, raw=None, content_hash=Hash(algorithm_name='blake2b', hexdigest='3d8ce0051dcdd6f0f80ef789a0df179509d927874f242005ac41ed886ae0b71a30b845b9bfcb30194461c0ef6a3ca324c36f411dfafc7e588611f1eb0269bb5a'), path=PosixPath('/Users/grecco/Documents/code/flexparser/examples/in_readme/source2.txt'), mtime=1658550707.1248093),
body=(
Block.subclass_with.<locals>.CustomBlock(
opening=Begin(start_line=1, start_col=0, end_line=1, end_col=5, raw='begin'),
body=(
Assigment(start_line=2, start_col=0, end_line=2, end_col=12, raw='one <- other', lhs='one', rhs='other'),
InvalidIdentifier(start_line=3, start_col=0, end_line=3, end_col=11, raw='2two <- new', value='2two'),
Assigment(start_line=4, start_col=0, end_line=4, end_col=17, raw='three <- newvalue', lhs='three', rhs='newvalue')
),
closing=End(start_line=5, start_col=0, end_line=5, end_col=3, raw='end')),
EqualityComparison(start_line=6, start_col=0, end_line=6, end_col=12, raw='one == three', lhs='one', rhs='three')),
closing=EOS(start_line=7, start_col=0, end_line=7, end_col=0, raw=None)),
config=None
)
A few things to notice:
1. We were using a block before without knowing. The ``RootBlock`` is a
special type of Block that starts and ends automatically with the
file.
2. ``opening``, ``body``, ``closing`` are automatically annotated with the
possible ``ParsedStatement`` (plus `ParsingError`),
therefore autocompletes works in most IDEs.
3. The same is true for the defined ``ParsedStatement`` (we have use
``dataclass`` for a reason). This makes using the actual
result of the parsing a charm!.
4. That annoying ``subclass_with.<locals>`` is because we have built
a class on the fly when we used ``Block.subclass_with``. You can
get rid of it (which is actually useful for pickling) by explicit
subclassing Block in your code (see below).
Multiple source files
---------------------
Most projects have more than one source file internally connected.
A file might refer to another that also need to be parsed (e.g. an
`#include` statement in c). **flexparser** provides the ``IncludeStatement``
base class specially for this purpose.
.. code-block:: python
@dataclass(frozen=True)
class Include(fp.IncludeStatement):
"""A naive implementation of #include "file"
"""
value: str
@classmethod
def from_string(cls, s):
if s.startwith("#include "):
return None
value = s[len("#include "):].strip().strip('"')
return cls(value)
@propery
def target(self):
return self.value
The only difference is that you need to implement a ``target`` property
that returns the file name or resource that this statement refers to.
Customizing statementization
----------------------------
statementi ... what? **flexparser** works by trying to parse each statement with
one of the known classes. So it is fair to ask what is an statement in this
context and how can you configure it to your needs. A text file is split into
non overlapping strings called **statements**. Parsing work as follows:
1. each file is split into statements (can be single or multi line).
2. each statement is parsed with the first of the contextually
available ParsedStatement or Block subclassed that returns
a ``ParsedStatement`` or ``ParsingError``
You can customize how to split each line into statements with two arguments
provided to parse:
- **strip_spaces** (`bool`): indicates that leading and trailing spaces must
be removed before attempting to parse.
(default: True)
- **delimiters** (`dict`): indicates how each line must be subsplit.
(default: do not divide)
An delimiter example might be
``{";": (fp.DelimiterInclude.SKIP, fp.DelimiterAction.CONTINUE)}``
which tells the statementizer (sorry) that when a ";" is found a new statement should
begin. ``DelimiterMode.SKIP`` tells that ";" should not be added to the previous
statement nor to the next. Other valid values are ``SPLIT_AFTER`` and ``SPLIT_BEFORE``
to append or prepend the delimiter character to the previous or next statement.
The second element tells the statementizer (sorry again) what to do next:
valid values are: `CONTINUE`, `CAPTURE_NEXT_TIL_EOL`, `STOP_PARSING_LINE`, and
`STOP_PARSING`.
This is useful with comments. For example,
``{"#": (fp.DelimiterMode.WITH_NEXT, fp.DelimiterAction.CAPTURE_NEXT_TIL_EOL))}``
tells the statementizer (it is not funny anymore) that after the first "#"
it should stop splitting and capture all.
This allows:
.. code-block:: text
## This will work as a single statement
# This will work as a single statement #
# This will work as # a single statement #
a = 3 # this will produce two statements (a=3, and the rest)
Explicit Block classes
----------------------
.. code-block:: python
class AssigmentBlock(fp.Block[Begin, Assigment, End]):
pass
class EntryBlock(fp.RootBlock[Union[AssigmentBlock, Equality]]):
pass
parsed = fp.parse("source.txt", EntryBlock)
Customizing parsing
-------------------
In certain cases you might want to leave to the user some configuration
details. We have method for that!. Instead of overriding ``from_string``
override ``from_string_and_config``. The second argument is an object
that can be given to the parser, which in turn will be passed to each
``ParsedStatement`` class.
.. code-block:: python
@dataclass(frozen=True)
class NumericAssigment(fp.ParsedStatement):
"""Parses the following `this <- other`
"""
lhs: str
rhs: numbers.Number
@classmethod
def from_string_and_config(cls, s, config):
if "==" not in s:
# This means: I do not know how to parse it
# try with another ParsedStatement class.
return None
lhs, rhs = s.split("==")
return cls(lhs.strip(), config.numeric_type(rhs.strip()))
class Config:
numeric_type = float
parsed = fp.parse("source.txt", NumericAssigment, Config)
----
This project was started as a part of Pint_, the python units package.
See AUTHORS_ for a list of the maintainers.
To review an ordered list of notable changes for each version of a project,
see CHANGES_
.. _`AUTHORS`: https://github.com/hgrecco/flexparser/blob/main/AUTHORS
.. _`CHANGES`: https://github.com/hgrecco/flexparser/blob/main/CHANGES
.. _`Pint`: https://github.com/hgrecco/pint
|