# Recipes
A collection of recipes for using Lark and its various features.
## Use a transformer to parse integer tokens
Transformers are the common interface for processing matched rules and tokens.
They can be used during parsing for better performance.
```python
from lark import Lark, Transformer

class T(Transformer):
    def INT(self, tok):
        "Convert the value of `tok` from string to int, while maintaining line number & column."
        return tok.update(value=int(tok))

parser = Lark("""
start: INT*
%import common.INT
%ignore " "
""", parser="lalr", transformer=T())

print(parser.parse('3 14 159'))
```
Prints out:
```python
Tree(start, [Token(INT, 3), Token(INT, 14), Token(INT, 159)])
```
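The inline `transformer` argument only works with `parser="lalr"`. For other parsers, the same transformer can be applied after parsing with `Transformer.transform`; a minimal sketch:

```python
# Parse first, then transform in a separate pass (works with any parser,
# at the cost of building the full tree before converting the tokens).
parser = Lark("""
start: INT*
%import common.INT
%ignore " "
""")  # default parser (earley), no inline transformer

tree = parser.parse('3 14 159')
print(T().transform(tree))  # same result, produced in a second pass
```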
## Collect all comments with lexer_callbacks
`lexer_callbacks` can be used to interface with the lexer as it generates tokens.
It accepts a dictionary of the form
    {TOKEN_TYPE: callback}

where each callback is of type `f(Token) -> Token`.

It only works with the basic and contextual lexers.

This has the same effect as using a transformer, but can also process ignored tokens.
```python
from lark import Lark

comments = []

parser = Lark("""
start: INT*

COMMENT: /#.*/

%import common (INT, WS)
%ignore COMMENT
%ignore WS
""", parser="lalr", lexer_callbacks={'COMMENT': comments.append})

parser.parse("""
1 2 3 # hello
# world
4 5 6
""")

print(comments)
```
Prints out:
```python
[Token(COMMENT, '# hello'), Token(COMMENT, '# world')]
```
*Note: We don't have to return a token, because comments are ignored*
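Callbacks for token types that are *not* ignored must return a `Token`, since the parser consumes their output. As a hedged sketch (the grammar and callback below are illustrative assumptions, not part of the example above), here is a callback that normalizes names to lowercase:

```python
from lark import Lark, Token

def lowercase_name(tok: Token) -> Token:
    # Token.update replaces the value while keeping position info.
    return tok.update(value=tok.lower())

parser = Lark("""
start: NAME*
NAME: /[A-Za-z]+/
%ignore " "
""", parser="lalr", lexer_callbacks={'NAME': lowercase_name})

print(parser.parse("Alice BOB"))  # the NAME tokens arrive lowercased
```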
## CollapseAmbiguities
Parsing ambiguous texts with earley and `ambiguity='explicit'` produces a single tree with `_ambig` nodes to mark where the ambiguity occurred.
However, it's sometimes more convenient to work instead with a list of all the possible unambiguous trees.
Lark provides a utility transformer for that purpose:
```python
from lark import Lark
from lark.visitors import CollapseAmbiguities

grammar = """
!start: x y

!x: "a" "b"
  | "ab"
  | "abc"

!y: "c" "d"
  | "cd"
  | "d"
"""
parser = Lark(grammar, ambiguity='explicit')

t = parser.parse('abcd')
for x in CollapseAmbiguities().transform(t):
    print(x.pretty())
```
This prints out:
```
start
  x
    a
    b
  y
    c
    d

start
  x     ab
  y     cd

start
  x     abc
  y     d
```
While convenient, this should be used carefully, as highly ambiguous trees will soon create an exponential explosion of such unambiguous derivations.
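If you do collapse, one mitigation is to immediately reduce the list to a single tree with a heuristic. A sketch, where the "fewest subtrees" criterion is an arbitrary assumption chosen for illustration (note this still materializes every derivation; it only simplifies what happens afterwards):

```python
# Reduce the collapsed list to one tree with a simple heuristic.
# "Fewest subtrees" is an arbitrary scoring choice, not a Lark API.
trees = CollapseAmbiguities().transform(t)
best = min(trees, key=lambda tree: len(list(tree.iter_subtrees())))
print(best.pretty())
```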
## Keeping track of parents when visiting
The following visitor assigns a `parent` attribute to every node in the tree.

If your tree nodes aren't unique (i.e. the same `Tree` instance appears more than once), the assert will fail.
```python
from weakref import proxy

from lark import Tree, Visitor

class Parent(Visitor):
    def __default__(self, tree):
        for subtree in tree.children:
            if isinstance(subtree, Tree):
                assert not hasattr(subtree, 'parent')
                # A weak reference avoids creating reference cycles
                subtree.parent = proxy(tree)
```
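A quick usage sketch (the grammar here is a throwaway assumption, just to produce a small tree):

```python
from lark import Lark

tree = Lark("""
start: a a
a: "x"
""").parse('xx')

Parent().visit(tree)
print(tree.children[0].parent.data)  # -> start
```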
## Unwinding VisitError after a transformer/visitor exception
Errors that happen inside visitors and transformers get wrapped inside a `VisitError` exception.
This is often inconvenient if you want the original error to propagate upwards, or if you want to catch it yourself.

Fortunately, it's easy to unwrap at the point of calling the transformer: catch the `VisitError` and re-raise its `orig_exc` attribute.
For example:
```python
from lark import Lark, Transformer
from lark.visitors import VisitError

tree = Lark('start: "a"').parse('a')

class T(Transformer):
    def start(self, x):
        raise KeyError("Original Exception")

t = T()
try:
    print(t.transform(tree))
except VisitError as e:
    raise e.orig_exc
```
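If you only want to unwrap specific error types and keep the `VisitError` wrapper for everything else, the same pattern extends naturally; a sketch:

```python
# Re-raise only the original exceptions we care about.
try:
    T().transform(tree)
except VisitError as e:
    if isinstance(e.orig_exc, KeyError):
        raise e.orig_exc  # propagate the original error
    raise  # anything else keeps its VisitError wrapper
```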
## Adding a Progress Bar to Parsing with tqdm
Parsing large files can take a long time, even with the `parser='lalr'` option. To make this process more user-friendly, it's useful to add a progress bar. One way to achieve this is to use the `InteractiveParser` to display each token as it is processed. In this example, we use [tqdm](https://github.com/tqdm/tqdm), but a similar approach should work with GUIs.
```python
from lark import Lark
from tqdm import tqdm

def parse_with_progress(parser: Lark, text: str, start=None):
    last = 0
    progress = tqdm(total=len(text))
    pi = parser.parse_interactive(text, start=start)
    for token in pi.iter_parse():
        # Some tokens (e.g. the end-of-input token) carry no position info
        if token.end_pos is not None:
            progress.update(token.end_pos - last)
            last = token.end_pos
    progress.close()
    return pi.result
```
Note that we don't simply wrap the iterable because tqdm would not be able to determine the total. Additionally, keep in mind that this implementation relies on the `InteractiveParser` and, therefore, only works with the `LALR(1)` parser, not `earley`.
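A hedged usage sketch, with a throwaway LALR grammar and synthetic input (both are assumptions for illustration):

```python
parser = Lark("""
start: WORD+
%import common (WORD, WS)
%ignore WS
""", parser="lalr")

text = "hello world " * 50_000
tree = parse_with_progress(parser, text)
print(len(tree.children))  # -> 100000
```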