# Recipes

A collection of recipes for using Lark and its various features.


## Use a transformer to parse integer tokens

Transformers are the common interface for processing matched rules and tokens.

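With the LALR parser, a transformer can also be passed to `Lark` through the `transformer` option. It is then applied during parsing, which skips building the intermediate parse tree and improves performance.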

```python
from lark import Lark, Transformer

class T(Transformer):
    def INT(self, tok):
        "Convert the value of `tok` from string to int, while maintaining line number & column."
        return tok.update(value=int(tok))

parser = Lark("""
start: INT*
%import common.INT
%ignore " "
""", parser="lalr", transformer=T())

print(parser.parse('3 14 159'))
```

Prints out:

```python
Tree(start, [Token(INT, 3), Token(INT, 14), Token(INT, 159)])
```


## Collect all comments with lexer_callbacks

`lexer_callbacks` can be used to interface with the lexer as it generates tokens.

It accepts a dictionary of the form

    {TOKEN_TYPE: callback}

where `callback` is of type `f(Token) -> Token`.

It only works with the basic and contextual lexers.

This has the same effect as using a transformer, but it can also process ignored tokens.

```python
from lark import Lark

comments = []

parser = Lark("""
    start: INT*

    COMMENT: /#.*/

    %import common (INT, WS)
    %ignore COMMENT
    %ignore WS
""", parser="lalr", lexer_callbacks={'COMMENT': comments.append})

parser.parse("""
1 2 3  # hello
# world
4 5 6
""")

print(comments)
```

Prints out:

```python
[Token('COMMENT', '# hello'), Token('COMMENT', '# world')]
```

*Note: The callback doesn't have to return a token here, because comments are ignored.*


## CollapseAmbiguities

Parsing ambiguous texts with the Earley parser and `ambiguity='explicit'` produces a single tree with `_ambig` nodes to mark where the ambiguity occurred.

However, it is sometimes more convenient to work with a list of all possible unambiguous trees.

Lark provides a utility transformer for that purpose:

```python
from lark import Lark, Tree, Transformer
from lark.visitors import CollapseAmbiguities

grammar = """
    !start: x y

    !x: "a" "b"
      | "ab"
      | "abc"

    !y: "c" "d"
      | "cd"
      | "d"

"""
parser = Lark(grammar, ambiguity='explicit')

t = parser.parse('abcd')
for x in CollapseAmbiguities().transform(t):
    print(x.pretty())
```

This prints out:

    start
      x
        a
        b
      y
        c
        d

    start
      x     ab
      y     cd

    start
      x     abc
      y     d

While convenient, this should be used carefully, as highly ambiguous trees will soon create an exponential explosion of such unambiguous derivations.


## Keeping track of parents when visiting

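The following visitor assigns a `parent` attribute to every subtree, pointing at the node that contains it (the root gets none). The parent is stored as a `weakref.proxy`, which avoids reference cycles between parents and children.

If your tree nodes aren't unique (that is, if a `Tree` instance is shared between parents), the assert will fail.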

```python
from weakref import proxy

from lark import Tree, Visitor

class Parent(Visitor):
    def __default__(self, tree):
        for subtree in tree.children:
            if isinstance(subtree, Tree):
                assert not hasattr(subtree, 'parent')
                subtree.parent = proxy(tree)
```


## Unwinding VisitError after a transformer/visitor exception

Errors that happen inside visitors and transformers get wrapped inside a `VisitError` exception.

This can be inconvenient if you want the original error to propagate upwards, or if you want to catch it.

However, it is easy to unwrap at the point where the transformer is called: catch the `VisitError` and re-raise the exception stored in its `orig_exc` attribute.

For example:

```python
from lark import Lark, Transformer
from lark.visitors import VisitError

tree = Lark('start: "a"').parse('a')

class T(Transformer):
    def start(self, x):
        raise KeyError("Original Exception")

t = T()
try:
    print(t.transform(tree))
except VisitError as e:
    raise e.orig_exc
```


## Adding a Progress Bar to Parsing with tqdm

Parsing large files can take a long time, even with the `parser='lalr'` option. To make this process more user-friendly, it's useful to add a progress bar. One way to achieve this is to use the `InteractiveParser` and update the bar as each token is processed. In this example, we use [tqdm](https://github.com/tqdm/tqdm), but it should be easy to adapt to other kinds of progress bars.

```python
from lark import Lark
from tqdm import tqdm

def parse_with_progress(parser: Lark, text: str, start=None):
    last = 0
    progress = tqdm(total=len(text))
    pi = parser.parse_interactive(text, start=start)
    for token in pi.iter_parse():
        if token.end_pos is not None:
            progress.update(token.end_pos - last)
            last = token.end_pos
    return pi.resume_parse()    # Finish up and get the result
```

Keep in mind that this implementation relies on the `InteractiveParser` and, therefore, only works with the `LALR(1)` parser, and not `Earley`.


## Parsing a Language with Significant Indentation

If your grammar needs to support significant indentation (e.g. Python, YAML), you will need to use
the `Indenter` class. Take a look at the [indented tree example][indent] as well as the
[Python grammar][python] for inspiration.

[indent]: examples/indented_tree.html
[python]: https://github.com/lark-parser/lark/blob/master/lark/grammars/python.lark