==========================================
Parsley Tutorial Part I: Basics and Syntax
==========================================

*************************************
From Regular Expressions To Grammars
*************************************

Parsley is a pattern matching and parsing tool for Python programmers.

Most Python programmers are familiar with regular expressions, as
provided by Python's `re` module. To use it, you provide a string that
describes the pattern you want to match, and your input.

For example::

    >>> import re
    >>> x = re.compile("a(b|c)d+e")
    >>> x.match("abddde")
    <re.Match object; span=(0, 6), match='abddde'>


You can do exactly the same sort of thing in Parsley::

    >>> import parsley
    >>> x = parsley.makeGrammar("foo = 'a' ('b' | 'c') 'd'+ 'e'", {})
    >>> x("abdde").foo()
    'e'

From this small example, a couple of differences between regular
expressions and Parsley grammars can be seen:

Parsley Grammars Have Named Rules
---------------------------------

A Parsley grammar can have many rules, and each has a name. The
example above has a single rule named `foo`. Rules can call each
other; calling rules in Parsley works like calling functions in
Python. Here is another way to write the grammar above::

    foo = 'a' baz 'd'+ 'e'
    baz = 'b' | 'c'


Parsley Grammars Are Expressions
--------------------------------

Calling `match` for a regular expression returns a match object if the
match succeeds or None if it fails. Parsley parsers return the value
of the last expression in the rule. Behind the scenes, Parsley turns each
rule in your grammar into a Python method. In pseudo-Python code, it
looks something like this::

    def foo(self):
        match('a')
        self.baz()
        match_one_or_more('d')
        return match('e')

    def baz(self):
        return match('b') or match('c')

The value of the last expression in the rule is what the rule
returns. This is why our example returns 'e'.
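To make the pseudo-Python above concrete, here is a small hand-written
matcher you can actually run. It is only a sketch: ``TinyParser``,
``NoMatch`` and the ``match`` helper are names made up for this
illustration, and the code Parsley really generates looks different.

```python
class NoMatch(Exception):
    """Raised when the input does not match (hypothetical helper)."""


class TinyParser(object):
    """Hand-written stand-in for the two-rule grammar, not Parsley's output."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def match(self, ch):
        # Consume one character if it equals ch, and return it.
        if self.data[self.pos:self.pos + 1] != ch:
            raise NoMatch(ch)
        self.pos += 1
        return ch

    def baz(self):
        # baz = 'b' | 'c': try the first alternative, then the second.
        try:
            return self.match('b')
        except NoMatch:
            return self.match('c')

    def foo(self):
        # foo = 'a' baz 'd'+ 'e'
        self.match('a')
        self.baz()
        self.match('d')                  # 'd'+ needs at least one 'd'
        while self.data[self.pos:self.pos + 1] == 'd':
            self.match('d')
        return self.match('e')           # the last expression is the result


result = TinyParser("abdde").foo()
print(result)
```

Every ``match`` call returns a value, but ``foo`` only passes on the
last one, which is why the result is the final ``'e'``.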

The similarities to regular expressions pretty much end here,
though. Having multiple named rules composed of expressions makes for
a much more powerful tool, and now we're going to look at some more
features that go even further.

Rules Can Embed Python Expressions
----------------------------------

Since these rules just turn into Python code eventually, we can stick
some Python code into them ourselves. This is particularly useful for
changing the return value of a rule. The Parsley expression for this
is `->`. We can also bind the results of expressions to variable names
and use them in Python code. So things like this are possible::

    x = parsley.makeGrammar("""
    foo = 'a':one baz:two 'd'+ 'e' -> (one, two)
    baz = 'b' | 'c'
    """, {})
    print(x("abdde").foo())

::

    ('a', 'b')

Literal match expressions like `'a'` return the character they
match. Using a colon and a variable name after an expression is like
assignment in Python. As a result, we can use those names in a Python
expression - in this case, creating a tuple.

Another way to use Python code in a rule is to write custom tests for
matching. Sometimes it's more convenient to write some Python that
determines if a rule matches than to stick to Parsley expressions
alone. For those cases, we can use `?()`. Here, we use the builtin
rule `anything` to match a single character, then a Python predicate
to decide if it's the one we want::

    digit = anything:x ?(x in '0123456789') -> x

This rule `digit` will match any decimal digit. We need the `-> x` on
the end to return the character rather than the value of the predicate
expression, which is just `True`.
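Spelled out in plain Python, the three parts of the ``digit`` rule do
roughly the following. This is a sketch of the idea only. The function
signature (input string and position in, value and new position out)
is an assumption made for the illustration, not Parsley's internal
interface.

```python
def digit(data, pos):
    """Return (character, new_position) for a decimal digit, else None."""
    # anything:x  -- take exactly one character, failing at end of input.
    if pos >= len(data):
        return None
    x = data[pos]
    # ?(x in '0123456789')  -- the predicate decides whether the match stands.
    if x not in '0123456789':
        return None
    # -> x  -- hand back the character itself, not the predicate's True.
    return x, pos + 1


print(digit("7a", 0))
print(digit("a7", 0))
```

The first call succeeds and reports the digit along with the next
position to read from; the second fails because the predicate is
false for ``'a'``.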

Repeated Matches Make Lists
---------------------------

Like regular expressions, Parsley supports repeating matches. You can
match an expression zero or more times with '*', one or more times
with '+', and a specific number of times with '{n, m}' or just
'{n}'. Since all expressions in Parsley return a value, these
repetition operators return a list containing each match they made.

::

    x = parsley.makeGrammar("""
    digit = anything:x ?(x in '0123456789') -> x
    number = digit+
    """, {})
    print(x("314159").number())

::

    ['3', '1', '4', '1', '5', '9']

The `number` rule repeatedly matches `digit` and collects the matches
into a list. This gets us part way to turning a string like `314159`
into an integer. All we need now is to turn the list back into a
string and call `int()`::

    x = parsley.makeGrammar("""
    digit = anything:x ?(x in '0123456789') -> x
    number = digit+:ds -> int(''.join(ds))
    """, {})
    print(x("8675309").number())

::

    8675309
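If you want to see the list-building spelled out, here is a plain
Python sketch of what ``digit+`` followed by ``int(''.join(ds))``
amounts to. The helper names ``digit`` and ``many1`` and their
signatures are assumptions for this illustration, not part of Parsley.

```python
def digit(data, pos):
    """Return (character, new_position) for a decimal digit, else None."""
    if pos < len(data) and data[pos] in '0123456789':
        return data[pos], pos + 1
    return None


def many1(rule, data, pos):
    """Sketch of `rule+`: collect every result in a list; None if no match."""
    results = []
    while True:
        step = rule(data, pos)
        if step is None:
            break
        value, pos = step
        results.append(value)
    if not results:
        return None
    return results, pos


digits, end = many1(digit, "314159", 0)
number = int(''.join(digits))
print(digits)
print(number)
```

The repetition itself knows nothing about integers. It just gathers
whatever the repeated rule returns, and the conversion happens
afterwards.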

Collecting Chunks Of Input
--------------------------

If it seemed kind of strange to break our input string up into a list
and then reassemble it into a string using `join`, you're not
alone. Parsley has a shortcut for this since it's a common case: you
can use `<>` around a rule to make it return the slice of input it
consumes, ignoring the actual return value of the rule. For example::

    x = parsley.makeGrammar("""
    digit = anything:x ?(x in '0123456789')
    number = <digit+>:ds -> int(ds)
    """, {})
    print(x("11235").number())

::

    11235

Here, `<digit+>` returns the string `"11235"`, since that's the
portion of the input that `digit+` matched. (In this case it's the
entire input, but we'll see some more complex cases soon.) Since it
ignores the list returned by `digit+`, leaving the `-> x` out of
`digit` doesn't change the result.
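One way to picture ``<>`` is as remembering where matching started and
slicing the input once the inner expression is done. The following is
a sketch under that assumption; ``match_digits`` and ``number`` are
hypothetical helpers, not Parsley functions.

```python
def match_digits(data, pos):
    """Sketch of `digit+`: return the position after the digits, or None."""
    end = pos
    while end < len(data) and data[end] in '0123456789':
        end += 1
    return end if end > pos else None


def number(data, pos=0):
    """Sketch of `number = <digit+>:ds -> int(ds)`."""
    start = pos
    end = match_digits(data, pos)
    if end is None:
        return None
    ds = data[start:end]     # <digit+>:ds  -- the slice that was consumed
    return int(ds), end      # -> int(ds), plus where to continue reading


print(number("11235"))
print(number("42+1"))
```

In the second call the slice is only ``"42"``: the rest of the input
is left for whatever rule comes next.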

**********************
Building A Calculator
**********************

Now let's look at using these rules in a more complicated parser. We
have support for parsing numbers; let's do addition as well::

    x = parsley.makeGrammar("""
    digit = anything:x ?(x in '0123456789')
    number = <digit+>:ds -> int(ds)
    expr = number:left ( '+' number:right -> left + right
                       | -> left)
    """, {})
    print(x("17+34").expr())
    print(x("18").expr())

::

    51
    18

Parentheses group expressions just like in Python. The '`|`' operator
is like `or` in Python - it short-circuits. It tries each expression
until it finds one that matches. For `"17+34"`, the `number` rule
matches "17", then Parsley tries to match `+` followed by another
`number`. Since "+" and "34" are the next things in the input, those
match, and it then runs the Python expression `left + right` and
returns its value. For the input `"18"` it does the same, but `+` does
not match, so Parsley tries the next thing after `|`. Since this is
just a Python expression, the match succeeds and the number 18 is
returned.
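The try-the-first-alternative-then-the-next behaviour can be sketched
in plain Python like this. The helper names and signatures are
assumptions for the illustration. Note also that this sketch, unlike a
real parser, makes no attempt to check that the whole input was used.

```python
def number(data, pos):
    """Sketch of `number`: return (int_value, new_pos), or None if no digits."""
    end = pos
    while end < len(data) and data[end] in '0123456789':
        end += 1
    if end == pos:
        return None
    return int(data[pos:end]), end


def expr(data):
    """Sketch of `number:left ('+' number:right -> left + right | -> left)`."""
    step = number(data, 0)
    if step is None:
        raise ValueError("expected a number")
    left, pos = step
    # First alternative: '+' followed by another number.
    if data[pos:pos + 1] == '+':
        step = number(data, pos + 1)
        if step is not None:
            right, _ = step
            return left + right
    # The first alternative did not match, so fall through to: -> left
    return left


print(expr("17+34"))
print(expr("18"))
```

The second alternative consumes no input at all, so it always
succeeds, which is what makes it work as a default.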

Now let's add subtraction::

    digit = anything:x ?(x in '0123456789')
    number = <digit+>:ds -> int(ds)
    expr = number:left ( '+' number:right -> left + right
                       | '-' number:right -> left - right
                       | -> left)

This will accept things like '5-4' now.

Since parsing numbers is so common and useful, Parsley actually has
'digit' as a builtin rule, so we don't even need to define it
ourselves. We'll leave it out in further examples and rely on the
version Parsley provides.

Normally we like to allow whitespace in our expressions, so let's add
some support for spaces::

    number = <digit+>:ds -> int(ds)
    ws = ' '*
    expr = number:left ws ('+' ws number:right -> left + right
                          |'-' ws number:right -> left - right
                          | -> left)

Now we can handle "17 +34", "2  - 1", etc.
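At this point the grammar is still simple enough that a regular
expression could do the same job, which makes for a handy comparison.
The following is a sketch, not part of Parsley: ``PATTERN`` and this
standalone ``expr`` function are names invented for the example.

```python
import re

# One number, optional spaces, then optionally an operator and a second number.
PATTERN = re.compile(r"(\d+) *(?:([+-]) *(\d+))?$")


def expr(text):
    """Evaluate a single optional addition or subtraction, or return None."""
    m = PATTERN.match(text)
    if m is None:
        return None
    left, op, right = m.groups()
    if op is None:
        return int(left)          # no operator: just the number
    if op == '+':
        return int(left) + int(right)
    return int(left) - int(right)


print(expr("17 +34"))
print(expr("2  - 1"))
print(expr("18"))
```

Notice that the arithmetic has to live outside the pattern, in code
that inspects the groups afterwards. In the grammar it sits right next
to the thing it applies to.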

We could go ahead and add multiplication and division here (and
hopefully it's obvious how that would work), but let's complicate
things further and allow multiple operations in our expressions --
things like "1 - 2 + 3".

There are a couple of different ways to do this. Possibly the easiest is to
build a list of numbers and operations, then do the math::

    x = parsley.makeGrammar("""
    number = <digit+>:ds -> int(ds)
    ws = ' '*
    add = '+' ws number:n -> ('+', n)
    sub = '-' ws number:n -> ('-', n)
    addsub = ws (add | sub)
    expr = number:left (addsub+:right -> right
                       | -> left)
    """, {})
    print(x("1 + 2 - 3").expr())

::

    [('+', 2), ('-', 3)]

Oops, this is only half the job done. We're collecting the operators
and values, but now we need to do the actual calculation. The easiest
way to do it is probably to write a Python function and call it from
inside the grammar.

So far we have been passing an empty dict as the second argument to
``makeGrammar``. This is a dict of variable bindings that can be used
in Python expressions in the grammar. So we can pass Python objects,
such as functions, this way::

    def calculate(start, pairs):
        result = start
        for op, value in pairs:
            if op == '+':
                result += value
            elif op == '-':
                result -= value
        return result
    x = parsley.makeGrammar("""
    number = <digit+>:ds -> int(ds)
    ws = ' '*
    add = '+' ws number:n -> ('+', n)
    sub = '-' ws number:n -> ('-', n)
    addsub = ws (add | sub)
    expr = number:left (addsub+:right -> calculate(left, right)
                       | -> left)
    """, {"calculate": calculate})
    print(x("4 + 5 - 6").expr())

::

    3


Introducing this function lets us simplify even further: instead of
using ``addsub+``, we can use ``addsub*``, since ``calculate(left, [])``
will return ``left`` -- so now ``expr`` becomes::

    expr = number:left addsub*:right -> calculate(left, right)
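We can check that claim without a grammar at all, by calling
``calculate`` directly with the kind of list that ``addsub*`` would
hand it. The function is the same one defined above, repeated here so
the snippet runs on its own.

```python
def calculate(start, pairs):
    """Apply each (operator, value) pair to a running result, left to right."""
    result = start
    for op, value in pairs:
        if op == '+':
            result += value
        elif op == '-':
            result -= value
    return result


# What expr would pass along for "4 + 5 - 6".
with_ops = calculate(4, [('+', 5), ('-', 6)])
# What expr would pass along for "18": addsub* matched zero times.
without_ops = calculate(18, [])

print(with_ops)
print(without_ops)
```

With an empty list the loop body never runs, so the starting value
comes back unchanged.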


So now let's look at adding multiplication and division. Here, we run
into precedence rules: should "4 * 5 + 6" give us 26, or 44? The
traditional choice is for multiplication and division to take
precedence over addition and subtraction, so the answer should
be 26. We'll resolve this by making sure multiplication and division
happen before addition and subtraction are considered::

    def calculate(start, pairs):
        result = start
        for op, value in pairs:
            if op == '+':
                result += value
            elif op == '-':
                result -= value
            elif op == '*':
                result *= value
            elif op == '/':
                result /= value
        return result
    x = parsley.makeGrammar("""
    number = <digit+>:ds -> int(ds)
    ws = ' '*
    add = '+' ws expr2:n -> ('+', n)
    sub = '-' ws expr2:n -> ('-', n)
    mul = '*' ws number:n -> ('*', n)
    div = '/' ws number:n -> ('/', n)

    addsub = ws (add | sub)
    muldiv = ws (mul | div)

    expr = expr2:left addsub*:right -> calculate(left, right)
    expr2 = number:left muldiv*:right -> calculate(left, right)
    """, {"calculate": calculate})
    print(x("4 * 5 + 6").expr())

::

    26

Notice particularly that ``add``, ``sub``, and ``expr`` all call the
``expr2`` rule now where they called ``number`` before. This means
that everywhere a number was expected previously, a multiplication or
division expression can appear instead.
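To see why the extra level matters, it helps to call ``calculate`` by
hand the way the two rules would. The input ``6 + 4 * 5`` is an
example picked for this sketch, and the lists are written out manually
rather than produced by a parser.

```python
def calculate(start, pairs):
    """Apply each (operator, value) pair to a running result, left to right."""
    result = start
    for op, value in pairs:
        if op == '+':
            result += value
        elif op == '-':
            result -= value
        elif op == '*':
            result *= value
        elif op == '/':
            result /= value
    return result


# One flat list, as a single-level grammar would build for "6 + 4 * 5".
flat = calculate(6, [('+', 4), ('*', 5)])

# Two levels: expr2 reduces "4 * 5" first, then expr adds the result to 6.
product = calculate(4, [('*', 5)])
nested = calculate(6, [('+', product)])

print(flat)
print(nested)
```

The flat version works strictly left to right and gets 50. The
two-level version gets 26, because by the time addition is considered
the multiplication has already been folded into a single value.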


Finally let's add parentheses, so you can override the precedence and
write "4 * (5 + 6)" when you do want 44. We'll do this by adding a
``value`` rule that accepts either a number or an expression in
parentheses, and replace existing calls to ``number`` with calls to
``value``.

::

    def calculate(start, pairs):
        result = start
        for op, value in pairs:
            if op == '+':
                result += value
            elif op == '-':
                result -= value
            elif op == '*':
                result *= value
            elif op == '/':
                result /= value
        return result
    x = parsley.makeGrammar("""
    number = <digit+>:ds -> int(ds)
    parens = '(' ws expr:e ws ')' -> e
    value = number | parens
    ws = ' '*
    add = '+' ws expr2:n -> ('+', n)
    sub = '-' ws expr2:n -> ('-', n)
    mul = '*' ws value:n -> ('*', n)
    div = '/' ws value:n -> ('/', n)

    addsub = ws (add | sub)
    muldiv = ws (mul | div)

    expr = expr2:left addsub*:right -> calculate(left, right)
    expr2 = value:left muldiv*:right -> calculate(left, right)
    """, {"calculate": calculate})

    print(x("4 * (5 + 6) + 1").expr())

::

    45

And there you have it: a four-function calculator with precedence and
parentheses.
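If you're curious how the finished grammar maps onto ordinary code,
here is a hand-written recursive-descent version with one method per
rule. It is a sketch for comparison and not what Parsley generates:
the class name ``Calc`` and the helpers ``peek`` and ``repeat`` are
invented here, and error handling is kept to a bare minimum.

```python
def calculate(start, pairs):
    """Apply each (operator, value) pair to a running result, left to right."""
    result = start
    for op, value in pairs:
        if op == '+':
            result += value
        elif op == '-':
            result -= value
        elif op == '*':
            result *= value
        elif op == '/':
            result /= value
    return result


DIGITS = set('0123456789')


class Calc(object):
    """Hand-written mirror of the final grammar; methods are named after rules."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        # The next character, or '' at end of input.
        return self.text[self.pos:self.pos + 1]

    def ws(self):
        # ws = ' '*
        while self.peek() == ' ':
            self.pos += 1

    def number(self):
        # number = <digit+>:ds -> int(ds)
        start = self.pos
        while self.peek() in DIGITS:
            self.pos += 1
        if start == self.pos:
            raise ValueError("expected a digit at position %d" % start)
        return int(self.text[start:self.pos])

    def value(self):
        # value = number | parens
        if self.peek() != '(':
            return self.number()
        self.pos += 1
        self.ws()
        e = self.expr()
        self.ws()
        if self.peek() != ')':
            raise ValueError("expected ')' at position %d" % self.pos)
        self.pos += 1
        return e

    def repeat(self, ops, operand):
        # Sketch of addsub* / muldiv*: collect (operator, value) pairs.
        pairs = []
        while True:
            saved = self.pos
            self.ws()
            op = self.peek()
            if op not in ops:
                self.pos = saved      # give back the whitespace we skipped
                return pairs
            self.pos += 1
            self.ws()
            pairs.append((op, operand()))

    def expr2(self):
        left = self.value()
        return calculate(left, self.repeat(('*', '/'), self.value))

    def expr(self):
        left = self.expr2()
        return calculate(left, self.repeat(('+', '-'), self.expr2))


print(Calc("4 * (5 + 6) + 1").expr())
```

The shape is the same as the grammar, but the position tracking,
whitespace handling and undoing of partial matches all have to be
written out by hand, and that bookkeeping is what the grammar
notation spares you.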