==========================================
Parsley Tutorial Part I: Basics and Syntax
==========================================
*************************************
From Regular Expressions To Grammars
*************************************
Parsley is a pattern matching and parsing tool for Python programmers.
Most Python programmers are familiar with regular expressions, as
provided by Python's `re` module. To use it, you provide a string that
describes the pattern you want to match, and your input.
For example::
>>> import re
>>> x = re.compile("a(b|c)d+e")
>>> x.match("abddde")
<_sre.SRE_Match object at 0x7f587af54af8>
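The match object is more useful than its repr suggests. As a small illustration using only the standard `re` module, the matched text and the parenthesised alternative can be read back with `group()`. The variable names here are chosen just for this sketch.

```python
import re

pattern = re.compile("a(b|c)d+e")
match = pattern.match("abddde")

# group(0) is the whole matched text; group(1) is the (b|c) alternative.
whole = match.group(0)
choice = match.group(1)
print(whole, choice)

# A string that does not fit the pattern yields None instead of a match object.
no_match = pattern.match("axddde")
print(no_match)
```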
You can do exactly the same sort of thing in Parsley::
>>> import parsley
>>> x = parsley.makeGrammar("foo = 'a' ('b' | 'c') 'd'+ 'e'", {})
>>> x("abdde").foo()
'e'
From this small example, a couple of differences between regular
expressions and Parsley grammars can be seen:
Parsley Grammars Have Named Rules
---------------------------------
A Parsley grammar can have many rules, and each has a name. The
example above has a single rule named `foo`. Rules can call each
other; calling rules in Parsley works like calling functions in
Python. Here is another way to write the grammar above::
foo = 'a' baz 'd'+ 'e'
baz = 'b' | 'c'
Parsley Grammars Are Expressions
--------------------------------
Calling `match` for a regular expression returns a match object if the
match succeeds or None if it fails. Parsley parsers return the value
of the last expression in the rule. Behind the scenes, Parsley turns each
rule in your grammar into a Python method. In pseudo-Python code, it
looks something like this::
def foo(self):
    match('a')
    self.baz()
    match_one_or_more('d')
    return match('e')

def baz(self):
    return match('b') or match('c')
The value of the last expression in the rule is what the rule
returns. This is why our example returns 'e'.
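To make the pseudo-code above concrete, here is a minimal runnable sketch of the same two rules written by hand. The class name `SketchParser` and its helper methods are invented for illustration; this is not the code Parsley generates, only a model of how rule methods call each other and return the value of their last expression.

```python
class SketchParser:
    """Hand-written stand-in for the methods a grammar might turn into."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def match(self, char):
        # Consume one expected character, or fail without moving forward.
        if self.text[self.pos:self.pos + 1] != char:
            raise ValueError("expected %r at position %d" % (char, self.pos))
        self.pos += 1
        return char

    def match_one_or_more(self, char):
        # The first match is required; after that, keep matching greedily.
        results = [self.match(char)]
        while self.text[self.pos:self.pos + 1] == char:
            results.append(self.match(char))
        return results

    def baz(self):
        # 'b' | 'c': try the first alternative, fall back to the second.
        try:
            return self.match('b')
        except ValueError:
            return self.match('c')

    def foo(self):
        self.match('a')
        self.baz()
        self.match_one_or_more('d')
        return self.match('e')  # the value of the last expression


result = SketchParser("abdde").foo()
print(result)
```

Every intermediate value (`'a'`, the result of `baz`, the list of `'d'` matches) is discarded; only the final `match('e')` reaches the caller.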
The similarities to regular expressions pretty much end here,
though. Having multiple named rules composed of expressions makes for
a much more powerful tool, and now we're going to look at some more
features that go even further.
Rules Can Embed Python Expressions
----------------------------------
Since these rules just turn into Python code eventually, we can stick
some Python code into them ourselves. This is particularly useful for
changing the return value of a rule. The Parsley expression for this
is `->`. We can also bind the results of expressions to variable names
and use them in Python code. So things like this are possible::
x = parsley.makeGrammar("""
foo = 'a':one baz:two 'd'+ 'e' -> (one, two)
baz = 'b' | 'c'
""", {})
print(x("abdde").foo())
::
('a', 'b')
Literal match expressions like `'a'` return the character they
match. Using a colon and a variable name after an expression is like
assignment in Python. As a result, we can use those names in a Python
expression - in this case, creating a tuple.
Another way to use Python code in a rule is to write custom tests for
matching. Sometimes it's more convenient to write some Python that
determines if a rule matches than to stick to Parsley expressions
alone. For those cases, we can use `?()`. Here, we use the builtin
rule `anything` to match a single character, then a Python predicate
to decide if it's the one we want::
digit = anything:x ?(x in '0123456789') -> x
This rule `digit` will match any decimal digit. We need the `-> x` on
the end to return the character rather than the value of the predicate
expression, which is just `True`.
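As a rough model of what this rule does, the following plain-Python sketch separates the three parts: consuming a character, testing it, and choosing the return value. The `(text, pos)` calling convention and the use of `ValueError` for failure are assumptions made for this sketch, not Parsley's internals.

```python
def anything(text, pos):
    # Consume exactly one character, whatever it is.
    if pos >= len(text):
        raise ValueError("unexpected end of input")
    return text[pos], pos + 1


def digit(text, pos):
    x, new_pos = anything(text, pos)      # anything:x
    # The ?(...) predicate: the rule fails when the test is false.
    if not (x in '0123456789'):
        raise ValueError("%r is not a digit" % x)
    # The '-> x' part: return the character, not the predicate's True.
    return x, new_pos


print(digit("7up", 0))
```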
Repeated Matches Make Lists
---------------------------
Like regular expressions, Parsley supports repeating matches. You can
match an expression zero or more times with '*', one or more times
with '+', and a specific number of times with '{n, m}' or just
'{n}'. Since all expressions in Parsley return a value, these
repetition operators return a list containing each match they made.
::
x = parsley.makeGrammar("""
digit = anything:x ?(x in '0123456789') -> x
number = digit+
""", {})
print(x("314159").number())
::
['3', '1', '4', '1', '5', '9']
The `number` rule repeatedly matches `digit` and collects the matches
into a list. This gets us part way to turning a string like `314159`
into an integer. All we need now is to turn the list back into a
string and call `int()`::
x = parsley.makeGrammar("""
digit = anything:x ?(x in '0123456789') -> x
number = digit+:ds -> int(''.join(ds))
""", {})
print(x("8675309").number())
::
8675309
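The way `+` gathers results into a list can be modelled in a few lines of ordinary Python. This is only a sketch under assumed conventions (rules are functions taking `(text, pos)` and raising `ValueError` on failure); the helper name `one_or_more` is hypothetical.

```python
def digit(text, pos):
    if pos < len(text) and text[pos] in '0123456789':
        return text[pos], pos + 1
    raise ValueError("expected a digit at position %d" % pos)


def one_or_more(rule, text, pos):
    # The first match is mandatory; a later failure just ends the loop.
    value, pos = rule(text, pos)
    values = [value]
    while True:
        try:
            value, pos = rule(text, pos)
        except ValueError:
            break
        values.append(value)
    return values, pos


def number(text, pos=0):
    ds, pos = one_or_more(digit, text, pos)   # digit+:ds
    return int(''.join(ds)), pos              # -> int(''.join(ds))


digits, _ = one_or_more(digit, "314159", 0)
print(digits)
print(number("8675309")[0])
```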
Collecting Chunks Of Input
--------------------------
If it seemed kind of strange to break our input string up into a list
and then reassemble it into a string using `join`, you're not
alone. Parsley has a shortcut for this since it's a common case: you
can use `<>` around a rule to make it return the slice of input it
consumes, ignoring the actual return value of the rule. For example::
x = parsley.makeGrammar("""
digit = anything:x ?(x in '0123456789')
number = <digit+>:ds -> int(ds)
""", {})
print(x("11235").number())
::
11235
Here, `<digit+>` returns the string `"11235"`, since that's the
portion of the input that `digit+` matched. (In this case it's the
entire input, but we'll see some more complex cases soon.) Since it
ignores the list returned by `digit+`, leaving the `-> x` out of
`digit` doesn't change the result.
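One way to picture `<>` is as a wrapper that remembers where a rule started and where it stopped, then slices the input between those two positions. The sketch below assumes the same hand-rolled `(text, pos)` convention as a plain-Python parser might use; `consumed_by` and `digits_list` are made-up names, and the input deliberately has trailing text so the slice is not the whole string.

```python
def digits_list(text, pos):
    # Stand-in for digit+: returns a list of characters and the new position.
    values = []
    while pos < len(text) and text[pos] in '0123456789':
        values.append(text[pos])
        pos += 1
    if not values:
        raise ValueError("expected at least one digit")
    return values, pos


def consumed_by(rule, text, pos):
    # Stand-in for <rule>: run the rule, drop its value, keep the input slice.
    _ignored, end = rule(text, pos)
    return text[pos:end], end


chunk, end = consumed_by(digits_list, "11235+8", 0)
print(chunk, int(chunk), end)
```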
**********************
Building A Calculator
**********************
Now let's look at using these rules in a more complicated parser. We
have support for parsing numbers; let's do addition, as well.
::
x = parsley.makeGrammar("""
digit = anything:x ?(x in '0123456789')
number = <digit+>:ds -> int(ds)
expr = number:left ( '+' number:right -> left + right
                   | -> left)
""", {})
print(x("17+34").expr())
print(x("18").expr())
::
51
18
Parentheses group expressions just like in Python. The '`|`' operator
is like `or` in Python - it short-circuits. It tries each expression
until it finds one that matches. For `"17+34"`, the `number` rule
matches "17", then Parsley tries to match `+` followed by another
`number`. Since "+" and "34" are the next things in the input, those
match, and it then runs the Python expression `left + right` and
returns its value. For the input `"18"` it does the same, but `+` does
not match, so Parsley tries the next thing after `|`. Since this is
just a Python expression, the match succeeds and the number 18 is
returned.
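The try-then-fall-back behaviour of `|` can be sketched by hand as follows. This is an illustrative model only: the functions, the `(text, pos)` convention, and the use of `ValueError` to signal a failed alternative are assumptions of the sketch rather than a description of Parsley's implementation.

```python
def number(text, pos):
    end = pos
    while end < len(text) and text[end] in '0123456789':
        end += 1
    if end == pos:
        raise ValueError("expected a number at position %d" % pos)
    return int(text[pos:end]), end


def expr(text, pos=0):
    left, pos = number(text, pos)
    saved = pos  # remember where the alternatives start
    try:
        # First alternative: '+' number:right -> left + right
        if text[pos:pos + 1] != '+':
            raise ValueError("expected '+'")
        right, pos = number(text, pos + 1)
        return left + right, pos
    except ValueError:
        # Second alternative: rewind, consume nothing, and return left.
        return left, saved


print(expr("17+34")[0])
print(expr("18")[0])
```

The second alternative never fails because it consumes no input, which is why it has to come last.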
Now let's add subtraction::
digit = anything:x ?(x in '0123456789')
number = <digit+>:ds -> int(ds)
expr = number:left ( '+' number:right -> left + right
                   | '-' number:right -> left - right
                   | -> left)
This will accept things like '5-4' now.
Since parsing numbers is so common and useful, Parsley actually has
'digit' as a builtin rule, so we don't even need to define it
ourselves. We'll leave it out in further examples and rely on the
version Parsley provides.
Normally we like to allow whitespace in our expressions, so let's add
some support for spaces::
number = <digit+>:ds -> int(ds)
ws = ' '*
expr = number:left ws ('+' ws number:right -> left + right
                      |'-' ws number:right -> left - right
                      | -> left)
Now we can handle "17 +34", "2 - 1", etc.
We could go ahead and add multiplication and division here (and
hopefully it's obvious how that would work), but let's complicate
things further and allow multiple operations in our expressions --
things like "1 - 2 + 3".
There are a couple of different ways to do this. Possibly the easiest is to
build a list of numbers and operations, then do the math::
x = parsley.makeGrammar("""
number = <digit+>:ds -> int(ds)
ws = ' '*
add = '+' ws number:n -> ('+', n)
sub = '-' ws number:n -> ('-', n)
addsub = ws (add | sub)
expr = number:left (addsub+:right -> right
                   | -> left)
""", {})
print(x("1 + 2 - 3").expr())
::
[('+', 2), ('-', 3)]
Oops, this is only half the job done. We're collecting the operators
and values, but now we need to do the actual calculation. The easiest
way to do it is probably to write a Python function and call it from
inside the grammar.
So far we have been passing an empty dict as the second argument to
``makeGrammar``. This is a dict of variable bindings that can be used
in Python expressions in the grammar. So we can pass Python objects,
such as functions, this way::
def calculate(start, pairs):
    result = start
    for op, value in pairs:
        if op == '+':
            result += value
        elif op == '-':
            result -= value
    return result
x = parsley.makeGrammar("""
number = <digit+>:ds -> int(ds)
ws = ' '*
add = '+' ws number:n -> ('+', n)
sub = '-' ws number:n -> ('-', n)
addsub = ws (add | sub)
expr = number:left (addsub+:right -> calculate(left, right)
                   | -> left)
""", {"calculate": calculate})
print(x("4 + 5 - 6").expr())
::
3
Introducing this function lets us simplify even further: instead of
using ``addsub+``, we can use ``addsub*``, since ``calculate(left, [])``
will return ``left`` -- so now ``expr`` becomes::
expr = number:left addsub*:right -> calculate(left, right)
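The claim that an empty list of pairs leaves the starting value untouched is easy to check by calling ``calculate`` directly, without any grammar involved. The function is repeated here so the snippet runs on its own.

```python
def calculate(start, pairs):
    result = start
    for op, value in pairs:
        if op == '+':
            result += value
        elif op == '-':
            result -= value
    return result


# What addsub* hands over for "4 + 5 - 6": two (operator, value) pairs.
with_pairs = calculate(4, [('+', 5), ('-', 6)])

# What addsub* hands over for a bare "18": zero matches, so an empty list.
without_pairs = calculate(18, [])

print(with_pairs, without_pairs)
```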
So now let's look at adding multiplication and division. Here, we run
into precedence rules: should "4 * 5 + 6" give us 26, or 44? The
traditional choice is for multiplication and division to take
precedence over addition and subtraction, so the answer should
be 26. We'll resolve this by making sure multiplication and division
happen before addition and subtraction are considered::
def calculate(start, pairs):
    result = start
    for op, value in pairs:
        if op == '+':
            result += value
        elif op == '-':
            result -= value
        elif op == '*':
            result *= value
        elif op == '/':
            result /= value
    return result
x = parsley.makeGrammar("""
number = <digit+>:ds -> int(ds)
ws = ' '*
add = '+' ws expr2:n -> ('+', n)
sub = '-' ws expr2:n -> ('-', n)
mul = '*' ws number:n -> ('*', n)
div = '/' ws number:n -> ('/', n)
addsub = ws (add | sub)
muldiv = ws (mul | div)
expr = expr2:left addsub*:right -> calculate(left, right)
expr2 = number:left muldiv*:right -> calculate(left, right)
""", {"calculate": calculate})
print(x("4 * 5 + 6").expr())
::
26
Notice particularly that ``add``, ``sub``, and ``expr`` all call the
``expr2`` rule now where they called ``number`` before. This means
that wherever a number was previously expected, a multiplication or
division expression can appear instead.
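To see why the layering matters, it helps to replay by hand the ``calculate`` calls that the two rule levels would make. The sketch below only simulates those calls with literal arguments (no grammar is involved), and contrasts them with a single flat left-to-right pass.

```python
def calculate(start, pairs):
    result = start
    for op, value in pairs:
        if op == '+':
            result += value
        elif op == '-':
            result -= value
        elif op == '*':
            result *= value
        elif op == '/':
            result /= value
    return result


# "4 * 5 + 6": expr2 folds the multiplication before expr sees the '+'.
left = calculate(4, [('*', 5)])             # expr2 level
total = calculate(left, [('+', 6)])         # expr level

# "2 + 3 * 4": the right operand of '+' is itself an expr2.
right = calculate(3, [('*', 4)])            # expr2 inside add
layered = calculate(2, [('+', right)])      # expr level

# One flat pass over the same operators computes (2 + 3) * 4 instead.
flat = calculate(2, [('+', 3), ('*', 4)])

print(total, layered, flat)
```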
Finally let's add parentheses, so you can override the precedence and
write "4 * (5 + 6)" when you do want 44. We'll do this by adding a
``value`` rule that accepts either a number or an expression in
parentheses, and replace existing calls to ``number`` with calls to
``value``.
::
def calculate(start, pairs):
    result = start
    for op, value in pairs:
        if op == '+':
            result += value
        elif op == '-':
            result -= value
        elif op == '*':
            result *= value
        elif op == '/':
            result /= value
    return result
x = parsley.makeGrammar("""
number = <digit+>:ds -> int(ds)
parens = '(' ws expr:e ws ')' -> e
value = number | parens
ws = ' '*
add = '+' ws expr2:n -> ('+', n)
sub = '-' ws expr2:n -> ('-', n)
mul = '*' ws value:n -> ('*', n)
div = '/' ws value:n -> ('/', n)
addsub = ws (add | sub)
muldiv = ws (mul | div)
expr = expr2:left addsub*:right -> calculate(left, right)
expr2 = value:left muldiv*:right -> calculate(left, right)
""", {"calculate": calculate})
print(x("4 * (5 + 6) + 1").expr())
::
45
And there you have it: a four-function calculator with precedence and
parentheses.
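For comparison, here is roughly what the same rule structure looks like as a hand-written recursive-descent evaluator using only the standard library. It is a sketch, not a translation of what Parsley produces: all function names are invented, the input is split into tokens with `re` first (a simplification the grammar does not need), and it is slightly more lenient about spaces than the grammar.

```python
import re

# Optional spaces, then either a run of digits or one operator/parenthesis.
TOKEN = re.compile(r" *([0-9]+|[-+*/()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip(' ')
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError("unexpected input at position %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_value(tokens):
    # value = number | parens
    if not tokens:
        raise ValueError("unexpected end of input")
    token = tokens.pop(0)
    if token == '(':
        result = parse_expr(tokens)
        if not tokens or tokens.pop(0) != ')':
            raise ValueError("expected ')'")
        return result
    if token[0] in '0123456789':
        return int(token)
    raise ValueError("unexpected token %r" % token)


def parse_expr2(tokens):
    # expr2 = value muldiv*
    result = parse_value(tokens)
    while tokens and tokens[0] in ('*', '/'):
        op = tokens.pop(0)
        value = parse_value(tokens)
        result = result * value if op == '*' else result / value
    return result


def parse_expr(tokens):
    # expr = expr2 addsub*
    result = parse_expr2(tokens)
    while tokens and tokens[0] in ('+', '-'):
        op = tokens.pop(0)
        value = parse_expr2(tokens)
        result = result + value if op == '+' else result - value
    return result


def evaluate(text):
    tokens = tokenize(text)
    result = parse_expr(tokens)
    if tokens:
        raise ValueError("unexpected token %r" % tokens[0])
    return result


print(evaluate("4 * (5 + 6) + 1"))
```

Each function corresponds to one rule of the grammar, which is the same nesting (``expr`` over ``expr2`` over ``value``) that gives the Parsley version its precedence.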