1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219
|
Nested Brackets Mini-HOWTO
==========================
<dl>
<dt>Author:</dt>
<dd class="vcard">
<a class="fn url" href="http://claimid.com/vlasovskikh">Andrey Vlasovskikh</a>
</dd>
<dt>License:</dt>
<dd>
<a href="http://creativecommons.org/licenses/by-nc-sa/3.0/">
Creative Commons Attribution-Noncommercial-Share Alike 3.0
</a>
</dd>
<dt>Library Homepage:</dt>
<dd>
<a href="http://code.google.com/p/funcparserlib/">
http://code.google.com/p/funcparserlib/
</a>
</dd>
<dt>Library Version:</dt>
<dd>0.3.5</dd>
</dl>
Intro
-----
Let's try out `funcparserlib` using a tiny example: parsing strings of nested
curly brackets. It is well known that it can't be done with regular expressions
so we need a parser. For more complex examples see [The funcparserlib
Tutorial][tutorial] or
other examples at [the funcparserlib homepage][funcparserlib].
[funcparserlib]: http://code.google.com/p/funcparserlib/
[tutorial]: http://archlinux.folding-maps.org/2009/funcparserlib/Tutorial
Here is the EBNF grammar of our curly brackets language:
nested = "{" , { nested } , "}" ;
i. e. `nested` is a sequence of the symbol `{`, followed by zero or more
occurences of the `nested` production itself, followed by the symbol `}`. Let's
develop a parser for this grammar.
We will parse plain strings, but in real life you may wish to use
`funcparserlib.lexer` or any other lexer to tokenize the input and parse tokens,
not just symbols.
We will use the following `funcparserlib` functions: `a`, `forward_decl`,
`maybe`, `finished`, `many`, `skip`, `pretty_tree`. The library actually exports
only 11 functions, so the API is quite compact.
Nested Brackets Checker
-----------------------
Basic usage:
>>> from funcparserlib.parser import a
>>> brackets = a('{') + a('}')
>>> brackets.parse('{}')
('{', '}')
Let's write a nested brackets checker:
>>> from funcparserlib.parser import forward_decl, maybe
>>> nested = forward_decl()
>>> nested.define(a('{') + maybe(nested) + a('}'))
Test it:
>>> nested.parse('{}')
('{', None, '}')
>>> nested.parse('{{}}')
('{', ('{', None, '}'), '}')
>>> nested.parse('{{}')
Traceback (most recent call last):
...
NoParseError: no tokens left in the stream: <EOF>
>>> nested.parse('{foo}')
Traceback (most recent call last):
...
NoParseError: got unexpected token: f
>>> nested.parse('{}foo')
('{', None, '}')
In the last test we have parsed only a valid subsequence. Let's ensure that all
the input symbols have been parsed:
>>> from funcparserlib.parser import finished
>>> input = nested + finished
Test it:
>>> input.parse('{}foo')
Traceback (most recent call last):
...
NoParseError: should have reached <EOF>: f
Allow zero or more nested brackets:
>>> from funcparserlib.parser import many
>>> nested = forward_decl()
>>> nested.define(a('{') + many(nested) + a('}'))
>>> input = nested + finished
Test it:
>>> input.parse('{{}{}}')
('{', [('{', [], '}'), ('{', [], '}')], '}', None)
Skip `None`, the result of `finished`:
>>> from funcparserlib.parser import skip
>>> end_ = skip(finished)
>>> input = nested + end_
Test it:
>>> input.parse('{{}{}}')
('{', [('{', [], '}'), ('{', [], '}')], '}')
Textual Parse Tree
------------------
Objectify a parse tree:
>>> class Bracket(object):
... def __init__(self, kids):
... self.kids = kids
... def __repr__(self):
... return 'Bracket(%r)' % self.kids
>>> a_ = lambda x: skip(a(x))
>>> nested = forward_decl()
>>> nested.define(a_('{') + many(nested) + a_('}') >> Bracket)
>>> input = nested + end_
Test it:
>>> nested.parse('{{{}{}}{}}')
Bracket([Bracket([Bracket([]), Bracket([])]), Bracket([])])
Draw a textual parse tree:
>>> from funcparserlib.util import pretty_tree
>>> def ptree(t):
... def kids(x):
... if isinstance(x, Bracket):
... return x.kids
... else:
... return []
... def show(x):
... if isinstance(x, Bracket):
... return '{}'
... else:
... return repr(x)
... return pretty_tree(t, kids, show)
Test it:
>>> print ptree(nested.parse('{{{}{}}{}}'))
{}
|-- {}
| |-- {}
| `-- {}
`-- {}
>>> print ptree(nested.parse('{{{{}}{}}{{}}{}{{}{}}}'))
{}
|-- {}
| |-- {}
| | `-- {}
| `-- {}
|-- {}
| `-- {}
|-- {}
`-- {}
|-- {}
`-- {}
Nesting Level
-------------
Let's count the nesting level:
>>> def count(x):
... return 1 if len(x) == 0 else max(x) + 1
>>> nested = forward_decl()
>>> nested.define(a_('{') + many(nested) + a_('}') >> count)
>>> input = nested + end_
Test it:
>>> input.parse('{}')
1
>>> input.parse('{{{}}}')
3
>>> input.parse('{{}{{{}}}}')
4
>>> input.parse('{{{}}{}}')
3
<!-- vim: set ft=markdown tw=80: -->
|