# Parser configuration
This section describes how to alter the parser's default behaviour.
---
Some aspects of parsing can be configured using parser and/or
`ParsingExpression` parameters. Arpeggio has sane defaults but lets the user
alter them.
This section describes the available parser parameters.
## Case insensitive parsing
By default Arpeggio is case-sensitive. For case-insensitive parsing, set the
parser parameter `ignore_case` to `True`.
```python
parser = ParserPython(calc, ignore_case=True)
```
## White-space handling
Arpeggio skips whitespace by default. You can change this behaviour with the
`skipws` parameter passed to the parser constructor.
```python
parser = ParserPython(calc, skipws=False)
```
You can also change what Arpeggio considers whitespace using the `ws`
parameter. It is a plain string consisting of whitespace characters. By
default it is set to `"\t\n\r "`.
For example, to prevent a newline from being treated as whitespace you could write:
```python
parser = ParserPython(calc, ws='\t\r ')
```
!!! note
    These parameters can also be used at the `Sequence` level, so one could
    write a grammar like this:

        def grammar(): return Sequence("one", "two", "three", skipws=False), "four"
        parser = ParserPython(grammar)
        pt = parser.parse("onetwothree four")
## Keyword handling
Setting the `autokwd` parameter to `True` makes the parser match keyword-like
strings on word boundaries.
This parameter is disabled by default.

    from arpeggio import ParserPython

    def grammar(): return "one", "two", "three"
    parser = ParserPython(grammar, autokwd=True)

    # With autokwd enabled this parses without error.
    parser.parse("one two three")

    # But this will not parse (NoMatch is raised): matching is done on word
    # boundaries, so the input is treated as a single word.
    parser.parse("onetwothree")
## Comment handling
Support for comments in your language can be specified as another set of
grammar rules. See the [simple.py
example](https://github.com/textX/Arpeggio/blob/master/examples/simple/).
The parser is constructed using two parameters.
```python
parser = ParserPython(simpleLanguage, comment)
```
The first parameter is the root rule of the main parse model, while the second
is the rule for comments.
During parsing, comment parse trees are kept in a separate list, so comments
will not show up in the main parse tree.
## Parse tree reduction
Non-terminals are by default created for each rule. Sometimes this results in
very deep trees. You can alter this behaviour by setting the `reduce_tree`
parameter to `True`.
```python
parser = ParserPython(calc, reduce_tree=True)
```
In this configuration non-terminals with a single child will be removed from the
parse tree.
<a href="../images/calc_parse_tree.dot.png" target="_blank"><img src="../images/calc_parse_tree.dot.png"/></a>
For example, the `calc` parse tree above will look like this:
<a href="../images/calc_parse_tree_reduced.dot.png" target="_blank"><img src="../images/calc_parse_tree_reduced.dot.png"/></a>
Notice the removal of each non-terminal with a single child.
!!! warning
    Be aware that [semantic analysis](semantics.md) operates on nodes of the
    finished parse tree. Therefore, if you use [tree
    reduction](configuration.md#parse-tree-reduction), visitor methods will not
    get called for the removed nodes.
## Newline termination for Repetitions
By default, `Repetition` parsing expressions (i.e. `ZeroOrMore` and `OneOrMore`)
obey the `skipws` and `ws` settings, but there are situations where a repetition
should not pass the end of the current line. For this purpose the `eolterm`
parameter can be set on a repetition; it ensures that the repetition
terminates before entering a new line.

    from arpeggio import ParserPython, ZeroOrMore

    def grammar(): return first, second
    def first(): return ZeroOrMore(["a", "b"], eolterm=True)
    def second(): return "a"

    # The first rule should match only the first line,
    # so that the second rule matches "a" on the new line.
    input = """a a b a b b
    a"""

    parser = ParserPython(grammar)
    result = parser.parse(input)
## Separator for Repetitions
It is possible to specify a parsing expression that will be matched between
each two consecutive matches in a repetition.
For example:

    from arpeggio import ParserPython, ZeroOrMore

    def grammar(): return ZeroOrMore(["a", "b"], sep=",")

    # Commas will be treated as separators between elements
    input = "a , b, b, a"

    parser = ParserPython(grammar)
    result = parser.parse(input)
`sep` can be any valid parsing expression.
## Memoization (a.k.a. packrat parsing)
This technique is based on memoizing the result of each parsing expression rule.
For some grammars with a lot of backtracking this can yield a significant speed
increase at the expense of some memory used for the memoization cache.
Starting with Arpeggio 1.5 this feature is disabled by default. If you find
parsing slow, try enabling memoization by setting the `memoization`
parameter to `True` during parser instantiation.
```python
parser = ParserPython(grammar, memoization=True)
```