# Tree Construction Reference


Lark builds a tree automatically based on the structure of the grammar, where each rule that is matched becomes a branch (node) in the tree, and its children are its matches, in the order of matching.

For example, the rule `node: child1 child2` will create a tree node with two children. If it is matched as part of another rule (i.e. if it isn't the root), the new rule's tree node will become its parent.

Using `item+` or `item*` will result in a list of items, equivalent to writing `item item item ..`.

Using `item?` will return the item if it matched, or nothing.

If `maybe_placeholders=True` (the default), then `[item]` will return the item if it matched, or the value `None` if it didn't.

If `maybe_placeholders=False`, then `[]` behaves like `()?`.

## Terminals

Terminals are always values in the tree, never branches.

Lark filters out certain types of terminals by default, considering them punctuation:

- Terminals that won't appear in the tree are:

    - Unnamed literals (like `"keyword"` or `"+"`)
    - Terminals whose name starts with an underscore (like `_DIGIT`)

- Terminals that *will* appear in the tree are:

    - Unnamed regular expressions (like `/[0-9]/`)
    - Named terminals whose name starts with a letter (like `DIGIT`)

Note: Terminals composed of literals and other terminals always include the entire match without filtering any part.

**Example:**
```
start:  PNAME pname

PNAME:  "(" NAME ")"
pname:  "(" NAME ")"

NAME:   /\w+/
%ignore /\s+/
```
Lark will parse "(Hello) (World)" as:

    start
        (Hello)
        pname World

Rules prefixed with `!` will retain all their literals regardless.




**Example:**

```perl
    expr: "(" expr ")"
        | NAME+

    NAME: /\w+/

    %ignore " "
```

Lark will parse "((hello world))" as:

    expr
        expr
            expr
                "hello"
                "world"

The parentheses do not appear in the tree because they are unnamed literals, which Lark filters out by design. The words appear because they are matched by a named terminal.


## Shaping the tree

Users can alter the automatic construction of the tree using a collection of grammar features.

### Inlining rules with `_`

Rules whose name begins with an underscore will be inlined into their containing rule.

**Example:**

```perl
    start: "(" _greet ")"
    _greet: /\w+/ /\w+/

    %ignore " "
```

Lark will parse "(hello world)" as:

    start
        "hello"
        "world"

### Conditionally inlining rules with `?`

Rules whose definition begins with a question mark (`?`) will be inlined if they have a single child after filtering.

**Example:**

```ruby
    start: greet greet
    ?greet: "(" /\w+/ ")"
          | /\w+/ /\w+/

    %ignore " "
```

Lark will parse "hello world (planet)" as:

    start
        greet
            "hello"
            "world"
        "planet"

### Pinning rule terminals with `!`

Rules that begin with an exclamation mark will keep all their terminals (they won't get filtered).

```perl
    !expr: "(" expr ")"
         | NAME+
    NAME: /\w+/
    %ignore " "
```

Lark will parse "((hello world))" as:

    expr
      (
      expr
        (
        expr
          hello
          world
        )
      )

Using the `!` prefix is usually a "code smell", and may point to a flaw in your grammar design.

### Aliasing rules

Each option (alternative) in a rule can receive an alias. The alias is then used as the branch name for that option, instead of the rule name.

**Example:**

```ruby
    start: greet greet
    greet: "hello"
         | "world" -> planet

    %ignore " "
```

Lark will parse "hello world" as:

    start
        greet
        planet