File: TUTORIAL.md

package info (click to toggle)
golang-github-alecthomas-participle 0.7.1-2
  • links: PTS, VCS
  • area: main
  • in suites: bookworm
  • size: 564 kB
  • sloc: makefile: 2
file content (255 lines) | stat: -rw-r--r-- 6,186 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
# Participle parser tutorial

<!-- MarkdownTOC -->

1. [Introduction](#introduction)
1. [The complete grammar](#the-complete-grammar)
1. [Root of the .ini AST \(structure, fields\)](#root-of-the-ini-ast-structure-fields)
1. [.ini properties \(named tokens, capturing, literals\)](#ini-properties-named-tokens-capturing-literals)
1. [.ini property values \(alternates, recursive structs, sequences\)](#ini-property-values-alternates-recursive-structs-sequences)
1. [Complete, but limited, .ini grammar \(top-level properties only\)](#complete-but-limited-ini-grammar-top-level-properties-only)
1. [Extending our grammar to support sections](#extending-our-grammar-to-support-sections)
1. [\(Optional\) Source positional information](#optional-source-positional-information)
1. [Parsing using our grammar](#parsing-using-our-grammar)

<!-- /MarkdownTOC -->

## Introduction

Writing a parser in Participle typically involves starting from the "root" of
the AST, annotating fields with the grammar, then recursively expanding until
it is complete. The AST is expressed via Go data types and the grammar is
expressed through struct field tags, as a form of EBNF.

The parser we're going to create for this tutorial parses .ini files
like this:

```ini
age = 21
name = "Bob Smith"

[address]
city = "Beverly Hills"
postal_code = 90210
```

## The complete grammar

I think it's useful to see the complete grammar first, to see what we're
working towards. Read on below for details.

 ```go
 type INI struct {
   Properties []*Property `@@*`
   Sections   []*Section  `@@*`
 }

 type Section struct {
   Identifier string      `"[" @Ident "]"`
   Properties []*Property `@@*`
 }

 type Property struct {
   Key   string `@Ident "="`
   Value *Value `@@`
 }

 type Value struct {
   String *string  `  @String`
   Number *float64 `| @Float`
 }
 ```

## Root of the .ini AST (structure, fields)

The first step is to create a root struct for our grammar. In the case of our
.ini parser, this struct will contain a sequence of properties:

```go
type INI struct {
  Properties []*Property
}

type Property struct {
}
```

## .ini properties (named tokens, capturing, literals)

Each property in an .ini file has an identifier key:

```go
type Property struct {
  Key string
}
```

The default lexer tokenises Go source code, and includes an `Ident` token type
that matches identifiers. To match this token we simply use the token type
name:

```go
type Property struct {
  Key string `Ident`
}
```

This will *match* identifiers, but not *capture* them into the `Key` field. To
capture input tokens into AST fields, prefix any grammar node with `@`:

```go
type Property struct {
  Key string `@Ident`
}
```

In .ini files, each key is separated from its value with a literal `=`. To
match a literal, enclose the literal in double quotes:

```go
type Property struct {
  Key string `@Ident "="`
}
```

> Note: literals in the grammar must match tokens from the lexer *exactly*. In
> this example if the lexer does not output `=` as a distinct token the
> grammar will not match.

## .ini property values (alternates, recursive structs, sequences)

For the purposes of our example we are only going to support quoted string
and numeric property values. As each value can be *either* a string or a float
we'll need something akin to a sum type. Go's type system cannot express this
directly, so we'll use the common approach of making each element a pointer.
The selected "case" will *not* be nil.

```go
type Value struct {
  String *string
  Number *float64
}
```

> Note: Participle will hydrate pointers as necessary.

To express matching a set of alternatives we use the `|` operator:

```go
type Value struct {
  String *string  `  @String`
  Number *float64 `| @Float`
}
```

> Note: the grammar can cross fields.

Next, we'll match values and capture them into the `Property`. To recursively
capture structs use `@@` (capture self):

```go
type Property struct {
  Key   string `@Ident "="`
  Value *Value `@@`
}
```

Now that we can parse a `Property` we need to go back to the root of the
grammar. We want to parse 0 or more properties. To do this, we use `<expr>*`.
Participle will accumulate each match into the slice until matching fails,
then move to the next node in the grammar.

```go
type INI struct {
  Properties []*Property `@@*`
}
```

> Note: tokens can also be accumulated into strings, appending each match.

## Complete, but limited, .ini grammar (top-level properties only)

We now have a functional, but limited, .ini parser!

```go
type INI struct {
  Properties []*Property `@@*`
}

type Property struct {
  Key string   `@Ident "="`
  Value *Value `@@`
}

type Value struct {
  String *string  `  @String`
  Number *float64 `| @Float`
}
```

## Extending our grammar to support sections

Adding support for sections is simply a matter of utilising the constructs
we've just learnt. A section consists of a header identifier, and a sequence
of properties:

```go
type Section struct {
  Identifier string      `"[" @Ident "]"`
  Properties []*Property `@@*`
}
```

Simple!

Now we just add a sequence of `Section`s to our root node:

```go
type INI struct {
  Properties []*Property `@@*`
  Sections   []*Section  `@@*`
}
```

And we're done!

## (Optional) Source positional information

If a grammar node includes a field with the name `Pos` and type `lexer.Position`, it will be automatically populated by positional information. eg.

```go
type Value struct {
  Pos    lexer.Position
  String *string  `  @String`
  Number *float64 `| @Float`
}
```

This is useful for error reporting.

## Parsing using our grammar

To parse with this grammar we first construct the parser (we'll use the
default lexer for now):

```go
parser, err := participle.Build(&INI{})
```

Then create a root node and parse into it with `parser.Parse{,String,Bytes}()`:

```go
ini := &INI{}
err = parser.ParseString(`
age = 21
name = "Bob Smith"

[address]
city = "Beverly Hills"
postal_code = 90210
`, ini)
```

You can find the full example [here](_examples/ini/main.go), alongside
other examples including an SQL `SELECT` parser and a full
[Thrift](https://thrift.apache.org/) parser.