File: parsing.md

package info (click to toggle)
coq 9.1.0%2Bdfsg-1
  • links: PTS, VCS
  • area: main
  • in suites: experimental
  • size: 35,968 kB
  • sloc: ml: 239,908; sh: 4,355; python: 2,985; ansic: 2,644; makefile: 874; lisp: 171; javascript: 63; xml: 24; sed: 2
file content (562 lines) | stat: -rw-r--r-- 23,180 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
# Parsing

Rocq's parser is based on Camlp5 using an extensible grammar.  Somewhat helpful
Camlp5 documentation is available [here](http://camlp5.github.io/doc/htmlc/grammars.html).
However, the Camlp5 code has been copied into the Rocq source tree and may differ
from the Camlp5 release.

Notable attributes of the parser include:

* The grammar is extensible at run time.  This is essential for supporting notations
  and optionally-loaded plugins that extend the grammar.

* The grammar is split into multiple source files.  Nonterminals can be local to a file
  or global.

* While 95% of the nonterminals and almost all the productions are defined in the grammar,
  a few are defined directly in OCaml code.  Since many developers have worked on the parser
  over the years, this code can be idiosyncratic, reflecting various coding styles.

* The parser is a recursive descent parser that, by default, only looks at the next token
  to make a parsing decision.  It's possible to hand-code additional lookahead where
  necessary by writing OCaml code.

* There's no code that checks whether a grammar is ambiguous or whether every production
  can be recognized.  Developers who modify the grammar may, in some cases, need to structure their
  added productions in specific ways to ensure that their additions are parsable and that they
  don't break existing productions.

## Contents ##

- [Vocabulary](#vocabulary)
- [Grammars: `*.mlg` File Structure](#grammars-mlg-file-structure)
- [Grammars: Nonterminals and Productions](#grammars-nonterminals-and-productions)
  - [Alternate production syntax](#alternate-production-syntax)
- [Usage notes](#usage-notes)
  - [Other components](#other-components)
  - [Parsing productions](#parsing-productions)
  - [Lookahead](#lookahead)

## Vocabulary

- *entry*: a nonterminal. Entries are values of type
  `'a Procq.Entry.t` (`Entry.t` in the rest of this document),
  where `'a` is the type of the parsed value.
  For instance `Procq.Prim.qualid : qualid Entry.t` parses qualified identifiers ("qualid").

- *argument*: a nonterminal with associated functions (e.g. how to
  print its values). Arguments are values of type
  `('raw,'glb,'top) Genarg.genarg_type`, typically named `wit_foo`,
  with an associated entry of type `'raw Entry.t` typically named `foo`.

## Grammars: `*.mlg` File Structure ##

Grammars are defined in `*.mlg` files, which `rocq preprocess-mlg` compiles into `*.ml` files at build time.
`preprocess-mlg` code is in the `coqpp` directory.  `coqpp` uses yacc and lex to parse the grammar files.
You can examine its yacc and lex input files in `coqpp_lex.mll` and `coqpp_parse.mly` for
details not fully covered here.

In addition, there is a `doc_grammar` build utility that uses the `coqpp` parser to extract the
grammar, then edits and inserts it into the documentation.  This is described in
[`doc/tools/docgram/README.md`](../../doc/tools/docgram/README.md).
`doc_grammar` generates
[`doc/tools/docgram/fullGrammar`](../../doc/tools/docgram/fullGrammar),
which has the full grammar for Rocq
(not including some optionally-loaded plugins).  This may be easier to read since everything is
in one file and the parser action routines and other OCaml code are omitted.

`*.mlg` files contain the following types of nodes (See `node` in the yacc grammar).  This part is
very specific to Rocq (not so similar to Camlp5).

Note that you can reverse engineer many details by comparing the `.mlg` input file with
the `.ml` generated by `coqpp`.

### OCaml code

OCaml code enclosed in curly braces, which is copied verbatim to the generated `*.ml` file

### Comments

Comments in the `*.mlg` file are in the form `(* … *)`. They are not copied
to the generated `*.ml` file.  Comments in OCaml code are preserved.

### DECLARE PLUGIN

`DECLARE PLUGIN "plugin"` - associates the file with a specific
plugin, for example "rocq-runtime.plugins.ltac". The string is the
plugin's ocamlfind name (also used to load it in the `.v` file with
`Declare ML Module`).

### GRAMMAR EXTEND

`GRAMMAR EXTEND` - adds additional entries and productions to the grammar and declares global
entries referenced in the `GRAMMAR EXTEND`:

```
GRAMMAR EXTEND Gram
GLOBAL:
  bignat bigint …;
<nonterminal definitions>
END
```

Global entries should be available in the current OCaml scope,
eg `open Procq.Prim` makes `qualid` available. (qualified entry names are not supported)

`GRAMMAR EXTEND` should be used only for large syntax additions.  To add new commands
and tactics, use these instead:

- `VERNAC COMMAND EXTEND` (and variant `VERNAC { entry } EXTEND`) to add new commands
- `TACTIC EXTEND` to add new ltac1 tactics
- `ARGUMENT EXTEND` to add new arguments for `TACTIC EXTEND` and `VERNAC EXTEND`
- `VERNAC ARGUMENT EXTEND` to add new arguments for `VERNAC EXTEND`

These constructs provide essential semantic information that's provided in a more complex,
less readable way with `GRAMMAR EXTEND`.

### VERNAC EXTEND

`VERNAC COMMAND EXTEND` - adds new command syntax by adding productions to the
`command` entry.

`VERNAC { entry } EXTEND` - adds new command syntax to the `entry` entry.
This is typically used for proof modes (see also `Pvernac.register_proof_mode` and the user doc for proof modes).

Simple example:

```
VERNAC COMMAND EXTEND CmdName CLASSIFIED AS SIDEFF
| [ "Cmd" arg(a) ] -> { do_interp a }
END
```
defines a command composed of the string `Cmd` followed by argument `wit_arg`.
Running the command calls `do_interp a` where `a` is the result of parsing `arg`.

Example with all optional elements:
```
VERNAC COMMAND EXTEND CmdName CLASSIFIED AS SIDEFF
| #[ attr ] ![ state ] [ "Cmd" arg(a) ]
  => { rule_classif a }
  SYNTERP AS synv { do_synterp attr a }
  -> { do_interp attr a synv }
END
```

Productions here are represented with [alternate syntax](#alternate-production-syntax), described later.

Nonterminals in `VERNAC EXTEND` parsing rules (e.g. `arg` in the example)
are arguments, possibly with [nonterminal modifiers](#nonterminal-modifiers).
The name prefixed by `wit_` must be available in the current OCaml scope.

New commands should be added using this construct rather than `GRAMMAR EXTEND` so
they are correctly registered, such as having the correct command classifier.

#### Extend name

Each `VERNAC EXTEND` must have a unique name ("CmdName" in the example)
(unique for the plugin, you will get an anomaly at runtime if it's
not).

#### Classification

Commands must be "classified" for the STM.
Classifying a command is generating a
`Vernacextend.vernac_classification` (see comments on its
declaration for the meaning of the different values).
This is done by appending one of the following annotations to the `VERNAC COMMAND EXTEND CmdName` grammar extension:
- `CLASSIFIED AS QUERY` for `VtQuery`
- `CLASSIFIED AS SIDEFF` for `VtSideff ([], VtLater)` (most commands are in this case)
- `CLASSIFIED BY { code }` where code is the classification, when it is static
- adding `{ code } =>` after the parsing rule (before its
  interpretation), where `code` is the classification and has access
  to argument values.

  Used when the classification is not the same for the whole block
  or depends on the parsed values
  (eg `| [ "foo" arg(x) ] => { classify x } -> { interpret x }`).

  Per-rule classification overrides block level classification.
  Block classification should be omitted (it will be ignored)
  if all rules have their own classification.

#### State access

Commands must explicitly declare when they use some of the global
state: the current proof state, program obligations state, and
bodies of `Qed` proofs. This is done by adding a `STATE state`
specifier for the whole block, or overriding it per-rule with `![ state ]`
before the parsing rule (after attributes if any).

`preprocess-mlg` associates the state specifier with combinators in
`Vernactypes`, and the interpretation must have the type expected by
the combinator (except when it expects a thunk, in which case thunking is implicit).

The following state specifiers
are understood (see `Coqpp_main.understand_state` if this isn't up to date):
- `close_proof` for `vtcloseproof`, note that closing proofs from plugin commands is buggy.
- `open_proof` for `vtopenproof`.
- `proof` for `vtmodifyproof`
- `proof_opt_query` for `vtreadproofopt`
- `proof_query` for `vtreadproof`
- `read_program` for `vtreadprogram`
- `program` for `vtmodifyprogram`
- `declare_program` for `vtdeclareprogram`
- `program_interactive` for `vtopenproofprogram`
- `opaque_access` for `vtopaqueaccess`

So for instance `open_proof` requires the interpretation to have type `Declare.Proof.t`
(implicitly thunked),
`proof` requires it to have type `pstate:Declare.Proof.t -> Declare.Proof.t`.

#### Attributes

By default commands support no attributes (except for the general
attributes like `#[warnings]`). Attributes may be supported by
adding the list of supported attributes `#[ x = a; y = b ]` before
the parsing rule. `x` and `y` will be bound to the parsed attribute
in the interpretation, `a` and `b` are qualified identifiers of type
`'att Attributes.attribute`. Punning `#[ x ]` for `#[ x = x ]` is supported.
You can delay attribute interpretation (e.g. if it depends on the command arguments)
by using `Attributes.raw_attributes`.

#### Synterp

If the command modifies the parser or modifies state which may be
used to change the parser in a later command, these operations must
be separated from the main interpretation and done in the "synterp"
phase. The "synterp" phase does not have access to the "interp"
state (which notably includes the global env).

This is done with `SYNTERP AS synv { synterp }`: `synterp` will be
evaluated in the synterp phase with the command arguments and attributes bound,
and its result will be bound to `synv` for the interp phase.

### TACTIC EXTEND

`TACTIC EXTEND` - adds new tactic (Ltac1) syntax by adding productions to `simple_tactic`.
For example:

```
TACTIC EXTEND field_lookup
| [ "field_lookup" tactic(f) "[" constr_list(lH) "]" ne_constr_list(lt) ] ->
      { let (t,l) = List.sep_last lt in field_lookup f lH l t }
END
```

adds a new tactic `field_lookup`.

New tactics should be added using this construct rather than `GRAMMAR EXTEND`.

Nonterminals in `VERNAC EXTEND` parsing rules
(eg `tactic`, `constr_list`, `ne_constr_list` in the example)
are arguments, possibly with [nonterminal modifiers](#nonterminal-modifiers).
The name prefixed by `wit_` must be available in the current OCaml scope.

TODO: explain DEPRECATED, LEVEL (not shown)

### ARGUMENT EXTEND

`ARGUMENT EXTEND` - defines a new argument which can be used in `TACTIC EXTEND` and `VERNAC EXTEND`

```
ARGUMENT EXTEND ast_closure_term
     TYPED AS type_info
     PRINTED BY { pp_ast_closure_term }
     INTERPRETED BY { interp_ast_closure_term }
     GLOBALIZED BY { glob_ast_closure_term }
     SUBSTITUTED BY { subst_ast_closure_term }
     RAW_PRINTED BY { pp_ast_closure_term }
     GLOB_PRINTED BY { pp_ast_closure_term }
  | [ term_annotation(a) constr(c) ] -> { mk_ast_closure_term a c }
END
```

See comments in `tacentries.mli` for partial information on the various
arguments.

Nonterminals in `ARGUMENT EXTEND` parsing rules (eg `term_annotation` and `constr` in the example)
are entry names, possibly with [nonterminal modifiers](#nonterminal-modifiers).

### VERNAC ARGUMENT EXTEND

`VERNAC ARGUMENT EXTEND` - (part of `argument_extend` in the yacc grammar) defines
productions for a single nonterminal which can be used in `VERNAC EXTEND`.  For example:

```
VERNAC ARGUMENT EXTEND ring_mod
  PRINTED BY { pr_ring_mod env sigma }
  | [ "decidable" constr(eq_test) ] -> { Ring_kind(Computational eq_test) }
  | [ "abstract" ] -> { Ring_kind Abstract }
  | [ "morphism" constr(morph) ] -> { Ring_kind(Morphism morph) }
  | [ "constants" "[" tactic(cst_tac) "]" ] -> { Const_tac(CstTac cst_tac) }
  | [ "closed" "[" ne_global_list(l) "]" ] -> { Const_tac(Closed l) }
  | [ "preprocess" "[" tactic(pre) "]" ] -> { Pre_tac pre }
  | [ "postprocess" "[" tactic(post) "]" ] -> { Post_tac post }
  | [ "setoid" constr(sth) constr(ext) ] -> { Setoid(sth,ext) }
  | [ "sign" constr(sign_spec) ] -> { Sign_spec sign_spec }
  | [ "power" constr(pow_spec) "[" ne_global_list(l) "]" ] ->
           { Pow_spec (Closed l, pow_spec) }
  | [ "power_tac" constr(pow_spec) "[" tactic(cst_tac) "]" ] ->
           { Pow_spec (CstTac cst_tac, pow_spec) }
  | [ "div" constr(div_spec) ] -> { Div_spec div_spec }
END
```

`PRINTED BY` specifies how the argument should be printed (used by
`-beautify`, `-time`, and various debug printers).
The `PRINTED BY` code has variables `env : Environ.env` and `sigma : Evd.evar_map` bound.

Nonterminals in `VERNAC ARGUMENT EXTEND` (e.g. `constr`, `tactic`, `ne_global_list` in the example)
are entry names, possibly with [nonterminal modifiers](#nonterminal-modifiers).

### DOC_GRAMMAR

`DOC_GRAMMAR` - Used in `doc_grammar`-generated files to permit simplified syntax

## Grammars: Nonterminals and Productions

Here's a simple nonterminal definition in the Camlp5 format:

  ```
  universe:
    [ [ IDENT "max"; "("; ids = LIST1 universe_expr SEP ","; ")" -> { ids }
      | u = universe_expr -> { [u] } ] ]
  ;
  ```

In which:
* `universe` is the nonterminal being defined
* productions are separated by `|` and, as a group, are enclosed in `[ [ … ] ];`
* `u = universe_expr` refers to the `universe_expr` nonterminal.  `u` is bound to
  the value returned by that nonterminal's action routine, which can be
  referred to in the action routine.  For `ids = LIST1 universe_expr SEP ","`,
  `ids` is bound to the list of values returned by `universe_expr`.
* `-> { … }` contains the OCaml action routine, which is executed when the production is recognized
  and returns a value
* Semicolons separate adjacent grammatical elements (nonterminals, strings or other constructs)

Grammatical elements that appear in productions are:

- nonterminal names - identifiers in the form `[a-zA-Z0-9_]*`.  These correspond to variables in
  the generated `.ml` code.  In some cases a qualified name, such as `Prim.name`, is used.
- `"…"` - a literal string that becomes a keyword and cannot be used as an `ident`.
  The string doesn't have to be a valid identifier; frequently the string will contain only
  punctuation characters.  Generally we try to avoid adding new keywords that are also valid
  identifiers--though there is an unresolved debate among the developers about whether having more
  such keywords in general is good (e.g. it makes it easier to highlight keywords in GUIs)
  or bad (more keywords for the user to avoid and new keywords may require changes to existing
  proof files).
- `IDENT "…"` - a literal string that has the form of an `ident` that doesn't become
  a keyword
- `OPT element` - optionally include `element` (e.g. a nonterminal, IDENT "…" or "…").
  The value is of type `'a option`.
- `LIST1 element` - a list of one or more `element`s.  The value is of type `'a list`.
- `LIST0 element` - an optional list of `element`s
- `LIST1 element SEP sep` - a list of `element`s separated by `sep`
- `LIST0 element SEP sep` - an optional list of `element`s separated by `sep`
- `( elements )` - grouping to represent a series of elements as a unit,
  useful within `OPT` and `LIST*`.
- `[ elements1 | elements2 | … ]` - alternatives (either `elements1` or `elements2` or …),
  actually nested productions, each of which can have its own action routines

Nonterminals can also be defined with multiple levels to specify precedence and associativity
of its productions.  This is described in the Rocq documentation under the `Print Grammar`
command.  The first square bracket around a nonterminal definition is for grouping
level definitions, which are separated with `|`, for example:

  ```
  ltac_expr:
      [ "5" RIGHTA
        [ te = binder_tactic -> { te } ]
      | "4" LEFTA
      :
  ```

Grammar extensions can specify what level they are modifying, for example:

  ```
  ltac_expr: LEVEL "1" [ RIGHTA
    [ tac = ltac_expr; intros = ssrintros_ne -> { tclintros_expr ~loc tac intros }
    ] ];
  ```

### Alternate production syntax ###

Except for `GRAMMAR EXTEND`, the `EXTEND` nodes in the `*.mlg`s use simplified syntax in
productions that's similar to what's used in the `Tactic Notation` and
`Ltac2 Notation` commands.  For example:

  ```
  TACTIC EXTEND cc
  | [ "congruence" ] -> { congruence_tac 1000 [] }
  | [ "congruence" integer(n) ] -> { congruence_tac n [] }
  | [ "congruence" "with" ne_constr_list(l) ] -> { congruence_tac 1000 l }
  | [ "congruence" integer(n) "with" ne_constr_list(l) ] ->
     { congruence_tac n l }
  END
  ```

For `VERNAC EXTEND` and `TACTIC EXTEND`, nonterminals appearing in the
alternate production syntax are "arguments" (`Genarg.genarg_type`) accessed by prefixing the name with `wit_`. Some commonly used arguments are defined in
`stdarg.ml`. Others are defined in the code generated by `ARGUMENT EXTEND`
and `VERNAC ARGUMENT EXTEND` constructs. References to
nonterminals that don't have `wit_*` symbols cause compilation errors.

The differences are:
* The outer `: [ … ];` is omitted.  Each production is enclosed in `| [ … ]`.
* The action routine is outside the square brackets
* Literal strings that are valid identifiers don't become reserved keywords
* No semicolons separating elements of the production
* `integer(n)` is used to bind a nonterminal value to a variable instead of `n = integer`
* [nonterminal modifiers](#nonterminal-modifiers) are used
* There's no way to define `LEVEL`s
* There's no equivalent to `( elements )` or `[ elements1 | elements2 | … ]`, which may
  require repeating similar syntax several times.  For example, this single production
  is equivalent to 8 productions in `TACTIC EXTEND` representing all possible expansions of
  three `OPT`s:

  ```
  | IDENT "Add"; IDENT "Parametric"; IDENT "Relation"; LIST0 binder; ":"; constr; constr;
          OPT [ IDENT "reflexivity"; IDENT "proved"; IDENT "by"; constr -> { … } ];
          OPT [ IDENT "symmetry"; IDENT "proved"; IDENT "by"; constr -> { … } ];
          OPT [ IDENT "transitivity"; IDENT "proved"; IDENT "by"; constr -> { … } ];
          IDENT "as"; ident -> { … }
  ```

### Nonterminal modifiers

In the alternate syntax nonterminal names may be mangled by adding
prefixes and suffixes to parse lists and options:

* `ne_entry_list` for `LIST1 entry`
* `entry_list` for `LIST0 entry`
* `ne_entry_list_sep(var, sep)` for `LIST1 entry SEP sep` where the list is bound to `var`
* `entry_list_sep(var, sep)` for `LIST0 entry SEP sep` where the list is bound to `var`
* `entry_opt` for `OPT entry`

This works regardless of whether the nonterminal is an entry or an
argument. The demangled entry or argument name must be available in
the current OCaml scope (with `wit_` prefix for arguments).

## Usage notes

### Other components

Rocq's lexer is in `clexer.ml`.  Its 10 token types are defined in `tok.ml`.

The parser is in `grammar.ml`.  The extensive use of GADT (generalized algebraic datatypes)
makes it harder for the uninitiated to understand it.

When the parser is invoked, the call tells the parser which nonterminal to parse.  `vernac_control`
is the start symbol for commands.  `tactic_mode` is the start symbol for tactics.
Tactics give syntax errors if Rocq is not in proof mode.  There are additional details
not mentioned here.

### Parsing productions

Some thoughts, not to be taken as identifying all the issues:

Since the parser examines only the next token to make a parsing decision (and perhaps
because of other potentially fixable limitations), some productions have to be ordered
or structured in a particular way to parse correctly in all cases.

For example, consider these productions:

  ```
  command: [ [
    | IDENT "Print"; p = printable -> { VernacPrint p }
    | IDENT "Print"; qid = smart_global; l = OPT univ_name_list -> { VernacPrint (PrintName (qid,l)) }
    | IDENT "Print"; IDENT "Module"; "Type"; qid = global ->
        { VernacPrint (PrintModuleType qid) }
    | IDENT "Print"; IDENT "Module"; qid = global ->
        { VernacPrint (PrintModule qid) }
    | IDENT "Print"; IDENT "Namespace" ; ns = dirpath ->
        { VernacPrint (PrintNamespace ns) }
    :

  printable:
    [ [ IDENT "Term"; qid = smart_global; l = OPT univ_name_list -> { PrintName (qid,l) }
      | IDENT "All" -> { PrintFullContext }
      | IDENT "Section"; s = global -> { PrintSectionContext s }
      :
  ```

Reversing the order of the first two productions in `command` causes the `All` in `Print All` to
be parsed incorrectly as a `smart_global`, making that command unavailable.  `Print Namespace nat.`
still works correctly, though.

Similarly, the production for `Print Module Type` has to appear before `Print Module <global>`
in order to be reachable.

Internally, the parser generates a tree that represents the possible prefixes for the
productions of a nonterminal as described in
[the Camlp5 documentation](http://camlp5.github.io/doc/htmlc/grammars.html#b:Rules-insertion).

Here's another example in which the way the productions are written matters.  `OPT` at
the beginning of a production doesn't always work well:

  ```
  command: [ [
      | IDENT "Foo"; n = natural -> { VernacBack 1 }
      | OPT (IDENT "ZZ"); IDENT "Foo" -> { VernacBack 1 }
      :
  ```

`Foo.` looks like it should be accepted, but it gives a parse error:

  ```
  Unnamed_thm < Foo.
  Toplevel input, characters 3-4:
  > Foo.
  >    ^
  Error:
  Syntax error: [prim:natural] expected after 'Foo' (in [vernac:command]).
  ```

Reversing the order of the productions doesn't help, but splitting
the 'OPT' production into 2 productions works:

  ```
      | IDENT "Foo" -> { VernacBack 1 }
      | IDENT "ZZ"; IDENT "Foo" -> { VernacBack 1 }
      | IDENT "Foo"; n = natural -> { VernacBack 1 }

  ```

On the other hand, `OPT` works just fine when the parser has already found the
right production.  For example `Back` and `Back <natural>` can be combined using
an `OPT`:

  ```
      | IDENT "Back"; n = OPT natural -> { VernacBack (Option.default 1 n) }
  ```

### Lookahead

It's possible to look ahead more than one symbol using OCaml code.  Generally we
avoid doing this unless there's a strong reason to do so.  For example, this
code defines a new nonterminal `local_test_lpar_id_colon` that checks that
the next 3 tokens are `"("` `ident` and `":"` without consuming any input:

  ```
  let local_test_lpar_id_colon =
    let open Procq.Lookahead in
    to_entry "lpar_id_colon" begin
      lk_kw "(" >> lk_ident >> lk_kw ":"
    end
  ```

This one checks that the next 2 tokens are `"["` and `"|"` with no space between.
This is a special case: intropatterns can have sequences like `"[|]"` that are
3 different tokens with empty nonterminals between them.  Making `"[|"` a keyword
would break existing code with "[|]":

  ```
  let test_array_opening =
    let open Procq.Lookahead in
    to_entry "test_array_opening" begin
      lk_kw "[" >> lk_kw "|" >> check_no_space
    end
  ```

TODO: how to add a tactic or command