File: parsers.html

package info (click to toggle)
camlp5 6.06-1
  • links: PTS, VCS
  • area: main
  • in suites: wheezy
  • size: 7,428 kB
  • sloc: ml: 77,055; sh: 1,417; makefile: 1,211
file content (719 lines) | stat: -rw-r--r-- 25,903 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
 "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <!-- $Id: parsers.html,v 6.4 2012-01-09 14:22:20 deraugla Exp $ -->
  <!-- Copyright (c) INRIA 2007-2012 -->
  <title>parsers</title>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <link rel="stylesheet" type="text/css" href="styles/base.css"
        title="Normal" />
</head>
<body>

<div id="menu">
</div>

<div id="content">

<h1 class="top">Stream parsers</h1>

<p>We describe here the syntax and the semantics of the parsers of
  streams of Camlp5. Streams are kinds of lazy lists. The parsers of
  these streams use recursive descendent method without backtracking,
  which is the most natural one in functional languages. In
  particular, parsers are normal functions.</p>

<p>Notice that the parsers have existed in OCaml since many years (the
  beginning of the 90ies), but some new features have been added in
  2007 (lookahead, "no error" optimization, let..in statement and left
  factorization) in Camlp5 distribution. This chapter describes them
  also.</p>

<div id="tableofcontents">
</div>

<h2>Introduction</h2>

<p>Parsers apply to values of type "Stream.t" defined in the module
  "Stream" of the standard library of OCaml. Like the type "list", the
  type "Stream.t" has a type parameter, indicating the type of its
  elements. They differ from the lists that they are lazy (the
  elements are evaluated as long as the parser need them for its
  actions), and imperative (parsers deletes their first elements when
  they take their parsing decisions): notice that purely functional
  parsers exist in Camlp5, where the corresponding streams are lazy
  and functional, the analyzed elements remaining in the initial
  stream and the semantic action returning the resulting stream
  together with the normal result, which allow natural limited
  backtrack but have the drawback that it is not easy to find the
  position of parsing errors when they happen.</p>

<p>Parsers of lazy+imperative streams, which are described here, use a
  method named "recursive descendent": they look at the first element,
  they decide what to do in function of its value, and continue the
  parsing with the remaining elements. Parsers can call other parsers,
  and can be recursive, like normal functions.</p>

<p>Actually, parsers are just pure syntactic sugar. When writing a
  parser in the syntax of the parser, Camlp5 transforms them into
  normal call to functions, use of patterns matchings and try..with
  statements.  The pretty printer of Camlp5, by default, displays this
  expanded result, without syntax of parsers. A pretty printing kit,
  when added, can rebuild the parsers in their initial syntax and
  display it.</p>

<h2>Syntax</h2>

<p>The syntax of the parsers, when loading "pa_rp.cmo" (or already
  included in the command "camlp5r"), is the following:</p>

<pre>
            expression ::= parser
                         | match-with-parser
                parser ::= "parser" pos-opt "[" parser-cases "]"
                         | "parser" pos-opt parser-case
     match-with-parser ::= "match" expression "with" parser
          parser-cases ::= parser-cases parser-case
                         | &lt;nothing&gt;
           parser-case ::= "[:" stream-pattern ":]" pos-opt "->" expression
        stream-pattern ::= stream-patt-comp
                         | stream-patt-comp ";" stream-patt-cont
                         | "let" LIDENT "=" expression "in" stream-pattern
                         | &lt;nothing&gt;
      stream-patt-cont ::= stream-patt-comp-err
                         | stream-patt-comp-err ";" stream-patt-cont
                         | "let" LIDENT "=" expression "in" stream-patt-cont
  stream-patt-comp-err ::= stream-patt-comp
                         | stream-patt-comp "?" expression
                         | stream-patt-comp "!"
      stream-patt-comp ::= "`" pattern
                         | "`" pattern "when" expression
                         | "?=" lookaheads
                         | pattern "=" expression
                         | pattern
            lookaheads ::= lookaheads "|" lookahead
                         | lookahead
             lookahead ::= "[" patterns "]"
              patterns ::= patterns pattern
                         | pattern
               pos-opt ::= pattern
                         | &lt;nothing&gt;
</pre>

<h2>Streams</h2>

<p>The parsers are functions taking streams as parameter. Streams are
  are values of type "<code>Stream.t a</code>" for some type
  "<code>a</code>". It is possible to build streams using the
  functions defined in the module "<code>Stream</code>":</p>

<h3>Stream.from</h3>

<p>"<code>Stream.from f</code>" returns a stream built from the
  function "<code>f</code>". To create a new stream element, the
  function "<code>f</code>" is called with the current stream count,
  starting with zero. The user function "<code>f</code>" must return
  either "<code>Some &lt;value&gt;</code>" for a value or
  "<code>None</code>" to specify the end of the stream.</p>

<h3>Stream.of_list</h3>

<p>Return a stream built from the list in the same order.</p>

<h3>Stream.of_string</h3>

<p>Return a stream of the characters of the string parameter.</p>

<h3>Stream.of_channel</h3>

<p>Return a stream of the characters read from the input channel
  parameter.</p>

<h2>Semantics of parsers</h2>

<h3>Parser</h3>

<p>A parser, defined with the syntax "parser" above, is of type
  "<code>Stream.t a -> b</code>" where "a" is the type of the elements
  of the streams and "b" the type of the result. The parser cases are
  tested in the order they are defined until one of them applies. The
  result is the semantic action of the parser case which applies. If
  no parser case applies, the exception "<code>Stream.Failure</code>"
  is raised.</p>

<p>When testing a parser case, if the first stream pattern component
  matches, all remaining stream pattern components of the stream
  pattern must match also. If one does not match, the parser raises
  the exception "<code>Stream.Error</code>" which has a parameter of
  type string: by default, this string is the empty string, but if the
  stream pattern component which does not match is followed by a
  question mark and an expression, this expression is evaluated and
  given as parameter to "<code>Stream.Error</code>".</p>

<p>In short, a parser can return with three ways:</p>

<ul>
  <li>A normal result, of type "<code>b</code>" for a parser of type
    "<code>Stream.t a -> b</code>".</li>
  <li>Raising the exception "<code>Stream.Failure</code>".</li>
  <li>Raising the exception "<code>Stream.Error</code>".</li>
</ul>

<p>Fundamentally, the exception "<code>Stream.Failure</code>" means
  "this parser does not apply and no element have been removed from
  the initial stream". This is a normal case when parsing: the parser
  locally fails, but the parsing can continue.</p>

<p>Conversely, the exception "<code>Stream.Error</code>" means that
  "this parser encountered a syntax error and elements have probably
  been removed from the stream". In this case, there is no way to
  recover the parsing, and it definitively fails.</p>

<h3>Left factorization</h3>

<p>In parsers, <em>consecutive</em> rules starting with the same
  components are left factorized. It means that they are transformed
  into one only rule starting with the common path, and continuing
  with a call to a parser separating the two cases. The order is
  kept, except that the possible empty rule is inserted at the
  end.</p>

<p>For example, the parser:</p>

<pre>
  parser
  [ [: `If; e1 = expr; `Then; e2 = expr; `Else; e3 = expr :] -> f e1 e2 e3
  | [: `If; e1 = expr; `Then; e2 = expr :] -> g e1 e2 ]
</pre>

<p>is transformed into:</p>

<pre>
  parser
    [: `If; e1 = expr; `Then; e2 = expr;
       a =
         parser
         [ [: `Else; e3 = expr :] -> f e1 e2 e3
         | [: :] -> g e1 e2 ] :] -> a
</pre>

<p>The version where rules are inverted:</p>

<pre>
  parser
  [ [: `If; e1 = expr; `Then; e2 = expr :] -> g e1 e2
  | [: `If; e1 = expr; `Then; e2 = expr; `Else; e3 = expr :] -> f e1 e2 e3 ]
</pre>

<p>is transformed into the same parser.</p>

<p>Notice that:

<ul>
  <li>Only <em>consecutive</em> rules are left factorized. In the
    following parser:
<pre>
  parser
  [ [: `If; e1 = expr; `Then; e2 = expr; `Else; e3 = expr :] -> ...
  | [: a = b :] -> a
  | [: `If; e1 = expr; `Then; e2 = expr :] -> ... ]
</pre>
    the two rules starting with "<tt>If</tt>" are not left factorized,
    and the second "<tt>If</tt>" rule will never work.
  </li>
  <li>The components which are not <em>identical</em> are not
    factorized. In the following parser:
<pre>
  parser
  [ [: `If; e1 = expr; `Then; e2 = expr; `Else; e3 = expr :] -> ...
  | [: `If; e4 = expr; `Then; e2 = expr :] -> ... ]
</pre>
    only the first component, "<tt>If</tt>" is factorized, the second
    one being different because of different patterns ("<tt>e1</tt>" and
      "<tt>e4</tt>").
  </li>
</ul>

<h3>Match with parser</h3>

<p>The syntax "match expression with parser" allows to match a stream
  against a parser. It is, for "parser", the equivalent of "match
  expression with" for "fun". The same way we could say:</p>

<pre>
  match expression with ...
</pre>

<p>could be considered as an equivalent to:</p>

<pre>
  (fun ...) expression
</pre>

<p>we could consider that:</p>

<pre>
  match expression with parser ...
</pre>

<p>is an equivalent to:</p>

<pre>
  (parser ...) expression
</pre>

<h3>Error messages</h3>

<p>A "<code>Stream.Error</code>" exception is raised when a stream
  pattern component does not match and that it is not the first one of
  the parser case. This exception has a parameter of type string,
  useful to specify the error message. By default, this is the empty
  string. To specify an error message, add a question mark and an
  expression after the stream pattern component. A typical error
  message is "that stream pattern component expected". Example with
  the parser of "if..then..else.." above:</p>

<pre>
  parser
    [: `If; e1 = expr ? "expression expected after 'if'";
       `Then ? "'then' expected";
       e2 = expr ? "expression expected after 'then'";
       a =
         parser
         [ [: `Else; e3 = expr ? "expression expected" :] -> f e1 e2 e3
         | [: :] -> g e1 e2 ] :] -> a
</pre>

<p>Notice that the expression after the question mark is evaluated
  only in case of syntax error. Therefore, it can be a complicated
  call to a complicated function without slowing down the normal
  parsing.</p>


<h3>Stream pattern component</h3>

<p>In a stream pattern (starting with "<code>[:</code>" and ending
  with "<code>:]</code>"), the stream pattern components are separated
  with the semicolon character. There are three cases of stream
  pattern components with some sub-cases for some of them, and an
  extra syntax can be used with a "let..in" construction. The three
  cases are:</p>

<ul>
  <li>A direct test of one or several stream elements
    (called <b>terminal</b> symbol), in three ways:
    <ol>
      <li>The character "backquote" followed by a pattern, meaning: if
        the stream starts with an element which is matched by this
        pattern, the stream pattern component matches, and the stream
        element is removed from the stream.</li>
      <li>The character "backquote" followed by a pattern, the keyword
        "when" and an expression of type "<code>bool</code>", meaning:
        if the stream starts with an element which is matched by this
        pattern and if the evaluation of the expression is
        "<code>True</code>", the stream pattern component matches, and
        the first element of the stream is removed.</li>
      <li>The character "question mark" followed by the character
        "equal" and a lookahead expression (see further), meaning: if
        the lookahead applies, the stream pattern component
        matches. The lookahead may unfreeze one or several elements on
        the stream, but does not remove them.</li>
    </ol>
  </li>
  <li>A pattern followed by the "equal" sign and an expression of type
    "<code>Stream.t x -> y</code>" for some types "<code>x</code>" and
    "<code>y</code>". This expression is called a <b>non terminal</b>
    symbol. It means: call the expression (which is a parser) with the
    current stream. If this sub-parser:

    <ol>
      <li>Returns an element, the pattern is bound to this result and
        the next stream pattern component is tested.</li>
      <li>Raises the exception "<code>Stream.Failure</code>", there
        are two cases:

        <ul>
          <li>if the stream pattern component is the first one of the
            stream case, the current parser also fails with the
            exception "<code>Stream.Failure</code>".</li>
          <li>if the stream pattern component is not the first one of
            the stream case, the current parser fails with the
            exception "<code>Stream.Error</code>".</li>
        </ul>

        In this second case:

        <ul>
          <li>If the stream pattern component is followed by a
            "question mark" and an expression (which must be of type
            "<code>string</code>"), the expression is evaluated and
            given as parameter of the exception
            "<tt>Stream.Error</tt>".</li>
          <li>If the expression is followed by an "exclamation mark",
            the test and conversion from "<code>Stream.Failure</code>"
            to "<code>Stream.Error</code>" is not done, and the parser
            just raises "<code>Stream.Failure</code>" again. This is
            an optimization which must be assumed by the programmer,
            in general when he knows that the sub-parser called never
            raises "<code>Stream.Failure</code>" (for example if the
            called parser ends with a parser case containing an empty
            stream pattern). See "no error optionization" below.</li>
          <li>Otherwise the exception parameter is the empty string.</li>
        </ul>
      </li>
    </ol>
  </li>
  <li>A pattern, which is bound to the current stream.</li>
</ul>

<p>Notice that patterns are bound immediately and can be used in the
  next stream pattern component.</p>

<h3>Let statement</h3>

<p>Between stream pattern components, it is possible to use the
  "let..in" construction. This is not considered as a real stream
  pattern component, in the fact that is is not tested against the
  exception "<code>Stream.Failure</code>" it may raise. It can be
  useful for intermediate computation. In particular, it is used
  internally by the lexers (see chapter
  about <a href="lexers.html">lexers</a> as character stream
  parsers).</p>

<p>Example of use, when an expression have to be used several times
  (in the example, "<code>d a</code>", which is bound to the variable
  "<code>c</code>"):</p>

<pre>
  parser
    [: a = b;
       let c = d a in
       e =
         parser
         [ [: f = g :] -> h c
         | [: :] -> c ] :] -> e
</pre>

<h3>Lookahead</h3>

<p>The lookahead feature allows to look at several terminals in the
  stream without removing them, in order to take decisions when more
  than one terminal is necessary.</p>

<p>For example, when parsing the normal syntax of the OCaml language,
  there is a problem, in recursing descendent parsing, for the cases
  where to treat and differentiate the following inputs:</p>

<pre>
  (-x+1)
  (-)
</pre>

<p>The first case is treated in a rule, telling: "a left parenthesis,
  followed by an expression, and a right parenthesis". The second one
  is "a left parenthesis, an operator, a right
  parenthesis". Programming it like this (left factorizing the first
  parenthesis):</p>

<pre>
  parser
    [: `Lparen;
       e =
         parser
         [ [: e = expr; `Rparen :] -> e
         | [: `Minus; `Rparen :] -> minus_op ] :] -> e
</pre>

<p>does not work if the input is "<code>(-)</code>" because the rule
  "<code>e = expr</code>" accepts the minus sign as expression start,
  removing it from the input stream and fails as parsing error, while
  encountering the right parenthesis.</p>

<p>Conversely, writing it this way:</p>

<pre>
  parser
    [: `Lparen;
       e =
         parser
         [ [: `Minus; `Rparen :] -> minus_op
         | [: e = expr; `Rparen :] -> e ] :] -> e
</pre>

<p>does not help, because if the input is "<code>(-x+1)</code>" the
  rule above starting with "<code>`Minus</code>" is accepted and the
  exception "<code>Stream.Error</code>" is raised while encountering
  the variable "<code>x</code>" since a right parenthesis is
  expected.</p>

<p>In general, this kind of situation is best resolved by a left
  factorization of the parser cases (see the section "Semantics"
  above), but that is not possible in this case. The solution is to
  test whether the character after the minus sign is a right
  parenthesis:</p>

<pre>
  parser
    [: `Lparen;
       e =
         parser
         [ [: ?= [ _ Rparen ]; `Minus; `Rparen :] -> minus_op
         | [: e = expr; `Rparen :] -> e ] :] -> e
</pre>

<p>It is possible to put several lists of patterns separated by a
  vertical bar in the lookahead construction, but with a limitation
  (due to the implementation): all lists of patterns must have the
  same number of elements.</p>

<h3>No error optimization</h3>

<p>The "no error optimization" is the fact to end a stream pattern
  component of kind "non-terminal" ("pattern" "equal" "expression") by
  the character "exclamation mark". Like said above, this inhibits the
  transformation of the exception "<code>Stream.Failure</code>",
  possibly raised by the called parser, into the exception
  "<code>Stream.Error</code>".</p>

<p>The code:</p>

<pre>
  parser [: a = b; c = d ! :] -> e
</pre>

<p>is equivalent to:</p>

<pre>
  parser [: a = b; s :] -> let c = d s in e
</pre>

<p>One interest of the first syntax is that it shows to readers that
  "<code>d</code>" is indeed a syntactic sub-parser. In the second
  syntax, it is called in the semantic action, which makes the parser
  case not so clear, as far as readability is concerned.</p>

<p>If the stream pattern component is at end of the stream pattern,
  this allow possible tail recursion by the OCaml compiler, in the
  following case:</p>

<pre>
  parser [: a = b; c = d ! :] -> c
</pre>

<p>since it is equivalent (with the fact that "<code>c</code>" is at
  the same time the pattern of the last case and the expression of the
  parser case semantic action) to:</p>

<pre>
  parser [: a = b; s :] -> d s
</pre>

<p>The call to "<code>d s</code>" can be a tail recursive
  call. Without the use of the "exclamation mark" in the rule, the
  equivalent code is:</p>

<pre>
  parser [: a = b; s :] ->
    try d s with [ Stream.Failure -> raise (Stream.Error "") ]
</pre>

<p>which is not tail recursive (due to the "try..with" construction
  pushes a context), preventing the compiler to optimize its
  code. This can be important when many recursive calls happen, since
  it can overflow the OCaml stack.</p>

<h3>Position</h3>

<p>The optional "pattern" before and after a stream pattern is bound
  to the current stream count. Indeed, streams internally contain a
  count of their elements. At the beginning the count is zero. When an
  element is removed, the count is incremented. The example:</p>

<pre>
  parser [: a = b :] ep -> c
</pre>

<p>is equivalent to:</p>

<pre>
  parser [: a = b; s :] -> let ep = Stream.count s in c
</pre>

<p>There is no direct syntax equivalent to the optional pattern at
  beginning of the stream pattern:</p>

<pre>
  parser bp [: a = b :] -> c
</pre>

<p>These optional patterns allow disposal of the stream count at the
  beginning and at the end of the parser case, allowing to compute
  locations of the rule in the source. In particular, if the stream is
  a stream of characters, these counts are the source location in
  number of characters.</p>

<h3>Semantic action</h3>

<p>In a parser case, after the stream pattern, there is an "arrow" and
  an expression, called the "semantic action". If the parser case is
  matched the parser returns with the evaluated expression whose
  environment contains all values bound in the stream pattern.</p>

<h2>Remarks</h2>

<h3>Simplicity vs Associativity</h3>

<p>This parsing technology has the advantage of simplicity of use and
  understanding, but it does not treat the associativity of
  operators. For example, if you write a parser like this (to compute
  arithmetic expressions):</p>

<pre>
  value rec expr =
    parser
    [ [: e1 = expr; `'+'; e2 = expr :] -> e1 + e2
    | [: `('0'..'9' as c) :] -> Char.code c - Char.code '0' ]
</pre>

<p>this would loop endlessly, exactly as if you wrote code starting
  with:</p>

<pre>
  value rec expr e =
    let e1 = expr e in
    ...
</pre>

<p>One solution is to treat the associativity "by hand": by reading a
  sub-expression, then looping with a parser which parses the operator
  and another sub-expression, and so on.</p>

<p>An alternative solution is to write parsing "combinators". Indeed,
  parsers being normal functions, it is possible to make a function
  which takes a parser as parameter and returning a parser using
  it. For example, left and right associativity parsing
  combinators:</p>

<pre>
  value rec left_assoc op elem =
    let rec op_elem x =
      parser
      [ [: t = op; y = elem; r = op_elem (t x y) :] -> r
      | [: :] -> x ]
    in
    parser [: x = elem; r = op_elem x :] -> r
  ;

  value rec right_assoc op elem =
    let rec op_elem x =
      parser
      [ [: t = op; y = elem; r = op_elem y :] -> t x r
      | [: :] -> x ]
    in
    parser [: x = elem; r = op_elem x :] -> r
  ;
</pre>

<p>which can be used, e.g. like this:</p>

<pre>
  value expr =
    List.fold_right (fun op elem -> op elem)
      [left_assoc (parser [: `'+' :] -> fun x y -> x +. y);
       left_assoc (parser [: `'*' :] -> fun x y -> x *. y);
       right_assoc (parser [: `'^' :] -> fun x y -> x ** y)]
      (parser [: `('0'..'9' as c) :] -> float (Char.code c - Char.code '0'))
  ;
</pre>

<p>and tested, e.g. in the toplevel, like that:</p>

<pre>
  expr (Stream.of_string "2^3^2+1");
</pre>

<p>The same way, it is possible to parse non-context free grammars, by
  programming parsers returning other parsers.</p>

<p>A third solution, to resolve the problem of associativity, is to
  use the grammars of Camlp5, which have the other advantage that they
  are extensible.</p>

<h3>Lexing vs Parsing</h3>

<p>In general, while analyzing a language, there are two levels:</p>

<ul>
  <li>The level where the input, considered as a stream of characters,
    is read to make a stream of tokens (for example "words", if it is
    a human language, or punctuation). This level is generally called
    "lexing".</li>
  <li>The level where the input is a stream of tokens where grammar
    rules are parsed. This level is generally called "parsing".</li>
</ul>

<p>The "parser" construction described here can be used for both,
  thanks to the polymorphism of OCaml:</p>

<ul>
  <li>The lexing level is a "parser" of streams of characters
    returning tokens.</li>
  <li>The parsing level is a "parser" of streams of tokens returning
    syntax trees.</li>
</ul>

<p>By comparison, the programs "lex" and "yacc" use two different
  technologies. With "parser"s, it is possible to use the same one for
  both.</p>

<h3>Lexer syntax vs Parser syntax</h3>

<p>For "lexers", i.e. for the specific case of parsers when the input
  is a stream of characters, it is possible to use a shorter
  syntax. See the chapter on <a href="lexers.html">lexers</a>. They
  have another syntax, shorter and adapted for the specific type
  "<code>char</code>". But they still are internally parsers of
  streams with the same semantics.</p>

<h3>Purely functional parsers</h3>

<p>This system of parsers is imperative: while parsing, the stream
  advances and the already parsed terminals disappear from the stream
  structure. This is useful because it is not necessary to return the
  remaining stream together with the normal result. This is the reason
  there is this "<tt>Stream.Error</tt>" exception: when it is raised,
  it means that some terminals have been consummed from the stream,
  which are definitively lost, and therefore that are no more possible
  parser cases to try.</p>

<p>An alternative is to use <a href="fparsers.html">functional
  parsers</a> which use a new stream type, lazy but not
  destructive. Their advantage is that they use a limited backtrack:
  the case of "if..then..else.." and the shorter "if..then.." work
  without having to left factorize the parser cases, and there is no
  need to lookahead. They have no equivalent to the exception
  "<code>Stream.Error</code>": when all cases are tested, and have
  failed, the parsers return the value "<code>None</code>". The
  drawback is that, when a parsing error happens, it is not easily
  possible to know the location of the error in the input, as the
  initial stream has not been modified: the system would indicate a
  failure at the first character of the first line: this is a general
  drawback of backtracking parsers. See the solutions found to this
  problem in the chapter about <a href="fparsers.html">purely
  functional parsers</a>.</p>

<p>A second alternative is to use the
  <a href="bparsers.html">backtracking parsers</a>. They use the same
  stream type as the functional parsers, but they test more cases than
  them. They have the same advantages and drawbacks than the functional
  parsers.</p>

<div class="trailer">
</div>

</div>

</body>
</html>