1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<!-- $Id: parsers.html,v 6.4 2012-01-09 14:22:20 deraugla Exp $ -->
<!-- Copyright (c) INRIA 2007-2012 -->
<title>parsers</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<link rel="stylesheet" type="text/css" href="styles/base.css"
title="Normal" />
</head>
<body>
<div id="menu">
</div>
<div id="content">
<h1 class="top">Stream parsers</h1>
<p>We describe here the syntax and the semantics of the parsers of
streams of Camlp5. Streams are kinds of lazy lists. The parsers of
these streams use recursive descendent method without backtracking,
which is the most natural one in functional languages. In
particular, parsers are normal functions.</p>
<p>Notice that the parsers have existed in OCaml since many years (the
beginning of the 90ies), but some new features have been added in
2007 (lookahead, "no error" optimization, let..in statement and left
factorization) in Camlp5 distribution. This chapter describes them
also.</p>
<div id="tableofcontents">
</div>
<h2>Introduction</h2>
<p>Parsers apply to values of type "Stream.t" defined in the module
"Stream" of the standard library of OCaml. Like the type "list", the
type "Stream.t" has a type parameter, indicating the type of its
elements. They differ from the lists that they are lazy (the
elements are evaluated as long as the parser need them for its
actions), and imperative (parsers deletes their first elements when
they take their parsing decisions): notice that purely functional
parsers exist in Camlp5, where the corresponding streams are lazy
and functional, the analyzed elements remaining in the initial
stream and the semantic action returning the resulting stream
together with the normal result, which allow natural limited
backtrack but have the drawback that it is not easy to find the
position of parsing errors when they happen.</p>
<p>Parsers of lazy+imperative streams, which are described here, use a
method named "recursive descendent": they look at the first element,
they decide what to do in function of its value, and continue the
parsing with the remaining elements. Parsers can call other parsers,
and can be recursive, like normal functions.</p>
<p>Actually, parsers are just pure syntactic sugar. When writing a
parser in the syntax of the parser, Camlp5 transforms them into
normal call to functions, use of patterns matchings and try..with
statements. The pretty printer of Camlp5, by default, displays this
expanded result, without syntax of parsers. A pretty printing kit,
when added, can rebuild the parsers in their initial syntax and
display it.</p>
<h2>Syntax</h2>
<p>The syntax of the parsers, when loading "pa_rp.cmo" (or already
included in the command "camlp5r"), is the following:</p>
<pre>
expression ::= parser
| match-with-parser
parser ::= "parser" pos-opt "[" parser-cases "]"
| "parser" pos-opt parser-case
match-with-parser ::= "match" expression "with" parser
parser-cases ::= parser-cases parser-case
| <nothing>
parser-case ::= "[:" stream-pattern ":]" pos-opt "->" expression
stream-pattern ::= stream-patt-comp
| stream-patt-comp ";" stream-patt-cont
| "let" LIDENT "=" expression "in" stream-pattern
| <nothing>
stream-patt-cont ::= stream-patt-comp-err
| stream-patt-comp-err ";" stream-patt-cont
| "let" LIDENT "=" expression "in" stream-patt-cont
stream-patt-comp-err ::= stream-patt-comp
| stream-patt-comp "?" expression
| stream-patt-comp "!"
stream-patt-comp ::= "`" pattern
| "`" pattern "when" expression
| "?=" lookaheads
| pattern "=" expression
| pattern
lookaheads ::= lookaheads "|" lookahead
| lookahead
lookahead ::= "[" patterns "]"
patterns ::= patterns pattern
| pattern
pos-opt ::= pattern
| <nothing>
</pre>
<h2>Streams</h2>
<p>The parsers are functions taking streams as parameter. Streams are
are values of type "<code>Stream.t a</code>" for some type
"<code>a</code>". It is possible to build streams using the
functions defined in the module "<code>Stream</code>":</p>
<h3>Stream.from</h3>
<p>"<code>Stream.from f</code>" returns a stream built from the
function "<code>f</code>". To create a new stream element, the
function "<code>f</code>" is called with the current stream count,
starting with zero. The user function "<code>f</code>" must return
either "<code>Some <value></code>" for a value or
"<code>None</code>" to specify the end of the stream.</p>
<h3>Stream.of_list</h3>
<p>Return a stream built from the list in the same order.</p>
<h3>Stream.of_string</h3>
<p>Return a stream of the characters of the string parameter.</p>
<h3>Stream.of_channel</h3>
<p>Return a stream of the characters read from the input channel
parameter.</p>
<h2>Semantics of parsers</h2>
<h3>Parser</h3>
<p>A parser, defined with the syntax "parser" above, is of type
"<code>Stream.t a -> b</code>" where "a" is the type of the elements
of the streams and "b" the type of the result. The parser cases are
tested in the order they are defined until one of them applies. The
result is the semantic action of the parser case which applies. If
no parser case applies, the exception "<code>Stream.Failure</code>"
is raised.</p>
<p>When testing a parser case, if the first stream pattern component
matches, all remaining stream pattern components of the stream
pattern must match also. If one does not match, the parser raises
the exception "<code>Stream.Error</code>" which has a parameter of
type string: by default, this string is the empty string, but if the
stream pattern component which does not match is followed by a
question mark and an expression, this expression is evaluated and
given as parameter to "<code>Stream.Error</code>".</p>
<p>In short, a parser can return with three ways:</p>
<ul>
<li>A normal result, of type "<code>b</code>" for a parser of type
"<code>Stream.t a -> b</code>".</li>
<li>Raising the exception "<code>Stream.Failure</code>".</li>
<li>Raising the exception "<code>Stream.Error</code>".</li>
</ul>
<p>Fundamentally, the exception "<code>Stream.Failure</code>" means
"this parser does not apply and no element have been removed from
the initial stream". This is a normal case when parsing: the parser
locally fails, but the parsing can continue.</p>
<p>Conversely, the exception "<code>Stream.Error</code>" means that
"this parser encountered a syntax error and elements have probably
been removed from the stream". In this case, there is no way to
recover the parsing, and it definitively fails.</p>
<h3>Left factorization</h3>
<p>In parsers, <em>consecutive</em> rules starting with the same
components are left factorized. It means that they are transformed
into one only rule starting with the common path, and continuing
with a call to a parser separating the two cases. The order is
kept, except that the possible empty rule is inserted at the
end.</p>
<p>For example, the parser:</p>
<pre>
parser
[ [: `If; e1 = expr; `Then; e2 = expr; `Else; e3 = expr :] -> f e1 e2 e3
| [: `If; e1 = expr; `Then; e2 = expr :] -> g e1 e2 ]
</pre>
<p>is transformed into:</p>
<pre>
parser
[: `If; e1 = expr; `Then; e2 = expr;
a =
parser
[ [: `Else; e3 = expr :] -> f e1 e2 e3
| [: :] -> g e1 e2 ] :] -> a
</pre>
<p>The version where rules are inverted:</p>
<pre>
parser
[ [: `If; e1 = expr; `Then; e2 = expr :] -> g e1 e2
| [: `If; e1 = expr; `Then; e2 = expr; `Else; e3 = expr :] -> f e1 e2 e3 ]
</pre>
<p>is transformed into the same parser.</p>
<p>Notice that:
<ul>
<li>Only <em>consecutive</em> rules are left factorized. In the
following parser:
<pre>
parser
[ [: `If; e1 = expr; `Then; e2 = expr; `Else; e3 = expr :] -> ...
| [: a = b :] -> a
| [: `If; e1 = expr; `Then; e2 = expr :] -> ... ]
</pre>
the two rules starting with "<tt>If</tt>" are not left factorized,
and the second "<tt>If</tt>" rule will never work.
</li>
<li>The components which are not <em>identical</em> are not
factorized. In the following parser:
<pre>
parser
[ [: `If; e1 = expr; `Then; e2 = expr; `Else; e3 = expr :] -> ...
| [: `If; e4 = expr; `Then; e2 = expr :] -> ... ]
</pre>
only the first component, "<tt>If</tt>" is factorized, the second
one being different because of different patterns ("<tt>e1</tt>" and
"<tt>e4</tt>").
</li>
</ul>
<h3>Match with parser</h3>
<p>The syntax "match expression with parser" allows to match a stream
against a parser. It is, for "parser", the equivalent of "match
expression with" for "fun". The same way we could say:</p>
<pre>
match expression with ...
</pre>
<p>could be considered as an equivalent to:</p>
<pre>
(fun ...) expression
</pre>
<p>we could consider that:</p>
<pre>
match expression with parser ...
</pre>
<p>is an equivalent to:</p>
<pre>
(parser ...) expression
</pre>
<h3>Error messages</h3>
<p>A "<code>Stream.Error</code>" exception is raised when a stream
pattern component does not match and that it is not the first one of
the parser case. This exception has a parameter of type string,
useful to specify the error message. By default, this is the empty
string. To specify an error message, add a question mark and an
expression after the stream pattern component. A typical error
message is "that stream pattern component expected". Example with
the parser of "if..then..else.." above:</p>
<pre>
parser
[: `If; e1 = expr ? "expression expected after 'if'";
`Then ? "'then' expected";
e2 = expr ? "expression expected after 'then'";
a =
parser
[ [: `Else; e3 = expr ? "expression expected" :] -> f e1 e2 e3
| [: :] -> g e1 e2 ] :] -> a
</pre>
<p>Notice that the expression after the question mark is evaluated
only in case of syntax error. Therefore, it can be a complicated
call to a complicated function without slowing down the normal
parsing.</p>
<h3>Stream pattern component</h3>
<p>In a stream pattern (starting with "<code>[:</code>" and ending
with "<code>:]</code>"), the stream pattern components are separated
with the semicolon character. There are three cases of stream
pattern components with some sub-cases for some of them, and an
extra syntax can be used with a "let..in" construction. The three
cases are:</p>
<ul>
<li>A direct test of one or several stream elements
(called <b>terminal</b> symbol), in three ways:
<ol>
<li>The character "backquote" followed by a pattern, meaning: if
the stream starts with an element which is matched by this
pattern, the stream pattern component matches, and the stream
element is removed from the stream.</li>
<li>The character "backquote" followed by a pattern, the keyword
"when" and an expression of type "<code>bool</code>", meaning:
if the stream starts with an element which is matched by this
pattern and if the evaluation of the expression is
"<code>True</code>", the stream pattern component matches, and
the first element of the stream is removed.</li>
<li>The character "question mark" followed by the character
"equal" and a lookahead expression (see further), meaning: if
the lookahead applies, the stream pattern component
matches. The lookahead may unfreeze one or several elements on
the stream, but does not remove them.</li>
</ol>
</li>
<li>A pattern followed by the "equal" sign and an expression of type
"<code>Stream.t x -> y</code>" for some types "<code>x</code>" and
"<code>y</code>". This expression is called a <b>non terminal</b>
symbol. It means: call the expression (which is a parser) with the
current stream. If this sub-parser:
<ol>
<li>Returns an element, the pattern is bound to this result and
the next stream pattern component is tested.</li>
<li>Raises the exception "<code>Stream.Failure</code>", there
are two cases:
<ul>
<li>if the stream pattern component is the first one of the
stream case, the current parser also fails with the
exception "<code>Stream.Failure</code>".</li>
<li>if the stream pattern component is not the first one of
the stream case, the current parser fails with the
exception "<code>Stream.Error</code>".</li>
</ul>
In this second case:
<ul>
<li>If the stream pattern component is followed by a
"question mark" and an expression (which must be of type
"<code>string</code>"), the expression is evaluated and
given as parameter of the exception
"<tt>Stream.Error</tt>".</li>
<li>If the expression is followed by an "exclamation mark",
the test and conversion from "<code>Stream.Failure</code>"
to "<code>Stream.Error</code>" is not done, and the parser
just raises "<code>Stream.Failure</code>" again. This is
an optimization which must be assumed by the programmer,
in general when he knows that the sub-parser called never
raises "<code>Stream.Failure</code>" (for example if the
called parser ends with a parser case containing an empty
stream pattern). See "no error optionization" below.</li>
<li>Otherwise the exception parameter is the empty string.</li>
</ul>
</li>
</ol>
</li>
<li>A pattern, which is bound to the current stream.</li>
</ul>
<p>Notice that patterns are bound immediately and can be used in the
next stream pattern component.</p>
<h3>Let statement</h3>
<p>Between stream pattern components, it is possible to use the
"let..in" construction. This is not considered as a real stream
pattern component, in the fact that is is not tested against the
exception "<code>Stream.Failure</code>" it may raise. It can be
useful for intermediate computation. In particular, it is used
internally by the lexers (see chapter
about <a href="lexers.html">lexers</a> as character stream
parsers).</p>
<p>Example of use, when an expression have to be used several times
(in the example, "<code>d a</code>", which is bound to the variable
"<code>c</code>"):</p>
<pre>
parser
[: a = b;
let c = d a in
e =
parser
[ [: f = g :] -> h c
| [: :] -> c ] :] -> e
</pre>
<h3>Lookahead</h3>
<p>The lookahead feature allows to look at several terminals in the
stream without removing them, in order to take decisions when more
than one terminal is necessary.</p>
<p>For example, when parsing the normal syntax of the OCaml language,
there is a problem, in recursing descendent parsing, for the cases
where to treat and differentiate the following inputs:</p>
<pre>
(-x+1)
(-)
</pre>
<p>The first case is treated in a rule, telling: "a left parenthesis,
followed by an expression, and a right parenthesis". The second one
is "a left parenthesis, an operator, a right
parenthesis". Programming it like this (left factorizing the first
parenthesis):</p>
<pre>
parser
[: `Lparen;
e =
parser
[ [: e = expr; `Rparen :] -> e
| [: `Minus; `Rparen :] -> minus_op ] :] -> e
</pre>
<p>does not work if the input is "<code>(-)</code>" because the rule
"<code>e = expr</code>" accepts the minus sign as expression start,
removing it from the input stream and fails as parsing error, while
encountering the right parenthesis.</p>
<p>Conversely, writing it this way:</p>
<pre>
parser
[: `Lparen;
e =
parser
[ [: `Minus; `Rparen :] -> minus_op
| [: e = expr; `Rparen :] -> e ] :] -> e
</pre>
<p>does not help, because if the input is "<code>(-x+1)</code>" the
rule above starting with "<code>`Minus</code>" is accepted and the
exception "<code>Stream.Error</code>" is raised while encountering
the variable "<code>x</code>" since a right parenthesis is
expected.</p>
<p>In general, this kind of situation is best resolved by a left
factorization of the parser cases (see the section "Semantics"
above), but that is not possible in this case. The solution is to
test whether the character after the minus sign is a right
parenthesis:</p>
<pre>
parser
[: `Lparen;
e =
parser
[ [: ?= [ _ Rparen ]; `Minus; `Rparen :] -> minus_op
| [: e = expr; `Rparen :] -> e ] :] -> e
</pre>
<p>It is possible to put several lists of patterns separated by a
vertical bar in the lookahead construction, but with a limitation
(due to the implementation): all lists of patterns must have the
same number of elements.</p>
<h3>No error optimization</h3>
<p>The "no error optimization" is the fact to end a stream pattern
component of kind "non-terminal" ("pattern" "equal" "expression") by
the character "exclamation mark". Like said above, this inhibits the
transformation of the exception "<code>Stream.Failure</code>",
possibly raised by the called parser, into the exception
"<code>Stream.Error</code>".</p>
<p>The code:</p>
<pre>
parser [: a = b; c = d ! :] -> e
</pre>
<p>is equivalent to:</p>
<pre>
parser [: a = b; s :] -> let c = d s in e
</pre>
<p>One interest of the first syntax is that it shows to readers that
"<code>d</code>" is indeed a syntactic sub-parser. In the second
syntax, it is called in the semantic action, which makes the parser
case not so clear, as far as readability is concerned.</p>
<p>If the stream pattern component is at end of the stream pattern,
this allow possible tail recursion by the OCaml compiler, in the
following case:</p>
<pre>
parser [: a = b; c = d ! :] -> c
</pre>
<p>since it is equivalent (with the fact that "<code>c</code>" is at
the same time the pattern of the last case and the expression of the
parser case semantic action) to:</p>
<pre>
parser [: a = b; s :] -> d s
</pre>
<p>The call to "<code>d s</code>" can be a tail recursive
call. Without the use of the "exclamation mark" in the rule, the
equivalent code is:</p>
<pre>
parser [: a = b; s :] ->
try d s with [ Stream.Failure -> raise (Stream.Error "") ]
</pre>
<p>which is not tail recursive (due to the "try..with" construction
pushes a context), preventing the compiler to optimize its
code. This can be important when many recursive calls happen, since
it can overflow the OCaml stack.</p>
<h3>Position</h3>
<p>The optional "pattern" before and after a stream pattern is bound
to the current stream count. Indeed, streams internally contain a
count of their elements. At the beginning the count is zero. When an
element is removed, the count is incremented. The example:</p>
<pre>
parser [: a = b :] ep -> c
</pre>
<p>is equivalent to:</p>
<pre>
parser [: a = b; s :] -> let ep = Stream.count s in c
</pre>
<p>There is no direct syntax equivalent to the optional pattern at
beginning of the stream pattern:</p>
<pre>
parser bp [: a = b :] -> c
</pre>
<p>These optional patterns allow disposal of the stream count at the
beginning and at the end of the parser case, allowing to compute
locations of the rule in the source. In particular, if the stream is
a stream of characters, these counts are the source location in
number of characters.</p>
<h3>Semantic action</h3>
<p>In a parser case, after the stream pattern, there is an "arrow" and
an expression, called the "semantic action". If the parser case is
matched the parser returns with the evaluated expression whose
environment contains all values bound in the stream pattern.</p>
<h2>Remarks</h2>
<h3>Simplicity vs Associativity</h3>
<p>This parsing technology has the advantage of simplicity of use and
understanding, but it does not treat the associativity of
operators. For example, if you write a parser like this (to compute
arithmetic expressions):</p>
<pre>
value rec expr =
parser
[ [: e1 = expr; `'+'; e2 = expr :] -> e1 + e2
| [: `('0'..'9' as c) :] -> Char.code c - Char.code '0' ]
</pre>
<p>this would loop endlessly, exactly as if you wrote code starting
with:</p>
<pre>
value rec expr e =
let e1 = expr e in
...
</pre>
<p>One solution is to treat the associativity "by hand": by reading a
sub-expression, then looping with a parser which parses the operator
and another sub-expression, and so on.</p>
<p>An alternative solution is to write parsing "combinators". Indeed,
parsers being normal functions, it is possible to make a function
which takes a parser as parameter and returning a parser using
it. For example, left and right associativity parsing
combinators:</p>
<pre>
value rec left_assoc op elem =
let rec op_elem x =
parser
[ [: t = op; y = elem; r = op_elem (t x y) :] -> r
| [: :] -> x ]
in
parser [: x = elem; r = op_elem x :] -> r
;
value rec right_assoc op elem =
let rec op_elem x =
parser
[ [: t = op; y = elem; r = op_elem y :] -> t x r
| [: :] -> x ]
in
parser [: x = elem; r = op_elem x :] -> r
;
</pre>
<p>which can be used, e.g. like this:</p>
<pre>
value expr =
List.fold_right (fun op elem -> op elem)
[left_assoc (parser [: `'+' :] -> fun x y -> x +. y);
left_assoc (parser [: `'*' :] -> fun x y -> x *. y);
right_assoc (parser [: `'^' :] -> fun x y -> x ** y)]
(parser [: `('0'..'9' as c) :] -> float (Char.code c - Char.code '0'))
;
</pre>
<p>and tested, e.g. in the toplevel, like that:</p>
<pre>
expr (Stream.of_string "2^3^2+1");
</pre>
<p>The same way, it is possible to parse non-context free grammars, by
programming parsers returning other parsers.</p>
<p>A third solution, to resolve the problem of associativity, is to
use the grammars of Camlp5, which have the other advantage that they
are extensible.</p>
<h3>Lexing vs Parsing</h3>
<p>In general, while analyzing a language, there are two levels:</p>
<ul>
<li>The level where the input, considered as a stream of characters,
is read to make a stream of tokens (for example "words", if it is
a human language, or punctuation). This level is generally called
"lexing".</li>
<li>The level where the input is a stream of tokens where grammar
rules are parsed. This level is generally called "parsing".</li>
</ul>
<p>The "parser" construction described here can be used for both,
thanks to the polymorphism of OCaml:</p>
<ul>
<li>The lexing level is a "parser" of streams of characters
returning tokens.</li>
<li>The parsing level is a "parser" of streams of tokens returning
syntax trees.</li>
</ul>
<p>By comparison, the programs "lex" and "yacc" use two different
technologies. With "parser"s, it is possible to use the same one for
both.</p>
<h3>Lexer syntax vs Parser syntax</h3>
<p>For "lexers", i.e. for the specific case of parsers when the input
is a stream of characters, it is possible to use a shorter
syntax. See the chapter on <a href="lexers.html">lexers</a>. They
have another syntax, shorter and adapted for the specific type
"<code>char</code>". But they still are internally parsers of
streams with the same semantics.</p>
<h3>Purely functional parsers</h3>
<p>This system of parsers is imperative: while parsing, the stream
advances and the already parsed terminals disappear from the stream
structure. This is useful because it is not necessary to return the
remaining stream together with the normal result. This is the reason
there is this "<tt>Stream.Error</tt>" exception: when it is raised,
it means that some terminals have been consummed from the stream,
which are definitively lost, and therefore that are no more possible
parser cases to try.</p>
<p>An alternative is to use <a href="fparsers.html">functional
parsers</a> which use a new stream type, lazy but not
destructive. Their advantage is that they use a limited backtrack:
the case of "if..then..else.." and the shorter "if..then.." work
without having to left factorize the parser cases, and there is no
need to lookahead. They have no equivalent to the exception
"<code>Stream.Error</code>": when all cases are tested, and have
failed, the parsers return the value "<code>None</code>". The
drawback is that, when a parsing error happens, it is not easily
possible to know the location of the error in the input, as the
initial stream has not been modified: the system would indicate a
failure at the first character of the first line: this is a general
drawback of backtracking parsers. See the solutions found to this
problem in the chapter about <a href="fparsers.html">purely
functional parsers</a>.</p>
<p>A second alternative is to use the
<a href="bparsers.html">backtracking parsers</a>. They use the same
stream type as the functional parsers, but they test more cases than
them. They have the same advantages and drawbacks than the functional
parsers.</p>
<div class="trailer">
</div>
</div>
</body>
</html>
|