1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616
|
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage 2.0">
<title>Geyacc: Parser Algorithm</title>
</head>
<body bgcolor="#FFFFFF">
<table border="0" width="100%">
<tr>
<td><font size="6"><strong>Parser Algorithm</strong></font></td>
<td align="right"><a href="options.html"><img
src="../image/previous.gif" alt="Previous" border="0"
width="40" height="40"></a><a href="precedence.html"><img
src="../image/next.gif" alt="Next" border="0" width="40"
height="40"></a></td>
</tr>
</table>
<hr size="1">
<p>As <em>geyacc</em> reads tokens, it pushes them onto a stack
along with their semantic values. The stack is called the <em>parser
stack</em>. Pushing a token is traditionally called <em><strong>shifting</strong></em>.
For example, suppose the infix calculator has read <font
color="#808000"><tt>1 + 5 *</tt></font>, with a <font
color="#808000"><tt>3</tt></font> to come. The stack will have
four elements, one for each token that was shifted.</p>
<p>But the stack does not always have an element for each token
read. When the last <font size="2" face="Courier New">N</font>
tokens and groupings shifted match the components of a grammar
rule, they can be combined according to that rule. This is called
<em><strong>reduction</strong></em>. Those tokens and groupings
are replaced on the stack by a single grouping whose symbol is
the result (left hand side) of that rule. Running the rule's
action is part of the process of reduction, because this is what
computes the semantic value of the resulting grouping. For
example, if the infix calculator's parser stack contains <font
color="#808000"><tt>1 + 5 * 3</tt></font> and the next input
token is a newline character, then the last three elements can be
reduced to <font color="#808000"><tt>15</tt></font> via the rule:</p>
<blockquote>
<pre><font color="#800080">expr</font><font color="#0000FF">:</font> <font
color="#800080">expr</font> <font color="#FF0000">'*'</font> <font
color="#800080">expr</font><font color="#0000FF"> ;</font></pre>
</blockquote>
<p>Then the stack contains just these three elements <font
color="#808000"><tt>1 + 15</tt></font>. At this point, another
reduction can be made, resulting in the single value <font
color="#808000"><tt>16</tt></font>. Then the newline token can be
shifted.</p>
<p>The parser tries, by shifts and reductions, to reduce the
entire input down to a single grouping whose symbol is the
grammar's <a href="introduction.html#start_symbol">start symbol</a>.
This kind of parser is known in the literature as a <em>bottom-up
parser</em>.</p>
<h2><a name="look_ahead">Look-Ahead Tokens</a></h2>
<p>The <em>geyacc</em> parser does not always reduce immediately
as soon as the last <tt>N</tt> tokens and groupings match a rule.
This is because such a simple strategy is inadequate to handle
most languages. Instead, when a reduction is possible, the parser
sometimes <em>looks ahead</em> at the next token in order to
decide what to do.</p>
<p>When a token is read, it is not immediately shifted; first it
becomes the<em> </em><em><strong>look-ahead token</strong></em>,
which is not on the stack. Now the parser can perform one or more
reductions of tokens and groupings on the stack, while the
look-ahead token remains off to the side. When no more reductions
should take place, the look-ahead token is shifted onto the
stack. This does not mean that all possible reductions have been
done; depending on the token type of the look-ahead token, some
rules may choose to delay their application.</p>
<p>Here is a simple case where look-ahead is needed. These three
rules define expressions which contain binary addition operators
and postfix unary factorial operators <font color="#808000"
size="4"><tt>!</tt></font>, and allow parentheses for grouping.</p>
<blockquote>
<pre><font color="#800080">expr</font><font color="#0000FF">:</font> <font
color="#800080">term</font> <font color="#FF0000">'+'</font> <font
color="#800080">expr</font>
<font color="#0000FF">|</font> <font color="#800080">term</font>
<font color="#0000FF">;</font>
<font color="#800080">term</font><font color="#0000FF">:</font> <font
color="#FF0000">'('</font> <font color="#800080">expr</font> <font
color="#FF0000">')'</font>
<font color="#0000FF">|</font> <font color="#800080">term</font> <font
color="#FF0000">'!'</font>
<font color="#0000FF">|</font> <font color="#FF0000">NUMBER</font>
<font color="#0000FF">;</font></pre>
</blockquote>
<p>Suppose that the tokens <font color="#808000"><tt>1 + 2</tt></font>
have been read and shifted; what should be done? If the following
token is <font color="#808000"><tt>)</tt></font>, then the first
three tokens must be reduced to form an <font color="#800080"><tt>expr</tt></font>.
This is the only valid course, because shifting the<font
color="#808000" size="2" face="Courier New"> </font><font
color="#808000"><tt>)</tt></font><font color="#808000" size="2"
face="Courier New"> </font>would produce a sequence of symbols <font
color="#800080"><tt>term</tt></font><font color="#808000"><tt> )</tt></font>,
and no rule allows this. If the following token is <font
color="#808000" size="4"><tt>!</tt></font>, then it must be
shifted immediately so that <font color="#808000"><tt>2 </tt></font><font
color="#808000" size="4"><tt>!</tt></font> can be reduced to make
a <font color="#800080"><tt>term</tt></font>. If instead the
parser were to reduce before shifting, <font color="#808000"><tt>1
+ 2</tt></font> would become an <font color="#800080"><tt>expr</tt></font>.
It would then be impossible to shift the <font color="#808000"
size="4"><tt>!</tt></font> because doing so would produce on the
stack the sequence of symbols <font color="#800080"><tt>expr</tt></font><tt>
</tt><font color="#808000" size="4"><tt>!</tt></font>. No rule
allows that sequence.</p>
<p>The current look-ahead token is stored in the variable <a
href="skeleton.html#last_token"><font color="#008080" size="2"
face="Courier New"><em>last_token</em></font></a>.</p>
<h2><a name="shift_reduce">Shift/Reduce Conflicts</a></h2>
<p>Suppose we are parsing a language which has if-then and
if-then-else statements, with a pair of rules like this:</p>
<blockquote>
<pre><font color="#800080">if_stmt</font><font
color="#0000FF">:</font> <font color="#FF0000">IF</font> <font
color="#800080">expr</font> <font color="#FF0000">THEN</font> <font
color="#800080">stmt</font>
<font color="#0000FF">|</font> <font color="#FF0000">IF</font> <font
color="#800080">expr</font> <font color="#FF0000">THEN</font> <font
color="#800080">stmt</font> <font color="#FF0000">ELSE</font> <font
color="#800080">stmt</font>
<font color="#0000FF">;</font></pre>
</blockquote>
<p>Here we assume that <font color="#FF0000"><tt>IF</tt></font>, <font
color="#FF0000"><tt>THEN</tt></font> and <font color="#FF0000"><tt>ELSE</tt></font>
are terminal symbols for specific keyword tokens. When the <font
color="#FF0000"><tt>ELSE</tt></font> token is read and becomes
the look-ahead token, the contents of the stack (assuming the
input is valid) are just right for reduction by the first rule.
But it is also legitimate to shift the <font color="#FF0000"><tt>ELSE</tt></font>,
because that would lead to eventual reduction by the second rule.</p>
<p>This situation, where either a shift or a reduction would be
valid, is called a <em><strong>shift/reduce conflict</strong></em>.
<em>Geyacc</em> is designed to resolve these conflicts by
choosing to shift, unless otherwise directed by operator
precedence declarations. To see the reason for this, let's
contrast it with the other alternative. Since the parser prefers
to shift the <font color="#FF0000"><tt>ELSE</tt></font>, the
result is to attach the else-clause to the innermost
if-statement, making these two inputs equivalent:</p>
<blockquote>
<pre><font color="#808000"><strong>if</strong> x <strong>then</strong> <strong>if</strong> y <strong>then</strong> win (); <strong>else</strong> lose;
<strong>if</strong> x <strong>then</strong>
<strong>do</strong> <strong>
if</strong> y <strong>then</strong> win (); <strong>else</strong> lose;
<strong>end</strong>;</font></pre>
</blockquote>
<p>But if the parser chose to reduce when possible rather than
shift, the result would be to attach the else-clause to the
outermost if-statement, making these two inputs equivalent:</p>
<blockquote>
<pre><font color="#808000"><strong>if</strong> x <strong>then</strong> <strong>if</strong> y <strong>then</strong> win (); <strong>else</strong> lose;
<strong>if</strong> x <strong>then</strong>
<strong>do</strong>
<strong>if</strong> y <strong>then</strong> win ();
<strong>end</strong>;
<strong>else</strong>
lose;</font></pre>
</blockquote>
<p>The conflict exists because the grammar as written is
ambiguous: either parsing of the simple nested if-statement is
legitimate. The established convention is that these ambiguities
are resolved by attaching the else-clause to the innermost
if-statement; this is what <em>geyacc</em> accomplishes by
choosing to shift rather than reduce. (It would ideally be
cleaner to write an unambiguous grammar, but that is very hard to
do in this case.) This particular ambiguity was first encountered
in the specifications of Algol 60 and is called the <em>dangling
`else'</em> ambiguity.</p>
<p>To avoid warnings from <em>geyacc</em> about predictable,
legitimate shift/reduce conflicts, use the <a
href="declarations.html#expect"><font color="#0000FF"><tt>%expect</tt></font></a><font
color="#0000FF"><tt> N</tt></font> declaration. There will be no
warning as long as the number of shift/reduce conflicts is
exactly <tt>N</tt>.</p>
<p>The definition of <font color="#800080"><tt>if_stmt</tt></font>
above is solely to blame for the conflict, but the conflict does
not actually appear without additional rules. Here is a complete <em>geyacc</em>
input file that actually manifests the conflict:</p>
<blockquote>
<pre><font color="#0000FF">%token</font> <font
color="#FF0000">IF THEN ELSE VARIABLE</font>
<font color="#0000FF">%%</font>
<font color="#800080">stmt</font><font color="#0000FF">:</font> <font
color="#800080">expr</font>
<font color="#0000FF">|</font> <font color="#800080">if_stmt</font>
<font color="#0000FF">;</font>
<font color="#800080">if_stmt</font>: <font color="#FF0000">IF</font> <font
color="#800080">expr</font> <font color="#FF0000">THEN</font> <font
color="#800080">stmt</font>
<font color="#0000FF">|</font> <font color="#FF0000">IF</font> <font
color="#800080">expr</font> <font color="#FF0000">THEN</font> <font
color="#800080">stmt</font> <font color="#FF0000">ELSE</font> <font
color="#800080">stmt</font>
<font color="#0000FF">;</font>
<font color="#800080">expr</font><font color="#0000FF">:</font> <font
color="#FF0000">VARIABLE</font>
<font color="#0000FF">;</font></pre>
</blockquote>
<p>Another situation where shift/reduce conflicts appear is in
arithmetic expressions. Here shifting is not always the preferred
resolution; the <em>geyacc</em> declarations for operator <a
href="precedence.html">precedence</a> allow you to specify when
to shift and when to reduce.</p>
<h2><a name="states">Parser States</a></h2>
<p>The routine <font color="#008080"><em><tt>parse</tt></em></font>
from class <font color="#008080"><em><tt>YY_PARSER_SKELETON</tt></em></font>
is implemented using a finite-state machine. The values pushed on
the parser stack are not simply token type codes; they represent
the entire sequence of terminal and nonterminal symbols at or
near the top of the stack. The current state collects all the
information about previous input which is relevant to deciding
what to do next.</p>
<p>Each time a look-ahead token is read, the current parser state
together with the type of look-ahead token are looked up in a
table. This table entry can say, "Shift the look-ahead
token". In this case, it also specifies the new parser
state, which is pushed onto the top of the parser stack. Or it
can say, "Reduce using rule number <tt>N</tt>". This
means that a certain number of tokens or groupings are taken off
the top of the stack, and replaced by one grouping. In other
words, that number of states are popped from the stack, and one
new state is pushed.</p>
<p>There is one other alternative: the table can say that the
look-ahead token is erroneous in the current state. This causes <a
href="error.html">error processing</a> to begin.</p>
<h2><a name="reduce_reduce">Reduce/Reduce Conflicts</a></h2>
<p>A <em><strong>reduce/reduce conflict</strong></em> occurs if
there are two or more rules that apply to the same sequence of
input. This usually indicates a serious error in the grammar. For
example, here is an erroneous attempt to define a sequence of
zero or more <font color="#800080"><tt>word</tt></font>
groupings.</p>
<blockquote>
<pre><font color="#800080">sequence</font><font
color="#0000FF">:</font> <font color="#008080">-- Empty</font>
<font color="#0000FF">{</font> <font color="#008080"><em>print </em>(<em>"empty sequence%N"</em>)</font> <font
color="#0000FF">}</font>
<font color="#0000FF">|</font> <font color="#800080">maybeword</font>
<font color="#0000FF">|</font> <font color="#800080">sequence word</font>
<font color="#0000FF">{</font>
<font color="#008080"><em>print </em>(<em>"added word "</em>)<em>
print </em>(</font><font color="#0000FF">$2</font><font
color="#008080">)<em>
print </em>(<em>'%N'</em>)</font>
<font color="#0000FF">}</font>
<font color="#0000FF"> ;</font>
<font color="#800080">maybeword</font><font color="#0000FF">:</font> <font
color="#008080">-- Empty</font>
<font color="#0000FF">{</font> <font color="#008080"><em>print </em>(<em>"empty maybeword%N"</em>)</font> <font
color="#0000FF">}
| </font><font color="#800080">word</font><font
color="#0000FF">
{
</font><font color="#008080"><em>print </em>(<em>"single word "</em>)<em>
print </em>(</font><font color="#0000FF">$1</font><font
color="#008080">)<em>
print </em>(<em>'%N'</em>)</font><font
color="#0000FF">
}
;</font></pre>
</blockquote>
<p>The error is an ambiguity: there is more than one way to parse
a single <font color="#800080"><tt>word</tt></font> into a <font
color="#800080"><tt>sequence</tt></font>. It could be reduced to
a <font color="#800080"><tt>maybeword</tt></font> and then into a
<font color="#800080"><tt>sequence</tt></font> via the second
rule. Alternatively, nothing-at-all could be reduced into a <font
color="#800080"><tt>sequence</tt></font> via the first rule, and
this could be combined with the <font color="#800080"><tt>word</tt></font>
using the third rule for <font color="#800080"><tt>sequence</tt></font>.</p>
<p>There is also more than one way to reduce nothing-at-all into
a <font color="#800080"><tt>sequence</tt></font>. This can be
done directly via the first rule, or indirectly via <font
color="#800080"><tt>maybeword</tt></font> and then the second
rule. You might think that this is a distinction without a
difference, because it does not change whether any particular
input is valid or not. But it does affect which actions are run.
One parsing order runs the second rule's action; the other runs
the first rule's action and the third rule's action. In this
example, the output of the program changes.</p>
<p><em>Geyacc</em> resolves a reduce/reduce conflict by choosing
to use the rule that appears first in the grammar, but it is very
risky to rely on this. Every reduce/reduce conflict must be
studied and usually eliminated. Here is the proper way to define <font
color="#800080"><tt>sequence</tt></font>:</p>
<blockquote>
<pre><font color="#800080">sequence</font><font
color="#0000FF">: </font><font color="#008080">-- Empty</font><font
color="#0000FF">
{ </font><font color="#008080"><em>print </em>(<em>"empty sequence%N"</em>)</font><font
color="#0000FF"> }
| </font><font color="#800080">sequence word</font><font
color="#0000FF">
{
</font><font color="#008080"><em>print </em>(<em>"added word "</em>)<em>
print </em>(</font><font color="#0000FF">$2</font><font
color="#008080">)<em>
print </em>(<em>'%N'</em>)</font><font
color="#0000FF">
}
;</font></pre>
</blockquote>
<p>Here is another common error that yields a reduce/reduce
conflict:</p>
<blockquote>
<pre><font color="#800080">sequence</font><font
color="#0000FF">: </font><font color="#008080">-- Empty</font><font
color="#0000FF">
| </font><font color="#800080">sequence words</font><font
color="#0000FF">
| </font><font color="#800080">sequence redirects</font><font
color="#0000FF">
;
</font><font color="#800080">words</font><font color="#0000FF">: </font><font
color="#008080">-- Empty</font><font color="#0000FF">
| </font><font color="#800080">words word</font><font
color="#0000FF">
;
</font><font color="#800080">redirects</font><font
color="#0000FF">: </font><font color="#008080">-- Empty</font><font
color="#0000FF">
| </font><font color="#800080">redirects redirect</font><font
color="#0000FF">
;</font></pre>
</blockquote>
<p>The intention here is to define a sequence which can contain <font
color="#800080"><tt>word</tt></font> and/or <font color="#800080"><tt>redirect</tt></font>
groupings. The individual definitions of <font color="#800080"><tt>sequence</tt></font>,
<font color="#800080"><tt>words</tt></font> and <font
color="#800080"><tt>redirects</tt></font> are error-free, but the
three together make a subtle ambiguity: even an empty input can
be parsed in infinitely many ways! Consider: nothing-at-all could
be a <font color="#800080"><tt>words</tt></font>. Or it could be
two <font color="#800080"><tt>words</tt></font> in a row, or
three, or any number. It could equally well be a <font
color="#800080"><tt>redirects</tt></font>, or two, or any number.
Or it could be a <font color="#800080"><tt>words</tt></font>
followed by three <font color="#800080"><tt>redirects</tt></font>
and another <font color="#800080"><tt>words</tt></font>. And so
on.</p>
<p>Here are two ways to correct these rules. First, to make it a
single level of sequence:</p>
<blockquote>
<pre><font color="#800080">sequence</font><font
color="#0000FF">:</font><font color="#008080"> -- Empty</font><font
color="#0000FF">
| </font><font color="#800080">sequence word</font><font
color="#0000FF">
| </font><font color="#800080">sequence redirect</font><font
color="#0000FF">
;</font></pre>
</blockquote>
<p>Second, to prevent either a <font color="#800080"><tt>words</tt></font>
or a <font color="#800080"><tt>redirects</tt></font> from being
empty:</p>
<blockquote>
<pre><font color="#800080">sequence</font><font
color="#0000FF">: </font><font color="#008080">-- Empty</font><font
color="#0000FF">
| </font><font color="#800080">sequence words</font><font
color="#0000FF">
| </font><font color="#800080">sequence redirects</font><font
color="#0000FF">
;
</font><font color="#800080">words</font><font color="#0000FF">: </font><font
color="#800080">word</font><font color="#0000FF">
| </font><font color="#800080">words word</font><font
color="#0000FF">
;
</font><font color="#800080">redirects</font><font
color="#0000FF">: </font><font color="#800080">redirect</font><font
color="#0000FF">
| </font><font color="#800080">redirects redirect</font><font
color="#0000FF">
;</font></pre>
</blockquote>
<h2><a name="mysterious_reduce_reduce">Mysterious Reduce/Reduce
Conflicts</a></h2>
<p>Sometimes reduce/reduce conflicts can occur that don't look
warranted. Here is an example:</p>
<blockquote>
<pre><font color="#0000FF">%token </font><font
color="#FF0000">ID</font><font color="#0000FF">
%%
</font><font color="#800080">def</font><font color="#0000FF">: </font><font
color="#800080">param_spec return_spec</font><font
color="#0000FF"> </font><font color="#FF0000">','</font><font
color="#0000FF">
;
</font><font color="#800080">param_spec</font><font
color="#0000FF">: </font><font color="#800080">type</font><font
color="#0000FF">
| </font><font color="#800080">name_list </font><font
color="#FF0000">':'</font><font color="#800080"> type</font><font
color="#0000FF">
;
</font><font color="#800080">return_spec</font><font
color="#0000FF">: </font><font color="#800080">type</font><font
color="#0000FF">
| </font><font color="#800080">name</font><font
color="#0000FF"> </font><font color="#FF0000">':'</font><font
color="#0000FF"> </font><font color="#800080">type</font><font
color="#0000FF">
;
</font><font color="#800080">type</font><font color="#0000FF">: </font><font
color="#FF0000">ID</font><font color="#0000FF">
;
</font><font color="#800080">name</font><font color="#0000FF">: </font><font
color="#FF0000">ID</font><font color="#0000FF">
;
</font><font color="#800080">name_list</font><font
color="#0000FF">: </font><font color="#800080">name</font><font
color="#0000FF">
| </font><font color="#800080">name </font><font
color="#FF0000">','</font><font color="#800080"> name_list</font><font
color="#0000FF">
;</font></pre>
</blockquote>
<p>It would seem that this grammar can be parsed with only a
single token of look-ahead: when a <font color="#800080"><tt>param_spec</tt></font>
is being read, an <font color="#FF0000"><tt>ID</tt></font> is a <font
color="#800080"><tt>name</tt></font> if a comma or colon follows,
or a <font color="#800080"><tt>type</tt></font> if another <font
color="#FF0000"><tt>ID</tt></font> follows. In other words, this
grammar is <font size="2">LR</font>(1).</p>
<p>However, <em>geyacc</em>, like most parser generators, cannot
actually handle all <font size="2">LR</font>(1) grammars. In this
grammar, two contexts, that after an <font color="#FF0000"><tt>ID</tt></font>
at the beginning of a <font color="#800080"><tt>param_spec</tt></font>
and likewise at the beginning of a <font color="#800080"><tt>return_spec</tt></font>,
are similar enough that <em>geyacc</em> assumes they are the
same. They appear similar because the same set of rules would be
active <font face="Symbol">-</font> the rule for reducing to a <font
color="#800080"><tt>name</tt></font> and that for reducing to a <font
color="#800080"><tt>type</tt></font>. <em>Geyacc</em> is unable
to determine at that stage of processing that the rules would
require different look-ahead tokens in the two contexts, so it
makes a single parser state for them both. Combining the two
contexts causes a conflict later. In parser terminology, this
occurrence means that the grammar is not <font size="2">LALR</font>(1).</p>
<p>In general, it is better to fix deficiencies than to document
them. But this particular deficiency is intrinsically hard to
fix; parser generators that can handle <font size="2">LR</font>(1)
grammars are hard to write and tend to produce parsers that are
very large. In practice, <em>geyacc</em> is more useful as it is
now.</p>
<p>When the problem arises, you can often fix it by identifying
the two parser states that are being confused, and adding
something to make them look distinct. In the above example,
adding one rule to <font color="#800080"><tt>return_spec</tt></font>
as follows makes the problem go away:</p>
<blockquote>
<pre><font color="#0000FF">%token </font><font
color="#FF0000">BOGUS</font><font color="#0000FF">
...
%%
...
</font><font color="#800080">return_spec</font><font
color="#0000FF">: </font><font color="#800080">type</font><font
color="#0000FF">
| </font><font color="#800080">name</font><font
color="#0000FF"> </font><font color="#FF0000">':'</font><font
color="#0000FF"> </font><font color="#800080">type</font><font
color="#0000FF">
</font><font color="#008080">-- This rule is never used.</font><font
color="#0000FF">
|</font><font color="#FF0000"> ID BOGUS</font><font
color="#0000FF">
;</font></pre>
</blockquote>
<p>This corrects the problem because it introduces the
possibility of an additional active rule in the context after the
<font color="#FF0000"><tt>ID</tt></font> at the beginning of <font
color="#800080"><tt>return_spec</tt></font>. This rule is not
active in the corresponding context in a <font color="#800080"><tt>param_spec</tt></font>,
so the two contexts receive distinct parser states. As long as
the token <font color="#FF0000"><tt>BOGUS</tt></font> is never
generated by <font color="#008080"><em><tt>read_token</tt></em></font>,
the added rule cannot alter the way actual input is parsed.</p>
<p>In this particular example, there is another way to solve the
problem: rewrite the rule for <font color="#800080"><tt>return_spec</tt></font>
to use <font color="#FF0000"><tt>ID</tt></font> directly instead
of via <font color="#800080"><tt>name</tt></font>. This also
causes the two confusing contexts to have different sets of
active rules, because the one for <font color="#800080"><tt>return_spec</tt></font>
activates the altered rule for <font color="#800080"><tt>return_spec</tt></font>
rather than the one for <font color="#800080"><tt>name</tt></font>.</p>
<blockquote>
<pre><font color="#800080">param_spec</font><font
color="#0000FF">: </font><font color="#800080">type</font><font
color="#0000FF">
| </font><font color="#800080">name_list</font><font
color="#0000FF"> </font><font color="#FF0000">':'</font><font
color="#0000FF"> </font><font color="#800080">type</font><font
color="#0000FF">
;
</font><font color="#800080">return_spec</font><font
color="#0000FF">: </font><font color="#800080">type</font><font
color="#0000FF">
| </font><font color="#FF0000">ID ':'</font><font
color="#0000FF"> </font><font color="#800080">type</font><font
color="#0000FF">
;</font></pre>
</blockquote>
<hr size="1">
<table border="0" width="100%">
<tr>
<td><address>
<font size="2"><b>Copyright 1998</b></font><font
size="1"><b>, </b></font><font size="2"><strong>Eric
Bezault</strong></font><strong> </strong><font
size="2"><br>
<strong>mailto:</strong></font><a
href="mailto:ericb@gobosoft.com"><font size="2">ericb@gobosoft.com</font></a><font
size="2"><br>
<strong>http:</strong></font><a
href="http://www.gobosoft.com"><font size="2">//www.gobosoft.com</font></a><font
size="2"><br>
<strong>Last Updated:</strong> 5 August 1998</font><br>
<!--webbot bot="PurpleText"
preview="
$Date: 1999/06/12 18:55:21 $
$Revision: 1.9 $"
-->
</address>
</td>
<td align="right" valign="top"><a
href="http://www.gobosoft.com"><img
src="../image/home.gif" alt="Home" border="0" width="40"
height="40"></a><a href="index.html"><img
src="../image/toc.gif" alt="Toc" border="0" width="40"
height="40"></a><a href="options.html"><img
src="../image/previous.gif" alt="Previous" border="0"
width="40" height="40"></a><a href="precedence.html"><img
src="../image/next.gif" alt="Next" border="0" width="40"
height="40"></a></td>
</tr>
</table>
</body>
</html>
|