File: algorithm.html

package info (click to toggle)
gobo 1.5-2
  • links: PTS
  • area: main
  • in suites: potato
  • size: 7,728 kB
  • ctags: 13,070
  • sloc: ansic: 85,961; lex: 2,758; yacc: 2,298; sh: 1,464; makefile: 64
file content (616 lines) | stat: -rw-r--r-- 27,548 bytes parent folder | download | duplicates (4)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>

<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage 2.0">
<title>Geyacc: Parser Algorithm</title>
</head>

<body bgcolor="#FFFFFF">

<table border="0" width="100%">
    <tr>
        <td><font size="6"><strong>Parser Algorithm</strong></font></td>
        <td align="right"><a href="options.html"><img
        src="../image/previous.gif" alt="Previous" border="0"
        width="40" height="40"></a><a href="precedence.html"><img
        src="../image/next.gif" alt="Next" border="0" width="40"
        height="40"></a></td>
    </tr>
</table>

<hr size="1">

<p>As <em>geyacc</em> reads tokens, it pushes them onto a stack
along with their semantic values. The stack is called the <em>parser
stack</em>. Pushing a token is traditionally called <em><strong>shifting</strong></em>.
For example, suppose the infix calculator has read <font
color="#808000"><tt>1 + 5 *</tt></font>, with a <font
color="#808000"><tt>3</tt></font> to come. The stack will have
four elements, one for each token that was shifted.</p>

<p>But the stack does not always have an element for each token
read. When the last <font size="2" face="Courier New">N</font>
tokens and groupings shifted match the components of a grammar
rule, they can be combined according to that rule. This is called
<em><strong>reduction</strong></em>. Those tokens and groupings
are replaced on the stack by a single grouping whose symbol is
the result (left hand side) of that rule. Running the rule's
action is part of the process of reduction, because this is what
computes the semantic value of the resulting grouping. For
example, if the infix calculator's parser stack contains <font
color="#808000"><tt>1 + 5 * 3</tt></font> and the next input
token is a newline character, then the last three elements can be
reduced to <font color="#808000"><tt>15</tt></font> via the rule:</p>

<blockquote>
    <pre><font color="#800080">expr</font><font color="#0000FF">:</font> <font
color="#800080">expr</font> <font color="#FF0000">'*'</font> <font
color="#800080">expr</font><font color="#0000FF"> ;</font></pre>
</blockquote>

<p>Then the stack contains just these three elements <font
color="#808000"><tt>1 + 15</tt></font>. At this point, another
reduction can be made, resulting in the single value <font
color="#808000"><tt>16</tt></font>. Then the newline token can be
shifted.</p>

<p>The parser tries, by shifts and reductions, to reduce the
entire input down to a single grouping whose symbol is the
grammar's <a href="introduction.html#start_symbol">start symbol</a>.
This kind of parser is known in the literature as a <em>bottom-up
parser</em>.</p>

<h2><a name="look_ahead">Look-Ahead Tokens</a></h2>

<p>The <em>geyacc</em> parser does not always reduce immediately
as soon as the last <tt>N</tt> tokens and groupings match a rule.
This is because such a simple strategy is inadequate to handle
most languages. Instead, when a reduction is possible, the parser
sometimes <em>looks ahead</em> at the next token in order to
decide what to do.</p>

<p>When a token is read, it is not immediately shifted; first it
becomes the<em> </em><em><strong>look-ahead token</strong></em>,
which is not on the stack. Now the parser can perform one or more
reductions of tokens and groupings on the stack, while the
look-ahead token remains off to the side. When no more reductions
should take place, the look-ahead token is shifted onto the
stack. This does not mean that all possible reductions have been
done; depending on the token type of the look-ahead token, some
rules may choose to delay their application.</p>

<p>Here is a simple case where look-ahead is needed. These three
rules define expressions which contain binary addition operators
and postfix unary factorial operators <font color="#808000"
size="4"><tt>!</tt></font>, and allow parentheses for grouping.</p>

<blockquote>
    <pre><font color="#800080">expr</font><font color="#0000FF">:</font> <font
color="#800080">term</font> <font color="#FF0000">'+'</font> <font
color="#800080">expr</font>
    <font color="#0000FF">|</font> <font color="#800080">term</font>
    <font color="#0000FF">;</font>

<font color="#800080">term</font><font color="#0000FF">:</font> <font
color="#FF0000">'('</font> <font color="#800080">expr</font> <font
color="#FF0000">')'</font>
    <font color="#0000FF">|</font> <font color="#800080">term</font> <font
color="#FF0000">'!'</font>
    <font color="#0000FF">|</font> <font color="#FF0000">NUMBER</font>
    <font color="#0000FF">;</font></pre>
</blockquote>

<p>Suppose that the tokens <font color="#808000"><tt>1 + 2</tt></font>
have been read and shifted; what should be done? If the following
token is <font color="#808000"><tt>)</tt></font>, then the first
three tokens must be reduced to form an <font color="#800080"><tt>expr</tt></font>.
This is the only valid course, because shifting the<font
color="#808000" size="2" face="Courier New"> </font><font
color="#808000"><tt>)</tt></font><font color="#808000" size="2"
face="Courier New"> </font>would produce a sequence of symbols <font
color="#800080"><tt>term</tt></font><font color="#808000"><tt> )</tt></font>,
and no rule allows this. If the following token is <font
color="#808000" size="4"><tt>!</tt></font>, then it must be
shifted immediately so that <font color="#808000"><tt>2 </tt></font><font
color="#808000" size="4"><tt>!</tt></font> can be reduced to make
a <font color="#800080"><tt>term</tt></font>. If instead the
parser were to reduce before shifting, <font color="#808000"><tt>1
+ 2</tt></font> would become an <font color="#800080"><tt>expr</tt></font>.
It would then be impossible to shift the <font color="#808000"
size="4"><tt>!</tt></font> because doing so would produce on the
stack the sequence of symbols <font color="#800080"><tt>expr</tt></font><tt>
</tt><font color="#808000" size="4"><tt>!</tt></font>. No rule
allows that sequence.</p>

<p>The current look-ahead token is stored in the variable <a
href="skeleton.html#last_token"><font color="#008080" size="2"
face="Courier New"><em>last_token</em></font></a>.</p>

<h2><a name="shift_reduce">Shift/Reduce Conflicts</a></h2>

<p>Suppose we are parsing a language which has if-then and
if-then-else statements, with a pair of rules like this:</p>

<blockquote>
    <pre><font color="#800080">if_stmt</font><font
color="#0000FF">:</font> <font color="#FF0000">IF</font> <font
color="#800080">expr</font> <font color="#FF0000">THEN</font> <font
color="#800080">stmt</font>
    <font color="#0000FF">|</font> <font color="#FF0000">IF</font> <font
color="#800080">expr</font> <font color="#FF0000">THEN</font> <font
color="#800080">stmt</font> <font color="#FF0000">ELSE</font> <font
color="#800080">stmt</font>
    <font color="#0000FF">;</font></pre>
</blockquote>

<p>Here we assume that <font color="#FF0000"><tt>IF</tt></font>, <font
color="#FF0000"><tt>THEN</tt></font> and <font color="#FF0000"><tt>ELSE</tt></font>
are terminal symbols for specific keyword tokens. When the <font
color="#FF0000"><tt>ELSE</tt></font> token is read and becomes
the look-ahead token, the contents of the stack (assuming the
input is valid) are just right for reduction by the first rule.
But it is also legitimate to shift the <font color="#FF0000"><tt>ELSE</tt></font>,
because that would lead to eventual reduction by the second rule.</p>

<p>This situation, where either a shift or a reduction would be
valid, is called a <em><strong>shift/reduce conflict</strong></em>.
<em>Geyacc</em> is designed to resolve these conflicts by
choosing to shift, unless otherwise directed by operator
precedence declarations. To see the reason for this, let's
contrast it with the other alternative. Since the parser prefers
to shift the <font color="#FF0000"><tt>ELSE</tt></font>, the
result is to attach the else-clause to the innermost
if-statement, making these two inputs equivalent:</p>

<blockquote>
    <pre><font color="#808000"><strong>if</strong> x <strong>then</strong> <strong>if</strong> y <strong>then</strong> win (); <strong>else</strong> lose;

<strong>if</strong> x <strong>then</strong>
    <strong>do</strong> <strong>
        if</strong> y <strong>then</strong> win (); <strong>else</strong> lose;
    <strong>end</strong>;</font></pre>
</blockquote>

<p>But if the parser chose to reduce when possible rather than
shift, the result would be to attach the else-clause to the
outermost if-statement, making these two inputs equivalent:</p>

<blockquote>
    <pre><font color="#808000"><strong>if</strong> x <strong>then</strong> <strong>if</strong> y <strong>then</strong> win (); <strong>else</strong> lose;

<strong>if</strong> x <strong>then</strong>
    <strong>do</strong>
        <strong>if</strong> y <strong>then</strong> win ();
    <strong>end</strong>;
<strong>else</strong>
    lose;</font></pre>
</blockquote>

<p>The conflict exists because the grammar as written is
ambiguous: either parsing of the simple nested if-statement is
legitimate. The established convention is that these ambiguities
are resolved by attaching the else-clause to the innermost
if-statement; this is what <em>geyacc</em> accomplishes by
choosing to shift rather than reduce. (It would ideally be
cleaner to write an unambiguous grammar, but that is very hard to
do in this case.) This particular ambiguity was first encountered
in the specifications of Algol 60 and is called the <em>dangling
`else'</em> ambiguity.</p>

<p>To avoid warnings from <em>geyacc</em> about predictable,
legitimate shift/reduce conflicts, use the <a
href="declarations.html#expect"><font color="#0000FF"><tt>%expect</tt></font></a><font
color="#0000FF"><tt> N</tt></font> declaration. There will be no
warning as long as the number of shift/reduce conflicts is
exactly <tt>N</tt>.</p>

<p>The definition of <font color="#800080"><tt>if_stmt</tt></font>
above is solely to blame for the conflict, but the conflict does
not actually appear without additional rules. Here is a complete <em>geyacc</em>
input file that actually manifests the conflict:</p>

<blockquote>
    <pre><font color="#0000FF">%token</font> <font
color="#FF0000">IF THEN ELSE VARIABLE</font>
<font color="#0000FF">%%</font>
<font color="#800080">stmt</font><font color="#0000FF">:</font> <font
color="#800080">expr</font>
    <font color="#0000FF">|</font> <font color="#800080">if_stmt</font>
    <font color="#0000FF">;</font>

<font color="#800080">if_stmt</font>: <font color="#FF0000">IF</font> <font
color="#800080">expr</font> <font color="#FF0000">THEN</font> <font
color="#800080">stmt</font>
    <font color="#0000FF">|</font> <font color="#FF0000">IF</font> <font
color="#800080">expr</font> <font color="#FF0000">THEN</font> <font
color="#800080">stmt</font> <font color="#FF0000">ELSE</font> <font
color="#800080">stmt</font>
    <font color="#0000FF">;</font>

<font color="#800080">expr</font><font color="#0000FF">:</font> <font
color="#FF0000">VARIABLE</font>
    <font color="#0000FF">;</font></pre>
</blockquote>

<p>Another situation where shift/reduce conflicts appear is in
arithmetic expressions. Here shifting is not always the preferred
resolution; the <em>geyacc</em> declarations for operator <a
href="precedence.html">precedence</a> allow you to specify when
to shift and when to reduce.</p>

<h2><a name="states">Parser States</a></h2>

<p>The routine <font color="#008080"><em><tt>parse</tt></em></font>
from class <font color="#008080"><em><tt>YY_PARSER_SKELETON</tt></em></font>
is implemented using a finite-state machine. The values pushed on
the parser stack are not simply token type codes; they represent
the entire sequence of terminal and nonterminal symbols at or
near the top of the stack. The current state collects all the
information about previous input which is relevant to deciding
what to do next.</p>

<p>Each time a look-ahead token is read, the current parser state
together with the type of look-ahead token are looked up in a
table. This table entry can say, &quot;Shift the look-ahead
token&quot;. In this case, it also specifies the new parser
state, which is pushed onto the top of the parser stack. Or it
can say, &quot;Reduce using rule number <tt>N</tt>&quot;. This
means that a certain number of tokens or groupings are taken off
the top of the stack, and replaced by one grouping. In other
words, that number of states are popped from the stack, and one
new state is pushed.</p>

<p>There is one other alternative: the table can say that the
look-ahead token is erroneous in the current state. This causes <a
href="error.html">error processing</a> to begin.</p>

<h2><a name="reduce_reduce">Reduce/Reduce Conflicts</a></h2>

<p>A <em><strong>reduce/reduce conflict</strong></em> occurs if
there are two or more rules that apply to the same sequence of
input. This usually indicates a serious error in the grammar. For
example, here is an erroneous attempt to define a sequence of
zero or more <font color="#800080"><tt>word</tt></font>
groupings.</p>

<blockquote>
    <pre><font color="#800080">sequence</font><font
color="#0000FF">:</font> <font color="#008080">-- Empty</font>
        <font color="#0000FF">{</font> <font color="#008080"><em>print </em>(<em>&quot;empty sequence%N&quot;</em>)</font> <font
color="#0000FF">}</font>
    <font color="#0000FF">|</font> <font color="#800080">maybeword</font>
    <font color="#0000FF">|</font> <font color="#800080">sequence word</font>
        <font color="#0000FF">{</font>
            <font color="#008080"><em>print </em>(<em>&quot;added word &quot;</em>)<em>
            print </em>(</font><font color="#0000FF">$2</font><font
color="#008080">)<em>
            print </em>(<em>'%N'</em>)</font>
        <font color="#0000FF">}</font>
   <font color="#0000FF"> ;</font>

<font color="#800080">maybeword</font><font color="#0000FF">:</font> <font
color="#008080">-- Empty</font>
        <font color="#0000FF">{</font> <font color="#008080"><em>print </em>(<em>&quot;empty maybeword%N&quot;</em>)</font> <font
color="#0000FF">}
    | </font><font color="#800080">word</font><font
color="#0000FF">
        {
            </font><font color="#008080"><em>print </em>(<em>&quot;single word &quot;</em>)<em>
            print </em>(</font><font color="#0000FF">$1</font><font
color="#008080">)<em>
            print </em>(<em>'%N'</em>)</font><font
color="#0000FF">
        }
    ;</font></pre>
</blockquote>

<p>The error is an ambiguity: there is more than one way to parse
a single <font color="#800080"><tt>word</tt></font> into a <font
color="#800080"><tt>sequence</tt></font>. It could be reduced to
a <font color="#800080"><tt>maybeword</tt></font> and then into a
<font color="#800080"><tt>sequence</tt></font> via the second
rule. Alternatively, nothing-at-all could be reduced into a <font
color="#800080"><tt>sequence</tt></font> via the first rule, and
this could be combined with the <font color="#800080"><tt>word</tt></font>
using the third rule for <font color="#800080"><tt>sequence</tt></font>.</p>

<p>There is also more than one way to reduce nothing-at-all into
a <font color="#800080"><tt>sequence</tt></font>. This can be
done directly via the first rule, or indirectly via <font
color="#800080"><tt>maybeword</tt></font> and then the second
rule. You might think that this is a distinction without a
difference, because it does not change whether any particular
input is valid or not. But it does affect which actions are run.
One parsing order runs the second rule's action; the other runs
the first rule's action and the third rule's action. In this
example, the output of the program changes.</p>

<p><em>Geyacc</em> resolves a reduce/reduce conflict by choosing
to use the rule that appears first in the grammar, but it is very
risky to rely on this. Every reduce/reduce conflict must be
studied and usually eliminated. Here is the proper way to define <font
color="#800080"><tt>sequence</tt></font>:</p>

<blockquote>
    <pre><font color="#800080">sequence</font><font
color="#0000FF">: </font><font color="#008080">-- Empty</font><font
color="#0000FF">
        { </font><font color="#008080"><em>print </em>(<em>&quot;empty sequence%N&quot;</em>)</font><font
color="#0000FF"> }
    | </font><font color="#800080">sequence word</font><font
color="#0000FF">
        {
            </font><font color="#008080"><em>print </em>(<em>&quot;added word &quot;</em>)<em>
            print </em>(</font><font color="#0000FF">$2</font><font
color="#008080">)<em>
            print </em>(<em>'%N'</em>)</font><font
color="#0000FF">
        }
    ;</font></pre>
</blockquote>

<p>Here is another common error that yields a reduce/reduce
conflict:</p>

<blockquote>
    <pre><font color="#800080">sequence</font><font
color="#0000FF">: </font><font color="#008080">-- Empty</font><font
color="#0000FF">
    | </font><font color="#800080">sequence words</font><font
color="#0000FF">
    | </font><font color="#800080">sequence redirects</font><font
color="#0000FF">
    ;

</font><font color="#800080">words</font><font color="#0000FF">: </font><font
color="#008080">-- Empty</font><font color="#0000FF">
    | </font><font color="#800080">words word</font><font
color="#0000FF">
    ;

</font><font color="#800080">redirects</font><font
color="#0000FF">: </font><font color="#008080">-- Empty</font><font
color="#0000FF">
    | </font><font color="#800080">redirects redirect</font><font
color="#0000FF">
    ;</font></pre>
</blockquote>

<p>The intention here is to define a sequence which can contain <font
color="#800080"><tt>word</tt></font> and/or <font color="#800080"><tt>redirect</tt></font>
groupings. The individual definitions of <font color="#800080"><tt>sequence</tt></font>,
<font color="#800080"><tt>words</tt></font> and <font
color="#800080"><tt>redirects</tt></font> are error-free, but the
three together make a subtle ambiguity: even an empty input can
be parsed in infinitely many ways! Consider: nothing-at-all could
be a <font color="#800080"><tt>words</tt></font>. Or it could be
two <font color="#800080"><tt>words</tt></font> in a row, or
three, or any number. It could equally well be a <font
color="#800080"><tt>redirects</tt></font>, or two, or any number.
Or it could be a <font color="#800080"><tt>words</tt></font>
followed by three <font color="#800080"><tt>redirects</tt></font>
and another <font color="#800080"><tt>words</tt></font>. And so
on.</p>

<p>Here are two ways to correct these rules. First, to make it a
single level of sequence:</p>

<blockquote>
    <pre><font color="#800080">sequence</font><font
color="#0000FF">:</font><font color="#008080"> -- Empty</font><font
color="#0000FF">
    | </font><font color="#800080">sequence word</font><font
color="#0000FF">
    | </font><font color="#800080">sequence redirect</font><font
color="#0000FF">
    ;</font></pre>
</blockquote>

<p>Second, to prevent either a <font color="#800080"><tt>words</tt></font>
or a <font color="#800080"><tt>redirects</tt></font> from being
empty:</p>

<blockquote>
    <pre><font color="#800080">sequence</font><font
color="#0000FF">: </font><font color="#008080">-- Empty</font><font
color="#0000FF">
    | </font><font color="#800080">sequence words</font><font
color="#0000FF">
    | </font><font color="#800080">sequence redirects</font><font
color="#0000FF">
    ;

</font><font color="#800080">words</font><font color="#0000FF">: </font><font
color="#800080">word</font><font color="#0000FF">
    | </font><font color="#800080">words word</font><font
color="#0000FF">
    ;

</font><font color="#800080">redirects</font><font
color="#0000FF">: </font><font color="#800080">redirect</font><font
color="#0000FF">
    | </font><font color="#800080">redirects redirect</font><font
color="#0000FF">
    ;</font></pre>
</blockquote>

<h2><a name="mysterious_reduce_reduce">Mysterious Reduce/Reduce
Conflicts</a></h2>

<p>Sometimes reduce/reduce conflicts can occur that don't look
warranted. Here is an example:</p>

<blockquote>
    <pre><font color="#0000FF">%token </font><font
color="#FF0000">ID</font><font color="#0000FF">

%%
</font><font color="#800080">def</font><font color="#0000FF">: </font><font
color="#800080">param_spec return_spec</font><font
color="#0000FF"> </font><font color="#FF0000">','</font><font
color="#0000FF">
    ;
</font><font color="#800080">param_spec</font><font
color="#0000FF">: </font><font color="#800080">type</font><font
color="#0000FF">
    | </font><font color="#800080">name_list </font><font
color="#FF0000">':'</font><font color="#800080"> type</font><font
color="#0000FF">
    ;
</font><font color="#800080">return_spec</font><font
color="#0000FF">: </font><font color="#800080">type</font><font
color="#0000FF">
    | </font><font color="#800080">name</font><font
color="#0000FF"> </font><font color="#FF0000">':'</font><font
color="#0000FF"> </font><font color="#800080">type</font><font
color="#0000FF">
    ;
</font><font color="#800080">type</font><font color="#0000FF">: </font><font
color="#FF0000">ID</font><font color="#0000FF">
    ;
</font><font color="#800080">name</font><font color="#0000FF">: </font><font
color="#FF0000">ID</font><font color="#0000FF">
    ;
</font><font color="#800080">name_list</font><font
color="#0000FF">: </font><font color="#800080">name</font><font
color="#0000FF">
    | </font><font color="#800080">name </font><font
color="#FF0000">','</font><font color="#800080"> name_list</font><font
color="#0000FF">
    ;</font></pre>
</blockquote>

<p>It would seem that this grammar can be parsed with only a
single token of look-ahead: when a <font color="#800080"><tt>param_spec</tt></font>
is being read, an <font color="#FF0000"><tt>ID</tt></font> is a <font
color="#800080"><tt>name</tt></font> if a comma or colon follows,
or a <font color="#800080"><tt>type</tt></font> if another <font
color="#FF0000"><tt>ID</tt></font> follows. In other words, this
grammar is <font size="2">LR</font>(1).</p>

<p>However, <em>geyacc</em>, like most parser generators, cannot
actually handle all <font size="2">LR</font>(1) grammars. In this
grammar, two contexts, that after an <font color="#FF0000"><tt>ID</tt></font>
at the beginning of a <font color="#800080"><tt>param_spec</tt></font>
and likewise at the beginning of a <font color="#800080"><tt>return_spec</tt></font>,
are similar enough that <em>geyacc</em> assumes they are the
same. They appear similar because the same set of rules would be
active <font face="Symbol">-</font> the rule for reducing to a <font
color="#800080"><tt>name</tt></font> and that for reducing to a <font
color="#800080"><tt>type</tt></font>. <em>Geyacc</em> is unable
to determine at that stage of processing that the rules would
require different look-ahead tokens in the two contexts, so it
makes a single parser state for them both. Combining the two
contexts causes a conflict later. In parser terminology, this
occurrence means that the grammar is not <font size="2">LALR</font>(1).</p>

<p>In general, it is better to fix deficiencies than to document
them. But this particular deficiency is intrinsically hard to
fix; parser generators that can handle <font size="2">LR</font>(1)
grammars are hard to write and tend to produce parsers that are
very large. In practice, <em>geyacc</em> is more useful as it is
now.</p>

<p>When the problem arises, you can often fix it by identifying
the two parser states that are being confused, and adding
something to make them look distinct. In the above example,
adding one rule to <font color="#800080"><tt>return_spec</tt></font>
as follows makes the problem go away:</p>

<blockquote>
    <pre><font color="#0000FF">%token </font><font
color="#FF0000">BOGUS</font><font color="#0000FF">
...
%%
...
</font><font color="#800080">return_spec</font><font
color="#0000FF">: </font><font color="#800080">type</font><font
color="#0000FF">
    | </font><font color="#800080">name</font><font
color="#0000FF"> </font><font color="#FF0000">':'</font><font
color="#0000FF"> </font><font color="#800080">type</font><font
color="#0000FF">
        </font><font color="#008080">-- This rule is never used.</font><font
color="#0000FF">
    |</font><font color="#FF0000"> ID BOGUS</font><font
color="#0000FF">
    ;</font></pre>
</blockquote>

<p>This corrects the problem because it introduces the
possibility of an additional active rule in the context after the
<font color="#FF0000"><tt>ID</tt></font> at the beginning of <font
color="#800080"><tt>return_spec</tt></font>. This rule is not
active in the corresponding context in a <font color="#800080"><tt>param_spec</tt></font>,
so the two contexts receive distinct parser states. As long as
the token <font color="#FF0000"><tt>BOGUS</tt></font> is never
generated by <font color="#008080"><em><tt>read_token</tt></em></font>,
the added rule cannot alter the way actual input is parsed.</p>

<p>In this particular example, there is another way to solve the
problem: rewrite the rule for <font color="#800080"><tt>return_spec</tt></font>
to use <font color="#FF0000"><tt>ID</tt></font> directly instead
of via <font color="#800080"><tt>name</tt></font>. This also
causes the two confusing contexts to have different sets of
active rules, because the one for <font color="#800080"><tt>return_spec</tt></font>
activates the altered rule for <font color="#800080"><tt>return_spec</tt></font>
rather than the one for <font color="#800080"><tt>name</tt></font>.</p>

<blockquote>
    <pre><font color="#800080">param_spec</font><font
color="#0000FF">: </font><font color="#800080">type</font><font
color="#0000FF">
    | </font><font color="#800080">name_list</font><font
color="#0000FF"> </font><font color="#FF0000">':'</font><font
color="#0000FF"> </font><font color="#800080">type</font><font
color="#0000FF">
    ;
</font><font color="#800080">return_spec</font><font
color="#0000FF">: </font><font color="#800080">type</font><font
color="#0000FF">
    | </font><font color="#FF0000">ID ':'</font><font
color="#0000FF"> </font><font color="#800080">type</font><font
color="#0000FF">
    ;</font></pre>
</blockquote>

<hr size="1">

<table border="0" width="100%">
    <tr>
        <td><address>
            <font size="2"><b>Copyright  1998</b></font><font
            size="1"><b>, </b></font><font size="2"><strong>Eric
            Bezault</strong></font><strong> </strong><font
            size="2"><br>
            <strong>mailto:</strong></font><a
            href="mailto:ericb@gobosoft.com"><font size="2">ericb@gobosoft.com</font></a><font
            size="2"><br>
            <strong>http:</strong></font><a
            href="http://www.gobosoft.com"><font size="2">//www.gobosoft.com</font></a><font
            size="2"><br>
            <strong>Last Updated:</strong> 5 August 1998</font><br>
            <!--webbot bot="PurpleText"
            preview="
$Date: 1999/06/12 18:55:21 $ 
$Revision: 1.9 $"
            --> 
        </address>
        </td>
        <td align="right" valign="top"><a
        href="http://www.gobosoft.com"><img
        src="../image/home.gif" alt="Home" border="0" width="40"
        height="40"></a><a href="index.html"><img
        src="../image/toc.gif" alt="Toc" border="0" width="40"
        height="40"></a><a href="options.html"><img
        src="../image/previous.gif" alt="Previous" border="0"
        width="40" height="40"></a><a href="precedence.html"><img
        src="../image/next.gif" alt="Next" border="0" width="40"
        height="40"></a></td>
    </tr>
</table>
</body>
</html>