File: bison_10.html

package info (click to toggle)
bisonc%2B%2B 6.09.02-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 5,984 kB
  • sloc: cpp: 9,375; ansic: 1,505; fortran: 1,134; makefile: 1,062; sh: 526; yacc: 84; lex: 60
file content (334 lines) | stat: -rw-r--r-- 14,900 bytes parent folder | download | duplicates (11)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
                      "http://www.w3.org/TR/html40/loose.dtd">
<HTML>
<!-- Created on January, 28 2005 by texi2html 1.66 -->
<!--
Written by: Lionel Cons <Lionel.Cons@cern.ch> (original author)
            Karl Berry  <karl@freefriends.org>
            Olaf Bachmann <obachman@mathematik.uni-kl.de>
            and many others.
Maintained by: Many creative people <dev@texi2html.cvshome.org>
Send bugs and suggestions to <users@texi2html.cvshome.org>

-->
<HEAD>
<TITLE>Bison 2.21.5: Context Dependency</TITLE>

<META NAME="description" CONTENT="Bison 2.21.5: Context Dependency">
<META NAME="keywords" CONTENT="Bison 2.21.5: Context Dependency">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<META NAME="Generator" CONTENT="texi2html 1.66">

</HEAD>

<BODY LANG="en" BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#800080" ALINK="#FF0000">

<A NAME="SEC81"></A>
<TABLE CELLPADDING=1 CELLSPACING=1 BORDER=0>
<TR><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_9.html#SEC80"> &lt; </A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_10.html#SEC82"> &gt; </A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_9.html#SEC80"> &lt;&lt; </A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison.html#SEC_Top"> Up </A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_11.html#SEC85"> &gt;&gt; </A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison.html#SEC_Top">Top</A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_toc.html#SEC_Contents">Contents</A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_15.html#SEC92">Index</A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_abt.html#SEC_About"> ? </A>]</TD>
</TR></TABLE>
<H1> 7. Handling Context Dependencies </H1>
<!--docid::SEC81::-->
<P>

The Bison paradigm is to parse tokens first, then group them into larger
syntactic units.  In many languages, the meaning of a token is affected by
its context.  Although this violates the Bison paradigm, certain techniques
(known as <EM>kludges</EM>) may enable you to write Bison parsers for such
languages.
</P>
<P>

<TABLE BORDER="0" CELLSPACING="0">
<TR><TD ALIGN="left" VALIGN="TOP"><A HREF="bison_10.html#SEC82">7.1 Semantic Info in Token Types</A></TD><TD>&nbsp;&nbsp;</TD><TD ALIGN="left" VALIGN="TOP">Token parsing can depend on the semantic context.</TD></TR>
<TR><TD ALIGN="left" VALIGN="TOP"><A HREF="bison_10.html#SEC83">7.2 Lexical Tie-ins</A></TD><TD>&nbsp;&nbsp;</TD><TD ALIGN="left" VALIGN="TOP">Token parsing can depend on the syntactic context.</TD></TR>
<TR><TD ALIGN="left" VALIGN="TOP"><A HREF="bison_10.html#SEC84">7.3 Lexical Tie-ins and Error Recovery</A></TD><TD>&nbsp;&nbsp;</TD><TD ALIGN="left" VALIGN="TOP">Lexical tie-ins have implications for how
                        error recovery rules must be written.</TD></TR>
</TABLE>
<P>

(Actually, &quot;kludge&quot; means any technique that gets its job done but is
neither clean nor robust.)
</P>
<P>

<A NAME="Semantic Tokens"></A>
<HR SIZE="6">
<A NAME="SEC82"></A>
<TABLE CELLPADDING=1 CELLSPACING=1 BORDER=0>
<TR><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_10.html#SEC81"> &lt; </A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_10.html#SEC83"> &gt; </A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_10.html#SEC81"> &lt;&lt; </A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_10.html#SEC81"> Up </A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_11.html#SEC85"> &gt;&gt; </A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison.html#SEC_Top">Top</A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_toc.html#SEC_Contents">Contents</A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_15.html#SEC92">Index</A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_abt.html#SEC_About"> ? </A>]</TD>
</TR></TABLE>
<H2> 7.1 Semantic Info in Token Types </H2>
<!--docid::SEC82::-->
<P>

The C language has a context dependency: the way an identifier is used
depends on what its current meaning is.  For example, consider this:
</P>
<P>

<TABLE><tr><td>&nbsp;</td><td class=example><pre>foo (x);
</pre></td></tr></table><P>

This looks like a function call statement, but if <CODE>foo</CODE> is a typedef
name, then this is actually a declaration of <CODE>x</CODE>.  How can a Bison
parser for C decide how to parse this input?
</P>
<P>

The method used in GNU C is to have two different token types,
<CODE>IDENTIFIER</CODE> and <CODE>TYPENAME</CODE>.  When <CODE>yylex</CODE> finds an
identifier, it looks up the current declaration of the identifier in order
to decide which token type to return: <CODE>TYPENAME</CODE> if the identifier is
declared as a typedef, <CODE>IDENTIFIER</CODE> otherwise.
</P>
<P>

The grammar rules can then express the context dependency by the choice of
token type to recognize.  <CODE>IDENTIFIER</CODE> is accepted as an expression,
but <CODE>TYPENAME</CODE> is not.  <CODE>TYPENAME</CODE> can start a declaration, but
<CODE>IDENTIFIER</CODE> cannot.  In contexts where the meaning of the identifier
is <EM>not</EM> significant, such as in declarations that can shadow a
typedef name, either <CODE>TYPENAME</CODE> or <CODE>IDENTIFIER</CODE> is
accepted--there is one rule for each of the two token types.
</P>
<P>

This technique is simple to use if the decision of which kinds of
identifiers to allow is made at a place close to where the identifier is
parsed.  But in C this is not always so: C allows a declaration to
redeclare a typedef name provided an explicit type has been specified
earlier:
</P>
<P>

<TABLE><tr><td>&nbsp;</td><td class=example><pre>typedef int foo, bar, lose;
static foo (bar);        /* redeclare <CODE>bar</CODE> as static variable */
static int foo (lose);   /* redeclare <CODE>foo</CODE> as function */
</pre></td></tr></table><P>

Unfortunately, the name being declared is separated from the declaration
construct itself by a complicated syntactic structure--the &quot;declarator&quot;.
</P>
<P>

As a result, the part of Bison parser for C needs to be duplicated, with
all the nonterminal names changed: once for parsing a declaration in which
a typedef name can be redefined, and once for parsing a declaration in
which that can't be done.  Here is a part of the duplication, with actions
omitted for brevity:
</P>
<P>

<TABLE><tr><td>&nbsp;</td><td class=example><pre>initdcl:
          declarator maybeasm '='
          init
        | declarator maybeasm
        ;

notype_initdcl:
          notype_declarator maybeasm '='
          init
        | notype_declarator maybeasm
        ;
</pre></td></tr></table><P>

Here <CODE>initdcl</CODE> can redeclare a typedef name, but <CODE>notype_initdcl</CODE>
cannot.  The distinction between <CODE>declarator</CODE> and
<CODE>notype_declarator</CODE> is the same sort of thing.
</P>
<P>

There is some similarity between this technique and a lexical tie-in
(described next), in that information which alters the lexical analysis is
changed during parsing by other parts of the program.  The difference is
here the information is global, and is used for other purposes in the
program.  A true lexical tie-in has a special-purpose flag controlled by
the syntactic context.
</P>
<P>

<A NAME="Lexical Tie-ins"></A>
<HR SIZE="6">
<A NAME="SEC83"></A>
<TABLE CELLPADDING=1 CELLSPACING=1 BORDER=0>
<TR><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_10.html#SEC82"> &lt; </A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_10.html#SEC84"> &gt; </A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_10.html#SEC81"> &lt;&lt; </A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_10.html#SEC81"> Up </A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_11.html#SEC85"> &gt;&gt; </A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison.html#SEC_Top">Top</A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_toc.html#SEC_Contents">Contents</A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_15.html#SEC92">Index</A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_abt.html#SEC_About"> ? </A>]</TD>
</TR></TABLE>
<H2> 7.2 Lexical Tie-ins </H2>
<!--docid::SEC83::-->
<P>

One way to handle context-dependency is the <EM>lexical tie-in</EM>: a flag
which is set by Bison actions, whose purpose is to alter the way tokens are
parsed.
</P>
<P>

For example, suppose we have a language vaguely like C, but with a special
construct `<SAMP>hex (<VAR>hex-expr</VAR>)</SAMP>'.  After the keyword <CODE>hex</CODE> comes
an expression in parentheses in which all integers are hexadecimal.  In
particular, the token `<SAMP>a1b</SAMP>' must be treated as an integer rather than
as an identifier if it appears in that context.  Here is how you can do it:
</P>
<P>

<TABLE><tr><td>&nbsp;</td><td class=example><pre>%{
int hexflag;
%}
%%
<small>...</small>
expr:   IDENTIFIER
        | constant
        | HEX '('
                { hexflag = 1; }
          expr ')'
                { hexflag = 0;
                   $$ = $4; }
        | expr '+' expr
                { $$ = make_sum ($1, $3); }
        <small>...</small>
        ;

constant:
          INTEGER
        | STRING
        ;
</pre></td></tr></table><P>

Here we assume that <CODE>yylex</CODE> looks at the value of <CODE>hexflag</CODE>; when
it is nonzero, all integers are parsed in hexadecimal, and tokens starting
with letters are parsed as integers if possible.
</P>
<P>

The declaration of <CODE>hexflag</CODE> shown in the C declarations section of
the parser file is needed to make it accessible to the actions 
(see section <A HREF="bison_6.html#SEC35">The C Declarations Section</A>).  You must also write the code in <CODE>yylex</CODE>
to obey the flag.
</P>
<P>

<A NAME="Tie-in Recovery"></A>
<HR SIZE="6">
<A NAME="SEC84"></A>
<TABLE CELLPADDING=1 CELLSPACING=1 BORDER=0>
<TR><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_10.html#SEC83"> &lt; </A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_11.html#SEC85"> &gt; </A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_10.html#SEC81"> &lt;&lt; </A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_10.html#SEC81"> Up </A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_11.html#SEC85"> &gt;&gt; </A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison.html#SEC_Top">Top</A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_toc.html#SEC_Contents">Contents</A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_15.html#SEC92">Index</A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_abt.html#SEC_About"> ? </A>]</TD>
</TR></TABLE>
<H2> 7.3 Lexical Tie-ins and Error Recovery </H2>
<!--docid::SEC84::-->
<P>

Lexical tie-ins make strict demands on any error recovery rules you have.
See section <A HREF="bison_9.html#SEC80">6. Error Recovery</A>.
</P>
<P>

The reason for this is that the purpose of an error recovery rule is to
abort the parsing of one construct and resume in some larger construct.
For example, in C-like languages, a typical error recovery rule is to skip
tokens until the next semicolon, and then start a new statement, like this:
</P>
<P>

<TABLE><tr><td>&nbsp;</td><td class=example><pre>stmt:   expr ';'
        | IF '(' expr ')' stmt { <small>...</small> }
        <small>...</small>
        error ';'
                { hexflag = 0; }
        ;
</pre></td></tr></table><P>

If there is a syntax error in the middle of a `<SAMP>hex (<VAR>expr</VAR>)</SAMP>'
construct, this error rule will apply, and then the action for the
completed `<SAMP>hex (<VAR>expr</VAR>)</SAMP>' will never run.  So <CODE>hexflag</CODE> would
remain set for the entire rest of the input, or until the next <CODE>hex</CODE>
keyword, causing identifiers to be misinterpreted as integers.
</P>
<P>

To avoid this problem the error recovery rule itself clears <CODE>hexflag</CODE>.
</P>
<P>

There may also be an error recovery rule that works within expressions.
For example, there could be a rule which applies within parentheses
and skips to the close-parenthesis:
</P>
<P>

<TABLE><tr><td>&nbsp;</td><td class=example><pre>expr:   <small>...</small>
        | '(' expr ')'
                { $$ = $2; }
        | '(' error ')'
        <small>...</small>
</pre></td></tr></table><P>

If this rule acts within the <CODE>hex</CODE> construct, it is not going to abort
that construct (since it applies to an inner level of parentheses within
the construct).  Therefore, it should not clear the flag: the rest of
the <CODE>hex</CODE> construct should be parsed with the flag still in effect.
</P>
<P>

What if there is an error recovery rule which might abort out of the
<CODE>hex</CODE> construct or might not, depending on circumstances?  There is no
way you can write the action to determine whether a <CODE>hex</CODE> construct is
being aborted or not.  So if you are using a lexical tie-in, you had better
make sure your error recovery rules are not of this kind.  Each rule must
be such that you can be sure that it always will, or always won't, have to
clear the flag.
</P>
<P>

<A NAME="Debugging"></A>
<HR SIZE="6">
<TABLE CELLPADDING=1 CELLSPACING=1 BORDER=0>
<TR><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_10.html#SEC81"> &lt;&lt; </A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_11.html#SEC85"> &gt;&gt; </A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT"> &nbsp; <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison.html#SEC_Top">Top</A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_toc.html#SEC_Contents">Contents</A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_15.html#SEC92">Index</A>]</TD>
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="bison_abt.html#SEC_About"> ? </A>]</TD>
</TR></TABLE>
<BR>
<FONT SIZE="-1">
This document was generated
by <I>Frank B. Brokken</I> on <I>January, 28 2005</I>
using <A HREF="http://texi2html.cvshome.org"><I>texi2html</I></A>
</FONT>

</BODY>
</HTML>