1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
|
The b() parser does em(not) always perform a reduction when a state is reached
where an item has its dot position beyond the last element of its production
rule. For most languages such a simple strategy is incorrect. Instead, when a
reduction is possible, the parser sometimes `looks ahead' to the next token to
decide what to do.
Whenever a token is read, it is not immediately shifted; first it becomes the
em(look-ahead) token, which is not yet shifted on the stack. This allows the
parser to perform one or more reductions, with the look-ahead token still
waiting to be processed. Only when all available reductions have been
performed the look-ahead token is shifted on the stack. The phrase `all
em(available) reductions' does not necessarily mean all em(possible)
reductions. Depending on the look-ahead token, a shift rather than a reduce
may be performed in states in which both actions are possible.
Here is a simple case where a look-ahead token is required. The production
rules define expressions which may contain binary addition operators and
postfix unary factorial operators (`tt(!)'), as well as parentheses for
grouping expressions:
verb(
expr:
term '+' expr
|
term
;
term:
'(' expr ')'
|
term '!'
|
NUMBER
;
)
Suppose that the tokens `tt(1 + 2)' have been read and shifted; what
should be done? If the following token is `CLOSEPAR', then the first three
tokens must be reduced, forming an tt(expr). This is the only valid course,
because shifting the `CLOSEPAR' would produce the sequence of symbols
verb(
term 'CLOSEPAR'
)
which is not syntactically correct.
But if the next token is `tt(!)', then that token must be shifted so that
`tt(2 !)' can be reduced to recognize a tt(term). If in this case the parser
would perform a reduction then `tt(1 + 2)' would become an tt(expr). In that
case the `tt(!)' can't be shifted because doing so would result in the
sequence
verb(
expr '!'
)
which is also syntactically incorrect.
COMMENT(
As a technical aside: the current look-ahead token is stored in the parser's
private data member tt(d_token). This data member is not normally modified by
user-defined member functions. See section ref(SPECIAL).
END)
|