File: lookahead.yo

package info (click to toggle)

bisonc%2B%2B 6.09.02-1

links: PTS, VCS
area: main
in suites:
size: 5,984 kB
sloc: cpp: 9,375; ansic: 1,505; fortran: 1,134; makefile: 1,062; sh: 526; yacc: 84; lex: 60

file content (60 lines) | stat: -rw-r--r-- 2,308 bytes

parent folder | download | duplicates (6)

The b() parser does em(not) always perform a reduction when a state is reached
where an item has its dot position beyond the last element of its production
rule. For most languages such a simple strategy is incorrect. Instead, when a
reduction is possible, the parser sometimes `looks ahead' to the next token to
decide what to do.

Whenever a token is read, it is not immediately shifted; first it becomes the
em(look-ahead) token, which is not yet shifted on the stack. This allows the
parser to perform one or more reductions, with the look-ahead token still
waiting to be processed. Only when all available reductions have been
performed the look-ahead token is shifted on the stack. The phrase `all
em(available) reductions' does not necessarily mean all em(possible)
reductions. Depending on the look-ahead token, a shift rather than a reduce
may be performed in states in which both actions are possible.

Here is a simple case where a look-ahead token is required. The production
rules define expressions which may contain binary addition operators and
postfix unary factorial operators (`tt(!)'), as well as parentheses for
grouping expressions:
    verb(
    expr:     
        term '+' expr
    | 
        term
    ;

    term:     
        '(' expr ')'
    | 
        term '!'
    | 
        NUMBER
    ;
        )
    Suppose that the tokens `tt(1 + 2)' have been read and shifted; what
should be done? If the following token is `CLOSEPAR', then the first three
tokens must be reduced, forming an tt(expr). This is the only valid course,
because shifting the `CLOSEPAR' would produce the sequence of symbols
    verb(
    term 'CLOSEPAR'
    )
    which is not syntactically correct.

But if the next token is `tt(!)', then that token must be shifted so that
`tt(2 !)' can be reduced to recognize a tt(term). If in this case the parser
would perform a reduction then `tt(1 + 2)' would become an tt(expr). In that
case the `tt(!)' can't be shifted because doing so would result in the
sequence 
    verb(
    expr '!'
    )
    which is also syntactically incorrect.

COMMENT(
As a technical aside: the current look-ahead token is stored in the parser's
private data member tt(d_token). This data member is not normally modified by
user-defined member functions. See section ref(SPECIAL).
END)