File: intro.yo

package info (click to toggle)
bisonc%2B%2B 4.09.02-1
links: PTS, VCS
area: main
in suites: jessie, jessie-kfreebsd
size: 5,412 kB
ctags: 2,871
sloc: cpp: 9,459; ansic: 1,434; makefile: 1,091; sh: 286; yacc: 84; lex: 60
file content (90 lines) | stat: -rw-r--r-- 4,295 bytes
parent folder | download | duplicates (4)
In this chapter the algorithm used by b() is discussed. Generating grammar
parsers of course begins with a grammar which is analyzed. The analysis
consists of several phases:
    itemization(
    it() First, a set of tt(FIRST) tokens is determined. The tt(FIRST) set of
a nonterminal defines all terminal tokens that can be encountered when
beginning to recognize that nonterminal.
    it() Next, the set of tt(FOLLOW) tokens is determined. A tt(FOLLOW) set of
a nonterminal defines all terminal tokens that can be encountered next,
following the recognition of that nonterminal.
    it() Having determined the tt(FIRST) and tt(FOLLOW) sets, the grammar
itself is analyzed. Starting from the start rule all possible syntactically
correct derivations of the grammar are determined.
    it() At each state actions are defined that are eventually used by the
parser to determine what to do in a particular state. These transitions are
based on the next available token and may either involve a transition to
another state (called a em(shift), since it involves processing a token and
thus determining the next token, which involves `shifting' over the next part
of the input) or it may involve a reduction (a em(reduce) action), which
reduces the parser's stack somewhat and optionally results in executing an
em(action) associated with a particular rule.
    it() Sometimes the analysis takes the parser generator to a state in which
a choice between a em(shift) or a em(reduce) action must be made. This is
called a em(shift-reduce) conflict. Sometimes the parser generator is able to
solve these conflicts itself, by looking at the em(lookahead) sets for each of
the continuations. If the next token is not found in the lookaheadset of
either a shift or a reduce action that particular continuation can be
discarded. Likewise, two reductions may be encountered in a state (a
em(reduce-reduce) conflict). Here the same reasoning can be applied, causing
one of the reductions to be discarded. In all other cases a true conflict is
observed which somehow should be solved by the grammar designer. By default a
em(shift) is preferred over a em(reduce) action. With a em(reduce-reduce)
conflict the default action is to reduce according to the reduction by the
rule listed first in the grammar.
    it() In cases where input is not according to the rules of the grammar, a
em(syntactic error) is observed. Error recovery may be attempted, to allow the
parser to continue parsing. This is a desirable characteristic since it
provides the user of a program with a full syntactic error report, rather than
one error at a time.
    it() Following the analysis of the grammar, code is generated and the
parsing algorithm (implemented in the parser's tt(parse()) function) will
process input according to the tables generated by the parser generator. 
    )

    All the above phases will be illustrated and discussed in the next
sections. Additional details of the parsing process can be found in various
books about compiler construction, e.g., in Aho, Sethi and Ullman's (2003)
book bf(Compilers) (Addison-Wesley).

In the sections below, the following grammar will be used to illustrate the
various phases:
        verb(
    %token NR
    
    %left '+'
    
    %%
    
    start:
        start expr
    |
        // empty
    ;
    
    expr:
        NR
    |
        expr '+' expr
    ;
        )
    The grammar is interesting since it has a rule containing an empty
alternative and since it formally suffers from a shift-reduce conflict. The
shift-reduce conflict is solved by explictly assigning a priority and
association to the tt('+') token.

    The analysis starts by defining an additional rule, which is recognized at
end of input. This rule and the rules specified in the grammar together define
what is known as the em(augmented grammar). In the coming sections
the symbol tt($) is used to indicate `end of
input'. From the above grammar the following augmented grammar is derived:
        verb(
    1.  start:      start expr
    2.  start:      // empty
    3.  expr:       NR
    4.  expr:       expr '+' expr
    5.  start_$:    start   (input ends here)
        )

    b() itself will produce an analysis of any grammar it is offered when the
option tt(--construction) is provided.