File: intro.yo

package info (click to toggle)
bisonc%2B%2B 6.09.02-1
links: PTS, VCS
area: main
in suites: forky, sid
size: 5,984 kB
sloc: cpp: 9,375; ansic: 1,505; fortran: 1,134; makefile: 1,062; sh: 526; yacc: 84; lex: 60
file content (89 lines) | stat: -rw-r--r-- 4,242 bytes
parent folder | download | duplicates (6)
This chapter describes the algorithm that is used by b(). Generating parsers
of course begins with a grammar to be analyzed. The analysis consists of
these steps:
    itemization(
    it() First, a set of tt(FIRST) tokens is determined. The tt(FIRST) set of
a nonterminal defines all terminal tokens that can be encountered when
beginning to recognize that nonterminal.
    it() Having determined the tt(FIRST) set, the grammar itself is
analyzed. Starting from the start rule all possible syntactically correct
derivations of the grammar are determined.
    it() At each state actions are defined that are eventually used by the
parser to determine what to do in a particular state. These actions are based
on the next available token and may either involve a transition to another
state (called a em(shift), since it involves processing a token by `shifting'
over the next part of the input) or it may involve a reduction (a em(reduce)
action), in which the parser stack's size is somewhat reduced. In the latter
case, if an em(action) is associated with a particular rule, the action is
executed as a side effect of the reduction.
    it() Sometimes the analysis takes the parser generator to a state in which
a choice between a em(shift) or a em(reduce) action must be made. This is
called a em(shift-reduce) conflict. Sometimes the parser generator is able to
solve these conflicts itself, by looking at the next available token (the
em(look-ahead) token). If the next token is not a possible continuation for
either a shift or a reduce action that particular continuation can be
discarded. Likewise, two reductions may be encountered in a state (a
em(reduce-reduce) conflict). Here the same reasoning can be applied, maybe
resulting in discarding one of the possible reductions. In all other cases
(i.e., if the look-ahead token is possible for both continuations) a true
conflict is observed which somehow is solved by the grammar designer. If the
designer doesn't select an particular action, the parser generator reports a
conflict and selects a default action. By default a em(shift) is used, rather
than a em(reduce). With a em(reduce-reduce) conflict the default action is to
reduce by the rule listed first in the grammar.
    it() In cases where input is not structured according to the rules of the
grammar, a em(syntactic error) is observed. Error recovery may be attempted,
to allow the parser to continue parsing. This is a desirable characteristic
since it provides the user of a program with a full syntactic error report,
rather than one error at a time.
    it() Following the analysis of the grammar, code is generated and the
parsing algorithm (implemented in the parser's tt(parse()) function) 
processes input according to the tables generated by the parser generator. 
    )

    All the above phases are illustrated and discussed in the next
sections. Additional details of the parsing process can be found in various
books about compiler construction, e.g., in Aho, Sethi and Ullman's (2003)
book bf(Compilers) (Addison-Wesley).

In the sections below, the following grammar is used to illustrate the
various phases:
        verb(
    %token NR
    
    %left '+'
    
    %%
    
    start:
        start expr
    |
        // empty
    ;
    
    expr:
        NR
    |
        expr '+' expr
    ;
        )
    The grammar is interesting as it has a rule containing an empty
alternative and because it harbors a shift-reduce conflict. The shift-reduce
conflict is solved by explictly assigning a priority and association to the
tt('+') token.

    The analysis starts by defining an additional rule, which is recognized
(reduced) at end of input. This rule and the rules specified in the grammar
together define what is known as the em(augmented grammar). In the coming
sections the symbol tt($) is used to indicate `end of input'. From the above
grammar the following augmented grammar is derived:
        verb(
    1.  start:      start expr
    2.  start:      // empty
    3.  expr:       NR
    4.  expr:       expr '+' expr
    5.  start_$:    start   (input ends here)
        )

    b() itself produces an extensive analysis of any grammar it is offered
when the option tt(--construction) is provided.