Occasionally it is useful to insert an action block somewhere in the middle of
a production rule. These so-called mid-rule action blocks are comparable to
the usual action blocks, appearing at the end of production rules, but
mid-rule action blocks are executed before the parser has recognized the
production rule's components that follow them.

A mid-rule action block can refer to the components preceding it using tt($i),
but it cannot refer to subsequent components because it is executed before
they have been observed.

The mid-rule action block itself counts as one of the components of the
production rule. In the example shown below, tt(stmt) can be referred to as
tt($6) in the final action block.

Mid-rule action blocks can also have semantic values. When using tt(%union) or
tt(%polymorphic) and a rule's nonterminal is associated with a union field or
polymorphic token, mid-rule action blocks lose those associations. When
the tt($$), tt($$.) or tt($$->) shorthand notations appear in mid-rule action
blocks of production rules whose nonterminal is associated with a polymorphic
type or union field, a warning is issued that automatic type associations do
not apply. Using the tt(_$$) shorthand notation prevents the warning from
being issued.

Here is an example from a hypothetical grammar, defining a tt(let) statement
that looks like this:
        verb(
    let (variable) statement
        )
    Here, tt(variable) is the name of a variable that must only exist for
tt(statement)'s lifetime. To parse this construct, we must insert tt(variable)
into a symbol table before parsing tt(statement), and must remove it
afterward. Here is how it is done:
        verb(
    stmt:   
        LET '(' variable ')'
        {
            _$$.assign<SYMTAB>(symtab());
            addVariable($3); 
        }
        stmt    
        { 
            $$ = $6;
            restoreSymtab($5); 
        }
        )
    As soon as `tt(let (variable))' has been recognized, the first action is
executed. It saves the current symbol table as the mid-rule's semantic value,
using the polymorphic tag tt(SYMTAB) (which could be associated with, e.g., an
tt(std::unordered_map)).  Then tt(addVariable) receives the new variable's
name, adding it to the current symbol table. Once the first action is
finished, the embedded statement (tt(stmt)) is parsed. Note that the mid-rule
action is component number 5, so `tt(stmt)' is component number 6.

Once tt(statement) has been parsed, its semantic value is returned as the
semantic value of the production rule's nonterminal. Then the semantic value
from the mid-rule action block is used to restore the symbol table to its
original state. This removes the temporary tt(let)-variable from the symbol
table so that it won't appear to exist while the rest of the program is parsed.
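
The save-and-restore pattern used by the two action blocks can be sketched in
plain C++, outside of any parser. This is only a minimal sketch under
assumptions: the example above does not show how tt(symtab), tt(addVariable)
and tt(restoreSymtab) are implemented, so their definitions below, the
tt(SymTab) alias, the global tt(g_symtab) and the function tt(letStatement)
are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical symbol table type: variable name -> value.
using SymTab = std::unordered_map<std::string, int>;

SymTab g_symtab;    // hypothetical 'current' symbol table

// Returns a snapshot (copy) of the current symbol table.
SymTab symtab()
{
    return g_symtab;
}

// Adds a variable to the current symbol table.
void addVariable(std::string const &name)
{
    g_symtab.emplace(name, 0);
}

// Replaces the current symbol table by a previously saved snapshot.
void restoreSymtab(SymTab const &saved)
{
    g_symtab = saved;
}

// Mimics the two action blocks of the 'let' rule around a statement.
// Returns true if 'name' was visible while the statement was processed.
bool letStatement(std::string const &name)
{
    SymTab saved = symtab();    // mid-rule action: save the table ...
    addVariable(name);          // ... and add the let-variable

    // Stands in for parsing the embedded statement.
    bool visible = g_symtab.count(name) == 1;

    restoreSymtab(saved);       // final action: restore the saved table
    return visible;
}
```

Copying the whole table keeps the sketch short. An implementation that only
records and later erases the added name would serve the same purpose.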

Inserting an action block before a rule has been completely recognized often
leads to conflicts: having only a single look-ahead token, the parser must
decide which parsing sequence to use before it can know which rule
applies. For example, the following two rules, without mid-rule actions, can
coexist because the parser can always process the open-brace token and only
then look at the next token before deciding whether there is a declaration or
not:
        verb(
    compound: 
        '{' declarations statements '}'
    | 
        '{' statements '}'
    ;
        )
    But when an action block is inserted before the first open brace
a conflict is introduced:
        verb(
    compound: 
        { 
            prepareForLocalVariables(); 
        }
        '{' declarations statements '}'
    | 
        '{' statements '}'
    ;
        )
    Now the parser, encountering the open brace, must decide whether the
open brace belongs to the first or second production rule. It must make
this decision because it must execute the statement in the action block if it
selects the first production rule, while no action is required when selecting
the second production rule (cf. section ref(LOOKAHEAD)).

One might try to solve this problem by putting identical actions into the two
rules, like so:
        verb(
    compound: 
        { 
            prepareForLocalVariables(); 
        }
        '{' declarations statements '}'
    | 
        { 
            prepareForLocalVariables(); 
        }
        '{' statements '}'
    ;
        )
    But this does not solve the issue, because b() considers all action blocks
as unique elements of production rules, and does not inspect the action
blocks' contents.

A solution to the above problem consists of hiding the action block in a
support nonterminal symbol which recognizes the block's open brace and
then performs the required preparations:
        verb(
    openblock:
        '{'
        { 
            prepareForLocalVariables(); 
        }
    ;

    compound: 
            openblock declarations statements '}'
    | 
            openblock statements '}'
    ;
        )
    Now b() can execute the action in the rule for tt(openblock) without
deciding which rule for tt(compound) it eventually uses. Note that the action
is now at the end of its rule. Any mid-rule action can be converted to an
end-of-rule action in this way, and this is what b() actually does to
implement mid-rule actions. B() converts mid-rule action blocks to hidden
nonterminals whose names consist of a hash mark followed by a number. In a
rule like
        verb(
    string:
        {
            ...
        }
        TOKEN
        ...
    ;
        )
    warning messages referring to the mid-rule action block could look like
this: 
        verb(
    [grammar: warning] Line 10: `rule 3: string ->  #0001': ...
        )
    Here, `#0001' is the hidden nonterminal merely containing the mid-rule
action block. It is as though we had written this grammar:
        verb(
    string:
        #0001 TOKEN
        ...
    ;

    #0001:
        {
            ...
        }
    ;
        )
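
The conversion described above can be sketched as a small rewriting step over
a rule's elements. This is not b()'s actual implementation, which is not shown
here. It is a hedged illustration in which the types tt(Element) and tt(Rule),
the function tt(hideMidRuleActions) and the counter-based naming scheme are
all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// One element of a production rule: a grammar symbol or an action block.
struct Element
{
    bool isAction;
    std::string text;       // symbol name, or the action block's code
};

// A production rule: its left-hand side and its elements.
struct Rule
{
    std::string lhs;
    std::vector<Element> rhs;
};

// Replaces every action block that is not the rule's last element by a
// hidden nonterminal named #NNNN. For each hidden nonterminal a rule is
// added having the action block as its only (end-of-rule) element.
// Returns the rewritten rule followed by the hidden rules.
std::vector<Rule> hideMidRuleActions(Rule rule, unsigned &counter)
{
    std::vector<Rule> hidden;

    // The last element is skipped: an action there already ends the rule.
    for (std::size_t idx = 0; idx + 1 < rule.rhs.size(); ++idx)
    {
        Element &element = rule.rhs[idx];
        if (!element.isAction)
            continue;

        char name[16];
        std::snprintf(name, sizeof(name), "#%04u", ++counter);

        hidden.push_back(Rule{ name, { element } });   // #NNNN: { ... }
        element = Element{ false, name };              // refer to #NNNN
    }

    std::vector<Rule> result{ std::move(rule) };
    result.insert(result.end(), hidden.begin(), hidden.end());
    return result;
}
```

Applied to the tt(string) rule shown earlier, with the counter starting at
zero, the sketch produces a tt(string) rule starting with tt(#0001) and a
separate tt(#0001) rule containing only the action block.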