1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128
|
B() implements the parsing function in the member function tt(parse()). This
function obtains its tokens from the member tt(lex()) and processes all tokens
until a syntactic error, a non-recoverable error, or the end of input is
encountered.
The algorithm used by tt(parse()) is the same, irrespective of the used
grammar. In fact, the tt(parse()) member's behavior is completely determined
by the tables generated by b().
The parsing algorithm is known as the em(shift-reduce) (S/R) algorithm, and it
allows tt(parse()) to perform two actions while processing series of tokens:
itemization(
it() When a token is received in a state in which that token is required
for a transition to another state (e.g., a tt(NR) token is observed in
state 1 of the example's grammar) a transition to state 3 is performed.
it() When a state is reached which calls for a (default) reduction (e.g.,
state 3 of the example's grammar) a reduction is performed.
)
The parsing function maintains two stacks, which are manipulated by the above
two actions: a state stack and a value stack. These stacks are not accessible
to the parser: they are private data structures defined in the parser's base
class. The parsing member tt(parse()) may use the following member functions
to manipulate these stacks:
itemization(
itt(push_(stateIdx)) pushes tt(stateIdx) on the state stack and pushes
the current semantic value (i.e., tt(LTYPE_ d_val_)) on the value stack;
itt(pop_(size_t count = 1)) removes tt(count) elements from the two
stacks;
itt(top_()) returns the state currently on top of the state stack;
)
Apart from the state- and semantic stacks, the S/R algorithm itself sometimes
needs to push a token on a two-element stack. Rather than using a formal
stack, two variables (tt(d_token_) and tt(d_nextToken_)) are used to
implement this little token-stack. The member function tt(pushToken_())
pushes a new value on the token stack, the member tt(popToken_())
pops a previously pushed value from the token stack. At any time,
tt(d_token_) contains the topmost element of the token stack.
The member tt(nextToken()) determines the next token to be processed. If the
token stack contains a value it is returned. Otherwise, tt(lex()) is called to
obtain the next token to be pushed on the token stack.
The member tt(lookup()) looks up the current token in the current state's
tt(SR_) table. For this a simple linear search algorithm is used. If
searching fails to find an action for the token an tt(UNEXPECTED_TOKEN_)
exception is thrown, which starts the error recovery. If an action was found,
it is returned.
Rules may have actions associated with them. These actions are executed when a
grammatical rule has been completely recognized. This is always at the end of
a rule: mid-rule actions are converted by b() into pseudo nonterminals,
replacing mid-rule action blocks by these pseudo nonterminals. The pseudo
nonterminals show up in the verbose grammar output as rules having LHSs
starting with tt(#). So, once a rule has been recognized its action (if
defined) is executed. For this the member function tt(executeAction()) is
available.
Finally, the token stack can be cleared using the member tt(clearin()).
Now that the relevant support functions have been introduced, the S/R
algorithm itself turns out to be a fairly simple algorithm. First, the
parser's stack is initialized with state 0 and the token stack is
cleared. Then, in a never ending loop:
itemization(
it() If a state needs a token (i.e., tt(REQ_TOKEN) has been specified for
that state), tt(nextToken()) is called to obtain the next token;
it() From the token and the current state tt(lookup()) determines the next
action;
it() If a shifting action was called for the next state is pushed on
the stack and the token is popped off the token stack.
it() If a reduction was called for that rule's action block is executed
followed by a reduction of the production rule (performed by tt(reduce_())):
the semantic and state stacks are reduced by the number of elements found in
that production rule, and the production rule's LHS is pushed on the token
stack
it() If the state/token combination indicates that the input is accepted
(normally: when tt(EOF) is encountered in state 1) then the parsing function
terminates, returning 0.
)
The following table shows the S/R algorithm in action when the example grammar
is given the input tt(3 + 4 + 5). The first column shows the (remaining)
input, the second column the current token stack (with tt(-) indicating an
empty token stack), the third column the state stack. The fourth column
provides a short description. The leftmost elements of
the stacks represent the tops of the stacks. The information shown below is
also (in more elaborate form) shown when the tt(--debug) option is provided to
B() when generating the parsing function.
table(7)(rllrlll)(
rowline()
row(cell(remaining input)cell( )cell(token stack)cell( )cell(state stack)
cell( )cell(description))
rowline()
ttrow(3 + 4 + 5) (-) ( 0) (initialization)
ttrow(3 + 4 + 5) (start) ( 0) (reduction by rule 2)
ttrow(3 + 4 + 5) (-) ( 1 0) (shift `start')
ttrow( + 4 + 5) (NR) ( 1 0) (obtain NR token)
ttrow( + 4 + 5) (-) ( 3 1 0) (shift NR)
ttrow( + 4 + 5) (expr) ( 1 0) (reduction by rule 3)
ttrow( + 4 + 5) (-) ( 2 1 0) (shift `expr')
ttrow( 4 + 5) (+) ( 2 1 0) (obtain `+' token)
ttrow( 4 + 5) (-) ( 4 2 1 0) (shift `+')
ttrow( + 5) (NR) ( 4 2 1 0) (obtain NR token)
ttrow( + 5) (-) (3 4 2 1 0) (shift NR)
ttrow( + 5) (expr) ( 4 3 1 0) (reduction by rule 3)
ttrow( + 5) (-) (5 4 3 1 0) (shift `expr')
ttrow( 5) (+) (5 4 3 1 0) (obtain `+' token)
ttrow( 5) (expr +) ( 1 0) (reduction by rule 4)
ttrow( 5) (+) ( 2 1 0) (shift `expr')
ttrow( 5) (-) ( 4 2 1 0) (shift '+')
ttrow( ) (NR) ( 4 2 1 0) (obtain NR token)
ttrow( ) (-) (3 4 2 1 0) (shift NR)
ttrow( ) (expr) ( 4 2 1 0) (reduction by rule 3)
ttrow( ) (-) (5 4 2 1 0) (shift `expr')
ttrow( ) (EOF) (5 4 2 1 0) (obtain EOF)
ttrow( ) (expr EOF) ( 1 0) (reduction by rule 4)
ttrow( ) (EOF) ( 2 1 0) (shift `expr')
ttrow( ) (start EOF) ( 2 1 0) (reduction by rule 1)
ttrow( ) (EOF) ( 1 0) (shift `start')
ttrow( ) (EOF) ( 1 0) (ACCEPT)
rowline()
)
|