1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173
|
Once the items of all the grammar's states have been determined the LA sets
for the states' items are computed. Starting from the LA set of the kernel
item of state 0 (representing the augmented grammar's production rule
tt(S_$: . S), where tt(S) is the grammar's start rule) the LA sets of all
items of all of the grammar's states are determined. By definition, the LA set
of state 0's kernel item equals tt($), representing end-of-file.
Starting from the function tt(State::determineLAsets), which is called for
state 0, the LA sets of all items of all states are computed.
For each state, the LA sets of its items are computed first. Once they have
been computed, the LA sets of items from where transitions to other states are
possible are then propagated to the matching kernel items of those destination
states. When the LA sets of kernel items of those destination states are
enlarged then their state indices are added to a set tt(todo). LA sets of the
items of states whose indices are stored in the tt(todo) set are (re)computed
(by calling tt(determineLAsets) for those states) until tt(todo) is empty, at
which point all LA sets have been computed. Initially tt(todo) only contains
0, the index of the initial state, representing the augmented grammar's
production rule.
To compute the LA sets of a state's items the LA set of each of its kernel
items is distributed (by the member tt(State::distributeLAsetOf)) over the
items which are implied by the item being considered. E.g., for item tt(X: a
. Y z), where tt(a) and tt(z) are any sequence of grammar symbols and tt(X)
and tt(Y) are nonterminal symbols, all of tt(Y's) production rules are added
as new items to the current state.
Then the member tt(distributeLAsetOfItem(idx)) matches the item's rule
specification with the specification tt(a.Bc), where tt(a) and tt(c) are
(possibly empty) sequences of grammatical symbols, and tt(B) is a (possibly
empty) nonterminal symbol appearing immediately to the right of the item's
dot position. if tt(B) is empty then there are no additional production rules
and tt(distributeLAsetOf) may return. Otherwise, the set tt(b = FIRST(c)) is
computed. This set holds all symbols which may follow tt(B). If tt(b) contains
epsilon() (i.e., the element representing the empty set), then the currently
defined LA set of the item can also be observed. In that case epsilon() is
removed, and the currently defined LA set is added to tt(b). Finally, the LA
sets of all items representing a production rule for tt(B) are inspected: if
tt(b) contains unique elements compared to the LA sets of those items, then
the unique elements of tt(b) are added to the LA sets of those items. Finally,
tt(distributeLAsetOfItem) is recursively called for those items whose LA sets
were enlarged.
Once the LA sets of the items of a state have thus been computed,
tt(inspectTransitions) is called to propagate the LA sets of items from where
transitions to other states are possible to the affected (kernel) items of
those other (destination) states. The member tt(inspectTransitions) inspects
all tt(Next) objects of the current state's tt(d_nextVector). Next objects
provide
itemization(
it() the state index of a state to transfer to from the current state;
it() a size_t vector of item transitions. Each element is the index of an
item in the current state (the source-item), its index is the index
of a (kernel) item of the state to transfer to (the destination
index).
)
If the LA set of a destination item can be enlarged from the LA set of the
source item then the LA sets of the destination state's items must be
recomputed. This is realized by inserting the destation state's index into the
`todo' set.
To illustrate an LA-set computation we will now compute the LA sets of (some
of) the items of the states of the grammar introduced at the beginning of this
chapter. Its augmented grammar consists of the following production rules:
verb(
1. start: start expr
2. start: // empty
3. expr: NR
4. expr: expr '+' expr
5. start_$: start
)
When analyzing this grammar, we found the following five states, consisting of
several items and transitions (kernel items are marked with K following their
item indices). Next to the items, where applicable, the goto-table is shown:
the state to go to when the mentioned grammatical symbol has been recognized:
verb(
Goto table
-----------
State 0: start
0K: start_$ -> . start 1
1: start -> . start expr 1
2: start -> .
State 1: expr NR
0K: start_$ -> start .
1K: start -> start . expr 2
2: expr -> . NR 3
3: expr -> . expr '+' expr
State 2: '+'
0K: start -> start expr .
1K: expr -> expr . '+' expr 4
State 3:
0K: expr -> NR .
State 4: expr NR
0K: expr -> expr '+' . expr 5
1: expr -> . NR 3
2: expr -> . expr '+' expr 5
State 5: '+'
0K: expr -> expr '+' expr .
1K: expr -> expr . '+' expr 4
)
Item 0 of state 0 by definition has LA symbol $, and LA computation therefore
always starts at item 0 of state 0. The interesting part of the LA set
computation is encountered in the recursive member tt(distributeLAsets):
verb(
distributeLAsetsOfItem(0)
start_$ -> . start: LA: {$}, B: start, c: {}, so b: {$}
items 1 and 2 refer to production rules of B (start) and are inspected:
1: LA(1): {}: b contains unique elements. Therefore:
LA(1) = {$}
distributeLAsetsOfItem(1):
start -> . start expr: LA: {$}, B: start, c: {expr}, so b: {NR}
inspect items 1 and 2 as they refer to production rules of B (start):
1: LA(1): {}: b contains unique elements. Therefore:
LA(1) = {$,NR}
distributeLAsetsOfItem(1)
start -> . start expr: LA: {$,NR}, B: start, c: {expr}, so b: {NR}
inspect items 1 and 2 as they refer to prod. rules of B (start):
1: LA(1): {$,NR}, so b does not contain unique elements: done
2: LA(2): {}, b contains unique elements
LA(2) = {NR}
distributeLAsetsOfItem(2)
start -> .: LA: {NR}, B: -, c: {}, so b: {NR}
inspect items 1 and 2 as they refer to prod. rules of B (start):
1: LA(1): {$,NR}, b does not contain unique elements: done
2: LA(2): {NR}, so b does not contain unique elements: done
2: LA(2): {NR}, so b does not contain unique elements: done
2: LA(2): {NR}: b contains unique elements. Therefore:
LA(2) = {$,NR}
distributeLAsetsOfItem(2)
start -> .: LA: {$,NR}, B: -, c: {}
B empty, so return.
)
So, item 0 has LA set tt({$}), items 1 and 2 have LA sets tt({$,NR}).
The next step involves propagating the LA sets to kernel items of the states
to where transitions are possible:
itemization(
it() Item 0, state 0 transits to item 0 state 1. Item 0 of state 1's
current LA set is empty, so it receives LA set tt({$}), and 1 (state 1's
index) is inserted into the tt(todo) set.
it() Item 1, state 0 transits to item 1 state 1. Item 1 of state 1's
current LA set is empty, so it receives LA set tt({$,NR}), and 1 (state 1's
index) is inserted into the tt(todo) set.
)
Following this LA set propagation the LA sets of all items of state 1 are
computed, which in turn is followed by LA propagation to other states (states
2 and 3), etc. etc.
In this grammar there are no transitions to the current state (i.e.,
transitions from state x to state x). If such transitions are encountered then
they can be ignored by tt(inspectTransitions) as the LA sets of the items of a
state have already be computed by the time tt(inspectTransitions) is called.
|