1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
|
The tt(FIRST) set defines all terminal tokens that can be encountered when
beginning to recognize a grammatical symbol. For each grammatical symbol
(terminal and nonterminal) a tt(FIRST) set can be determined as follows:
itemization(
it() The tt(FIRST) set of a terminal symbol is the symbol itself.
it() The tt(FIRST) set of an empty alternative is the empty set. The empty
set is indicated by epsilon() and is considered an actual element of the
tt(FIRST) set (So, a tt(FIRST) set could contain two elements:
tt('+') em(and) epsilon()).
it() If X has a production rule tt(X: X1 X2 X3..., Xi, ...Xn), then
initialize fst(X) to empty (i.e., not even holding epsilon()). Then,
for each Xi (1..n):
itemization(
it() add fst(Xi) to fst(X)
it() stop when fst(Xi) does not contain epsilon()
)
If fst(Xn) does not contain epsilon() remove epsilon() from fst(X) (unless
analyzing another production rule) epsilon() is already part of fst(X).
)
When starting this algorithm, only the nonterminals need to be
considered. Also, required tt(FIRST) sets may not yet be available. Therefore
the above algorithm iterates over all nonterminals until no changes were
observed. In the algorithm tt($) is not considered.
Applying the above algorithm to the rules of our grammar we get:
table(4)(llll)(
rowline()
fstrow(nonterminal)(rule)(FIRST set)
rowline()
fstrow(tt(start_$)) (tt(start)) (not yet available)
fstrow(tt(start)) (tt(start expr)) (not yet available)
fstrow(tt(start)) (tt(// empty)) (epsilon())
fstrow(tt(expr)) (tt(NR)) (tt(NR))
fstrow(tt(expr)) (tt(expr '+' expr)) (tt(NR))
rowline()
row(cell(changes in the next cycle:))
fstrow(tt(start)) (tt(start expr)) (tt(NR) epsilon())
fstrow(tt(start)) (tt(// empty)) (tt(NR) epsilon())
rowline()
row(cell(changes in the next cycle:))
fstrow(tt(start_$)) (tt(start)) (tt(NR) epsilon())
rowline()
row(cell(no further changes))
)
|