File: README.states

package info (click to toggle)
bisonc%2B%2B 6.04.01-1
  • links: PTS, VCS
  • area: main
  • in suites: bullseye
  • size: 5,864 kB
  • sloc: cpp: 10,112; ansic: 1,434; makefile: 1,069; sh: 522; yacc: 84; lex: 60
file content (243 lines) | stat: -rw-r--r-- 9,326 bytes parent folder | download | duplicates (9)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
This file contains a short description about the way states and look-aheads
(LA's) are constructed. The example does not cover new information. In fact,
it follows the construction of the states of the grammar given in example 4.42
in Aho et al. 

A more extensive description is found in documentation/manual/algorithm.yo

The grammar has the following production rules (numbered 1-3):

1   S'  ->  S
2   S   ->  CC
3   C   ->  cC
4   C   ->  d

Production rule 1 is added to the grammar, representing the so-called
augmented grammar. When this rule is recognized at end-of-input (represented
by $) then the input is grammatically correct, and the parser will flag
`success'.

$ is called a `look ahead' (LA) token. LA's may be in sets, which are used to
determine whether a rule (when potentially recognized) will be reduced or
not. So, when recognizing S, it is only reduced to S' (by rule S' -> S) if $
is indeed the next input token (i.e., if the end-of-input has been reached.


With the starting rule an `item' is associated. An items consists of a rule,
having a `dot' and a LA-set. The dot (.) represents the point to where input
has been recognized. So, initially we have, starting from rule 1 and using  []
to contain the LA token(s):

    S' -> . S   [$]

This is the starting point, state 0, and this point of departure defines the
`kernel' of a state. To the kernel additional (production) rules and new
LA-sets may be added as follows:

    New rules: If a . is followed by a non-terminal all production rules of 
that non-terminal are added to the item as new items having dots in postion
0. This process is followed for the newly added production rules as well until
no more rules are added.

    LA-sets: Here is where it gets a bit complicated. To find LA-sets the
FIRST function must be defined first. 

    The FIRST function defines the set of terminal tokens that can be
encountered at the start of a (series of) grammatical tokens. This FIRST-set
is determined as follows:

    - if X is a terminal token x, FIRST(X) is [x].
    - if X is empty FIRST(X) is [e] (the empty set)
    - if X is a production rule X1 X2 ... Xn then:
            FIRST(X) is FIRST(X1), except [e].
                    If [e] was removed, FIRST(X2) is added to the set, 
                    again removing [e], if it was now added to the set.
                    This process is repeated until an Xi was without [e], or
                    until FIRST(Xn) was added to the set. In this final case,
                    if FIRST(Xn) contains [e] it is kept

    For our little grammar, we have the following FIRST sets:

    S'  ->  S   FIRST(S')   = FIRST(S)
    S   ->  CC  FIRST(S)    = FIRST(C)
    C   ->  cC  FIRST(cC)   = [c]
    C   ->  d   FIRST(d)    = [d]
                
        so, FIRST(S') = FIRST(S) = FIRST(C) = FIRST(S) = [cd]

    Back to the LA-sets: In a state having an item:

        A -> x . B y, [a]

    with A, B non-terminals, x, y possibly empty series of terminal symbols
and [a] the LA-set belonging to this item (so, x . B y is a production rule of
A with a dot), all B's production rules are added to the state having
FIRST(ya) as its LA-set.

    Having constructed a state, the dots are moved one symbol to the
right. Every unique grammar symbol thus shifted represents a `shift' in the
parsing process to another state. A shift may result in a shift to an earlier
state. This earlier state may thus see its LA-set expanded, which expansion
must then propagate to all states derived from that state. When no shift is
possible, a reduction takes place. If a state has only a single reduction,
then this becomes the default reduction (on any symbol not resulting in a
shift). If a state has multiple reductions, then this is ok if their LA-sets
are different. Otherwise the grammar contains a reduce-reduce
conflict. Analogously, if a reduction on a certain LA-set is indicated, but
there's also a shift required for (one or more) terminal tokens in the LA-set,
then the grammar contains a shift-reduce conflict.

    At each rule in each state actions/gotos are specified: `goto' a state if
the symbol following a dot is a nonterminal, `shift' to a state if the symbol
following a dot is a terminal, or reduce according to a production rule if the
dot is at the end of a production rule. This indicates that a rule has been
recognized, and if an action block has been defined for such a production
rule, it is executed at that point (i.e., at the point where the reduction
takes place).

    Now it's time to start constructing states, called S0, S1, S2,
etc. Production rules are numbered by their original number. 


S0:
1       S'  -> . S   [$]                                    On S goto S1
                    for               S' ->   . S    [$] 
                    compute, matching A  -> x . B y, [a], 
                       FIRST(y a)
                    as FIRST(  $), which is: [$], LA for the next S rule:
add S-rule:
2       S   -> . C C [$]                                    On C goto S2
                    for               S  ->   . C C  [$] 
                    compute, matching A  -> x . B y, [a], 
                       FIRST(y a)
                    as FIRST(C $), which is: [cd], LA for the next C rules:
add C-rules:
3       C   -> . c C    [cd]                                On c shift S3
                                
4       C   -> . d      [cd]                                On d shift S4
                    having (of course) the same LA set. `Of course',
                    since the LA set does not depend on this production
                    rule, but on the rule causing this addition. So, once
                    a nonterminal is added, the LA-set of its production
                    rules is determined once.


Continuing this way, each of the previous state's items is processed in turn:

S1:
1       S'  ->  S .     [$]                         Reduce to S' by rule 1. 
                                                    This is the ACCEPT state


S2:
2       S   -> C . C    [$]                                     On C goto S5
                    for               S  -> C .   C  [$] 
                    compute, matching A  -> x . B y, [a]
                       FIRST(y a)
                    as FIRST(  $), which is: [$], LA for the next C rules:

3       C   -> . c C    [$]                                     On c shift S3
                            (Adding [$] to S3's LA-set)

4       C   -> . d      [$]                                     On d shift S4
                            (Adding [$] to S4's LA-set)


S3:
3       C   -> c . C    [cd]+[$] from S2                        On C goto S6
                            (Adding [cd] to S5's LA-set)
                            for               C  -> c . C    [cd]+[$]
                            compute, matching A  -> x . B y, [a]
                               FIRST(y a   )
                            as FIRST(  [cd]), which is: [cd]+[$]
                                
3       C   -> . c C    [cd]+[$]                                On c shift S3

4       C   -> . d      [cd]+[$]                                On d shift S4


S4:
4       C   -> d .      [cd]+[$] from S2            Reduce to C by rule 4

S5:
2       S   -> C C .    [$]                         Reduce to S by rule 2
        
S6:
3       C   -> c C .    [cd]+[$]                    Reduce to C by rule 3



Now, the action/goto table can be summarized:

------------------------------------
             action         goto
------------------------------------
State       c   d   $       S   C
------------------------------------
    0       s3  s4          s1  s2
    1               acc
    2       s3  s4              s5
    3       s3  s4              s6
    4       r4  r4  r4
    5               r2 [d]
    6       r3  r3  r3
------------------------------------
[d]: use this by default for every    
     next token





If a grammatical rule has an empty production the same procedure is followed.

Consider:
1   S' -> S
2   S -> S E
3   S -> 
4   E -> n
5   E -> i

(representing a grammar accepting possibly empty series of numbers and
identifiers). The grammar is constructed as follows:

    FIRST(E) = [in], FIRST(S') = FIRST(S) = [in$]

S0:
    S'  -> . S      [$]             On S: goto S1
Obtaining [$] from the above rule:
    S   -> . S E    [$]             On S: goto S1
    S   -> .        [$]             Default: reduce to S

S1:
    S'  -> S .      [$]             On $: ACCEPT
    S   -> S . E    [$]             On E: goto S2
            Add E-rules: match A -> b . B y  [a]
                         with: S -> S . E    [$]                
    E   -> . n      [$]             On n shift S3
    E   -> . i      [$]             On i shift S4

S2:
    S   -> S E .    [$]             REDUCE

S3:
    E   -> n .      [$]             REDUCE

S4:
    E   -> i .      [$]             REDUCE


Action/Goto table:

------------------------------------
             action         goto
------------------------------------
State       i   n   $       S   E
------------------------------------
    0       r3[d]           s1    
    1       s3  s4  acc         s2
    2       r2[d]
    3       r4[d]                   
    4       r5[d]      
------------------------------------