1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262
|
/*
* Copyright © 2002 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara,
* California 95054, U.S.A. All rights reserved. Sun Microsystems, Inc. has
* intellectual property rights relating to technology embodied in the product
* that is described in this document. In particular, and without limitation,
* these intellectual property rights may include one or more of the U.S.
* patents listed at http://www.sun.com/patents and one or more additional
* patents or pending patent applications in the U.S. and in other countries.
* U.S. Government Rights - Commercial software. Government users are subject
* to the Sun Microsystems, Inc. standard license agreement and applicable
* provisions of the FAR and its supplements. Use is subject to license terms.
* Sun, Sun Microsystems, the Sun logo and Java are trademarks or registered
* trademarks of Sun Microsystems, Inc. in the U.S. and other countries. This
* product is covered and controlled by U.S. Export Control laws and may be
* subject to the export or import laws in other countries. Nuclear, missile,
* chemical biological weapons or nuclear maritime end uses or end users,
* whether direct or indirect, are strictly prohibited. Export or reexport
* to countries subject to U.S. embargo or to entities identified on U.S.
* export exclusion lists, including, but not limited to, the denied persons
* and specially designated nationals lists is strictly prohibited.
*/
This file describes in a step by step manner how you can modify the
lookaheads in your grammar to optimize for space in the generated
parser. This is only necessary until JavaCC is improved to do this
optimization automatically (at which time, we will delete this file).
1. How much space optimization can one expect?
The optimizations that we show here gets rid of all methods in the
generated parser which start with "jj_2" and "jj_3" respectively. So
take a look at your generated parser and figure out how much space is
occupied by these methods. When we performed this optimization on the
Java grammar, we were able to decrease the size of the generated
parser by more than 50%.
2. Why are "jj_2" and "jj_3" methods generated?
These methods are generated to handle all lookaheads except the
following:
a. A syntactic lookahead of 1 (the most common situation).
b. For any amount of syntactic lookahead - when the target expansion can
match the empty string.
c. When the syntactic lookahead is 0. Typically there is only semantic
lookahead this situation.
The "jj_2" and "jj_3" methods are also generated for lookaheads in the
(a) and (b) categories above when hoisting of semantic predicates is
required.
3. Summary of how we achieve the optimization:
Essentially, we go through the entire grammar looking for choice
points that are not one of (a), (b), or (c) above. We then replace
the lookaheads at these points with one of kind (c) above.
When we do this, JavaCC generates a parser where all the new semantic
lookaheads causes hoisting which affect other related choice points
and cause the "jj_2" and "jj_3" methods to be generated there.
So the final step is to locate these related choice points and insert
explicit lookahead specifications here to prevent hoisting from taking
place.
4. When is a grammar amenable to such optimization?
If the majority of the choice points in your grammar use lookaheads
that fall into the categories (a), (b), or (c) above then go for it.
Otherwise, you'll be spending too much work and essentially rewriting
the entire parser. It took a few hours to optimize the Java grammar.
But I would not recommend doing it for C++ (for C++, just wait for
JavaCC to be upgraded to do it automatically).
5. Step by step instructions:
Suppose your grammar file is called Grammar.jj. You are going to go
through two steps - in the first step, you create Grammar1.jj, and
in the second step you create the final grammar file - Grammar2.jj.
Lets also assume that the parsers generated from these grammar files
are Grammar.java, Grammar1.java, and Grammar2.java respectively.
STEP 1:
For every lookahead in your grammar that does not fit into categories
(a), (b), or (c) above, replace it with an equivalent semantic
lookahead. When you're done with all such replacements, the resulting
file is Grammar1.jj.
Example:
In the Java1.1.jj grammar, we have the following production for Name:
void Name() :
{}
{
<IDENTIFIER>
( LOOKAHEAD(2)
"." <IDENTIFIER>
)*
}
It uses a lookahead of 2 - obviously this means that the choice
selection succeeds only if the next two characters are a "." and
an <IDENTIFIER>. The following modification replaces the syntactic
lookahead with a semantic lookahead:
void Name() :
{}
{
<IDENTIFIER>
( LOOKAHEAD( { getToken(1).kind == DOT && getToken(2).kind == IDENTIFIER } )
"." <IDENTIFIER>
)*
}
STEP 2:
Run JavaCC on both Grammar.jj and Grammar1.jj to obtain Grammar.java
and Grammar1.java. Remember, the goal is to get rid of "jj_2" and
"jj_3" methods from the grammar. Changes are you will see a bunch of
"jj_2" and "jj_3" methods in Grammar1.java. We need to focus only
on the "jj_2" methods. When these methods are removed, the "jj_3"
methods automatically go away too. In the Java grammar, we found 23
"jj_2" methods to get rid of in this step.
Search for calls to "jj_2" methods in Grammar1.java. Locate the
corresponding portions in Grammar1.jj (the choice points in the grammar
corresponding to these calls). These calls are there to perform
hoisting (which you don't want - you don't need to understand hoisting
to perform this step).
Now look for corresponding locations in Grammar.java. You will see
references to a token mask array instead of a "jj_2" call at these
locations. Take a look at the value of the token mask array. Use
these values to create a new lookahead for this choice point to get
rid of the "jj_2" call.
The resulting grammar (Grammar2.jj) is the grammar you want.
Obviously, we need an example:
The Java parser generated after performing step 1 has the following
method with a call to a "jj_2" method ("jj_2_3"):
static final public void UnmodifiedInterfaceDeclaration() throws ParseException {
jj_consume_token(INTERFACE);
jj_consume_token(IDENTIFIER);
if (jj_mask_29[getToken(1).kind]) {
jj_consume_token(EXTENDS);
NameList();
} else {
jj_expLA1[29] = jj_gen;
;
}
jj_consume_token(LBRACE);
label_8:
while (true) {
if (jj_2_3(1)) {
;
} else {
break label_8;
}
InterfaceMemberDeclaration();
}
jj_consume_token(RBRACE);
}
The corresponding location in the Java grammar is:
void UnmodifiedInterfaceDeclaration() :
{}
{
"interface" <IDENTIFIER> [ "extends" NameList() ]
"{"
( // <-- the choice point
InterfaceMemberDeclaration()
)*
"}"
}
The choice point corresponding to the call to "jj_2_3" above is
marked by the arrow.
Now lets take a look at the parser generated from the original Java
grammar. The above method is shown below:
static final public void UnmodifiedInterfaceDeclaration() throws ParseException {
jj_consume_token(INTERFACE);
jj_consume_token(IDENTIFIER);
if (jj_mask_40[getToken(1).kind]) {
jj_consume_token(EXTENDS);
NameList();
} else {
jj_expLA1[40] = jj_gen;
;
}
jj_consume_token(LBRACE);
label_9:
while (true) {
if (jj_mask_41[getToken(1).kind]) {
;
} else {
jj_expLA1[41] = jj_gen;
break label_9;
}
InterfaceMemberDeclaration();
}
jj_consume_token(RBRACE);
}
Instead of the call to "jj_2_3", there is a reference to the token
mask array "jj_mask_41". Now take a look at the initialization of
"jj_mask_41" that occurs a few lines further on:
static boolean[] jj_mask_41 = new boolean[120];
static {
jj_mask_41[ABSTRACT] =
jj_mask_41[BOOLEAN] =
jj_mask_41[BYTE] =
jj_mask_41[CHAR] =
jj_mask_41[CLASS] =
jj_mask_41[DOUBLE] =
jj_mask_41[FINAL] =
jj_mask_41[FLOAT] =
jj_mask_41[INT] =
jj_mask_41[INTERFACE] =
jj_mask_41[LONG] =
jj_mask_41[NATIVE] =
jj_mask_41[PRIVATE] =
jj_mask_41[PROTECTED] =
jj_mask_41[PUBLIC] =
jj_mask_41[SHORT] =
jj_mask_41[STATIC] =
jj_mask_41[SYNCHRONIZED] =
jj_mask_41[TRANSIENT] =
jj_mask_41[VOID] =
jj_mask_41[VOLATILE] =
jj_mask_41[IDENTIFIER] = true;
}
This defines the tokens that must be matched at the choice point
in order for that choice point to succeed. Use this to explicitly
insert a lookahead specification into the grammar to prevent the
hoisting of semantic predicates. Hence the resulting production
in the final grammar is:
void UnmodifiedInterfaceDeclaration() :
{}
{
"interface" <IDENTIFIER> [ "extends" NameList() ]
"{"
(
LOOKAHEAD(1, <ABSTRACT> | <BOOLEAN> | <BYTE> | <CHAR> | <CLASS> | <DOUBLE> |
<FINAL> | <FLOAT> | <INT> | <INTERFACE> | <LONG> | <NATIVE> |
<PRIVATE> | <PROTECTED> | <PUBLIC> | <SHORT> | <STATIC> |
<SYNCHRONIZED> | <TRANSIENT> | <VOID> | <VOLATILE> | <IDENTIFIER>)
InterfaceMemberDeclaration()
)*
"}"
}
|