1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517
|
<!--startcut ==========================================================-->
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<HTML>
<HEAD>
<title>Complier Construction Tools, Part III LG #41</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Mozilla/4.5 [en] (X11; I; Linux 2.0.36 i686) [Netscape]">
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#0000AF"
ALINK="#FF0000">
<!--endcut ============================================================-->
<H4>
"Linux Gazette...<I>making Linux just a little more fun!</I>"
</H4>
<P> <HR> <P>
<!--===================================================================-->
<center>
<H1><font color="maroon">Compiler Construction Tools, Part III</font></H1>
</center>
<P> <HR> <P>
<a NAME="top"></a>
<h1>
Creating A Calculator Using JFlex And CUP</h1>
by Christopher Lopes, student at Eastern Washington University
<br>April 26, 1999
<p>This is the third part of a series begun in the April 1999 issue of
Linux Gazette.
<br>[see: <a href="http://www.linuxgazette.com/issue39/sevenich.html">Compiler Construction
Tools, Part I</a> ]. <A HREF="../sevenich.html">Part II</A>, giving detailed installation instructions
for JFlex and CUP appears in this same issue.
<p>This particular example is a modified version of the calculator example
shown in the CUP manual. In particular, the companion JFlex specification
file is included. Further, that file and the associated CUP specification
file are commented extensively. The calculator example is the traditional
first example to display the use of tools in the lex/yacc family. We are
currently working on a project that would comprise a deeper example - an
initialization language for a fuzzy logic engine to be used for decision
making applications. If there is sufficient interest expressed in that
longer term project, we will prepare an article for this or another venue.
<p>
<hr WIDTH="100%">
<ul>
<li>
<a href="#jflex">Using JFlex</a></li>
<li>
<a href="#cup">Using CUP</a></li>
<li>
<a href="#main"> Main for our Calculator</a></li>
<li>
<a href="#main"> Compiling the Calculator</a></li>
<li>
<a href="#example"> Sample Input and Output</a></li>
</ul>
<a NAME="jflex"></a>
<br>
<hr WIDTH="100%">
<h3>
Using JFlex</h3>
The purpose of JFlex in this project is to build a lexical analyzer
for our calculator. This lexical analyzer, or scanner, will check
the input for our calculator and make sure all character groupings are
valid.
<p>The lexical specification file for JFlex is broken up into three sections.
Each of these sections are separated by <i>%%</i>.
<p>User Code Section
<br>%%
<br>Options and Declarations Section
<br>%%
<br>Lexical Rules Section
<h4>
User Code Section</h4>
Everything in this section will be copied into the generated lexer class
before the class declaration. In this section one typically finds
<i>package</i> and <i>import</i> statements. Our lexical specification
for this section imports two classes, <i>sym</i> and <i>java_cup.runtime.*</i>,
and looks like the following.
<p> import java_cup.runtime.*;
<br> import sym;
<p>In our example, the sym class is generated (along with the parser) by
CUP.
<h4>
Options and Declarations Section</h4>
This section contains options, lexical states, and macro declarations.
Setting options will include extra code that will be included inside the
generated scanner class. Options must begin a line and start with
a <i>%</i>. There are many options that can be included. To
obtain a list of options that can be included consult the manual that comes
with JFlex. The options used in our lexical specification are below.
<p> %class Lexer
<br> %line
<br> %column
<br> %cup
<p>The first option, class Lexer, tells JFlex to name the generated
class <i>Lexer</i> and to write the code to a file called <i>Lexer.java</i>.
The line option turns on line counting letting you access the current line
number of the input with the variable <i>yyline</i>. The column option
does a similar thing except it is for the current column number with the
variable <i>yycolumn</i>. The last option, cup, puts JFlex
into a mode that will make it compatible with a CUP generated parser, which
is what we are using.
<br><a NAME="decl"></a>
<br>You next can declare member variables and functions for use inside
the scanner. The code that can be added is Java code and is placed
between <i>%{</i> and <i>%}</i>. It will be copied into the generated
lexer class source. For our lexical specification two member functions
will be declared. These functions create <i>java_cup.runtime.Symbol</i>
objects. The first one just contains position information of the
current token. The second contains this information as well as the
value of the token. A link to this declaration is below.
<p> <a href="lcalc.htm#decl">Declarations</a>
<p>The last part of this section contains macro declarations.
Macros are used as abbreviations for regular expressions. A macro
declaration consists of a macro identifier followed by <i>=</i> and then
the regular expression that it represents. A link to the macro declarations
used in our lexical specification follows. A link is also supplied
below that contains a list of what can be used to create a regular expression
and what each item in that list means.
<p> <a href="lcalc.htm#macro">Macro
Declarations</a>
<p> List of what can be
used in <a href="reg_exp.htm">Creating Regular Expressions</a>
<h4>
Lexical Rules Section</h4>
The last section of the lexical specification contains the regular expressions
and actions that will be executed when the scanner matches the associated
regular expression. The scanner will activate the regular expression
that has the longest match. So if there existed two regular expressions
"to" and "too" the scanner would match "too" since it is the longest.
If two regular expressions are identical and have the same length then
the scanner will match the regular expression that is listed first in the
specification. If the scanner read in the string "to" and was looking
for a regular expression to match what it read in it could activate either
regular expression listed below. The second regular expression is
possible since it contains a character class which allows for the string
"to" to be matched. The scanner would pick the first regular expression
in the list below since it was listed first.
<p> "to"
<br> [a-z]*
<p>Actions can then be attached to each regular expression that the scanner
can activate when it matches that regular expression. The actions
for each regular expression are just Java code fragments that you can write.
Actions that you might want to use could be printing something out or returning
the token that the scanner just found to the parser. Example code
that prints the token found by the scanner and returns it to the parser
could be done as in the following.
<p> "+"
{ System.out.print(" + "); return symbol(sym.PLUS); }
<br> "-"
{ System.out.print(" - "); return symbol(sym.MINUS); }
<br> "*"
{ System.out.print(" * "); return symbol(sym.TIMES); }
<br> "/"
{ System.out.print(" / "); return symbol(sym.DIVIDE); }
<p>JFlex allows the programmer to refine the specification by defining
special <i>lexical states</i> used as start conditions. YYINITIAL is a
predefined lexical state and is the state in which the lexer initiates
scanning input. It's the only one we'll use. Consequently, all our regular
expressions will be recognized starting from that lexical state. However,
one can define other such states which will essentially constitute the
start of a new branch of the state machine. In the example below, lexical
state <STRING> is reached by a transition from YYINITIAL. Regular expressions
defined in that state section <STRING> will only be recognized in that
branch.
<br><a NAME="rule"></a>
<br><YYINITIAL> {
<br> \"
{ string.setLength(0); yybegin(STRING); }
<br> "="
{ return symbol(sym.EQ); }
<br> "=="
{ return symbol(sym.EQEQ); }
<br> "+"
{ return symbol(sym.PLUS); }
<br>}
<p><STRING> {
<br> \"
{ yybegin(YYINITIAL);
<br>
return symbol(sym.STRINGLITERAL,
<br>
string.toString()); }
<br> [^\n\r\"\]+
{ string.append( yytext() ); }
<br>}
<p>In the above code the scanner will begin in the state YYINITIAL.
When it matches the regular expression \", which just means it is found
a double quote, it will change the scanner to the STRING state. Now
the only regular expressions that can be matched are the regular expressions
listed for that state. So the scanner will stay in this branch until
it matches another double quote - whereupon it will return to the YYINITIAL
state again. Again, for our calculator we never employ such starting
conditions other than the original YYINITIAL state. A link to the
lexical rules we used are below.
<p> <a href="lcalc.htm#rule">Link
to Lexical Rules</a>
<br>
<p>
<hr WIDTH="100%">
<p>Link to the JFlex file <a href="lcalc.htm">lcalc.flex</a> . This
is the lexical specification used for our calculator. In it there
are lots of comments explaining what is happening. It can be copied
and both the CUP and Main files which are also supplied in this article
can be copied so you can run this example project. Instructions on
how to prepare each file and run the calculator are included. Jdk,
JFlex, and CUP are needed to do this and can be downloaded for free at
the web sites listed in this article.
<center>
<p><font size=-1>For more information on JFlex consult the JFlex manual
that is available when you download JFlex at the web site that is listed
in this article.</font></center>
<p> <a href="#top">Back to Top</a>
<br><a NAME="cup"></a>
<br>
<hr WIDTH="100%">
<h3>
Using CUP</h3>
The purpose of CUP in this project is to build a syntactic analyzer for
our calculator. This syntactic analyzer, or parser, will check the
input for our calculator and make sure it is syntactically correct.
<br>That is to say that the statements in the input are arranged in a valid
order according to our syntax specification.
<p>The specification syntax for a CUP file is broken up into four sections.
<ol>
<li>
Preliminary Declarations</li>
<li>
Declarations of Terminals and Non Terminals</li>
<li>
Precedence and Associativity of Terminals</li>
<li>
Grammar</li>
</ol>
<a NAME="prelim"></a>
<h4>
Preliminary Declarations</h4>
This section provides preliminary and miscellaneous declarations to specify
how the parser is to be generated and supply parts of the runtime code.
This section is optional and doesn't need to be included in a CUP specification
file. For our calculator we will have three items in this section.
The first item will be an import declaration. We will import the
class <i>java_cup.runtime.*</i>.
<p> import java_cup.runtime.*;
<p>The next item we will add is parser code. The parser code will
be placed directly into the generated parser class definition. It
begins with <i>parser code {:</i> and ends with <i>:} </i>with all coded
inserted in between. In the parser code we will change two methods.
We will change the report_error and report_fatal_error method.
We will modify these methods so the if a syntactic error or a fatal error
occurs in the input then the error message that will be printed out will
contain the line and column number in the input of where the error occurred.
This extra information in error messages could prove very helpful when
determining errors in the input.
<p> <a href="ycalc.htm#parser_code">Link
to Parse Code</a>
<p>The last item we will add in this section indicates how the parser should
ask for the next token from the scanner and has the form <i>scan with {:
... :}</i>. We will use this to tell the parser to call the scanner
we created with JFlex.
<p> scan with {: return lexer.yylex();
:};
<br>
<h4>
Declarations of Terminals and Non Terminals</h4>
This section contains the symbol list and contains declarations that are
responsible for naming and supplying a type for each terminal and non terminal.
This section is required in a CUP specification. Terminals are declare
with the syntax <i>terminal classname name1, name2, ...;</i>. Classname
is the type of the object, such as Integer. If no classname is given
then the terminal has no content for the lexer to pass up to the parser..
After the classname the name of the terminals are listed that you want
to declare of that type as in the following.
<p> terminal PLUS, MINUS, TIMES,
DIVIDE, SEMI;
<br> terminal Integer NUMBER;
<p>Note that only NUMBER has an accompanying <i>classname</i>. In our example,
it is the only terminal that carries content. For example, when the
lexer recognizes a PLUS, it passes the associated code to the parser; but
when it recognizes a NUMBER it not only passes the associated code for
NUMBER, but also its value within the type wrapper class, <i>Integer</i>.
<p>Non terminals are declared in the same manner. The only difference
is the beginning of the declaration reflects that it is a non terminal
instead of a terminal as in the following.
<p> non terminal expr_list, expr_part;
<br> non terminal Integer expr;
<br><a NAME="prec"></a>
<h4>
Precedence and Associativity of Terminals</h4>
This section specifies the precedence and associativity of terminals, it
is an optional section that doesn't have to be included. This section
can be used when parsing ambiguous terminals. Instead of using this
section you could structure the grammar so that it is not ambiguous.
For instance TIMES should have a higher precedence then PLUS. When
the parser runs into a statement such as 5+4*3 it doesn't know whether
the expression needs to be calculated as 5+(4*3) or (5+4)*3. To eliminate
this ambiguity using this section you would declare the precedence as below.
The highest precedence starts at the bottom of the list and the precedence
gets less going up. The word left means that the associativity of
the terminals at that precedence level goes from left to right.
<p> precedence left PLUS, MINUS;
<br> precedence left TIMES, DIVIDE;
<p>To structure a grammar to eliminate the ambiguity you would create a
structure like the one below. This structure eliminates the ambiguity
because TIMES is further down in the grammar than PLUS. This will
result in TIMES being applied before PLUS as you go back up the grammar.
<p> Example of <a href="ycalc.htm#grammar">Grammar
Structure</a>
<br>
<h4>
Grammar</h4>
The last section in the specification syntax contains the grammar for the
parser. Each production in the grammar has a non terminal left hand
side followed by <i>::=</i>, which is then followed by zero or more actions,
terminals, or non terminals, and then followed by a semicolon. Each
symbol on the right hand side can be labeled with a name, which can be
used to carry content (e.g. a value) up the parse tree. A label name
is given by a colon after the symbol name, then the name of the label as
in the following where e1 and e2 are labels for expr. The left hand
side automatically is assigned the label RESULT. An example using
the label RESULT appears latter in this section.
<p> expr ::= expr:e1 PLUS expr:e2
<p>The label names must be unique in the production. If there exists
several productions for the same non terminal they can be declared together
and separated by |. The semicolon then needs to be placed at the
end of the last production as in the following.
<p> expr ::= expr PLUS expr
<br>
| expr MINUS expr
<br>
;
<p>Actions can also be inserted into the production. The action is
just Java code and will be executed when the production has been recognized.
Action is placed between the delimiters <i>{:</i> and <i>:}</i>.
An example of part of a grammar with these options is below. A link
to a file with the specification syntax for CUP named <i>ycalc.cup</i>
also follows and the grammar in it can be studied.
<p> expr
::= factor:f PLUS expr:e
<br>
{: RESULT = new Integer(f.intValue() + e.intValue()); :}
<br>
|
<br>
factor:f MINUS expr:e
<br>
{: RESULT = new Integer(f.intValue() - e.intValue()); :}
<br>
|
<br>
factor:f
<br>
{: RESULT = new Integer(f.intValue()); :}
<br>
;
<br><a NAME="cupcode"></a>
<br>
<hr WIDTH="100%">
<p>Link to the CUP file <a href="ycalc.htm">ycalc.cup.</a>
This is the specification syntax used for our calculator. In it there
are lots of comments explaining what is happening. It can be copied
and both the JFlex and Main files which are also supplied in this article
can be copied so you can run this example project. Instructions on
how to prepare each file and run the calculator are included. Jdk,
JFlex, and CUP are needed to do this and can be downloaded for free at
the web sites listed in this article.
<center>
<p> <font size=-1>For more information on CUP consult the CUP manual
that is available at the web site listed in this article.</font></center>
<p> <a href="#top">Back to Top</a>
<br><a NAME="main"></a>
<br>
<hr WIDTH="100%">
<h3>
Main for our Calculator</h3>
There is more than one way you can write the main for our project.
One way expects the user to enter input as the program runs. The
other way requires that you give it the name of an input file when you
start up the program. The main described here uses the second way
mentioned to retrieve input. The first thing we do is import
three classes that we will use. The first class is for our parser,
the next is the java_cup.runtime.Symbol class, and the last is the java.io.*;
class. We then declare are class <i>Main</i>. In it we will
call the parser to begin the syntactic analysis of the input file.
The parser will then call the scanner, that will lexically analyze the
input, when the parser needs the next token in the input file. The
class <i>Main</i> contains two items. It first sets the variable
<i>do_debug_parse</i>
to false. We then define a method called
<i>main</i>.
We pass into <i>main</i> an array of strings which contains the parameters
passed on the command line when the program was started. So in our
case the first element of the string will contain the name of the text
file we passed in when we started the program. The method then goes
into a <i>try</i> block which is what actually calls the parser.
The <i>try</i> block means that whatever is in the <i>try</i> block will
attempted. If something fails, the program will exit that block.
The first line in the <i>try</i> block creates a new parser object.
The parser object invokes a new Lexer object. The new Lexer object
will use the string passed into <i>main</i> for its input when it is created.
The second line will then start the parser. The code for the above
follows.
<p> try {
<br> parser p = new parser(new
Lexer(new FileReader(argv[0])));
<br> Object result = p.parse().value;
<br> }
<p>Following the <i>try</i> block is a c<i>atch</i> block. The purpose
of the <i>catch</i> block is to clean up an errors that happened in the
<i>try</i>
block. The <i>catch</i> block will take the exception, the reason
why we were kicked out of the <i>try</i> block, and do whatever is needed
to clean things up before the program exits. We don't do anything
in the contents of our <i>catch</i> block. After the <i>catch</i>
block we have the method <i>finally</i>. This method closes everything
out. We don't do anything in this method either. The code for
the <i>catch</i> block and method <i>finally</i> are below.
<p> catch (Exception e) {
<br>
/* do cleanup here -- possibly rethrow e */
<br>
} finally {
<br>
/* do close out here */
<br>
}
<br>
}
<p>That completes the contents of the method <i>main</i> and the class
<i>Main</i>.
We now have created a simple calculator using JFlex as our lexical analyzer
and CUP as our syntactical analyzer.
<p>
<hr WIDTH="100%">
<p>Link to the Java file <a href="main.htm">Main.java</a> <a href="ycalc.htm">.</a>
This is the main used for our calculator. In it there are comments
explaining what is happening. It can be copied and both the JFlex
and CUP files which are also supplied in this article can be copied so
you can run this example project. Instructions on how to prepare
each file and run the calculator are included. Jdk, JFlex, and CUP
are needed to do this and can be downloaded for free at the web sites listed
in this article.
<p> <a href="#top">Back to Top</a>
<br><a NAME="compiling"></a>
<br>
<hr WIDTH="100%">
<h3>
Compiling the Calculator</h3>
To setup the files to run the calculator you first need to use JFlex on
the lexical specification file lcalc.flex. This will produce the
file Lexer.java. The next step is to setup the CUP file ycalc.cup.
Afterwards you compile the Lexer.java file that was created by JFlex.
You finish the process by finally compiling the Main.java file. To
do the above you would enter the following at the command line.
<p> > jflex lcalc.flex
<br> > java java_cup.Main <
ycalc.cup
<br> > javac Lexer.java
<br> > javac Main.java
<p>Then to run the calculator you would enter the following at the command
line. The file test.txt is the input file for the calculator that
will be scanned and parsed.
<p> > java Main test.txt
<p> <a href="#top">Back to Top</a>
<br><a NAME="example"></a>
<hr WIDTH="100%">
<h3>
Sample Input and Output</h3>
A sample input file could look like the following.
<p> 2+4;
<br> 5*(6-3)+1;
<br> 6/3*5+20;
<br> 4*76/31;
<p>and so on. The output for the following input should then appear
as follows.
<p> 2 + 4 = 6
<br> 5 * ( 6 - 3 ) + 1 = 16
<br> 6 / 3 * 5 + 20 = 30
<br> 4 * 76 / 31 = 9
<p> <a href="#top">Back to Top</a>
<P> <HR> <P>
<center><H4>Previous ``Compiler Construction Tools'' Columns</H4></center>
<p>
<A HREF="../../issue39/sevenich.html">Compiler Construction Tools Part I, April 1998</A><BR>
<A HREF="../sevenich.html">Compiler Construction Tools Part II, May 1998</A><BR>
<!--===================================================================-->
<P> <hr> <P>
<center><H5>Copyright © 1999, Christopher Lopes <BR>
Published in Issue 41 of <i>Linux Gazette</i>, May 1999</H5></center>
<!--===================================================================-->
<P> <hr> <P>
<A HREF="../lg_toc41.html"><IMG ALIGN=BOTTOM SRC="../../gx/indexnew.gif"
ALT="[ TABLE OF CONTENTS ]"></A>
<A HREF="../../lg_frontpage.html"><IMG ALIGN=BOTTOM SRC="../../gx/homenew.gif"
ALT="[ FRONT PAGE ]"></A>
<A HREF="../sevenich.html"><IMG SRC="../../gx/back2.gif"
ALT=" Back "></A>
<A HREF="../searls.html"><IMG SRC="../../gx/fwd.gif" ALT=" Next "></A>
<P> <hr> <P>
<!--startcut ==========================================================-->
</BODY>
</HTML>
<!--endcut ============================================================-->
|