<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.29
from ../tnf/syntax.tnf on 12 February 2003 -->
<TITLE>Syntactic Analysis - Context-Free Grammars and Parsing</TITLE>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#FF0000" BACKGROUND="gifs/bg.gif">
<TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0 VALIGN=BOTTOM>
<TR VALIGN=BOTTOM>
<TD WIDTH="160" VALIGN=BOTTOM><IMG SRC="gifs/elilogo.gif" BORDER=0> </TD>
<TD WIDTH="25" VALIGN=BOTTOM><img src="gifs/empty.gif" WIDTH=25 HEIGHT=25></TD>
<TD ALIGN=LEFT WIDTH="600" VALIGN=BOTTOM><IMG SRC="gifs/title.gif"></TD>
</TR>
</TABLE>
<HR size=1 noshade width=785 align=left>
<TABLE BORDER=0 CELLSPACING=2 CELLPADDING=0>
<TR>
<TD VALIGN=TOP WIDTH="160">
<h4>General Information</h4>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="index.html">Eli: Translator Construction Made Easy</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="gindex_toc.html">Global Index</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="faq_toc.html" >Frequently Asked Questions</a> </td></tr>
</table>
<h4>Tutorials</h4>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="EliRefCard_toc.html">Quick Reference Card</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="novice_toc.html">Guide For new Eli Users</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="news_toc.html">Release Notes of Eli</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="nametutorial_toc.html">Tutorial on Name Analysis</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="typetutorial_toc.html">Tutorial on Type Analysis</a></td></tr>
</table>
<h4>Reference Manuals</h4>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="ui_toc.html">User Interface</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="pp_toc.html">Eli products and parameters</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="lidoref_toc.html">LIDO Reference Manual</a></td></tr>
</table>
<h4>Libraries</h4>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="lib_toc.html">Eli library routines</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="modlib_toc.html">Specification Module Library</a></td></tr>
</table>
<h4>Translation Tasks</h4>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="lex_toc.html">Lexical analysis specification</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="syntax_toc.html">Syntactic Analysis Manual</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="comptrees_toc.html">Computation in Trees</a></td></tr>
</table>
<h4>Tools</h4>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="lcl_toc.html">LIGA Control Language</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="show_toc.html">Debugging Information for LIDO</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="gorto_toc.html">Graphical ORder TOol</a> </td></tr>
</table>
<p>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="fw_toc.html">FunnelWeb User's Manual</a> </td></tr>
</table>
<p>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="ptg_toc.html">Pattern-based Text Generator</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="deftbl_toc.html">Property Definition Language</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="oil_toc.html">Operator Identification Language</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="tp_toc.html">Tree Grammar Specification Language</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="clp_toc.html">Command Line Processing</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="cola_toc.html">COLA Options Reference Manual</a> </td></tr>
</table>
<p>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="idem_toc.html">Generating Unparsing Code</a> </td></tr>
</table>
<p>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="mon_toc.html">Monitoring a Processor's Execution</a> </td></tr>
</table>
<h4>Administration</h4>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="sysadmin_toc.html">System Administration Guide</a> </td></tr>
</table>
<HR WIDTH="100%">
<CENTER> <A HREF="mailto:elibugs@cs.colorado.edu"><IMG SRC="gifs/button_mail.gif" NOSAVE BORDER=0 HEIGHT=32 WIDTH=32></A><A HREF="mailto:elibugs@cs.colorado.edu">Questions, Comments, ....</A></CENTER>
</TD>
<TD VALIGN=TOP WIDTH="25"><img src="gifs/empty.gif" WIDTH=25 HEIGHT=25></TD>
<TD VALIGN=TOP WIDTH="600">
<H1>Syntactic Analysis</H1>
<P>
<IMG SRC="gifs/empty.gif" WIDTH=25 HEIGHT=25 ALT=""><A HREF="syntax_2.html"><IMG SRC="gifs/next.gif" ALT="Next Chapter" BORDER="0"></A>
<IMG SRC="gifs/empty.gif" WIDTH=25 HEIGHT=25 ALT=""><A HREF="syntax_toc.html"><IMG SRC="gifs/up.gif" ALT="Table of Contents" BORDER="0"></A>
<IMG SRC="gifs/empty.gif" WIDTH=25 HEIGHT=25 ALT="">
<HR size=1 noshade width=600 align=left>
<H1><A NAME="SEC1" HREF="syntax_toc.html#SEC1">Context-Free Grammars and Parsing</A></H1>
<P>
A <DFN>context-free grammar</DFN>
<A NAME="IDX10"></A>
<A NAME="IDX9"></A>
is a formal system that describes a language by
specifying how any legal text can be derived from a distinguished symbol
called the <DFN>axiom</DFN>,
<A NAME="IDX11"></A>
or <DFN>sentence symbol</DFN>.
<A NAME="IDX12"></A>
It consists of a set of <DFN>productions</DFN>,
<A NAME="IDX13"></A>
each of which states that a given symbol can be replaced by a given sequence
<A NAME="IDX14"></A>
of symbols.
To derive a legal text,
<A NAME="IDX15"></A>
the grammar is used as data for the following algorithm:
<P>
<OL>
<LI>
Let <CODE>text</CODE> be a single occurrence of the axiom.
<P>
<LI>
If no production states that a symbol currently in <CODE>text</CODE> can be replaced
by some sequence of symbols, then stop.
<P>
<LI>
Rewrite <CODE>text</CODE> by replacing one of its symbols with a sequence
according to some production.
<P>
<LI>
Go to step (2).
</OL>
<P>
When this algorithm terminates, <CODE>text</CODE> is a legal text in the language.
The <DFN>phrase structure</DFN>
<A NAME="IDX16"></A>
of that text is the hierarchy of sequences used in its derivation.
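<P>
The four steps above can be sketched in Python. This is only an illustration
under stated assumptions, not part of Eli: the names <CODE>derive</CODE>,
<CODE>PRODUCTIONS</CODE> and <CODE>first_alternative</CODE> are hypothetical,
the toy grammar is invented for the example, and the sketch always rewrites the
leftmost replaceable symbol, although the algorithm permits any choice.
<P>
<PRE>
```python
def derive(productions, axiom, choose):
    """Run the four-step derivation algorithm and return the final text."""
    text = [axiom]                                    # step 1
    while True:
        # Positions in text whose symbol can still be replaced.
        replaceable = [i for i, sym in enumerate(text) if sym in productions]
        if not replaceable:                           # step 2
            return text
        position = replaceable[0]                     # step 3 (assumption: leftmost)
        symbol = text[position]
        sequence = choose(symbol, productions[symbol])
        text[position:position + 1] = sequence
        # step 4: the loop returns to step 2


# Hypothetical toy grammar: each nonterminal maps to its replacement sequences.
PRODUCTIONS = {
    "Assignment": [["Variable", ":=", "Expression"]],
    "Variable": [["x"]],
    "Expression": [["y"]],
}


def first_alternative(symbol, alternatives):
    # Always pick the first production for the symbol.
    return alternatives[0]


print(derive(PRODUCTIONS, "Assignment", first_alternative))
```
</PRE>
<P>
Passing a different <CODE>choose</CODE> function (for example, one that picks
a production at random) yields other legal texts of the same language.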
<P>
Given a context-free grammar that satisfies certain conditions,
Eli can generate a <DFN>parsing routine</DFN>
<A NAME="IDX17"></A>
to determine the derivation (and hence the phrase structure) of any legal text.
This routine will also automatically detect and report any errors
<A NAME="IDX19"></A>
<A NAME="IDX20"></A>
<A NAME="IDX21"></A>
<A NAME="IDX18"></A>
in the text, and repair
<A NAME="IDX22"></A>
them to produce a correct phrase structure
(which may not be that intended by the person who wrote the erroneous text).
<P>
<H2><A NAME="SEC2" HREF="syntax_toc.html#SEC2">How to describe a context-free grammar</A></H2>
<P>
Each production of a context-free grammar consists of a symbol to be replaced
and the sequence that replaces it.
This can be represented in a type-<TT>`con'</TT> file
<A NAME="IDX24"></A>
<A NAME="IDX25"></A>
<A NAME="IDX23"></A>
by giving the symbol to be replaced, followed by a colon,
followed by the sequence that replaces it, followed by a period:
<P>
<PRE>
Assignment: Variable ':=' Expression.
StatementList: .
Statement:
'if' Expression 'then' Statement
'else' Statement.
</PRE>
<P>
The first production asserts that the symbol <CODE>Assignment</CODE> can be replaced
by the sequence consisting of the three symbols <CODE>Variable</CODE>, <CODE>':='</CODE>,
and <CODE>Expression</CODE>.
Any occurrence of the symbol <CODE>StatementList</CODE> can be replaced by an empty
sequence according to the second production.
In the third production, you see that new lines can be used as separators
in the description of a production. This notation is commonly
referred to as <DFN>Backus Naur Form</DFN>, or just <DFN>BNF</DFN>.
<A NAME="IDX27"></A>
<A NAME="IDX26"></A>
<P>
Symbols that are to be replaced are called <DFN>nonterminals</DFN>,
<A NAME="IDX28"></A>
and are always represented by <DFN>identifiers</DFN>.
<A NAME="IDX29"></A>
(An identifier is a sequence of letters and digits, the first of which is a
letter.)
Every nonterminal must appear before a colon in at least one production.
The axiom is a nonterminal that appears before the colon in exactly one
production, and does not appear between the colon and the period in any
production.
There must be exactly one nonterminal satisfying the conditions for the axiom.
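<P>
The conditions for the axiom can be checked mechanically. The following Python
sketch is an illustration only and does not show how Eli performs the check;
<CODE>find_axiom</CODE> and the sample <CODE>GRAMMAR</CODE> are hypothetical
names, and productions are assumed to be given as pairs of a left-hand symbol
and a list of right-hand symbols.
<P>
<PRE>
```python
def find_axiom(productions):
    """productions: list of (left_side, right_side_symbols) pairs."""
    left_counts = {}
    used_on_right = set()
    for left, right in productions:
        left_counts[left] = left_counts.get(left, 0) + 1
        used_on_right.update(right)
    # The axiom appears before the colon in exactly one production
    # and never between the colon and the period.
    candidates = [sym for sym, count in left_counts.items()
                  if count == 1 and sym not in used_on_right]
    if len(candidates) != 1:
        raise ValueError(
            "expected exactly one axiom candidate, found %d" % len(candidates))
    return candidates[0]


# Hypothetical grammar used only for this example.
GRAMMAR = [
    ("Program", ["StatementList"]),
    ("StatementList", ["StatementList", "Statement"]),
    ("StatementList", []),
    ("Statement", ["Variable", "':='", "Expression"]),
    ("Variable", ["Identifier"]),
    ("Expression", ["Identifier"]),
]

print(find_axiom(GRAMMAR))
```
</PRE>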
<P>
Symbols that cannot be replaced are called <DFN>terminals</DFN>,
<A NAME="IDX30"></A>
and may be represented by either identifiers or <DFN>literals</DFN>.
<A NAME="IDX31"></A>
(A literal is a sequence of characters bounded by apostrophes (<KBD>'</KBD>).
An apostrophe appearing within a literal is represented by two successive
apostrophes.)
No terminal may appear before a colon in any production.
Terminals represent character strings that are recognized by the lexical
analyzer (see <A HREF="lex_toc.html">Lexical Analysis</A>).
<A NAME="IDX32"></A>
<P>
<H3><A NAME="SEC3" HREF="syntax_toc.html#SEC3">Using extended BNF to describe more complex rules</A></H3>
<P>
Extended BNF allows the use of certain operators on the right hand side
of a production. These operators are designed to be short-hands to simplify
the grammar description. Rules with extended BNF operators can be
translated into rules which use only the strict BNF constructs described
so far. While the use of extended BNF constructs is supported for the
concrete syntax description in Eli, only strict BNF constructs are allowed
in the abstract syntax. When it comes time to deduce the correspondence
between the concrete and abstract syntax, Maptool operates on the abstract
syntax and a version of the concrete syntax in which all rules containing
extended BNF constructs have been translated into equivalent strict
BNF rules.
<P>
The remainder of this section describes how each of the extended
BNF constructs is translated to its strict BNF equivalent. Note that
most of the EBNF constructs require the introduction of generated symbols
for their strict BNF translation. Users are strongly discouraged from using
these constructs in contexts where attribution is required,
because changes in the grammar will change the names of the
generated symbols used.
<P>
The most appropriate use of EBNF constructs that introduce generated
symbols is when matching the LIDO
<CODE>LISTOF</CODE> construct, since the <CODE>LISTOF</CODE> construct makes no
assumptions about the phrase structure of the list.
For a description of the <CODE>LISTOF</CODE> construct, see
<A HREF="lidoref_3.html#SEC4">Productions of LIDO - Reference Manual</A>.
<P>
When a grammar contains many productions specifying replacement of the same
nonterminal, a slash, denoting <DFN>alternation</DFN>,
<A NAME="IDX33"></A>
can be used to avoid rewriting the symbol being replaced:
<P>
<PRE>
Statement:
Variable ':=' Expression /
'if' Expression 'then' Statement 'else' Statement /
'while' Expression 'do' Statement .
</PRE>
<P>
This alternation specifies three productions.
The nonterminal to be replaced is <CODE>Statement</CODE> in each case.
Possible replacement sequences are separated by slashes (<KBD>/</KBD>).
The strict BNF translation for the above example is:
<P>
<PRE>
Statement: Variable ':=' Expression .
Statement: 'if' Expression 'then' Statement 'else' Statement .
Statement: 'while' Expression 'do' Statement .
</PRE>
<P>
Alternation does not introduce any generated symbols and has a very
straightforward translation. As a result, it is the most heavily used
of the EBNF constructs.
<P>
Square brackets denote that the symbols
they enclose are optional. In the following
example, <CODE>Constants</CODE> and <CODE>Variables</CODE> are optional,
but <CODE>Body</CODE> is not:
<P>
<PRE>
Program: [Constants] [Variables] Body .
</PRE>
<P>
The strict BNF translation of this construct generates
a rule for each possible combination of present and absent optional
elements on the right hand side.
In the case of the above example, the following four rules
would result:
<P>
<PRE>
Program: Body .
Program: Variables Body .
Program: Constants Body .
Program: Constants Variables Body .
</PRE>
<P>
While the translation doesn't introduce any generated symbols,
indiscriminate use of this construct may lead to less readable specifications.
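<P>
The expansion of optional elements can be sketched in Python as follows.
This is an illustration under stated assumptions rather than Eli's
implementation: <CODE>expand_optionals</CODE> is a hypothetical name, and a
right hand side is assumed to be a list of pairs giving each symbol and whether
it is optional. Each optional symbol doubles the number of rules, so a right
hand side with <I>n</I> optional symbols yields 2<SUP>n</SUP> strict BNF rules.
<P>
<PRE>
```python
from itertools import product


def expand_optionals(left, right):
    """right: list of (symbol, is_optional) pairs; returns strict BNF rules."""
    # An optional symbol contributes two choices (absent or present);
    # a required symbol contributes exactly one.
    choices = [([], [symbol]) if optional else ([symbol],)
               for symbol, optional in right]
    rules = []
    for combination in product(*choices):
        sequence = [sym for part in combination for sym in part]
        rules.append((left, sequence))
    return rules


# Program: [Constants] [Variables] Body .
RULES = expand_optionals(
    "Program", [("Constants", True), ("Variables", True), ("Body", False)])
for left, sequence in RULES:
    print(left + ": " + " ".join(sequence) + " .")
```
</PRE>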
<P>
An asterisk (or star) is used to denote zero or more occurrences
of the phrase to which it is applied. In the following example,
<CODE>Program</CODE> consists of zero or more occurrences of <CODE>Variable</CODE>
followed by <CODE>Body</CODE>:
<P>
<PRE>
Program: Variable* Body .
</PRE>
<P>
The strict BNF translation of this construct requires the introduction
of a generated symbol. Generated symbols begin with the letter <CODE>G</CODE>
and are followed by a unique number. Generated symbols are chosen to not
conflict with existing symbols in the concrete syntax. No check is
performed to ensure that the generated symbols do not conflict with
symbols in the abstract syntax, so users should avoid using symbols
of this form in their abstract syntax. The translation
for the above example is as follows:
<P>
<PRE>
Program: G1 Body .
G1: G1 Variable .
G1: .
</PRE>
<P>
A plus is used to denote one or more occurrences
of the phrase to which it is applied. In the following example,
<CODE>Program</CODE> consists of one or more occurrences of <CODE>Variable</CODE>
followed by <CODE>Body</CODE>:
<P>
<PRE>
Program: Variable+ Body .
</PRE>
<P>
The strict BNF translation of this construct is similar to the translation
of the asterisk (see <A HREF="syntax_1.html#SEC3">Using extended BNF to describe more complex rules</A>). The translation
for the above example is as follows:
<P>
<PRE>
Program: G1 Body .
G1: G1 Variable .
G1: Variable .
</PRE>
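<P>
The asterisk and plus translations differ only in the base rule of the
generated symbol, which the following Python sketch makes explicit. It is an
illustration only: <CODE>translate_repetition</CODE> and <CODE>show</CODE> are
hypothetical names, the repeated phrase is assumed to be a single symbol at the
start of the right hand side, and the generated name is fixed to
<CODE>G1</CODE>, whereas Eli chooses a unique number that avoids conflicts.
<P>
<PRE>
```python
def translate_repetition(left, symbol, operator, rest, generated="G1"):
    """Strict BNF rules for 'left: symbol* rest .' or 'left: symbol+ rest .'."""
    if operator not in ("*", "+"):
        raise ValueError("operator must be '*' or '+'")
    # The base rule is empty for '*' (zero occurrences are allowed)
    # and a single occurrence for '+'.
    base = [] if operator == "*" else [symbol]
    return [
        (left, [generated] + rest),
        (generated, [generated, symbol]),   # left-recursive: one more occurrence
        (generated, base),
    ]


def show(rules):
    # Render rules in the notation used in this manual.
    return [lhs + ": " + " ".join(rhs + ["."]) for lhs, rhs in rules]


for line in show(translate_repetition("Program", "Variable", "*", ["Body"])):
    print(line)
for line in show(translate_repetition("Program", "Variable", "+", ["Body"])):
    print(line)
```
</PRE>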
<P>
A double slash is used to denote one or more occurrences of a phrase
separated by a symbol. In the following example, <CODE>Input</CODE> is a
sequence of one or more <CODE>Declaration</CODE> phrases separated by commas:
<P>
<PRE>
Input: Declaration // ',' .
</PRE>
<P>
The strict BNF translation for the above example is as follows:
<P>
<PRE>
Input: G1 .
G1: G2 .
G1: G1 ',' G2 .
G2: Declaration .
</PRE>
<P>
Note that all of the EBNF constructs, except the single slash (for alternation),
have higher precedence than the separator construct.
<P>
Parentheses are used to group EBNF constructs. This is used primarily
to apply other EBNF operators to more than a single symbol. For example:
<P>
<PRE>
Program: (Definition Use)+ .
</PRE>
<P>
In this example, we want to apply the plus operator to the concatenation of
a <CODE>Definition</CODE> and a <CODE>Use</CODE>. The result denotes one or more
occurrences of a <CODE>Definition</CODE> followed by a <CODE>Use</CODE>. The strict
BNF translation for the above is:
<P>
<PRE>
Program: G2 .
G1: Definition Use .
G2: G1 .
G2: G2 G1 .
</PRE>
<P>
This is identical to the translation for the Plus operator operating on a
single symbol, except that another generated symbol is created to represent
the parenthetical phrase.
<P>
Note that a common error is to introduce parentheses where they are not
needed. This will result in the introduction of unexpected generated
symbols.
<P>
<H2><A NAME="SEC4" HREF="syntax_toc.html#SEC4">Using structure to convey meaning</A></H2>
<P>
A production is a construct with two components: the symbol to be replaced
and the sequence that replaces it.
We defined the meaning of the production in terms of those components,
saying that whenever the symbol was found in <CODE>text</CODE>, it could be
replaced by the sequence.
This is the general approach that we use in defining the meaning of constructs
<A NAME="IDX34"></A>
in any language.
For example, we say that an assignment is a statement with two components,
a variable and an expression.
The meaning of the assignment is to replace the value of the variable with
the value resulting from evaluating the expression.
<P>
The context-free grammar for a language specifies a "component" relationship.
Each production says that the components of the phrase represented by the
symbol to be replaced are the elements of the sequence that replaces it.
To be useful, the context-free grammar for a language should embody exactly the
relationship that we use in defining the meanings of the constructs of that
language.
<P>
<H3><A NAME="SEC5" HREF="syntax_toc.html#SEC5">Operator precedence</A></H3>
<P>
Consider the following expressions:
<P>
<PRE>
A + B * C
(A + B) * C
</PRE>
<P>
In the first expression, the operands of the addition are the variable
<CODE>A</CODE> and the product of the variables <CODE>B</CODE> and <CODE>C</CODE>.
The reason is that in normal mathematical notation, multiplication takes
precedence over addition.
<A NAME="IDX36"></A>
<A NAME="IDX37"></A>
<A NAME="IDX35"></A>
Parentheses have been used in the second expression to indicate that the
operands of the multiplication are the sum of variables <CODE>A</CODE> and
<CODE>B</CODE>, and the variable <CODE>C</CODE>.
<P>
The general method for embodying this concept of operator precedence in a
context-free grammar for expressions is to associate a distinct nonterminal
with each precedence level, and one with operands that do not contain
"visible" operators.
For our expressions, this requires three nonterminals:
<P>
<DL COMPACT>
<DT><CODE>Sum</CODE>
<DD>An expression whose operator is <CODE>+</CODE>
<P>
<DT><CODE>Term</CODE>
<DD>An expression whose operator is <CODE>*</CODE>
<P>
<DT><CODE>Primary</CODE>
<DD>An expression not containing "visible" operators
</DL>
<P>
The productions that embody the concept of operator precedence would then
be:
<P>
<PRE>
Sum: Sum '+' Term / Term.
Term: Term '*' Primary / Primary.
Primary: '(' Sum ')' / Identifier.
</PRE>
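<P>
To see the phrase structure these productions impose, the following Python
sketch parses the two expressions shown earlier and returns nested tuples. It is
a hand-written recursive-descent parser, a different technique from the
parsing routine Eli generates, and is given only as an illustration; all
function names are hypothetical. Each function corresponds to one nonterminal,
with the left-recursive alternatives handled by a loop.
<P>
<PRE>
```python
import re


def tokenize(source):
    # Identifiers and the single-character symbols + * ( ).
    return re.findall(r"[A-Za-z][A-Za-z0-9]*|[+*()]", source)


def parse_sum(tokens):
    # Sum: Sum '+' Term / Term.
    node = parse_term(tokens)
    while tokens and tokens[0] == "+":
        tokens.pop(0)
        node = ("+", node, parse_term(tokens))
    return node


def parse_term(tokens):
    # Term: Term '*' Primary / Primary.
    node = parse_primary(tokens)
    while tokens and tokens[0] == "*":
        tokens.pop(0)
        node = ("*", node, parse_primary(tokens))
    return node


def parse_primary(tokens):
    # Primary: '(' Sum ')' / Identifier.
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        node = parse_sum(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return node
    if token in ("+", "*", ")"):
        raise SyntaxError("unexpected " + token)
    return token


def parse(source):
    tokens = tokenize(source)
    tree = parse_sum(tokens)
    if tokens:
        raise SyntaxError("unexpected " + tokens[0])
    return tree


print(parse("A + B * C"))    # ('+', 'A', ('*', 'B', 'C'))
print(parse("(A + B) * C"))  # ('*', ('+', 'A', 'B'), 'C')
```
</PRE>
<P>
In the first result the product is an operand of the addition; in the second,
the parenthesized sum is an operand of the multiplication.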
<P>
<H3><A NAME="SEC6" HREF="syntax_toc.html#SEC6">Operator associativity</A></H3>
<P>
Consider the following expressions:
<P>
<PRE>
A - B - C
A ** B ** C
A &lt; B &lt; C
</PRE>
<P>
Which operator has variable <CODE>B</CODE> as an operand in each case?
<P>
This question can be answered by stating an <DFN>association</DFN>
<A NAME="IDX39"></A>
<A NAME="IDX40"></A>
<A NAME="IDX38"></A>
for each operator:
If <CODE>-</CODE> is "left-associative",
<A NAME="IDX41"></A>
then the first expression is interpreted as though it had been written
<CODE>(A-B)-C</CODE>.
Saying that <CODE>**</CODE> is "right-associative"
<A NAME="IDX42"></A>
means that the second expression is interpreted as though it had been written
<CODE>A**(B**C)</CODE>.
The language designer may wish to disallow the third expression by saying
that <CODE>&lt;</CODE> is "non-associative".
<A NAME="IDX43"></A>
<P>
Association rules are embodied in a context-free grammar by selecting
appropriate nonterminals to describe the operands of an operator.
For each operator, two nonterminals must be known:
the nonterminal describing expressions that may contain that operator, and
the nonterminal describing expressions that do not contain that operator
but may be operands of that operator.
Usually these nonterminals have been established to describe operator
precedence.
Here is a typical set of nonterminals used to describe expressions:
<P>
<DL COMPACT>
<DT><CODE>Relation</CODE>
<DD>An expression whose operator is <CODE>&lt;</CODE> or <CODE>&gt;</CODE>
<P>
<DT><CODE>Sum</CODE>
<DD>An expression whose operator is <CODE>+</CODE> or <CODE>-</CODE>
<P>
<DT><CODE>Term</CODE>
<DD>An expression whose operator is <CODE>*</CODE> or <CODE>/</CODE>
<P>
<DT><CODE>Factor</CODE>
<DD>An expression whose operator is <CODE>**</CODE>
<P>
<DT><CODE>Primary</CODE>
<DD>An expression not containing "visible" operators
</DL>
<P>
The association rules discussed above would therefore be expressed by the
following productions
(these are <EM>not</EM> the only productions in the grammar):
<P>
<PRE>
Sum: Sum '-' Term.
Factor: Primary '**' Factor.
Relation: Sum '&lt;' Sum.
</PRE>
<P>
The first production says that the left operand of <CODE>-</CODE> can contain
other <CODE>-</CODE> operators, while the right operand cannot (unless the
subexpression containing them is surrounded by parentheses).
Similarly, the right operand of <CODE>**</CODE> can contain other <CODE>**</CODE>
operators but the left operand cannot.
The third rule says that neither operand of <CODE>&lt;</CODE> can contain other
<CODE>&lt;</CODE> operators.
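<P>
The effect of the first two productions on phrase structure can be observed
with a small Python sketch. As an assumption for brevity it covers only
<CODE>-</CODE> and <CODE>**</CODE>, omits the <CODE>Term</CODE> level and the
non-associative relation, and uses hand-written recursive descent rather than
an Eli-generated parsing routine; all function names are hypothetical.
Left association becomes a loop that folds to the left, and right association
becomes a recursive call for the right operand.
<P>
<PRE>
```python
import re


def tokenize(source):
    # Identifiers, the two-character operator **, and the symbols - ( ).
    return re.findall(r"[A-Za-z][A-Za-z0-9]*|\*\*|[-()]", source)


def parse_sum(tokens):
    # Sum: Sum '-' Term / Term.   (Term is reduced to Factor in this sketch.)
    node = parse_factor(tokens)
    while tokens and tokens[0] == "-":
        tokens.pop(0)
        node = ("-", node, parse_factor(tokens))   # fold to the left
    return node


def parse_factor(tokens):
    # Factor: Primary '**' Factor / Primary.
    left = parse_primary(tokens)
    if tokens and tokens[0] == "**":
        tokens.pop(0)
        return ("**", left, parse_factor(tokens))  # recurse on the right
    return left


def parse_primary(tokens):
    # Primary: '(' Sum ')' / Identifier.
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        node = parse_sum(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return node
    if token in ("-", "**", ")"):
        raise SyntaxError("unexpected " + token)
    return token


def parse(source):
    tokens = tokenize(source)
    tree = parse_sum(tokens)
    if tokens:
        raise SyntaxError("unexpected " + tokens[0])
    return tree


print(parse("A - B - C"))    # ('-', ('-', 'A', 'B'), 'C')
print(parse("A ** B ** C"))  # ('**', 'A', ('**', 'B', 'C'))
```
</PRE>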
<P>
<H3><A NAME="SEC7" HREF="syntax_toc.html#SEC7">Scope rules for declarations</A></H3>
<P>
Identifiers
<A NAME="IDX45"></A>
<A NAME="IDX44"></A>
are normally given meaning by declarations.
The meaning given to an identifier by a particular declaration holds over
some portion of the program, called the <DFN>scope</DFN>
<A NAME="IDX46"></A>
of that declaration.
A context-free grammar for a language should define a phrase structure that
is consistent with the scope rules of that language.
<P>
For example, the declaration of a procedure <CODE>P</CODE> within the
body of procedure <CODE>Q</CODE> gives meaning to the identifier <CODE>P</CODE>, and
its scope might be the body of the procedure <CODE>Q</CODE>.
If <CODE>P</CODE> has parameters, the scope of their declarations (which are
components of the procedure declaration) is the body of procedure <CODE>P</CODE>.
<P>
Now consider the following productions describing a procedure declaration:
<A NAME="IDX47"></A>
<P>
<PRE>
procedure_declaration: 'procedure' procedure_heading procedure_body.
procedure_heading:
ProcIdDef formal_parameter_part ';' specification_part.
</PRE>
<P>
Notice that the phrase structure induced by these productions is
inconsistent with the postulated scope rules.
The declaration of <CODE>P</CODE> (<CODE>ProcIdDef</CODE>) is in the same phrase
(<CODE>procedure_heading</CODE>) as the declarations of the formal parameters.
This defect can be remedied by a slight change in the productions:
<P>
<PRE>
procedure_declaration: 'procedure' ProcIdDef ProcRange.
ProcRange:
formal_parameter_part ';' value_part specification_part procedure_body.
</PRE>
<P>
Here the formal parameters and the body have both been made components of a
single phrase (<CODE>ProcRange</CODE>), which defines the scope of the formal
parameter declarations.
The declaration of <CODE>P</CODE> lies outside of this phrase, thus allowing its
scope to be differentiated from that of the formal parameters.
<P>
<HR size=1 noshade width=600 align=left>
<P>
<IMG SRC="gifs/empty.gif" WIDTH=25 HEIGHT=25 ALT=""><A HREF="syntax_2.html"><IMG SRC="gifs/next.gif" ALT="Next Chapter" BORDER="0"></A>
<IMG SRC="gifs/empty.gif" WIDTH=25 HEIGHT=25 ALT=""><A HREF="syntax_toc.html"><IMG SRC="gifs/up.gif" ALT="Table of Contents" BORDER="0"></A>
<IMG SRC="gifs/empty.gif" WIDTH=25 HEIGHT=25 ALT="">
<HR size=1 noshade width=600 align=left>
</TD>
</TR>
</TABLE>
</BODY></HTML>