<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.29
from ../tnf/lex.tnf on 12 February 2003 -->
<TITLE>Lexical Analysis - Table of Contents</TITLE>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#FF0000" BACKGROUND="gifs/bg.gif">
<TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0 VALIGN=BOTTOM>
<TR VALIGN=BOTTOM>
<TD WIDTH="160" VALIGN=BOTTOM><IMG SRC="gifs/elilogo.gif" BORDER=0> </TD>
<TD WIDTH="25" VALIGN=BOTTOM><img src="gifs/empty.gif" WIDTH=25 HEIGHT=25></TD>
<TD ALIGN=LEFT WIDTH="600" VALIGN=BOTTOM><IMG SRC="gifs/title.gif"></TD>
</TR>
</TABLE>
<HR size=1 noshade width=785 align=left>
<TABLE BORDER=0 CELLSPACING=2 CELLPADDING=0>
<TR>
<TD VALIGN=TOP WIDTH="160">
<h4>General Information</h4>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="index.html">Eli: Translator Construction Made Easy</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="gindex_toc.html">Global Index</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="faq_toc.html" >Frequently Asked Questions</a> </td></tr>
</table>
<h4>Tutorials</h4>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="EliRefCard_toc.html">Quick Reference Card</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="novice_toc.html">Guide For new Eli Users</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="news_toc.html">Release Notes of Eli</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="nametutorial_toc.html">Tutorial on Name Analysis</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="typetutorial_toc.html">Tutorial on Type Analysis</a></td></tr>
</table>
<h4>Reference Manuals</h4>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="ui_toc.html">User Interface</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="pp_toc.html">Eli products and parameters</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="lidoref_toc.html">LIDO Reference Manual</a></td></tr>
</table>
<h4>Libraries</h4>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="lib_toc.html">Eli library routines</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="modlib_toc.html">Specification Module Library</a></td></tr>
</table>
<h4>Translation Tasks</h4>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="lex_toc.html">Lexical analysis specification</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="syntax_toc.html">Syntactic Analysis Manual</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="comptrees_toc.html">Computation in Trees</a></td></tr>
</table>
<h4>Tools</h4>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="lcl_toc.html">LIGA Control Language</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="show_toc.html">Debugging Information for LIDO</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="gorto_toc.html">Graphical ORder TOol</a> </td></tr>
</table>
<p>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="fw_toc.html">FunnelWeb User's Manual</a> </td></tr>
</table>
<p>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="ptg_toc.html">Pattern-based Text Generator</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="deftbl_toc.html">Property Definition Language</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="oil_toc.html">Operator Identification Language</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="tp_toc.html">Tree Grammar Specification Language</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="clp_toc.html">Command Line Processing</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="cola_toc.html">COLA Options Reference Manual</a> </td></tr>
</table>
<p>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="idem_toc.html">Generating Unparsing Code</a> </td></tr>
</table>
<p>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="mon_toc.html">Monitoring a Processor's Execution</a> </td></tr>
</table>
<h4>Administration</h4>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="sysadmin_toc.html">System Administration Guide</a> </td></tr>
</table>
<HR WIDTH="100%">
<CENTER> <A HREF="mailto:elibugs@cs.colorado.edu"><IMG SRC="gifs/button_mail.gif" NOSAVE BORDER=0 HEIGHT=32 WIDTH=32></A><A HREF="mailto:elibugs@cs.colorado.edu">Questions, Comments, ....</A></CENTER>
</TD>
<TD VALIGN=TOP WIDTH="25"><img src="gifs/empty.gif" WIDTH=25 HEIGHT=25></TD>
<TD VALIGN=TOP WIDTH="600">
<A HREF="lex.ps"><IMG SRC="gifs/print.gif" ALT="Open Postscript File" BORDER="0" ALIGN=RIGHT></A>
<H1>Lexical Analysis</H1>
<P>
The purpose of the lexical analyzer is to partition the input text,
delivering a sequence of <DFN>comments</DFN> and <DFN>basic symbols</DFN>.
Comments are character sequences to be ignored, while basic symbols are
character sequences that correspond to terminal symbols of the grammar
defining the phrase structure of the input
(see <A HREF="syntax_1.html#SEC1">Context-Free Grammars and Parsing of Syntactic Analysis</A>).
<P>
A user must define the forms of comments and the forms of all basic symbols
corresponding to non-literal terminal symbols of the grammar.
Eli can deduce the form of a literal terminal symbol from the grammar
specification.
<P>
The definition consists of one or more type-<TT>`gla'</TT> files.
Each line of a type-<TT>`gla'</TT> file describes a set of character sequences.
If a line begins with an identifier followed by a colon (<KBD>:</KBD>), then all
of the character sequences described by the line are instances of
the non-literal terminal symbol named by that identifier;
otherwise they are comments.
<P>
Here is an example of a type-<TT>`gla'</TT> file:
<P>
<PRE>
HexInteger: $0[Xx][0-9A-Fa-f]+
$! (auxEOL)
Identifier: C_IDENTIFIER
</PRE>
<P>
The first line of this specification uses a regular expression to define a
hexadecimal integer as a zero, followed by the letter <CODE>X</CODE> (either
upper or lower case) and one or more hexadecimal digits represented in the
usual way.
In the second line, one form of comment is defined by a regular expression
and the name of a C routine.
The C routine will be invoked when the regular expression has been matched.
This approach allows the user to define character sequences operationally
when a declarative definition is tedious or does not support appropriate
error reporting.
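<P>
The two styles of definition in the first two lines can be sketched in plain C.
This is only an illustration, not code from Eli: the names
<CODE>match_hex_integer</CODE> and <CODE>skip_to_end_of_line</CODE> are
hypothetical, and the signature of the second routine (a pointer to the
matched text plus the matched length, returning a pointer just past the
comment) is an assumption. It also assumes that the comment runs to the end
of the line. The interface that Eli actually expects is defined in Chapter 1.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Declarative style, written out by hand: returns the length of the longest
 * prefix of s that matches 0[Xx][0-9A-Fa-f]+, or 0 if nothing matches.
 * Hypothetical helper, not part of Eli. */
size_t match_hex_integer(const char *s)
{
    size_t n = 2;

    assert(s != NULL);
    if (s[0] != '0' || (s[1] != 'x' && s[1] != 'X'))
        return 0;
    while (isxdigit((unsigned char)s[n]))
        n++;
    return n > 2 ? n : 0;   /* at least one hexadecimal digit is required */
}

/* Operational style: called after the regular expression "!" has matched.
 * start points at the matched text and length is the number of characters
 * matched. The result points at the first character after the comment.
 * The name and signature are assumptions made for this sketch. */
char *skip_to_end_of_line(char *start, int length)
{
    char *p;

    assert(start != NULL && length >= 0);
    p = start + length;
    while (*p != '\0' && *p != '\n')
        p++;
    if (*p == '\n')
        p++;                /* the newline is treated as part of the comment */
    return p;
}
```

For the text <CODE>0x1F;</CODE> the first routine reports a match of four
characters. For a buffer holding <CODE>! note</CODE> followed by a newline,
the second routine returns a pointer to the start of the next line.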
<P>
Since certain lexical structures are common to many languages, Eli provides
a library of definitions that can be invoked simply by giving their names.
<CODE>C_IDENTIFIER</CODE>, in the third line, is such an invocation.
The effect of the third line is to define the form of the basic symbol
<CODE>Identifier</CODE> as that of an identifier in C: a letter or underscore
followed by some sequence of letters, digits and underscores.
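<P>
The form described above, a letter or underscore followed by letters, digits
and underscores, can be checked with a few lines of C. This is a minimal
sketch under the assumption of the default "C" locale; the function name
<CODE>match_c_identifier</CODE> is hypothetical and the code is not how Eli
implements <CODE>C_IDENTIFIER</CODE>.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns the length of the longest prefix of s that has the form of a
 * C identifier, or 0 if s does not start with a letter or underscore.
 * Hypothetical helper, not part of Eli. */
size_t match_c_identifier(const char *s)
{
    size_t n = 1;

    assert(s != NULL);
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return 0;
    while (isalnum((unsigned char)s[n]) || s[n] == '_')
        n++;
    return n;
}
```

Applied to <CODE>_tmp1 = 0</CODE> the routine accepts the five characters
<CODE>_tmp1</CODE>, while <CODE>9lives</CODE> is rejected because it starts
with a digit.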
<P>
Chapter 1 defines the usage, form and content of specifications provided by
the user as type-<TT>`gla'</TT> files.
Those specifications may refer to canned descriptions, which are defined in
Chapter 2.
Chapter 3 presents the default processing of spaces, tabs and newlines and
explains how to define other strategies.
The treatment and meaning of literal terminal symbols are discussed in
Chapter 4, and Chapter 5 explains how a generated lexical analyzer can be
made insensitive to the case of letters.
Complex lexical analysis problems may require modification of the behavior
of the generated module; Chapter 6 discusses the possibilities.
<P>
<P>
<UL>
<LI><A NAME="SEC1" HREF="lex_1.html#SEC1">Specifications</A>
<UL>
<LI><A NAME="SEC2" HREF="lex_1.html#SEC2">Regular Expressions</A>
<UL>
<LI><A NAME="SEC3" HREF="lex_1.html#SEC3">Matching operator characters</A>
<LI><A NAME="SEC4" HREF="lex_1.html#SEC4">Character classes</A>
<LI><A NAME="SEC5" HREF="lex_1.html#SEC5">Building complex regular expressions</A>
<LI><A NAME="SEC6" HREF="lex_1.html#SEC6">What happens if the specification is ambiguous?</A>
</UL>
<LI><A NAME="SEC7" HREF="lex_1.html#SEC7">Auxiliary Scanners</A>
<UL>
<LI><A NAME="SEC8" HREF="lex_1.html#SEC8">Available scanners</A>
<LI><A NAME="SEC9" HREF="lex_1.html#SEC9">Building scanners</A>
</UL>
<LI><A NAME="SEC10" HREF="lex_1.html#SEC10">Token Processors</A>
<UL>
<LI><A NAME="SEC11" HREF="lex_1.html#SEC11">Available processors</A>
<LI><A NAME="SEC12" HREF="lex_1.html#SEC12">Building processors</A>
</UL>
</UL>
<LI><A NAME="SEC13" HREF="lex_2.html#SEC13">Canned Symbol Descriptions</A>
<UL>
<LI><A NAME="SEC14" HREF="lex_2.html#SEC14">Available Descriptions</A>
<LI><A NAME="SEC15" HREF="lex_2.html#SEC15">Definitions of Canned Descriptions</A>
</UL>
<LI><A NAME="SEC16" HREF="lex_3.html#SEC16">Spaces, Tabs and Newlines</A>
<UL>
<LI><A NAME="SEC17" HREF="lex_3.html#SEC17">Maintaining the Source Text Coordinates</A>
<LI><A NAME="SEC18" HREF="lex_3.html#SEC18">Restoring the Default Behavior for White Space</A>
<LI><A NAME="SEC19" HREF="lex_3.html#SEC19">Making White Space Illegal</A>
</UL>
<LI><A NAME="SEC20" HREF="lex_4.html#SEC20">Literal Symbols</A>
<UL>
<LI><A NAME="SEC21" HREF="lex_4.html#SEC21">Overriding the Default Treatment of Literal Symbols</A>
<LI><A NAME="SEC22" HREF="lex_4.html#SEC22">Using Literal Symbols to Represent Other Things</A>
</UL>
<LI><A NAME="SEC23" HREF="lex_5.html#SEC23">Case Insensitivity</A>
<UL>
<LI><A NAME="SEC24" HREF="lex_5.html#SEC24">A Case-Insensitive Token Processor</A>
<LI><A NAME="SEC25" HREF="lex_5.html#SEC25">Making Literal Symbols Case Insensitive</A>
</UL>
<LI><A NAME="SEC26" HREF="lex_6.html#SEC26">The Generated Lexical Analyzer Module</A>
<UL>
<LI><A NAME="SEC27" HREF="lex_6.html#SEC27">Interaction Between the Lexical Analyzer and the Text</A>
<LI><A NAME="SEC28" HREF="lex_6.html#SEC28">Resetting the Scan Pointer</A>
<LI><A NAME="SEC29" HREF="lex_6.html#SEC29">The Classification Operation</A>
<UL>
<LI><A NAME="SEC30" HREF="lex_6.html#SEC30">Setting coordinate values</A>
<LI><A NAME="SEC31" HREF="lex_6.html#SEC31">Deciding on a continuation after a classification</A>
<LI><A NAME="SEC32" HREF="lex_6.html#SEC32">Returning a classification</A>
</UL>
<LI><A NAME="SEC33" HREF="lex_6.html#SEC33">An Example of Interface Usage</A>
</UL>
<LI><A NAME="SEC34" HREF="lex_7.html#SEC34">Index</A>
</UL>
<HR size=1 noshade width=600 align=left>
</TD>
</TR>
</TABLE>
</BODY></HTML>