<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<HTML>
<HEAD>
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<TITLE>
Module Genlex: a generic lexical analyzer
</TITLE>
</HEAD>
<BODY >
<A HREF="manual039.html"><IMG SRC ="previous_motif.gif" ALT="Previous"></A>
<A HREF="manual041.html"><IMG SRC ="next_motif.gif" ALT="Next"></A>
<A HREF="manual030.html"><IMG SRC ="contents_motif.gif" ALT="Contents"></A>
<HR>
<H2>17.10 Module <TT>Genlex</TT>: a generic lexical analyzer</H2><A NAME="s:Genlex"></A>
<A NAME="@manual321"></A><BLOCKQUOTE>
This module implements a simple ``standard'' lexical analyzer, presented
as a function from character streams to token streams. It implements
roughly the lexical conventions of Caml, but is parameterized by the
set of keywords of your language.
</BLOCKQUOTE>
<PRE>
type token =
    Kwd of string
  | Ident of string
  | Int of int
  | Float of float
  | String of string
  | Char of char
</PRE>
<BLOCKQUOTE>
The type of tokens. The lexical classes are: <CODE>Int</CODE> and <CODE>Float</CODE>
for integer and floating-point numbers; <CODE>String</CODE> for
string literals, enclosed in double quotes; <CODE>Char</CODE> for
character literals, enclosed in single quotes; <CODE>Ident</CODE> for
identifiers (either sequences of letters, digits, underscores
and quotes, or sequences of ``operator characters'' such as
<CODE>+</CODE>, <CODE>*</CODE>, etc); and <CODE>Kwd</CODE> for keywords (either identifiers or
single ``special characters'' such as <CODE>(</CODE>, <CODE>}</CODE>, etc).
</BLOCKQUOTE>
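<BLOCKQUOTE>
As an illustration (not part of the interface), here is how a short input is
split into these classes, assuming a lexer built with <CODE>make_lexer</CODE>
(described below) from the arbitrary keyword list <CODE>["+"]</CODE>:
<PRE>
open Genlex

(* Sketch only: the keyword list ["+"] is an arbitrary choice. *)
let lexer = make_lexer ["+"]
let tokens = lexer (Stream.of_string "1.5 + x \"hi\"")

(* Stream.npeek 4 tokens is expected to yield
   [Float 1.5; Kwd "+"; Ident "x"; String "hi"] *)
</PRE>
</BLOCKQUOTE>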
<PRE>
val make_lexer: string list -> (char Stream.t -> token Stream.t)
</PRE>
<A NAME="@manual322"></A><BLOCKQUOTE>
Construct the lexer function. The first argument is the list of
keywords. An identifier <CODE>s</CODE> is returned as <CODE>Kwd s</CODE> if <CODE>s</CODE>
belongs to this list, and as <CODE>Ident s</CODE> otherwise.
A special character <CODE>s</CODE> is returned as <CODE>Kwd s</CODE> if <CODE>s</CODE>
belongs to this list, and causes a lexical error (exception
<CODE>Parse_error</CODE>) otherwise. Blanks and newlines are skipped.
Comments delimited by <CODE>(*</CODE> and <CODE>*)</CODE> are skipped as well,
and can be nested.
</BLOCKQUOTE>
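<BLOCKQUOTE>
A minimal sketch of this behaviour (the keyword list is chosen purely for
illustration): identifiers appearing in the list come back as <CODE>Kwd</CODE>,
the others as <CODE>Ident</CODE>, and nested comments are skipped:
<PRE>
open Genlex

let lexer = make_lexer ["let"; "="; "+"]
let tokens =
  lexer (Stream.of_string "let x = 1 + 2 (* a (* nested *) comment *)")

(* Stream.npeek 6 tokens is expected to yield
   [Kwd "let"; Ident "x"; Kwd "="; Int 1; Kwd "+"; Int 2] *)
</PRE>
</BLOCKQUOTE>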
<BLOCKQUOTE>
Example: a lexer suitable for a desk calculator is obtained by
<PRE>
let lexer = make_lexer ["+";"-";"*";"/";"let";"="; "("; ")"]
</PRE>
The associated parser would be a function from <CODE>token Stream.t</CODE>
to, for instance, <CODE>int</CODE>, and would have rules such as:
<PRE>
let rec parse_expr = parser
    [< 'Int n >] -> n
  | [< 'Kwd "("; n = parse_expr; 'Kwd ")" >] -> n
  | [< n1 = parse_expr; n2 = parse_remainder n1 >] -> n2
and parse_remainder n1 = parser
    [< 'Kwd "+"; n2 = parse_expr >] -> n1 + n2
  | ...
</PRE>
</BLOCKQUOTE>
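<BLOCKQUOTE>
With these definitions (the <CODE>parser</CODE> notation for stream parsers may
require a syntax extension such as camlp4, depending on the compiler version),
lexer and parser could be combined roughly as follows. <CODE>parse_string</CODE>
is a hypothetical helper, and the input is kept trivial because
<CODE>parse_remainder</CODE> above is only sketched:
<PRE>
(* Hypothetical driver, reusing the lexer and parse_expr defined above. *)
let parse_string s = parse_expr (lexer (Stream.of_string s))

(* parse_string "(3)" is expected to return 3. *)
</PRE>
</BLOCKQUOTE>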
<HR>
<A HREF="manual039.html"><IMG SRC ="previous_motif.gif" ALT="Previous"></A>
<A HREF="manual041.html"><IMG SRC ="next_motif.gif" ALT="Next"></A>
<A HREF="manual030.html"><IMG SRC ="contents_motif.gif" ALT="Contents"></A>
</BODY>
</HTML>