File: syntax_5.html

package info (click to toggle)
eli-doc 4.4.0-4
links: PTS
area: main
in suites: sarge
size: 13,256 kB
ctags: 4,583
sloc: makefile: 42
file content (296 lines) | stat: -rw-r--r-- 14,075 bytes
<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.29
     from ../tnf/syntax.tnf on 12 Febuary 2003 -->

<TITLE>Syntactic Analysis - Improving Error Recovery in the Generated Parser</TITLE>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#FF0000" BACKGROUND="gifs/bg.gif">
<TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0" VALIGN=BOTTOM>
<TR VALIGN=BOTTOM>
<TD WIDTH="160" VALIGN=BOTTOM><IMG SRC="gifs/elilogo.gif" BORDER=0>&nbsp;</TD>
<TD WIDTH="25" VALIGN=BOTTOM><img src="gifs/empty.gif" WIDTH=25 HEIGHT=25></TD>
<TD ALIGN=LEFT WIDTH="600" VALIGN=BOTTOM><IMG SRC="gifs/title.gif"></TD>
</TR>
</TABLE>

<HR size=1 noshade width=785 align=left>
<TABLE BORDER=0 CELLSPACING=2 CELLPADDING=0>
<TR>
<TD VALIGN=TOP WIDTH="160">
<h4>General Information</h4>

<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="index.html">Eli: Translator Construction Made Easy</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="gindex_toc.html">Global Index</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="faq_toc.html" >Frequently Asked Questions</a> </td></tr>
</table>

<h4>Tutorials</h4>

<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="EliRefCard_toc.html">Quick Reference Card</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="novice_toc.html">Guide For new Eli Users</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="news_toc.html">Release Notes of Eli</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="nametutorial_toc.html">Tutorial on Name Analysis</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="typetutorial_toc.html">Tutorial on Type Analysis</a></td></tr>
</table>

<h4>Reference Manuals</h4>

<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="ui_toc.html">User Interface</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="pp_toc.html">Eli products and parameters</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="lidoref_toc.html">LIDO Reference Manual</a></td></tr>
</table>

<h4>Libraries</h4>

<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="lib_toc.html">Eli library routines</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="modlib_toc.html">Specification Module Library</a></td></tr>
</table>

<h4>Translation Tasks</h4>

<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="lex_toc.html">Lexical analysis specification</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="syntax_toc.html">Syntactic Analysis Manual</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="comptrees_toc.html">Computation in Trees</a></td></tr>
</table>

<h4>Tools</h4>

<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="lcl_toc.html">LIGA Control Language</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="show_toc.html">Debugging Information for LIDO</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="gorto_toc.html">Graphical ORder TOol</a> </td></tr>
</table>
<p>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="fw_toc.html">FunnelWeb User's Manual</a> </td></tr>
</table>
<p>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="ptg_toc.html">Pattern-based Text Generator</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="deftbl_toc.html">Property Definition Language</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="oil_toc.html">Operator Identification Language</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="tp_toc.html">Tree Grammar Specification Language</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="clp_toc.html">Command Line Processing</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="cola_toc.html">COLA Options Reference Manual</a> </td></tr>
</table>
<p>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="idem_toc.html">Generating Unparsing Code</a> </td></tr>
</table>
<p>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="mon_toc.html">Monitoring a Processor's Execution</a> </td></tr>
</table>

<h4>Administration</h4>

<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="sysadmin_toc.html">System Administration Guide</a> </td></tr>
</table>

<HR WIDTH="100%">
<CENTER>&nbsp;<A HREF="mailto:elibugs@cs.colorado.edu"><IMG SRC="gifs/button_mail.gif" NOSAVE BORDER=0 HEIGHT=32 WIDTH=32></A><A HREF="mailto:elibugs@cs.colorado.edu">Questions, Comments, ....</A></CENTER>

</TD>
<TD VALIGN=TOP WIDTH="25"><img src="gifs/empty.gif" WIDTH=25 HEIGHT=25></TD>

<TD VALIGN=TOP WIDTH="600">
<H1>Syntactic Analysis</H1>
<P>
<IMG SRC="gifs/empty.gif" WIDTH=25 HEIGHT=25 ALT=""><A HREF="syntax_4.html"><IMG SRC="gifs/prev.gif" ALT="Previous Chapter" BORDER="0"></A>
<IMG SRC="gifs/empty.gif" WIDTH=25 HEIGHT=25 ALT=""><A HREF="syntax_6.html"><IMG SRC="gifs/next.gif" ALT="Next Chapter" BORDER="0"></A>
<IMG SRC="gifs/empty.gif" WIDTH=25 HEIGHT=25 ALT=""><A HREF="syntax_toc.html"><IMG SRC="gifs/up.gif" ALT="Table of Contents" BORDER="0"></A>
<IMG SRC="gifs/empty.gif" WIDTH=25 HEIGHT=25 ALT="">
<HR size=1 noshade width=600 align=left>
<H1><A NAME="SEC29" HREF="syntax_toc.html#SEC29">Improving Error Recovery in the Generated Parser</A></H1>
<P>
In some cases, the same pattern in the input text may represent
different tokens in the grammar.  Knowing which token the pattern
represents may be based on other available information.
When the parser determines that it cannot accept the next
look-ahead token, the boolean function <CODE>Reparatur</CODE> is
<A NAME="IDX131"></A>
called:
<P>
<PRE>
int Reparatur (POSITION *coord, int *syncode, int *intrinsic);
</PRE>
<P>
This allows the user to change the look-ahead token based on
other available information.  If the function returns <CODE>0</CODE>,
then the token has not been altered and the generated parser
continues with its normal error recovery.  If the function returns
<CODE>1</CODE>, it is assumed that the passed in attributes of the token
have been changed (in particular <CODE>syncode</CODE>), and the generated
parser rechecks the look-ahead token to see if it can accept it.
<A NAME="IDX132"></A>
<P>
By default, the Eli system provides file <TT>`dfltrepar.c'</TT>
containing a definition of the function <CODE>Reparatur</CODE> that
always returns <CODE>0</CODE>.  To override the default, the user must
provide a new definition of the function <CODE>Reparatur</CODE> in some C file.
<P>
In case of erroneous input the generated parser invokes its error
recovery.  The error recovery works completely automatically and
usually behaves satisfactorily.  There are a few possibilities to
control the error recovery in order to improve its behavior.  To
understand the control facilities it is necessary to know how the
error recovery works in principle.
<P>
If an error in the input is detected two methods for error repair
are used.  The first method tries to "correct" the error by deleting,
inserting, or replacing one input symbol.  The repair is considered
successful, if the next 4 parsing steps don't lead to another error.
The use of this method is optional.  If the first method is not used
or if it failed the second method performs a complete correction
without backtracking.  It skips input symbols until a so-called
<DFN>restart point</DFN> is reached.  The restart point is a symbol
<A NAME="IDX133"></A>
where normal parsing can be resumed.  Before normal parsing resumes
error correction takes place.  Input symbols are inserted in order
to construct a syntactically correct input and the associated
semantic actions are executed.  The intention is to pass consistent
information to the following compiler phases, which therefore do not
have to bother with syntax errors.
<P>
The second method for error recovery can be controlled by providing
additional information.  The intention is to decrease the probability
of error avalanches caused by wrong error repair decisions.  As a
running example, we use an ALGOL-like language defined by the following
grammar:
<P>
<PRE>
block : 'begin' declarations statements 'end' .
declarations : declarations declaration ';' / .
declaration : 'real' 'identifier' /
                / 'procedure' 'identifier' ';' statement .
statements : statement / statements ';' statement .
statement : 'identifier' / block .
</PRE>
<P>
Three types of error recovery information can be specified by the
user in files of type <TT>`.perr'</TT>:
<P>
The error recovery has a major drawback when applied to errors
in lists, defined, e.g., as
<P>
<PRE>
statements : statement / statements ';' statement .
</PRE>
<P>
A missing delimiter ';' cannot be inserted in order to parse the rest
of the list.  This could lead to an infinite loop in the parser.
Therefore errors like
<P>
<PRE>
begin identifier begin identifier ; ...
</PRE>
<P>
cannot be repaired by inserting the semicolon ';' but by deleting
the two symbols 'begin' and 'identifier'.
<P>
The following specification in a <TT>`.perr'</TT> file defines the
mentioned terminals as list separators.
<A NAME="IDX134"></A>
<P>
<PRE>
$SEPA ';' . ',' .
</PRE>
<P>
A list separator will always be inserted if a restart point can
be found immediately behind it.  In this case the rest of the
list can be parsed without the danger of getting into an infinite
loop.
<P>
Programming languages have bracketed structures like 'begin'
<A NAME="IDX135"></A>
and 'end' which delimit not only the syntactic structure of
"block" but also the scope of identifiers.  Deleting or inserting
such semantically significant parentheses is highly probably to
cause avalanches of syntactic and semantic errors.  Therefore,
the error recovery should not change the structures of a program
as far as it concerns scopes of identifiers or similar semantic
concepts.
<P>
Consider the following erroneous input:
<P>
<PRE>
begin
  procedure identifier ;
  begin
    real identifier ;
    identifier ;
    real identifier ;
    identifier ;
</PRE>
<P>
Inserting the terminal 'end' before the second "real declaration"
corrects the program syntactically but may lead to a semantic error
in the last line, as the scope structure is changed.
<P>
The specification
<P>
<PRE>
$BRACKET 'begin' . 'end' .
</PRE>
<P>
in a file of type <TT>`.perr'</TT> declares the mentioned terminals to
be delimiters of semantically significant regions (semantic delimiters).
<A NAME="IDX136"></A>
These terminals are not inserted unless the restart point is end of input
or the restart point itself is specified as such a delimiter.
<P>
Usually there are a few terminals not suited as restart points.
The reason is that is programming languages terminals like
'identifier' or 'number' occur in many different syntactic
positions.  Consider the error
<P>
<PRE>
begin real identifier identifier ; real identifier ...
</PRE>
<P>
There is no safe way to tell whether the second identifier belongs to
a statement or to a declaration.  If it is used as a restart point, the
error is corrected to
<P>
<PRE>
begin real identifier ; identifier ; real identifier ...
</PRE>
<P>
This corresponds to a transition from the declaration part into the
statement part of the block, a frequent cause for error avalanches.
In general, terminals like 'identifier' or 'number' are not feasible
as restart points.
<P>
The specification
<P>
<PRE>
$SKIP 'identifier' . 'integer_number' . 'real_number' .
</PRE>
<P>
in a type <TT>`.perr'</TT> file defines the mentioned terminals as
unsafe restart points.  Unsafe restart points are skipped in case
of an error in order to search for restart points more feasible.
<P>
With the above specification the second identifier in the mentioned
example will be skipped.  Parsing resumes at the following semicolon
without carrying out a transition to the statement part.
<P>
<HR size=1 noshade width=600 align=left>
<P>
<IMG SRC="gifs/empty.gif" WIDTH=25 HEIGHT=25 ALT=""><A HREF="syntax_4.html"><IMG SRC="gifs/prev.gif" ALT="Previous Chapter" BORDER="0"></A>
<IMG SRC="gifs/empty.gif" WIDTH=25 HEIGHT=25 ALT=""><A HREF="syntax_6.html"><IMG SRC="gifs/next.gif" ALT="Next Chapter" BORDER="0"></A>
<IMG SRC="gifs/empty.gif" WIDTH=25 HEIGHT=25 ALT=""><A HREF="syntax_toc.html"><IMG SRC="gifs/up.gif" ALT="Table of Contents" BORDER="0"></A>
<IMG SRC="gifs/empty.gif" WIDTH=25 HEIGHT=25 ALT="">
<HR size=1 noshade width=600 align=left>
</TD>
</TR>
</TABLE>

</BODY></HTML>