1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296
|
<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.29
from ../tnf/syntax.tnf on 12 Febuary 2003 -->
<TITLE>Syntactic Analysis - Improving Error Recovery in the Generated Parser</TITLE>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#FF0000" BACKGROUND="gifs/bg.gif">
<TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0" VALIGN=BOTTOM>
<TR VALIGN=BOTTOM>
<TD WIDTH="160" VALIGN=BOTTOM><IMG SRC="gifs/elilogo.gif" BORDER=0> </TD>
<TD WIDTH="25" VALIGN=BOTTOM><img src="gifs/empty.gif" WIDTH=25 HEIGHT=25></TD>
<TD ALIGN=LEFT WIDTH="600" VALIGN=BOTTOM><IMG SRC="gifs/title.gif"></TD>
</TR>
</TABLE>
<HR size=1 noshade width=785 align=left>
<TABLE BORDER=0 CELLSPACING=2 CELLPADDING=0>
<TR>
<TD VALIGN=TOP WIDTH="160">
<h4>General Information</h4>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="index.html">Eli: Translator Construction Made Easy</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="gindex_toc.html">Global Index</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="faq_toc.html" >Frequently Asked Questions</a> </td></tr>
</table>
<h4>Tutorials</h4>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="EliRefCard_toc.html">Quick Reference Card</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="novice_toc.html">Guide For new Eli Users</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="news_toc.html">Release Notes of Eli</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="nametutorial_toc.html">Tutorial on Name Analysis</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="typetutorial_toc.html">Tutorial on Type Analysis</a></td></tr>
</table>
<h4>Reference Manuals</h4>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="ui_toc.html">User Interface</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="pp_toc.html">Eli products and parameters</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="lidoref_toc.html">LIDO Reference Manual</a></td></tr>
</table>
<h4>Libraries</h4>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="lib_toc.html">Eli library routines</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="modlib_toc.html">Specification Module Library</a></td></tr>
</table>
<h4>Translation Tasks</h4>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="lex_toc.html">Lexical analysis specification</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="syntax_toc.html">Syntactic Analysis Manual</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="comptrees_toc.html">Computation in Trees</a></td></tr>
</table>
<h4>Tools</h4>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="lcl_toc.html">LIGA Control Language</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="show_toc.html">Debugging Information for LIDO</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="gorto_toc.html">Graphical ORder TOol</a> </td></tr>
</table>
<p>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="fw_toc.html">FunnelWeb User's Manual</a> </td></tr>
</table>
<p>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="ptg_toc.html">Pattern-based Text Generator</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="deftbl_toc.html">Property Definition Language</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="oil_toc.html">Operator Identification Language</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="tp_toc.html">Tree Grammar Specification Language</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="clp_toc.html">Command Line Processing</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="cola_toc.html">COLA Options Reference Manual</a> </td></tr>
</table>
<p>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="idem_toc.html">Generating Unparsing Code</a> </td></tr>
</table>
<p>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="mon_toc.html">Monitoring a Processor's Execution</a> </td></tr>
</table>
<h4>Administration</h4>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="sysadmin_toc.html">System Administration Guide</a> </td></tr>
</table>
<HR WIDTH="100%">
<CENTER> <A HREF="mailto:elibugs@cs.colorado.edu"><IMG SRC="gifs/button_mail.gif" NOSAVE BORDER=0 HEIGHT=32 WIDTH=32></A><A HREF="mailto:elibugs@cs.colorado.edu">Questions, Comments, ....</A></CENTER>
</TD>
<TD VALIGN=TOP WIDTH="25"><img src="gifs/empty.gif" WIDTH=25 HEIGHT=25></TD>
<TD VALIGN=TOP WIDTH="600">
<H1>Syntactic Analysis</H1>
<P>
<IMG SRC="gifs/empty.gif" WIDTH=25 HEIGHT=25 ALT=""><A HREF="syntax_4.html"><IMG SRC="gifs/prev.gif" ALT="Previous Chapter" BORDER="0"></A>
<IMG SRC="gifs/empty.gif" WIDTH=25 HEIGHT=25 ALT=""><A HREF="syntax_6.html"><IMG SRC="gifs/next.gif" ALT="Next Chapter" BORDER="0"></A>
<IMG SRC="gifs/empty.gif" WIDTH=25 HEIGHT=25 ALT=""><A HREF="syntax_toc.html"><IMG SRC="gifs/up.gif" ALT="Table of Contents" BORDER="0"></A>
<IMG SRC="gifs/empty.gif" WIDTH=25 HEIGHT=25 ALT="">
<HR size=1 noshade width=600 align=left>
<H1><A NAME="SEC29" HREF="syntax_toc.html#SEC29">Improving Error Recovery in the Generated Parser</A></H1>
<P>
In some cases, the same pattern in the input text may represent
different tokens in the grammar. Knowing which token the pattern
represents may be based on other available information.
When the parser determines that it cannot accept the next
look-ahead token, the boolean function <CODE>Reparatur</CODE> is
<A NAME="IDX131"></A>
called:
<P>
<PRE>
int Reparatur (POSITION *coord, int *syncode, int *intrinsic);
</PRE>
<P>
This allows the user to change the look-ahead token based on
other available information. If the function returns <CODE>0</CODE>,
then the token has not been altered and the generated parser
continues with its normal error recovery. If the function returns
<CODE>1</CODE>, it is assumed that the passed in attributes of the token
have been changed (in particular <CODE>syncode</CODE>), and the generated
parser rechecks the look-ahead token to see if it can accept it.
<A NAME="IDX132"></A>
<P>
By default, the Eli system provides file <TT>`dfltrepar.c'</TT>
containing a definition of the function <CODE>Reparatur</CODE> that
always returns <CODE>0</CODE>. To override the default, the user must
provide a new definition of the function <CODE>Reparatur</CODE> in some C file.
<P>
In case of erroneous input the generated parser invokes its error
recovery. The error recovery works completely automatically and
usually behaves satisfactorily. There are a few possibilities to
control the error recovery in order to improve its behavior. To
understand the control facilities it is necessary to know how the
error recovery works in principle.
<P>
If an error in the input is detected two methods for error repair
are used. The first method tries to "correct" the error by deleting,
inserting, or replacing one input symbol. The repair is considered
successful, if the next 4 parsing steps don't lead to another error.
The use of this method is optional. If the first method is not used
or if it failed the second method performs a complete correction
without backtracking. It skips input symbols until a so-called
<DFN>restart point</DFN> is reached. The restart point is a symbol
<A NAME="IDX133"></A>
where normal parsing can be resumed. Before normal parsing resumes
error correction takes place. Input symbols are inserted in order
to construct a syntactically correct input and the associated
semantic actions are executed. The intention is to pass consistent
information to the following compiler phases, which therefore do not
have to bother with syntax errors.
<P>
The second method for error recovery can be controlled by providing
additional information. The intention is to decrease the probability
of error avalanches caused by wrong error repair decisions. As a
running example, we use an ALGOL-like language defined by the following
grammar:
<P>
<PRE>
block : 'begin' declarations statements 'end' .
declarations : declarations declaration ';' / .
declaration : 'real' 'identifier' /
/ 'procedure' 'identifier' ';' statement .
statements : statement / statements ';' statement .
statement : 'identifier' / block .
</PRE>
<P>
Three types of error recovery information can be specified by the
user in files of type <TT>`.perr'</TT>:
<P>
The error recovery has a major drawback when applied to errors
in lists, defined, e.g., as
<P>
<PRE>
statements : statement / statements ';' statement .
</PRE>
<P>
A missing delimiter ';' cannot be inserted in order to parse the rest
of the list. This could lead to an infinite loop in the parser.
Therefore errors like
<P>
<PRE>
begin identifier begin identifier ; ...
</PRE>
<P>
cannot be repaired by inserting the semicolon ';' but by deleting
the two symbols 'begin' and 'identifier'.
<P>
The following specification in a <TT>`.perr'</TT> file defines the
mentioned terminals as list separators.
<A NAME="IDX134"></A>
<P>
<PRE>
$SEPA ';' . ',' .
</PRE>
<P>
A list separator will always be inserted if a restart point can
be found immediately behind it. In this case the rest of the
list can be parsed without the danger of getting into an infinite
loop.
<P>
Programming languages have bracketed structures like 'begin'
<A NAME="IDX135"></A>
and 'end' which delimit not only the syntactic structure of
"block" but also the scope of identifiers. Deleting or inserting
such semantically significant parentheses is highly probably to
cause avalanches of syntactic and semantic errors. Therefore,
the error recovery should not change the structures of a program
as far as it concerns scopes of identifiers or similar semantic
concepts.
<P>
Consider the following erroneous input:
<P>
<PRE>
begin
procedure identifier ;
begin
real identifier ;
identifier ;
real identifier ;
identifier ;
</PRE>
<P>
Inserting the terminal 'end' before the second "real declaration"
corrects the program syntactically but may lead to a semantic error
in the last line, as the scope structure is changed.
<P>
The specification
<P>
<PRE>
$BRACKET 'begin' . 'end' .
</PRE>
<P>
in a file of type <TT>`.perr'</TT> declares the mentioned terminals to
be delimiters of semantically significant regions (semantic delimiters).
<A NAME="IDX136"></A>
These terminals are not inserted unless the restart point is end of input
or the restart point itself is specified as such a delimiter.
<P>
Usually there are a few terminals not suited as restart points.
The reason is that is programming languages terminals like
'identifier' or 'number' occur in many different syntactic
positions. Consider the error
<P>
<PRE>
begin real identifier identifier ; real identifier ...
</PRE>
<P>
There is no safe way to tell whether the second identifier belongs to
a statement or to a declaration. If it is used as a restart point, the
error is corrected to
<P>
<PRE>
begin real identifier ; identifier ; real identifier ...
</PRE>
<P>
This corresponds to a transition from the declaration part into the
statement part of the block, a frequent cause for error avalanches.
In general, terminals like 'identifier' or 'number' are not feasible
as restart points.
<P>
The specification
<P>
<PRE>
$SKIP 'identifier' . 'integer_number' . 'real_number' .
</PRE>
<P>
in a type <TT>`.perr'</TT> file defines the mentioned terminals as
unsafe restart points. Unsafe restart points are skipped in case
of an error in order to search for restart points more feasible.
<P>
With the above specification the second identifier in the mentioned
example will be skipped. Parsing resumes at the following semicolon
without carrying out a transition to the statement part.
<P>
<HR size=1 noshade width=600 align=left>
<P>
<IMG SRC="gifs/empty.gif" WIDTH=25 HEIGHT=25 ALT=""><A HREF="syntax_4.html"><IMG SRC="gifs/prev.gif" ALT="Previous Chapter" BORDER="0"></A>
<IMG SRC="gifs/empty.gif" WIDTH=25 HEIGHT=25 ALT=""><A HREF="syntax_6.html"><IMG SRC="gifs/next.gif" ALT="Next Chapter" BORDER="0"></A>
<IMG SRC="gifs/empty.gif" WIDTH=25 HEIGHT=25 ALT=""><A HREF="syntax_toc.html"><IMG SRC="gifs/up.gif" ALT="Table of Contents" BORDER="0"></A>
<IMG SRC="gifs/empty.gif" WIDTH=25 HEIGHT=25 ALT="">
<HR size=1 noshade width=600 align=left>
</TD>
</TR>
</TABLE>
</BODY></HTML>
|