<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage 2.0">
<title>Geyacc: Error Recovery</title>
</head>
<body bgcolor="#FFFFFF">
<table border="0" width="100%">
<tr>
<td><font size="6"><strong>Error Recovery</strong></font></td>
<td align="right"><a href="context.html"><img
src="../image/previous.gif" alt="Previous" border="0"
width="40" height="40"></a><a href="skeleton.html"><img
src="../image/next.gif" alt="Next" border="0" width="40"
height="40"></a></td>
</tr>
</table>
<hr size="1">
<p>It is not usually acceptable to have a program terminate on a
parse error. For example, a compiler should recover sufficiently
to parse the rest of the input file and check it for errors; a
calculator should accept another expression.</p>
<p>In a simple interactive command parser where each input is one
line, it may be sufficient to allow routine <font color="#008080"><em><tt>parse</tt></em></font>
to terminate on error and have the caller ignore the rest of the
input line when that happens (and then call <font color="#008080"><em><tt>parse</tt></em></font>
again). But this is inadequate for a compiler, because it forgets
all the syntactic context leading up to the error. A syntax error
deep within a function in the compiler input should not cause the
compiler to treat the following line like the beginning of a
source file.</p>
<p>You can define how to recover from a syntax error by writing
rules to recognize the special token <font color="#0000FF"><tt>error</tt></font>.
This is a terminal symbol that is always defined (you need not
declare it) and reserved for error handling. The <em>geyacc</em>
parser generates an <font color="#0000FF"><tt>error</tt></font>
token whenever a syntax error happens; if you have provided a
rule to recognize this token in the current context, the parse
can continue. For example:</p>
<blockquote>
<pre><font color="#800080">stmnts</font><font color="#0000FF">: </font><font
color="#008080">-- Empty</font><font color="#0000FF">
| </font><font color="#800080">stmnts</font><font
color="#0000FF"> </font><font color="#FF0000">'\n'</font><font
color="#0000FF">
| </font><font color="#800080">stmnts exp</font><font
color="#0000FF"> </font><font color="#FF0000">'\n'</font><font
color="#0000FF">
| </font><font color="#800080">stmnts</font><font
color="#0000FF"> error </font><font color="#FF0000">'\n'</font><font
color="#0000FF">
;</font></pre>
</blockquote>
<p>The fourth rule in this example says that an error followed by
a newline makes a valid addition to any <font color="#800080"><tt>stmnts</tt></font>.
What happens if a syntax error occurs in the middle of an <font
color="#800080"><tt>exp</tt></font>? The error recovery rule,
interpreted strictly, applies to the precise sequence of a <font
color="#800080"><tt>stmnts</tt></font>, an <font color="#0000FF"><tt>error</tt></font>
and a newline. If an error occurs in the middle of an <font
color="#800080"><tt>exp</tt></font>, there will probably be some
additional tokens and subexpressions on the stack after the last <font
color="#800080"><tt>stmnts</tt></font>, and there will be tokens
to read before the next newline. So the rule is not applicable in
the ordinary way.</p>
<p>But <em>geyacc</em> can force the situation to fit the rule,
by discarding part of the semantic context and part of the input.
First it discards states and objects from the stack until it gets
back to a state in which the <font color="#0000FF"><tt>error</tt></font>
token is acceptable. (This means that the subexpressions already
parsed are discarded, back to the last complete <font
color="#800080"><tt>stmnts</tt></font>.) At this point the <font
color="#0000FF"><tt>error</tt></font> token can be shifted. Then,
if the old look-ahead token cannot be shifted next,
the parser reads tokens and discards them until it finds one
that can. In this example, <em>geyacc</em> reads and
discards input until the next newline so that the fourth rule can
apply.</p>
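<p>The steps above can be followed on a small input. The trace
below is only an illustrative sketch: it assumes that <font
color="#800080"><tt>exp</tt></font> is defined by the hypothetical
rules <tt>exp: NUM | exp '+' exp</tt> (they are not shown on this
page), that the input line is <tt>1 + + 2</tt> followed by a
newline, and that the empty rule has already been reduced to <font
color="#800080"><tt>stmnts</tt></font>:</p>
<blockquote>
<pre>Stack (symbols)      Next token   Parser step
stmnts               NUM          shift
stmnts NUM           '+'          reduce NUM to exp
stmnts exp           '+'          shift
stmnts exp '+'       '+'          syntax error detected
stmnts               '+'          exp and '+' popped; error is acceptable here
stmnts error         '+'          error shifted; '+' cannot follow, discard it
stmnts error         NUM          cannot follow error, discard it
stmnts error         '\n'         shift
stmnts error '\n'                 reduce by the fourth rule
stmnts</pre>
</blockquote>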
<p>The choice of error rules in the grammar is a choice of
strategies for error recovery. A simple and useful strategy is
to skip the rest of the current input line or current
statement when an error is detected:</p>
<blockquote>
<pre><font color="#800080">stmnt</font><font color="#0000FF">: error</font> <font
color="#FF0000">';'</font>
<font color="#008080">-- on error, skip until ';' is read.
</font><font color="#0000FF">;</font></pre>
</blockquote>
<p>It is also useful to recover to the matching close-delimiter
of an opening-delimiter that has already been parsed. Otherwise
the close-delimiter will probably appear to be unmatched, and
generate another, spurious error message:</p>
<blockquote>
<pre><font color="#800080">primary</font><font
color="#0000FF">: </font><font color="#FF0000">'(' </font><font
color="#800080">expr</font><font color="#FF0000"> ')'</font><font
color="#0000FF">
| </font><font color="#FF0000">'(' </font><font
color="#0000FF">error</font><font color="#FF0000"> ')'</font><font
color="#0000FF">
</font>...<font color="#0000FF">
;</font></pre>
</blockquote>
<p>Error recovery strategies are necessarily guesses. When they
guess wrong, one syntax error often leads to another. In the <font
color="#800080"><tt>stmnt</tt></font> example above, the error
recovery rule guesses that an error is
due to bad input within one <font color="#800080"><tt>stmnt</tt></font>.
Suppose that instead a spurious semicolon is inserted in the
middle of a valid <font color="#800080"><tt>stmnt</tt></font>.
After the error recovery rule recovers from the first error,
another syntax error will be found straightaway, since the text
following the spurious semicolon is also an invalid <font
color="#800080"><tt>stmnt</tt></font>.</p>
<p>To prevent an outpouring of error messages, the parser will
output no error message for another syntax error that happens
shortly after the first; only after three consecutive input
tokens have been successfully shifted will error messages resume.</p>
<p>Note that rules which accept the <font color="#0000FF"><tt>error</tt></font>
token may have actions, just as any other rules can.</p>
<p>You can make error messages resume immediately by using the
routine <a href="actions.html#recover"><font color="#008080"><em><tt>recover</tt></em></font></a>
from class <font color="#008080"><em><tt>YY_PARSER</tt></em></font> in
an action. If you do this in the error rule's action, no error
messages will be suppressed.</p>
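<p>As a minimal sketch, the error rule from the first example
could call <font color="#008080"><em><tt>recover</tt></em></font>
in its action, so that a syntax error on the very next line is
reported instead of being suppressed:</p>
<blockquote>
<pre><font color="#800080">stmnts</font><font color="#0000FF">: </font><font color="#008080">-- Empty</font>
<font color="#0000FF">    | </font><font color="#800080">stmnts</font> <font color="#FF0000">'\n'</font>
<font color="#0000FF">    | </font><font color="#800080">stmnts exp</font> <font color="#FF0000">'\n'</font>
<font color="#0000FF">    | </font><font color="#800080">stmnts</font><font color="#0000FF"> error </font><font color="#FF0000">'\n'</font>
<font color="#008080">        { recover }  -- resume error messages at once</font>
<font color="#0000FF">    ;</font></pre>
</blockquote>
<p>This suits line-oriented input, where each line is independent
and an error on one line says little about the next. In a grammar
where errors tend to cascade, leaving the default suppression in
place may be the better choice.</p>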
<p>The previous look-ahead token is reanalyzed immediately after
an error. If this is unacceptable, then the routine <a
href="actions.html#clear_token"><font color="#008080"><em><tt>clear_token</tt></em></font></a>
may be used to clear this token. For example, suppose that on a
parse error, an error handling routine is called that advances
the input stream to some point where parsing should once again
commence. The next symbol returned by the lexical scanner is
probably correct. The previous look-ahead token ought to be
discarded with <font color="#008080"><em><tt>clear_token</tt></em></font>.</p>
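<p>The scenario just described might look like the sketch below.
The routine <tt>skip_to_next_statement</tt> is hypothetical: it
stands for whatever user-written code repositions the input
stream, and is not provided by <em>geyacc</em>. Only <font
color="#008080"><em><tt>clear_token</tt></em></font> comes from
the parser class:</p>
<blockquote>
<pre><font color="#800080">stmnt</font><font color="#0000FF">: error</font>
<font color="#008080">        {
            skip_to_next_statement  -- hypothetical user routine
            clear_token             -- drop the stale look-ahead token
        }</font>
<font color="#0000FF">    ;</font></pre>
</blockquote>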
<p>The boolean feature <a href="actions.html#is_recovering"><font
color="#008080"><em><tt>is_recovering</tt></em></font></a> is
true when the parser is recovering from a syntax error, and false
the rest of the time. A true value indicates that error messages
are currently suppressed for new syntax errors.</p>
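<p>One possible use, shown here only as a hedged sketch, is to
keep an action from issuing its own diagnostics while the parser
is still resynchronizing. The routine <tt>warn_unused_value</tt>
is hypothetical and would have to be written in the parser class;
the rule itself is an assumption made for illustration:</p>
<blockquote>
<pre><font color="#800080">stmnt</font><font color="#0000FF">: </font><font color="#800080">exp</font> <font color="#FF0000">';'</font>
<font color="#008080">        {
            if not is_recovering then
                warn_unused_value  -- hypothetical user routine
            end
        }</font>
<font color="#0000FF">    ;</font></pre>
</blockquote>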
<hr size="1">
<table border="0" width="100%">
<tr>
<td><address>
<font size="2"><b>Copyright 1997</b></font><font
size="1"><b>, </b></font><font size="2"><strong>Eric
Bezault</strong></font><strong> </strong><font
size="2"><br>
<strong>mailto:</strong></font><a
href="mailto:ericb@gobosoft.com"><font size="2">ericb@gobosoft.com</font></a><font
size="2"><br>
<strong>http:</strong></font><a
href="http://www.gobosoft.com"><font size="2">//www.gobosoft.com</font></a><font
size="2"><br>
<strong>Last Updated:</strong> 7 September 1997</font><br>
<!--webbot bot="PurpleText"
preview="
$Date: 1999/06/12 18:55:57 $
$Revision: 1.8 $"
-->
</address>
</td>
<td align="right" valign="top"><a
href="http://www.gobosoft.com"><img
src="../image/home.gif" alt="Home" border="0" width="40"
height="40"></a><a href="index.html"><img
src="../image/toc.gif" alt="Toc" border="0" width="40"
height="40"></a><a href="context.html"><img
src="../image/previous.gif" alt="Previous" border="0"
width="40" height="40"></a><a href="skeleton.html"><img
src="../image/next.gif" alt="Next" border="0" width="40"
height="40"></a></td>
</tr>
</table>
</body>
</html>