<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>Error Handling and Recovery</title>
</head>
<body bgcolor="#FFFFFF">
<h2><a name="_bb1"></a><a name="lexicalanalysis">Error
Handling and Recovery</a></h2>
<p>All syntactic and semantic errors cause parser exceptions to be thrown. In particular,
the methods used to match tokens in the parser base class (match et al) throw
MismatchedTokenException. If the lookahead predicts no alternative of a production in
either the parser or lexer, then a NoViableAltException is thrown. The methods in the
lexer base class used to match characters (match et al) throw analogous exceptions.</p>
<p>ANTLR will generate default error-handling code, or you may specify your own exception
handlers. Either case results (where supported by the language) in the creation of a <tt>try/catch</tt>
block. Such <tt>try{}</tt> blocks surround the generated code for the grammar element of
interest (rule, alternate, token reference, or rule reference). If no exception handlers
(default or otherwise) are specified, then the exception will propagate all the way out of
the parser to the calling program. </p>
<p>ANTLR's default exception handling is good to get something working, but you will have
more control over error-reporting and resynchronization if you write your own exception
handlers. </p>
<p>Note that the '@' exception specification of PCCTS 1.33 does not apply to ANTLR.</p>
<h3><a name="ANTLR Exception Hierarchy">ANTLR Exception Hierarchy</a></h3>
<p>ANTLR-generated parsers throw exceptions to signal recognition errors or other stream
problems. All exceptions derive from <font face="Courier New">ANTLRException</font>.
The following diagram shows the hierarchy:</p>
<p><img src="ANTLRException.gif" width="646" height="263"
alt="ANTLRException.gif (14504 bytes)"></p>
<table border="0" width="100%">
<tr>
<th width="50%">Exception</th>
<th width="50%">Description</th>
</tr>
<tr>
<td width="50%" align="left" valign="top"><small><font face="Courier New">ANTLRException</font></small></td>
<td width="50%">Root of the exception hierarchy. You can subclass this directly if
you want to define your own exceptions, unless they belong more properly under one of the
specific exceptions below.</td>
</tr>
<tr>
<td width="50%" align="left" valign="top"></td>
<td width="50%"></td>
</tr>
<tr>
<td width="50%" align="left" valign="top"><small><font face="Courier New">CharStreamException</font></small></td>
<td width="50%">Something bad that happens on the character input stream. Most of
the time it will be an IO problem, but you could define an exception for input coming from
a dialog box or whatever.</td>
</tr>
<tr>
<td width="50%" align="left" valign="top"><small><font face="Courier New">CharStreamIOException</font></small></td>
<td width="50%">The character input stream had an IO exception (e.g., <font
face="Courier New">CharBuffer.fill()</font> can throw this). If <font
face="Courier New">nextToken()</font> sees this, it will convert it to a <font
face="Courier New">TokenStreamIOException</font>.</td>
</tr>
<tr>
<td width="50%" align="left" valign="top"></td>
<td width="50%"></td>
</tr>
<tr>
<td width="50%" align="left" valign="top"><small><font face="Courier New">RecognitionException</font></small></td>
<td width="50%">A generic recognition problem with the input. Use this as your
"catch all" exception in your main() or other method that invokes a parser,
lexer, or treeparser. All parser rules throw this exception.</td>
</tr>
<tr>
<td width="50%" align="left" valign="top"><small><font face="Courier New">MismatchedCharException</font></small></td>
<td width="50%">Thrown by CharScanner.match() when it is looking for a character, but
finds a different one on the input stream.</td>
</tr>
<tr>
<td width="50%" align="left" valign="top"><small><font face="Courier New">MismatchedTokenException</font></small></td>
<td width="50%">Thrown by Parser.match() when it is looking for a token, but finds a
different one on the input stream.</td>
</tr>
<tr>
<td width="50%" align="left" valign="top"><small><font face="Courier New">NoViableAltException</font></small></td>
<td width="50%">The parser finds an unexpected token; that is, it finds a token that does
not begin any alternative in the current decision.</td>
</tr>
<tr>
<td width="50%" align="left" valign="top"><small><font face="Courier New">NoViableAltForCharException</font></small></td>
<td width="50%">The lexer finds an unexpected character; that is, it finds a character
that does not begin any alternative in the current decision.</td>
</tr>
<tr>
<td width="50%" align="left" valign="top"><small><font face="Courier New">SemanticException</font></small></td>
<td width="50%">Used to indicate syntactically valid, but nonsensical or otherwise bogus
input was found on the input stream. This exception is thrown automatically by
failed, validating semantic predicates such as:<pre>a : A {false}? B ;</pre>
<p>ANTLR generates:</p>
<pre><small>match(A);
if (!(false)) throw new SemanticException("false");
match(B);</small></pre>
<p>You can throw this exception yourself during the parse if one of your actions
determines that the input is invalid.</p></td>
</tr>
<tr>
<td width="50%" align="left" valign="top"></td>
<td width="50%"></td>
</tr>
<tr>
<td width="50%" align="left" valign="top"><small><font face="Courier New">TokenStreamException</font></small></td>
<td width="50%">Indicates that something went wrong while generating a stream of tokens.</td>
</tr>
<tr>
<td width="50%" align="left" valign="top"><small><font face="Courier New">TokenStreamIOException</font></small></td>
<td width="50%">Wraps an IOException in a <font face="Courier New">TokenStreamException</font></td>
</tr>
<tr>
<td width="50%" align="left" valign="top"><small><font face="Courier New">TokenStreamRecognitionException</font></small></td>
<td width="50%">Wraps a <font face="Courier New">RecognitionException</font> in a <font
face="Courier New">TokenStreamException</font> so you can pass it along on a stream.</td>
</tr>
<tr>
<td width="50%" align="left" valign="top"><small><font face="Courier New">TokenStreamRetryException</font></small></td>
<td width="50%">Signals aborted recognition of current token. Try to get one again. Used
by <small><font face="Courier New">TokenStreamSelector.retry()</font></small> to force <font
face="Courier New">nextToken()</font> of stream to re-enter and retry. See the
examples/java/includeFile directory.<p>This is a great way to handle nested include files and
so on, or to try out multiple grammars to see which appears to fit the data. You can
have something listen on a socket for multiple input types without knowing which type will
show up when.</p></td>
</tr>
</table>
<p><a name="_bb2"></a>A typical main method, or any other code that invokes the parser, wraps the
invocation in a try/catch:</p>
<pre>try {
    ...
}
catch (TokenStreamException e) {
    System.err.println("problem with stream: "+e);
}
catch (RecognitionException re) {
    System.err.println("bad input: "+re);
}</pre>
<p>Lexer rules throw <font face="Courier New">RecognitionException</font>, <font
face="Courier New">CharStreamException</font>, and <font face="Courier New">TokenStreamException</font>.</p>
<p>Parser rules throw <font face="Courier New">RecognitionException</font> and <font
face="Courier New">TokenStreamException</font>.</p>
<h3><a name="Modifying Default Error Messages With Paraphrases">Modifying Default Error
Messages With Paraphrases</a></h3>
<p>The name or definition of a token in your lexer is rarely meaningful to the user of
your recognizer or translator. For example, instead of seeing</p>
<pre>T.java:1:9: expecting ID, found ';'</pre>
<p>you can have the parser generate:</p>
<pre>T.java:1:9: expecting an identifier, found ';'</pre>
<p>ANTLR provides an easy way to specify a string to use in place of the token name.
In the definition for ID, use the paraphrase option:</p>
<pre>ID
options {
    paraphrase = "an identifier";
}
    : ('a'..'z'|'A'..'Z'|'_')
      ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
    ;</pre>
<p>Note that this paraphrase goes into the token types text file (ANTLR's persistence
file). In other words, a grammar that uses this vocabulary will also use the
paraphrase. </p>
<h3><a name="ParserExceptionHandling">Parser Exception Handling</a></h3>
<p>ANTLR generates recursive-descent recognizers. Since recursive-descent recognizers
operate by recursively calling the rule-matching methods, this results in a call stack
that is populated by the contexts of the recursive-descent methods. Parser exception
handling for grammar rules is a lot like exception handling in a language like C++ or
Java. Namely, when an exception is thrown, the normal thread of execution is stopped, and
functions on the call stack are exited sequentially until one is encountered that wants to
catch the exception. When an exception is caught, execution resumes at that point. </p>
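<p>The unwinding described above can be sketched in plain Java. This is a minimal
illustration, not ANTLR-generated code: <tt>SketchRecognitionException</tt>, <tt>UnwindSketch</tt>,
and the rule names <tt>stat</tt>, <tt>expr</tt>, and <tt>atom</tt> are hypothetical stand-ins
chosen for this example.</p>

```java
// Hypothetical stand-in; this is NOT antlr.RecognitionException.
class SketchRecognitionException extends Exception {
    SketchRecognitionException(String msg) { super(msg); }
}

class UnwindSketch {
    static final StringBuilder trace = new StringBuilder();

    // Innermost rule method: detects the error and throws.
    static void atom() throws SketchRecognitionException {
        trace.append("atom ");
        throw new SketchRecognitionException("unexpected token");
    }

    // Middle rule method: has no handler, so the exception passes through it.
    static void expr() throws SketchRecognitionException {
        trace.append("expr ");
        atom();
        trace.append("expr-done "); // never reached once atom() throws
    }

    // Outer rule method: catches the exception; execution resumes after the catch.
    static void stat() {
        trace.append("stat ");
        try {
            expr();
        } catch (SketchRecognitionException ex) {
            trace.append("caught ");
        }
        trace.append("stat-done");
    }

    static String run() {
        trace.setLength(0);
        stat();
        return trace.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

<p>The trace records that <tt>expr</tt> is abandoned part-way through and that control
picks up again inside <tt>stat</tt>, the first method on the call stack with a matching handler.</p>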
<p>In ANTLR, parser exceptions are thrown when (a) there is a syntax error, (b) there
is a failed validating semantic predicate, or (c) you throw a parser exception from an
action. </p>
<p>In all cases, the recursive-descent functions on the call stack are exited until an
exception handler is encountered for that exception type or one of its base classes (in
non-object-oriented languages, the hierarchy of exception types is not implemented by a
class hierarchy). Exception handlers arise in one of two ways. First, if you do nothing,
ANTLR will generate a default exception handler for every parser rule. The default
exception handler will report an error, sync to the follow set of the rule, and return
from that rule. Second, you may specify your own exception handlers in a variety of ways,
as described later. </p>
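<p>The three steps of the default handler (report, sync to the follow set, return) can be
modelled roughly as follows. This is a simplified sketch under stated assumptions, not the
code ANTLR emits: tokens are plain strings, the follow set of <tt>decl</tt> is assumed to be
<tt>"int"</tt> or end of input, and every class and method name here is hypothetical.</p>

```java
import java.util.Arrays;

// Hypothetical stand-in; this is NOT antlr.RecognitionException.
class SyncSketchException extends Exception {
    SyncSketchException(String msg) { super(msg); }
}

class DefaultHandlerSketch {
    private final String[] tokens;   // token types, terminated by "EOF"
    private int pos = 0;
    final StringBuilder errors = new StringBuilder();

    DefaultHandlerSketch(String... tokens) { this.tokens = tokens; }

    String LA() { return tokens[pos]; }

    void consume() { if (!LA().equals("EOF")) pos++; }

    void match(String expected) throws SyncSketchException {
        if (!LA().equals(expected)) {
            throw new SyncSketchException("expecting " + expected + ", found " + LA());
        }
        consume();
    }

    // Skip tokens until the lookahead is in the follow set or input ends.
    void consumeUntil(String... follow) {
        while (!LA().equals("EOF") && !Arrays.asList(follow).contains(LA())) {
            consume();
        }
    }

    // decl : "int" ID SEMI ;   (follow set assumed to be { "int", EOF })
    void decl() {
        try {
            match("int");
            match("ID");
            match("SEMI");
        } catch (SyncSketchException ex) {
            errors.append(ex.getMessage()).append('\n'); // 1. report the error
            consumeUntil("int", "EOF");                  // 2. sync to the follow set
        }                                                // 3. return from the rule
    }

    static String run() {
        // The first declaration is missing its ID; the second is well formed.
        DefaultHandlerSketch p =
            new DefaultHandlerSketch("int", "SEMI", "int", "ID", "SEMI", "EOF");
        p.decl();
        p.decl();
        return p.errors.toString() + "next=" + p.LA();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

<p>Because the handler resynchronizes before returning, the caller can invoke <tt>decl</tt>
again and parse the second declaration cleanly after the first one fails.</p>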
<p>If you specify an exception handler for a rule, then the default exception handler is
not generated for that rule. In addition, you may control the generation of default
exception handlers with a <a href="options.html#defaultErrorHandler">per-grammar or
per-rule option</a>. </p>
<h3><a name="SpecifyingParserException-Handlers">Specifying Parser Exception-Handlers</a></h3>
<p>You may attach exception handlers to a rule, an alternative, or a labeled element. The
general form for specifying an exception handler is:</p>
<pre><tt>
exception [label]
catch [exceptionType exceptionVariable]
{ action }
catch ...
catch ...
</tt></pre>
<p>where the label is only used for attaching exceptions to labeled elements. The <tt>exceptionType</tt>
is the exception (or class of exceptions) to catch, and the <tt>exceptionVariable</tt> is
the variable name of the caught exception, so that the action can process the exception if
desired. Here is an example that catches an exception for the rule, for an alternate and
for a labeled element: </p>
<pre><tt>
rule:   a:A B C
    |   D E
        exception // for alternate
        catch [RecognitionException ex] {
            reportError(ex.toString());
        }
    ;
    exception // for rule
    catch [RecognitionException ex] {
        reportError(ex.toString());
    }
    exception[a] // for a:A
    catch [RecognitionException ex] {
        reportError(ex.toString());
    }
</tt> </pre>
<p>Note that exceptions attached to alternates and labeled elements <b>do not</b> cause
the rule to exit. Matching and control flow continues as if the error had not occurred.
Because of this, you must be careful not to use any variables that would have been set by
a successful match when an exception is caught. </p>
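<p>The hazard can be illustrated with a simplified Java model of a rule whose handler is
attached to a single labeled element. This is only a sketch: the classes, the string-based
tokens, and the way the label variable is assigned are assumptions made for the example and
do not reproduce the code ANTLR actually generates.</p>

```java
// Hypothetical stand-in; this is NOT antlr.RecognitionException.
class ElementSketchException extends Exception {
    ElementSketchException(String msg) { super(msg); }
}

class ElementHandlerSketch {
    private final String[] tokens;   // token types, terminated by "EOF"
    private int pos = 0;
    final StringBuilder log = new StringBuilder();

    ElementHandlerSketch(String... tokens) { this.tokens = tokens; }

    // Returns the matched token, or throws without consuming anything.
    String match(String expected) throws ElementSketchException {
        String found = tokens[pos];
        if (!found.equals(expected)) {
            throw new ElementSketchException("expecting " + expected + ", found " + found);
        }
        pos++;
        return found;
    }

    // rule : a:A B ;   with a handler attached only to the labeled element a:A
    String rule() throws ElementSketchException {
        String a = null;                 // set only by a successful match
        try {
            a = match("A");
        } catch (ElementSketchException ex) {
            log.append("reported: ").append(ex.getMessage());
        }
        // Control continues here even after an error, so 'a' may be unset.
        String b = match("B");
        return (a == null ? "a unset" : "a=" + a) + ", b=" + b;
    }

    static String run() {
        try {
            return new ElementHandlerSketch("B", "EOF").rule();
        } catch (ElementSketchException ex) {
            return "failed: " + ex.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

<p>In this model the rule keeps going after the element-level handler runs, which is why any
later action has to check whether the value it depends on was ever set.</p>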
<h3><a name="Default Exception Handling in the Lexer">Default Exception Handling in the
Lexer</a></h3>
<p>Normally you want the lexer to keep trying to get a valid token upon lexical error.
That way, the parser doesn't have to deal with lexical errors and ask for another
token. Sometimes you want exceptions to pop out of the lexer--usually when you want
to abort the entire parsing process upon syntax error. To get ANTLR to generate
lexers that pass <font face="Courier New">RecognitionException</font>s on to the parser
as <font face="Courier New">TokenStreamException</font>s, use the <font
face="Courier New">defaultErrorHandler=false</font> grammar option. Note that IO
exceptions are passed back as <font face="Courier New">TokenStreamIOException</font>s
regardless of this option.</p>
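<p>The difference between the two settings can be sketched with a toy lexer in plain Java.
The classes below are hypothetical stand-ins for the ANTLR exception types, and the
<tt>defaultErrorHandler</tt> constructor flag is an assumption of this sketch that only mimics
the effect of the grammar option; real generated lexers are more involved.</p>

```java
// Hypothetical stand-ins for the ANTLR exception types of similar names.
class LexRecognitionException extends Exception {
    LexRecognitionException(String msg) { super(msg); }
}

class LexStreamException extends Exception {
    LexStreamException(String msg) { super(msg); }
}

// Wraps a recognition error so it can travel along the token stream.
class LexStreamRecognitionException extends LexStreamException {
    final LexRecognitionException recog;
    LexStreamRecognitionException(LexRecognitionException re) {
        super(re.getMessage());
        this.recog = re;
    }
}

class LexerSketch {
    private final String input;
    private final boolean defaultErrorHandler;
    private int pos = 0;
    final StringBuilder reported = new StringBuilder();

    LexerSketch(String input, boolean defaultErrorHandler) {
        this.input = input;
        this.defaultErrorHandler = defaultErrorHandler;
    }

    String nextToken() throws LexStreamException {
        while (true) {
            if (pos == input.length()) return "EOF";
            try {
                return rule();
            } catch (LexRecognitionException ex) {
                if (!defaultErrorHandler) {
                    throw new LexStreamRecognitionException(ex); // hand it to the parser
                }
                reported.append(ex.getMessage()).append('\n');   // report,
                pos++;                                           // drop the character, retry
            }
        }
    }

    private String rule() throws LexRecognitionException {
        char c = input.charAt(pos);
        if (Character.isLetter(c)) { pos++; return "ID"; }
        if (c == ';') { pos++; return "SEMI"; }
        throw new LexRecognitionException("unexpected char: " + c);
    }

    static String run(boolean defaultErrorHandler) {
        LexerSketch lexer = new LexerSketch("a#;", defaultErrorHandler);
        StringBuilder out = new StringBuilder();
        try {
            String t;
            do {
                t = lexer.nextToken();
                out.append(t).append(' ');
            } while (!t.equals("EOF"));
        } catch (LexStreamException ex) {
            out.append("aborted: ").append(ex.getMessage());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run(true));
        System.out.println(run(false));
    }
}
```

<p>With the handler enabled the caller never sees the bad character; with it disabled the
first lexical error ends the token stream and the caller decides what to do next.</p>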
<p>Here is an example that uses a bogus semantic exception (which is a subclass of <font
face="Courier New">RecognitionException</font>) to demonstrate blasting out of the lexer:</p>
<pre>class P extends Parser;
{
    public static void main(String[] args) {
        L lexer = new L(System.in);
        P parser = new P(lexer);
        try {
            parser.start();
        }
        catch (Exception e) {
            System.err.println(e);
        }
    }
}

start : "int" ID (COMMA ID)* SEMI ;

class L extends Lexer;
options {
    defaultErrorHandler=false;
}

{int x=1;}

ID  : ('a'..'z')+ ;

SEMI: ';'
      {if ( <em>expr</em> )
          throw new
              SemanticException("test",
                                getFilename(),
                                getLine());} ;

COMMA:',' ;

WS  : (' '|'\n'{newline();})+
      {$setType(Token.SKIP);}
    ;</pre>
<p>When you type in, say, "<font face="Courier New">int b;</font>" you get the
following as output:</p>
<pre>antlr.TokenStreamRecognitionException: test</pre>
<pre><font face="Arial" size="2">Version: $Id: //depot/code/org.antlr/release/antlr-2.7.7/doc/err.html#2 $</font></pre>
</body>
</html>