<title>Syntax error handling</title>
<h1 align="center">Syntax error handling<br />>=======================<</h1>
[This file explains how netrik handles syntax errors in HTML pages, why it does
so, and what to do about that. See index.txt or
<a href="index.html">index.html</a> for an overview of available
<h2>Why does netrik always complain about HTML syntax errors??</h2>
<h3>No matter what site I load, (almost) always I get error messages. That can't be OK?!</h3>
Well, it *is* OK... The problem is: Actually almost all sites *are* more or
Other browsers simply ignore the errors, hoping that the effect will resemble
more or less what the author intended -- and nobody ever knows, but for some
little blemish maybe, where no one cares about. (If Netscape wasn't so tolerant
about syntax errors in the first place, we wouldn't have that problem today :-(
) However, the browser can only "guess" what the author intended, and that
doesn't always work out; such errors can cause the page to be layouted
completely wrong, including missing text parts etc. That's why netrik warns
about them, so the user at least knows what the matter is, and also gets the
chance to tell the page author about the problem. (s.b.)
Note that XHTML even *requires* a browser to abort on syntax errors -- sadly,
XHTML is not very popular. (Yet?...)
Of course, it might also happen that netrik sees an error where there is none
-- but this is rather rare, and would indicate a bug in netrik. Of course, if
you encounter such a situation, don't hesitate to send a bug report. ($
<h2>Can't I turn that off?</h2>
Well, actually, you can: There are the two options "--broken-html" and
"--ignore-broken". "--ignore-broken" will prevent netrik complaining about
*any* syntax errors, while "--broken-html" only turns off warnings about common
errors which in most cases can be guessed correctly and nicely worked around.
(But not always!)
Presently, "--broken-html" is the default, so less critical erros won't cause
netrik to stop and wait for keypress. (The error messages are still displayed,
but they just scroll through in this mode.) This is because the current
interface makes the warnings quite obtrusive; we will change the default again
when a better UI will allow displaying the messages in a less disturbing
However, please avoid usage of these options if possible. While "--broken-html"
probably can't be helped for now, it seems generally a bad idea to use
Still, please take the (little) trouble to tell the page author about the
<h2>Reporting a problem</h2>
Normally you will find an e-mail address of the right contact at the bottom of
the page, or on some "contact" page. Most authors will be greatful for the
information, and will gladly fix the problem -- once and for all.
You can improve your chances by telling exactly what the problem is. To find
out, you can load the page again, but using the "--debug" option. While parsing
the page, netrik will dump the source in this mode. When an error is
encountered, the last charactacter printed before the error message is the
offending one. Note however that in some cases the real reason for the problem
may be somewhere before.
If the output is too big, you may either use the --dump option and pipe the
output through some pager (but note that the debug output is written to stderr,
so you'll need to redirect it: "netrik --debug --dump <url> 2>&1 |less -R" or
something the like), or you can use the "--fussy-html" option, causing netrik
to abort after the first error is detected.
Alternatively, you can use the W3Cs (WWW Consortium) HTML validating service.
This will create a report with all problems nicely listed -- very helpful for
the page author.
<h2>A note on comments</h2>
Netrik implements full SGML comment parsing. The problem is that the comment
syntax in SGML is fairly complicated; most authors do not know it, and thus do
not know that putting a "--" string inside a comment normally generates errors.
In most cases these are quite obvious; netrik will print an error message, and
use a workaround which will work in most cases, but not always -- just like for
other typical errors.
However, there are some constructs which *are* valid SGML -- but do not make
what the author probably intended. "<!------>" is a typical example: It is
valid, but the comment doesn't terminate; everything behind it will also be
treated as part of the comment! As it is not an error, netrik can't print an
error message or apply a workaround; instead, just a warning is printed.
Often such an unterminated comment will generate other errors later, because it
interferes with other comments or because it stretches to the end of the file;
in such a case, the warning may help finding the problem. In other cases
however, no error is generated at all -- the warning is the only cue that
something went wrong.
<h2>Warnings in general</h2>
Other similar situations are thinkable. That's why netrik also prints warnings
on things that are strictly speaking valid hatml, but explicitly discouraged
in the HTML standard -- while some browsers will handle them correctly, others
might do otherwise; or sometimes a really correct behaviour even produces
results different from what the author intended and sees in popular browsers.
Warnings will normally (in --valid-html or --broken-html mode) only scroll
through, but without pausing with an additional message after parsing is
finished. To see all warnings, use --clean-html .