<head><title>Dark Side of the HTML</title></head>
<body bgcolor="#101010" text="#d0d0d0" link="#ffc0c0" vlink="#ff8080">
<table border="0" width="100%">
<tr>
<td width="20%"> </td>
<td><p><font size="-1">
<p>"...Decent surfing value..."<br>
"...A long rant..."<br>
"... If you ignore the spelling and grammatical errors, ... you will find this enlightening..."<br>
<p>
</font>
</td>
</tr>
</table>
<br>
<h1><tt>D a r k</tt><br>
<tt>S i d e</tt><br>
<tt>O f</tt><br>
<tt>T h e</tt><br>
<tt><img src="html.gif" align = top alt = "HTML"></tt></h1>
<table border="0" width="100%">
<tr>
<td width="40%"> </td>
<td>
There are more things in heaven and earth, Horatio,<br>
Than are dreamt of in your philosophy.
</td>
</tr>
<tr><td width = "40%"> </td><td align=right>Shakespeare.</td></tr>
</table>
<p>
this is <b><i>overlapping bold italic text</b></i>
<br>
<h3><a name="intro">Historical Note.</a></h3>
<dl><dd>
<font size="-1">This page has been around for so damn long (at least by Internet Time standards) without
<i>any</i> modifications that it ended up living in its own space-time continuum, without any
visible correlation to ours.
I felt compelled to do something about it, and after spending countless sleepless
nights thinking about how to avoid rewriting the whole thing, I came up with this
<a href="bkgrnote.html">historical background</a> note.
</font>
</dl>
<h4>Latest developments.</h4>
<dl><dd>
<font size="-1">
At the <a href="#darkest">end</a>. Mind-shattering news. <b>The</b> Darkest stuff.<br>
[<tt>updated 26-December-1996</tt>]
</font>
</dl>
<h2><a name="intro">Intro</a></h2>
<dl><dd>
It's common knowledge that all documents on the WWW should be written in
so-called HTML, aka HyperText Markup Language.
<p>Much less is known about what <i>counts</i> as HTML.
<p>There's a rather vague relationship between HTML (HyperText Markup Language)
as (almost) standardized by the Internet Engineering Task Force and whatever is
called HTML as it is implemented by WEB browsers.
To add to the confusion, there are <i>levels</i> and <i>revisions(?)</i> of HTML.
For example, there's HTML-2.0 Level-1, and HTML-3.0. HTML-2.0 is supposed to be
some sort of standard that most browsers are trying to support.
<p>To make things really obfuscated, it should be noted that HTML is a
<b><i>markup</i></b> language. Markup means that it will <i>mark</i>
the different elements of your document, but how this document will be seen by
WEB wandering individuals
is entirely up to the miscellaneous WEB browsers running on a multitude of
various operating systems.
<p>What's interesting, some of the browsers are rushing to support the
yet-to-be-defined HTML-3.0 while ignoring some basic features of
the (almost) standard HTML-2.0. Looks like HTML profanation is getting
really profound.
<p>A sudden paroxysm of critical paranoia struck me and I wrote this page.
</dl>
<h2><a name="tags">Tags</a></h2>
<dl><dd>
<p>Tags are the basis of HTML. Tags are what differentiates HTML from simple
dull dumb boring plain vanilla ASCII text. Sometimes it may seem that HTML is just
a loose collection of various tags. Unfortunately, HTML is also a
<i>language</i> -- in its own right.
Language -- that is what the last letter in HTML stands for.
But the story so far will be about tags.
<p>Very little attention is paid to what exactly a tag <i>is</i>.
Common sense says that a tag is some word (called the tag identifier)
surrounded by angle brackets.
For example, <H1> declares the beginning of a level ONE heading, and a law-abiding browser
should display the aforementioned heading in a rather big font.
The left angle bracket, a.k.a. the less-than sign, is
called the start-tag open symbol, and the right angle bracket,
a.k.a. the greater-than sign, is called the tag close symbol.
<p>Most tags should be balanced when they belong
to an element with some content, like
<pre>
<H1>This is heading number one</H1>
</pre>
-- where the opening tag is followed by the closing tag.
Note that the closing-tag open symbol is a less-than sign followed by a slash,
<tt><b></</b></tt>.
For some HTML elements the opening tag, the closing tag, or even both can be omitted.
For example, the paragraph close tag </P> can be omitted.
<h3>Diversion</h3>
<p>Being a lousy typist, I have severe trouble typing markup.
For example, to get angle brackets you have to keep pressing and releasing the SHIFT
key while tapping the less-than and greater-than keys. Typing proper
closing tags is also quite boring, especially when they are nested.
<p>So I was quite excited when I found that the HTML-2.0 language definition
allows <i>tag minimization</i>:
buried deep inside the dark mess of the HTML-2.0 SGML declaration (don't confuse it with the
DTD - Document Type Definition) was <i>the</i> magick word
SHORTTAG in the FEATURES section, and it was set to <b>YES</b>.
<p>I rushed to my keyboard and typed this:
<pre>
<H1/First minimized HTML tag ever typed by humanity/
</pre>
Nothing happened.
<p>Netscape (which I, like millions of other people, am evaluating for 90
days to decide whether to purchase an ongoing license to the Software
or not) just ignored this thing as if it weren't there. Mosaic for Windows
didn't go much further either.
<p>Slightly puzzled whether my knowledge was wrong or the browsers were screwed,
I went to the <a href="http://www.webtechs.com/html-val-svc/">
HTML Validation Service</a> (went - in the cyber sense, you know; in fact
I just did a search on Yahoo)
and found that minimized tags are perfectly legal even under the strong arm of
the <b><i>Strict</i> HTML</b> law.
<p>Now I'm entertaining myself by hounding various web browsers with
the test pages shown below.
<p>Here they are -- perfectly HTML-compliant and utterly useless, tragically
invisible and infernally hostile to any existing HTML rendering device ...
<p><b>Minimization TAGS</b>
<p><ul>
<li><b>Empty tags</b> : tags whose identifier can be omitted and will be
implied by the HTML reader (I cannot type "browser" since there's none
capable of doing so).<br>
<b>Empty start-tag</b> : consists of the start-tag open and tag close symbols
(<tt><</tt> and <tt>></tt> respectively) without any space in between.<br>
If such a tag is encountered by the HTML reader, the program will give the empty
tag the identifier of the most recently started element.
<pre>
<UL>
<LI> this is the first item of the list
<> this is second one -- implied identifier is LI
</>
</pre>
which is rendered as:
<ul>
<li> this is the first item of the list
<> this is second one -- implied identifier is LI
</>
- note that this unordered list was ended by an <i>empty end-tag</i>.<br>
<b>Empty end-tag</b>: consists of the end-tag open and tag close symbols
(i.e. <tt></></tt>)<br>
The identifier given to such a tag by the HTML program is always that of the
last element to be opened (and still open):
<pre>
Some <B>bold text with empty end tag </> -- right here.
</pre>
Now check out how your browser chews up a <a href="empty.html">page</a> with such tags.
Doesn't it look like <a href="empty1.html">this one</a>?
<p><li><b>Unclosed tags</b> : where two or more consecutive tags are required
in a document, the end delimiters of all tags except the very last one in the sequence
can be omitted:
<pre>
This text is <b<i> bold and italic at once </i</b>.
</pre>
Take a look at the <a href="unclosed.html">page</a> filled with such tags.
Obviously it should look like <a href="unclosed1.html">this</a>, eh?
<p><li><b>Null-end tags</b> : allow you to specify the end of an element with a single character,
like this:
<pre>
<H1/Header with null-end tag/
</pre>
A null-end tag consists of the start-tag open symbol followed by the tag identifier and
textual data enclosed within two null-end tag symbols (slashes).<br>
Appreciate how your browser will screw up <a href="null_end.html">such a page</a>,
which is obviously gonna look like <a href="null_end1.html">this</a>.
</ul>
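<p>As a rough sketch of what the three minimized forms above are supposed to mean,
assuming the SHORTTAG rules just described: in each pair below the first line is the
minimized markup and the second line is the fully spelled-out equivalent an SGML parser
is assumed to arrive at. The sample text is made up for illustration.
<pre>
Some <B>bold text</>
Some <B>bold text</B>

This is <b<i>both at once</i</b>
This is <b><i>both at once</i></b>

<H1/Short header/
<H1>Short header</H1>
</pre>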
<p>So far I've stress-tested the following browsers:
<ul>
<li> Netscape for Windows NT, version 1.2 and 2.0b1.
<li> Netscape for X-Windows, version 1.1N
<li> Mosaic for Windows, version 2.0
<li> Arena, version 0.98
</ul>
<p>Needless to say, none of them was capable of handling minimized tags.
<p>I have been told that the Harmony Hyper-G Text Viewer can cope with MINIMIZED tags,
but since it cannot work from behind a firewall, I was unable to verify its
capabilities.
<p>The experience with the Arena browser was the most inspiring. This browser is supposed
to be a testbed for the upcoming HTML-3.0 standard and has a little indicator telling
you whether an HTML document is bad (i.e. incorrect) or not. For <i>all</i> the
sample pages shown above that use minimized tags, it undoubtedly flashed the "Bad HTML"
sign. But these pages were Strict HTML-3.0 checked! What HTML are we
talking about after all?
<p>Oh, yes - if you think this is all a joke - go to the
<a href="http://www.webtechs.com/html-val-svc/"> HTML Validation Service</a>
and check it for yourself.
</dl></dl>
<h2> Wait, there's more ...</h2>
<dl><dd>
<p>During HTML validation, I found some things that contradicted what
I had heard just before. Nothing serious, just another little splash of critical paranoia:
<h3>Ubiquitous <P> tag</h3>
The paragraph element is one of the few elements whose <i>end-tag</i>
can be omitted.<br>
Roaming around various WEB tutorials and assorted stores of wisdom,
I've seen warnings about the bad style of putting a paragraph break after something
which implies a paragraph break by itself. Like having a <P> tag right
after </h1>. The trail ultimately led to the
<a href="correction2.html">HTML spec. page</a> which showed
two examples (I've edited them for brevity's sake):
<p><b>"Bad"</b>
<pre> <h1>What not to do</h1>
<p>This is like bad or something...
</pre>
<p><b>"Good"</b>
<pre> <h1>What to do</h1>
This is like good <p>or something...<p>
</pre>
<p>Without much hesitation, I fed both examples to the
<a href="http://www.webtechs.com/html-val-svc/"> HTML Validation Service</a>
and slammed them against strict HTML-2.0.<br>
As I expected, the results were exactly the reverse of the examples' names: the "Bad"
example passed the test and the "Good" example incurred the wrath of the compiler.
<p>After consulting the HTML-2.0 spec. I found that the Validation Service
was 100% right (it would be surprising if it weren't). In case someone doesn't know -
in HTML-2.0 the paragraph is a <i>non-empty</i> element with a mandatory start tag
and an optional end tag (<tt><P> and </P></tt> respectively). Therefore
a paragraph element in HTML-2.0 can contain an arbitrary number of subelements -
text data, emphasis, anchors, images, etc. In HTML-1.0 the paragraph was an <i>EMPTY</i> element,
which actually represented not a paragraph, but rather a paragraph break, and
had only a start tag, <tt><P></tt> -- similar to the line break
<tt><BR></tt>. I failed to find the HTML-1 DTD, but I think things are
pretty close to what I've described.
<p>Note that wedging paragraph tags <i>within</i> <h1>...</h1> <b>is</b>
an error, so don't try to catch me on this.
<p>Another note: non-strict HTML-2.0 is more relaxed, so both examples would be ok.
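<p>To illustrate the optional end tag: assuming the HTML-2.0 rules described above,
a parser closes an open paragraph as soon as the next one starts. The sample text here
is invented; the first fragment is what gets typed, and the second is a sketch of what
the parser is assumed to see.
<pre>
<H1>Heading</H1>
<P>First paragraph.
<P>Second paragraph.

<H1>Heading</H1>
<P>First paragraph.</P>
<P>Second paragraph.</P>
</pre>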
<p>After testing there was one question left - where did such an interesting
HTML specification page
come from? My guess (and I think I'm right with probability about 0.99):
these were remnants of the HTML-1.0 spec. safely decomposing in one of the dark
corners of the W3 consortium. I failed to find the head or TOC of this document.
Interestingly enough, the
<a href="correction1.html">link</a> to the HTML-1.0 spec. on the
<a href="http://www.w3.org/hypertext/WWW/MarkUp/MarkUp.html">W3 page</a> was
hopelessly broken too.
<h3>Minimal HTML document</h3>
<p>Looking at different HTML tutorials I found a surprising multitude of opinions on
what should be considered a <i>minimal</i> HTML document. Id est, what is the minimal
number of tags you should add to make your ASCII text look like a valid HTML document?
In other words, where is the thin metaphysical boundary beyond which plain ASCII
text becomes <b>HYPER</b>?
To cut off the fuzziness of the word "valid", I decided <i>valid</i>
would mean <i>fully conforming to the
HTML-2.0 (strict) specification</i> (or DTD -- for those who behold).
Since I wasn't sure myself what the minimal valid document is,
I dug up the HTML-2.0 spec and stared at it for a moment.<p>
I found the following amazing (or maybe not) facts:
<ul>
<li> An HTML document is <b>content</b> surrounded by the tags <HTML>...</HTML>
<li> <b>content</b> is <b>HEAD</b> followed by <b>BODY</b>.
<li> <b>HEAD</b> is a mandatory <b>TITLE</b> plus <i>optional</i> <b>ISINDEX</b> and <b>BASE</b>.
<li> <b>BODY</b> is a collection of <b>headings</b>, <b>text</b>, etc. repeated
<i>0 (zero)</i> or more times (note emphasis on zero -- it means all actual
information in HTML document is <i>optional</i>).
</ul>
<p>Summing up all of the above, the minimal document would look like:
<pre>
<HTML>
<HEAD>
<TITLE>Minimal HTML Document</TITLE>
</HEAD>
<BODY>
</BODY>
</HTML>
</pre>
<p>But... all the elements except <b>TITLE</b> happen to have optional
start and end tags! So unless you are a typing maniac, the minimal
HTML document would be:
<pre>
<TITLE>Minimal HTML Document</TITLE>
</pre>
<p>If an HTML document is to convey any sort of information, the minimal
<i>HTML-2.0 strict</i>-conforming document would be
(note the <b><P></b> symbol!):
<pre>
<TITLE>Minimal HTML Document</TITLE>
<P>Some text without any spark of sense.
</pre>
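<p>As a sketch of what is left unsaid in those two lines: assuming the optional tags
listed above, a parser following the HTML-2.0 DTD would be expected to fill in the
omitted tags roughly like this (nothing here is new markup, only the implied tags
written out):
<pre>
<HTML>
<HEAD>
<TITLE>Minimal HTML Document</TITLE>
</HEAD>
<BODY>
<P>Some text without any spark of sense.</P>
</BODY>
</HTML>
</pre>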
<p>Oh, yes, if we want to treat our minimal HTML document as an <i>SGML</i> one, a
<b>document identifier</b> should precede everything, like:
<pre>
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<TITLE>Minimal HTML Document</TITLE>
</pre>
<p>-- but now we are starting to play by the SGML rules, and no browser can stand where
SGML reigns...
<h3>Non-breaking space</h3>
<p>This character can be used whenever you want to protect your precious spaces
from an all-spaces-in-one jamming browser.
<p>The non-breaking space value is 160, its symbol is <tt><b>&#160;</b></tt> and its code is
<tt><b>&nbsp;</b></tt>. The code <tt>&nbsp;</tt> is a part of HTML-2.0,
by the way.
<p>Some browsers, like any version of X-Mosaic, ignore both
<tt>&#160;</tt> and <tt>&nbsp;</tt> or, like Arena, only <tt>&#160;</tt>,
while Mosaic for Windows and Netscape
(for everything) can cope with both.
<p>It should be noted that proportional fonts (the default in many browsers) usually have
a pretty narrow space character, so it is advisable to switch to a fixed
font before using non-breaking spaces.
<p>Here's an example:
<pre>
<dl><dd>
<tt>&nbsp;&nbsp;&nbsp;</tt>Look,
here's some paragraph with indent,<br>
whoa -- check it out.
</dl>
</pre>
<p> will look like
<p> <dl><dd>
<tt> </tt>Look,
here's some paragraph with indent,<br> whoa -- check it out.
</dl>
<p>A useful side effect of the non-breaking space is that browsers do not
treat it as a space at all,
so it can be used whenever you want to keep words from being split across lines.
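<p>A small sketch of that side effect, with made-up text: the entities glue each pair
of words together, so a browser that honours them should not break the line
between "640" and "KB" or between "Mr." and "Nobody".
<pre>
<P>The file is 640&nbsp;KB long and was written by Mr.&nbsp;Nobody.
</pre>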
</dl>
<h2>Moral of the story</h2>
<dl><dd>
<p>Now that I've had enough of the subject, it is just the right
time to outline what I've tried to say and what would be the best approach
to coping with HTML:
<ul>
<li> Today's HTML is ruled by the Lynch Mob of the various WEB browsers. <br>
Whatever any of these browsers is capable of grinding - is <i>the</i> HTML.
Obviously such HTML changes from browser to browser.
<li> "Standard" HTML (HTML-2.0) is a rather abstract thing: you may create
files complying with the HTML standard, or spec., you may verify them using a
free public service (if it gives you any relief), but there's no
guarantee that your document can be viewed at all (a little exaggeration here).
<li> If your page can be viewed by a certain browser -- stick with it. Put up a
disclaimer like "This page is optimized for NetZillaSoft Naviplorer, v. 0.003",
and consider all other people not using your browser to be losers. This is much
better than the previous case: at least you can be sure your document will be
accepted by at least one type of browser.
<li> If you have some information that you think is really KEWL, use a
minimal number of tags. As long as the content is k00l, nobody will actually care
about the absence of glitzy pics and fancy adornments.
<li> Don't trust any HTML tutorials, manuals, collections of advice, etc.,
including this one.
<li> Don't try to feed this page to an HTML Validator; it won't pass anyway.
Actually, who cares?
<li> Browse safely.
</ul>
</dl>
<h2>Credits.</h2>
<dl><dd>
<p><b>HAIL</b> to the folks at WebTechs (formerly HAL)
for the pretty useful HTML Validation Service referred to
throughout this manuscript. It saved me quite some time on running SP manually.<br>
<p><b>HAIL</b> to James Clark @ jclark.com, creator of the most profound SGML parser so far.
One of the previous versions of this parser is used in the HTML Validation Service.
</dl>
<h2><a name="darkest">The end.</a></h2>
<dl>
<dt><tt>26-December-1996</tt>
<dd><p>Beyond the dark side: loads of incredibly odd information about tables
in <a href="http://www.absurd.org/absurd/tablemaquia">TABLEMAQUIA</a>.
</dl>
<hr>
<h5> You can send your frustrated comments to
<a href = "mailto:sur_html@sem.vip.best.com">me</a>. Take care then. </h5>
<h6>Disclaimer: This page is not information.</h6>
</body>