File: parser_ref.html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
	<TITLE>XmHTMLParser Reference Guide</TITLE>
	<META HTTP-EQUIV="Keywords" CONTENT="XmHTML, HTML, Motif, Widget, eXode, XntHelp, Linux">
	<META HTTP-EQUIV="Reply-to" CONTENT="ripley@xs4all.nl">
	<META HTTP-EQUIV="Description" CONTENT="This document describes the resources, callbacks and translations associated with XmHTMLParser. XmHTMLParser provides an interactive HTML3.2 Parser Object.">
	<META NAME="Author" CONTENT="Koen D'Hondt">
	<META NAME="Copyright" content="1995-1997 by Ripley Software Development">
	<META NAME="Source" content="$Source$">
	<META NAME="Revision" content="$Revision$">
<!--
	<base href="http://www.xs4all.nl/~ripley/eXode/">
-->
	<link rev="made" href="mailto:ripley@xs4all.nl">
	<link rel="home" href="http://www.xs4all.nl/~ripley">
	<link rel="next" href="changes.html">
	<link rel="TOC" href="XmHTML.html#toc">
	<link rel="copyright" href="copyrights.html">
</HEAD>

<BODY BGCOLOR="#FFFFFF" text="#000000">
<font size="-1" face="arial, helvetica">[Note for Netscape, MSIE, Mosaic and
most other browsers: get a browser that supports HTML shorttags. This document
is littered with these elements, which are an official part of the HTML
standard and have been since HTML 1.0. The HTML Widget Set described on these
pages does know about them...]</font><p>

<a name="top"><img src="../Images/xmhtmlparser-banner.gif"></a><p>

XmHTMLParser provides an Object capable of parsing HTML 3.2 text. It offers 
both HTML 3.2 verification and repair of non-conforming HTML documents as well
as incremental document parsing. XmHTMLParser objects can be used to create
a fully interactive HTML 3.2 parser application through XmHTMLParser's callback
resources.
</p><p>


</p><p><a name="ClassInformation"></a>
</p><hr>

<h2>Class Information</h2>

<table cols=2 border=0>
<tr>
	<td>Include File:
	</td><td><i>&lt;Xm/Parser.h&gt;</i>
</td></tr><tr>
	<td>Class Name:
	</td><td>XmHTMLParser
</td></tr><tr>
	<td>Class Hierarchy:	
	</td><td>Object-&gt;XmHTMLParser
</td></tr><tr>
	<td>Class Pointer:
	</td><td>xmHTMLParserObjectClass
</td></tr><tr valign=top>
	<td>Functions/Macros:
	</td><td>XmCreateHTMLParser, XmHTMLParser... routines, XmIsHTMLParser.
</td></tr><tr>
</tr></table>

<p>
<a name="NewResources"></a>
</p><hr>

<h2>New Resources</h2>
XmHTMLParser defines the following resources.
<p>

</p><table cols=4 border=1>
<tr>
	<td><b/Name/
	</td><td><b/Type/
	</td><td><b/Default/
	</td><td><b/Access/
</td></tr><tr>
</tr><tr>
	<td><a href="#XmNmimeType">XmNmimeType</a>
	</td><td>String
	</td><td>text/html
	</td><td>CSG
</td></tr><tr>
	<td><a href="#XmNparserIsProgressive">XmNparserIsProgressive</a>
	</td><td>Boolean
	</td><td>False
	</td><td>CSG
</td></tr><tr>
	<td><a href="#XmNretainSource">XmNretainSource</a>
	</td><td>Boolean
	</td><td>False
	</td><td>CSG
</td></tr><tr>
	<td><a href="#XmNstrictHTMLChecking">XmNstrictHTMLChecking</a>
	</td><td>Boolean
	</td><td>False
	</td><td>CSG
</td></tr><tr>
	<td><a href="#XmNuserData">XmNuserData</a>
	</td><td>Pointer
	</td><td>NULL
	</td><td>CSG
</td></tr><tr>
</tr></table>
<p>

</p><dl>
<dt><a name="XmNmimeType">XmNmimeType</a>
</dt><dd>This resource tells XmHTMLParser how to parse text. XmHTMLParser
	knows how to handle the following mime types: <b>text/html, text/plain</b>
	and every image mime type specification that starts with <b>image/</b>.
<p>

</p></dd><dt><a name="XmNparserIsProgressive">XmNparserIsProgressive</a>
</dt><dd>Setting this resource to <b/True/ will cause XmHTMLParser to enter
	progressive mode. 
<p>

</p></dd><dt><a name="XmNretainSource">XmNretainSource</a>
</dt><dd>When set to <b/True/, XmHTMLParser will keep the source text it has
	parsed until new text should be parsed. The default behaviour is to discard
	the source text once it has been parsed.
<p>

</p></dd><dt><a name="XmNstrictHTMLChecking">XmNstrictHTMLChecking</a>
</dt><dd>This resource enables <b/strict/ checking of HTML files according to
	the <b/HTML 3.2/ standard. Beware that <i/many/ (if not all) HTML 
	files violate this standard (this document does) and that the 
	result might not be exactly what is intended. 
<p>

</p></dd><dt><a name="XmNuserData">XmNuserData</a>
</dt><dd>A pointer to data that the application can attach to the Object. This
	resource is unused internally.
</dd></dl>
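<p>
The XmNmimeType rule above can be sketched in C as follows. This is only an
illustration: <i/mime_type_is_handled/ is a hypothetical helper, not an
XmHTMLParser routine, and the case-sensitive comparison is an assumption of
this sketch.
</p>

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of XmHTML: reports whether a mime type is
 * one of those the XmNmimeType description says the parser can handle.
 * Comparing case-sensitively is an assumption made for this sketch. */
static bool mime_type_is_handled(const char *mime)
{
    if (mime == NULL)
        return false;
    if (strcmp(mime, "text/html") == 0 || strcmp(mime, "text/plain") == 0)
        return true;
    /* Any image type: the specification must start with "image/". */
    return strncmp(mime, "image/", 6) == 0;
}
```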
<p>

</p><p><a name="CallbackResources"></a>
</p><hr>

<h2>Callback Resources</h2>
XmHTMLParser defines the following callback resources:
<p>

</p><table cols=3 border=1>
<tr>
	<td><b/Callback/
	</td><td><b/Reason Constant/
	</td><td><b/Callback Structure/
</td></tr><tr>
</tr><tr>
	<td><a name="XmNdocumentCallback">XmNdocumentCallback</a>
	</td><td>XmCR_HTML_DOCUMENT
	</td><td><a href="#XmHTMLDocumentCallbackStruct">XmHTMLDocumentCallbackStruct</a>
</td></tr><tr>
	<td><a name="XmNmodifyVerifyCallback">XmNmodifyVerifyCallback</a>
	</td><td>XmCR_HTML_MODIFYING_TEXT_VALUE
	</td><td><a href="#XmHTMLVerifyCallbackStruct">XmHTMLVerifyCallbackStruct</a>
</td></tr><tr>
	<td><a name="XmNparserCallback">XmNparserCallback</a>
	</td><td>XmCR_HTML_PARSER
	</td><td><a href="#XmHTMLParserCallbackStruct">XmHTMLParserCallbackStruct</a>
</td></tr><tr>
</tr></table>
<p>

All callback resources also reference XmAnyCallbackStruct.
</p><p>

XmNdocumentCallback is activated when XmHTMLParser has finished parsing a
document and before XmHTMLParserSetString or XmHTMLParserUpdateSource returns.
</p><p>

XmNmodifyVerifyCallback is activated when XmHTMLParser is about to insert
or remove text in or from the current source text.
</p><p>

XmNparserCallback is activated when XmHTMLParser encounters an HTML element
that is in error. XmHTMLParser detects unknown, unbalanced, badly placed
and unterminated HTML elements, as well as HTML 3.2 violations. 
</p><p>

<a name="XmHTMLDocumentCallbackStruct"></a>
</p><hr>

<h3>XmHTMLDocumentCallbackStruct</h3>
The XmNdocumentCallback callback resource references the following structure:

<pre>
typedef struct
{
	int		reason;		/* the reason the callback was called */
	XEvent		*event;		/* always NULL */
	Boolean		html32;		/* True when document was HTML 3.2 conforming */
	Boolean		verified;	/* True when document has been verified */
	Boolean		balanced;	/* True when parser tree is balanced */
	Boolean		terminated;	/* True if parser is terminated prematurely */
	int		pass_level;	/* current parser level count. */
	Boolean		redo;		/* See below */
}XmHTMLDocumentCallbackStruct;
</pre>

The <i/reason/ field is always XmCR_HTML_DOCUMENT.
<p>

The <i/event/ field is always NULL.
</p><p>

The <i/html32/ field is set to <b/True/ when the loaded document was
found to be HTML 3.2 conformant. XmHTML performs checks on both the occurrence
and contents of all HTML 3.2 elements.
</p><p>

The <i/verified/ field is set to <b/True/ when XmHTML has successfully
verified the HTML semantics of the loaded document.
</p><p>

The <i/balanced/ field is set to <b/True/ when the parser has generated
a balanced tree. A balanced tree is one in which each terminated HTML element
has its opening and closing members at the same level in the tree.
</p><p>

The <i/terminated/ field will be set to <b/True/ when the parser stopped parsing
the current document before it actually ended. This can only happen when the
parser encounters an internal error.
</p><p>

The <i/pass_level/ field contains the number of passes performed so far
by the parser. This number starts at 1.
</p><p>

Setting the <i/redo/ field to <b/True/ will instruct the parser to make
another pass on the current document. It is only effective when the parser
failed to generate a balanced tree. Since using an unbalanced tree can lead
to weird markup, XmHTML will set this field to <b/True/ by default.
</p><p>

When no XmNdocumentCallback callback resource is installed, XmHTML will make
at most two passes on the current document.
</p><p>

See the <a href="parser.html">Parser Description</a> document for more
information.
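
</p><p>
An application might use the <i/balanced/, <i/terminated/ and <i/pass_level/
fields to decide whether to request another pass. The sketch below shows one
possible policy. The <i/DocumentSketch/ type is a stand-in that copies only
the fields described above (a real application would include the XmHTMLParser
header instead), and <i/MAX_PASSES/ is a hypothetical application limit, not
an XmHTML constant.
</p>

```c
#include <stdbool.h>

/* Stand-in for the XmHTMLDocumentCallbackStruct fields used here.
 * Boolean is modelled as bool for this sketch. */
typedef struct {
    bool html32;
    bool verified;
    bool balanced;
    bool terminated;
    int  pass_level;   /* starts at 1 */
    bool redo;
} DocumentSketch;

#define MAX_PASSES 3   /* hypothetical limit chosen by the application */

/* Request another pass only while the tree is unbalanced, the parser
 * was not terminated, and the pass limit has not been reached. */
static void decide_redo(DocumentSketch *cbs)
{
    if (cbs->terminated || cbs->balanced) {
        cbs->redo = false;
        return;
    }
    cbs->redo = cbs->pass_level < MAX_PASSES;
}
```

<p>
In a real XmNdocumentCallback procedure the same test would be applied to the
call_data pointer cast to XmHTMLDocumentCallbackStruct.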

</p><p><a name="XmHTMLVerifyCallbackStruct"></a>
</p><hr>

<h3>XmHTMLVerifyCallbackStruct</h3>
The XmNmodifyVerifyCallback callback resource references the following
structure:

<pre>
typedef struct{
	int 		reason;		/* the reason the callback was called */
	XEvent		*event;		/* always NULL */
	Boolean		doit;		/* unused */
	int		action;		/* type of modification */
	int		line_no;	/* current line number in input text */
	int		start_pos;	/* start of text to change */
	int		end_pos;	/* end of text to change */
	XmHTMLTextBlock	text;		/* describes text to remove or insert */
}XmHTMLVerifyCallbackStruct, *XmHTMLVerifyPtr;
</pre>

The <i/reason/ field is always XmCR_HTML_MODIFYING_TEXT_VALUE.
<p>

The <i/event/ field is always NULL.
</p><p>

The <i/action/ field contains the type of action XmHTMLParser is about
to make to the current source text. When this field contains the value
<b/HTML_REMOVE/, XmHTMLParser is removing text, and when it is
<b/HTML_INSERT/, XmHTMLParser is inserting text.
</p><p>

<i/start_pos/ specifies the location at which to start modifying text
while <i/end_pos/ specifies the location at which to stop modifying text.
<i/end_pos/ is the same as <i/start_pos/ when <i/action/ is
<b/HTML_INSERT/. Both positions are always absolute to the first character
of the current source text (defined as 0) and take the number of characters
inserted or removed so far into account.
</p><p>

The <i/line_no/ field contains the line number in the <i/unmodified/
source text where the action takes place.
</p><p>

The <i/text/ field points to the structure below, which specifies the
text that is about to be inserted or removed.

</p><pre>
typedef struct{
	String		ptr;		/* pointer to text to remove/insert */
	int		len;		/* length of this text */
}XmHTMLTextBlockRec, *XmHTMLTextBlock;
</pre>
<p>

The <i/doit/ field is currently unused and is reserved for future use.
</p><p>
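Because <i/start_pos/ and <i/end_pos/ already account for earlier insertions
and removals, an application that mirrors the source text may want to track
the running size change itself. The following is a minimal sketch of that
bookkeeping. The <i/TextBlockSketch/ and <i/VerifySketch/ types are stand-ins
for the structures above, <i/record_modification/ is a hypothetical helper,
and the numeric values given to HTML_REMOVE and HTML_INSERT are placeholders.
</p>

```c
#include <stddef.h>

/* Placeholder values: the real HTML_REMOVE and HTML_INSERT constants
 * come from the XmHTML headers and may differ. */
enum { HTML_REMOVE, HTML_INSERT };

/* Stand-ins mirroring XmHTMLTextBlockRec and the fields of
 * XmHTMLVerifyCallbackStruct that this sketch reads. */
typedef struct {
    const char *ptr;   /* text to remove/insert */
    int len;           /* length of this text */
} TextBlockSketch;

typedef struct {
    int action;
    int line_no;
    int start_pos;
    int end_pos;
    const TextBlockSketch *text;
} VerifySketch;

/* Adds the signed size of one modification to *net_change and
 * returns that signed size (positive for insert, negative for remove). */
static int record_modification(const VerifySketch *cbs, long *net_change)
{
    int delta = 0;

    if (cbs->text != NULL) {
        if (cbs->action == HTML_INSERT)
            delta = cbs->text->len;
        else if (cbs->action == HTML_REMOVE)
            delta = -cbs->text->len;
    }
    *net_change += delta;
    return delta;
}
```

<p>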

<a name="XmHTMLParserCallbackStruct"></a>
</p><hr>

<h3>XmHTMLParserCallbackStruct</h3>
The XmNparserCallback resource references the following structure:

<pre>
typedef struct{
	int		reason;		/* the reason the callback was called */
	XEvent		*event;		/* always NULL */
	int		errno;		/* total error count so far */
	int		line_no;	/* current line number in input text */
	int		start_pos;	/* start of text in error */
	int		end_pos;	/* end of text in error */
	parserError	error;		/* type of error */
	unsigned char	action;		/* suggested correction action */
	XmHTMLTextBlock	repair;		/* proposed element to insert */
	XmHTMLTextBlock	current;	/* current element */
	XmHTMLTextBlock	offender;	/* offending element */
}XmHTMLParserCallbackStruct, *XmHTMLParserPtr;
</pre>

The <i/reason/ field is always XmCR_HTML_PARSER.
<p>

The <i/event/ field is always NULL.
</p><p>

The <i/errno/ field contains the total number of errors encountered so far.
</p><p>

The <i/line_no/ field contains the line number in the <i/unmodified/
source text where the error was found.
</p><p>

<i/start_pos/ specifies the location at which the offending element
starts while <i/end_pos/ specifies the location at which the offending
element ends. Both positions are always absolute to the first character of
the current source text (defined as 0) and take the number of characters
inserted or removed so far into account.
</p><p>

<i/current/ describes the element prior to inserting the offending element,
<i/offender/ describes the element that is in error, and <i/repair/ describes
the element that can possibly repair this error. Except when <i/error/ is
HTML_TERMINATE or HTML_NOTIFY, <i/offender/ will always contain a valid value.
</p><p>

The <i/error/ field indicates the type of error encountered by XmHTMLParser.
The table below lists all possible values for this field, a description of
what this error represents, the default action XmHTMLParser undertakes when
this error is encountered and whether the <i/current/ and <i/repair/ fields
are valid.
</p><p>

</p><table cols="4" border="1">
<tr>
	<td><b/error/
	</td><td><b/Description/
	</td><td><b/Default Action/
	</td><td><b>current/repair</b>
</td></tr><tr valign="top">
	<td>HTML_BAD
	</td><td>An element is completely out of order and the internal autocorrection
		routines cannot find a proper place for this element.
	</td><td>HTML_REMOVE
	</td><td>Yes/No
</td></tr><tr valign="top">
	<td>HTML_CLOSE_BLOCK
	</td><td>A closing block level element is encountered while it was never opened.
	</td><td>HTML_REMOVE
	</td><td>No/Yes
</td></tr><tr valign="top">
	<td>HTML_INTERNAL
	</td><td>An internal error was encountered.
	</td><td>HTML_TERMINATE
	</td><td>No/No
</td></tr><tr valign="top">
	<td>HTML_NOTIFY
	</td><td>Notification of insertion of an optional opening/closing element.
	</td><td>HTML_INSERT
	</td><td>No/Yes
</td></tr><tr valign="top">
	<td>HTML_OPEN_BLOCK
	</td><td>A new block-level element is encountered while a previous block
		element is still open.
	</td><td>HTML_INSERT
	</td><td>Yes/Yes
</td></tr><tr valign="top">
	<td>HTML_OPEN_ELEMENT
	</td><td>An unbalanced terminator is encountered.
	</td><td>HTML_SWITCH
	</td><td>Yes/Yes
</td></tr><tr valign="top">
	<td>HTML_VIOLATION
	</td><td>An HTML 3.2 violation was encountered.
	</td><td>HTML_INSERT, HTML_KEEP or HTML_REMOVE
	</td><td>Yes/Dynamic
</td></tr><tr valign="top">
	<td>HTML_UNKNOWN_ELEMENT
	</td><td>An unknown element was encountered.
	</td><td>HTML_REMOVE
	</td><td>No/No
</td></tr><tr>
</tr></table>
<p>

The HTML_VIOLATION error is a special case. When XmHTMLParser can find a
suitable element that will cause the offending element to be no longer in 
violation of the HTML 3.2 standard, it will propose to insert this new element.
When it can't find one, the default action depends on the value of the 
<a href="#XmNstrictHTMLChecking">XmNstrictHTMLChecking</a> resource. When this
resource is set to <b/True/, the default action will be to remove the
offending element; otherwise the default action is to keep it.
</p><p>
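The selection of the default action for HTML_VIOLATION, as described above,
can be summarised in a few lines of C. This is a sketch of the documented
rule, not XmHTMLParser's actual code: <i/violation_default_action/ is a
hypothetical name and the enum values are placeholders for the real constants.
</p>

```c
#include <stdbool.h>

/* Placeholder values: the real action constants are defined by the
 * XmHTML headers and may differ. */
typedef enum { HTML_INSERT, HTML_KEEP, HTML_REMOVE } ActionSketch;

/* repair_found:    the parser found an element that resolves the violation.
 * strict_checking: the value of the XmNstrictHTMLChecking resource. */
static ActionSketch violation_default_action(bool repair_found,
                                             bool strict_checking)
{
    if (repair_found)
        return HTML_INSERT;                 /* propose inserting the repair */
    return strict_checking ? HTML_REMOVE    /* strict: drop the offender */
                           : HTML_KEEP;     /* lenient: keep the offender */
}
```

<p>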

<i/action/ suggests the appropriate action to undertake in response
to a given error. Setting this field to a different value than the
suggested one will cause the parser to change its behavior. The table below
lists all possible values for the <i/action/ field, their meaning and the
errors for which each action is a valid response.
When <i/action/ is set to <i/HTML_TERMINATE/, the parsing process will be
terminated immediately.
</p><p>

</p><table cols="3" border="1">
<tr>
	<td><b/Action/
	</td><td><b/Meaning/
	</td><td><b/Allowed For/
</td></tr><tr valign="top">
	<td>HTML_ALIAS
	</td><td>Replace <i/offender/ with <i/repair/
	</td><td>HTML_UNKNOWN_ELEMENT
</td></tr><tr valign="top">
	<td>HTML_IGNORE
	</td><td>Ignore this error, proceed as if nothing happened
	</td><td>HTML_BAD, HTML_INTERNAL
</td></tr><tr valign="top">
	<td>HTML_INSERT
	</td><td>Insert <i/repair/
	</td><td>HTML_CLOSE_BLOCK, HTML_NOTIFY, HTML_OPEN_BLOCK, HTML_VIOLATION
</td></tr><tr valign="top">
	<td>HTML_KEEP
	</td><td>Keep <i/offender/
	</td><td>HTML_CLOSE_BLOCK, HTML_OPEN_BLOCK, HTML_VIOLATION
</td></tr><tr valign="top">
	<td>HTML_REMOVE
	</td><td>Remove <i/offender/
	</td><td>all <b/except/ HTML_NOTIFY and HTML_INTERNAL
</td></tr><tr valign="top">
	<td>HTML_SWITCH
	</td><td>Switch <i/offender/ and <i/repair/
	</td><td>HTML_OPEN_ELEMENT
</td></tr><tr valign="top">
	<td>HTML_TERMINATE
	</td><td>Terminate parser
	</td><td>All errors
</td></tr><tr>
</tr></table>
<p>
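A callback that overrides the suggested action may want to check first that
its choice is a valid response to the reported error. The sketch below encodes
the Allowed For column of the table above. The enum types and
<i/action_allowed/ are hypothetical names introduced for this illustration,
and the enum values are placeholders for the constants in the XmHTML headers.
</p>

```c
#include <stdbool.h>

/* Placeholder enums: names follow the tables above, values are arbitrary. */
typedef enum {
    HTML_BAD, HTML_CLOSE_BLOCK, HTML_INTERNAL, HTML_NOTIFY,
    HTML_OPEN_BLOCK, HTML_OPEN_ELEMENT, HTML_VIOLATION, HTML_UNKNOWN_ELEMENT
} ErrorSketch;

typedef enum {
    HTML_ALIAS, HTML_IGNORE, HTML_INSERT, HTML_KEEP,
    HTML_REMOVE, HTML_SWITCH, HTML_TERMINATE
} ActionSketch;

/* Returns true when the table lists `action` as allowed for `error`. */
static bool action_allowed(ErrorSketch error, ActionSketch action)
{
    switch (action) {
    case HTML_ALIAS:
        return error == HTML_UNKNOWN_ELEMENT;
    case HTML_IGNORE:
        return error == HTML_BAD || error == HTML_INTERNAL;
    case HTML_INSERT:
        return error == HTML_CLOSE_BLOCK || error == HTML_NOTIFY ||
               error == HTML_OPEN_BLOCK || error == HTML_VIOLATION;
    case HTML_KEEP:
        return error == HTML_CLOSE_BLOCK || error == HTML_OPEN_BLOCK ||
               error == HTML_VIOLATION;
    case HTML_REMOVE:
        return error != HTML_NOTIFY && error != HTML_INTERNAL;
    case HTML_SWITCH:
        return error == HTML_OPEN_ELEMENT;
    case HTML_TERMINATE:
        return true;    /* valid for all errors */
    }
    return false;
}
```

<p>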

<a name="InheritedResources"></a>
</p><hr>

<h2>Inherited Resources</h2>
XmHTMLParser inherits the following resources. The resources are listed 
alphabetically, along with the superclass that defines them.
<p>

</p><table cols=2 border=1>
<tr>
	<td><b/Resource/
	</td><td><b/Inherited From/
</td></tr><tr>
	<td>XmNdestroyCallback
	</td><td>Object
</td></tr><tr>
</tr></table>

<p><a name="Translations"></a>
</p><hr>

<h2>Translations</h2>
XmHTMLParser does not define any translations.
<p>

</p><p><a name="ActionRoutines"></a>
</p><hr>

<h2>Action Routines</h2>
XmHTMLParser does not define any actions.

<p><IMG SRC="../Images/wood/bar.gif" width=508 height=15><br>

<img ismap usemap=#back src="../Images/wood/back.gif" border=0>
<map name=back>
	<area href="progguide.html" shape=rect coords=0,0,83,33>
</map>

<img ismap usemap=#home src="../Images/wood/home.gif" border=0>
<map name=home>
	<area href="http://www.xs4all.nl/~ripley" shape=rect coords=0,0,83,33>
</map>

<img ismap usemap=#email_map src="../Images/wood/email.gif" border=0>
<map name=email_map>
	<area href="mailto:ripley@xs4all.nl" shape=rect coords=0,0,83,33>
</map>

<br><IMG SRC="../Images/wood/bar.gif" width=508 height=15><br>
<i><font size="-1">
&copy;Copyright 1996-1997 by Ripley Software Development<br>
Last update: September 19, 1997 by Koen
</font></i>
</p></BODY>
</HTML>