<!DOCTYPE html>
<html>
<head>
	<title>LuaExpat: XML Expat parsing for the Lua programming language</title>
	<link rel="stylesheet" href="./doc.css" type="text/css"/>
	<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
</head>
<body>

<div id="container">

<div id="product">
	<div id="product_logo"><a href="https://github.com/lunarmodules/luaexpat">
		<img alt="LuaExpat logo" src="luaexpat.png"/>
	</a></div>
	<div id="product_name"><big><strong>LuaExpat</strong></big></div>
	<div id="product_description">XML Expat parsing for the Lua programming language</div>
</div> <!-- id="product" -->

<div id="main">

<div id="navigation">
<h1>LuaExpat</h1>
	<ul>
		<li><a href="index.html">Home</a>
			<ul>
				<li><a href="index.html#overview">Overview</a></li>
				<li><a href="index.html#status">Status</a></li>
				<li><a href="index.html#download">Download</a></li>
				<li><a href="index.html#history">History</a></li>
				<li><a href="index.html#references">References</a></li>
				<li><a href="index.html#credits">Credits</a></li>
			</ul>
		</li>
		<li><strong>Manual</strong>
			<ul>
				<li><a href="manual.html#introduction">Introduction</a></li>
				<li><a href="manual.html#building">Building</a></li>
				<li><a href="manual.html#installation">Installation</a></li>
				<li><a href="manual.html#parser">Parser Objects</a></li>
			</ul>
		</li>
		<li><a href="examples.html">Examples</a></li>
		<li><strong>Additional parsers</strong>
			<ul>
				<li><a href="lom.html">Lua Object Model</a></li>
				<li><a href="totable.html">Parse to table</a></li>
				<li><a href="threat.html">Threat protection</a></li>
			</ul>
		</li>
		<li><a href="https://github.com/lunarmodules/luaexpat">Project</a>
			<ul>
				<li><a href="https://github.com/lunarmodules/luaexpat/issues">Bug Tracker</a></li>
				<li><a href="https://github.com/lunarmodules/luaexpat">Source code</a></li>
				<li><a href="https://lunarmodules.github.io/luaexpat/">Documentation</a></li>
			</ul>
		</li>
		<li><a href="license.html">License</a></li>
	</ul>
</div> <!-- id="navigation" -->

<div id="content">

<h2><a name="introduction"></a>Introduction</h2>

<p>LuaExpat is a <a href="http://www.saxproject.org/">SAX</a> XML
parser based on the <a href="https://libexpat.github.io/">Expat</a> library.
SAX is the <em>Simple API for XML</em> and allows programs to:
</p>

<ul>
	<li>process an XML document incrementally, and thus handle
	huge documents without memory penalties;</li>

	<li>register handler functions which are called by the parser during
	the processing of the document, handling the document elements or
	text.</li>
</ul>

<p>With an event-based API like SAX the XML document can be fed to
the parser in chunks, and the parsing begins as soon as the parser
receives the first document chunk. LuaExpat reports parsing events
(such as the start and end of elements) directly to the application
through callbacks. The parsing of huge documents can benefit from
this piecemeal operation.</p>
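<p>The chunked feeding described above can be sketched as follows. This is a
minimal illustration, not a fixed recipe: the <code>chunks</code> table is a
hypothetical stand-in for data arriving from a file or a socket, and the split
points are arbitrary (one of them falls inside a tag).</p>

```lua
local lxp = require "lxp"

-- Count element starts; the handler runs while chunks are still arriving.
local count = 0
local callbacks = {
  StartElement = function(parser, name, attributes)
    count = count + 1
  end,
}

local parser = lxp.new(callbacks)

-- Hypothetical chunks, split at arbitrary positions (even inside a tag).
local chunks = { "<root><it", "em/><item/>", "</root>" }
for _, chunk in ipairs(chunks) do
  assert(parser:parse(chunk))
end
assert(parser:parse())  -- no argument: signals the end of the document
parser:close()

print(count)  -- 3 (root plus two item elements)
```

<p>Only the current chunk has to be held in memory, which is what makes this
style suitable for very large documents.</p>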

<p>LuaExpat is distributed as a library and a file <code>lom.lua</code> that
implements the <a href="lom.html">Lua Object Model</a>.</p>


<h2><a name="building"></a>Building</h2>

<p>
LuaExpat can be built for Lua 5.1 to 5.4.
The language library and header files for the desired version
must be installed properly.
LuaExpat also depends on Expat 2.0.0+, which must be installed as well.
</p>

<p>The simplest way of building and installing LuaExpat is through LuaRocks.</p>

<p>
LuaExpat also offers a Makefile.
The file has some definitions like paths to the external libraries,
compiler options and the like.
One important definition is the Lua version,
which is not detected from the installed software.
</p>


<h2><a name="installation"></a>Installation</h2>

<p>Installation can be done using LuaRocks or <code>make install</code>.</p>

<p>Manually, the compiled binary file should be copied to a directory in your
<a href="http://www.lua.org/manual/5.1/manual.html#pdf-package.cpath">C path</a>.
The Lua files <code>./src/lxp/*.lua</code> should be copied to a directory in your
<a href="http://www.lua.org/manual/5.1/manual.html#pdf-package.path">Lua path</a>.</p>

<h2><a name="parser"></a>Parser objects</h2>

<p>Usually SAX implementations base all operations on the
concept of a parser that allows the registration of callback
functions. LuaExpat offers the same functionality but uses a
different registration method, based on a table of callbacks. This
table contains references to the callback functions which are
responsible for the handling of the document parts. The parser will
assume no behaviour for any undeclared callbacks.</p>

<h4>Finishing parsing</h4>

<p>Since the parser is a streaming parser that handles one chunk of input at a time,
the following input parses without error, despite the unbalanced tags:</p>
<pre class="example">
    &lt;one&gt;&lt;two&gt;some text&lt;/two&gt;
</pre>
<p>Only the final call to the <em>parse</em> method (with no data) closes the
document and returns an error:</p>
<pre class="example">
    assert(parser:parse())
</pre>
<p>Closing the document is important to ensure that the document is complete and well-formed.</p>
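<p>A small sketch of this behaviour, assuming an empty callbacks table: the
unbalanced fragment is accepted as a chunk, and the missing end tag is only
reported by the final call without arguments.</p>

```lua
local lxp = require "lxp"

local parser = lxp.new({})

-- The unbalanced fragment is accepted as an intermediate chunk...
local chunk_result = parser:parse("<one><two>some text</two>")

-- ...and only the final call (no argument) reports the missing end tag.
local final_result, msg, line, col, pos = parser:parse()

print(chunk_result ~= nil)  -- true: the chunk itself parsed fine
print(final_result, msg)    -- nil followed by the Expat error message
parser:close()
```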

<h4>Constructor</h4>

<dl class="reference">
	<dt><strong>lxp.new(<em>callbacks [, separator[, merge_character_data]]</em>)</strong></dt>
	<dd>The parser is created by a call to the function <strong>lxp.new</strong>,
	which returns the created parser or raises a Lua error. It
	receives the callbacks table and optionally the parser <a href="#separator">
	separator character</a> used in the namespace expanded element names.
	If <em>merge_character_data</em> is false then LuaExpat will not combine multiple
	CharacterData calls into one. For more info on this behaviour see CharacterData below.</dd>
</dl>
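<p>The effect of <em>merge_character_data</em> can be observed with a sketch
like the one below. The helper name <code>collect_text</code> is hypothetical,
and how many pieces the unmerged run delivers depends on how Expat splits the
text, so only the concatenated result is compared.</p>

```lua
local lxp = require "lxp"

-- Hypothetical helper: parse `doc` and collect every CharacterData call.
local function collect_text(doc, merge)
  local pieces = {}
  local callbacks = {
    CharacterData = function(parser, text)
      pieces[#pieces + 1] = text
    end,
  }
  -- nil separator: no namespace processing
  local parser = lxp.new(callbacks, nil, merge)
  assert(parser:parse(doc))
  assert(parser:parse())
  parser:close()
  return pieces
end

local doc = "<p>fish &amp; chips</p>"
local merged = collect_text(doc, true)
local unmerged = collect_text(doc, false)

print(#merged, table.concat(merged))      -- one call with the whole text
print(#unmerged, table.concat(unmerged))  -- possibly several calls, same text
```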

<h4>Methods</h4>

<dl class="reference">
	<dt><strong>parser:close()</strong></dt>
	<dd>Closes the parser, freeing all memory used by it. A call to
	parser:close() without a previous call to parser:parse() could
	result in an error. Returns the parser object on success.</dd>

	<dt><strong>parser:getbase()</strong></dt>
	<dd>Returns the base for resolving relative URIs.</dd>

	<dt><strong>parser:getcallbacks()</strong></dt>
	<dd>Returns the callbacks table.</dd>

	<dt><strong>parser:parse(s)</strong></dt>
	<dd>Parse some more of the document. The string <em>s</em> contains
	part (or perhaps all) of the document. When called without
	arguments the document is closed (but the parser still has to be
	closed).<br/>
	The function returns the parser object when the parser has been
	successful. If the parser finds an error it returns five
	results: <em>nil</em>, <em>msg</em>, <em>line</em>, <em>col</em>, and
	<em>pos</em>, which are the error message, the line number,
	column number and absolute position of the error in the XML document.
<pre class="example">
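local lxp = require "lxp"
local cb = {}    -- table with callbacks
local doc = "&lt;root&gt;xml doc&lt;/root&gt;"
-- each method returns the parser on success, so the calls can be chained
lxp.new(cb):setencoding("UTF-8"):parse(doc):parse():close()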
</pre>
	</dd>

	<dt><strong>parser:pos()</strong></dt>
	<dd>Returns three results: the current parsing line, column, and
	absolute position.</dd>

	<dt><strong>parser:getcurrentbytecount()</strong></dt>
	<dd>Returns the number of bytes of input corresponding to the current
	event. This function can only be called inside a handler; in other
	contexts it returns 0. Do not use it inside a CharacterData handler
	unless CharacterData merging has been disabled (see <em>lxp.new</em>).</dd>

	<dt><strong>parser:returnnstriplet(bool)</strong></dt>
	<dd>Instructs the parser to report namespace-qualified names as triplets
	(<em>true</em>: URI, local name and prefix) or as pairs (<em>false</em>: URI and local name).
	Setting this must be done before calling <em>parse</em>, and will only
	have effect if the parser was created with a <em>separator</em>.
	Returns the parser object.</dd>

	<dt><strong>parser:setbase(base)</strong></dt>
	<dd>Sets the <em>base</em> to be used for resolving relative URIs in
	system identifiers. Returns the parser object on success.</dd>

	<dt><strong>parser:setblamaxamplification(max_amp)</strong></dt>
	<dd>Sets the <em>maximum amplification</em> (float) to be allowed. This
	protects against the Billion Laughs Attack. The
	<em>libexpat</em> default is 100. Returns the parser object on success.<br/>
	</dd>

	<dt><strong>parser:setblathreshold(threshold)</strong></dt>
	<dd>Sets the <em>threshold</em> (int, in bytes) after which the protection
	starts. This protects against the Billion Laughs Attack. The
	<em>libexpat</em> default is 8 MiB. Returns the parser object on success.<br/>
	</dd>

	<dt><strong>parser:setencoding(encoding)</strong></dt>
	<dd>Set the encoding to be used by the parser. There are four
	built-in encodings, passed as strings: "US-ASCII",
	"UTF-8", "UTF-16", and "ISO-8859-1". Returns the parser object on success.</dd>

	<dt><strong>parser:stop()</strong></dt>
	<dd>Aborts the parser and prevents it from parsing any further
	through the data it was last passed. Use it to halt parsing the
	document when, for example, an error is discovered inside a callback.
	The parser object cannot accept more data after
	this call.</dd>
</dl>
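<p>As an illustration of the return values of <em>parse</em> and of
<em>pos</em>, the sketch below records where each element starts and then
reports a well-formedness error. The <code>positions</code> table and the sample
document are assumptions made for the example.</p>

```lua
local lxp = require "lxp"

-- Record where each element starts, using parser:pos() inside a handler.
local positions = {}
local callbacks = {
  StartElement = function(parser, name)
    local line, col, pos = parser:pos()
    positions[name] = { line = line, col = col, pos = pos }
  end,
}

local parser = lxp.new(callbacks)

-- The end tag </b> does not match <a>, so this chunk is not well-formed.
local ok, msg, line, col, pos = parser:parse("<root>\n  <a></b>\n</root>")
if not ok then
  print(("XML error at line %d, column %d (position %d): %s")
    :format(line, col, pos, msg))
end

print(positions.a.line)  -- 2: <a> starts on the second line
```

<p>Checking the first result of <em>parse</em> (or wrapping the call in
<code>assert</code>) is the usual way to detect such errors.</p>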

<h4>Callbacks</h4>

<p>The Lua callbacks define the handlers of the parser events. Passing
a table to the parser constructor has an advantage over registering
callbacks individually: the API does not need to provide a way
to manipulate callbacks.</p>

<p>Another difference lies in the behaviour of the callbacks during
the parsing itself. The callback table contains references to the
functions that can be redefined at will. The only restriction is
that only the callbacks present in the table at creation time
will be called.</p>

<p>The callbacks table indices are named after the equivalent Expat
callbacks:<br />
<em>AttlistDecl</em>, <em>CharacterData</em>, <em>Comment</em>,
<em>Default</em>, <em>DefaultExpand</em>,
<em>ElementDecl</em>, <em>EndCdataSection</em>, <em>EndDoctypeDecl</em>,
<em>EndElement</em>, <em>EndNamespaceDecl</em>, <em>EntityDecl</em>,
<em>ExternalEntityRef</em>, <em>NotStandalone</em>,
<em>NotationDecl</em>, <em>ProcessingInstruction</em>, <em>SkippedEntity</em>,
<em>StartCdataSection</em>, <em>StartDoctypeDecl</em>, <em>StartElement</em>,
<em>StartNamespaceDecl</em>, <em>UnparsedEntityDecl</em> and
<em>XmlDecl</em>.</p>

<p>These indices can be references to functions with
specific signatures, as seen below. The parser constructor also
checks for a field called <em>_nonstrict</em> in the
callbacks table. If <em>_nonstrict</em> is absent, only valid
callback names are accepted as indices in the table
(<em>Defaultexpanded</em>, for example, would be considered an error). If
<em>_nonstrict</em> is defined, any other field names can be
used (even though they are never called).</p>

<p>The callbacks can optionally be defined as <code>false</code>,
acting thus as placeholders for future assignment of functions.</p>

<p>Every callback function receives as the first parameter the
calling parser itself, thus allowing the same functions to be used
for more than one parser for example.</p>
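<p>The rules above (strict name checking, <em>_nonstrict</em>, and
<code>false</code> placeholders) can be tried out with a sketch like this one.
The sample document and the <code>names</code> table are assumptions made for
the illustration.</p>

```lua
local lxp = require "lxp"

-- Strict checking: a misspelled callback name makes lxp.new raise an error.
local strict_ok = pcall(lxp.new, { Defaultexpanded = function() end })

-- With _nonstrict set, the same table is accepted.
local lenient_ok = pcall(lxp.new, {
  _nonstrict = true,
  Defaultexpanded = function() end,
})
print(strict_ok, lenient_ok)  -- false   true

-- A false placeholder is present at creation time and assigned later.
local names = {}
local callbacks = { StartElement = false }
local parser = lxp.new(callbacks)

assert(parser:parse("<root><skipped/>"))  -- no handler assigned yet
callbacks.StartElement = function(p, name)
  names[#names + 1] = name
end
assert(parser:parse("<seen/></root>"))    -- the new handler is used
assert(parser:parse())
parser:close()

print(table.concat(names, ","))  -- seen
```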

<dl class="reference">
	<dt><strong>callbacks.AttlistDecl = function(parser, elementName, attrName, attrType, default, required)</strong></dt>
	<dd>The Attlist declaration handler is called for each attribute. So
	a single Attlist declaration with multiple attributes declared will
	generate multiple calls to this handler. The <em>default</em> parameter
	may be <em>nil</em> in the case of the "#IMPLIED" or "#REQUIRED"
	keyword. The <em>required</em> parameter will be <em>true</em> and the
	<em>default</em> value will be <em>nil</em> in the case of "#REQUIRED".
	If <em>required</em> is <em>true</em> and <em>default</em> is non-<em>nil</em>,
	then this is a "#FIXED" default.
	</dd>

	<dt><strong>callbacks.CharacterData = function(parser, string)</strong></dt>
	<dd>Called when the <em>parser</em> recognizes a <em>string</em> of XML character data.
	Note that LuaExpat automatically combines multiple CharacterData events
	from Expat into a single call to this handler, unless <em>merge_character_data</em>
	is set to false when calling lxp.new().</dd>

	<dt><strong>callbacks.Comment = function(parser, string)</strong></dt>
	<dd>Called when the <em>parser</em> recognizes an XML comment
	<em>string</em>.</dd>

	<dt><strong>callbacks.Default = function(parser, string)</strong></dt>
	<dd>Called when the <em>parser</em> has a <em>string</em>
	corresponding to any characters in the document which wouldn't
	otherwise be handled. Using this handler has the side effect of
	turning off expansion of references to internally defined general
	entities. Instead these references are passed to the default
	handler.</dd>

	<dt><strong>callbacks.DefaultExpand = function(parser, string)</strong></dt>
	<dd>Called when the <em>parser</em> has a <em>string</em>
	corresponding to any characters in the document which wouldn't
	otherwise be handled. Using this handler doesn't affect expansion
	of internal entity references.</dd>

	<dt><strong>callbacks.ElementDecl = function(parser, name, type, quantifier, children)</strong></dt>
	<dd>Called when the <em>parser</em> detects an Element declaration in the DTD.
	The <em>type</em> parameter will be one of <em>"EMPTY", "ANY", "MIXED", "NAME",
	"CHOICE",</em> or <em>"SEQUENCE"</em>.
	The <em>quantifier</em> parameter will be one of <em>"?", "*", "+",</em> or
	<em>nil</em>. The array <em>children</em> can be <em>nil</em> if there are no
	children. The child elements in the array will be tables with the following fields:
	<em>name, type, quantifier</em>, and <em>children</em>. Those fields have
	the same meaning as the similarly named parameters of this call.
	</dd>

	<dt><strong>callbacks.EndCdataSection = function(parser)</strong></dt>
	<dd>Called when the <em>parser</em> detects the end of a CDATA
	section.</dd>

	<dt><strong>callbacks.EndDoctypeDecl = function(parser)</strong></dt>
	<dd>Called when the <em>parser</em> detects the end of the <em>DOCTYPE</em>
	declaration when the closing &gt; is encountered, but after processing any
	external subset.</dd>

	<dt><strong>callbacks.EndElement = function(parser, elementName)</strong></dt>
	<dd>Called when the <em>parser</em> detects the ending of an XML
	element with <em>elementName</em>.</dd>

	<dt><strong>callbacks.EndNamespaceDecl = function(parser, namespaceName)</strong></dt>
	<dd>Called when the <em>parser</em> detects the ending of an XML
	namespace with <em>namespaceName</em>. The handling of the end
	namespace is done after the handling of the end tag for the element
	the namespace is associated with.</dd>

	<dt><strong>callbacks.EntityDecl = function(parser, entityName, is_parameter, value, base, systemId, publicId, notationName)</strong></dt>
	<dd>This is called for entity declarations. The <em>is_parameter</em>
	argument will be <em>true</em> if the entity is a parameter entity,
	<em>false</em> otherwise.

	For internal entities (&lt;!ENTITY foo "bar"&gt;), <em>value</em> will
	be a string and <em>systemId</em>, <em>publicId</em>, and <em>notationName</em>
	will be <em>nil</em>.
	An empty string is a valid <em>value</em>, so compare <em>value</em> against
	<em>nil</em> rather than testing for an empty string to recognize internal entities.

	For external entities, <em>value</em> will be <em>nil</em> and <em>systemId</em> will be
	a string. The <em>publicId</em> argument will be <em>nil</em> unless a public
	identifier was provided. The <em>notationName</em> argument will have a
	string value only for unparsed entity declarations.
	</dd>

	<dt><strong>callbacks.ExternalEntityRef = function(parser, subparser, base, systemId, publicId)</strong></dt>
	<dd>Called when the <em>parser</em> detects an external entity
	reference.<br/><br/>
	The <em>subparser</em> is a LuaExpat parser created with the
	same callbacks and Expat context as the <em>parser</em> and should
	be used to parse the external entity.<br/>
	The <em>base</em> parameter is the base to use for relative
	system identifiers. It is set by parser:setbase and may be nil.<br/>
	The <em>systemId</em> parameter is the system identifier
	specified in the entity declaration and is never nil.<br/>
	The <em>publicId</em> parameter is the public id given in the
	entity declaration and may be nil.</dd>

	<dt><strong>callbacks.NotStandalone = function(parser)</strong></dt>
	<dd>Called when the <em>parser</em> detects that the document is not
	"standalone". This happens when there is an external subset or a
	reference to a parameter entity, but the document does not have standalone set
	to "yes" in an XML declaration. This callback is expected to return a value:
	if it returns a falsy value, parsing is aborted.</dd>

	<dt><strong>callbacks.NotationDecl = function(parser, notationName, base, systemId, publicId)</strong></dt>
	<dd>Called when the <em>parser</em> detects an XML notation
	declaration with <em>notationName</em>.<br/>
	The <em>base</em> parameter is the base to use for relative
	system identifiers. It is set by parser:setbase and may be nil.<br/>
	The <em>systemId</em> parameter is the system identifier
	specified in the notation declaration and is never nil.<br/>
	The <em>publicId</em> parameter is the public id given in the
	notation declaration and may be nil.</dd>

	<dt><strong>callbacks.ProcessingInstruction = function(parser, target, data)</strong></dt>
	<dd>Called when the <em>parser</em> detects XML processing
	instructions. The <em>target</em> is the first word in the
	processing instruction. The <em>data</em> is the rest of the
	characters in it after skipping all whitespace after the initial
	word.</dd>

	<dt><strong>callbacks.SkippedEntity = function(parser, name, isParameter)</strong></dt>
	<dd>This is called in two situations:
	first, when an entity reference is encountered for which no declaration has been
	read <em>and</em> this is not an error;
	second, when an internal entity reference is read but not expanded, because
	a <em>Default</em> handler has been set.
	<strong>Note:</strong> skipped parameter entities in declarations and skipped
	general entities in attribute values cannot be reported, because the event
	would be out of sync with the reporting of the declarations or attribute values.
	</dd>

	<dt><strong>callbacks.StartCdataSection = function(parser)</strong></dt>
	<dd>Called when the <em>parser</em> detects the beginning of an XML
	CDATA section.</dd>

	<dt><strong>callbacks.StartDoctypeDecl = function(parser, name, sysid, pubid, has_internal_subset)</strong></dt>
	<dd>Called when the <em>parser</em> detects the beginning of an XML
	DTD (DOCTYPE) section. These precede the XML root element and take
	the form:
<pre class="example">
&lt;!DOCTYPE root_elem PUBLIC "example"&gt;
</pre>
	</dd>

	<dt><strong>callbacks.StartElement = function(parser, elementName, attributes)</strong></dt>
	<dd>Called when the <em>parser</em> detects the beginning of an XML
	element with <em>elementName</em>.<br/>
	The <em>attributes</em> parameter is a Lua table with all the
	element attribute names and values. The table contains an entry for
	every attribute in the element start tag and entries for the
	default attributes for that element.<br/>
	The attributes are listed by name (including the inherited ones)
	and by position: the positional entries hold the attribute names in
	document order (inherited attributes are not included in the
	position list).<br/>
	As an example if the <em>book</em> element has attributes
	<em>author</em>, <em>title</em> and an optional <em>format</em>
	attribute (with "printed" as default value),
<pre class="example">
&lt;book author="Ierusalimschy, Roberto" title="Programming in Lua"&gt;
</pre>
    would be represented as<br/>
<pre class="example">
{[1] = "author",
 [2] = "title",
 author = "Ierusalimschy, Roberto",
 format = "printed",
 title = "Programming in Lua"}
</pre></dd>

	<dt><strong>callbacks.StartNamespaceDecl = function(parser, namespaceName, namespaceUri)</strong></dt>
	<dd>Called when the <em>parser</em> detects an XML namespace
	declaration. The <em>namespaceName</em> can be nil. Namespace declarations
	occur inside start tags, but the StartNamespaceDecl handler is
	called before the StartElement handler for each namespace declared
	in that start tag.</dd>

	<dt><strong>callbacks.UnparsedEntityDecl = function(parser, entityName, base, systemId, publicId, notationName)</strong></dt>
	<dd><strong>Obsolete: use EntityDecl instead</strong>. Called when the <em>parser</em> receives declarations of
	unparsed entities. These are entity declarations that have a
	notation (NDATA) field.<br/>
	As an example, in the chunk
<pre class="example">
&lt;!ENTITY logo SYSTEM "images/logo.gif" NDATA gif&gt;
</pre>
	<em>entityName</em> would be "logo", <em>systemId</em> would be
	"images/logo.gif" and <em>notationName</em> would be "gif".
	For this example the <em>publicId</em> parameter would be nil.
	The <em>base</em> parameter would be whatever has been set with
	<code>parser:setbase</code>. If not set, it would be nil.</dd>

	<dt><strong>callbacks.XmlDecl = function(parser, version, encoding, standalone)</strong></dt>
	<dd>Called when the <em>parser</em> encounters an XML document declaration (these are
	optional, and valid only at the start of the document). The callback receives
	the declared XML <em>version</em> and document <em>encoding</em> as strings, and
	<em>standalone</em> as a boolean (or <em>nil</em> if it was not specified).</dd>
</dl>
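<p>To show how several of these callbacks work together, the following sketch
prints an indented outline of a small document. The sample document and the
<code>out</code> table are assumptions made for the example; the attribute loop
relies on the positional entries of <em>attributes</em> holding the attribute
names in document order.</p>

```lua
local lxp = require "lxp"

local depth, out = 0, {}

local callbacks = {
  StartElement = function(parser, name, attributes)
    local parts = { name }
    -- Positional entries give the attribute names in document order.
    for _, attr_name in ipairs(attributes) do
      parts[#parts + 1] = attr_name .. "=" .. attributes[attr_name]
    end
    out[#out + 1] = ("  "):rep(depth) .. table.concat(parts, " ")
    depth = depth + 1
  end,
  EndElement = function(parser, name)
    depth = depth - 1
  end,
  CharacterData = function(parser, text)
    if text:find("%S") then  -- skip whitespace-only text
      out[#out + 1] = ("  "):rep(depth) .. "text: " .. text
    end
  end,
}

local doc = '<library><book author="Ierusalimschy, Roberto" '
  .. 'title="Programming in Lua">in stock</book></library>'

local parser = lxp.new(callbacks)
assert(parser:parse(doc))
assert(parser:parse())
parser:close()

print(table.concat(out, "\n"))
```

<p>Since every callback receives the parser as its first argument, the same
table could be shared by several parsers if the state (here <code>depth</code>
and <code>out</code>) were kept per parser.</p>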

<h4><a name="separator"></a>The separator character</h4>

<p>The optional separator character in the parser constructor
defines the character used in the namespace expanded element names.
The separator character is optional (if not defined the parser will
not handle namespaces) but if defined it must be different from
the character '\0'.</p>
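<p>A brief sketch of namespace handling with a separator follows. The
separator <code>"|"</code>, the namespace URI <code>urn:example</code> and the
<code>seen</code> table are arbitrary choices made for the illustration.</p>

```lua
local lxp = require "lxp"

local seen = {}
local callbacks = {
  StartNamespaceDecl = function(parser, namespaceName, namespaceUri)
    seen[#seen + 1] = ("namespace %s -> %s")
      :format(tostring(namespaceName), namespaceUri)
  end,
  StartElement = function(parser, elementName)
    seen[#seen + 1] = "element " .. elementName
  end,
}

-- Passing a separator turns on namespace processing.
local parser = lxp.new(callbacks, "|")
assert(parser:parse('<x:root xmlns:x="urn:example"><x:child/></x:root>'))
assert(parser:parse())
parser:close()

print(table.concat(seen, "\n"))
-- namespace x -> urn:example
-- element urn:example|root
-- element urn:example|child
```

<p>Choosing a separator that cannot appear in a namespace URI keeps the
expanded names easy to split again.</p>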

</div> <!-- id="content" -->

</div> <!-- id="main" -->

</div> <!-- id="container" -->

</body>
</html>