1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377
|
<!DOCTYPE html>
<html>
<head>
<title>XMLS: Common Lisp XML Parser</title>
</head>
<body>
<h1>XMLS</h1>
<center>Manual For Version 3</center>
<h2>Summary</h2>
<p>
Xmls is a small, simple, non-validating xml parser for Common Lisp. It's
designed to be a self-contained, easily embedded parser that recognizes a useful
subset of the XML spec. It provides a simple mapping from xml to lisp
structures or s-expressions and back.
</p>
<p>Since XMLS was first released it has gained some additional
complications/features. In particular:</p>
<ul>
<li><strong>Now XMLS by default parses XML documents into lisp structures,
rather than s-expressions.</strong> This makes accessing the structures
simpler and more reliable. See <a href="#Compatibility" >section on backward compatibility</a>.</li>
<li>We have added clearly named accessors to further improve
extraction of information from parsed XML.</li>
<li>Thanks to Max Rottenkolber, we now have the affiliated library,
<a href="#octets"><code>xmls/octets</code></a> that will open streams for the XMLS parser,
processing any content-type declarations in the process.</li>
</ul>
<h2>Features</h2>
<ul>
<li>
Free (BSD license).
</li>
<li>
Understands enough of the xml spec to parse many common documents, including
those occurring in common internet protocols like xml-rpc, webdav, and BEEP.
Parses 85 out of the 98 valid documents in the oasis parser compliance suite.
</li>
<li>
Small and easily embedded. The entire parser is contained in one
file and it's currently less than 600 lines of code. Xmls is written in
pure lisp and requires no external parsing tools or foreign libraries.
</li>
<li>
Supports xml namespaces.
</li>
<li>
Threadsafe.
</li>
<li>
Serializes s-expr list structures back to xml as well as parsing xml.
</li>
</ul>
<h2>Limitations</h2>
<ul>
<li>
Parses entire document into memory and consequently can't handle large
documents.
</li>
<li>
No detailed error reporting.
</li>
<li>
Hand-built LR parser, meaning the parser structure is a little hard
to understand, and can be hard to modify. Use of CL-YACC or similar
might be a preferable route for a rewrite.
</li>
</ul>
<h2>XML Representation</h2>
<p>
Parsed xml is represented as a nested lisp structure, unlike in the
original version, where it was a lisp list. The s-expression
representation is still maintained, and there are <a href="#translators">functions to
translate to and from this notation</a>.
</p>
<h3>XML representation as lisp structures</h3>
<p>In the structure representation, a node, corresponding to an XML
element, is defined as follows:
</p>
<pre>(defstruct (node (:constructor %make-node))
name
ns
attrs
children)</pre>
<p>
Xmls also includes a helper function, make-node for creating xml nodes
of this form:
</p>
<pre>
(make-node &key name ns attrs children)
</pre>
<p>
Xmls provides the corresponding accessor functions node-name, node-ns
node-attrs, and node-children.
</p>
<h3>XML representation as s-expressions</h3>
<p>In the s-expression representation, a node is represented as follows:</p>
<pre>
(name (attributes) children*)
</pre>
<p>
A name is either a simple string, if the element does not belong to a namespace,
or a list of (name namespace-url) if the element does belong to a namespace.
</p>
<p>
Attributes are stored as (name value) lists.
</p>
<p>
Children are stored as a list of either element nodes or text nodes.
</p>
<p>
For example, the following xml document:
</p>
<pre>
<?xml version="1.0"?>
<!-- test document -->
<book title='The Cyberiad'>
<!-- comment in here -->
<author xmlns='http://authors'>Stanislaw Lem</author>
<info:subject xmlns:info='http://bookinfo' rank='1'>&quot;Cybernetic Fables&quot;</info:subject>
</book>
</pre>
Would parse as:
<pre>
("book" (("title" "The Cyberiad"))
(("author" . "http://authors") NIL "Stanislaw Lem")
(("subject" . "http://bookinfo") (("rank" "1")) "\"Cybernetic Fables\""))
</pre>
<a name="Compatibility"><h3>Backward Compatibility</h3></a>
<p>To detect whether in this version of XMLS the return value of <code>PARSE</code> will be a
list or a structure, check for the feature <code>:XMLS-NODES-ARE-STRUCTS</code>.
<p>For old code that wants XML parsed into lists, instead of
structures, you may replace calls to <code>(parse str)</code> with
<code>(node->nodelist (parse str))</code>.</p>
<p>For greater convenience, we offer <code>PARSE-TO-LIST</code>, which
performs the same function.</p>
<h2>Usage</h2>
<p>
The interface is straightforward. The two main functions are
<code>PARSE</code> and <code>TOXML</code>.
</p>
<pre>
(parse source &key (compress-whitespace t) (quash-errors t)
</pre>
<p>
Parse accepts either a string or an input stream and attempts to parse the xml
document contained therein. It will return the s-expr parse tree if it's
successful or nil if parsing fails.</p>
<p>
If <code>COMPRESS-WHITESPACE</code> is non-<code>NIL</code>, content nodes will be trimmed of whitespace and
empty whitespace strings between nodes will be discarded.
</p>
<pre>
(parse-to-list source (&rest args))
</pre>
<p>Functions as <code>PARSE</code>, but returns a list representation
of the XML document, instead of a structure.</p>
<pre>
(write-prologue xml-decl doctype stream)
</pre>
<p>
write-prologue writes the leading
<code><?xml ... ?></code> and <code><!DOCTYPE ... ></code>
elements to <code>stream</code>.
<code>xml-decl</code> is an alist of attribute name value pairs.
Valid xml-decl attributes per the xml spec are "version", "encoding",
and "standalone", though write-prologue does not verify this.
<code>doctype</code> is a string containing the document type definition.
</p>
<pre>
(write-prolog xml-decl doctype stream)
</pre>
<p>
U.S. spelling alternative to <code>write-prologue</code>.
</p>
<pre>
(write-xml xml stream &key (indent nil))
</pre>
<p>
write-xml accepts a lisp list in the format described above and writes the
equivalent xml string to stream. Currently, if nodes use namespaces xmls will not
assign namespaces prefixes but will explicitly assign the namespace to each node. This
will be changed in a later release.
Xmls will indent the generated xml output if indent is non-nil.
</p>
<pre>
(toxml node &key (indent nil))
</pre>
<p>
<code>TOXML</code> is a convenience wrapper around write-xml that returns the in a newly
allocated string.
</p>
<a name="translators"><h3>Translating to and from s-expressions</h3></a>
<p>XMLS provides two exported functions to translate between the CL
structure representation of the XML tree and the s-expression
representation:</p>
<dl>
<dt><code>node->nodelist</code></dt>
<dd>Translate the structure representation into s-expressions.</dd>
<dt><code>nodelist->nodes</code></dt>
<dd>Translate the s-expression representation of an XMLS parse tree
into lisp structures.</dd>
</dl>
<h3>Helper functions</h3>
<p>These are intended to allow programmers to avoid direct manipulation of the
XMLS element representation. If you use these, your code should be easier to
read and you will avoid problems if there is a change in internal
representation (such changes would be hard to even find, much less correct, if
using the lists directly).</p>
<dl>
<dt><code>
make-xmlrep (tag &key attribs children)
</code>
</dt>
<dd>Constructor function.</dd>
<dt>
<code>xmlrep-add-child! (xmlrep child)</code>
</dt>
<dd>Add a new child node to the XMLREP node.</dd>
<dt>
<code>xmlrep-tag (xmlrep)</code>
</dt>
<dd>Extract the tag from XMLREP.</dd>
<dt>
<code>xmlrep-tagmatch (tag treenode)</code>
</dt>
<dd>Returns true if TAG is the tag of TREENODE. Match is
case <em>insensitive</em> (quite possibly this is the Wrong Thing). </dd>
<dt>
<code>xmlrep-attribs (xmlrep)</code>
</dt>
<dd>Extract the attributes from an XMLREP node.</dd>
<dt>
<code>xmlrep-children (xmlrep)</code>
</dt>
<dd>Extract the children from an XMLREP node.</dd>
<dt>
<code>xmlrep-find-child-tags (tag treenode)</code>
</dt>
<dd>
Return all of the (direct) children of TREENODE whose tags are TAG.
Matching done by <a href="#xmlrep-tagmatch"><code>xmlrep-tagmatch</code></a>.
</dd>
<dt>
<code>xmlrep-find-child-tag (tag treenode &optional (if-unfound :error))</code>
</dt>
<dd>
Find a <em>single</em> child of TREENODE with TAG. Returns an error
if there is more or less than one such child.
</dd>
<dt>
<code>xmlrep-string-child (treenode &optional (if-unfound :error))</code>
</dt>
<dd>
Returns the <em>single</em> string-valued child of TREENODE.
If there is more than one child, or if a single child is not
a simple value, returns IF-UNFOUND, which defaults to :ERROR.
</dd>
<dt>
<code>xmlrep-integer-child (treenode)</code>
</dt>
<dd>
Find the <em>single</em> child of TREENODE whose value is a string that
can be parsed into an integer. Returns an
error if there is more than one child, or if a single child is not
appropriately valued.
</dd>
<dt>
<code>xmlrep-attrib-value (attrib treenode &optional (if-undefined :error))</code>
</dt>
<dd>
Find the value of ATTRIB, a string, in TREENODE.
if there is no ATTRIB, will return the value of IF-UNDEFINED,
which defaults to :ERROR.
</dd>
<dt>
<code>xmlrep-boolean-attrib-value (attrib treenode &optional (if-undefined :error))</code>
</dt>
<dd>
Find the value of ATTRIB, a string, in TREENODE.
The value should be either "true" or "false". The
function will return T or NIL, accordingly. If there is no ATTRIB,
will return the value of IF-UNDEFINED, which defaults to :ERROR.
</dd>
</dl>
<a name="octets"><h2>XMLS/Octets</h2></a>
<p>
XMLS itself simply processes strings or streams. This means that it
does not provide native support for handling character encodings, as
declared in the XML headers. The system <code>xmls/octets</code>,
which depends on <code>xmls</code> provides that support with the
exported function <code>make-xml-stream</code>, which takes an
octet-stream as argument, processes its header, choosing the
appropriate character encoding, and then returns a stream suitable for
passing to <code>xmls:parse</code>.</p>
<p>Probably <code>make-xml-stream</code> should be made generic, and support
arguments of other types (e.g., strings interpreted as filenames,
pathnames, etc.).</p>
<h2>Installation</h2>
<p>
xmls can be installed as an asdf system. An asdf
system definition is provided with the distribution.
</p>
<p>Previous versions of XMLS were single files, and could be installed simply by
loading the file xmls.lisp. This option is no longer supported.</p>
<h2>Contact Information</h2>
<p>
Please contact Robert Goldman, rpgoldman AT sift.net with any
questions or bug reports.
</p>
</body>
</html>
|