<!-- neon XML interface -*- text -*- -->
<sect1 id="xml">
<title>Parsing XML</title>
<para>The &neon; XML interface is exposed by the
<filename>ne_xml.h</filename> header file. This interface provides a
wrapper around the standard <ulink
url="http://www.saxproject.org/">SAX</ulink> API used by XML
parsers, adds an abstraction called <firstterm>stacked SAX
handlers</firstterm>, and gives consistent <ulink
url="http://www.w3.org/TR/REC-xml-names">XML Namespace</ulink> support.</para>
<sect2 id="xml-sax">
<title>Introduction to SAX</title>
<para>A SAX-based parser works by emitting a sequence of
<firstterm>events</firstterm> to reflect the tokens being parsed
from the XML document. For example, parsing the following document
fragment:
<programlisting><![CDATA[
<hello>world</hello>
]]></programlisting>
results in the following events:
<orderedlist>
<listitem>
<simpara>&startelm; "hello"</simpara>
</listitem>
<listitem>
<simpara>&cdata; "world"</simpara>
</listitem>
<listitem>
<simpara>&endelm; "hello"</simpara>
</listitem>
</orderedlist>
This example demonstrates the three event types used in the
subset of SAX exposed by the &neon; XML interface: &startelm;,
&cdata; and &endelm;. In a C API, an <quote>event</quote> is
implemented as a function callback; three callback types are used in
&neon;, one for each type of event.</para>
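<para>The mapping from events to callbacks can be sketched in plain
C. This is only an illustration: the callback types, the
<literal>event_log</literal> structure and the
<literal>emit_hello_events</literal> function are hypothetical names
invented for this sketch and are not the declarations found in
<filename>ne_xml.h</filename>. The events are emitted by hand in the
order a SAX parser would generate them for the fragment above:
<programlisting><![CDATA[
#include <stdio.h>
#include <string.h>

/* Hypothetical callback types, one per event type.  These are
 * illustrative only and are not the declarations from ne_xml.h. */
typedef void (*startelm_fn)(void *userdata, const char *name);
typedef void (*cdata_fn)(void *userdata, const char *text, size_t len);
typedef void (*endelm_fn)(void *userdata, const char *name);

/* The userdata used in this sketch: a buffer recording each event. */
struct event_log {
    char buf[128];
};

/* Each callback appends one line to the log (truncated if full). */
void log_start(void *userdata, const char *name)
{
    struct event_log *log = userdata;
    size_t used = strlen(log->buf);
    snprintf(log->buf + used, sizeof log->buf - used, "start %s\n", name);
}

void log_cdata(void *userdata, const char *text, size_t len)
{
    struct event_log *log = userdata;
    size_t used = strlen(log->buf);
    snprintf(log->buf + used, sizeof log->buf - used,
             "cdata %.*s\n", (int)len, text);
}

void log_end(void *userdata, const char *name)
{
    struct event_log *log = userdata;
    size_t used = strlen(log->buf);
    snprintf(log->buf + used, sizeof log->buf - used, "end %s\n", name);
}

/* Emit, by hand, the events for <hello>world</hello>. */
void emit_hello_events(startelm_fn start, cdata_fn cdata, endelm_fn end,
                       void *userdata)
{
    start(userdata, "hello");
    cdata(userdata, "world", 5);
    end(userdata, "hello");
}
]]></programlisting>
Calling <literal>emit_hello_events</literal> with the three logging
callbacks and a zero-initialised <literal>event_log</literal> records
the start, character data and end events in document order.</para>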
</sect2>
<sect2 id="xml-stacked">
<title>Stacked SAX handlers</title>
<para>WebDAV property values are represented as fragments of XML,
transmitted as parts of larger XML documents over HTTP (notably in
the body of the response to a <literal>PROPFIND</literal> request).
When &neon; parses such documents, the SAX events generated for
these property value fragments may need to be handled by the
application, since &neon; has no knowledge of the structure of
properties used by the application.</para>
<para>To solve this problem<footnote id="foot.xml.sax"><para>This
<quote>problem</quote> only needs solving because the SAX interface
is so inflexible when implemented as C function callbacks; a better
approach would be to use an XML parser interface which is not based
on callbacks.</para></footnote> the &neon; XML interface introduces
the concept of a <firstterm>SAX handler</firstterm>. A SAX handler
comprises a &startelm;, a &cdata; and an &endelm; callback; the
&startelm; callback is defined such that each handler may
<emphasis>accept</emphasis> or <emphasis>decline</emphasis> the
&startelm; event. Handlers are composed into a <firstterm>handler
stack</firstterm> before parsing a document. When a new &startelm;
event is generated by the XML parser, &neon; invokes each &startelm;
callback in the handler stack in turn until one accepts the event.
The handler which accepts the event is subsequently passed
&cdata; events if the element contains character data,
followed by an &endelm; event when the element is closed. If no
handler in the stack accepts a &startelm; event, the branch of the
tree rooted at that element is ignored.</para>
<para>To illustrate, given a handler A, which accepts the
<literal>cat</literal> and <literal>age</literal> elements, and a
handler B, which accepts the <literal>name</literal> element, the
following document:
<example id="xml-example">
<title>An example XML document</title>
<programlisting><![CDATA[
<cat>
<age>3</age>
<name>Bob</name>
</cat>
]]></programlisting></example>
would be parsed as follows:
<orderedlist>
<listitem>
<simpara>A &startelm; "cat" → <emphasis>accept</emphasis></simpara>
</listitem>
<listitem>
<simpara>A &startelm; "age" → <emphasis>accept</emphasis></simpara>
</listitem>
<listitem>
<simpara>A &cdata; "3"</simpara>
</listitem>
<listitem>
<simpara>A &endelm; "age"</simpara>
</listitem>
<listitem>
<simpara>A &startelm; "name" → <emphasis>decline</emphasis></simpara>
</listitem>
<listitem>
<simpara>B &startelm; "name" → <emphasis>accept</emphasis></simpara>
</listitem>
<listitem>
<simpara>B &cdata; "Bob"</simpara>
</listitem>
<listitem>
<simpara>B &endelm; "name"</simpara>
</listitem>
<listitem>
<simpara>A &endelm; "cat"</simpara>
</listitem>
</orderedlist></para>
<para>The search for a handler which will accept a &startelm; event
begins at the handler of the parent element and continues toward the
top of the stack. For the root element, it begins at the base of
the stack. In the above example, handler A is at the base, and
handler B at the top; if the <literal>name</literal> element had any
children, only B's &startelm; would be invoked to accept
them.</para>
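<para>The search order described above can be sketched as a small
self-contained C function. This is a simplified model and not the
&neon; implementation: the <literal>handler</literal> structure, the
<literal>accept_fn</literal> type and
<literal>find_handler</literal> are hypothetical names, and only the
accept or decline decision is modelled:
<programlisting><![CDATA[
#include <string.h>

/* Hypothetical handler type for this sketch: only the accept/decline
 * decision of the start-element callback is modelled. */
typedef int (*accept_fn)(const char *name);   /* non-zero means accept */

struct handler {
    const char *label;
    accept_fn accepts;
};

/* Handler A accepts "cat" and "age". */
int a_accepts(const char *name)
{
    return strcmp(name, "cat") == 0 || strcmp(name, "age") == 0;
}

/* Handler B accepts "name". */
int b_accepts(const char *name)
{
    return strcmp(name, "name") == 0;
}

/* Search from the handler of the parent element toward the top of the
 * stack.  For the root element, pass 0 (the base) as parent_handler.
 * Returns the index of the accepting handler, or -1 if every handler
 * consulted declines, in which case the branch is ignored. */
int find_handler(const struct handler *stack, int count,
                 int parent_handler, const char *name)
{
    int i;

    for (i = parent_handler; i < count; i++) {
        if (stack[i].accepts(name))
            return i;
    }
    return -1;
}
]]></programlisting>
With a two-entry stack holding A at index 0 and B at index 1, the
<literal>name</literal> element (whose parent <literal>cat</literal>
was accepted by A) is offered to A and then to B, and the function
returns 1. A child of <literal>name</literal> would be searched
starting from index 1, so only B would be consulted.</para>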
</sect2>
<sect2 id="xml-state">
<title>Maintaining state</title>
<para>To facilitate communication between independent handlers, a
<firstterm>state integer</firstterm> is associated with each element
being parsed. This integer is returned by the &startelm; callback and
is passed to the subsequent &cdata; and &endelm; callbacks
associated with the element. The state integer of the parent
element is also passed to each &startelm; callback; the value zero
is used for the root element (which by definition has no
parent).</para>
<para>To further extend <xref linkend="xml-example"/>: if handler A
defines that the state of the root element <sgmltag>cat</sgmltag>
will be <literal>42</literal>, the event trace would be as
follows:
<orderedlist>
<listitem>
<simpara>A &startelm; (parent = 0, "cat") →
<emphasis>accept</emphasis>, state = 42
</simpara>
</listitem>
<listitem>
<simpara>A &startelm; (parent = 42, "age") →
<emphasis>accept</emphasis>, state = 50
</simpara>
</listitem>
<listitem>
<simpara>A &cdata; (state = 50, "3")</simpara>
</listitem>
<listitem>
<simpara>A &endelm; (state = 50, "age")</simpara>
</listitem>
<listitem>
<simpara>A &startelm; (parent = 42, "name") →
<emphasis>decline</emphasis></simpara>
</listitem>
<listitem>
<simpara>B &startelm; (parent = 42, "name") →
<emphasis>accept</emphasis>, state = 99</simpara>
</listitem>
<listitem>
<simpara>B &cdata; (state = 99, "Bob")</simpara>
</listitem>
<listitem>
<simpara>B &endelm; (state = 99, "name")</simpara>
</listitem>
<listitem>
<simpara>A &endelm; (state = 42, "cat")</simpara>
</listitem>
</orderedlist></para>
<para>To avoid collisions between state integers used by different
handlers, the interface definition of any handler includes the range
of integers it will use.</para>
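<para>As a sketch of how two handlers might cooperate through state
integers, consider the following self-contained C fragment. The
function names, the <literal>DECLINE</literal> constant and the
ranges (1 to 50 for handler A, 51 to 100 for handler B) are
assumptions made for this illustration, chosen to match the values in
the trace above; the signatures are simplified and are not those of
<filename>ne_xml.h</filename>:
<programlisting><![CDATA[
#include <string.h>

/* Assumption for this sketch: a start-element callback declines by
 * returning zero and accepts by returning a non-zero state. */
#define DECLINE 0

/* Handler A documents that it uses states in the range 1 to 50. */
enum { A_STATE_CAT = 42, A_STATE_AGE = 50 };

/* Handler B documents that it uses states in the range 51 to 100. */
enum { B_STATE_NAME = 99 };

/* Handler A: accepts "cat" as the root element (parent state zero)
 * and "age" as a child of "cat". */
int a_startelm(int parent, const char *name)
{
    if (parent == 0 && strcmp(name, "cat") == 0)
        return A_STATE_CAT;
    if (parent == A_STATE_CAT && strcmp(name, "age") == 0)
        return A_STATE_AGE;
    return DECLINE;
}

/* Handler B: accepts "name", but only inside an element which
 * handler A marked with A_STATE_CAT.  The parent state is the only
 * information shared between the two handlers. */
int b_startelm(int parent, const char *name)
{
    if (parent == A_STATE_CAT && strcmp(name, "name") == 0)
        return B_STATE_NAME;
    return DECLINE;
}
]]></programlisting>
Because the two ranges do not overlap, handler B can test the parent
state against <literal>A_STATE_CAT</literal> without any risk of
confusing it with one of its own states.</para>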
</sect2>
<sect2 id="xml-ns">
<title>XML namespaces</title>
<para>To support XML namespaces, every element name is represented
as a <emphasis>(namespace, name)</emphasis> pair. The &startelm;
and &endelm; callbacks are passed namespace and name strings
accordingly. If an element in the XML document has no declared
namespace, the namespace given will be the empty string,
<literal>""</literal>.</para>
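<para>A callback typically needs to compare both halves of the pair
before accepting an element. A minimal sketch of such a comparison
follows; <literal>elm_matches</literal> is a hypothetical helper
written for this illustration and is not part of the &neon;
interface:
<programlisting><![CDATA[
#include <string.h>

/* Return non-zero when the element's (namespace, name) pair equals
 * the wanted pair.  An element with no declared namespace is
 * represented by the empty string, so "" is a valid value for
 * either namespace argument. */
int elm_matches(const char *nspace, const char *name,
                const char *want_nspace, const char *want_name)
{
    return strcmp(nspace, want_nspace) == 0
        && strcmp(name, want_name) == 0;
}
]]></programlisting>
For example, a <literal>prop</literal> element in the
<literal>DAV:</literal> namespace matches the pair
<literal>("DAV:", "prop")</literal>, whereas a
<literal>prop</literal> element with no declared namespace does
not.</para>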
</sect2>
</sect1>