<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Parsing XML</title><link rel="stylesheet" href="../manual.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.65.1"><link rel="home" href="index.html" title="neon HTTP/WebDAV client library"><link rel="up" href="api.html" title="Chapter 2. The neon C language interface"><link rel="previous" href="api.html" title="Chapter 2. The neon C language interface"><link rel="next" href="ref.html" title="neon API reference"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Parsing XML</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="api.html">Prev</a></td><th width="60%" align="center">Chapter 2. The neon C language interface</th><td width="20%" align="right"><a accesskey="n" href="ref.html">Next</a></td></tr></table><hr></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="xml"></a>Parsing XML</h2></div></div><div></div></div><p>The neon XML interface is exposed by the
  <tt class="filename">ne_xml.h</tt> header file.  This interface wraps
  the standard <a href="http://www.saxproject.org/" target="_top">SAX</a> API used by XML
  parsers, adds one abstraction, <i class="firstterm">stacked SAX
  handlers</i>, and provides consistent <a href="http://www.w3.org/TR/REC-xml-names" target="_top">XML Namespace</a> support.</p><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="xml-sax"></a>Introduction to SAX</h3></div></div><div></div></div><p>A SAX-based parser works by emitting a sequence of
  <i class="firstterm">events</i> to reflect the tokens being parsed
  from the XML document.  For example, parsing the following document
  fragment:

</p><pre class="programlisting">
&lt;hello&gt;world&lt;/hello&gt;
</pre><p>

  results in the following events:

  </p><div class="orderedlist"><ol type="1"><li><span class="emphasis"><em>start-element</em></span> "hello"</li><li><span class="emphasis"><em>character-data</em></span> "world"</li><li><span class="emphasis"><em>end-element</em></span> "hello"</li></ol></div><p>
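  The same trace can be reproduced with a small C sketch in which each
  event is a call to a callback function.  The type and function names
  below (<tt class="literal">start_element_cb</tt>,
  <tt class="literal">emit_hello_events</tt> and so on) are hypothetical
  and chosen for illustration; they are not the declarations in
  <tt class="filename">ne_xml.h</tt>, and the event sequence is
  hard-coded rather than produced by a real parser.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback types, one per event type.
 * Illustrative only: these are NOT the declarations from ne_xml.h. */
typedef void start_element_cb(void *userdata, const char *name);
typedef void character_data_cb(void *userdata, const char *text, size_t len);
typedef void end_element_cb(void *userdata, const char *name);

/* Collects a textual trace of the events received. */
struct event_trace {
    char buf[256];
    size_t used;
};

/* Append one line of the form: kind "text".
 * A line that does not fit in the buffer is dropped. */
void trace_add(struct event_trace *t, const char *kind,
               const char *text, size_t len)
{
    size_t room;
    int n;

    assert(t != NULL && t->used < sizeof t->buf);
    room = sizeof t->buf - t->used;
    n = snprintf(t->buf + t->used, room, "%s \"%.*s\"\n",
                 kind, (int)len, text);
    if (n > 0 && (size_t)n < room)
        t->used += (size_t)n;
    else
        t->buf[t->used] = '\0'; /* discard a truncated line */
}

/* One callback per event type; each records the event it was given. */
void on_start(void *userdata, const char *name)
{
    trace_add(userdata, "start-element", name, strlen(name));
}

void on_cdata(void *userdata, const char *text, size_t len)
{
    trace_add(userdata, "character-data", text, len);
}

void on_end(void *userdata, const char *name)
{
    trace_add(userdata, "end-element", name, strlen(name));
}

/* Stand-in for a parser: emits the events for <hello>world</hello>.
 * A real SAX parser would derive this sequence from the input bytes. */
void emit_hello_events(start_element_cb *start, character_data_cb *cdata,
                       end_element_cb *end, void *userdata)
{
    start(userdata, "hello");
    cdata(userdata, "world", 5);
    end(userdata, "hello");
}
```

<p>Calling <tt class="literal">emit_hello_events(on_start, on_cdata, on_end, &amp;t)</tt>
  with a zero-initialised <tt class="literal">struct event_trace t</tt>
  leaves the three events in <tt class="literal">t.buf</tt>, one per
  line, in the order listed above.</p><p>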

  This example demonstrates the three event types used in the
  subset of SAX exposed by the neon XML interface: <span class="emphasis"><em>start-element</em></span>,
  <span class="emphasis"><em>character-data</em></span> and <span class="emphasis"><em>end-element</em></span>.  In a C API, an &#8220;<span class="quote">event</span>&#8221; is
  implemented as a function callback; three callback types are used in
  neon, one for each type of event.</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="xml-stacked"></a>Stacked SAX handlers</h3></div></div><div></div></div><p>WebDAV property values are represented as fragments of XML,
  transmitted as parts of larger XML documents over HTTP (notably in
  the body of the response to a <tt class="literal">PROPFIND</tt> request).
  When neon parses such documents, the SAX events generated for
  these property value fragments may need to be handled by the
  application, since neon has no knowledge of the structure of
  properties used by the application.</p><p>To solve this problem<sup>[<a name="id3018824" href="#ftn.id3018824">1</a>]</sup> the neon XML interface introduces
  the concept of a <i class="firstterm">SAX handler</i>.  A SAX handler
  comprises a <span class="emphasis"><em>start-element</em></span>, <span class="emphasis"><em>character-data</em></span> and <span class="emphasis"><em>end-element</em></span> callback; the
  <span class="emphasis"><em>start-element</em></span> callback being defined such that each handler may
  <span class="emphasis"><em>accept</em></span> or <span class="emphasis"><em>decline</em></span> the
  <span class="emphasis"><em>start-element</em></span> event.  Handlers are composed into a <i class="firstterm">handler
  stack</i> before parsing a document.  When a new <span class="emphasis"><em>start-element</em></span>
  event is generated by the XML parser, neon invokes each <span class="emphasis"><em>start-element</em></span>
  callback in the handler stack in turn until one accepts the event.
  The handler which accepts the event will subsequently be
  passed <span class="emphasis"><em>character-data</em></span> events if the element contains character data,
  followed by an <span class="emphasis"><em>end-element</em></span> event when the element is closed.  If no
  handler in the stack accepts a <span class="emphasis"><em>start-element</em></span> event, the branch of the
  tree is ignored.</p><p>To illustrate, given a handler A, which accepts the
  <tt class="literal">cat</tt> and <tt class="literal">age</tt> elements, and a
  handler B, which accepts the <tt class="literal">name</tt> element, the
  following document:

</p><div class="example"><a name="xml-example"></a><p class="title"><b>Example 2.1. An example XML document</b></p><pre class="programlisting">
&lt;cat&gt;
  &lt;age&gt;3&lt;/age&gt;    
  &lt;name&gt;Bob&lt;/name&gt;
&lt;/cat&gt;
</pre></div><p>

  would be parsed as follows:
  
  </p><div class="orderedlist"><ol type="1"><li>A <span class="emphasis"><em>start-element</em></span> "cat" &#8594; <span class="emphasis"><em>accept</em></span></li><li>A <span class="emphasis"><em>start-element</em></span> "age" &#8594; <span class="emphasis"><em>accept</em></span></li><li>A <span class="emphasis"><em>character-data</em></span> "3"</li><li>A <span class="emphasis"><em>end-element</em></span> "age"</li><li>A <span class="emphasis"><em>start-element</em></span> "name" &#8594; <span class="emphasis"><em>decline</em></span></li><li>B <span class="emphasis"><em>start-element</em></span> "name" &#8594; <span class="emphasis"><em>accept</em></span></li><li>B <span class="emphasis"><em>character-data</em></span> "Bob"</li><li>B <span class="emphasis"><em>end-element</em></span> "name"</li><li>A <span class="emphasis"><em>end-element</em></span> "cat"</li></ol></div><p>The search for a handler which will accept a <span class="emphasis"><em>start-element</em></span> event
  begins at the handler of the parent element and continues toward the
  top of the stack.  For the root element, it begins at the base of
  the stack.  In the above example, handler A is at the base, and
  handler B at the top; if the <tt class="literal">name</tt> element had any
  children, only B's <span class="emphasis"><em>start-element</em></span> would be invoked to accept
  them.</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="xml-state"></a>Maintaining state</h3></div></div><div></div></div><p>To facilitate communication between independent handlers, a
  <i class="firstterm">state integer</i> is associated with each element
  being parsed.  This integer is returned by the <span class="emphasis"><em>start-element</em></span> callback and
  is passed to the subsequent <span class="emphasis"><em>character-data</em></span> and <span class="emphasis"><em>end-element</em></span> callbacks
  associated with the element.  The state integer of the parent
  element is also passed to each <span class="emphasis"><em>start-element</em></span> callback; the value zero
  is used for the root element (which by definition has no
  parent).</p><p>To further extend <a href="xml.html#xml-example" title="Example 2.1. An example XML document">Example 2.1, &#8220;An example XML document&#8221;</a>: if handler A
  defines that the state of the root element <tt class="sgmltag-element">cat</tt>
  will be <tt class="literal">42</tt>, the event trace would be as
  follows:

  </p><div class="orderedlist"><ol type="1"><li>A <span class="emphasis"><em>start-element</em></span> (parent = 0, "cat") &#8594;
      <span class="emphasis"><em>accept</em></span>, state = 42
      </li><li>A <span class="emphasis"><em>start-element</em></span> (parent = 42, "age") &#8594; 
      <span class="emphasis"><em>accept</em></span>, state = 50
      </li><li>A <span class="emphasis"><em>character-data</em></span> (state = 50, "3")</li><li>A <span class="emphasis"><em>end-element</em></span> (state = 50, "age")</li><li>A <span class="emphasis"><em>start-element</em></span> (parent = 42, "name") &#8594; 
      <span class="emphasis"><em>decline</em></span></li><li>B <span class="emphasis"><em>start-element</em></span> (parent = 42, "name") &#8594;
      <span class="emphasis"><em>accept</em></span>, state = 99</li><li>B <span class="emphasis"><em>character-data</em></span> (state = 99, "Bob")</li><li>B <span class="emphasis"><em>end-element</em></span> (state = 99, "name")</li><li>A <span class="emphasis"><em>end-element</em></span> (state = 42, "cat")</li></ol></div><p>To avoid collisions between state integers used by different
  handlers, the interface definition of any handler includes the range
  of integers it will use.</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="xml-ns"></a>XML namespaces</h3></div></div><div></div></div><p>To support XML namespaces, every element name is represented
  as a <span class="emphasis"><em>(namespace, name)</em></span> pair.  The <span class="emphasis"><em>start-element</em></span>
  and <span class="emphasis"><em>end-element</em></span> callbacks are passed namespace and name strings
  accordingly.  If an element in the XML document has no declared
  namespace, the namespace given will be the empty string,
  <tt class="literal">""</tt>.</p></div><div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a name="ftn.id3018824" href="#id3018824">1</a>] </sup>This
  &#8220;<span class="quote">problem</span>&#8221; only needs solving because the SAX interface
  is so inflexible when implemented as C function callbacks; a better
  approach would be to use an XML parser interface which is not based
  on callbacks.</p></div></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="api.html">Prev</a></td><td width="20%" align="center"><a accesskey="u" href="api.html">Up</a></td><td width="40%" align="right"><a accesskey="n" href="ref.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter 2. The neon C language interface</td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top">neon API reference</td></tr></table></div></body></html>