1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162
|
.. _The_DOM_module:
**************
The DOM module
**************
DOM is another standard associated with XML, in which the XML stream is
represented as a tree in memory. This tree can be manipulated at will, to add
new nodes, remove existing nodes, change attributes...
Since it contains the whole XML information, it can then in turn be dumped to a
stream.
As an example, most modern web browsers provide a DOM interface to the document
currently loaded in the browser. Using javascript, one can thus modify
dynamically the document. The calls to do so are similar to the ones provided
by XML/Ada for manipulating a DOM tree, and all are defined in the DOM
standard.
The W3C committee (`http://www.w3c.org <http://www.w3c.org>`_) has defined
several versions of the DOM, each building on the previous one and adding
several enhancements.
XML/Ada currently supports the second revision of DOM (DOM 2.0), which mostly
adds namespaces over the first revision. The third revision is not supported at
this point, and it adds support for loading and saving XML streams in a
standardized fashion.
Although it doesn't support DOM 3.0, XML/Ada provides subprograms for doing
similar things.
Only the Core module of the DOM standard is currently implemented, other
modules will follow.
Note that the :file:`encodings.ads` file specifies the encoding to use to store
the tree in memory. Full compatibility with the XML standard requires that this
be UTF16, however, it is generally much more memory-efficient for European
languages to use UTF8. You can freely change this and recompile.
.. _Using_DOM:
Using DOM
=========
In XML/Ada, the DOM tree is built through a special implementation of a
SAX parser, provided in the `DOM.Readers` package.
Using DOM to read an XML document is similar to using SAX: one must set up an
input stream, then parse the document and get the tree. This is done with a
code similar to the following:
.. literalinclude:: dom/domexample.adb
:language: ada
:linenos:
This code is almost exactly the same as the code that was used when
demonstrating the use of SAX (:ref:`Using_SAX`).
The main two differences are:
* We no longer need to define our own XML reader, and we simply use the
one provided in `DOM.Readers`.
* We therefore do not add our own callbacks to react to the XML events.
Instead, the last instruction of the program gets a handle on the tree that
was created in memory.
The tree can now be manipulated to get access to the value stored.
If we want to implement the same thing we did for SAX, the code would look
like:
.. literalinclude:: dom/domexample2.adb
:language: ada
:linenos:
The code is much simpler than with SAX, since most of the work is done
internally by XML/Ada. In particular, for SAX we had to take into account the
fact that the textual contents of a node could be reported in several events.
For DOM, the tree is initially normalized, ie all text nodes are collapsed
together when possible.
This added simplicity has one drawback, which is the amount of memory required
to represent even a simple tree.
XML/Ada optimizes the memory necessary to represent a tree by sharing the
strings as much as possible (this is under control of constants at the
beginning of :file:`dom-core.ads`). Still, DOM requires a significant amount of
information to be kept for each node.
For really big XML streams, it might prove impossible to keep the whole tree in
memory, in which case ad hoc storage might be implemented through the use of a
SAX parser. The implementation of `dom-readers.adb` will prove helpful in
creating such a parser.
Editing DOM trees
=================
Once in memory, DOM trees can be manipulated through subprograms provided by
the DOM API.
Each of these subprograms is fully documented both in the Ada specs (the
:file:`*.ads` files) and in the DOM standard itself, which XML/Ada follows
fully.
One important note however is related to the use of strings. Various
subprograms allow you to set the textual content of a node, modify its
attributes... Such subprograms take a Byte_Sequence as a parameter.
This Byte_Sequence must always be encoded in the encoding defined in the
package `Sax.Encoding` (as described earlier, changing this package requires
recompiling XML/Ada). By default, this is UTF-8.
Therefore, if you need to set an attribute to a string encoded for
instance in iso-8859-15, you should use the subprogram
`Unicode.Encodings.Convert` to convert it appropriately.
The code would thus look as follows:
.. literalinclude:: dom/convert.adb
:language: ada
Printing DOM tress
==================
The standard DOM 2.0 does not define a common way to read DOM trees from
input sources, nor how to write them back to output sources. This was
added in later revision of the standard (DOM 3.0), which is not yet
supported by XML/Ada.
However, the package :file:`DOM.Core.Nodes` provides a `Write`
procedure that can be used for that purpose. It outputs a given DOM tree
to an Ada stream. This stream can then be connected to a standard file
on the disk, to a socket, or be used to transform the tree into a string
in memory.
An example is provided in the XML/Ada distribution, called
:file:`tests/dom/tostring.adb` which shows how you can create a stream to
convert the tree in memory, without going through a file on the disk.
Adding information to the tree
==============================
The DOM standard does not mandate each node to have a pointer to the
location it was read from (for instance `file:line:column`). In fact,
storing that for each node would increase the size of the DOM tree
(not small by any means already) significantly.
But depending on your application, this might be a useful information
to have, for instance if you want to report error messages with a
correct location.
Fortunately, this can be done relatively easily by extending the
type `DOM.Readers.Tree_Reader`, and override the `Start_Element`.
You would then add a custom attribute to all the nodes that contain
the location for this node. Here is an example.
.. literalinclude:: dom/dom_with_location.ads
:language: ada
:linenos:
.. literalinclude:: dom/dom_with_location.adb
:language: ada
:linenos:
|