\chapter{Retrieving Data From An \XML{} Datasource}
This chapter shows how to retrieve \XML{} data from a standard data
source.
Such a source can be a file, an \ltext{HTTP} object or a text string.
The method described in this chapter is the simplest way to retrieve
\XML{} data.
More advanced ways are described in the next chapters.
\section{A Very Simple Example}
This section describes a very simple \XML{} application.
It parses \XML{} data from a stream and dumps it ``pretty-printed'' to
the standard output.
While its use is very limited, it shows how to set up a parser and parse an
\XML{} document.
\begin{example}
\xkeyword{import} net.n3.nanoxml.*;\xcallout{1}
\xkeyword{import} java.io.*;
\xkeyword{public class} DumpXML
\{
~~\xkeyword{public static void} main(String[] args)
~~~~\xkeyword{throws} Exception
~~\{
~~~~IXMLParser parser = XMLParserFactory.createDefaultXMLParser();\xcallout{2}
~~~~IXMLReader reader = StdXMLReader.fileReader("test.xml");\xcallout{3}
~~~~parser.setReader(reader);
~~~~IXMLElement xml = (IXMLElement) parser.parse();\xcallout{4}
~~~~XMLWriter writer = new XMLWriter(System.out);\xcallout{5}
~~~~writer.write(xml);
~~\}
\}
\end{example}
\begin{callout}
\coitem
The \NanoXML{} classes are located in the package
\packagename{net.n3.nanoxml}.
\coitem
This command creates an \XML{} parser.
The actual class of the parser is dependent on the value of the system
property \propertykey{net.n3.nanoxml.XMLParser}, which is by default
\propertyvalue{net.n3.nanoxml.StdXMLParser}.
\coitem
The command creates a ``standard'' reader which reads its data from the
file called \filename{test.xml}.
Usually you can use \classname{StdXMLReader} to feed the \XML{} data to
the parser.
The default reader is able to set up HTTP connections when retrieving
\ltext{DTDs} or entities from different machines.
If necessary, you can supply your own reader, \acronym{e.g.} to provide
support for \ltext{PUBLIC} identifiers.
\coitem
The \XML{} parser now parses the data read from \filename{test.xml}
and creates a tree of parsed \XML{} elements.
The structure of those elements will be described in the next section.
\coitem
An \classname{XMLWriter} can be used to dump a ``pretty-printed'' view
of the parsed \XML{} data on an output stream.
In this case, we dump the read data to the standard output
\ltext{(System.out)}.
\end{callout}
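The property lookup described in the second callout can be illustrated with plain JDK calls.
The following is a minimal sketch, not the \NanoXML{} factory source: the class \classname{ParserClassLookup} and its method are hypothetical names, and only the property key and the default value are taken from the text above.

```java
// Sketch of property-driven parser selection (an assumption about how such
// a lookup could work, NOT NanoXML's actual factory code).
final class ParserClassLookup {
    // Property key and default value as given in the callout above.
    static final String KEY = "net.n3.nanoxml.XMLParser";
    static final String DEFAULT_CLASS = "net.n3.nanoxml.StdXMLParser";

    // Returns the class name a factory would instantiate: the value of the
    // system property if it is set, otherwise the default.
    static String parserClassName() {
        return System.getProperty(KEY, DEFAULT_CLASS);
    }

    public static void main(String[] args) {
        System.out.println(parserClassName());
    }
}
```

Starting the JVM with \ltext{-Dnet.n3.nanoxml.XMLParser=com.example.MyParser} (a hypothetical class name) would make the lookup return that name instead of the default.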
\section{Analyzing The Data}
You can easily traverse the logical tree generated by the parser.
If you need to create your own object tree, you can create your custom
% TODO: make "chapter 3" a xref
builder, which is described in chapter 3.
The default \XML{} builder, \classname{StdXMLBuilder}, generates a tree of
\classname{IXMLElement} objects.
Every such object has a name and can have attributes, \ltext{\#PCDATA} content and child objects.
The following \XML{} data:
\begin{example}
$<$FOO attr1="fred" attr2="barney"$>$
~~~~$<$BAR a1="flintstone" a2="rubble"$>$
~~~~~~~~Some data.
~~~~$<$/BAR$>$
~~~~$<$QUUX/$>$
$<$/FOO$>$
\end{example}
is parsed to the following objects:
\begin{itemize}
\item[] Element FOO:
\begin{itemize}
\item[] Attributes = \{ "attr1"="fred", "attr2"="barney" \}
\item[] Children = \{ BAR, QUUX \}
\item[] PCData = null
\end{itemize}
\item[] Element BAR:
\begin{itemize}
\item[] Attributes = \{ "a1"="flintstone", "a2"="rubble" \}
\item[] Children = \{\}
\item[] PCData = "Some data."
\end{itemize}
\item[] Element QUUX:
\begin{itemize}
\item[] Attributes = \{\}
\item[] Children = \{\}
\item[] PCData = null
\end{itemize}
\end{itemize}
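To make the shape of this object tree concrete, the following sketch models it with a small stand-in class.
\classname{SketchElement} and its fields are hypothetical and exist only for illustration; they are not the \classname{IXMLElement} interface, whose accessor methods are shown in the rest of this section.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a parsed element; NOT the NanoXML IXMLElement API.
final class SketchElement {
    final String name;
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<SketchElement> children = new ArrayList<>();
    String content; // null when the element has no #PCDATA

    SketchElement(String name) {
        this.name = name;
    }

    // Builds the FOO/BAR/QUUX tree listed above by hand.
    static SketchElement buildFoo() {
        SketchElement foo = new SketchElement("FOO");
        foo.attributes.put("attr1", "fred");
        foo.attributes.put("attr2", "barney");
        SketchElement bar = new SketchElement("BAR");
        bar.attributes.put("a1", "flintstone");
        bar.attributes.put("a2", "rubble");
        bar.content = "Some data.";
        foo.children.add(bar);
        foo.children.add(new SketchElement("QUUX"));
        return foo;
    }

    // Depth-first dump: one line per element, children indented by two spaces.
    static String dump(SketchElement e, String indent) {
        StringBuilder sb = new StringBuilder();
        sb.append(indent).append(e.name).append(' ').append(e.attributes)
          .append(" PCData=").append(e.content).append('\n');
        for (SketchElement child : e.children) {
            sb.append(dump(child, indent + "  "));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump(buildFoo(), ""));
    }
}
```

The recursion in \ltext{dump} mirrors the listing: each element carries its own attributes and content, and the nesting comes only from the list of children.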
You can retrieve the name of an element using \methodname{getFullName}, thus:
\begin{example}
FOO.getFullName() $\to$ "FOO"
\end{example}
You can enumerate the attribute keys using
\methodname{enumerateAttributeNames}:
\begin{example}
Enumeration names = FOO.enumerateAttributeNames();
\xkeyword{while} (names.hasMoreElements()) \{
~~System.out.print(names.nextElement());
~~System.out.print(' ');
\}
$\to$ attr1 attr2
\end{example}
You can retrieve the value of an attribute using \methodname{getAttribute}:
\begin{example}
FOO.getAttribute ("attr1", null) $\to$ "fred"
\end{example}
The child elements can be enumerated using \methodname{enumerateChildren}:
\begin{example}
Enumeration children = FOO.enumerateChildren();
\xkeyword{while} (children.hasMoreElements()) \{
~~IXMLElement child = (IXMLElement) children.nextElement();
~~System.out.print(child.getFullName() + ' ');
\}
$\to$ BAR QUUX
\end{example}
If the element contains parsed character data \ltext{(\#PCDATA)} as its only
child, you can retrieve that data using \methodname{getContent}:
\begin{example}
BAR.getContent() $\to$ "Some data."
\end{example}
If an element contains both \ltext{\#PCDATA} and \XML{} elements as its
children, the character data segments will be put in untitled \XML{}
elements (whose name is \ltext{null}).
\classname{IXMLElement} contains many convenience methods for retrieving data
and traversing the \XML{} tree.
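The handling of mixed content can be pictured with a small model.
The sketch below is an illustration under stated assumptions, not \NanoXML{} code: the \classname{Node} class is hypothetical, a \ltext{null} name marks an untitled character data element, and whitespace is assumed to be kept exactly as written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of mixed content; NOT the NanoXML API.
final class MixedContentSketch {
    static final class Node {
        final String name;    // null marks an untitled #PCDATA element
        final String content; // character data, or null if there is none
        final List<Node> children = new ArrayList<>();

        Node(String name, String content) {
            this.name = name;
            this.content = content;
        }
    }

    // Models <p>Hello <b>big</b> world</p>: "p" mixes text and an element,
    // so its text segments become untitled children; "b" holds only text,
    // so the text is stored as its content.
    static Node sample() {
        Node p = new Node("p", null);
        p.children.add(new Node(null, "Hello "));
        p.children.add(new Node("b", "big"));
        p.children.add(new Node(null, " world"));
        return p;
    }

    // Concatenates all character data below a node in document order.
    static String text(Node node) {
        StringBuilder sb = new StringBuilder();
        if (node.content != null) {
            sb.append(node.content);
        }
        for (Node child : node.children) {
            sb.append(text(child));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(text(sample())); // prints "Hello big world"
    }
}
```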
\section{Generating \XML}
You can very easily create a tree of \XML{} elements or modify an
existing one.
To create a new tree, just create an \classname{IXMLElement} object:
\begin{example}
IXMLElement elt = \xkeyword{new} XMLElement("ElementName");
\end{example}
You can add an attribute to the element by calling \methodname{setAttribute}:
\begin{example}
elt.setAttribute("key", "value");
\end{example}
You can add a child element to an element by calling \methodname{addChild}:
\begin{example}
IXMLElement child = elt.createElement("Child");
elt.addChild(child);
\end{example}
Note that the child element is created by calling \methodname{createElement}.
This ensures that the child instance is compatible with its new parent.
If an element has no children, you can add \ltext{\#PCDATA} content to it using
\methodname{setContent}:
\begin{example}
child.setContent("Some content");
\end{example}
If the element does have children, you can add \ltext{\#PCDATA} content to it
by adding an untitled element, which you create by calling
\methodname{createPCDataElement}:
\begin{example}
IXMLElement pcdata = elt.createPCDataElement();
pcdata.setContent("Blah blah");
elt.addChild(pcdata);
\end{example}
When you have created or edited the \XML{} element tree, you can write it out to
an output stream or writer using an \classname{XMLWriter}:
\begin{example}
java.io.Writer output = \ldots;
IXMLElement xmltree = \ldots;
XMLWriter xmlwriter = new XMLWriter(output);
xmlwriter.write(xmltree);
\end{example}
\section{Namespaces}
As of version 2.1, \ltext{NanoXML} has support for namespaces.
Namespaces allow you to attach a \ltext{URI} to the name of an element or an attribute.
This \ltext{URI} allows you to make a distinction between similarly named
entities coming from different sources.
More information about namespaces is available in the \XML{} Namespaces
recommendation at
\href{http://www.w3c.org/TR/REC-xml-names/}%
{http://www.w3c.org/TR/REC-xml-names/}.
Please note that a \ltext{DTD} has no support for namespaces.
It is important to understand that an \XML{} document can have only one
\ltext{DTD}.
Though the namespace \ltext{URI} is often presented as a \ltext{URL}, that
\ltext{URL} is not a system id for a \ltext{DTD}.
The only function of a namespace \ltext{URI} is to provide a globally unique
name.
As an example, consider the following \XML{} data:
\begin{example}
$<$doc:book xmlns:doc="http://nanoxml.n3.net/book"$>$
~~$<$chapter xmlns="http://nanoxml.n3.net/chapter"
~~~~~~~~~~~title="Introduction"
~~~~~~~~~~~doc:id="chapter1"/$>$
$<$/doc:book$>$
\end{example}
The top-level element uses the namespace \ltext{``http://nanoxml.n3.net/book''}.
The prefix \ltext{doc} is used as an alias for that namespace; it is defined by the
attribute \ltext{xmlns:doc}.
This prefix is defined for the \ltext{doc:book} element and its child elements.
The chapter element uses the namespace
\ltext{``http://nanoxml.n3.net/chapter''}.
Because the namespace \ltext{URI} has been defined as the value of the xmlns
attribute, the namespace is the default namespace for the chapter element.
Default namespaces are inherited by the child elements, but only for their
names.
Attributes never have a default namespace.
The chapter element has an attribute \ltext{doc:id}, which is defined in the
same namespace as \ltext{doc:book} because of the doc prefix.
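These scoping rules are part of the \XML{} Namespaces recommendation itself, so they can be checked with any namespace-aware parser.
The sketch below deliberately uses the \ltext{DOM} parser bundled with the JDK (\packagename{javax.xml.parsers}) rather than \NanoXML{}; the class name \classname{NamespaceRulesDemo} is hypothetical, and the document is the example above written without whitespace between the elements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;

// Checks the namespace rules with the JDK's DOM parser (not NanoXML).
final class NamespaceRulesDemo {
    static final String BOOK_NS = "http://nanoxml.n3.net/book";
    static final String CHAPTER_NS = "http://nanoxml.n3.net/chapter";
    static final String XML =
        "<doc:book xmlns:doc=\"" + BOOK_NS + "\">"
        + "<chapter xmlns=\"" + CHAPTER_NS + "\""
        + " title=\"Introduction\" doc:id=\"chapter1\"/>"
        + "</doc:book>";

    // Parses the sample document and returns its root element.
    static Element parseBook() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers ignore namespaces by default
            byte[] bytes = XML.getBytes(StandardCharsets.UTF_8);
            return factory.newDocumentBuilder()
                          .parse(new ByteArrayInputStream(bytes))
                          .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        Element book = parseBook();
        Element chapter = (Element) book.getFirstChild();
        Attr title = chapter.getAttributeNode("title");
        // prints "doc:book in http://nanoxml.n3.net/book"
        System.out.println(book.getNodeName() + " in " + book.getNamespaceURI());
        // prints "chapter in http://nanoxml.n3.net/chapter" (default namespace)
        System.out.println(chapter.getNodeName() + " in " + chapter.getNamespaceURI());
        // prints "title in null": attributes never get the default namespace
        System.out.println("title in " + title.getNamespaceURI());
        // prints "id = chapter1": doc:id resolves through the doc prefix
        System.out.println("id = " + chapter.getAttributeNS(BOOK_NS, "id"));
    }
}
```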
\ltext{NanoXML 2.1} offers some variants on the standard retrieval methods to allow the application to access the namespace information.
In the following examples, we assume the variable book to contain the
\ltext{doc:book} element and the variable chapter to contain the chapter
element.
To get the full name of the element, which includes the namespace prefix, use \methodname{getFullName}:
\begin{example}
book.getFullName() $\to$ "doc:book"
chapter.getFullName() $\to$ "chapter"
\end{example}
To get the short name of the element, which excludes the namespace prefix,
use \methodname{getName}:
\begin{example}
book.getName() $\to$ "book"
chapter.getName() $\to$ "chapter"
\end{example}
For elements that have no associated namespace, \methodname{getName} and
\methodname{getFullName} are equivalent.
To get the namespace \ltext{URI} associated with the name of the element, use \methodname{getNamespace}:
\begin{example}
book.getNamespace() $\to$ "http://nanoxml.n3.net/book"
chapter.getNamespace() $\to$ "http://nanoxml.n3.net/chapter"
\end{example}
If no namespace is associated with the name of the element, this method returns \variable{null}.
You can get an attribute of an element using either its full name (which
includes its prefix) or its short name together with its namespace \ltext{URI}, so the following two instructions are equivalent:
\begin{example}
chapter.getAttribute("doc:id", null)
chapter.getAttribute("id", "http://nanoxml.n3.net/book", null)
\end{example}
Note that the title attribute of chapter has no namespace, even though the
chapter element name has a default namespace.
You can create a new element which uses a namespace this way:
\begin{example}
book = \xkeyword{new} XMLElement("doc:book", "http://nanoxml.n3.net/book");
chapter = book.createElement("chapter",
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"http://nanoxml.n3.net/chapter");
\end{example}
You can add an attribute which uses a namespace this way:
\begin{example}
chapter.setAttribute("doc:id",
~~~~~~~~~~~~~~~~~~~~~"http://nanoxml.n3.net/book",
~~~~~~~~~~~~~~~~~~~~~chapterId);
\end{example}