File: retrieving_data.tex

package info (click to toggle)
libnanoxml2-java 2.2.3.dfsg-9
links: PTS, VCS
area: main
in suites: bookworm, bullseye, forky, sid, trixie
size: 988 kB
sloc: java: 5,085; xml: 150; makefile: 86; sh: 59
file content (323 lines) | stat: -rw-r--r-- 10,545 bytes
parent folder | download | duplicates (4)
\chapter{Retrieving Data From An \XML{} Datasource}

This chapter shows how to retrieve \XML{} data from a standard data
source.
Such source can be a file, an \ltext{HTTP} object or a text string.
The method described in this chapter is the simplest way to retrieve
\XML{} data.
More advanced ways are described in the next chapters.

\section{A Very Simple Example}

This section describes a very simple \XML{} application.
It parses \XML{} data from a stream and dumps it ``pretty-printed'' to
the standard output.
While its use is very limited, it shows how to set up a parser and parse an
\XML{} document.

\begin{example}
\xkeyword{import} net.n3.nanoxml.*;\xcallout{1}
\xkeyword{import} java.io.*;

\xkeyword{public class} DumpXML
\{
~~\xkeyword{public static void} main(String[] args)
~~~~\xkeyword{throws} Exception
~~\{
~~~~IXMLParser parser = XMLParserFactory.createDefaultXMLParser();\xcallout{2}
~~~~IXMLReader reader = StdXMLReader.fileReader("test.xml");\xcallout{3}
~~~~parser.setReader(reader);
~~~~IXMLElement xml = (IXMLElement) parser.parse();\xcallout{4}
~~~~XMLWriter writer = new XMLWriter(System.out);\xcallout{5}
~~~~writer.write(xml);
~~\}
\}
\end{example}

\begin{callout}
  \coitem
    The \NanoXML{} classes are located in the package 
    \packagename{net.n3.nanoxml}.
  \coitem
    This command creates an \XML{} parser.
    The actual class of the parser is dependent on the value of the system
    property \propertykey{net.n3.nanoxml.XMLParser}, which is by default
    \propertyvalue{net.n3.nanoxml.StdXMLParser}.
  \coitem
    The command creates a ``standard'' reader which reads its data from the
    file called \filename{test.xml}.

    Usually you can use \classname{StdXMLReader} to feed the \XML{} data to
    the parser.
    The default reader is able to set up HTTP connections when retrieving
    \ltext{DTDs} or entities from different machines.
    If necessary, you can supply your own reader to \acronym{e.g.} provide
    support for \ltext{PUBLIC} identifiers.
  \coitem
    The \XML{} parser now parses the data read from \filename{test.xml}
    and creates a tree of parsed \XML{} elements.

    The structure of those elements will be described in the next section.
  \coitem
    An \classname{XMLWriter} can be used to dump a ``pretty-printed'' view
    of the parsed \XML  data on an output stream.
    In this case, we dump the read data to the standard output
    \ltext{(System.out)}.
\end{callout}


\section{Analyzing The Data}

You can easily traverse the logical tree generated by the parser.
If you need to create your own object tree, you can create your custom
% TODO: make "chapter 3" a xref
builder, which is described in chapter 3.

The default \XML{} builder, \classname{StdXMLBuilder} generates a tree of
\classname{IXMLElement} objects.
Every such object has a name and can have attributes, \ltext{\#PCDATA} content and child objects.

The following XML data:

\begin{example}
$<$FOO attr1="fred" attr2="barney"$>$
~~~~$<$BAR a1="flintstone" a2="rubble"$>$
~~~~~~~~Some data.
~~~~$<$/BAR$>$
~~~~$<$QUUX/$>$
$<$/FOO$>$
\end{example}

is parsed to the following objects:

\begin{itemize}
  \item[] Element FOO:
    \begin{itemize}
      \item[] Attributes = \{ "attr1"="fred", "attr2"="barney" \}
      \item[] Children = \{ BAR, QUUX \}
      \item[] PCData = null
    \end{itemize}
  \item[] Element BAR:
    \begin{itemize}
      \item[] Attributes = \{ "a1"="flintstone", "a2"="rubble" \}
      \item[] Children = \{\}
      \item[] PCData = "Some data."
    \end{itemize}
  \item[] Element QUUX:
    \begin{itemize}
      \item[] Attributes = \{\}
      \item[] Children = \{\}
      \item[] PCData = null
    \end{itemize}
\end{itemize}

You can retrieve the name of an element using \methodname{getFullName}, thus:

\begin{example}
FOO.getFullName() $\to$ "FOO"
\end{example}

You can enumerate the attribute keys using 
\methodname{enumerateAttributeNames}:

\begin{example}
Enumeration enum = FOO.enumerateAttributeNames();
\xkeyword{while} (enum.hasMoreElements()) \{
~~System.out.print(enum.nextElement());
~~System.out.print(' ');
\}
$\to$ attr1 attr2
\end{example}

You can retrieve the value of an attribute using \methodname{getAttribute}:

\begin{example}
FOO.getAttribute ("attr1", null) $\to$ "fred"
\end{example}

The child elements can be enumerated using \methodname{enumerateChildren}:

\begin{example}
Enumeration enum = FOO.enumerateChildren();
\xkeyword{while} (enum.hasMoreElements()) \{
~~System.out.print(enum.nextElement() + ' ');
\}
$\to$ BAR QUUX
\end{example}

If the element contains parsed character data \ltext{(\#PCDATA)} as its only
child.
You can retrieve that data using \methodname{getContent}:

\begin{example}
BAR.getContent() $\to$ "Some data."
\end{example}

If an element contains both \ltext{\#PCDATA} and \XML  elements as its
children, the character data segments will be put in untitled \XML
elements (whose name is \ltext{null}).

\classname{IXMLElement} contains many convenience methods for retrieving data
and traversing the \XML  tree.


\section{Generating \XML}

You can very easily create a tree of \XML  elements or modify an
existing one.

To create a new tree, just create an \classname{IXMLElement} object:

\begin{example}
IXMLElement elt = \xkeyword{new} XMLElement("ElementName");
\end{example}

You can add an attribute to the element by calling \methodname{setAttribute}.

\begin{example}
elt.setAttribute("key", "value");
\end{example}

You can add a child element to an element by calling \methodname{addChild}:

\begin{example}
IXMLElement child = elt.createElement("Child");
elt.addChild(child);
\end{example}

Note that the child element is created calling \methodname{createElement}.
This insures that the child instance is compatible with its new parent.

If an element has no children, you can add \ltext{\#PCDATA} content to it using
\methodname{setContent}:

\begin{example}
child.setContent("Some content");
\end{example}

If the element does have children, you can add \ltext{\#PCDATA} content to it
by adding an untitled element, which you create by calling
\methodname{createPCDataElement}:

\begin{example}
IXMLElement pcdata = elt.createPCDataElement();
pcdata.setContent("Blah blah");
elt.addChild(pcdata);
\end{example}

When you have created or edited the XML element tree, you can write it out to
an output stream or writer using an \classname{XMLWriter}:

\begin{example}
java.io.Writer output = \ldots;
IXMLElement xmltree = \ldots;
XMLWriter xmlwriter = new XMLWriter(output);
writer.write(xmltree);
\end{example}


\section{Namespaces}

As of version 2.1, \ltext{NanoXML} has support for namespaces.
Namespaces allow you to attach a \ltext{URI} to the name of an element name or an attribute.
This \ltext{URI} allows you to make a distinction between similary named
entities coming from different sources.
More information about namespaces can be found in the XML Namespaces
recommendation, which can be found at
\href{http://www.w3c.org/TR/REC-xml-names/}%
{http://www.w3c.org/TR/REC-xml-names/}.

Please note that a \ltext{DTD} has no support for namespaces.
It is important to understand that an \XML  document can have only one
\ltext{DTD}.
Though the namespace \ltext{URI} is often presented as a \ltext{URL}, that
\ltext{URL} is not a system id for a \ltext{DTD}.
The only function of a namespace \ltext{URI} is to provide a globally unique
name.

As an example, lets have the following \XML  data:

\begin{example}
$<$doc:book xmlns:doc="http://nanoxml.n3.net/book"$>$
~~$<$chapter xmlns="http://nanoxml.n3.net/chapter"
~~~~~~~~~~~title="Introduction"
~~~~~~~~~~~doc:id="chapter1"/$>$
$<$/doc:book$>$
\end{example}

The top-level element uses the namespace \ltext{``http://nanoxml.n3.net/book''}.
The prefix is used as an alias for the namespace, which is defined in the
attribute \ltext{xmlns:doc}.
This prefix is defined for the \ltext{doc:book} element and its child elements.

The chapter element uses the namespace
\ltext{``http://nanoxml.n3.net/chapter''}.
Because the namespace \ltext{URI} has been defined as the value of the xmlns
attribute, the namespace is the default namespace for the chapter element.
Default namespaces are inherited by the child elements, but only for their
names.
Attributes never have a default namespace.

The chapter element has an attribute \ltext{doc:id}, which is defined in the
same namespace as \ltext{doc:book} because of the doc prefix.

\ltext{NanoXML 2.1} offers some variants on the standard retrieval methods to allow the application to access the namespace information.

In the following examples, we assume the variable book to contain the
\ltext{doc:book} element and the variable chapter to contain the chapter
element.

To get the full name, which includes the namespace prefix, of the element, use \methodname{getFullName}:

\begin{example}
book.getFullName() $\to$ "doc:book"
chapter.getFullName() $\to$ "chapter"
\end{example}

To get the short name, which excludes the namespace prefix, of the element,
use \methodname{getName}:

\begin{example}
book.getName() $\to$ "book"
chapter.getName $\to$ "chapter"
\end{example}

For elements that have no associated namespace, \methodname{getName} and
\methodname{getFullName} are equivalent.

To get the namespace \ltext{URI} associated with the name of the element, use \methodname{getNamespace}:

\begin{example}
book.getNamespace() $\to$ "http://nanoxml.n3.net/book"
chapter.getNamespace() $\to$ "http://nanoxml.n3.net/chapter"
\end{example}

If no namespace is associated with the name of the element, this method returns \variable{null}.

You can get an attribute of an element using either its full name (which
includes its prefix) or its short name together with its namespace \ltext{URI}, so the following two instructions are equivalent:

\begin{example}
chapter.getAttribute("doc:id", null)
chapter.getAttribute("id", "http://nanoxml.n3.net/book", null)
\end{example}

Note that the title attribute of chapter has no namespace, even though the
chapter element name has a default namespace.

You can create a new element which uses a namespace this way:

\begin{example}
book = \xkeyword{new} XMLElement("doc:book", "http://nanoxml.n3.net/book");
chapter = book.createElement("chapter",
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"http://nanoxml.n3.net/chapter");
\end{example}

You can add an attribute which uses a namespace this way:

\begin{example}
chapter.setAttribute("doc:id",
~~~~~~~~~~~~~~~~~~~~~"http://nanoxml.n3.net/book",
~~~~~~~~~~~~~~~~~~~~~chapterId);
\end{example}