\chapter{Introduction}

This chapter gives a short introduction to XML and NanoXML.

\section{About \ltext{XML}}

The Extensible Markup Language,
\href{http://www.w3c.org/TR/REC-xml}{\ltext{XML}}, is a way to mark up text in
a structured document.

\ltext{XML} is a simplification of the complex \ltext{SGML} standard.
\ltext{SGML}, the Standard Generalized Markup Language, is an international
(\ltext{ISO}) standard for marking up text and graphics.
The best known application of \ltext{SGML} is \ltext{HTML}.

Although \ltext{SGML} data is very easy to write, it is very difficult to write a
generic \ltext{SGML} parser.
When designing \ltext{XML}, however, the authors removed much of the flexibility
of \ltext{SGML}, making it much easier to parse \ltext{XML} documents correctly.

\ltext{XML} data is structured as a tree of \term{entities}.
An entity is either a string of character data or an element, which can in turn
contain other entities.
Elements can optionally have a set of attributes.
Attributes are key/value pairs that set properties of an element.

The following example shows some \ltext{XML} data:

\begin{example}
$<$book$>$
~~$<$chapter id="my chapter"$>$
~~~~$<$title$>$The title$<$/title$>$
~~~~Some text.
~~$<$/chapter$>$
$<$/book$>$
\end{example}

At the root of the tree, you can find the element ``book''.
This element contains one child element: ``chapter''.
The chapter element has one attribute which maps the key ``id'' to
``my chapter''.
The chapter element has two child entities: the element ``title'' and the
character data ``Some text.''.
Finally, the title element has one child, the string ``The title''.
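
The tree described above can be made concrete with a short Java sketch.
It does not use \ltext{NanoXML}: as an assumption made purely for illustration, it
relies on the \ltext{DOM} parser bundled with the \ltext{JDK}
(\ltext{javax.xml.parsers}) so that it runs without extra libraries.
The class name \ltext{XmlTreeSketch} and its methods are hypothetical, the sample
is written without indentation, and the attribute value uses single quotes, which
\ltext{XML} permits.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; it uses the JDK DOM parser, not NanoXML.
public class XmlTreeSketch {

    // The sample document from the text, written on one logical line.
    public static final String SAMPLE =
        "<book>"
        + "<chapter id='my chapter'>"
        + "<title>The title</title>"
        + "Some text."
        + "</chapter>"
        + "</book>";

    // Parses the given XML text and returns the root element of the tree.
    public static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
    }

    // Adds one line per element or non-blank text node, indented by depth.
    public static void describe(Node node, String indent, List<String> lines) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            StringBuilder line = new StringBuilder(indent);
            line.append("element ").append(node.getNodeName());
            // Attributes are the key/value pairs attached to the element.
            NamedNodeMap attributes = node.getAttributes();
            for (int i = 0; i < attributes.getLength(); i++) {
                Node attribute = attributes.item(i);
                line.append(" [").append(attribute.getNodeName())
                    .append("=").append(attribute.getNodeValue()).append("]");
            }
            lines.add(line.toString());
            // Recurse into the children, one indentation level deeper.
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                describe(children.item(i), indent + "  ", lines);
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                lines.add(indent + "text: " + text);
            }
        }
    }

    // Convenience wrapper: parse the text and describe the whole tree.
    public static List<String> describeTree(String xml) throws Exception {
        List<String> lines = new ArrayList<>();
        describe(parseRoot(xml), "", lines);
        return lines;
    }

    public static void main(String[] args) throws Exception {
        for (String line : describeTree(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

Running the sketch lists five nodes, indented by depth: the book element, the
chapter element with its ``id'' attribute, the title element with its string
child, and the character data ``Some text.''.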

\section{About \ltext{NanoXML}}

In April 2000, \ltext{NanoXML} was first released as a spin-off project of
\ltext{AUIT}, the Abstract User Interface Toolkit.

The intent of \ltext{NanoXML} was to be a small parser that was easy to use.
\ltext{SAX} and \ltext{DOM} were much too complex for what I needed, and the
mainstream parsers were either much too big or had a very restrictive license.

\ltext{NanoXML 1} has all the features I needed: it is very small (about 6K),
is reasonably fast for small \ltext{XML} documents, is very easy to use and is
free (\ltext{zlib/libpng} license).
As I never intended to use \ltext{NanoXML} to parse \ltext{DocBook} documents,
there was no support for mixed data or \ltext{DTD} parsing.

\ltext{NanoXML} was released as a \ltext{SourceForge} project and, because of the
very good response from its users, it matured into a small and stable parser.
The final version, \ltext{1.6.8}, was released in May 2001.

Because of its small size, people started to use \ltext{NanoXML} for embedded
systems (\ltext{KVM}, \ltext{J2ME}) and kindly submitted patches to make
\ltext{NanoXML} work in such restricted environments.

\section{\ltext{NanoXML} 2}

In July 2001, \ltext{NanoXML 2} was released.
Unlike \ltext{NanoXML 1}, speed and \ltext{XML} compliance were considered to be
very important when the new parser was designed.
\ltext{NanoXML 2} is also very modular: you can easily replace the different
components in the parser to customize it to your needs.
The modularity of \ltext{NanoXML 2} also benefits extensions, \acronym{e.g.}
\ltext{SAX} support, which can now access the parser directly.
In \ltext{NanoXML 1}, the \ltext{SAX} adapter had to iterate over the data
structure built by the base product.

Although many features were added to \ltext{NanoXML}, the second release was
still very small.
The full parser with builder fits in a \ltext{JAR} file of about 32K.
This is still very tiny, especially when you compare this with the ``standard''
parsers of more than four times its size.

As there is still a need for a tiny parser like \ltext{NanoXML 1}, there is a
special branch of \ltext{NanoXML 2}: \ltext{NanoXML/Lite}. This parser is source
compatible with \ltext{NanoXML 1} but features a new parsing algorithm which
makes it more than twice as fast as the older version.
It is, however, more restrictive on the \ltext{XML} data it parses: the older
version allowed some data that is not well-formed to be parsed.

There are three branches of \ltext{NanoXML 2}:
\begin{itemize}
  \item[$\bullet$]
    \term{NanoXML/Lite} is the successor of \ltext{NanoXML 1}.
    It features an almost compatible parser which is extremely small.
  \item[$\bullet$]
    \term{NanoXML/Java} is the standard parser.
  \item[$\bullet$]
    \term{NanoXML/SAX} is the \ltext{SAX} adapter for \ltext{NanoXML/Java}.
\end{itemize}

The latest version of \ltext{NanoXML} is \ltext{NanoXML 2.2.1}, which was
released in February 2002.