\chapter{Introduction}
This chapter gives a short introduction to XML and NanoXML.
\section{About \ltext{XML}}
The extensible markup language,
\href{http://www.w3c.org/TR/REC-xml}{\ltext{XML}}, is a way to mark up text in
a structured document.
\ltext{XML} is a simplification of the complex \ltext{SGML} standard.
\ltext{SGML}, the Standard Generalized Markup Language, is an international
(\ltext{ISO}) standard for marking up text and graphics.
The best known application of \ltext{SGML} is \ltext{HTML}.
Although \ltext{SGML} data is very easy to write, it is very difficult to write a
generic \ltext{SGML} parser.
When designing \ltext{XML}, however, the authors removed much of the flexibility
of \ltext{SGML}, making it much easier to parse \ltext{XML} documents correctly.
\ltext{XML} data is structured as a tree of \term{entities}.
An entity is either a string of character data or an element, which can in turn
contain other entities.
Elements can optionally have a set of attributes.
Attributes are key/value pairs that set properties of an element.
The following example shows some XML data:
\begin{example}
$<$book$>$
~~$<$chapter id="my chapter"$>$
~~~~$<$title$>$The title$<$/title$>$
~~~~Some text.
~~$<$/chapter$>$
$<$/book$>$
\end{example}
At the root of the tree, you can find the element ``book''.
This element contains one child element: ``chapter''.
The chapter element has one attribute which maps the key ``id'' to
``my chapter''.
The chapter element has two child entities: the element ``title'' and the
character data ``Some text.''.
Finally, the title element has one child, the string ``The title''.
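The tree described above can also be inspected programmatically.
The following Java sketch is an illustration only: it does not use
\ltext{NanoXML}, but the \ltext{DOM} parser bundled with the \ltext{JDK}
(\ltext{javax.xml.parsers}), and the class name \ltext{XmlTreeDemo} and its
methods are hypothetical names chosen for this example.
It parses the example document and builds an indented outline with one line per
element or non-blank string of character data.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK's DOM parser, not NanoXML.
public class XmlTreeDemo {

    // The example document from this section.
    static final String BOOK =
        "<book>\n"
        + "  <chapter id=\"my chapter\">\n"
        + "    <title>The title</title>\n"
        + "    Some text.\n"
        + "  </chapter>\n"
        + "</book>";

    // Parses the XML text and returns an indented outline of its tree.
    static String outline(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Element root = builder
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
        StringBuilder out = new StringBuilder();
        walk(root, 0, out);
        return out.toString();
    }

    // Appends one line for the node, then recurses into its children.
    static void walk(Node node, int depth, StringBuilder out) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            indent.append("  ");
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append(indent).append("element ").append(node.getNodeName());
            // Attributes are key/value pairs attached to the element.
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                out.append(" [").append(attr.getNodeName()).append("=")
                   .append(attr.getNodeValue()).append("]");
            }
            out.append("\n");
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                walk(children.item(i), depth + 1, out);
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            // Skip whitespace-only character data such as line breaks.
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                out.append(indent).append("text \"").append(text)
                   .append("\"\n");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.print(outline(BOOK));
    }
}
```

Whitespace-only character data, such as the line breaks between tags, is skipped
in this sketch so that the outline lists only the entities named in the
description above.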
\section{About \ltext{NanoXML}}
In April 2000, \ltext{NanoXML} was first released as a spin-off project of
\ltext{AUIT}, the Abstract User Interface Toolkit.
The intent of \ltext{NanoXML} was to be a small parser that was easy to use.
\ltext{SAX} and \ltext{DOM} were much too complex for what I needed, and the
mainstream parsers were either much too big or had a very restrictive license.
\ltext{NanoXML 1} had all the features I needed: it was very small (about 6K),
reasonably fast for small \ltext{XML} documents, very easy to use, and
free (\ltext{zlib/libpng} license).
As I never intended to use \ltext{NanoXML} to parse \ltext{DocBook} documents,
there was no support for mixed data or \ltext{DTD} parsing.
\ltext{NanoXML} was released as a \ltext{SourceForge} project and, because of the
very good response from its users, it matured into a small and stable parser.
The final version, release \ltext{1.6.8}, came out in May 2001.
Because of its small size, people started to use \ltext{NanoXML} for embedded
systems (\ltext{KVM}, \ltext{J2ME}) and kindly submitted patches to make
\ltext{NanoXML} work in such restricted environments.
\section{\ltext{NanoXML} 2}
In July 2001, \ltext{NanoXML 2} was released.
Unlike with \ltext{NanoXML 1}, speed and \ltext{XML} compliance were considered
very important when the new parser was designed.
\ltext{NanoXML 2} is also very modular: you can easily replace the different
components in the parser to customize it to your needs.
The modularity of \ltext{NanoXML 2} also benefits extensions such as
\ltext{SAX} support, which can now access the parser directly.
In \ltext{NanoXML 1}, the \ltext{SAX} adapter had to iterate over the data
structure built by the base product.
Although many features were added to \ltext{NanoXML}, the second release was
still very small.
The full parser with builder fits in a \ltext{JAR} file of about 32K.
That is still tiny, especially compared with the ``standard'' parsers, which are
more than four times that size.
As there is still a need for a tiny parser like \ltext{NanoXML 1}, there is a
special branch of \ltext{NanoXML 2}: \ltext{NanoXML/Lite}.
This parser is source compatible with \ltext{NanoXML 1} but features a new
parsing algorithm which makes it more than twice as fast as the older version.
It is, however, more restrictive about the \ltext{XML} data it parses: the older
version allowed some data that was not well-formed to be parsed.
There are three branches of \ltext{NanoXML 2}:
\begin{itemize}
\item[$\bullet$]
\term{NanoXML/Lite} is the successor of \ltext{NanoXML 1}.
It features an almost compatible parser which is extremely small.
\item[$\bullet$]
\term{NanoXML/Java} is the standard parser.
\item[$\bullet$]
\term{NanoXML/SAX} is the \ltext{SAX} adapter for \ltext{NanoXML/Java}.
\end{itemize}
The latest version of \ltext{NanoXML} is \ltext{NanoXML 2.2.1}, which was
released in February 2002.