File: xmlproc_tut.html

package info (click to toggle)
qm 1.1.3-1
  • links: PTS
  • area: main
  • in suites: woody
  • size: 8,628 kB
  • ctags: 10,249
  • sloc: python: 41,482; ansic: 20,611; xml: 12,837; sh: 485; makefile: 226
file content (179 lines) | stat: -rw-r--r-- 5,914 bytes parent folder | download | duplicates (5)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
  <TITLE>An overview of xmlproc</TITLE>
  <META NAME="Author"      CONTENT="Lars Marius Garshol">
  <META NAME="Generator"   CONTENT="Homemade">
  <META NAME="Description" CONTENT="This page documents the rough structure of xmlproc.">
  <META NAME="Keywords"    CONTENT="XML, Python, XML parser">
  <LINK REL=stylesheet HREF="standard.css" TYPE="text/css" MEDIA=screen>
</HEAD>

<BODY>

<H1>An overview of xmlproc</H1>

<P>
xmlproc has been designed in a highly modular fashion, with the
intention that it should be possible to reuse the modules in different
contexts and applications. Emphasis has been placed on flexibility,
leading to a rather large and perhaps bewildering interface. This
document attempts to explain the different pieces and how they fit
together.
</P>

<h2>The command-line parsers</h2>

<P>
xmlproc ships with two command-line parser applications for XML
parsing. They are not useful for incorporating xmlproc in your own
applications, but can be used to check that xmlproc is working, to
verify documents and also to see how the parser interprets documents,
by making them output canonical XML or ESIS.
</P>

<P>
The diagram below shows how these applications use the xmlproc APIs.
</P>

<img src="cmdline.gif" ALT="[Diagram #1]">

<P>
As the diagram shows one application (xpcmd.py) uses the object called
XMLProcessor. This is the well-formedness parser that does not read
the external DTD and which does not validate. The second application
(xvcmd.py) uses an object called XMLValidator, which again uses the
well-formedness parser XMLProcessor for basic parsing, but provides
full validation on top of this.
</P>

<P>
The command-line interfaces are documented <A HREF="cmdline.html">here</A>.
</P>

<h3>The DTD parser</h3>

<p>
The xmlproc distribution also includes a command-line application
dtdcmd.py, which only uses the DTD parsing APIs of xmlproc and can
parse DTD files directly. The command-line interface is as follows:
</p>

<pre>
  python dtdcmd.py [--list] <urltodtd>+
</pre>

<p>
Here <tt>--list</tt> makes the parser list all declarations after
parsing a DTD, and <tt>urltodtd</tt> is a list of one or more DTD file
names or URLs.
</p>

<h3>The DTD to XML Schema converter</h3>

<p>
The command-line application dtd2schema.py converts DTDs into
equivalent XML Schema documents. The application takes one mandatory
argument (the system identifier of the DTD) and one optional argument
(the file name of the output document). If no output file name is
specified, one is inferred from the input file name.
</p>

<p>
The converter tries to do some naive inference of attribute groups and
will produce output with attribute groups inferred from the DTD structure.
</p>

<h2>The parsing API</h2>

<P>
If you want to use xmlproc in an application of your own what you do
is basically what I did with xpcmd.py and xvcmd.py: you use the
xmlproc modulse in an application to provide it with XML parsing
functionality. On top of that you must build whatever you want to use
xmlproc for yourself. (In the case of xvcmd.py and xpcmd.py this is
simply outputting parse results to the console and interpreting
command-line options.
</P>

<P>
The diagram below shows the main objects involved in this API:
</P>

<img src="basicapi.gif" ALT="[Diagram #2]">

<P>
The central object is the Parser object, which can be either an
XMLProcessor or an XMLValidator (they have the same interface, but
only the latter validates and the former is faster). When it is
created the Parser creates the four objects shown in the diagram. It
also provides methods that can be used to make it use objects that you
provide instead of these objects. (See the <A
HREF="api-doco.html">documentation</A> for details</A>.)
</P>

<P>
The roles of these four objects are:
</P>

<dl>
  <dt>Application
  <dd>This object receives all data events from the parsing of the
  document such as handle_start_tag, handle_end_tag, handle_data and
  so on.
  <dt>ErrorHandler
  <dd>This object receives all error events.
  <dt>PubIdResolver
  <dd>Whenever an entity reference is encountered, this object will be
  given the public identifier and system identifier (file name/URL) of
  the entity and asked to supply the system identifier to be used.
  <dt>InputSourceFactory
  <dd>Once the PubIdResolver has returned the correct file name/URL
  for the entity, the InputSourceFactory will be asked to create a
  file-like object from which the entity contents can be read.
</dl>

<P>
So, to act on the document content, make an Application object and
tell the parser to use it. To control error reporting, make an
ErrorHandler. To control the resolution of public identifiers (and
also to remap system identifiers), make a PubIdResolver. To add
support for new kinds of URLs or to provide your own support for a
class of URLs, make an InputSourceFactory.
</P>

<H2>Example</H3>

<PRE><CODE>
from xml.parsers.xmlproc import xmlproc

class MyApplication(xmlproc.Application):
    pass # Add some useful stuff here

p=xmlproc.XMLProcessor()  # Make this xmlval.XMLValidator if you want to validate
p.set_application(MyApplication())
p.parse_resource("foo.xml")
</CODE></PRE>

<h2>The GUI parser frontend</h2>

<p>
xmlproc includes a GUI application that can be used to parse XML
documents in the wxValidator.py application. This application uses <a
href="http://alldunn.com/wxPython/">wxPython</a> to create a more
user-friendly interface, as shown below:
</p>

<img src="wxval.gif" alt="[Screenshot of wxValidator]">

<HR>

<ADDRESS>
Last update 2000-05-11 14:20, by 
<a href="mailto:larsga@garshol.priv.no">Lars Marius Garshol</a>.
</ADDRESS>

</DIV>

</BODY>
</HTML>