<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
  <TITLE>xmlproc: A Python XML parser</TITLE>
  <META NAME="Author"    CONTENT="Lars Marius Garshol">
  <META NAME="Generator" CONTENT="Homemade (http://birk105.studby.uio.no/hovedfag/pilot.html)">
  <META NAME="Description" CONTENT="This is the home page of a free XML parser written in Python.">
  <LINK REL=StyleSheet HREF="../../../standard.css" TYPE="text/css" MEDIA=screen>
  <LINK REL=top               HREF="index.html" TITLE="Tools for parsing XML with Python">
</HEAD>

<BODY>

<DIV CLASS=partof>
This page is a part of <A HREF="index.html">Tools for parsing XML with Python</A>.
</DIV>

<H1>xmlproc: A Python XML parser</H1>

<TABLE CLASS="programinfo">
<TR><TH ALIGN=left>Version:   <TD>0.52
<TR><TH ALIGN=left>Author:    <TD>Lars Marius Garshol
<TR><TH ALIGN=left>Email:     <TD>larsga@ifi.uio.no
<TR><TH ALIGN=left>Released:  <TD>12.Sep.98
</TABLE>

<H2>What is xmlproc?</H2>

<P>
xmlproc is an XML parser written in Python. It is a fairly complete validating parser, but it does
not yet do everything the XML specification requires of a validating parser, or even of a
well-formedness parser. The average user should not run into any of these omissions, though.
Later releases will be more complete.
</P>

<P>
xmlproc now supports both <A HREF="xmlproc-catalog-doco.html">SGML Open Catalogs and XCatalog 0.1</A>.
</P>

<H2>Deviations from the XML specification</H2>

<P>
xmlproc does not follow the XML specification in these respects:
</P>

<UL>
  <LI>Parameter entities in external DTD subsets are not allowed inside declarations,
          only between them.
  <LI>No attempt is made to deal with different character sets or encodings.
  <LI>The parser does not check for the illegal characters below &amp;#x20;.
  <LI>Some internal consistency checks on the DTD (such as whether default
          attribute values are valid) are not performed.
  <LI>NOTATION attributes are not fully supported.
  <LI>Single-character entities are not handled correctly.
</UL>

<P>
All other deviations from the specification are unintentional bugs and should be reported
to me via email. Hopefully, xmlproc will be 100% compliant in version 1.00.
</P>
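
<P>
Since xmlproc does not reject control characters below &amp;#x20;, an application that
needs this check could scan its input before parsing. The following is only a minimal sketch and
not part of xmlproc. It assumes a current Python and relies on the XML 1.0 rule that tab,
line feed and carriage return are the only legal characters below &amp;#x20;. The name
find_illegal_chars is made up for this example.
</P>

```python
# Hypothetical pre-check, not part of xmlproc.
# XML 1.0 allows only tab (0x09), LF (0x0A) and CR (0x0D) below 0x20.
LEGAL_CONTROLS = {0x09, 0x0A, 0x0D}


def find_illegal_chars(text):
    """Return (offset, codepoint) pairs for every illegal control character."""
    return [(pos, ord(ch)) for pos, ch in enumerate(text)
            if ord(ch) < 0x20 and ord(ch) not in LEGAL_CONTROLS]


if __name__ == "__main__":
    sample = "some\ttext\x01with a stray control character"
    for pos, code in find_illegal_chars(sample):
        print("illegal character #x%02X at offset %d" % (code, pos))
```

<P>
An empty result means the text contains no control characters that XML 1.0 forbids. The other
deviations listed above are not covered by this check.
</P>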

<H2>Using xmlproc</H2>

<P>
xmlproc can be used both as a command-line parser and as a parser API
you can use to write XML applications.
</P>

<H3>The command-line parser</H3>

<P>
The command-line parser comes as two scripts: xpcmd.py for well-formedness parsing and xvcmd.py
for validating parsing. Currently xpcmd.py only accepts one
argument: the URL of the file to parse. (You can use just the file
name instead of a full URL if you like.)
</P>

<P>
xvcmd.py has more options:
</P>

<PRE>
Usage:

  xvcmd.py [-c catalog] [-l language] [-o format] [urltodoc]

  ---Options:  
  catalog:  path to catalog file to use to resolve public identifiers
  language: ISO 3166 language code for language to use in error messages
  format:   Format to output parsed XML. 'e': ESIS, 'x': canonical XML
            No data will be outputted if this option is not specified
  urltodoc: URL to the document to parse. (You can use plain file names
            as well.) Can be omitted if a catalog is specified and contains
            a DOCUMENT entry.  
            
  Catalog files with URLs that end in '.xml' are assumed to be XCatalogs,
  all others are assumed to be SGML Open Catalogs.

  If the -c option is not specified the environment variables XMLXCATALOG
  and XMLSOCATALOG will be used (in that order).
</PRE>
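
<P>
The catalog rules in the usage text can be sketched in Python roughly as follows. This
illustrates the documented behaviour and is not the code xvcmd.py actually uses. The names
catalog_type and pick_catalog are hypothetical. The sketch also assumes that "in that order"
means the first environment variable that is set wins.
</P>

```python
import os


def catalog_type(url):
    """Classify a catalog by its URL, following the rule in the usage text."""
    if url.endswith(".xml"):
        return "XCatalog"
    return "SGML Open Catalog"


def pick_catalog(option_c=None, environ=None):
    """Return the catalog URL to use, or None if no catalog is configured.

    An explicit -c value wins. Otherwise XMLXCATALOG is consulted before
    XMLSOCATALOG (assumption: the first variable that is set is used).
    """
    if environ is None:
        environ = os.environ
    if option_c:
        return option_c
    for name in ("XMLXCATALOG", "XMLSOCATALOG"):
        value = environ.get(name)
        if value:
            return value
    return None


if __name__ == "__main__":
    chosen = pick_catalog()
    if chosen is None:
        print("no catalog configured")
    else:
        print("%s (%s)" % (chosen, catalog_type(chosen)))
```

<P>
Passing the environment as a parameter keeps the lookup easy to try out without changing
the real environment variables.
</P>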

<H3>Basic usage</H3>

<P>
To write a program that receives data from the parser, subclass
the Application class defined in xmlapp.py. This is a sample
xmlproc client:
</P>

<PRE><CODE>
from xml.parsers.xmlproc import xmlproc

class MyApplication(xmlproc.Application):
    pass  # Override the handler methods you need here

# Use xmlval.XMLValidator (from the xmlval module) instead if you want to validate
p = xmlproc.XMLProcessor()
p.set_application(MyApplication())
p.parse_resource("foo.xml")
</CODE></PRE>

<H3>More detailed information</H3>

<P>
The xmlproc APIs are now <A HREF="xmlproc-doco.html">documented</A>. Note, however,
that you should use the <A HREF="saxlib.html">SAX API</A> instead of xmlproc's native API
where possible, because the SAX API lets you switch parsers without changing your application
code.
</P>
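
<P>
To illustrate the parser independence that SAX offers, here is a small sketch. It uses the
xml.sax module from the current Python standard library, which is an assumption: that module
is a later SAX interface and not the saxlib linked above, so the method names may differ from
the 1998 API. ElementCounter and count_elements are hypothetical names.
</P>

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Count how many times each element name occurs."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the SAX driver for every start tag.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(xml_bytes):
    """Parse a document held in memory and return the element counts."""
    handler = ElementCounter()
    # parseString uses the default SAX driver; the handler itself
    # contains nothing that is specific to one parser.
    xml.sax.parseString(xml_bytes, handler)
    return handler.counts


if __name__ == "__main__":
    print(count_elements(b"<doc><item/><item/></doc>"))
```

<P>
The handler only depends on the SAX callbacks, so a different SAX driver could be swapped in
without touching ElementCounter.
</P>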

<H2>Licence?</H2>

<P>
xmlproc is free and you can do as you like with it. If you change it,
please let me know.
</P>

<H2>Getting xmlproc</H2>

<P>
You can download xmlproc <A HREF="xmlproc.zip">here</A>.
</P>

<H2>Changes since last release</H2>

<P>
These are the changes since version 0.51:
</P>

<UL>
  <LI>40% speed increase for well-formedness parsing. The improvement for validating
          parsing seems to be around 25%, though this depends a lot on DTD size versus document size.
  <LI>Error reporting improved. Better error messages, and support for error messages
          in different languages.
  <LI>xvcmd.py option interpretation improved (-l option added)
  <LI>Numerous minor parse bug fixes
  <LI>Some API extensions:
    <UL>
      <LI>CatalogManager.get_public_ids() method added
      <LI>DTD.get_elements() method added
      <LI>Parser.set_error_language() method added
      <LI>optional bufsize argument added to Parser.parse_resource()
    </UL>
</UL>

<H2>Feedback</H2>

<P>
Any and all feedback is welcome, from suggestions for improvements or new features
to bug reports. And I really mean it! If you have some opinions on this program, please
let me hear them.
</P>

<H2>Email notification of new versions</H2>

<P>
To be notified by email when a new version is released, fill out this
form. I guarantee that these email addresses won't be used for any
other purpose, and that you'll receive notification if the service
dies. (If you follow the Python XML-SIG mailing list you won't need to register here
since new releases will also be announced there.)
</P>

<FORM METHOD=POST ACTION="http://www.stud.ifi.uio.no/~larsga/addmail.cgi">
  <TABLE>
  <TR><TD>Your full name:     <TD><INPUT TYPE=TEXT NAME=FULLNAME SIZE=30>
  <TR><TD>Your email address: <TD><INPUT TYPE=TEXT NAME=EMAIL    SIZE=30>
  <TR><TD COLSPAN=2><INPUT TYPE=SUBMIT VALUE="Add to list">
  </TABLE>
  <INPUT TYPE=hidden NAME=LIST VALUE=xmlproc>
</FORM>


<HR>

<ADDRESS>
14.Sep.98 23:19, 
<A HREF="../../../lmg.html">Lars Marius Garshol</A>,
<A HREF="mailto:larsga@ifi.uio.no">larsga@ifi.uio.no</A>. A part of 
<A HREF="index.html">Tools for parsing XML with Python</A>.
</ADDRESS>

</BODY>
</HTML>