<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<TITLE>xmlproc: A Python XML parser</TITLE>
<META NAME="Author" CONTENT="Lars Marius Garshol">
<META NAME="Generator" CONTENT="Homemade (http://birk105.studby.uio.no/hovedfag/pilot.html)">
<META NAME="Description" CONTENT="This is the home page of a free XML parser written in Python.">
<LINK REL=StyleSheet HREF="../../../standard.css" TYPE="text/css" MEDIA=screen>
<LINK REL=top HREF="index.html" TITLE="Tools for parsing XML with Python">
</HEAD>
<BODY>
<DIV CLASS=partof>
This page is a part of <A HREF="index.html">Tools for parsing XML with Python</A>.
</DIV>
<H1>xmlproc: A Python XML parser</H1>
<TABLE CLASS="programinfo">
<TR><TH ALIGN=left>Version: <TD>0.52
<TR><TH ALIGN=left>Author: <TD>Lars Marius Garshol
<TR><TH ALIGN=left>Email: <TD>larsga@ifi.uio.no
<TR><TH ALIGN=left>Released: <TD>12.Sep.98
</TABLE>
<H2>What is xmlproc?</H2>
<P>
xmlproc is an XML parser written in Python. It is a fairly complete validating parser, but it does not
yet do everything the XML specification requires of a validating parser, or even of a well-formedness
parser. The average user should not run into any of the omissions, though. Later releases will be more complete.
</P>
<P>
xmlproc now supports both <A HREF="xmlproc-catalog-doco.html">SGML Open Catalogs and XCatalog 0.1</A>.
</P>
<H2>Deviations from the XML specification</H2>
<P>
xmlproc does not follow the XML specification in these respects:
</P>
<UL>
<LI>Parameter entities in external DTD subsets are not allowed inside declarations,
only between them.
<LI>No attempt is made to deal with different character sets or encodings.
<LI>The parser does not check for the illegal characters below #x20.
<LI>Some internal consistency checks on the DTD (such as checking that default
attribute values are valid) are not performed.
<LI>NOTATION attributes are not fully supported.
<LI>Single-character entities are not handled correctly.
</UL>
<P>
All other deviations from the specification are unintentional bugs and should be reported
to me via email. Hopefully, xmlproc will be 100% compliant in version 1.00.
</P>
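<P>
Since xmlproc does not check for the illegal characters below #x20, an application that
needs this check can scan the text itself before parsing. The following is a minimal sketch
and not part of xmlproc: the function name find_illegal_control_chars is made up for this
example, and it relies only on the XML 1.0 rule that tab, line feed and carriage return are
the only characters allowed below #x20.
</P>

```python
import re

# XML 1.0 allows only tab (#x9), line feed (#xA) and carriage return (#xD)
# below #x20; every other character in that range is illegal.
_ILLEGAL_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_illegal_control_chars(text):
    """Return (offset, character) pairs for illegal characters below #x20."""
    return [(m.start(), m.group()) for m in _ILLEGAL_CONTROL.finditer(text)]
```

<P>
An empty list means the text contains none of these characters.
</P>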
<H2>Using xmlproc</H2>
<P>
xmlproc can be used both as a command-line parser and as a parser API
you can use to write XML applications.
</P>
<H3>The command-line parser</H3>
<P>
The command-line parser is in xpcmd.py for well-formedness parsing and xvcmd.py
for validating parsing. Currently xpcmd.py only accepts one
argument: the URL to the file to parse. (You can use just the file
name instead of a full URL if you like.)
</P>
<P>
xvcmd.py has more options:
</P>
<PRE>
Usage:
xvcmd.py [-c catalog] [-l language] [-o format] [urltodoc]
---Options:
catalog: path to catalog file to use to resolve public identifiers
language: ISO 3166 language code for language to use in error messages
format: Format to output parsed XML. 'e': ESIS, 'x': canonical XML
No data will be outputted if this option is not specified
urltodoc: URL to the document to parse. (You can use plain file names
as well.) Can be omitted if a catalog is specified and contains
a DOCUMENT entry.
Catalog files with URLs that end in '.xml' are assumed to be XCatalogs,
all others are assumed to be SGML Open Catalogs.
If the -c option is not specified the environment variables XMLXCATALOG
and XMLSOCATALOG will be used (in that order).
</PRE>
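<P>
The catalog rules in the usage text can be sketched roughly as follows. This is an
illustration of the documented behaviour and not the code in xvcmd.py: the function names
catalog_kind and pick_catalog are invented here, and the sketch assumes that the '.xml'
test is case-sensitive and is applied to catalogs found through the environment variables
as well.
</P>

```python
import os


def catalog_kind(url):
    """Classify a catalog by the '.xml' rule from the usage text."""
    return "XCatalog" if url.endswith(".xml") else "SGML Open Catalog"


def pick_catalog(cli_catalog=None, environ=None):
    """Return (url, kind) for the catalog to use, or None if there is none.

    cli_catalog stands for the value given to -c, if any.
    """
    environ = os.environ if environ is None else environ
    if cli_catalog:
        return cli_catalog, catalog_kind(cli_catalog)
    # Without -c, the environment variables are tried in the documented order.
    for var in ("XMLXCATALOG", "XMLSOCATALOG"):
        url = environ.get(var)
        if url:
            return url, catalog_kind(url)
    return None
```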
<H3>Basic usage</H3>
<P>
If you want to make a program that gets data from the parser you
should subclass the Application class in xmlapp.py. This is a sample
xmlproc client:
</P>
<PRE><CODE>
from xml.parsers.xmlproc import xmlproc

class MyApplication(xmlproc.Application):
    pass # Add some useful stuff here

p=xmlproc.XMLProcessor() # Make this xmlval.XMLValidator if you want to validate
p.set_application(MyApplication())
p.parse_resource("foo.xml")
</CODE></PRE>
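<P>
To give an idea of what the "useful stuff" might look like, here is a sketch of an
application that counts element names. It assumes that the parser reports start tags by
calling a handle_start_tag(name, attrs) method on the application; this page does not list
the callback names, so check the API documentation before relying on it. The ElementCounter
class is made up for this example, and a stand-in Application base class is used when
xmlproc is not installed so that the sketch can be run on its own, with the callbacks
invoked by hand.
</P>

```python
try:
    from xml.parsers.xmlproc import xmlproc
    Application = xmlproc.Application
except ImportError:
    class Application:
        """Stand-in base class so the sketch runs without xmlproc."""


class ElementCounter(Application):
    """Count how many times each element name is seen."""

    def __init__(self):
        Application.__init__(self)
        self.counts = {}

    # Assumed callback name and signature; see the lead-in above.
    def handle_start_tag(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


# Invoke the callback by hand, the way a parser would for
# a doc element containing two para elements.
app = ElementCounter()
for name in ("doc", "para", "para"):
    app.handle_start_tag(name, {})
```

<P>
With xmlproc installed, the same object would be passed to set_application() as in the
sample above instead of being driven by hand.
</P>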
<H3>More detailed information</H3>
<P>
The xmlproc APIs are now <A HREF="xmlproc-doco.html">documented</A>. Note, however,
that you should use the <A HREF="saxlib.html">SAX API</A> instead of xmlproc's native API
where possible, because the SAX API lets you switch parsers without changing your
application code.
</P>
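<P>
The parser independence that SAX gives can be illustrated with a small handler. Note the
assumption: this sketch uses the xml.sax package from the Python standard library, which
is a later SAX 2 interface and not the saxlib module linked above, so class and method
names may differ from saxlib. The names TitleCollector and collect_titles are made up for
the example. The handler never refers to a particular parser; any SAX parser object can be
passed in.
</P>

```python
import io
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collect the text content of every title element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # a list while inside a title element, else None

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so gather them in a buffer.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


def collect_titles(xml_bytes, parser=None):
    """Parse xml_bytes with the given SAX parser, or the default one."""
    parser = parser or xml.sax.make_parser()
    handler = TitleCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.titles
```

<P>
Switching parsers then means passing a different parser object to collect_titles; the
handler class stays the same.
</P>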
<H2>Licence?</H2>
<P>
xmlproc is free and you can do as you like with it. If you change it,
please let me know.
</P>
<H2>Getting xmlproc</H2>
<P>
You can download xmlproc <A HREF="xmlproc.zip">here</A>.
</P>
<H2>Changes since last release</H2>
<P>
These are the changes since version 0.51:
</P>
<UL>
<LI>40% speed increase for well-formedness parsing. The improvement for validating
parsing seems to be around 25%. (Depends a lot on DTD size versus document size.)
<LI>Error reporting improved. Better error messages, and support for error messages
in different languages.
<LI>xvcmd.py option interpretation improved (-l option added)
<LI>Numerous minor parse bug fixes
<LI>Some API extensions:
<UL>
<LI>CatalogManager.get_public_ids() method added
<LI>DTD.get_elements() method added
<LI>Parser.set_error_language() method added
<LI>optional bufsize argument added to Parser.parse_resource()
</UL>
</UL>
<H2>Feedback</H2>
<P>
Any and all feedback is welcome, from suggestions for improvements or new features
to bug reports. And I really mean it! If you have some opinions on this program, please
let me hear them.
</P>
<H2>Email notification of new versions</H2>
<P>
To be notified by email when a new version is released, fill out this
form. I guarantee that these email addresses won't be used for any
other purpose, and that you'll receive notification if the service
dies. (If you follow the Python XML-SIG mailing list you won't need to register here
since new releases will also be announced there.)
</P>
<FORM METHOD=POST ACTION="http://www.stud.ifi.uio.no/~larsga/addmail.cgi">
<TABLE>
<TR><TD>Your full name: <TD><INPUT TYPE=TEXT NAME=FULLNAME SIZE=30>
<TR><TD>Your email address: <TD><INPUT TYPE=TEXT NAME=EMAIL SIZE=30>
<TR><TD COLSPAN=2><INPUT TYPE=SUBMIT VALUE="Add to list">
</TABLE>
<INPUT TYPE=hidden NAME=LIST VALUE=xmlproc>
</FORM>
<HR>
<ADDRESS>
14.Sep.98 23:19,
<A HREF="../../../lmg.html">Lars Marius Garshol</A>,
<A HREF="mailto:larsga@ifi.uio.no">larsga@ifi.uio.no</A>. A part of
<A HREF="index.html">Tools for parsing XML with Python</A>.
</ADDRESS>
</BODY>
</HTML>