Python/XML HOWTO

The Python/XML Special Interest Group
xml-sig@python.org
(edited by akuchling@acm.org)

Abstract:

XML is the eXtensible Markup Language, a subset of SGML, intended to allow the creation and processing of application-specific markup languages. Python makes an excellent language for processing XML data. This document is a tutorial for the Python/XML package. It assumes you're already familiar with the structure and terminology of XML.

This is a draft document; 'XXX' in the text indicates that something has to be filled in later, rewritten, or verified.

Contents

* 1. Introduction to XML
  + 1.1 Related Links
* 2. Installing the XML Toolkit
  + 2.1 Related Links
* 3. SAX: The Simple API for XML
  + 3.1 Starting Out
  + 3.2 Error Handling
  + 3.3 Searching Element Content
  + 3.4 Related Links
* 4. DOM: The Document Object Model
  + 4.1 Getting A DOM Tree
  + 4.2 Manipulating The Tree
  + 4.3 Walking Over The Entire Tree
  + 4.4 Building A Document
  + 4.5 Processing HTML
  + 4.6 Related Links
* 5. xmlarch: Architectural Forms
  + 5.1 Related Links
* 6. Glossary

1. Introduction to XML

XML, the eXtensible Markup Language, is a simplified dialect of SGML, the Standard Generalized Markup Language. XML is intended to be reasonably simple to implement and use, and is already being used for specifying markup languages for various new standards: MathML for expressing mathematical equations, the Synchronized Multimedia Integration Language for multimedia presentations, and so forth.

SGML and XML represent a document by tagging the document's various components with their function, or meaning. For example, an academic paper contains several parts: it has a title, one or more authors, an abstract, the actual text of the paper, a list of references, and so forth.
A markup language for writing such papers would therefore have tags indicating what the contents of the abstract are, what the title is, and so forth. This should not be confused with the physical details of how the document is actually printed on paper. The abstract might be printed with narrow margins in a smaller font than the rest of the document, but the markup usually won't be concerned with details such as this; other software will translate from the markup language to a typesetting language such as TeX, and will handle the details.

A markup language specified using XML looks a lot like HTML; a document consists of a single element, which contains sub-elements, which can have further sub-elements inside them. Elements are indicated by tags in the text. Tags are always inside angle brackets < >. There are two forms of elements. An element can contain content between opening and closing tags, as in <name>Euryale</name>, which is a name element containing the data "Euryale". This content may be text data, other XML elements, or a mixture of both. Elements can also be empty, in which case they contain nothing and are represented as a single tag ending with a slash, as in <stop/>, which is an empty stop element. Unlike HTML, XML element names are case-sensitive; stop and Stop are two different element types.

Opening and empty tags can also contain attributes, which specify values associated with an element. For example, in the text <name lang='greek'>Herakles</name>, the name element has a lang attribute whose value is "greek". This would contrast with <name lang='latin'>Hercules</name>, where the attribute's value is "latin".

A given XML language is specified with a Document Type Definition, or DTD. The DTD declares the element names that are allowed, and how elements can be nested inside each other. The DTD also specifies the attributes that can be provided for each element, their default values, and whether they can be omitted.
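The element and attribute syntax just described can be explored with any XML parser. The sketch below uses xml.dom.minidom from the current Python standard library (an assumption: it is not part of the Python/XML package this HOWTO covers) to pull the lang attribute and text out of the Herakles/Hercules examples above, and to show that stop and Stop are distinct element types. The enclosing names element is invented for the example, since a document needs a single root element.

```python
from xml.dom import minidom

# Hypothetical document: the <names> root exists only to hold the examples.
SAMPLE = ('<names>'
          '<name lang="greek">Herakles</name>'
          '<name lang="latin">Hercules</name>'
          '<stop/><Stop/>'
          '</names>')

doc = minidom.parseString(SAMPLE)

# (value of the lang attribute, text content) for each name element;
# the text is held in the element's single child Text node.
names = [(el.getAttribute("lang"), el.firstChild.data)
         for el in doc.getElementsByTagName("name")]

# Element names are case-sensitive, so these look up different elements.
n_lower = len(doc.getElementsByTagName("stop"))
n_upper = len(doc.getElementsByTagName("Stop"))

# An empty element such as <stop/> has no children at all.
stop_is_empty = not doc.getElementsByTagName("stop")[0].hasChildNodes()

print(names)
print(n_lower, n_upper, stop_is_empty)
```

Each lookup by tag name matches exactly one element here, which is the case-sensitivity rule in action.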
In HTML, for instance, the DTD specifies that the LI element, representing an entry in a list, can only occur inside certain elements which represent lists, such as OL or UL. A validating parser can be given a DTD and a document, and can then verify that the document is legal according to the DTD's rules, or determine that one or more rules have been violated.

Applications that process XML can be classed into two types. The simpler type is an application that only handles one particular markup language. For example, a chemistry program may only need to process Chemical Markup Language, but not MathML. Such an application can be written specifically for a single DTD, and doesn't need to be capable of handling multiple markup languages. This type is simpler to write, and can easily be implemented with the available Python software.

The second type of application is less common, and has to be able to handle any markup language you throw at it. An example might be a smart XML editor that helps you write XML conforming to a selected DTD; it might do so by not letting you enter an element where it would be illegal, or by suggesting elements that can be placed at the current cursor location. Such an application needs to handle any possible XML-defined markup, and therefore must be able to obtain a data structure embodying the DTD in use. XXX This type of application can't currently be implemented in Python without difficulty (XXX but wait and see if a DTD module is included...)

For the full details of XML's syntax, the one definitive source is the XML 1.0 specification, available on the Web at http://www.w3.org/TR/xml-spec.html. However, like all specifications, it's quite formal and isn't intended to be a friendly introduction or a tutorial. The annotated version of the standard, at http://www.xml.com/xml/pub/axml/axmlintro.html, is quite helpful in clarifying the specification's intent.
There are also various informal tutorials and books available to introduce you to XML.

The rest of this HOWTO will assume that you're familiar with the relevant terminology. Most sections will use XML terms such as element and attribute; section 4 on the Document Object Model will assume that you've read the relevant Working Draft, and are familiar with things like Iterators and Nodes. Section 3 does not require any experience with the Java SAX implementations.

1.1 Related Links

2. Installing the XML Toolkit

Windows users should get the precompiled version at XXX; Mac users will use the corresponding precompiled version at XXX. Linux users may wish to use either the Debian package from XXX, or the RPM from XXX.

To compile from source on a Unix platform, perform the following steps.

1. Get a copy of the source distribution from http://www.python.org/topics/xml/download.html. Unpack it with the following command:

       gzip -dc xml-package.tgz | tar -xvf -

2. Run:

       make -f Makefile.pre.in boot

   This creates the "Makefile" and "config.c" (producing various other intermediate files in the process), incorporating the values for sys.prefix, sys.exec_prefix and sys.version from the installed Python binary. For this to work, the Python interpreter must be on your path. If this fails, try

       make -f Makefile.pre.in Makefile VERSION=1.5 installdir=<prefix>

   where "<prefix>" is the value of "installdir" used when installing Python. You may also have to set "exec_installdir" to the value of "exec_prefix".

3. Once the Makefile has been constructed, run "make" to compile the C modules. There's no test suite yet, but there will be one someday.

4. To install the code, run "make install". The code will be installed under the "site-packages/" directory as a package named "xml/".

If you have difficulty installing this software, send a problem report describing the problem.

There are various demonstration programs in the "demo/" directory of the source distribution.
You may wish to look at them next to get an impression of what's possible with the XML tools, and as a source of example code.

2.1 Related Links

http://www.python.org/topics/xml/
    This is the starting point for Python-related XML topics; it is updated to refer to all software, mailing lists, documentation, etc.

3. SAX: The Simple API for XML

The Simple API for XML isn't a standard in the formal sense, but an informal specification designed by David Megginson, with input from many people on the xml-dev mailing list. SAX defines an event-driven interface for parsing XML. To use SAX, you must create Python class instances which implement a specified interface, and the parser will then call various methods of those objects.

SAX is most suitable for purposes where you want to read through an entire XML document from beginning to end, and perform some computation, such as building a data structure representing a document, or summarizing information in a document (computing an average value of a certain element, for example). It's not very useful if you want to modify the document structure in some complicated way that involves changing how elements are nested, though it could be used if you simply wish to change element contents or attributes. For example, you would not want to re-order chapters in a book using SAX, but you might want to change the contents of any name elements with the attribute lang equal to 'greek' into Greek letters.

One advantage of SAX is speed and simplicity. Let's say you've defined a complicated DTD for listing comic books, and you wish to scan through your collection and list everything written by Neil Gaiman. For this specialized task, there's no need to expend effort examining elements for artists and editors and colourists, because they're irrelevant to the search. You can therefore write a class instance which ignores all elements that aren't writer.
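A minimal sketch of such a handler follows. It is written against the xml.sax module of today's Python 3 standard library rather than the saxlib API used in the rest of this section, and the names ListWriters, writers_in and COLLECTION (including the second, made-up comic entry) are hypothetical. The handler reacts only to writer elements, gathers their text, and ignores everything else; section 3.3 develops the same idea with the older API.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ListWriters(ContentHandler):
    """Hypothetical handler: collect the text of every <writer> element."""

    def __init__(self):
        super().__init__()
        self.in_writer = False
        self.chunks = []
        self.writers = []

    def startElement(self, name, attrs):
        # Elements other than writer (penciller, etc.) are simply ignored.
        if name == "writer":
            self.in_writer = True
            self.chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self.in_writer:
            self.chunks.append(content)

    def endElement(self, name):
        if name == "writer":
            self.in_writer = False
            # Collapse runs of whitespace before storing the name.
            self.writers.append(" ".join("".join(self.chunks).split()))


def writers_in(xml_text):
    handler = ListWriters()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.writers


# Illustrative data only; the second comic is invented for the example.
COLLECTION = """<collection>
  <comic title="Sandman" number="62">
    <writer>Neil Gaiman</writer>
    <penciller>Glyn Dillon</penciller>
  </comic>
  <comic title="Example" number="1">
    <writer>  Peter
      Milligan </writer>
  </comic>
</collection>"""

print(writers_in(COLLECTION))
```

Selecting only Neil Gaiman's work is then a matter of filtering this list, or of comparing inside endElement() as FindWriter does later.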
SAX also has the advantage that the whole document never has to be resident in memory at one time, which matters if you are processing really huge documents.

SAX defines 4 basic interfaces; a SAX-compliant XML parser can be passed any objects that support these interfaces, and will call various methods as data is processed. Your task, therefore, is to implement those interfaces that are relevant to your application. The SAX interfaces are:

DocumentHandler
    Called for general document events. This interface is the heart of SAX; its methods are called for the start of the document, the start and end of elements, and for the characters of data contained inside elements.
DTDHandler
    Called to handle DTD events required for basic parsing. This means notation declarations (XML spec section 4.7) and unparsed entity declarations (XML spec section 4).
EntityResolver
    Called to resolve references to external entities. If your documents will have no external entity references, you won't need to implement this interface.
ErrorHandler
    Called for error handling. The parser will call methods from this interface to report all warnings and errors.

Python doesn't support the concept of interfaces, so the interfaces listed above are implemented as Python classes. The default method implementations are defined to do nothing (the method body is just a Python pass statement), so usually you can simply ignore methods that aren't relevant to your application. The one big exception is the ErrorHandler interface; if you don't provide methods that print a message or otherwise take some action, errors in the XML data will be silently ignored. This is almost certainly not what you want your application to do, so always implement at least the error() and fatalError() methods. xml.sax.saxutils provides an ErrorPrinter class which sends error messages to standard error, and an ErrorRaiser class which raises an exception for any warnings or errors.
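In the xml.sax module of current Python 3, the same four interfaces survive as classes in xml.sax.handler: ContentHandler (the successor to DocumentHandler), DTDHandler, EntityResolver and ErrorHandler. One caveat: there, the default ErrorHandler already raises on error() and fatalError() and prints warnings, unlike the do-nothing defaults described above. The sketch below, whose class and function names are hypothetical, combines all four into one base class in the spirit of saxlib.HandlerBase, overrides the error methods explicitly as recommended, and subclasses it with a handler that overrides only startElement().

```python
import io
import sys
import xml.sax
from xml.sax.handler import (ContentHandler, DTDHandler,
                             EntityResolver, ErrorHandler)


class AllInOneHandler(ContentHandler, DTDHandler, EntityResolver, ErrorHandler):
    """Hypothetical HandlerBase-style class implementing all four interfaces.

    Methods that are not overridden keep the library's defaults, which are
    mostly no-ops.
    """

    def warning(self, exception):
        # Report the problem but keep parsing.
        print("warning:", exception, file=sys.stderr)

    def error(self, exception):
        # Treat recoverable errors as fatal.
        raise exception

    def fatalError(self, exception):
        raise exception


class ElementCounter(AllInOneHandler):
    """Overrides a single event method; all other events are ignored."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_elements(xml_text):
    handler = ElementCounter()
    parser = xml.sax.make_parser()
    # One object can serve as all four handlers.
    parser.setContentHandler(handler)
    parser.setDTDHandler(handler)
    parser.setEntityResolver(handler)
    parser.setErrorHandler(handler)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return handler.count
```

A well-formed document yields a count; a malformed one, such as a mismatched end tag, propagates an xml.sax.SAXParseException out of count_elements().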
In outline, a program using SAX looks something like this:

    import sys
    from xml.sax import saxlib, saxexts

    # Define your specialized handler classes
    class docHandler(saxlib.DocumentHandler):
        pass    # override the methods your application needs

    # Create an instance of the handler classes
    dh = docHandler()

    # Create an XML parser
    parser = saxexts.make_parser()

    # Tell the parser to use your handler instance
    parser.setDocumentHandler(dh)

    # Parse the file; your handler's methods will get called
    parser.parseFile(sys.stdin)

    # Close the parser
    parser.close()

3.1 Starting Out

Following the earlier example, let's consider a simple XML format for storing information about a comic book collection. Here's a sample document for a collection consisting of a single issue:

    <collection>
      <comic title="Sandman" number="62">
        <writer>Neil Gaiman</writer>
        <penciller>Glyn Dillon</penciller>
        <penciller>Charles Vess</penciller>
      </comic>
    </collection>

An XML document must have a single root element; this is the "collection" element. It has one child comic element for each issue; the book's title and number are given as attributes of the comic element, which can have one or more children containing the issue's writer and artists. There may be several artists or writers for a single issue.

Let's start off with something simple: a document handler named FindIssue that reports whether a given issue is in the collection.

    from xml.sax import saxlib

    class FindIssue(saxlib.HandlerBase):
        def __init__(self, title, number):
            self.search_title, self.search_number = title, number

The HandlerBase class inherits from all four interfaces: DocumentHandler, DTDHandler, EntityResolver, and ErrorHandler. This is what you should use if you want to use one class for everything. When you want separate classes for each purpose, you can just subclass each interface individually. Neither of the two approaches is always "better" than the other; their suitability depends on what you're trying to do, and on what you prefer.

Since this class is doing a search, an instance needs to know what to search for. The desired title and issue number are passed to the FindIssue constructor, and stored as part of the instance.
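For orientation before the method-by-method walk-through, here is a sketch of the whole FindIssue search ported to the xml.sax module of current Python 3. This is not the saxlib/saxexts API this HOWTO describes: the base class becomes xml.sax.handler.ContentHandler, and xml.sax.parseString() stands in for creating, feeding and closing a parser by hand. The found list and the find_issue() helper are additions made for illustration, and the one-issue sample collection is written out in the code.

```python
import xml.sax
from xml.sax.handler import ContentHandler

SAMPLE = """<collection>
  <comic title="Sandman" number="62">
    <writer>Neil Gaiman</writer>
    <penciller>Glyn Dillon</penciller>
    <penciller>Charles Vess</penciller>
  </comic>
</collection>"""


class FindIssue(ContentHandler):
    def __init__(self, title, number):
        super().__init__()
        self.search_title, self.search_number = title, number
        self.found = []      # matches, kept so callers can inspect them

    def startElement(self, name, attrs):
        # Ignore everything except comic elements
        if name != "comic":
            return
        # attrs supports dictionary-style lookups of attribute values
        title = attrs.get("title", None)
        number = attrs.get("number", None)
        if title == self.search_title and number == self.search_number:
            self.found.append((title, number))
            print(title, "#" + str(number), "found")


def find_issue(xml_text, title, number):
    handler = FindIssue(title, number)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.found


if __name__ == "__main__":
    find_issue(SAMPLE, "Sandman", "62")    # prints: Sandman #62 found
```

Attribute values arrive as strings, which is why the issue number is searched for as "62" rather than the integer 62.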
Now let's look at the method which actually does all the work. This simple task only requires looking at the attributes of a given element, so only the startElement method is relevant.

        def startElement(self, name, attrs):
            # If it's not a comic element, ignore it
            if name != 'comic': return

            # Look for the title and number attributes (see text)
            title = attrs.get('title', None)
            number = attrs.get('number', None)
            if title == self.search_title and number == self.search_number:
                print title, '#'+str(number), 'found'

The startElement() method is passed a string giving the name of the element, and an instance containing the element's attributes. The latter implements the AttributeList interface, which includes most of the semantics of Python dictionaries. The method therefore ignores everything except comic elements, and compares their title and number attributes to the search values. If they match, a message is printed out.

startElement() is called for every single element in the document. If you added print 'Starting element:', name to the top of startElement(), you would get the following output:

    Starting element: collection
    Starting element: comic
    Starting element: writer
    Starting element: penciller
    Starting element: penciller

To actually use the class, we need top-level code that creates instances of a parser and of FindIssue, associates them, and then calls a parser method to process the input.

    import sys
    from xml.sax import saxexts

    if __name__ == '__main__':
        # Create a parser
        parser = saxexts.make_parser()

        # Create the handler
        dh = FindIssue('Sandman', '62')

        # Tell the parser to use our handler
        parser.setDocumentHandler(dh)

        # Parse the input
        parser.parseFile(sys.stdin)

        # Close the parser
        parser.close()

The ParserFactory class can automate the job of creating parsers. There are already several XML parsers available to Python, and more might be added in future. "xmllib.py" is included with Python 1.5, so it's always available, but it's also not particularly fast.
A faster version of "xmllib.py" is included in xml.parsers. The pyexpat module is faster still, so it's obviously the preferred choice if it's available. ParserFactory's make_parser method determines which parsers are available and chooses the fastest one, so you don't have to know what the different parsers are, or how they differ. (You can also tell make_parser to use a given parser, if you want to use a specific one.)

Once you've created a parser instance, calling setDocumentHandler tells the parser what to use as the handler. The final statement, parser.close(), is very important: it forces the parser to flush its internal buffers and free any internal objects and memory allocations. Memory leaks can result if you don't call the close() method.

If you run the above code with the sample XML document, it'll output "Sandman #62 found".

3.2 Error Handling

Now, try running the above code with this file as input:

    &foo;

The &foo; entity is unknown, and the comic element isn't closed (if it were empty, there would be a "/" before the closing ">"). Why did the file get processed without complaint? Because the default code for the ErrorHandler interface does nothing, and no different implementation was provided, so the errors are silently ignored.

The ErrorRaiser class automatically raises an exception for any error; you'll usually set an instance of this class as the error handler. Otherwise, you should provide your own version of the ErrorHandler interface, and at minimum override the error() and fatalError() methods. The minimal implementation for each method can be a single line. The methods in the ErrorHandler interface (warning, error, and fatalError) are all passed a single argument, an exception instance. The exception will always be a subclass of SAXException, and calling str() on it will produce a readable error message explaining the problem.
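The xml.sax module in current Python 3 makes this easy to observe: the exception handed to the error methods is an xml.sax.SAXParseException, a subclass of SAXException that also records where the problem was found. The sketch below assumes a made-up malformed document containing the undeclared &foo; entity, and relies on the standard library's default ErrorHandler, which (unlike the saxlib default described above) re-raises fatal errors. The first_error() helper is hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical inputs: &foo; is never declared in BAD.
BAD = '<collection><comic title="Sandman" number="62">&foo;</comic></collection>'
GOOD = '<collection><comic title="Sandman" number="62"/></collection>'


def first_error(xml_text):
    """Return (line, column, message) for the first error, or None."""
    try:
        # No error handler is passed, so the default one is used,
        # and it raises on fatal errors.
        xml.sax.parseString(xml_text.encode("utf-8"), ContentHandler())
    except xml.sax.SAXParseException as exc:
        # str() gives a readable "source:line:column: message" string.
        return exc.getLineNumber(), exc.getColumnNumber(), str(exc)
    return None


print(first_error(BAD))
print(first_error(GOOD))
```

The exact wording of the message comes from the underlying parser, so it is best shown to a human rather than matched in code.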
To re-implement a variant of ErrorRaiser, simply define two of the three methods to raise the exception they're passed:

        def error(self, exception):
            raise exception

        def fatalError(self, exception):
            raise exception

warning() might simply print the exception to sys.stderr and return without raising it. Now the same incorrect XML file will cause a traceback to be printed, with the error message "xml.sax.saxlib.SAXException: reference to unknown entity".

3.3 Searching Element Content

Let's tackle a slightly more complicated task: printing out all issues written by a certain author. This now requires looking at element content, because the writer's name is inside a writer element: <writer>Peter Milligan</writer>.

The search will be performed using the following algorithm:

1. The startElement method will be more complicated. For comic elements, the handler has to save the title and number, in case this comic is later found to match the search criterion. For writer elements, it sets an inWriterContent flag to true, and sets a writerName attribute to the empty string.
2. Characters outside of XML tags must be processed. When inWriterContent is true, these characters must be added to the writerName string.
3. When the writer element is finished, all of the element's content has been collected in the writerName attribute, so we can check whether the name matches the one we're searching for, and if so, print the information about this comic. We must also set inWriterContent back to false.

Here's the first part of the code; this implements step 1.
    from xml.sax import saxlib
    import string

    def normalize_whitespace(text):
        "Remove redundant whitespace from a string"
        return string.join(string.split(text), ' ')

    class FindWriter(saxlib.HandlerBase):
        def __init__(self, search_name):
            # Save the name we're looking for
            self.search_name = normalize_whitespace(search_name)

            # Initialize the flag to false
            self.inWriterContent = 0

        def startElement(self, name, attrs):
            # If it's a comic element, save the title and issue
            if name == 'comic':
                title = normalize_whitespace(attrs.get('title', ""))
                number = normalize_whitespace(attrs.get('number', ""))
                self.this_title = title
                self.this_number = number

            # If it's the start of a writer element, set flag
            elif name == 'writer':
                self.inWriterContent = 1
                self.writerName = ""

The startElement() method works as discussed previously. Now we have to look at how the content of elements is processed.

The normalize_whitespace() function is important, and you'll probably use it in your own code. XML treats whitespace very flexibly; you can include extra spaces or newlines wherever you like. This means that you must normalize the whitespace before comparing attribute values or element content; otherwise the comparison might produce a wrong result because the contents of two elements differ only in the amount of whitespace.

        def characters(self, ch, start, length):
            if self.inWriterContent:
                self.writerName = self.writerName + ch[start:start+length]

The characters() method is called for characters that aren't inside XML tags. ch is a string of characters, and start is the point in the string where the characters start. length is the length of the character data. You should not assume that start is equal to 0, or that all of ch is the character data. An XML parser could be implemented to read the entire document into memory as a string, and then operate by indexing into the string.
This would mean that ch would always contain the entire document, and only the values of start and length would change.

You also shouldn't assume that all the characters are passed in a single function call. In the example above, there might be only one call to characters() for the string "Peter Milligan", or the parser might call characters() once for each character. More realistically, if the content contains an entity reference, as in "Wagner &amp; Seagle", the parser might call the method three times: once for "Wagner ", once for "&", represented by the entity reference, and again for " Seagle".

For step 2 of FindWriter, characters() only has to check inWriterContent, and if it's true, add the characters to the string being built up.

Finally, when the writer element ends, the entire name has been collected, so we can compare it to the name we're searching for.

        def endElement(self, name):
            if name == 'writer':
                self.inWriterContent = 0
                self.writerName = normalize_whitespace(self.writerName)
                if self.search_name == self.writerName:
                    print 'Found:', self.this_title, self.this_number

To avoid being confused by differing whitespace, the normalize_whitespace() function is called. This can be done because we know that leading and trailing whitespace are insignificant for this element, in this DTD.

End tags can't have attributes on them, so there's no attrs parameter. An empty element with attributes, written as a single tag ending in "/>", will result in a call to startElement(), followed immediately by a call to endElement().

XXX how are external entities handled? Anything special need to be done for them?

3.4 Related Links

http://www.megginson.com/SAX/
    The SAX home page. This has the most recent copy of the specification, and lists SAX implementations for various languages and platforms. At the moment it's somewhat Java-centric.

4. DOM: The Document Object Model

The Document Object Model specifies a tree-based representation for an XML document.
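To give a first feel for the tree model, here is a small sketch using xml.dom.minidom from the current Python standard library. This is an assumption: minidom is a different DOM implementation from the xml.dom package described in this section. It parses a document, finds the root Element through the Document's documentElement attribute, appends a new comic element, and converts the tree back into XML. The add_issue() helper is made up for the example.

```python
from xml.dom import minidom


def add_issue(xml_text, title, number):
    """Parse xml_text, append a new <comic> element, and return the new XML."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement            # the top-level Element node

    # Build the new element node and attach it as the root's last child.
    comic = doc.createElement("comic")
    comic.setAttribute("title", title)
    comic.setAttribute("number", number)
    root.appendChild(comic)

    # Convert the whole tree back into XML text.
    return doc.toxml()


print(add_issue("<collection/>", "Sandman", "63"))
```

Building the element as a node, rather than pasting strings together, means the output is guaranteed to be well-formed and correctly quoted.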
A top-level Document instance is the root of the tree, and has a single child which is the top-level Element instance; this Element has child nodes representing the content and any sub-elements, which may have further children, and so forth. Functions are defined which let you traverse the resulting tree any way you like, access element and attribute values, insert and delete nodes, and convert the tree back into XML.

The DOM is useful for modifying XML documents, because you can create a DOM tree, modify it by adding new nodes and moving subtrees around, and then produce a new XML document as output. You can also construct a DOM tree yourself, and convert it to XML; this is often a more flexible way of producing XML output than simply writing <tag1>...</tag1> to a file.

While the DOM doesn't require that the entire tree be resident in memory at one time, the Python DOM implementation currently does keep the whole tree in RAM. It's possible to write an implementation that stores most of the tree on disk or in a database, and reads in new sections as they're accessed, but this hasn't been done yet. This means you may not have enough memory to process very large documents as a DOM tree. A SAX handler, on the other hand, can potentially churn through amounts of data far larger than the available RAM.

4.1 Getting A DOM Tree

The easiest way to get a DOM tree is to have it built for you. One of the modules in the xml.dom package is sax_builder.py, which provides a SaxBuilder class that will construct a DOM tree from its input. You must create a SaxBuilder instance and a SAX parser, associate the instance as the parser's document handler, and then retrieve the resulting tree.
    import sys
    from xml.sax import saxexts
    from xml.dom.sax_builder import SaxBuilder

    # Create a SAX parser and a SaxBuilder instance
    p = saxexts.make_parser()
    dh = SaxBuilder()
    p.setDocumentHandler(dh)

    # Parse the input, and close the parser
    p.parseFile(sys.stdin)
    p.close()

    # Retrieve the DOM tree
    doc = dh.document

The SaxBuilder document handler makes the resulting DOM tree available as its document attribute.

4.2 Manipulating The Tree

This HOWTO can't be a complete introduction to the Document Object Model, because there are lots of interfaces and lots of methods. Luckily, the DOM Recommendation is quite a readable document, so I'd recommend that you read it to get a complete picture of the available interfaces; this will only be a partial overview.

The Document Object Model represents an XML document as a tree of nodes, each of which is an instance of some subclass of the Node class. Some subclasses of Node are Element, Text, and Comment.

We'll use a single example document throughout this section. Here's the sample:

    <xbel>
      <?processing instruction?>
      <desc>No description</desc>
      <folder>
        <title>XML bookmarks</title>
        <bookmark>
          <title>SIG for XML Processing in Python</title>
        </bookmark>
      </folder>
    </xbel>

Converted to a DOM tree, this document could produce the following tree:

    Element xbel None
      Text #text ' \012 '
      ProcessingInstruction processing 'instruction'
      Text #text '\012 '
      Element desc None
        Text #text 'No description'
      Text #text '\012 '
      Element folder None
        Text #text '\012 '
        Element title None
          Text #text 'XML bookmarks'
        Text #text '\012 '
        Element bookmark None
          Text #text '\012 '
          Element title None
            Text #text 'SIG for XML Processing in Python'
          Text #text '\012 '
        Text #text '\012 '
      Text #text '\012'

This isn't the only possible tree, because different parsers may differ in how they generate Text nodes; any of the Text nodes in the above tree might be split into multiple nodes.

4.2.1 The Node class

We'll start by considering the basic Node class. All the other DOM nodes (Document, Element, Text, and so forth) are subclasses of Node.
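A listing like the tree shown above can be produced with nothing more than the Node interface: nodeType, nodeName, nodeValue and childNodes. The sketch below walks the tree recursively using xml.dom.minidom from the current Python standard library (an assumption; it is not the DOM implementation described here). It parses a version of the bookmark sample whose markup is inferred from the tree listing, and the describe() and dump() names are hypothetical. Python 3 writes a newline as \n rather than \012 in string reprs, and the whitespace in the Text nodes depends on how the sample is indented, so those values need not match the listing character for character.

```python
from xml.dom import minidom, Node

# Bookmark sample; the markup is inferred from the tree listing.
SAMPLE = """<xbel>
  <?processing instruction?>
  <desc>No description</desc>
  <folder>
    <title>XML bookmarks</title>
    <bookmark>
      <title>SIG for XML Processing in Python</title>
    </bookmark>
  </folder>
</xbel>"""


def describe(node):
    """One-line summary of a node, in the style of the tree listing."""
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        # For a PI, nodeName is the target and nodeValue is the data.
        return "ProcessingInstruction %s %r" % (node.nodeName, node.nodeValue)
    kind = {Node.ELEMENT_NODE: "Element",
            Node.TEXT_NODE: "Text"}.get(node.nodeType, "Node")
    # An Element's nodeValue is None; a Text node's is its character data.
    return "%s %s %r" % (kind, node.nodeName, node.nodeValue)


def dump(node, depth=0, lines=None):
    """Walk the subtree depth-first, producing one indented line per node."""
    if lines is None:
        lines = []
    lines.append("  " * depth + describe(node))
    for child in node.childNodes:
        dump(child, depth + 1, lines)
    return lines


doc = minidom.parseString(SAMPLE)
print("\n".join(dump(doc.documentElement)))
```

Because only generic Node attributes are used, the same walk works unchanged for any document.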
It's possible to perform many tasks using just the interface provided by Node.

XXX table of attributes and methods

    readonly attribute DOMString       nodeName;
             attribute DOMString       nodeValue;
                                         // raises(DOMException) on setting
                                         // raises(DOMException) on retrieval
    readonly attribute unsigned short  nodeType;
    readonly attribute Node            parentNode;
    readonly attribute NodeList        childNodes;
    readonly attribute Node            firstChild;
    readonly attribute Node            lastChild;
    readonly attribute Node            previousSibling;
    readonly attribute Node            nextSibling;
    readonly attribute NamedNodeMap    attributes;
    readonly attribute Document        ownerDocument;
    Node     insertBefore(in Node newChild, in Node refChild)
                                        raises(DOMException);
    Node     replaceChild(in Node newChild, in Node oldChild)
                                        raises(DOMException);
    Node     removeChild(in Node oldChild)
                                        raises(DOMException);
    Node     appendChild(in Node newChild)
                                        raises(DOMException);
    boolean  hasChildNodes();
    Node     cloneNode(in boolean deep);

4.2.2 Document, Element, and Text nodes

The base of the entire tree is the Document node. Its documentElement attribute contains the Element node for the root element. The Document node may have additional children, such as ProcessingInstruction nodes; the complete list of children XXX.

4.3 Walking Over The Entire Tree

The xml.dom package also includes various helper classes for common tasks such as walking over trees.

The Walker class

Introduction to the walker class

4.4 Building A Document

Intro to builder

4.5 Processing HTML

Intro to HTML builder

4.6 Related Links

http://www.w3.org/DOM/
    The World Wide Web Consortium's DOM page.
http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/
    The DOM Level 1 Recommendation. Unlike most standards, this one is actually pretty readable, particularly if you're only interested in the Core XML interfaces, which are the only ones implemented in Python.

5. xmlarch: Architectural Forms

The xmlarch module contains an XML architectural forms processor written in Python.
It allows you to process XML architectural forms using any parser that uses the SAX interfaces. The module allows you to process several architectures in one parsing pass. Architectural document events for an architecture can even be broadcast to multiple DocumentHandlers. (For example, you can have two handlers for the RDF architecture, three for the XLink architecture, and perhaps one for the HyTime architecture.)

The architecture processor uses the SAX DocumentHandler interface, which means that you can register the architecture handler (ArchDocHandler) with any SAX 1.0 compliant parser. It currently does not process any meta document type definition documents (meta-DTDs). When a DTD parser module is available, the code will be modified to use it in order to process meta-DTD information. Please note that validating and non-validating parsers may report different SAX events when parsing the same document.

The xmlarch module contains six classes: ArchDocHandler, Architecture, ArchParseState, ArchException, AttributeParser and Normalizer.

* ArchDocHandler is a subclass of the saxlib.DocumentHandler interface. This is the class used for processing an architectural document.
* Architecture contains information about an architecture.
* ArchParseState holds information about an architecture's parse state when parsing a document.
* AttributeParser parses architecture use declaration PIs (attribute strings).
* ArchException holds information about an architectural exception thrown by an ArchDocHandler instance.
* Normalizer is a document handler that outputs "normalized" XML.

Using the xmlarch module usually means that you have to do the following things:

* Import the required SAX modules: saxexts, saxlib, saxutils.
* Import the xmlarch module.
* Create a SAX compliant parser object.
* Create an XML architectures processor handler.
* Register this handler with the parser.
* Add document handlers for the architectures you want to process.
* Register a default document handler with the architecture processor handler.
* Parse a document.

A simple example

Python code:

    # Import needed modules
    from xml.sax import saxexts, saxlib, saxutils
    import sys, xmlarch

    # Create architecture processor handler
    arch_handler = xmlarch.ArchDocHandler()

    # Create parser and register architecture processor with it
    parser = saxexts.XMLParserFactory.make_parser()
    parser.setDocumentHandler(arch_handler)

    # Add a document handler to process the html architecture
    arch_handler.addArchDocumentHandler("html", xmlarch.Normalizer(sys.stdout))

    # Parse (and process) the document
    parser.parse("simple.xml")

A sample XML document:

    My first architectual document Geir Ove Gronmo, grove@infotek.no This is the first paragraph in this document This is the second paragraph

The result:

My first architectual document

Geir Ove Gronmo, grove@infotek.no

This is the second paragraph

See also the files "simple.py" and "simple.xml" in the "demo/arch" directory of the Python/XML distribution.

If you try to process the persons architecture in this document instead, you get the following output:

    Geir Ove Grønmo
    Eliot Kimber
    David Megginson
    Lars Marius Garshol

A more complex example

Python code:

    # Import needed modules
    from xml.sax import saxexts, saxlib, saxutils
    import sys, xmlarch

    # Create architecture processor handler
    arch_handler = xmlarch.ArchDocHandler()

    # Create parser and register architecture processor with it
    parser = saxexts.XMLParserFactory.make_parser()
    parser.setDocumentHandler(arch_handler)

    # Add document handlers to process the html and biblio architectures
    arch_handler.addArchDocumentHandler("html",
        xmlarch.Normalizer(open("html.out", "w")))
    arch_handler.addArchDocumentHandler("biblio",
        saxutils.ESISDocHandler(open("biblio1.out", "w")))
    arch_handler.addArchDocumentHandler("biblio",
        saxutils.Canonizer(open("biblio2.out", "w")))

    # Register a default document handler that just passes through
    # any incoming events
    arch_handler.setDefaultDocumentHandler(xmlarch.Normalizer(sys.stdout))

    # Parse (and process) the document
    parser.parse("complex.xml")

Because this produces a lot of output, I've not included the XML document and the results. See instead the files "complex.py" and "complex.xml" in the "demo/xml" directory of the Python/XML distribution and try it yourself.

5.1 Related Links

6. Glossary

XML has given rise to a sea of acronyms and terms. This section will list the most significant terms, and sketch their relevance.

Many of the following definitions are taken from Lars Marius Garshol's SGML glossary, at http://www.stud.ifi.uio.no/larsga/download/diverse/sgmlglos.html.

DOM (Document Object Model)
    The Document Object Model is intended to be a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents.
    Documents will be represented as tree structures which can be traversed and modified.

DTD (Document Type Definition)
    A Document Type Definition (nearly always called DTD) defines an XML document type, complete with element types, entities and an XML declaration. In other words: a DTD completely describes one particular kind of XML document, such as, for instance, HTML 3.2.

SAX (Simple API for XML)
    SAX is a simple standardized API for XML parsers developed by the contributors to the xml-dev mailing list. The interface is mostly language-independent, as long as the language is object-oriented; the first implementation was written for Java, but a Python implementation is also available. SAX is supported by many XML parsers.

XML (eXtensible Markup Language)
    XML is an SGML application profile specialized for use on the web and has its own standards for linking and stylesheets under development.

XSL (eXtensible Style Language)
    XSL is a proposal for a stylesheet language for XML, which enables browsers to lay out XML documents in an attractive manner, and also provides a way to convert XML documents to HTML.

About this document ...

Python/XML HOWTO

This document was generated using the LaTeX2HTML translator.

LaTeX2HTML is Copyright © 1993, 1994, 1995, 1996, 1997, Nikos Drakos, Computer Based Learning Unit, University of Leeds, and Copyright © 1997, 1998, Ross Moore, Mathematics Department, Macquarie University, Sydney.

The application of LaTeX2HTML to the Python documentation has been heavily tailored by Fred L. Drake, Jr. (fdrake@acm.org). Original navigation icons were contributed by Christopher Petrilli (petrilli@dworkin.amber.org).