File: README.sgmlop

=============================
The sgmlop accelerator module
=============================

sgmlop contains an optimized SGML/XML parser, designed as an add-on to
the sgmllib/htmllib and xmllib modules shipped with Python 1.5.

using empty callbacks, the drop-in xmllib replacement is about 6 times
faster than the original xmllib implementation.  when using sgmlop
directly, it can be more than 50 times faster.  for more information
on benchmarking sgmlop, see below.

Enjoy /F

fredrik@pythonware.com
http://www.pythonware.com

--------------------------------------------------------------------
Copyright (c) 1998 by Secret Labs AB.

Permission to use, copy, modify, and distribute this software and
its associated documentation for any purpose and without fee is
hereby granted.  This software is provided as is.
--------------------------------------------------------------------


release info
------------

This is the third public release.  Changes include:

- added a starttag attribute parser written in C.  this gives
  a considerable speedup on files that use lots of tag attributes.

- the callback object can now have an sgmllib/xmllib interface
  (finish_* and handle_* methods) *or* a saxlib interface (see
  saxhack.py for an example).
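the two callback styles differ mainly in method names, so one can be
adapted to the other.  below is a minimal sketch (not the contents of
saxhack.py) that assumes the parser calls sgmllib-style
finish_starttag(tag, attrs), finish_endtag(tag) and handle_data(text)
methods on its target, and forwards those events to a SAX2
ContentHandler from Python's standard xml.sax package.  SaxAdapter and
RecordingHandler are hypothetical names used only for illustration.

```python
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


class SaxAdapter:
    """Expose sgmllib-style callbacks and forward them to a SAX handler.

    Assumption (hypothetical): the driver calls finish_starttag(tag, attrs),
    finish_endtag(tag) and handle_data(text) on this object.
    """

    def __init__(self, handler):
        self.handler = handler

    def finish_starttag(self, tag, attrs):
        # attrs may be a dict or a list of (name, value) pairs; normalize it
        self.handler.startElement(tag, AttributesImpl(dict(attrs)))

    def finish_endtag(self, tag):
        self.handler.endElement(tag)

    def handle_data(self, text):
        self.handler.characters(text)


class RecordingHandler(ContentHandler):
    """Tiny SAX handler that records every event it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("text", content))


if __name__ == "__main__":
    # Drive the adapter by hand, the way a parser would.
    handler = RecordingHandler()
    target = SaxAdapter(handler)
    target.finish_starttag("doc", {"id": "1"})
    target.handle_data("hello")
    target.finish_endtag("doc")
    print(handler.events)
```

keeping the adapter separate from the handler means the same SAX
handler can be reused with any driver that speaks the sgmllib-style
callback names.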


contents
--------

README		this file

sgmllib.py	a drop-in replacement for the sgmllib.py module
		distributed with Python 1.5

xmllib.py	a drop-in replacement for the xmllib.py module
		distributed with Python 1.5

saxhack.py	illustrates how to implement the SAX DocumentHandler
		interface directly with native sgmlop.  this is over
		30 times faster than a corresponding parser based on
		the original xmllib.

sgmlop.dll	a precompiled version for Python 1.5 on win32

sgmlop.c	accelerator source code

sgmlop.mak	makefile for MSVC++ 5.0 generated by opal/pymake.
		make sure to change the directory names before you
		use it on your own machine.

bench*.py	various test files and benchmarks
test*.py


benchmarks
----------

benchmarking the sgmlop parser is non-trivial; if you don't install
any callbacks, it's some 300 times faster than the original xmllib (it
can parse more than 10 MB/s on a fast Pentium II).  this means that in
a typical test, far more time is spent on Python method call overhead
than on the parsing itself.

my earlier benchmarks used a 'collecting' parser, which stored all
tags and elements in a list.  with that setup, sgmlop is roughly 5
times faster than the original implementation.

the benchxml.py script provided with this release uses empty parsers
instead (that is, all callbacks exist, but each contains only a 'pass'
statement), in order to measure only the parser and Python call
overhead.
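the three setups (collecting callbacks, empty callbacks, and no
callbacks at all) can be sketched as follows.  since sgmlop itself is
not assumed to be installed, this sketch swaps in the standard
library's xml.parsers.expat as the driver, so its timings say nothing
about sgmlop's own speed; it only shows how the callback style changes
what a benchmark measures.  CollectingTarget, EmptyTarget, parse and
time_parse are hypothetical names.

```python
import time
import xml.parsers.expat


class CollectingTarget:
    """'collecting' style: store every tag and text chunk in a list."""

    def __init__(self):
        self.items = []

    def start(self, tag, attrs):
        self.items.append(("start", tag, attrs))

    def end(self, tag):
        self.items.append(("end", tag))

    def data(self, text):
        self.items.append(("data", text))


class EmptyTarget:
    """'empty' style: every callback exists but only does 'pass'."""

    def start(self, tag, attrs):
        pass

    def end(self, tag):
        pass

    def data(self, text):
        pass


def parse(document, target=None):
    """Parse with expat (stand-in for sgmlop); target=None installs no callbacks."""
    parser = xml.parsers.expat.ParserCreate()
    if target is not None:
        parser.StartElementHandler = target.start
        parser.EndElementHandler = target.end
        parser.CharacterDataHandler = target.data
    parser.Parse(document, True)
    return target


def time_parse(document, make_target, repeat=5):
    """Best-of-`repeat` wall-clock time (seconds) for one full parse."""
    best = float("inf")
    for _ in range(repeat):
        target = make_target()  # fresh target per run
        t0 = time.perf_counter()
        parse(document, target)
        best = min(best, time.perf_counter() - t0)
    return best


if __name__ == "__main__":
    doc = "<doc>" + "<item n='1'>text</item>" * 10000 + "</doc>"
    for label, make in [("collecting", CollectingTarget),
                        ("empty", EmptyTarget),
                        ("null", lambda: None)]:
        print(f"{label:10s} {time_parse(doc, make):.4f} s")
```

with expat the gaps will differ from sgmlop's, but one would expect
the same ordering: collecting slowest, null fastest.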

here's a typical test run (with the time for the original xmllib
implementation set to 1):

parser        relative time (speedup)
--------------------------------------------------------------------
slow xmllib   1.0
fast xmllib   0.156 (6.4x)
sgmlop dummy  0.019 (53.5x)
sgmlop null   0.003 (297.8x)

the dummy time is obtained by calling sgmlop directly with empty
callbacks; the null time is obtained by running the parser without
any callbacks installed.
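each row in the table above is a raw timing divided by the baseline
(original xmllib) timing, and the speedup in parentheses is the
reciprocal of that fraction.  recomputing the speedups from the
rounded times printed in the table gives slightly different figures
(for example, 1/0.019 is about 52.6), presumably because the ratios
were computed from unrounded timings.  the sketch below shows the
arithmetic; relative_table and format_table are hypothetical helpers,
and every timing must be non-zero.

```python
def relative_table(timings, baseline):
    """Return (name, relative_time, speedup) rows, with timings[baseline] as 1."""
    base = timings[baseline]
    rows = []
    for name, seconds in timings.items():
        rel = seconds / base                 # fraction of the baseline time
        rows.append((name, rel, 1.0 / rel))  # speedup is the reciprocal
    return rows


def format_table(rows):
    """Render rows in the same layout as the benchmark table."""
    lines = ["parser        relative time (speedup)"]
    for name, rel, speedup in rows:
        lines.append(f"{name:<13} {rel:.3f} ({speedup:.1f}x)")
    return "\n".join(lines)


if __name__ == "__main__":
    # hypothetical timings in seconds, not measured results
    rows = relative_table({"slow xmllib": 2.0, "fast xmllib": 0.5}, "slow xmllib")
    print(format_table(rows))
```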