<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
  <TITLE>Documentation: Catalog support in xmlproc</TITLE>
  <META NAME="Author"      CONTENT="Lars Marius Garshol">
  <META NAME="Generator"   CONTENT="Homemade">
  <META NAME="Description" CONTENT="This page documents the catalog support available in
	xmlproc.">
  <META NAME="Keywords"    CONTENT="XML, Python, parser, SGML Open catalogs, catalog files,
	XCatalog">
  <LINK REL=stylesheet HREF="standard.css" TYPE="text/css" MEDIA=screen>
</HEAD>

<BODY>

<H1>Documentation: Catalog support in xmlproc</H1>

<H2>Contents:</H2>

<P>
This page consists of the following sections:
</P>

<UL>
  <LI><A HREF="#whatare">What are catalog files?</A>
  <LI><A HREF="#catsupp">Catalog file support in xmlproc</A>
  <LI><A HREF="#catuse">Using the catalog file parser</A>
  <LI><A HREF="#xcat">Support for XCatalog 0.1</A>
</UL>

<H2><A NAME="whatare">What are catalog files?</A></H2>

<H3>What they do</H3>

<P>
Catalog files are a means of telling a parser how to map public
identifiers to system identifiers. One simple example of this would be
to use a catalog file to tell an SGML parser that the DTD with the
public identifier "-//W3C//DTD HTML 4.0 Transitional//EN" can be found
at the location "file:///usr/pub/sgml/dtds/html40.dtd".
</P>

<P>
In other words: a public identifier is a well-known name for something
that is not site-dependent, while a system identifier tells
applications how to find this thing on the local system. A catalog
file can be used to find out where to find something at a particular
site given its public identifier.
</P>

<P>
Catalog files can also affect the parsing of documents in other ways,
for example by remapping system identifiers or by naming a default
document entity.
</P>

<H3>Where they come from</H3>

<P>
Catalog files come from the SGML community, but are not part of the
SGML standard itself. The catalog file format and semantics are
defined in <A
HREF="http://www.sil.org/sgml/sotr9401-a2.html">SGML Open
Technical Resolution TR9401:1997</A>, and have since been implemented
in the SP SGML parser, the DXP XML parser and xmlproc.
</P>

<P>
The format used by SP (which extends the original format somewhat) has
become the de facto standard for catalog files. xmlproc supports a
subset of this format.
</P>

<H3>The catalog file format</H3>

<P>
Catalog files consist of entries, each of which starts with a keyword
followed by arguments separated by whitespace. Arguments that contain
spaces must be quoted. Entries are separated by whitespace, and comments
(which start with "--" and end with "--") can appear anywhere
whitespace can appear.
</P>

<P>
An example catalog file:
</P>

<PRE>
-- DSSSL --

PUBLIC "-//James Clark//DTD DSSSL Flow Object Tree//EN" "c:\programfiler\apps\jade\fot.dtd"
PUBLIC "ISO/IEC 10179:1996//DTD DSSSL Architecture//EN" "c:\programfiler\apps\jade\dsssl.dtd"
PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN" "c:\programfiler\apps\jade\style-sheet.dtd"

-- HTML 2 --

PUBLIC  "-//IETF//DTD HTML//EN"                           html2.dtd
PUBLIC  "-//IETF//DTD HTML 2.0//EN"                       html2.dtd
</PRE>
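
<P>
The format described above can be read with very little code. The
following is a minimal sketch, not xmlproc's own parser: the names
TOKEN, ARITY, tokenize_catalog and parse_entries are made up for this
illustration, and the argument counts are assumed from the keywords
xmlproc supports.
</P>

```python
import re
from itertools import islice

# A token is a comment (-- ... --), a quoted string, or a bare word.
TOKEN = re.compile(r'(--.*?--)|"([^"]*)"|\'([^\']*)\'|(\S+)', re.DOTALL)

# Number of arguments each keyword takes (assumed for this sketch).
ARITY = {"PUBLIC": 2, "SYSTEM": 2, "DELEGATE": 2,
         "DOCUMENT": 1, "CATALOG": 1, "BASE": 1}


def tokenize_catalog(text):
    """Return keywords and arguments in order, dropping comments and quotes."""
    tokens = []
    for match in TOKEN.finditer(text):
        if match.lastindex == 1:
            continue  # group 1 is a comment
        # Groups 2 and 3 hold quoted text without the quotes, group 4 a bare word.
        tokens.append(match.group(match.lastindex))
    return tokens


def parse_entries(text):
    """Group tokens into (keyword, arg, ...) tuples; unknown keywords are skipped."""
    tokens = iter(tokenize_catalog(text))
    entries = []
    for keyword in tokens:
        count = ARITY.get(keyword)
        if count is None:
            continue
        args = list(islice(tokens, count))
        if len(args) != count:
            break  # the file ended in the middle of an entry
        entries.append(tuple([keyword] + args))
    return entries
```

<P>
Calling parse_entries on the example catalog above would yield one
tuple per PUBLIC entry, with the quotes removed and both comments
dropped.
</P>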

<H2><A NAME="catsupp">Catalog file support in xmlproc</A></H2>

<H3>Level of support</H3>

<P>
The support for catalog files has not been thoroughly tested, and
xmlproc probably does not handle conflicts between entries correctly.
This part of xmlproc should be considered to be of demonstration
quality.
</P>

<P>
xmlproc supports the following keywords:
</P>

<DL>
  <DT><PRE>PUBLIC <I>pubid sysid</I></PRE>
  <DD>Specifies that the pubid should be mapped to sysid whenever it
  occurs.
  <DT><PRE>SYSTEM <I>sysid1 sysid2</I></PRE>
  <DD>Specifies that whenever sysid1 appears as the explicit system
  identifier sysid2 should be used instead.
  <DT><PRE>DOCUMENT <I>sysid</I></PRE>
  <DD>Specifies that if no document entity is supplied to the parser,
  this document should be parsed.
  <DT><PRE>CATALOG <I>sysid</I></PRE>
  <DD>Includes the catalog file at sysid.
  <DT><PRE>BASE <I>sysid</I></PRE>
  <DD>Uses sysid as the base system identifier to resolve relative
  system identifiers against below this point.
  <DT><PRE>DELEGATE <I>pubid-prefix sysid</I></PRE>
  <DD>Resolves public identifiers that begin with pubid-prefix with
  the catalog file at sysid.
</DL>
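
<P>
To make the meaning of these keywords concrete, here is a small
in-memory sketch. TinyCatalog is a hypothetical stand-in written for
this page, not xmlproc's implementation. It assumes that a PUBLIC match
is tried before a SYSTEM remapping, which is a simplification, and it
only reports which catalog a DELEGATE entry points to instead of
loading it.
</P>

```python
from urllib.parse import urljoin


class TinyCatalog:
    """Hypothetical stand-in that stores catalog entries in plain dicts."""

    def __init__(self):
        self.base = ""        # set by BASE, applies to later entries only
        self.public = {}      # pubid -> sysid
        self.system = {}      # sysid1 -> sysid2
        self.delegates = []   # (pubid prefix, catalog sysid)

    def add(self, keyword, *args):
        """Record one entry, resolving relative sysids against the current base."""
        if keyword == "BASE":
            self.base = args[0]
        elif keyword == "PUBLIC":
            self.public[args[0]] = urljoin(self.base, args[1])
        elif keyword == "SYSTEM":
            self.system[args[0]] = urljoin(self.base, args[1])
        elif keyword == "DELEGATE":
            self.delegates.append((args[0], urljoin(self.base, args[1])))

    def resolve(self, pubid, sysid):
        """Return the sysid to load; pubid may be None."""
        if pubid in self.public:
            return self.public[pubid]
        # No PUBLIC match: apply a SYSTEM remapping if one exists.
        return self.system.get(sysid, sysid)

    def delegate_for(self, pubid):
        """Return the catalog responsible for pubid, or None."""
        for prefix, catalog_sysid in self.delegates:
            if pubid.startswith(prefix):
                return catalog_sysid
        return None
```

<P>
Because BASE only affects the entries below it, the order in which
add is called matters in the same way that entry order matters in a
catalog file.
</P>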

<H3>How to make xmlproc use a catalog file</H3>

<P>
This is easily done. Here is some code that parses the catalog file
referred to by the XMLSOCATALOG environment variable:
</P>

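<PRE><CODE>
import os
from xml.parsers.xmlproc import xmlval, catalog

# Create a validating parser.
p = xmlval.XMLValidator()

# Parse the catalog file named by the XMLSOCATALOG environment variable.
cat = catalog.xmlproc_catalog(os.environ["XMLSOCATALOG"],
                              catalog.CatParserFactory())
p.set_pubid_resolver(cat)

# sysid must hold the system identifier of the document to parse.
p.parse_resource(sysid)
</CODE></PRE>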

<H2><A NAME="catuse">Using the catalog file parser</A></H2>

<P>
The xmlproc implementation contains both a general catalog file parser
and a general catalog file implementation, to which the xmlproc
PubIdResolver is just one of many possible clients. This means that
you can use this catalog file parser in your own applications.
</P>

<P>
If you just want to make xmlproc use a catalog file you should look at
the <A HREF="#xmlproc_catalog">xmlproc_catalog</A> class.
</P>

<P>
The catalog module has the following classes and interfaces:
</P>

<UL>
  <LI><A HREF="#CatalogParser">CatalogParser</A>
  <LI><A HREF="#CatalogApp">CatalogApp</A>
  <LI><A HREF="#CatalogManager">CatalogManager</A>
  <LI><A HREF="#CatParserFactory">CatParserFactory</A>
  <LI><A HREF="#xmlproc_catalog">xmlproc_catalog</A>
  <LI><A HREF="#SAX_catalog">SAX_catalog</A>
</UL>

<H3><A NAME="CatalogParser">The CatalogParser class</A></H3>

<P>
The CatalogParser class is mainly useful if you want to develop your
own catalog file support completely from scratch. It only parses the
file and passes information to you, without doing anything with it.
If you just want to query the parsed information you should probably
look at the catalog manager below.
</P>

<P>
The CatalogParser class has these methods:
</P>

<DL>
  <DT><CODE>def __init__(self,error_lang=None):</CODE>
  <DD>This creates a parser ready for parsing. The error language can be set if desired, and accepts
             the same values as xmlproc itself.

  <DT><CODE>def set_application(self,app):</CODE>
  <DD>This tells the parser where to send parse events. The
      application object must conform to the
      <A HREF="#CatalogApp">CatalogApp</A> interface.
      
  <DT><CODE>def set_error_handler(self,err):</CODE>
  <DD>This tells the parser where to send error events. The
      error handler must conform to the usual ErrorHandler interface.

  <DT><CODE>def parse_resource(self,sysid):</CODE>
  <DD>Parses the catalog file with the given system identifier,
      passing error and data events.
</DL>

<H3><A NAME="CatalogApp">The CatalogApp interface</A></H3>

<P>
This is the definition of the interface used by applications that wish
to receive catalog file parsing events. No attempt is made to
interpret the entries or their parameters in any way. These methods
are required:
</P>

<DL>
  <DT><CODE>def handle_public(self,pubid,sysid):</CODE>
  <DD>This notifies the application of a PUBLIC entry in the catalog file.

  <DT><CODE>def handle_delegate(self,prefix,sysid):</CODE>
  <DD>This notifies the application of a DELEGATE entry in the catalog
  file.

  <DT><CODE>def handle_document(self,sysid):</CODE>
  <DD>This notifies the application of a DOCUMENT entry in the catalog
  file.

  <DT><CODE>def handle_system(self,sysid1,sysid2):</CODE>
  <DD>This notifies the application of a SYSTEM entry in the catalog
  file. 

  <DT><CODE>def handle_base(self,sysid):</CODE>
  <DD>This notifies the application of a BASE entry in the catalog
  file. 

  <DT><CODE>def handle_catalog(self,sysid):</CODE>
  <DD>This notifies the application of a CATALOG entry in the catalog
  file. 
</DL>
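
<P>
As an illustration, an application that simply records every entry it
is told about could look like the sketch below. The class name
RecordingCatalogApp and its entries attribute are invented for this
example; only the handler method names come from the interface above.
</P>

```python
class RecordingCatalogApp:
    """Collects catalog entries as tuples, in the order they are reported."""

    def __init__(self):
        self.entries = []

    def handle_public(self, pubid, sysid):
        self.entries.append(("PUBLIC", pubid, sysid))

    def handle_delegate(self, prefix, sysid):
        self.entries.append(("DELEGATE", prefix, sysid))

    def handle_document(self, sysid):
        self.entries.append(("DOCUMENT", sysid))

    def handle_system(self, sysid1, sysid2):
        self.entries.append(("SYSTEM", sysid1, sysid2))

    def handle_base(self, sysid):
        self.entries.append(("BASE", sysid))

    def handle_catalog(self, sysid):
        self.entries.append(("CATALOG", sysid))
```

<P>
An instance of such a class would be handed to a CatalogParser with
set_application before parse_resource is called.
</P>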

<H3><A NAME="CatalogManager">The CatalogManager class</A></H3>

<P>
The CatalogManager is a central class in the catalog implementation.
Users that want to work with catalog files should instantiate a
CatalogManager and let it parse and keep track of the catalog
information for them, and only query it when information is needed.
</P>

<P>
The CatalogManager class has these methods:
</P>

<DL>
  <DT><CODE>def __init__(self):</CODE>
  <DD>This creates an empty CatalogManager, ready for use.

  <DT><CODE>def set_error_handler(self,err):</CODE>
  <DD>This tells the CatalogManager where to send error messages from
  parsing.

  <DT><CODE>def set_parser_factory(self,parser_fact):</CODE>
  <DD>This gives the CatalogManager an object it can use to create
  catalog parsers. The parser_fact object must conform to the
  <A HREF="#CatParserFactory">CatParserFactory</A> interface.

  <DT><CODE>def parse_catalog(self,sysid):</CODE>
  <DD>Makes the CatalogManager parse the given catalog file and store
  the information in it internally.

  <DT><CODE>def report(self,out=sys.stdout):</CODE>
  <DD>Makes the CatalogManager write a badly formatted report of its
  internal information to the out file object.

  <DT><CODE>def get_document_sysid(self):</CODE>
  <DD>Returns the contents of the DOCUMENT entry in the catalog file.

  <DT><CODE>def remap_sysid(self,sysid):</CODE>
  <DD>Returns the system identifier after remapping it according to
  the SYSTEM entries in the catalog file. (This should only be used
  for system identifiers that occur alone, without an accompanying
  public identifier.)

  <DT><CODE>def resolve_sysid(self,pubid,sysid):</CODE>
  <DD>Returns the correct system identifier for this combination of
  system and public identifiers. If there was no public identifier the
  pubid parameter should be None.

  <DT><CODE>def get_public_ids(self):</CODE>
  <DD>Returns a list of all public identifiers declared in this catalog
  and its delegates.
</DL>

<H3><A NAME="CatParserFactory">The CatParserFactory interface</A></H3>

<P>
This class is used by the CatalogManager to create catalog parsers for
parsing catalog files. It is mainly interesting if you want to control
which parser the CatalogManager uses for parsing its catalog files,
such as if you want to use your own subclass of CatalogParser instead
of the usual class.
</P>

<P>
The CatParserFactory has these methods:
</P>

<DL>
  <DT><CODE>def make_parser(self,sysid):</CODE>
  <DD>This method must return an object conforming to the
  CatalogParser interface.
</DL>

<H3><A NAME="xmlproc_catalog">The xmlproc_catalog class</A></H3>

<P>
This class is a client to the CatalogManager that conforms to the
PubIdResolver interface, and so can be used to make xmlproc use a
catalog file. The xmlproc_catalog class has these methods:
</P>

<DL>
  <DT><CODE>def __init__(self,sysid,pf,error_handler=None):</CODE>
  <DD><P>Creates an xmlproc_catalog object, ready to be given to the
  xmlproc parser with the set_pubid_resolver method. The sysid
  parameter holds the system identifier of the catalog file to use and
  the pf parameter holds the
  <A HREF="#CatParserFactory">CatParserFactory</A> used to create
  catalog file parsers.

  <P>The error_handler can be a reference to an error handler which
  can receive notification of errors.
</DL>

<H3><A NAME="SAX_catalog">The SAX_catalog class</A></H3>

<P>
This class is a client to the CatalogManager that conforms to the SAX
EntityResolver interface, and so can be used to make a SAX parser use a
catalog file for resolving entity public identifiers. The
SAX_catalog class has these methods:
</P>

<DL>
  <DT><CODE>def __init__(self,sysid,pf):</CODE>
  <DD>Creates a SAX_catalog object, ready to be given to the
  SAX parser with the setEntityResolver method. The sysid
  parameter holds the system identifier of the catalog file to use and
  the pf parameter holds the
  <A HREF="#CatParserFactory">CatParserFactory</A> used to create
  catalog file parsers.
</DL>
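
<P>
The idea of resolving public identifiers through a SAX entity resolver
can be sketched with the xml.sax package from the Python standard
library. This is not the SAX_catalog class: DictEntityResolver is a
hypothetical resolver backed by a plain dictionary rather than a
catalog file, and the mapping shown is the example from the start of
this page.
</P>

```python
import xml.sax
from xml.sax.handler import EntityResolver


class DictEntityResolver(EntityResolver):
    """Hypothetical resolver that looks public identifiers up in a dict."""

    def __init__(self, public_map):
        self.public_map = public_map

    def resolveEntity(self, publicId, systemId):
        # Return the mapped system identifier, or the given one unchanged.
        return self.public_map.get(publicId, systemId)


resolver = DictEntityResolver({
    "-//W3C//DTD HTML 4.0 Transitional//EN":
        "file:///usr/pub/sgml/dtds/html40.dtd",
})

# Register the resolver with a SAX parser before parsing a document.
parser = xml.sax.make_parser()
parser.setEntityResolver(resolver)
```

<P>
Whether and when a given SAX driver consults the resolver depends on
the driver and its feature settings.
</P>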

<H2><A NAME="xcat">Support for XCatalog 0.1</A></H2>

<P>
Just before xmlproc 0.50 was released, John Cowan proposed the XCatalog
0.1 standard for catalog files in XML format. This proposal has an XML
DTD which can be used to mark up catalog files instead of the special
syntax used by SGML Open Catalogs. The XCatalog DTD only has a subset
of the catalog file functionality implemented by xmlproc for SGML Open
Catalogs.
</P>

<P>
The xmlproc XCatalog implementation is found in the xcatalog module
and consists of three classes:
</P>

<UL>
  <LI>XCatalogParser: a CatalogParser that parses XCatalogs instead of
  SGML Open Catalogs.
  <LI>XCatParserFactory: a CatParserFactory that always creates
  XCatalogParser objects.
  <LI>FancyParserFactory: a CatParserFactory that creates
  XCatalogParsers for catalog files with system identifiers ending in
  ".xml", and CatalogParsers for all other catalog files.
</UL>
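
<P>
The selection rule described for FancyParserFactory can be sketched as
follows. The two parser classes here are empty placeholders standing in
for CatalogParser and XCatalogParser, and ExtensionParserFactory is a
name invented for this example. Only the make_parser method and the
".xml" rule are taken from this page.
</P>

```python
class SgmlOpenParserStandIn:
    """Placeholder for a parser of SGML Open Catalogs."""


class XCatalogParserStandIn:
    """Placeholder for a parser of XCatalog files."""


class ExtensionParserFactory:
    """Chooses a parser from the ending of the catalog's system identifier."""

    def make_parser(self, sysid):
        # Catalogs ending in ".xml" are treated as XCatalogs,
        # everything else as SGML Open Catalogs.
        if sysid.endswith(".xml"):
            return XCatalogParserStandIn()
        return SgmlOpenParserStandIn()
```

<P>
A factory of this shape is what set_parser_factory on the
CatalogManager expects to receive.
</P>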

<P>
The support for XCatalog should be considered an experimental feature.
</P>


<HR>

<ADDRESS>
Last update 2000-05-11 14:20, by 
<a href="mailto:larsga@garshol.priv.no">Lars Marius Garshol</a>.
</ADDRESS>


</BODY>
</HTML>