1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292
|
Whoa, Pilgrim!
Make sure that the standalone version of XML::SAX::Base is
what you want.
The standalone Base distribution is only useful for cases
where you will be generating event streams from non-XML
data (for example, extracting data from an RDBMS and
translating it into XML).
If you plan to plug XML parsers into this
class to generate SAX events then you certainly want the
full XML::SAX distribution. Go on, you'll be much happier
there, trust me.
INSTALLATION:
via tarball:
% tar -zxvf XML-SAX-Base-xxx.tar.gz
% perl Makefile.PL
% make && make test
% make install
via CPAN shell:
% perl -MCPAN -e shell
% install XML::SAX::Base
The rest of this file consists of the fine XML::SAX::Base documentation
contributed by Robin Berjon.
Enjoy!
=head1 NAME
XML::SAX::Base - Base class SAX Drivers and Filters
=head1 SYNOPSIS
package MyFilter;
use XML::SAX::Base;
@ISA = ('XML::SAX::Base');
=head1 DESCRIPTION
This module has a very simple task - to be a base class for PerlSAX
drivers and filters. It's default behaviour is to pass the input directly
to the output unchanged. It can be useful to use this module as a base class
so you don't have to, for example, implement the characters() callback.
The main advantages that it provides are easy dispatching of events the right
way (ie it takes care for you of checking that the handler has implemented
that method, or has defined an AUTOLOAD), and the guarantee that filters
will pass along events that they aren't implementing to handlers downstream
that might nevertheless be interested in them.
=head1 WRITING SAX DRIVERS AND FILTERS
Writing SAX Filters is tremendously easy: all you need to do is
inherit from this module, and define the events you want to handle. A
more detailed explanation can be found at
http://www.xml.com/pub/a/2001/10/10/sax-filters.html.
Writing Drivers is equally simple. The one thing you need to pay
attention to is B<NOT> to call events yourself (this applies to Filters
as well). For instance:
package MyFilter;
use base qw(XML::SAX::Base);
sub start_element {
my $self = shift;
my $data = shift;
# do something
$self->{Handler}->start_element($data); # BAD
}
The above example works well as precisely that: an example. But it has
several faults: 1) it doesn't test to see whether the handler defines
start_element. Perhaps it doesn't want to see that event, in which
case you shouldn't throw it (otherwise it'll die). 2) it doesn't check
ContentHandler and then Handler (ie it doesn't look to see that the
user hasn't requested events on a specific handler, and if not on the
default one), 3) if it did check all that, not only would the code be
cumbersome (see this module's source to get an idea) but it would also
probably have to check for a DocumentHandler (in case this were SAX1)
and for AUTOLOADs potentially defined in all these packages. As you can
tell, that would be fairly painful. Instead of going through that,
simply remember to use code similar to the following instead:
package MyFilter;
use base qw(XML::SAX::Base);
sub start_element {
my $self = shift;
my $data = shift;
# do something to filter
$self->SUPER::start_element($data); # GOOD (and easy) !
}
This way, once you've done your job you hand the ball back to
XML::SAX::Base and it takes care of all those problems for you!
Note that the above example doesn't apply to filters only, drivers
will benefit from the exact same feature.
=head1 METHODS
A number of methods are defined within this class for the purpose of
inheritance. Some probably don't need to be overridden (eg parse_file)
but some clearly should be (eg parse). Options for these methods are
described in the PerlSAX2 specification available from
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/~checkout~/perl-xml/libxml-perl/doc/sax-2.0.html?rev=HEAD&content-type=text/html.
=over 4
=item * parse
The parse method is the main entry point to parsing documents. Internally
the parse method will detect what type of "thing" you are parsing, and
call the appropriate method in your implementation class. Here is the
mapping table of what is in the Source options (see the Perl SAX 2.0
specification for the meaning of these values):
Source Contains parse() calls
=============== =============
CharacterStream (*) _parse_characterstream($stream, $options)
ByteStream _parse_bytestream($stream, $options)
String _parse_string($string, $options)
SystemId _parse_systemid($string, $options)
However note that these methods may not be sensible if your driver class
is not for parsing XML. An example might be a DBI driver that generates
XML/SAX from a database table. If that is the case, you likely want to
write your own parse() method.
Also note that the Source may contain both a PublicId entry, and an
Encoding entry. To get at these, examine $options->{Source} as passed
to your method.
(*) A CharacterStream is a filehandle that does not need any encoding
translation done on it. This is implemented as a regular filehandle
and only works under Perl 5.7.2 or higher using PerlIO. To get a single
character, or number of characters from it, use the perl core read()
function. To get a single byte from it (or number of bytes), you can
use sysread(). The encoding of the stream should be in the Encoding
entry for the Source.
=item * parse_file, parse_uri, parse_string
These are all convenience variations on parse(), and in fact simply
set up the options before calling it. You probably don't need to
override these.
=item * get_options
This is a convenience method to get options in SAX2 style, or more
generically either as hashes or as hashrefs (it returns a hashref).
You will probably want to use this method in your own implementations
of parse() and of new().
=item * get_feature, set_feature
These simply get and set features, and throw the
appropriate exceptions defined in the specification if need be.
If your subclass defines features not defined in this one,
then you should override these methods in such a way that they check for
your features first, and then call the base class's methods
for features not defined by your class. An example would be:
sub get_feature {
my $self = shift;
my $feat = shift;
if (exists $MY_FEATURES{$feat}) {
# handle the feature in various ways
}
else {
return $self->SUPER::get_feature($feat);
}
}
Currently this part is unimplemented.
=back
It would be rather useless to describe all the methods that this
module implements here. They are all the methods supported in SAX1 and
SAX2. In case your memory is a little short, here is a list. The
apparent duplicates are there so that both versions of SAX can be
supported.
=over 4
=item * start_document
=item * end_document
=item * start_element
=item * start_document
=item * end_document
=item * start_element
=item * end_element
=item * characters
=item * processing_instruction
=item * ignorable_whitespace
=item * set_document_locator
=item * start_prefix_mapping
=item * end_prefix_mapping
=item * skipped_entity
=item * start_cdata
=item * end_cdata
=item * comment
=item * entity_reference
=item * notation_decl
=item * unparsed_entity_decl
=item * element_decl
=item * attlist_decl
=item * doctype_decl
=item * xml_decl
=item * entity_decl
=item * attribute_decl
=item * internal_entity_decl
=item * external_entity_decl
=item * resolve_entity
=item * start_dtd
=item * end_dtd
=item * start_entity
=item * end_entity
=item * warning
=item * error
=item * fatal_error
=back
=head1 TODO
- more tests
- conform to the "SAX Filters" and "Java and DOM compatibility"
sections of the SAX2 document.
=head1 AUTHOR
Kip Hampton (khampton@totalcinema.com) did most of the work, after porting
it from XML::Filter::Base.
Robin Berjon (robin@knowscape.com) pitched in with patches to make it
usable as a base for drivers as well as filters, along with other patches.
Matt Sergeant (matt@sergeant.org) wrote the original XML::Filter::Base,
and patched a few things here and there, and imported it into
the XML::SAX distribution.
=head1 SEE ALSO
L<XML::SAX>
=cut
|