package XML::Validator::Schema;
use 5.006;
use strict;
use warnings;
our $VERSION = '1.10';
=head1 NAME
XML::Validator::Schema - validate XML against a subset of W3C XML Schema
=head1 SYNOPSIS

  use XML::SAX::ParserFactory;
  use XML::Validator::Schema;

  #
  # create a new validator object, using foo.xsd
  #
  $validator = XML::Validator::Schema->new(file => 'foo.xsd');

  #
  # create a SAX parser and assign the validator as a Handler
  #
  $parser = XML::SAX::ParserFactory->parser(Handler => $validator);

  #
  # validate foo.xml against foo.xsd
  #
  eval { $parser->parse_uri('foo.xml') };
  die "File failed validation: $@" if $@;

=head1 DESCRIPTION
This module allows you to validate XML documents against a W3C XML
Schema. This module does not implement the full W3C XML Schema
recommendation (http://www.w3.org/XML/Schema), but a useful subset.
See the L<SCHEMA SUPPORT|"SCHEMA SUPPORT"> section below.
B<IMPORTANT NOTE>: To get line and column numbers in the error
messages generated by this module you must install
L<XML::Filter::ExceptionLocator|XML::Filter::ExceptionLocator> and use
L<XML::SAX::ExpatXS|XML::SAX::ExpatXS> as your SAX parser. This
module is much more useful if you can tell where your errors are, so
using these modules is highly recommended!
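
The parser choice described above can be made explicit. The following is a minimal sketch, assuming XML::SAX::ExpatXS is installed; it relies on the C<$XML::SAX::ParserPackage> variable documented by XML::SAX::ParserFactory. The helper name C<validate_with_locations> is made up for this example. XML::Filter::ExceptionLocator is not loaded here because C<new> picks it up on its own when it is available.

```perl
use strict;
use warnings;

# Hypothetical helper: validate $xml against $xsd with XML::SAX::ExpatXS
# as the SAX parser. Returns '' on success, or the error text on failure
# (a missing module is reported the same way as a validation error).
sub validate_with_locations {
    my ($xsd, $xml) = @_;

    my $ok = eval {
        require XML::SAX::ParserFactory;
        require XML::SAX::ExpatXS;
        require XML::Validator::Schema;

        # Ask the factory for ExpatXS instead of its default parser.
        no warnings 'once';
        local $XML::SAX::ParserPackage = 'XML::SAX::ExpatXS';

        my $validator = XML::Validator::Schema->new(file => $xsd);
        my $parser    = XML::SAX::ParserFactory->parser(Handler => $validator);
        $parser->parse_uri($xml);
        1;
    };
    return $ok ? '' : ("$@" || 'unknown error');
}
```

Calling C<validate_with_locations('foo.xsd', 'foo.xml')> returns an empty string for a valid document and the error message otherwise.
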
=head1 INTERFACE
=over 4
=item *
C<< XML::Validator::Schema->new(file => 'file.xsd', cache => 1) >>
Call this method to create a new XML::Validator::Schema object. The
only required option is C<file>, which must provide a path to an XML
Schema document.
Setting the optional C<cache> parameter to 1 causes
XML::Validator::Schema to keep a copy of the schema parse tree in
memory. The tree will be reused on subsequent calls with the same
C<file> parameter, as long as the mtime on the schema file hasn't
changed. This can save a lot of time if you're validating many
documents against a single schema.
Since XML::Validator::Schema is a SAX filter you will normally pass
this object to a SAX parser:

  $validator = XML::Validator::Schema->new(file => 'foo.xsd');
  $parser = XML::SAX::ParserFactory->parser(Handler => $validator);

Then you can proceed to validate files using the parser:

  eval { $parser->parse_uri('foo.xml') };
  die "File failed validation: $@" if $@;

Setting the optional C<debug> parameter to 1 causes
XML::Validator::Schema to print each element and its attributes to
STDERR while parsing and validating the XML document. This provides
useful information on the position where the validation failed
(although not as useful as the line and column numbers included when
XML::Filter::ExceptionLocator and XML::SAX::ExpatXS are used).
=back
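
When many documents share one schema, the C<cache> option described above avoids re-parsing the schema on every call to C<new>. A minimal sketch follows; C<validate_all> is a hypothetical helper, and building a fresh validator for each document is an assumption about safe usage rather than a documented requirement.

```perl
use strict;
use warnings;

# Hypothetical helper: validate each file in @xml_files against $xsd.
# Returns a hash reference mapping file name => error text ('' if valid).
# Dies if the modules are missing or the schema itself cannot be loaded.
sub validate_all {
    my ($xsd, @xml_files) = @_;
    require XML::SAX::ParserFactory;
    require XML::Validator::Schema;

    my %result;
    for my $xml (@xml_files) {
        # With cache => 1 the schema is parsed on the first call only,
        # provided the schema file's mtime has not changed since.
        my $validator = XML::Validator::Schema->new(file => $xsd, cache => 1);
        my $parser    = XML::SAX::ParserFactory->parser(Handler => $validator);

        $result{$xml} = eval { $parser->parse_uri($xml); 1 } ? '' : "$@";
    }
    return \%result;
}
```
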
=head1 RATIONALE
I'm writing a piece of software which uses Xerces/C++
( http://xml.apache.org/xerces-c/ ) to validate documents against XML
Schema schemas. This works very well, but I'd like to release my
project to the world. Requiring users to install Xerces is simply too
onerous a requirement; few will have it already and the Xerces
installation system leaves much to be desired.
On CPAN, the only available XML Schema validator is XML::Schema.
Unfortunately, this module isn't ready for use as it lacks the ability
to actually parse the XML Schema document format! I looked into
enhancing XML::Schema but I must admit that I'm not smart enough to
understand the code... One day, when XML::Schema is completed I will
replace this module with a wrapper around it.
This module represents my attempt to support enough XML Schema syntax
to be useful without attempting to tackle the full standard. I'm sure
this will mean that it can't be used in all situations, but hopefully
that won't prevent it from being used at all.
=head1 SCHEMA SUPPORT
=head2 Supported Elements
The following elements are supported by the XML Schema parser. If you
don't see an element or an attribute here then you definitely can't
use it in a schema document.
You can expect that the schema document parser will produce an error
if you include elements which are not supported. However, unsupported
attributes I<may> be silently ignored. This should not be
misconstrued as a feature and will eventually be fixed.
All of these elements must be in the http://www.w3.org/2001/XMLSchema
namespace, either using a default namespace or a prefix.

  <schema>
      Supported attributes: targetNamespace, elementFormDefault,
      attributeFormDefault
      Notes: the only supported values for elementFormDefault and
      attributeFormDefault are "unqualified." As such, targetNamespace
      is essentially ignored.

  <element name="foo">
      Supported attributes: name, type, minOccurs, maxOccurs, ref

  <attribute>
      Supported attributes: name, type, use, ref

  <sequence>
      Supported attributes: minOccurs, maxOccurs

  <choice>
      Supported attributes: minOccurs, maxOccurs

  <all>
      Supported attributes: minOccurs, maxOccurs

  <complexType>
      Supported attributes: name

  <simpleContent>
      The only supported sub-element is <extension>.

  <extension>
      Supported attributes: base
      Notes: only allowed inside <simpleContent>

  <simpleType>
      Supported attributes: name

  <restriction>
      Supported attributes: base
      Notes: only allowed inside <simpleType>

  <whiteSpace>
      Supported attributes: value

  <pattern>
      Supported attributes: value

  <enumeration>
      Supported attributes: value

  <length>
      Supported attributes: value

  <minLength>
      Supported attributes: value

  <maxLength>
      Supported attributes: value

  <minInclusive>
      Supported attributes: value

  <minExclusive>
      Supported attributes: value

  <maxInclusive>
      Supported attributes: value

  <maxExclusive>
      Supported attributes: value

  <totalDigits>
      Supported attributes: value

  <fractionDigits>
      Supported attributes: value

  <annotation>

  <documentation>
      Supported attributes: name

  <union>
      Supported attributes: memberTypes

=head2 Simple Type Support
Supported built-in types are:

  string
  normalizedString
  token
  NMTOKEN
      Notes: the spec says NMTOKEN should only be used for attributes,
      but this rule is not enforced.
  boolean
  decimal
      Notes: the enumeration facet is not supported on decimal or any
      types derived from decimal.
  integer
  int
  short
  byte
  unsignedInt
  unsignedShort
  unsignedByte
  positiveInteger
  negativeInteger
  nonPositiveInteger
  nonNegativeInteger
  dateTime
      Notes: Although dateTime correctly validates the lexical format it
      does not offer comparison facets (min*, max*, enumeration).
  double
      Notes: Although double correctly validates the lexical format it
      does not offer comparison facets (min*, max*, enumeration). Also,
      minimum and maximum constraints as described in the spec are not
      checked.
  float
      Notes: The restrictions on double support apply to float as well.
  duration
  time
  date
  gYearMonth
  gYear
  gMonthDay
  gDay
  gMonth
  hexBinary
  base64Binary
  anyURI
  QName
  NOTATION

=head2 Miscellaneous Details
Other known deviations from the specification:
=over
=item *
Patterns specified in pattern simpleType restrictions are Perl regexes
with none of the XML Schema extensions available.
=item *
No effort is made to prevent the declaration of facets which "loosen"
the restrictions on a type. This is a bug and will be fixed in a
future release. Until then types which attempt to loosen restrictions
on their base class will behave unpredictably.
=item *
No attempt has been made to exclude content models which are
ambiguous, as the spec demands. In fact, I don't see any compelling
reason to do so, aside from strict compliance with the spec. The
content model implementation uses regular expressions which should be
able to handle loads of ambiguity without significant performance
problems.
=item *
Marking a facet "fixed" has no effect.
=item *
SimpleTypes must come after their base types in the schema body. For
example, this is ok:

    <xs:simpleType name="foo">
        <xs:restriction base="xs:string">
            <xs:minLength value="10"/>
        </xs:restriction>
    </xs:simpleType>
    <xs:simpleType name="foo_bar">
        <xs:restriction base="foo">
            <xs:length value="10"/>
        </xs:restriction>
    </xs:simpleType>

But this is not:

    <xs:simpleType name="foo_bar">
        <xs:restriction base="foo">
            <xs:length value="10"/>
        </xs:restriction>
    </xs:simpleType>
    <xs:simpleType name="foo">
        <xs:restriction base="xs:string">
            <xs:minLength value="10"/>
        </xs:restriction>
    </xs:simpleType>

=back
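
The ordering rule for simpleTypes can be checked before a schema is handed to this module. A minimal sketch in plain Perl follows. It assumes the type declarations have already been extracted into a list of C<[name, base]> pairs in document order and that built-in types carry the C<xs:> prefix; C<first_out_of_order> is a hypothetical helper, not part of this module.

```perl
use strict;
use warnings;

# Return the name of the first type whose base has not been declared
# yet, or undef if every base is either built-in or declared earlier.
sub first_out_of_order {
    my @types = @_;    # each entry: [ $name, $base ]
    my %seen;
    for my $type (@types) {
        my ($name, $base) = @$type;
        return $name unless $base =~ /^xs:/ || $seen{$base};
        $seen{$name} = 1;
    }
    return;
}

# The two orderings from the simpleType examples in this section.
my @ok  = ( [ foo => 'xs:string' ], [ foo_bar => 'foo' ] );
my @bad = ( [ foo_bar => 'foo' ], [ foo => 'xs:string' ] );

print defined first_out_of_order(@ok)  ? "bad\n" : "ok\n";   # prints "ok"
print defined first_out_of_order(@bad) ? "bad\n" : "ok\n";   # prints "bad"
```
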
=head1 CAVEATS
Here are a few gotchas that you should know about:
=over
=item *
No Unicode testing has been performed, although it seems possible that
the module will handle Unicode data correctly.
=item *
Namespace processing is almost entirely missing from the module.
=item *
Little work has been done to ensure that invalid schemas fail
gracefully. Until that is done you may want to develop your schemas
using a more mature validator (like Xerces or XML Spy) before using
them with this module.
=back
=head1 BUGS
Please use C<rt.cpan.org> to report bugs in this module:
http://rt.cpan.org
Please note that I will delete bugs which merely point out the lack of
support for a particular feature of XML Schema. Those are feature
requests, and believe me, I know we've got a long way to go.
=head1 SUPPORT
This module is supported on the perl-xml mailing-list. Please join
the list if you have questions, suggestions or patches:
http://listserv.activestate.com/mailman/listinfo/perl-xml
=head1 CVS
If you'd like to help develop XML::Validator::Schema you'll want to
check out a copy of the CVS tree:
http://sourceforge.net/cvs/?group_id=89764
=head1 CREDITS
The following people have contributed bug reports, test cases and/or
code:
Russell B Cecala (aka Plankton)
David Wheeler
Toby Long-Leather
Mathieu
h.bridge@fasol.fujitsu.com
michael.jacob@schering.de
josef@clubphoto.com
adamk@ali.as
Jean Flouret
=head1 AUTHOR
Sam Tregar <sam@tregar.com>
=head1 COPYRIGHT AND LICENSE
Copyright (C) 2002-2003 Sam Tregar
This program is free software; you can redistribute it and/or modify
it under the same terms as Perl 5 itself.
=head1 A NOTE ON DEVELOPMENT METHODOLOGY
This module isn't just an XML Schema validator, it's also a test of
the Test Driven Development methodology. I've been writing tests
while I develop code for a while now, but TDD goes further by
requiring tests to be written I<before> code. One consequence of this
is that the module code may seem naive; it really is I<just enough>
code to pass the current test suite. If I'm doing it right then there
shouldn't be a single line of code that isn't directly related to
passing a test. As I add functionality (by way of writing tests) I'll
refactor the code a great deal, but I won't add code only to support
future development.
For more information I recommend "Test Driven Development: By Example"
by Kent Beck.
=head1 SEE ALSO
L<XML::Schema>
http://www.w3.org/XML/Schema
http://xml.apache.org/xerces-c/
=cut
use base qw(XML::SAX::Base); # this module is a SAX filter
use Carp qw(croak); # make some noise
use XML::SAX::Exception; # for real
use XML::Filter::BufferText; # keep text together
use XML::SAX::ParserFactory; # needed to parse the schema documents
use XML::Validator::Schema::Parser;
use XML::Validator::Schema::ElementNode;
use XML::Validator::Schema::ElementRefNode;
use XML::Validator::Schema::RootNode;
use XML::Validator::Schema::ComplexTypeNode;
use XML::Validator::Schema::SimpleTypeNode;
use XML::Validator::Schema::SimpleType;
use XML::Validator::Schema::TypeLibrary;
use XML::Validator::Schema::ElementLibrary;
use XML::Validator::Schema::AttributeLibrary;
use XML::Validator::Schema::ModelNode;
use XML::Validator::Schema::Attribute;
use XML::Validator::Schema::AttributeNode;
use XML::Validator::Schema::Util qw(_err);
our %CACHE;
our $DEBUG = 0;
# create a new validation filter
sub new {
    my $pkg = shift;
    my $opt = (@_ == 1) ? { %{shift()} } : {@_};
    my $self = bless $opt, $pkg;

    $self->{debug} = exists $self->{debug} ? $self->{debug} : $DEBUG;

    # check options
    croak("Missing required 'file' option.") unless $self->{file};

    # if caching is on, check the cache
    if ($self->{cache} and
        exists $CACHE{$self->{file}} and
        $CACHE{$self->{file}}{mtime} == (stat($self->{file}))[9]) {

        # load cached object
        $self->{node_stack} = $CACHE{$self->{file}}{node_stack};

        # might have nodes on it leftover from failed validation,
        # truncate to root
        $#{$self->{node_stack}} = 0;

        # clean up any lingering state from the last use of this tree
        $self->{node_stack}[0]->walk_down(
            { callback => sub { shift->clear_memory; 1; } });

    } else {
        # create an empty element stack
        $self->{node_stack} = [];

        # load the schema, filling in the element tree
        $self->parse_schema();

        # store to cache
        if ($self->{cache}) {
            $CACHE{$self->{file}}{mtime} = (stat($self->{file}))[9];
            $CACHE{$self->{file}}{node_stack} = $self->{node_stack};
        }
    }

    # buffer text for convenience
    my $bf = XML::Filter::BufferText->new( Handler => $self );

    # add line-numbers and column-numbers to errors if
    # XML::Filter::ExceptionLocator is available
    eval { require XML::Filter::ExceptionLocator; };
    if ($@) {
        # no luck, just return the buffer-text handler
        return $bf;
    } else {
        # create a new exception-locator and return it
        my $el = XML::Filter::ExceptionLocator->new( Handler => $bf );
        return $el;
    }
}
# parse an XML schema document, filling $self->{node_stack}
sub parse_schema {
    my $self = shift;

    _err("Specified schema file '$self->{file}' does not exist.")
      unless -e $self->{file};

    # initialize the schema parser
    my $parser = XML::Validator::Schema::Parser->new(schema => $self);

    # add line-numbers and column-numbers to errors if
    # XML::Filter::ExceptionLocator is available
    eval { require XML::Filter::ExceptionLocator; };
    unless ($@) {
        # create a new exception-locator and set it up above the parser
        $parser = XML::Filter::ExceptionLocator->new( Handler => $parser );
    }

    # parse the schema file
    $parser = XML::SAX::ParserFactory->parser(Handler => $parser);
    $parser->parse_uri($self->{file});
}
# check element start
sub start_element {
    my ($self, $data) = @_;
    my $name = $data->{LocalName};
    my $node_stack = $self->{node_stack};
    my $element = $node_stack->[-1];

    # debug trace: element name, indented by stack depth
    print STDERR " " x scalar(@{$node_stack}), " o ", $name, "\n"
      if $self->{debug};

    # check that this element is allowed here
    my $daughter = $element->check_daughter($name);

    # check attributes
    $daughter->check_attributes($data->{Attributes});

    # debug trace: attribute names and values
    if ($self->{debug}) {
        foreach my $att ( keys %{ $data->{Attributes} } ) {
            print STDERR " " x (scalar(@{$node_stack}) + 2), " - ",
              $data->{Attributes}->{$att}->{Name}, " = ",
              $data->{Attributes}->{$att}->{Value}, "\n"
        }
    }

    # enter daughter node
    push(@$node_stack, $daughter);

    $self->SUPER::start_element($data);
}
# check character content
sub characters {
    my ($self, $data) = @_;
    my $element = $self->{node_stack}[-1];

    $element->check_contents($data->{Data});
    $element->{checked_content} = 1;

    $self->SUPER::characters($data);
}
# finish element checking
sub end_element {
    my ($self, $data) = @_;
    my $node_stack = $self->{node_stack};
    my $element = $node_stack->[-1];

    # check empty content if haven't checked yet
    $element->check_contents('')
      unless $element->{checked_content};
    $element->{checked_content} = 0;

    # final model check
    $element->{model}->check_final_model($data->{LocalName},
                                         $element->{memory} || [])
      if $element->{model};

    # done
    $element->clear_memory();
    pop(@$node_stack);

    $self->SUPER::end_element($data);
}
1;