=encoding utf8
=head1 NAME
XML::Compile::Translate::Reader - translate XML to HASH
=head1 INHERITANCE
XML::Compile::Translate::Reader
is a XML::Compile::Translate
=head1 SYNOPSIS
my $schema = XML::Compile::Schema->new(...);
my $code = $schema->compile(READER => ...);
=head1 DESCRIPTION
The translator understands schemas, but does not encode that into
actions. This module implements those actions to translate from XML
into a (nested) Perl HASH structure.
Extends L<"DESCRIPTION" in XML::Compile::Translate|XML::Compile::Translate/"DESCRIPTION">.
=head1 METHODS
Extends L<"METHODS" in XML::Compile::Translate|XML::Compile::Translate/"METHODS">.
=head1 DETAILS
Extends L<"DETAILS" in XML::Compile::Translate|XML::Compile::Translate/"DETAILS">.
=head2 Translator options
Extends L<"Translator options" in XML::Compile::Translate|XML::Compile::Translate/"Translator options">.
=head2 Processing Wildcards
If you want to collect information from the XML structure, which is
permitted by C<any> and C<anyAttribute> specifications in the schema,
you have to implement that yourself. The problem is C<XML::Compile>
has less knowledge than you about the possible data.
=head3 option any_attribute
By default, the C<anyAttribute> specification is ignored. When C<TAKE_ALL>
is given, all attributes which fulfill the name-space requirement are
added to the returned data-structure. The absolute element name is used
as key, with the related unparsed XML element as value.
In the current implementation, if an explicit attribute is also
covered by the name-spaces permitted by the anyAttribute definition,
then it will also appear in that list (and hence the handler will
be called as well).
Use L<XML::Compile::Schema::compile(any_attribute)|XML::Compile::Schema/"Compilers"> to write your
own handler, to influence the behavior. The handler will be called for
each attribute, and you must return a list of pairs of derived information.
When the returned list is empty, the attribute data is lost. The value may
be a complex structure.
B<. Example: anyAttribute in a READER>
Say your schema looks like this:
<schema targetNamespace="http://mine"
xmlns:me="http://mine" ...>
<element name="el">
<complexType>
<attribute name="a" type="xs:int" />
<anyAttribute namespace="##targetNamespace"
processContents="lax" />
</complexType>
</element>
<simpleType name="non-empty">
<restriction base="NCName" />
</simpleType>
</schema>
Then, in an application, you write:
my $r = $schema->compile
( READER => pack_type('http://mine', 'el')
, any_attribute => 'TAKE_ALL'
);
# or lazy: READER => '{http://mine}el'
my $h = $r->( <<'__XML' );
<el xmlns:me="http://mine">
<a>42</a>
<b type="me:non-empty">
everything
</b>
</el>
__XML
use Data::Dumper 'Dumper';
print Dumper $h;
The output is something like
$VAR1 =
{ a => 42
, '{http://mine}a' => ... # XML::LibXML::Node with <a>42</a>
, '{http://mine}b' => ... # XML::LibXML::Node with <b>everything</b>
};
You can improve the reader with a callback. When you know that the
extra attribute is always of type C<non-empty>, then you can do
my $read = $schema->compile
( READER => '{http://mine}el'
, any_attribute => \&filter
);
my $anyAttRead = $schema->compile
( READER => '{http://mine}non-empty'
);
sub filter($$$$)
{ my ($fqn, $xml, $path, $translator) = @_;
return () if $fqn ne '{http://mine}b';
(b => $anyAttRead->($xml));
}
my $h = $read->( see above );
print Dumper $h;
Which will result in
$VAR1 =
{ a => 42
, b => 'everything'
};
The filter will be called twice, but return nothing in the first
case. You can implement any kind of complex processing in the filter.
=head3 option any_element
By default, the C<any> definition in a schema will ignore all elements
from the container which are not used. Also in this case, C<TAKE_ALL>
is required to produce C<any> results. C<SKIP_ALL> will ignore all
results, although they are still processed for validation purposes.
=head3 option any_type CODE
By default, the elements which have type "xsd:anyType" will return
an XML::LibXML::Element when there are sub-elements. Otherwise,
it will return the textual content.
If you pass your own CODE reference, you can change this behavior. It
will get called with the path, the node, and the default handler. Be
aware that C<$node> may actually be a string already.
$schema->compile(READER => ..., any_type => \&handle_any_type);
sub handle_any_type($$$)
{ my ($path, $node, $handler) = @_;
ref $node or return $node;
$node;
}
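As a slightly fuller sketch of such a handler: the version below collapses
white-space when the content arrives as a plain string, and passes nodes
through untouched. The name C<tidy_any_type> is hypothetical, and the default
handler in C<$handler> is deliberately not called, because its calling
convention is not described here.

```perl
use strict;
use warnings;

# Hypothetical any_type handler: normalize text, leave nodes alone.
sub tidy_any_type {
    my ($path, $node, $handler) = @_;

    # With sub-elements present, $node is an XML::LibXML::Element;
    # hand it back unchanged (undef is passed through as well).
    return $node if !defined $node || ref $node;

    # Otherwise it is the textual content: collapse and trim white-space.
    (my $text = $node) =~ s/\s+/ /g;
    $text =~ s/^ | $//g;
    return $text;
}

# Intended use (needs a compiled schema, not shown here):
#   $schema->compile(READER => ..., any_type => \&tidy_any_type);
```

Checking C<ref $node> first keeps the handler safe for both forms in which
the content may arrive.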
=head2 Mixed elements
[available since 0.86]
ComplexType and ComplexContent components can be declared with the
C<< mixed="true" >> attribute. This implies that text is not limited
to the content of containers, but may also be used in between elements.
Usually, you will only find ignorable white-space between elements.
In this example, the C<a> container is marked to be mixed:
<a id="5"> before <b>2</b> after </a>
In practice, "mixed" content usually leans one of two ways: either the
element is needed as text, or the element should be parsed and the text ignored.
The reader has various options to avoid the need of processing raw
XML::LibXML nodes.
[1.00]
When the return is a HASH, that HASH will also contain the
C<_MIXED_ELEMENT_MODE> key, to help people understand what
happens. This is not possible for all modes, only for some.
With L<XML::Compile::Schema::compile(mixed_elements)|XML::Compile::Schema/"Compilers"> set to
=over 4
=item ATTRIBUTES (the default)
a HASH is returned, the attributes are processed. The node is found
as XML::LibXML::Element with the key '_'. The example above will
produce
$r = { id => 5, _ => $xmlnode };
=item TEXTUAL
Like the previous, but now the textual representation of the content is
returned with key '_'. The example above will produce
$r = { id => 5, _ => ' before 2 after '};
=item STRUCTURAL
will remove all mixed-in text, and treat the element as normal element.
The example will be transformed into
$r = { id => 5, b => 2 };
=item XML_NODE
return the XML::LibXML::Node itself. The example:
$r = $xmlnode;
=item XML_STRING
return the mixed node as XML string, just as in the source. Be warned
that it is rather expensive: the string was parsed and then stringified
again, which is costly for large nodes. Result:
$r = '<a id="5"> before <b>2</b> after </a>';
=item CODE reference
the reference is called with the XML::LibXML::Node as first argument.
When a value is returned (even undef), then the right tag with the value
will be included in the translator's result. When an empty list is
returned by the code reference, then nothing is returned (which may
result in an error if the element is required according to the schema).
=back
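A minimal sketch of the CODE reference mode, using the example element from
above. The handler name C<mixed_as_text> is hypothetical; it returns the text
with white-space collapsed, or an empty list when the element holds no text,
so that nothing is included in the result in that case.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical handler for mixed_elements: collapse white-space in the
# text content; return an empty list when no text is left.
sub mixed_as_text {
    my ($node) = @_;
    my $text = $node->textContent;
    $text =~ s/\s+/ /g;
    $text =~ s/^ | $//g;
    return length $text ? $text : ();
}

# Stand-alone demonstration, outside any schema.
my $doc = XML::LibXML->load_xml
  ( string => '<a id="5"> before <b>2</b> after </a>' );
my @got = mixed_as_text($doc->documentElement);
print "@got\n";

# Intended use (needs a compiled schema, not shown here):
#   $schema->compile(READER => ..., mixed_elements => \&mixed_as_text);
```

Be aware that returning an empty list for a required element may lead to
an error, as described for this mode.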
When some of your mixed elements need different behavior from other
elements, then you have to use the normal hooks for those specific
cases.
=head2 Schema hooks
=head3 hooks executed before the XML is being processed
The C<before> hook receives an XML::LibXML::Node object and
the path string. It must return a new (or the same) XML node which
will be used from then on. It is probably best to modify a clone of
the node, not the original as provided by the user. When C<undef>
is returned, the whole node will disappear.
This hook offers a predefined C<PRINT_PATH>.
B<. Example: to trace the paths>
$schema->addHook
( action => 'READER'
, path => qr/./
, before => 'PRINT_PATH'
);
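A custom C<before> hook can follow the advice about cloning, as in this
sketch. It strips a hypothetical C<debug> attribute before the reader sees
the node; the attribute name and the sub name C<strip_debug> are
illustrative only.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical before hook: work on a deep clone, so the document
# provided by the user is left untouched.
sub strip_debug {
    my ($node, $path) = @_;
    my $clone = $node->cloneNode(1);      # 1 = deep copy
    $clone->removeAttribute('debug');
    return $clone;
}

# Stand-alone demonstration, outside any schema.
my $doc   = XML::LibXML->load_xml(string => '<el debug="1" id="5"/>');
my $orig  = $doc->documentElement;
my $clean = strip_debug($orig, 'el');
print $clean->toString, "\n";

# Intended use:
#   $schema->addHook
#     ( action => 'READER'
#     , path   => qr/./
#     , before => \&strip_debug
#     );
```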
=head3 hooks executed as replacement
Your C<replace> hook should return a list of key-value pairs. To produce
it, it will get the XML::LibXML::Element, the translator settings as
HASH, the path, and the localname.
This hook has a predefined C<SKIP>, which will not process the
found element, but simply return the string "SKIPPED" as value.
This way, a whole tree of unneeded translations can be avoided.
[1.51] The predefined hook C<XML_NODE> will not attempt to parse the
selected element, but returns the XML::LibXML::Element node instead.
This may break on some schema-contained validations.
Sometimes, the Schema spec is such a mess that XML::Compile cannot
automatically translate it. I have seen cases where confusion
over name-spaces is created: a choice between three elements with
the same name but different types. In such a case, you may use
L<XML::LibXML::Simple|XML::LibXML::Simple> to translate a part of your tree:
use XML::LibXML::Simple qw/XMLin/;
$schema->addHook
( action => 'READER'
, type => 'tns:xyz' # or pack_type($tns,'xyz')
# path => qr!/company$! # by element name
, replace =>
sub { my ($xml, $args, $path, $type, $r) = @_;
($type => XMLin($xml, ...));
}
);
=head3 hooks for post-processing, after the data is collected
Your code reference gets called with three parameters: the XML node,
the data collected and the path. Be careful that the collected data
might be a SCALAR (for simpleType). Return a HASH or a SCALAR. C<undef>
may work, unless it is the value of a required element you throw away.
This hook also offers a predefined C<PRINT_PATH>. Besides, it
has C<INCLUDE_PATH>, C<XML_NODE>, C<NODE_TYPE>, C<ELEMENT_ORDER>,
and C<ATTRIBUTE_ORDER>, which will result in additional fields in
the HASH, respectively containing the NODE which was processed (an
XML::LibXML::Element), the type_of_node, the element names, and the
attribute names. The keys start with an underscore C<_>.
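A hand-written C<after> hook could look like the following sketch. The sub
name C<remember_path> and the key C<_READ_FROM> are hypothetical; the point
is only that the hook must cope with both HASH and SCALAR data.

```perl
use strict;
use warnings;

# Hypothetical after hook: tag every HASH result with the path it was
# read from; simpleType values (plain scalars) are passed on unchanged.
sub remember_path {
    my ($node, $data, $path) = @_;
    return $data unless ref($data) eq 'HASH';
    return { %$data, _READ_FROM => $path };
}

# Intended use:
#   $schema->addHook
#     ( action => 'READER'
#     , path   => qr/./
#     , after  => \&remember_path
#     );
```

A new HASH is returned instead of changing C<$data> in place, which
keeps the hook free of side effects.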
=head2 Typemaps
In a typemap, a relation between an XML element type and a Perl class (or
object) is made. Each translator back-end will implement this a little
differently. This section is about how the reader handles typemaps.
=head3 Typemap to Class
Usually, an XML type will be mapped on a Perl class. The Perl class
implements the C<fromXML> method as constructor.
$schema->addTypemaps($sometype => 'My::Perl::Class');
package My::Perl::Class;
...
sub fromXML
{ my ($class, $data, $xmltype) = @_;
my $self = $class->new($data);
...
$self;
}
Your method returns the data which will be included in the result tree
of the reader. You may return an object, the unmodified C<$data>, or
C<undef>. When C<undef> is returned, this may fail the schema parser
when the data element is required.
In the simplest implementation, the class stores its data exactly as
the XML structure:
package My::Perl::Class;
sub fromXML
{ my ($class, $data, $xmltype) = @_;
bless $data, $class;
}
# The same, even shorter:
sub fromXML { bless $_[1], $_[0] }
=head3 Typemap to Object
Another option is to implement an object factory: one object which creates
other objects. In this case, the C<$xmltype> parameter can be useful,
to have one object spawning many different other objects.
my $object = My::Perl::Class->new(...);
$schema->typemap($sometype => $object);
package My::Perl::Class;
sub fromXML
{ my ($object, $data, $xmltype) = @_;
return Some::Other::Class->new($data);
}
This object factory may be a very simple solution when you map XML onto
objects which are not under your control, where there is no way to
add the C<fromXML> method.
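The following sketch shows one factory object serving several types. It
assumes that C<fromXML> receives the data followed by the type name, in the
same order as in the class example. All package names, the type names, and
the lookup table are hypothetical.

```perl
use strict;
use warnings;

# Two classes which are "not under our control": no fromXML method.
package My::Circle; sub new { my ($class, $data) = @_; bless {%$data}, $class }
package My::Square; sub new { my ($class, $data) = @_; bless {%$data}, $class }

package My::Factory;

sub new {
    my ($class, %map) = @_;
    return bless { map => {%map} }, $class;
}

# Pick the target class from the XML type; unknown types keep their
# plain data (assumed argument order: data first, then type name).
sub fromXML {
    my ($self, $data, $xmltype) = @_;
    my $target = $self->{map}{$xmltype}
        or return $data;
    return $target->new($data);
}

package main;

my $factory = My::Factory->new
  ( '{http://mine}circle' => 'My::Circle'
  , '{http://mine}square' => 'My::Square'
  );

# Stand-alone demonstration, outside any schema.
my $shape = $factory->fromXML({ r => 3 }, '{http://mine}circle');
print ref($shape), "\n";

# Intended use, once per type:
#   $schema->typemap('{http://mine}circle' => $factory);
```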
=head3 Typemap to CODE
The light version of an object factory works with CODE references.
$schema->typemap($t1 => \&myhandler);
sub myhandler
{ my ($backend, $data, $type) = @_;
return My::Perl::Class->new($data)
if $backend eq 'READER';
$data;
}
# shorter
$schema->typemap($t1 => sub {My::Perl::Class->new($_[1])} );
=head3 Typemap implementation
Internally, the typemap is simply translated into an "after" hook for the
specific type. After the data was processed via the usual mechanism,
the hook will call method C<fromXML> on the class or object you specified
with the data which was read. You may still use "before" and "replace"
hooks, if you need them.
Syntactic sugar:
$schema->typemap($t1 => 'My::Package');
$schema->typemap($t2 => $object);
is comparable to
$schema->typemap($t1 => sub {My::Package->fromXML(@_)});
$schema->typemap($t2 => sub {$object->fromXML(@_)} );
with some extra checks.
=head1 SEE ALSO
This module is part of XML-Compile distribution version 1.64,
built on October 21, 2024. Website: F<http://perl.overmeer.net/xml-compile/>
=head1 LICENSE
Copyrights 2006-2024 by [Mark Overmeer <markov@cpan.org>]. For other contributors see ChangeLog.
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
See F<http://dev.perl.org/licenses/>