File: usage.rst

package info (click to toggle)
python-xmlschema 4.1.0-1
  • links: PTS, VCS
  • area: main
  • in suites: sid
  • size: 5,208 kB
  • sloc: python: 39,174; xml: 1,282; makefile: 36
file content (847 lines) | stat: -rw-r--r-- 35,656 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
*****
Usage
*****

.. _lxml: http://lxml.de

.. testsetup::

    import xmlschema
    import os
    import warnings

    if os.getcwd().endswith('/doc'):
        os.chdir('..')
    warnings.simplefilter("ignore", xmlschema.XMLSchemaIncludeWarning)

.. testsetup:: collection

    import xmlschema
    import os
    import warnings

    if os.getcwd().endswith('/doc'):
        os.chdir('..')
    warnings.simplefilter("ignore", xmlschema.XMLSchemaIncludeWarning)
    schema = xmlschema.XMLSchema('tests/test_cases/examples/collection/collection.xsd')


Create a schema instance
========================

Import the library and then create an instance of a schema using the path of
the file containing the schema as argument:

.. doctest::

    >>> import xmlschema
    >>> schema = xmlschema.XMLSchema('tests/test_cases/examples/vehicles/vehicles.xsd')

The argument can be also a file-like object or a string containing the schema definition:

.. doctest::

    >>> schema_file = open('tests/test_cases/examples/collection/collection.xsd')
    >>> schema = xmlschema.XMLSchema(schema_file)

.. doctest::

    >>> schema = xmlschema.XMLSchema("""
    ... <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    ... <xs:element name="block" type="xs:string"/>
    ... </xs:schema>
    ... """)

Strings and file-like objects might not work when the schema includes other local subschemas,
because the package cannot knows anything about the schema's source location:

.. doctest::

    >>> schema_xsd = open('tests/test_cases/examples/vehicles/vehicles.xsd').read()
    >>> schema = xmlschema.XMLSchema(schema_xsd)
    Traceback (most recent call last):
    ...
    ...
    xmlschema.validators.exceptions.XMLSchemaParseError: unknown element '{http://example.com/vehicles}cars':

    Schema:

      <xs:element xmlns:xs="http://www.w3.org/2001/XMLSchema" ref="vh:cars" />

    Path: /xs:schema/xs:element/xs:complexType/xs:sequence/xs:element

In these cases you can provide an appropriate *base_url* optional argument to define the
reference directory path for other includes and imports:

.. doctest::

    >>> schema_file = open('tests/test_cases/examples/vehicles/vehicles.xsd')
    >>> schema = xmlschema.XMLSchema(schema_file, base_url='tests/test_cases/examples/vehicles/')


Non standard options for schema instance creation
-------------------------------------------------

Other options for schema instance creation are available using non-standard
methods. Most cases require to use the *build* option to delay the schema
build after the loading of all schema resources. For example:

.. doctest::

    >>> schema_file = open('tests/test_cases/examples/vehicles/vehicles.xsd')
    >>> schema = xmlschema.XMLSchema(schema_file, build=False)
    >>> _ = schema.include_schema('tests/test_cases/examples/vehicles/cars.xsd')
    >>> _ = schema.include_schema('tests/test_cases/examples/vehicles/bikes.xsd')
    >>> schema.build()

Another option, available since release v1.6.1, is to provide a list of schema sources,
particularly useful when sources have no locations associated:

.. doctest::

    >>> sources = [open('tests/test_cases/examples/vehicles/vehicles.xsd'),
    ...            open('tests/test_cases/examples/vehicles/cars.xsd'),
    ...            open('tests/test_cases/examples/vehicles/bikes.xsd'),
    ...            open('tests/test_cases/examples/vehicles/types.xsd')]
    >>> schema = xmlschema.XMLSchema(sources)

or similarly to the previous example one can use the method :meth:`xmlschema.XMLSchemaBase.add_schema`:

.. doctest::

    >>> schema_file = open('tests/test_cases/examples/vehicles/vehicles.xsd')
    >>> schema = xmlschema.XMLSchema(schema_file, build=False)
    >>> _ = schema.add_schema(open('tests/test_cases/examples/vehicles/cars.xsd'))
    >>> _ = schema.add_schema(open('tests/test_cases/examples/vehicles/bikes.xsd'))
    >>> _ = schema.add_schema(open('tests/test_cases/examples/vehicles/types.xsd'))
    >>> schema.build()


.. note::
    Anyway, the advice is to build intermediate XSD schemas instead for loading
    all the schemas needed in a standard way, because XSD mechanisms of imports,
    includes, redefines, and overrides are usually supported when you submit your
    schemas to other XSD validators.


Creating a local copy of a remote XSD schema for offline use
------------------------------------------------------------

Sometimes, it is advantageous to validate XML files using an XSD schema located
at a remote location while also having the option to store the same schema
locally for offline use.

The first option is to build a schema and then export the XSD sources to a local
directory:

.. code-block:: py

    import xmlschema
    schema = xmlschema.XMLSchema("https://www.omg.org/spec/ReqIF/20110401/reqif.xsd")
    schema.export(target='my_schemas', save_remote=True)
    schema = xmlschema.XMLSchema("my_schemas/reqif.xsd")  # works without internet

With these commands, a folder ``my_schemas`` is created and contains the
XSD files that can be used without access to the internet.

The resulting XSD files are identical to their remote source files, with the
only difference being that xmlschema transforms the remote URLs into local
URLs. The ``export`` command bundles a set of a target XSD file and all its
dependencies by changing the ``schemaLocation`` attributes into
``xs:import/xs:include`` statements as follows:

.. code-block:: xml

    <xsd:import namespace="http://www.w3.org/1999/xhtml" schemaLocation="http://www.omg.org/spec/ReqIF/20110402/driver.xsd"/>

becomes

.. code-block:: xml

    <xsd:import namespace="http://www.w3.org/1999/xhtml" schemaLocation="my_schemas/www.omg.org/spec/ReqIF/20110402/driver.xsd"/>

The alternative option is to download the XSD resources directly:

.. code-block:: py

    from xmlschema import download_schemas
    download_schemas("https://www.omg.org/spec/ReqIF/20110401/reqif.xsd", target='my_schemas')

For default the original XSD schemas are not changed and a location map is returned. This map
is also written to a LOCATION_MAP dictionary in the target directory as the module `__init__.py`,
so can be used after as *uri_mapper* argument for building the schema instance.

.. note::

    Since release v2.5.0 the ``schemaLocation`` attributes are rewritten with
    local paths that don't start with the target directory path, in order to be
    reusable from any working directory. Furthermore for default the residual
    redundant imports from different location hints, are cleaned stripping
    ``schemaLocation`` attributes from them.


Validation
==========

A schema instance has methods to validate an XML document against the schema.

The first method is :meth:`xmlschema.XMLSchemaBase.is_valid`, that returns ``True``
if the XML argument is validated by the schema loaded in the instance,
and returns ``False`` if the document is invalid.

.. doctest::

    >>> import xmlschema
    >>> schema = xmlschema.XMLSchema('tests/test_cases/examples/vehicles/vehicles.xsd')
    >>> schema.is_valid('tests/test_cases/examples/vehicles/vehicles.xml')
    True
    >>> schema.is_valid('tests/test_cases/examples/vehicles/vehicles-1_error.xml')
    False
    >>> schema.is_valid("""<?xml version="1.0" encoding="UTF-8"?><fancy_tag/>""")
    False

An alternative mode for validating an XML document is implemented by the method
:meth:`xmlschema.XMLSchemaBase.validate`, that raises an error when the XML doesn't
conform to the schema:

.. doctest::

    >>> import xmlschema
    >>> schema = xmlschema.XMLSchema('tests/test_cases/examples/vehicles/vehicles.xsd')
    >>> schema.validate('tests/test_cases/examples/vehicles/vehicles.xml')
    >>> schema.validate('tests/test_cases/examples/vehicles/vehicles-1_error.xml')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/brunato/Development/projects/xmlschema/xmlschema/schema.py", line 220, in validate
        raise error
    xmlschema.exceptions.XMLSchemaValidationError: failed validating <Element ...

    Reason: character data between child elements not allowed!

    Schema:

      <xs:sequence xmlns:xs="http://www.w3.org/2001/XMLSchema">
            <xs:element maxOccurs="unbounded" minOccurs="0" name="car" type="vh:vehicleType" />
      </xs:sequence>

    Instance:

      <ns0:cars xmlns:ns0="http://example.com/vehicles">
        NOT ALLOWED CHARACTER DATA
        <ns0:car make="Porsche" model="911" />
        <ns0:car make="Porsche" model="911" />
      </ns0:cars>


A validation method is also available at module level, useful when you need to
validate a document only once or if you extract information about the schema,
typically the schema location and the namespace, directly from the XML document:

.. doctest::

    >>> xmlschema.validate('tests/test_cases/examples/vehicles/vehicles.xml')

    >>> xml_file = 'tests/test_cases/examples/vehicles/vehicles.xml'
    >>> xsd_file = 'tests/test_cases/examples/vehicles/vehicles.xsd'
    >>> xmlschema.validate(xml_file, schema=xsd_file)


Data decoding and encoding
==========================

A schema instance can be also used for decoding an XML document to a nested dictionary:

.. doctest::

    >>> import xmlschema
    >>> from pprint import pprint
    >>> xs = xmlschema.XMLSchema('tests/test_cases/examples/vehicles/vehicles.xsd')
    >>> pprint(xs.to_dict('tests/test_cases/examples/vehicles/vehicles.xml'))
    {'@xmlns:vh': 'http://example.com/vehicles',
     '@xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance',
     '@xsi:schemaLocation': 'http://example.com/vehicles vehicles.xsd',
     'vh:bikes': {'vh:bike': [{'@make': 'Harley-Davidson', '@model': 'WL'},
                              {'@make': 'Yamaha', '@model': 'XS650'}]},
     'vh:cars': {'vh:car': [{'@make': 'Porsche', '@model': '911'},
                            {'@make': 'Porsche', '@model': '911'}]}}

The decoded values match the datatypes declared in the XSD schema:

.. doctest::

    >>> import xmlschema
    >>> from pprint import pprint
    >>> xs = xmlschema.XMLSchema('tests/test_cases/examples/collection/collection.xsd')
    >>> pprint(xs.to_dict('tests/test_cases/examples/collection/collection.xml'))
    {'@xmlns:col': 'http://example.com/ns/collection',
     '@xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance',
     '@xsi:schemaLocation': 'http://example.com/ns/collection collection.xsd',
     'object': [{'@available': True,
                 '@id': 'b0836217462',
                 'author': {'@id': 'PAR',
                            'born': '1841-02-25',
                            'dead': '1919-12-03',
                            'name': 'Pierre-Auguste Renoir',
                            'qualification': 'painter'},
                 'estimation': Decimal('10000.00'),
                 'position': 1,
                 'title': 'The Umbrellas',
                 'year': '1886'},
                {'@available': True,
                 '@id': 'b0836217463',
                 'author': {'@id': 'JM',
                            'born': '1893-04-20',
                            'dead': '1983-12-25',
                            'name': 'Joan Miró',
                            'qualification': 'painter, sculptor and ceramicist'},
                 'position': 2,
                 'title': None,
                 'year': '1925'}]}


Decoded data can be encoded back to XML:

.. doctest:: collection

    >>> obj = schema.decode('tests/test_cases/examples/collection/collection.xml')
    >>> collection = schema.encode(obj)
    >>> collection
    <Element '{http://example.com/ns/collection}collection' at ...>
    >>> print(xmlschema.etree_tostring(collection, {'col': 'http://example.com/ns/collection'}))
    <col:collection xmlns:col="http://example.com/ns/collection" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://example.com/ns/collection collection.xsd">
        <object id="b0836217462" available="true">
            <position>1</position>
            <title>The Umbrellas</title>
            <year>1886</year>
            <author id="PAR">
                <name>Pierre-Auguste Renoir</name>
                <born>1841-02-25</born>
                <dead>1919-12-03</dead>
                <qualification>painter</qualification>
            </author>
            <estimation>10000.00</estimation>
        </object>
        <object id="b0836217463" available="true">
            <position>2</position>
            <title />
            <year>1925</year>
            <author id="JM">
                <name>Joan Miró</name>
                <born>1893-04-20</born>
                <dead>1983-12-25</dead>
                <qualification>painter, sculptor and ceramicist</qualification>
            </author>
        </object>
    </col:collection>


All the decoding and encoding methods are based on two generator methods of the `XMLSchema` class,
namely *iter_decode()* and *iter_encode()*, that yield both data and validation errors.
See :ref:`schema-level-api` section for more information.


Decoding a part using XPath
---------------------------

If you need to decode only a part of the XML document you can pass also an XPath
expression using the *path* argument.

.. doctest::

    >>> xs = xmlschema.XMLSchema('tests/test_cases/examples/vehicles/vehicles.xsd')
    >>> pprint(xs.to_dict('tests/test_cases/examples/vehicles/vehicles.xml', '/vh:vehicles/vh:bikes'))
    {'vh:bike': [{'@make': 'Harley-Davidson', '@model': 'WL'},
                 {'@make': 'Yamaha', '@model': 'XS650'}]}

.. note::

    An XPath expression for the schema *considers the schema as the root element
    with global elements as its children*.

If you need to be selective on XML data, e.g. select the 2nd of a sequence od elements
using a predicate, the provided *path* argument can't be used for selecting the matching
XSD element:

.. doctest::

    >>> xs = xmlschema.XMLSchema('tests/test_cases/examples/vehicles/vehicles.xsd')
    >>> filepath = 'tests/test_cases/examples/vehicles/vehicles.xml'
    >>> pprint(xs.to_dict(filepath, path='//vh:bike[2]'))
    Traceback (most recent call last):
    ...
    xmlschema.validators.exceptions.XMLSchemaValidationError: failed validating <Element '{http://example.com/vehicles}bike' at 0x...> with XMLSchema10(name='vehicles.xsd', namespace='http://example.com/vehicles'):

    Reason: path='//vh:bike[2]' doesn't select any '{http://example.com/vehicles}bike' element of the schema, maybe you have to provide a different path using the schema_path argument

    Instance type: <class 'xml.etree.ElementTree.Element'>

    Instance:

      <vh:bike xmlns:vh="http://example.com/vehicles" make="Yamaha" model="XS650" />

    Path: /vh:vehicles/vh:bikes/vh:bike[2]

in these cases you must provide also an appropriate *schema_path* argument for finding the schema element:

.. doctest::

    >>> xs = xmlschema.XMLSchema('tests/test_cases/examples/vehicles/vehicles.xsd')
    >>> filepath = 'tests/test_cases/examples/vehicles/vehicles.xml'
    >>> pprint(xs.to_dict(filepath, path='//vh:bike[2]', schema_path='/vh:vehicles/vh:bikes/vh:bike'))
    {'@make': 'Yamaha', '@model': 'XS650'}
    >>> pprint(xs.to_dict(filepath, path='//vh:bike[2]', schema_path='//vh:bike'))
    {'@make': 'Yamaha', '@model': 'XS650'}


Validating and decoding ElementTree's data
------------------------------------------

Validation and decode API works also with XML data loaded in ElementTree structures:

.. doctest::

    >>> import xmlschema
    >>> from pprint import pprint
    >>> from xml.etree import ElementTree
    >>> xs = xmlschema.XMLSchema('tests/test_cases/examples/vehicles/vehicles.xsd')
    >>> xt = ElementTree.parse('tests/test_cases/examples/vehicles/vehicles.xml')
    >>> xs.is_valid(xt)
    True
    >>> pprint(xs.to_dict(xt, process_namespaces=False), depth=2)
    {'@{http://www.w3.org/2001/XMLSchema-instance}schemaLocation': 'http://...',
     '{http://example.com/vehicles}bikes': {'{http://example.com/vehicles}bike': [...]},
     '{http://example.com/vehicles}cars': {'{http://example.com/vehicles}car': [...]}}

The standard ElementTree library lacks of namespace information in trees, so you
have to provide a map to convert URIs to prefixes:

    >>> namespaces = {'xsi': 'http://www.w3.org/2001/XMLSchema-instance', 'vh': 'http://example.com/vehicles'}
    >>> pprint(xs.to_dict(xt, namespaces=namespaces))
    {'@xmlns:vh': 'http://example.com/vehicles',
     '@xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance',
     '@xsi:schemaLocation': 'http://example.com/vehicles vehicles.xsd',
     'vh:bikes': {'vh:bike': [{'@make': 'Harley-Davidson', '@model': 'WL'},
                              {'@make': 'Yamaha', '@model': 'XS650'}]},
     'vh:cars': {'vh:car': [{'@make': 'Porsche', '@model': '911'},
                            {'@make': 'Porsche', '@model': '911'}]}}

You can also convert XML data using the lxml_ library, that works better because
namespace information is associated within each node of the trees:

.. doctest::

    >>> import xmlschema
    >>> from pprint import pprint
    >>> import lxml.etree as etree
    >>> xs = xmlschema.XMLSchema('tests/test_cases/examples/vehicles/vehicles.xsd')
    >>> xt = etree.parse('tests/test_cases/examples/vehicles/vehicles.xml')
    >>> xs.is_valid(xt)
    True
    >>> pprint(xs.to_dict(xt))
    {'@xmlns:vh': 'http://example.com/vehicles',
     '@xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance',
     '@xsi:schemaLocation': 'http://example.com/vehicles vehicles.xsd',
     'vh:bikes': {'vh:bike': [{'@make': 'Harley-Davidson', '@model': 'WL'},
                              {'@make': 'Yamaha', '@model': 'XS650'}]},
     'vh:cars': {'vh:car': [{'@make': 'Porsche', '@model': '911'},
                            {'@make': 'Porsche', '@model': '911'}]}}
    >>> pprint(xmlschema.to_dict(xt, 'tests/test_cases/examples/vehicles/vehicles.xsd'))
    {'@xmlns:vh': 'http://example.com/vehicles',
     '@xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance',
     '@xsi:schemaLocation': 'http://example.com/vehicles vehicles.xsd',
     'vh:bikes': {'vh:bike': [{'@make': 'Harley-Davidson', '@model': 'WL'},
                              {'@make': 'Yamaha', '@model': 'XS650'}]},
     'vh:cars': {'vh:car': [{'@make': 'Porsche', '@model': '911'},
                            {'@make': 'Porsche', '@model': '911'}]}}

With the version v4.0.0 of the library the optional argument *iterparse* has been
added for processing XML data using a dependency injection:

.. doctest::

    >>> import lxml.etree as etree
    >>> schema_file = 'tests/test_cases/examples/vehicles/vehicles.xsd'
    >>> xml_data = 'tests/test_cases/examples/vehicles/vehicles.xml'
    >>> xs = xmlschema.XMLSchema(schema_file, iterparse=etree.iterparse)
    >>> type(xs.root)
    <class 'lxml.etree._Element'>

that option is applied also for creating :class:`xmlschema.XMLResource`
instances when validation/decode is called using a schema API:

    >>> xs.is_valid('tests/test_cases/examples/vehicles/vehicles-ns-mix.xml')
    True
    >>> pprint(xs.to_dict('tests/test_cases/examples/vehicles/vehicles-ns-mix.xml'))
    {'@xmlns': 'http://example.com/vehicles',
     '@xmlns:vh': 'http://xmlschema.test/other-ns',
     '@xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance',
     '@xsi:schemaLocation': 'http://example.com/vehicles vehicles.xsd',
     'vh:bikes': {'@xmlns': '',
                  '@xmlns:vh': 'http://example.com/vehicles',
                  'vh:bike': [{'@make': 'Harley-Davidson', '@model': 'WL'},
                              {'@make': 'Yamaha', '@model': 'XS650'}]},
     'vh:cars': {'@xmlns': '',
                 '@xmlns:vh': 'http://example.com/vehicles',
                 'vh:car': [{'@make': 'Porsche', '@model': '911'},
                            {'@make': 'Porsche', '@model': '911'}]}}

The same option is supported by package API (e.g. `xmlschema.to_dict`).

The benefits of using *lxml* instead of *ElementTree* are mainly on namespace
processing and an extended data model, but *ElementTree* structures can be fragmented
without duplications so they could be more suitable for processing data in lazy mode.

Furthermore, namespace mapping is also supported by :class:`xmlschema.XMLResource` internals,
so the use of *ElementTree* for decoding XML data produce the same results:

    >>> pprint(xmlschema.to_dict('tests/test_cases/examples/vehicles/vehicles-ns-mix.xml'))
    {'@xmlns': 'http://example.com/vehicles',
     '@xmlns:vh': 'http://xmlschema.test/other-ns',
     '@xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance',
     '@xsi:schemaLocation': 'http://example.com/vehicles vehicles.xsd',
     'vh:bikes': {'@xmlns': '',
                  '@xmlns:vh': 'http://example.com/vehicles',
                  'vh:bike': [{'@make': 'Harley-Davidson', '@model': 'WL'},
                              {'@make': 'Yamaha', '@model': 'XS650'}]},
     'vh:cars': {'@xmlns': '',
                 '@xmlns:vh': 'http://example.com/vehicles',
                 'vh:car': [{'@make': 'Porsche', '@model': '911'},
                            {'@make': 'Porsche', '@model': '911'}]}}


Customize the decoded data structure
------------------------------------

Starting from the version 0.9.9 the package includes converter objects, in order to
control the decoding process and produce different data structures. These objects
intervene at element level to compose the decoded data (attributes and content) into
a data structure.

The default converter produces a data structure similar to the format produced by
previous versions of the package. You can customize the conversion process providing
a converter instance or subclass when you create a schema instance or when you want
to decode an XML document.
For instance you can use the *Badgerfish* converter for a schema instance:

.. doctest::

    >>> import xmlschema
    >>> from pprint import pprint
    >>> xml_schema = 'tests/test_cases/examples/vehicles/vehicles.xsd'
    >>> xml_document = 'tests/test_cases/examples/vehicles/vehicles.xml'
    >>> xs = xmlschema.XMLSchema(xml_schema, converter=xmlschema.BadgerFishConverter)
    >>> pprint(xs.to_dict(xml_document, dict_class=dict), indent=4)
    {   'vh:vehicles': {   '@xmlns': {   'vh': 'http://example.com/vehicles',
                                         'xsi': 'http://www.w3.org/2001/XMLSchema-instance'},
                           '@xsi:schemaLocation': 'http://example.com/vehicles '
                                                  'vehicles.xsd',
                           'vh:bikes': {   'vh:bike': [   {   '@make': 'Harley-Davidson',
                                                              '@model': 'WL'},
                                                          {   '@make': 'Yamaha',
                                                              '@model': 'XS650'}]},
                           'vh:cars': {   'vh:car': [   {   '@make': 'Porsche',
                                                            '@model': '911'},
                                                        {   '@make': 'Porsche',
                                                            '@model': '911'}]}}}

You can also change the data decoding process providing the keyword argument *converter*
to the method call:

.. doctest::

    >>> pprint(xs.to_dict(xml_document, converter=xmlschema.ParkerConverter, dict_class=dict), indent=4)
    {'vh:bikes': {'vh:bike': [None, None]}, 'vh:cars': {'vh:car': [None, None]}}


See the :ref:`converters` section for more information about converters.


Control the decoding of XSD atomic datatypes
--------------------------------------------

XSD datatypes are decoded to Python basic datatypes. Python strings are used
for all string-based XSD types and others, like *xs:hexBinary* or *xs:QName*.
Python integers are used for *xs:integer* and derived types, `bool` for *xs:boolean*
values and `decimal.Decimal` for *xs:decimal* values.

Currently there are three options for variate the decoding of XSD atomic datatypes:

decimal_type
    decoding type for *xs:decimal* (is `decimal.Decimal` for default)

datetime_types
    if set to `True` decodes datetime and duration types to their respective XSD
    atomic types instead of keeping the XML string value

binary_types
    if set to `True` decodes *xs:hexBinary* and *xs:base64Binary* types to their
    respective XSD atomic types instead of keeping the XML string value


Filling missing values
----------------------

Incompatible values are decoded with `None` when the *validation* mode is `'lax'`.
For these situations there are two options for changing the behavior of the decoder:

filler
    a callback function to fill undecodable data with a typed value. The
    callback function must accept one positional argument, that can be an
    XSD Element or an attribute declaration. If not provided undecodable
    data is replaced by `None`.

fill_missing
    if set to True the decoder fills also missing attributes. The filling value
    is None or a typed value if the *filler* callback is provided.


Control the decoding of elements
--------------------------------

These options concern the decoding of XSD elements:

value_hook
    a function that will be called with any decoded atomic value and the XSD type
    used for decoding. The return value will be used instead of the original value.

keep_empty
    if set to `True` empty elements that are valid are decoded with an empty string
    value instead of `None`.

element_hook
    an function that is called with decoded element data before calling the converter
    decode method. Takes an `ElementData` instance plus optionally the XSD element
    and the XSD type, and returns a new `ElementData` instance.


Control the decoding of wildcards
---------------------------------

These two options are specific for the content processed with an XSD wildcard:

keep_unknown
    if set to `True` unknown tags are kept and are decoded with *xs:anyType*.
    For default unknown tags not decoded by a wildcard are discarded.

process_skipped
    process XML data that match a wildcard with `processContents=’skip’`.


Control the decoding depth
--------------------------

max_depth
    maximum level of decoding, for default there is no limit. With lazy resources
    is automatically set to *source.lazy_depth* for managing lazy decoding.
    Available also for validation methods.

depth_filler
    a callback function for replacing data over the *max_depth* level. The callback
    function must accept one positional argument, that can be an XSD Element. For
    default deeper data is replaced with `None` values when *max_depth* is provided.

Control the validation
----------------------

extra_validator
    an optional function for performing non-standard validations on XML data. The
    provided function is called for each traversed element, with the XML element as
    1st argument and the corresponding XSD element as 2nd argument. It can be also a
    generator function and has to raise/yield `XMLSchemaValidationError` exceptions.

validation_hook
    an optional function for stopping or changing validation/decoding at element level.
    The provided function must accept two arguments, the XML element and the matching
    XSD element. If the value returned by this function is evaluated to false then the
    validation/decoding process continues without changes, otherwise it's stopped or
    changed. If the value returned is a validation mode the validation/decoding process
    continues changing the current validation mode to the returned value, otherwise the
    element and its content are not processed. For validation only this function can
    also stop validation suddenly raising a `XMLSchemaStopValidation` exception.


Decoding to JSON
================

The data structured created by the decoder can be easily serialized to JSON. But if you data
include `Decimal` values (for *decimal* XSD built-in type) you cannot convert the data to JSON:

.. doctest::

    >>> import xmlschema
    >>> import json
    >>> xml_document = 'tests/test_cases/examples/collection/collection.xml'
    >>> print(json.dumps(xmlschema.to_dict(xml_document), indent=4))
    Traceback (most recent call last):
      File "/usr/lib64/python2.7/doctest.py", line 1315, in __run
        compileflags, 1) in test.globs
      File "<doctest default[3]>", line 1, in <module>
        print(json.dumps(xmlschema.to_dict(xml_document), indent=4))
      File "/usr/lib64/python2.7/json/__init__.py", line 251, in dumps
        sort_keys=sort_keys, **kw).encode(obj)
      File "/usr/lib64/python2.7/json/encoder.py", line 209, in encode
        chunks = list(chunks)
      File "/usr/lib64/python2.7/json/encoder.py", line 434, in _iterencode
        for chunk in _iterencode_dict(o, _current_indent_level):
      File "/usr/lib64/python2.7/json/encoder.py", line 408, in _iterencode_dict
        for chunk in chunks:
      File "/usr/lib64/python2.7/json/encoder.py", line 332, in _iterencode_list
        for chunk in chunks:
      File "/usr/lib64/python2.7/json/encoder.py", line 408, in _iterencode_dict
        for chunk in chunks:
      File "/usr/lib64/python2.7/json/encoder.py", line 442, in _iterencode
        o = _default(o)
      File "/usr/lib64/python2.7/json/encoder.py", line 184, in default
        raise TypeError(repr(o) + " is not JSON serializable")
    TypeError: Decimal('10000.00') is not JSON serializable

This problem is resolved providing an alternative JSON-compatible type for `Decimal` values,
using the keyword argument *decimal_type*:

.. doctest::

    >>> print(json.dumps(xmlschema.to_dict(xml_document, decimal_type=str), indent=4))  # doctest: +SKIP
    {
        "object": [
            {
                "@available": true,
                "author": {
                    "qualification": "painter",
                    "born": "1841-02-25",
                    "@id": "PAR",
                    "name": "Pierre-Auguste Renoir",
                    "dead": "1919-12-03"
                },
                "title": "The Umbrellas",
                "year": "1886",
                "position": 1,
                "estimation": "10000.00",
                "@id": "b0836217462"
            },
            {
                "@available": true,
                "author": {
                    "qualification": "painter, sculptor and ceramicist",
                    "born": "1893-04-20",
                    "@id": "JM",
                    "name": "Joan Mir\u00f3",
                    "dead": "1983-12-25"
                },
                "title": null,
                "year": "1925",
                "position": 2,
                "@id": "b0836217463"
            }
        ],
        "@xsi:schemaLocation": "http://example.com/ns/collection collection.xsd"
    }

From version 1.0 there are two module level API for simplify the JSON serialization
and deserialization task.
See the :meth:`xmlschema.to_json` and :meth:`xmlschema.from_json` in the
:ref:`document-level-api` section.


XML resources and documents
===========================

Schemas and XML instances processing are based on the class :class:`xmlschema.XMLResource`,
that handles the loading and the iteration of XSD/XML data.
Starting from v1.3.0 :class:`xmlschema.XMLResource` has been empowered with ElementTree-like
XPath API. From the same release a new class :class:`xmlschema.XmlDocument` is
available for representing XML resources with a related schema:

.. doctest::

    >>> import xmlschema
    >>> xml_document = xmlschema.XmlDocument('tests/test_cases/examples/vehicles/vehicles.xml')
    >>> xml_document.schema
    XMLSchema10(name='vehicles.xsd', namespace='http://example.com/vehicles')

This class can be used to derive specialized schema-related classes.
See :ref:`wsdl11-documents` section for an application example.


Meta-schemas and XSD sources
============================

Schema classes :class:`xmlschema.XMLSchema10` and :class:`xmlschema.XMLSchema11`
have built-in meta-schema instances, related to the XSD namespace, that can be used
directly to validate XSD sources without build a new schema:

.. doctest::

    >>> from xmlschema import XMLSchema
    >>> XMLSchema.meta_schema.validate('tests/test_cases/examples/vehicles/vehicles.xsd')
    >>> XMLSchema.meta_schema.validate('tests/test_cases/examples/vehicles/invalid.xsd')
    Traceback (most recent call last):
    ...
    ...
    xmlschema.validators.exceptions.XMLSchemaValidationError: failed validating ...

    Reason: use of attribute 'name' is prohibited

    Schema:

      <xs:restriction xmlns:xs="http://www.w3.org/2001/XMLSchema" base="xs:complexType">
       <xs:sequence>
        <xs:element ref="xs:annotation" minOccurs="0" />
        <xs:group ref="xs:complexTypeModel" />
       </xs:sequence>
       <xs:attribute name="name" use="prohibited" />
       <xs:attribute name="abstract" use="prohibited" />
       <xs:attribute name="final" use="prohibited" />
       <xs:attribute name="block" use="prohibited" />
       <xs:anyAttribute namespace="##other" processContents="lax" />
      </xs:restriction>

    Instance:

      <xs:complexType xmlns:xs="http://www.w3.org/2001/XMLSchema" name="vehiclesType">
        <xs:sequence>
          <xs:element ref="vh:cars" />
          <xs:element ref="vh:bikes" />
        </xs:sequence>
      </xs:complexType>

    Path: /xs:schema/xs:element/xs:complexType


Furthermore also decode and encode methods can be applied on XSD files or sources:

.. doctest::

    >>> from xmlschema import XMLSchema
    >>> obj = XMLSchema.meta_schema.decode('tests/test_cases/examples/vehicles/vehicles.xsd')
    >>> from pprint import pprint
    >>> pprint(obj)
    {'@attributeFormDefault': 'unqualified',
     '@blockDefault': [],
     '@elementFormDefault': 'qualified',
     '@finalDefault': [],
     '@targetNamespace': 'http://example.com/vehicles',
     '@xmlns:vh': 'http://example.com/vehicles',
     '@xmlns:xs': 'http://www.w3.org/2001/XMLSchema',
     'xs:attribute': {'@name': 'step', '@type': 'xs:positiveInteger'},
     'xs:element': {'@abstract': False,
                    '@name': 'vehicles',
                    '@nillable': False,
                    'xs:complexType': {'@mixed': False,
                                       'xs:sequence': {'@maxOccurs': 1,
                                                       '@minOccurs': 1,
                                                       'xs:element': [{'@maxOccurs': 1,
                                                                       '@minOccurs': 1,
                                                                       '@nillable': False,
                                                                       '@ref': 'vh:cars'},
                                                                      {'@maxOccurs': 1,
                                                                       '@minOccurs': 1,
                                                                       '@nillable': False,
                                                                       '@ref': 'vh:bikes'}]}}},
     'xs:include': [{'@schemaLocation': 'cars.xsd'},
                    {'@schemaLocation': 'bikes.xsd'}]}

.. note::
    Building a new schema for XSD namespace could be not trivial because other schemas are
    required for base namespaces (e.g. XML namespace 'http://www.w3.org/XML/1998/namespace').
    This is particularly true for XSD 1.1 because the XSD meta-schema lacks of built-in
    list types definitions, so a patch schema is required.