Customizing encoding and decoding
=================================

.. py:currentmodule:: cbor2

Both the encoder and decoder can be customized to support a wider range of types.

On the encoder side, this is accomplished by passing a callback as the ``default`` constructor
argument. The encoder calls this callback with the encoder instance and any object that it could
not serialize on its own. The callback should then pass a value that the encoder can serialize to
:meth:`CBOREncoder.encode`. That value is allowed to contain objects that also require the encoder
to use the callback, as long as it won't result in an infinite loop.

On the decoder side, you have two options: ``tag_hook`` and ``object_hook``. The former is called
by the decoder to process any semantic tags that have no predefined decoders. The latter is called
for any newly decoded ``dict`` objects, and is mostly useful for implementing a JSON-compatible
custom type serialization scheme. Unless your requirements restrict you to JSON-compatible types
only, ``tag_hook`` is the recommended choice for custom types.

Using the CBOR tags for custom types
------------------------------------

The most common way to use ``default`` is to call :meth:`CBOREncoder.encode`
to add a custom tag in the data stream, with the payload as the value::

    from cbor2 import CBORTag


    class Point:
        def __init__(self, x, y):
            self.x = x
            self.y = y

    def default_encoder(encoder, value):
        # Tag number 4000 was chosen arbitrarily
        encoder.encode(CBORTag(4000, [value.x, value.y]))

The corresponding ``tag_hook`` would be::

    def tag_hook(decoder, tag, shareable_index=None):
        if tag.tag != 4000:
            return tag

        # tag.value is now the [x, y] list we serialized before
        return Point(*tag.value)

Using dicts to carry custom types
---------------------------------

The same could be done with ``object_hook``, except less efficiently::

    def default_encoder(encoder, value):
        encoder.encode(dict(typename='Point', x=value.x, y=value.y))

    def object_hook(decoder, value):
        if value.get('typename') != 'Point':
            return value

        return Point(value['x'], value['y'])

Make sure that whatever scheme you use to tell your "specially marked" dicts apart from arbitrary
data dicts cannot mistake one for the other.

Value sharing with custom types
-------------------------------

In order to properly encode and decode cyclic references with custom types, some special care has
to be taken. Suppose you have a custom type as below, where every child object contains a reference
to its parent and the parent contains a list of children::

    from cbor2 import dumps, loads, shareable_encoder, CBORTag


    class MyType:
        def __init__(self, parent=None):
            self.parent = parent
            self.children = []
            if parent:
                self.parent.children.append(self)

This would not normally be serializable, as it would lead to an endless loop (in the worst case)
or raise an exception (in the best case). This is where CBOR's extension tags 28 and 29 come in.
These tags make it possible to add special markers into the data stream which can later be
referenced and substituted with the object marked earlier.

To take advantage of these tags with a custom type, apply the :func:`shareable_encoder` decorator
to your ``default`` hook function. It will automatically add the object to the shared values
registry on the encoder and prevent it from being serialized twice (writing a reference to the
data stream instead)::

    @shareable_encoder
    def default_encoder(encoder, value):
        # The state has to be serialized separately so that the decoder would have a chance to
        # create an empty instance before the shared value references are decoded
        serialized_state = encoder.encode_to_bytes(value.__dict__)
        encoder.encode(CBORTag(3000, serialized_state))

On the decoder side, you will need to initialize an empty instance for shared value lookup before
the object's state (which may contain references to it) is decoded.
This is done with the :meth:`CBORDecoder.set_shareable` method::

    def tag_hook(decoder, tag, shareable_index=None):
        # Return all other tags as-is
        if tag.tag != 3000:
            return tag

        # Create a raw instance before initializing its state to make it possible for cyclic
        # references to work
        instance = MyType.__new__(MyType)
        decoder.set_shareable(instance)

        # Separately decode the state of the new object and then apply it
        state = decoder.decode_from_bytes(tag.value)
        instance.__dict__.update(state)
        return instance

You could then verify that the cyclic references have been restored after deserialization::

    parent = MyType()
    child1 = MyType(parent)
    child2 = MyType(parent)
    serialized = dumps(parent, default=default_encoder, value_sharing=True)

    new_parent = loads(serialized, tag_hook=tag_hook)
    assert new_parent.children[0].parent is new_parent
    assert new_parent.children[1].parent is new_parent

Decoding tagged items as keys
-----------------------------

Since the CBOR specification allows any type to be used as a key in the mapping type, the decoder
provides an ``immutable`` flag that indicates it is expecting an immutable (and by implication
hashable) type. If your custom class cannot be used this way, you can raise an exception when this
flag is set::

    def tag_hook(decoder, tag, shareable_index=None):
        if tag.tag != 3000:
            return tag

        if decoder.immutable:
            raise CBORDecodeError('MyType cannot be used as a key or set member')

        return MyType(*tag.value)

An example where the data could be used as a dict key::

    from collections import namedtuple

    Pair = namedtuple('Pair', 'first second')

    def tag_hook(decoder, tag, shareable_index=None):
        if tag.tag != 4000:
            return tag

        return Pair(*tag.value)

The ``object_hook`` can check for the immutable flag in the same way.