File: addendum.rst

package info (click to toggle)
bidict 0.23.1-2
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 1,148 kB
  • sloc: python: 1,603; makefile: 157; sh: 114; javascript: 25; xml: 9
file content (272 lines) | stat: -rw-r--r-- 7,870 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
Addendum
========

Performance
-----------

Bidict is written to be as performant as possible
without sacrificing other important goals,
such as safety, portability, and maintainability.

In general, using a bidict to maintain a bidirectional mapping
should exhibit about the same performance as
keeping two mutually-inverse one-directional mappings
in sync manually.
The test suite includes benchmarks so that bidict's performance
can be continuously measured and improved.

If you spot an opportunity to improve bidict's performance further,
please don't hesitate to
:doc:`file an issue or submit a pull request <contributors-guide>`.


``bidict`` Avoids Reference Cycles
----------------------------------

A careful reader might notice the following...

.. doctest::

   >>> fwd = bidict(one=1)
   >>> inv = fwd.inverse
   >>> inv.inverse is fwd
   True

...and worry that a :class:`~bidict.bidict` and its inverse
create a reference cycle.
If this were true,
in CPython this would mean that the memory for a :class:`~bidict.bidict`
could not be immediately reclaimed when you retained no more references to it,
but rather would have to wait for the next garbage collection to kick in
before it could be reclaimed.

However, :class:`~bidict.bidict`\s use a :class:`weakref.ref`
to store the inverse reference in one direction,
avoiding the strong reference cycle.
As a result, when you no longer retain
any references to a :class:`~bidict.bidict` you create,
you can be sure that its refcount in CPython drops to zero,
and that its memory will therefore be reclaimed immediately.

.. note::

   In PyPy this does not occur, as PyPy doesn't use reference counts.
   The memory for unreferenced objects in PyPy is only reclaimed
   when GC kicks in, which is unpredictable.


Terminology
-----------

- It's intentional that the term "inverse" is used rather than "reverse".

  Consider a collection of *(k, v)* pairs.
  Taking the reverse of the collection can only be done if it is ordered,
  and (as you'd expect) reverses the order of the pairs in the collection.
  But each original *(k, v)* pair remains in the resulting collection.

  By contrast, taking the inverse of such a collection
  neither requires the collection to be ordered
  nor guarantees any ordering in the result,
  but rather just replaces every *(k, v)* pair
  with the inverse pair *(v, k)*.

- "keys" and "values" could perhaps more properly be called
  "primary keys" and "secondary keys" (as in a database),
  or even "forward keys" and "inverse keys", respectively.
  :mod:`bidict` sticks with the terms "keys" and "values"
  for the sake of familiarity and to avoid potential confusion,
  but technically values are also keys themselves.

  Concretely, this allows :class:`~bidict.bidict`\s
  to return a set-like (*dict_keys*) object
  for :meth:`~bidict.BidictBase.values`,
  rather than a non-set-like *dict_values* object.


Missing ``bidict``\s in the Standard Library
--------------------------------------------

The Python standard library actually contains some examples
where :class:`~bidict.bidict`\s could be used for fun and profit
(depending on your ideas of fun and profit):

- The :mod:`logging` module
  contains a private ``_levelToName`` dict
  which maps integer levels like *10* to their string names like *DEBUG*.
  If I had a nickel for every time I wanted that exposed in a bidirectional map
  (and as a public attribute, no less),
  I bet I could afford some better turns of phrase.

- The :mod:`dis` module
  maintains a mapping from opnames to opcodes
  ``dis.opmap``
  and a separate list of opnames indexed by opcode
  ``dis.opnames``.
  These could be combined into a single bidict.

- Python 3's
  :mod:`html.entities` module
  maintains separate
  ``html.entities.name2codepoint`` and
  ``html.entities.codepoint2name`` dicts.
  These could be combined into a single bidict.


Caveats
-------

Non-Atomic Mutation
^^^^^^^^^^^^^^^^^^^

As with built-in dicts,
mutating operations on a :class:`~bidict.bidict` are not atomic.
If you need to mutate the same :class:`~bidict.bidict` from different threads,
use a
`synchronization primitive <https://docs.python.org/3/library/threading.html#lock-objects>`__
to coordinate access. [#]_

.. [#] *See also:*
       [`2 <https://twitter.com/teozaurus/status/518071391959388160>`__],
       [`3 <https://twitter.com/ph1/status/943240854419922945>`__]


Equivalent but distinct :class:`~collections.abc.Hashable`\s
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Consider the following:

.. doctest::

   >>> d = {1: int, 1.0: float}

How many items do you expect *d* to contain?
The actual result might surprise you:

.. doctest::

   >>> len(d)
   1

And similarly,

.. doctest::

   >>> {1: int, 1.0: float, 1+0j: complex, True: bool}
   {1: <class 'bool'>}
   >>> 1+0j in {True}
   True

(Note that ``1 == 1.0 == 1+0j == True``.)

This illustrates that a mapping cannot contain two items
with equivalent but distinct keys
(and likewise a set cannot contain two equivalent but distinct elements).
If an object that is being looked up in a set or mapping
is equal to a contained object,
the contained object will be found,
even if it is distinct.

With a :class:`~bidict.bidict`,
since values function as keys in the inverse mapping,
this behavior occurs in the inverse direction too,
and means that a :class:`~bidict.bidict` can end up with a different
but equivalent key from the corresponding value
in its own inverse:

.. doctest::

   >>> b = bidict({'false': 0})
   >>> b.forceput('FALSE', False)
   >>> b
   bidict({'FALSE': False})
   >>> b.inverse
   bidict({0: 'FALSE'})


*nan* as a Key
^^^^^^^^^^^^^^

In CPython, *nan* is especially tricky when used as a dictionary key:

.. doctest::

   >>> d = {float('nan'): 'nan'}
   >>> d
   {nan: 'nan'}
   >>> d[float('nan')]  # doctest: +SKIP
   Traceback (most recent call last):
       ...
   KeyError: nan
   >>> d[float('nan')] = 'not overwritten'
   >>> d  # doctest: +SKIP
   {nan: 'nan', nan: 'not overwritten'}

In other Python implementations such as PyPy,
*nan* behaves just like any other dictionary key.
But in CPython, beware of this unexpected behavior,
which applies to :class:`~bidict.bidict`\s too.
:mod:`bidict` contains no special-case logic
for dealing with *nan* as a key,
so bidict's behavior will match :class:`dict`'s
on whatever runtime you're using.

See e.g. `these docs
<https://doc.pypy.org/en/latest/cpython_differences.html>`__
for more info (search the page for "nan").


Simultaneous Assignment
^^^^^^^^^^^^^^^^^^^^^^^

:class:`~bidict.bidict`\s may behave differently
from dicts with respect to so-called "simultaneous assignment".

Consider the following:

.. doctest::

   >>> m = {'a': 'a', 'b': 'b'}
   >>> m['a'], m['b'] = m['b'], m['a']  # swap two values
   >>> m
   {'a': 'b', 'b': 'a'}

With a :class:`~bidict.bidict`,
simultaneous assignment cannot be used
to swap two values in this way:

.. doctest::

   >>> m = bidict({'a': 'a', 'b': 'b'})
   >>> m['a'], m['b'] = m['b'], m['a']
   Traceback (most recent call last):
       ...
   bidict.KeyAndValueDuplicationError: ('a', 'b')

This is because "simultaneous" assignments like the above
are `by definition <https://docs.python.org/3/reference/simple_stmts.html#assignment-statements>`__
just syntax sugar for:

.. code-block:: python

   # desugaring: m['a'], m['b'] = m['b'], m['a']
   tmp = (m['b'], m['a'])
   m['a'] = tmp[0]
   m['b'] = tmp[1]

and so the intermediate ``m['a'] = tmp[0]`` assignment
raises :class:`~bidict.KeyAndValueDuplicationError`
before the second half of the swap assignment has a chance to run.

For a working alternative, you can write:

.. doctest::

   >>> m.forceupdate({m['a']: m['b'], m['b']: m['a']})
   >>> m
   bidict({'a': 'b', 'b': 'a'})

----

For more in this vein,
check out :doc:`learning-from-bidict`.