Extend `sizeof`
===============

When Dask needs to compute the size of an object in bytes, e.g. to determine which objects to spill to disk, it uses the ``dask.sizeof.sizeof`` registration mechanism. Users who need to define a ``sizeof`` implementation for their own objects can use ``sizeof.register``:

.. code-block:: python

   >>> import numpy as np
   >>> from dask.sizeof import sizeof
   >>> @sizeof.register(np.ndarray)
   ... def sizeof_numpy_like(array):
   ...     return array.nbytes
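
The same decorator can be applied to user-defined classes. The following is a minimal sketch, in which ``Image`` is a hypothetical container class invented purely for illustration. Its implementation delegates to ``sizeof`` for each attribute, and so relies on whatever implementations are already registered for those attribute types:

.. code-block:: python

   import numpy as np
   from dask.sizeof import sizeof


   class Image:
       """Hypothetical container holding pixel data and a metadata dict."""

       def __init__(self, pixels, metadata):
           self.pixels = pixels
           self.metadata = metadata


   @sizeof.register(Image)
   def sizeof_image(image):
       # Sum the sizes reported for each attribute instead of relying on
       # the size of the wrapper object alone
       return sizeof(image.pixels) + sizeof(image.metadata)


   image = Image(np.zeros((100, 100), dtype="float64"), {"name": "blank"})
   total = sizeof(image)  # at least the 80,000 bytes held by ``pixels``
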

To register the implementation with Dask, this code must be executed, for example by placing it in one of the library's modules such as ``__init__.py``. However, this places a maintenance burden on the developers of that library, and if they do not accept the patch, the code must be imported manually on every worker.

Therefore, Dask also exposes an `entrypoint <https://packaging.python.org/specifications/entry-points/>`_ under the group ``dask.sizeof`` to enable third-party libraries to develop and maintain these ``sizeof`` implementations. 

For a fictitious library ``numpy_sizeof_dask.py``, the necessary ``setup.cfg`` configuration would be as follows:

.. code-block:: ini

   [options.entry_points]
   dask.sizeof = 
      numpy = numpy_sizeof_dask:sizeof_plugin

whilst ``numpy_sizeof_dask.py`` would contain:

.. code-block:: python

   >>> import numpy as np
   >>> def sizeof_plugin(sizeof):
   ...     @sizeof.register(np.ndarray)
   ...     def sizeof_numpy_like(array):
   ...         return array.nbytes

Upon the first import of ``dask.sizeof``, Dask calls the entrypoint (``sizeof_plugin``) with the ``dask.sizeof.sizeof`` object, which the plugin can then use to register a ``sizeof`` implementation.
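
The loading step described above can be sketched with the standard library alone. This is a simplified illustration rather than Dask's actual loader: ``load_sizeof_plugins`` is a hypothetical helper name, and the ``group`` keyword of ``importlib.metadata.entry_points`` assumes Python 3.10 or later:

.. code-block:: python

   from importlib.metadata import entry_points


   def load_sizeof_plugins(registry, group="dask.sizeof"):
       """Call every plugin advertised under ``group`` with ``registry``.

       Hypothetical helper for illustration; not part of Dask's API.
       """
       loaded = []
       for entry_point in entry_points(group=group):
           # Import the module and look up the advertised callable,
           # e.g. ``numpy_sizeof_dask:sizeof_plugin``
           registrar = entry_point.load()
           # Hand over the dispatch object so the plugin can register
           # its implementations on it
           registrar(registry)
           loaded.append(entry_point.name)
       return loaded

Iterating over ``entry_points(group="dask.sizeof")`` and printing each entry's ``name`` and ``value`` is also a quick way to check that a plugin package was installed with the expected configuration.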