Customize Initialization
========================

Often we want to run custom code when we start up or tear down a scheduler or
worker.  We might do this manually with functions like ``Client.run`` or
``Client.run_on_scheduler``, but this is error-prone and difficult to automate.

To resolve this, Dask includes a few mechanisms to run arbitrary code around
the lifecycle of a Scheduler, Worker, Nanny, or Client.

Preload Scripts
---------------

Both ``dask-scheduler`` and ``dask-worker`` support a ``--preload`` option that
allows custom initialization of each scheduler/worker respectively. A module or
Python file passed as a ``--preload`` value is guaranteed to be imported before
establishing any connection. A ``dask_setup(service)`` function is called if
found, with a ``Scheduler``, ``Worker``, ``Nanny``, or ``Client`` instance as
the argument. As the service stops, ``dask_teardown(service)`` is called if
present.
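
As a minimal sketch of such a module, the hypothetical file below defines both
hooks for a worker.  The file name, the ``started_at`` attribute, and the
printed messages are illustrative assumptions rather than part of Dask; only
the ``dask_setup`` and ``dask_teardown`` hook names come from the description
above.

.. code-block:: python

   # worker-setup.py (hypothetical file name)
   import time


   def dask_setup(worker):
       # Called with the service instance once the module has been imported.
       # ``started_at`` is an illustrative attribute, not a Dask API.
       worker.started_at = time.monotonic()
       print("Preload setup for:", type(worker).__name__)


   def dask_teardown(worker):
       # Called as the service stops; report how long it was running.
       uptime = time.monotonic() - worker.started_at
       print(f"Preload teardown after {uptime:.1f}s")
       return uptime

A file like this could then be passed to ``dask-worker`` through the
``--preload`` option.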

To support additional configuration, a single ``--preload`` module may register
additional command-line arguments by exposing ``dask_setup`` as a Click_
command.  This command will be used to parse additional arguments provided to
``dask-worker`` or ``dask-scheduler`` and will be called before service
initialization.

.. _Click: http://click.pocoo.org/

Example
~~~~~~~

As an example, consider the following file that creates a
`scheduler plugin <https://distributed.dask.org/en/latest/plugins.html>`_
and registers it with the scheduler:

.. code-block:: python

   # scheduler-setup.py
   import click

   from distributed.diagnostics.plugin import SchedulerPlugin

   class MyPlugin(SchedulerPlugin):
       def __init__(self, print_count):
           self.print_count = print_count
           super().__init__()

       def add_worker(self, scheduler=None, worker=None, **kwargs):
           print("Added a new worker at:", worker)
           if self.print_count and scheduler is not None:
               print("Total workers:", len(scheduler.workers))

   @click.command()
   @click.option("--print-count/--no-print-count", default=False)
   def dask_setup(scheduler, print_count):
       plugin = MyPlugin(print_count)
       scheduler.add_plugin(plugin)

We can then run this preload script by referring to its filename (or module name
if it is on the path) when we start the scheduler::

   dask-scheduler --preload scheduler-setup.py --print-count

Types
~~~~~

Preloads can be specified in any of the following forms:

-   A path to a script, like ``/path/to/myfile.py``
-   A module name that is on the path, like ``my_module.initialize``
-   The text of a Python script, like ``import os; os.environ["A"] = "value"``

Configuration
~~~~~~~~~~~~~

Preloads can also be registered through configuration, under the following keys:

.. code-block:: yaml

   distributed:
     scheduler:
       preload:
       - "import os; os.environ['A'] = 'b'"  # use Python text
       - /path/to/myfile.py                  # or a filename
       - my_module                           # or a module name
       preload-argv:
       - []                                  # Pass optional keywords
       - ["--option", "value"]
       - []
     worker:
       preload: []
       preload-argv: []
     nanny:
       preload: []
       preload-argv: []
     client:
       preload: []
       preload-argv: []

.. note::

   Because the ``dask-worker`` command needs to accept options for both the
   Worker and the Nanny (if a nanny is used), it has both a ``--preload`` and
   a ``--preload-nanny`` option.  All extra options (like ``--print-count``
   above) are sent to the worker preload scripts rather than to the nanny
   ones.  There is no way to pass extra options to the nanny preload scripts
   on the command line.  If this is necessary, we recommend using the more
   flexible configuration instead.


Worker Lifecycle Plugins
------------------------

You can also create a class with ``setup``, ``teardown``, and ``transition`` methods,
and register that class with the scheduler to give to every worker using the
``Client.register_worker_plugin`` method.
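
As a minimal sketch, the class below counts task state transitions on a
worker.  The class name and the ``counts`` attribute are hypothetical, and the
method signatures are assumed to follow the ``WorkerPlugin`` interface in
``distributed.diagnostics.plugin``.  Depending on the installed version, it may
be preferable to subclass ``WorkerPlugin`` rather than use a plain class.

.. code-block:: python

   from collections import Counter


   class TransitionCounter:
       """Hypothetical plugin that tallies task state transitions."""

       def setup(self, worker):
           # Called when the plugin is attached to a worker.
           self.worker = worker
           self.counts = Counter()

       def teardown(self, worker):
           # Called when the worker shuts down.
           print("Transitions seen:", dict(self.counts))

       def transition(self, key, start, finish, **kwargs):
           # Called whenever a task moves from state ``start`` to ``finish``.
           self.counts[(start, finish)] += 1


   # With a running cluster (not executed here):
   #     client.register_worker_plugin(TransitionCounter())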

.. currentmodule:: distributed

.. autosummary::
   Client.register_worker_plugin

.. automethod:: Client.register_worker_plugin
   :noindex: