File: deploying-python.rst

Python API
==========

You can create a ``dask.distributed`` scheduler by importing and creating a
``Client`` with no arguments.  This registers the new cluster as the default
scheduler for subsequent Dask computations, overriding whatever default was
previously set.

.. code-block:: python

   from dask.distributed import Client
   client = Client()

You can navigate to ``http://localhost:8787/status`` to see the diagnostic
dashboard if you have Bokeh installed.
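
If port 8787 is already taken, the dashboard may be served on a different
port; you can always ask the client for the exact address:

.. code-block:: python

   >>> client.dashboard_link
   'http://127.0.0.1:8787/status'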

Client
------

You can trivially set up a local cluster on your machine by instantiating a
Dask ``Client`` with no arguments:

.. code-block:: python

   from dask.distributed import Client
   client = Client()

This sets up a scheduler in your local process, along with a number of workers
and threads per worker chosen based on the number of cores in your machine.
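
If you would rather choose these numbers yourself, you can pass them
explicitly; keyword arguments like these are forwarded to ``LocalCluster``:

.. code-block:: python

   from dask.distributed import Client

   # Four worker processes with two threads each, eight threads in total
   client = Client(n_workers=4, threads_per_worker=2)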

If you want to run workers in the same process as your client, you can pass
the ``processes=False`` keyword argument.

.. code-block:: python

   client = Client(processes=False)

This is sometimes preferable if you want to avoid inter-worker communication
and your computations release the GIL.  This is common when primarily using
NumPy or Dask Array.
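
For example, a minimal sketch of a Dask Array computation that benefits from
this setup, since NumPy releases the GIL during numerical work:

.. code-block:: python

   import dask.array as da
   from dask.distributed import Client

   client = Client(processes=False)  # workers are threads in this process

   # NumPy releases the GIL, so worker threads can compute chunks in parallel
   x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))
   print(x.mean().compute())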


LocalCluster
------------

The ``Client()`` call described above is shorthand for creating a
``LocalCluster`` and then passing that to your client.

.. code-block:: python

   from dask.distributed import Client, LocalCluster
   cluster = LocalCluster()
   client = Client(cluster)

This is equivalent, but somewhat more explicit.

You may want to look at the keyword arguments available on ``LocalCluster``
to understand the options available to you, such as controlling the mixture
of threads and processes, or specifying explicit ports.
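
For example, a sketch using a few of the more common options (see the
reference at the bottom of this page for the full list):

.. code-block:: python

   from dask.distributed import Client, LocalCluster

   cluster = LocalCluster(
       n_workers=2,                # two worker processes
       threads_per_worker=4,       # four threads in each worker
       processes=True,             # use worker processes rather than threads
       scheduler_port=8786,        # fixed scheduler port instead of a random one
       dashboard_address=":8787",  # port for the diagnostic dashboard
   )
   client = Client(cluster)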

To create a local cluster with all workers running in dedicated subprocesses, 
``dask.distributed`` also offers the experimental ``SubprocessCluster``.
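
A minimal sketch, assuming ``SubprocessCluster`` is importable from
``dask.distributed`` like the other cluster managers; as an experimental API
it may change, and it is not available on all platforms:

.. code-block:: python

   from dask.distributed import Client, SubprocessCluster

   cluster = SubprocessCluster()  # every worker runs in its own subprocess
   client = Client(cluster)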

Cluster manager features
------------------------

Instantiating a cluster manager class like ``LocalCluster`` and then passing it to the
``Client`` is a common pattern. Cluster managers also provide useful utilities to help
you understand what is going on.

For example, you can retrieve the dashboard URL.

.. code-block:: python

   >>> cluster.dashboard_link
   'http://127.0.0.1:8787/status'

You can retrieve logs from cluster components.

.. code-block:: python

   >>> cluster.get_logs()
   {'Cluster': '',
    'Scheduler': "distributed.scheduler - INFO - Clear task state\ndistributed.scheduler - INFO -   S...

If you are using a cluster manager that supports scaling, you can modify the
number of workers manually, or automatically based on workload.

.. code-block:: python

   >>> cluster.scale(10)  # Sets the number of workers to 10

   >>> cluster.adapt(minimum=1, maximum=10)  # Scales between 1 and 10 workers based on load
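
Putting this together, a sketch that starts a small local cluster and lets it
grow with the workload:

.. code-block:: python

   from dask.distributed import Client, LocalCluster

   # Start with a single worker and let adaptive scaling add more as needed
   cluster = LocalCluster(n_workers=1)
   cluster.adapt(minimum=1, maximum=10)
   client = Client(cluster)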

Reference
---------

.. currentmodule:: distributed.deploy.local

.. autoclass:: LocalCluster
   :members: