Python API
==========
You can create a ``dask.distributed`` scheduler by importing and creating a
``Client`` with no arguments. This overrides whatever default was previously
set.
.. code-block:: python

   from dask.distributed import Client

   client = Client()
You can navigate to ``http://localhost:8787/status`` to see the diagnostic
dashboard if you have Bokeh installed.
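As a quick check that the cluster is up, you can submit a small task and wait for its result. This is a minimal sketch; the ``inc`` function and its argument are arbitrary examples, not part of the Dask API:

.. code-block:: python

   from dask.distributed import Client

   client = Client()  # start a local cluster and connect to it

   def inc(x):
       return x + 1

   future = client.submit(inc, 10)  # run inc(10) on a worker
   assert future.result() == 11     # block until the result arrives

   client.close()                   # shut down workers and scheduler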
Client
------
You can trivially set up a local cluster on your machine by instantiating a Dask
Client with no arguments:

.. code-block:: python

   from dask.distributed import Client

   client = Client()
This sets up a scheduler in your local process, along with a number of workers and
threads per worker chosen based on the number of cores in your machine.
If you want to run workers in the same process, you can pass the
``processes=False`` keyword argument:

.. code-block:: python

   client = Client(processes=False)
This is sometimes preferable if you want to avoid inter-worker communication
and your computations release the GIL. This is common when primarily using
NumPy or Dask Array.
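For instance, a threads-only client works well for a Dask Array computation, since the underlying NumPy operations release the GIL. This is a sketch; the array shape and chunk sizes are arbitrary:

.. code-block:: python

   import dask.array as da
   from dask.distributed import Client

   client = Client(processes=False)  # scheduler and workers share this process

   # NumPy releases the GIL during the heavy work, so threads scale well
   x = da.random.random((1000, 1000), chunks=(250, 250))
   total = x.sum().compute()

   client.close()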
LocalCluster
------------
The ``Client()`` call described above is shorthand for creating a ``LocalCluster``
and then passing that to your client.
.. code-block:: python

   from dask.distributed import Client, LocalCluster

   cluster = LocalCluster()
   client = Client(cluster)
This is equivalent, but somewhat more explicit.
You may want to look at the keyword arguments available on ``LocalCluster`` to
understand the options available to you, such as controlling the mixture of
threads and processes, specifying explicit ports, and so on.
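For instance, a more explicit configuration might look like the following sketch; the specific values here are arbitrary choices, not recommended defaults:

.. code-block:: python

   from dask.distributed import Client, LocalCluster

   cluster = LocalCluster(
       n_workers=2,                # number of workers to start
       threads_per_worker=2,       # threads within each worker
       processes=True,             # run workers as subprocesses, not threads
       dashboard_address=":8787",  # port for the diagnostic dashboard
   )
   client = Client(cluster)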
To create a local cluster with all workers running in dedicated subprocesses,
``dask.distributed`` also offers the experimental ``SubprocessCluster``.
Cluster manager features
------------------------
Instantiating a cluster manager class like ``LocalCluster`` and then passing it to the
``Client`` is a common pattern. Cluster managers also provide useful utilities to help
you understand what is going on.
For example, you can retrieve the dashboard URL:

.. code-block:: python

   >>> cluster.dashboard_link
   'http://127.0.0.1:8787/status'
You can retrieve logs from cluster components:

.. code-block:: python

   >>> cluster.get_logs()
   {'Cluster': '',
    'Scheduler': "distributed.scheduler - INFO - Clear task state\ndistributed.scheduler - INFO - S...
If you are using a cluster manager that supports scaling, you can modify the number
of workers manually, or automatically based on workload:

.. code-block:: python

   >>> cluster.scale(10)  # Sets the number of workers to 10
   >>> cluster.adapt(minimum=1, maximum=10)  # Allows the cluster to auto scale between 1 and 10 workers based on load
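With ``LocalCluster``, manual scaling might be used like the following sketch; the worker counts are arbitrary:

.. code-block:: python

   from dask.distributed import Client, LocalCluster

   cluster = LocalCluster(n_workers=1)
   client = Client(cluster)

   cluster.scale(2)            # ask for two workers
   client.wait_for_workers(2)  # block until both have registered

   client.close()
   cluster.close()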
Reference
---------
.. currentmodule:: distributed.deploy.local

.. autoclass:: LocalCluster
   :members: