File: quickstart.rst

package info (click to toggle)
dask.distributed 2022.12.1%2Bds.1-3
  • links: PTS, VCS
  • area: main
  • in suites: bookworm
  • size: 10,164 kB
  • sloc: python: 81,938; javascript: 1,549; makefile: 228; sh: 100
file content (115 lines) | stat: -rw-r--r-- 2,942 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
Quickstart
==========

Install
-------

::

    $ python -m pip install dask distributed --upgrade

See :doc:`installation <install>` document for more information.


Setup Dask.distributed the Easy Way
-----------------------------------

If you create a client without providing an address it will start up a local
scheduler and worker for you.

.. code-block:: python

   >>> from dask.distributed import Client
   >>> client = Client()  # set up local cluster on your laptop
   >>> client
   <Client: scheduler="127.0.0.1:8786" processes=8 cores=8>


Setup Dask.distributed the Hard Way
-----------------------------------
This allows dask.distributed to use multiple machines as workers.

Set up scheduler and worker processes on your local computer::

   $ dask scheduler
   Scheduler started at 127.0.0.1:8786

   $ dask worker 127.0.0.1:8786
   $ dask worker 127.0.0.1:8786
   $ dask worker 127.0.0.1:8786

.. note:: At least one ``dask worker`` must be running after launching a
          scheduler.

Launch a Client and point it to the IP/port of the scheduler.

.. code-block:: python

   >>> from dask.distributed import Client
   >>> client = Client('127.0.0.1:8786')

See `setup documentation <https://docs.dask.org/en/latest/setup.html>`_ for advanced use.


Map and Submit Functions
~~~~~~~~~~~~~~~~~~~~~~~~

Use the ``map`` and ``submit`` methods to launch computations on the cluster.
The ``map/submit`` functions send the function and arguments to the remote
workers for processing.  They return ``Future`` objects that refer to remote
data on the cluster.  The ``Future`` returns immediately while the computations
run remotely in the background.

.. code-block:: python

   >>> def square(x):
           return x ** 2

   >>> def neg(x):
           return -x

   >>> A = client.map(square, range(10))
   >>> B = client.map(neg, A)
   >>> total = client.submit(sum, B)
   >>> total.result()
   -285


Gather
~~~~~~

The ``map/submit`` functions return ``Future`` objects, lightweight tokens that
refer to results on the cluster.  By default the results of computations
*stay on the cluster*.

.. code-block:: python

   >>> total  # Function hasn't yet completed
   <Future: status: waiting, key: sum-58999c52e0fa35c7d7346c098f5085c7>

   >>> total  # Function completed, result ready on remote worker
   <Future: status: finished, key: sum-58999c52e0fa35c7d7346c098f5085c7>

Gather results to your local machine either with the ``Future.result`` method
for a single future, or with the ``Client.gather`` method for many futures at
once.

.. code-block:: python

   >>> total.result()   # result for single future
   -285
   >>> client.gather(A) # gather for many futures
   [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


Restart
~~~~~~~

When things go wrong, or when you want to reset the cluster state, call the
``restart`` method.

.. code-block:: python

   >>> client.restart()

See :doc:`client <client>` for advanced use.