File: web.rst

package info (click to toggle)
dask.distributed 2.10.0+ds.1-3
  • links: PTS, VCS
  • area: main
  • in suites: bullseye, sid
  • size: 7,016 kB
  • sloc: python: 48,042; makefile: 231; sh: 83
file content (229 lines) | stat: -rw-r--r-- 7,623 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
Web Interface
=============

.. raw:: html

    <iframe width="560"
            height="315"
            src="https://www.youtube-nocookie.com/embed/N_GqzcuGLCY"
            frameborder="0"
            allow="autoplay; encrypted-media"
            allowfullscreen>
    </iframe>

Information about the current state of the network helps to track progress,
identify performance issues, and debug failures.

Dask.distributed includes a web interface to help deliver this information over
a normal web page in real time.  This web interface is launched by default
wherever the scheduler is launched if the scheduler machine has Bokeh_
installed (``conda install bokeh -c bokeh``).

These diagnostic pages are:

*   Main Scheduler pages at ``http://scheduler-address:8787``.  These pages,
    particularly the ``/status`` page are the main page that most people
    associate with Dask.  These pages are served from a separate standalone
    Bokeh server application running in a separate process.

The available pages are ``http://scheduler-address:8787/<page>/`` where ``<page>`` is one of

- ``status``: a stream of recently run tasks, progress bars, resource use
- ``tasks``: a larger stream of the last 100k tasks
- ``workers``: basic information about workers and their current load
- ``health``: basic health check, returns ``ok`` if service is running

.. _Bokeh: http://bokeh.pydata.org/en/latest/

Plots
-----

Example Computation
~~~~~~~~~~~~~~~~~~~

The following plots show a trace of the following computation:

.. code-block:: python

   from distributed import Client
   from time import sleep
   import random

   def inc(x):
       sleep(random.random() / 10)
       return x + 1

   def dec(x):
       sleep(random.random() / 10)
       return x - 1

   def add(x, y):
       sleep(random.random() / 10)
       return x + y


   client = Client('127.0.0.1:8786')

   incs = client.map(inc, range(100))
   decs = client.map(dec, range(100))
   adds = client.map(add, incs, decs)
   total = client.submit(sum, adds)

   del incs, decs, adds
   total.result()

Progress
~~~~~~~~

The interface shows the progress of the various computations as well as the
exact number completed.

.. image:: ../../debian/bokeh-progress.gif
   :alt: Resources view of Dask web interface

Each bar is assigned a color according to the function being run.  Each bar
has a few components.  On the left the lighter shade is the number of tasks
that have both completed and have been released from memory.  The darker shade
to the right corresponds to the tasks that are completed and whose data still
reside  in memory.  If errors occur then they appear as a black colored block
to the right.

Typical computations may involve dozens of kinds of functions.  We handle this
visually with the following approaches:

1.  Functions are ordered by the number of total tasks
2.  The colors are assigned in a round-robin fashion from a standard palette
3.  The progress bars shrink horizontally to make space for more functions
4.  Only the largest functions (in terms of number of tasks) are displayed

.. image:: ../../debian/bokeh-progress-large.gif
   :alt: Progress bar plot of Dask web interface

Counts of tasks processing, waiting for dependencies, processing, etc.. are
displayed in the title bar.

Memory Use
~~~~~~~~~~

The interface shows the relative memory use of each function with a horizontal
bar sorted by function name.

.. image:: ../../debian/bokeh-memory-use.gif
   :alt: Memory use plot of Dask web interface

The title shows the number of total bytes in use.  Hovering over any bar
tells you the specific function and how many bytes its results are actively
taking up in memory.  This does not count data that has been released.

Task Stream
~~~~~~~~~~~

The task stream plot shows when tasks complete on which workers.  Worker cores
are on the y-axis and time is on the x-axis.  As a worker completes a task its
start and end times are recorded and a rectangle is added to this plot
accordingly.

.. image:: ../../debian/bokeh-task-stream.gif
   :alt: Task stream plot of Dask web interface

The colors signifying the following:

1.  Serialization (gray)
2.  Communication between workers (red)
3.  Disk I/O (orange)
4.  Error (black)
5.  Execution times (colored by task: purple, green, yellow, etc)


If data transfer occurs between workers a *red* bar appears preceding the
task bar showing the duration of the transfer.  If an error occurs than a
*black* bar replaces the normal color.  This plot show the last 1000 tasks.
It resets if there is a delay greater than 10 seconds.

For a full history of the last 100,000 tasks see the ``tasks/`` page.

Resources
~~~~~~~~~

The resources plot show the average CPU and Memory use over time as well as
average network traffic.  More detailed information on a per-worker basis is
available in the ``workers/`` page.

.. image:: ../../debian/bokeh-resources.gif
   :alt: Resources view of Dask web interface

Per-worker resources
~~~~~~~~~~~~~~~~~~~~

The ``workers/`` page shows per-worker resources, the main ones being CPU and
memory use. Custom metrics can be registered and displayed in this page. Here
is an example showing how to display GPU utilization and GPU memory use:

.. code-block:: python

   import subprocess

   def nvidia_data(name):
       def dask_function(dask_worker):
           cmd = 'nvidia-smi --query-gpu={} --format=csv,noheader'.format(name)
           result = subprocess.check_output(cmd.split())
           return result.strip().decode()
       return dask_function

   def register_metrics(dask_worker):
       for name in ['utilization.gpu', 'utilization.memory']:
           dask_worker.metrics[name] = nvidia_data(name)

   client.run(register_metrics)

Connecting to Web Interface
---------------------------

Default
~~~~~~~

By default, ``dask-scheduler`` prints out the address of the web interface::

   INFO -  Bokeh UI at:  http://10.129.39.91:8787/status
   ...
   INFO - Starting Bokeh server on port 8787 with applications at paths ['/status', '/tasks']

The machine hosting the scheduler runs an HTTP server serving at that address.


Troubleshooting
---------------

Some clusters restrict the ports that are visible to the outside world.  These
ports may include the default port for the web interface, ``8787``.  There are
a few ways to handle this:

1.  Open port ``8787`` to the outside world.  Often this involves asking your
    cluster administrator.
2.  Use a different port that is publicly accessible using the
    ``--dashboard-address :8787`` option on the ``dask-scheduler`` command.
3.  Use fancier techniques, like `Port Forwarding`_

Running distributed on a remote machine can cause issues with viewing the web
UI -- this depends on the remote machines network configuration.

.. _`Port Forwarding`: https://en.wikipedia.org/wiki/Port_forwarding


Port Forwarding
~~~~~~~~~~~~~~~

If you have SSH access then one way to gain access to a blocked port is through
SSH port forwarding. A typical use case looks like the following:

.. code:: bash

   local$ ssh -L 8000:localhost:8787 user@remote
   remote$ dask-scheduler  # now, the web UI is visible at localhost:8000
   remote$ # continue to set up dask if needed -- add workers, etc

It is then possible to go to ``localhost:8000`` and see Dask Web UI. This same approach is
not specific to dask.distributed, but can be used by any service that operates over a
network, such as Jupyter notebooks. For example, if we chose to do this we could
forward port 8888 (the default Jupyter port) to port 8001 with
``ssh -L 8001:localhost:8888 user@remote``.