File: gpu.rst

package info (click to toggle)
dask 2021.01.0%2Bdfsg-1
  • links: PTS, VCS
  • area: main
  • in suites: bullseye
  • size: 9,172 kB
  • sloc: python: 74,608; javascript: 186; makefile: 150; sh: 94
file content (161 lines) | stat: -rw-r--r-- 5,820 bytes parent folder | download | duplicates (3)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
GPUs
====

Dask works with GPUs in a few ways.


Custom Computations
-------------------

Many people use Dask alongside GPU-accelerated libraries like PyTorch and
TensorFlow to manage workloads across several machines.  They typically use
Dask's custom APIs, notably :doc:`Delayed <delayed>` and :doc:`Futures
<futures>`.

Dask doesn't need to know that these functions use GPUs.  It just runs Python
functions.  Whether or not those Python functions use a GPU is orthogonal to
Dask.  It will work regardless.

As a worked example, you may want to view this talk:

.. raw:: html

    <video width="560" height="315" controls>
        <source src="https://developer.download.nvidia.com/video/gputechconf/gtc/2019/video/S9198/s9198-dask-and-v100s-for-fast-distributed-batch-scoring-of-computer-vision-workloads.mp4"
                type="video/mp4">
    </video>

High Level Collections
----------------------

Dask can also help to scale out large array and dataframe computations by
combining the Dask Array and DataFrame collections with a GPU-accelerated
array or dataframe library.

Recall that :doc:`Dask Array <array>` creates a large array out of many NumPy
arrays and :doc:`Dask DataFrame <dataframe>` creates a large dataframe out of
many Pandas dataframes.  We can use these same systems with GPUs if we swap out
the NumPy/Pandas components with GPU-accelerated versions of those same
libraries, as long as the GPU accelerated version looks enough like
NumPy/Pandas in order to interoperate with Dask.

Fortunately, libraries that mimic NumPy, Pandas, and Scikit-Learn on the GPU do
exist.


DataFrames
~~~~~~~~~~

The `RAPIDS <https://rapids.ai>`_ libraries provide a GPU accelerated
Pandas-like library,
`cuDF <https://github.com/rapidsai/cudf>`_,
which interoperates well and is tested against Dask DataFrame.

If you have cuDF installed then you should be able to convert a Pandas-backed
Dask DataFrame to a cuDF-backed Dask DataFrame as follows:

.. code-block:: python

   import cudf

   df = df.map_partitions(cudf.from_pandas)  # convert pandas partitions into cudf partitions

However, cuDF does not support the entire Pandas interface, and so a variety of
Dask DataFrame operations will not function properly. Check the
`cuDF API Reference <https://docs.rapids.ai/api/cudf/stable/>`_
for currently supported interface.


Arrays
~~~~~~

.. note:: Dask's integration with CuPy relies on features recently added to
   NumPy and CuPy, particularly in version ``numpy>=1.17`` and ``cupy>=6``

`Chainer's CuPy <https://cupy.chainer.org/>`_ library provides a GPU
accelerated NumPy-like library that interoperates nicely with Dask Array.

If you have CuPy installed then you should be able to convert a NumPy-backed
Dask Array into a CuPy backed Dask Array as follows:

.. code-block:: python

   import cupy

   x = x.map_blocks(cupy.asarray)

CuPy is fairly mature and adheres closely to the NumPy API.  However, small
differences do exist and these can cause Dask Array operations to function
improperly. Check the
`CuPy Reference Manual <https://docs-cupy.chainer.org/en/stable/reference/index.html>`_
for API compatibility.


Scikit-Learn
~~~~~~~~~~~~

There are a variety of GPU accelerated machine learning libraries that follow
the Scikit-Learn Estimator API of fit, transform, and predict.  These can
generally be used within `Dask-ML's <https://ml.dask.org>`_ meta estimators,
such as `hyper parameter optimization <https://ml.dask.org/hyper-parameter-search.html>`_.

Some of these include:

-  `Skorch <https://skorch.readthedocs.io/>`_
-  `cuML <https://rapidsai.github.io/projects/cuml/en/latest/>`_
-  `LightGBM <https://github.com/Microsoft/LightGBM>`_
-  `XGBoost <https://xgboost.readthedocs.io/en/latest/>`_
-  `Thunder SVM <https://github.com/Xtra-Computing/thundersvm>`_
-  `Thunder GBM <https://github.com/Xtra-Computing/thundergbm>`_


Setup
-----

From the examples above we can see that the user experience of using Dask with
GPU-backed libraries isn't very different from using it with CPU-backed
libraries.  However, there are some changes you might consider making when
setting up your cluster.

Restricting Work
~~~~~~~~~~~~~~~~

By default Dask allows as many tasks as you have CPU cores to run concurrently.
However if your tasks primarily use a GPU then you probably want far fewer
tasks running at once.  There are a few ways to limit parallelism here:

-   Limit the number of threads explicitly on your workers using the
    ``--nthreads`` keyword in the CLI or the ``ncores=`` keyword the
    Cluster constructor.
-   Use `worker resources <https://distributed.dask.org/en/latest/resources.html>`_ and tag certain
    tasks as GPU tasks so that the scheduler will limit them, while leaving the
    rest of your CPU cores for other work

Specifying GPUs per Machine
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some configurations may have many GPU devices per node.  Dask is often used to
balance and coordinate work between these devices.

In these situations it is common to start one Dask worker per device, and use
the CUDA environment variable ``CUDA_VISIBLE_DEVICES`` to pin each worker to
prefer one device.

.. code-block:: bash

   # If we have four GPUs on one machine
   CUDA_VISIBLE_DEVICES=0 dask-worker ...
   CUDA_VISIBLE_DEVICES=1 dask-worker ...
   CUDA_VISIBLE_DEVICES=2 dask-worker ...
   CUDA_VISIBLE_DEVICES=3 dask-worker ...

The `Dask CUDA <https://github.com/rapidsai/dask-cuda>`_ project contains some
convenience CLI and Python utilities to automate this process.

Work in Progress
----------------

GPU computing is a quickly moving field today and as a result the information
in this page is likely to go out of date quickly.  We encourage interested
readers to check out `Dask's Blog <https://blog.dask.org>`_ which has more
timely updates on ongoing work.