CUDA Host API
=============
.. cuda-deprecated::
Device Management
-----------------
Device detection and enquiry
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The following functions are available for querying the available hardware:
.. autofunction:: numba.cuda.is_available
.. autofunction:: numba.cuda.detect
Context management
~~~~~~~~~~~~~~~~~~
CUDA Python functions execute within a CUDA context. Each CUDA device in a
system has an associated CUDA context, and Numba presently allows only one context
per thread. For further details on CUDA Contexts, refer to the `CUDA Driver API
Documentation on Context Management
<http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html>`_ and the
`CUDA C Programming Guide Context Documentation
<http://docs.nvidia.com/cuda/cuda-c-programming-guide/#context>`_. CUDA Contexts
are instances of the :class:`~numba.cuda.cudadrv.driver.Context` class:
.. autoclass:: numba.cuda.cudadrv.driver.Context
   :members: reset, get_memory_info, push, pop
The following functions can be used to get or select the context:
.. autofunction:: numba.cuda.current_context
.. autofunction:: numba.cuda.require_context
The following functions affect the current context:
.. autofunction:: numba.cuda.synchronize
.. autofunction:: numba.cuda.close
Device management
~~~~~~~~~~~~~~~~~
Numba maintains a list of supported CUDA-capable devices:
.. attribute:: numba.cuda.gpus

   An indexable list of supported CUDA devices. This list is indexed by integer
   device ID.

Alternatively, the current device can be obtained:

.. attribute:: numba.cuda.gpus.current

   The currently-selected device.
Getting a device through :attr:`numba.cuda.gpus` always provides an instance of
:class:`numba.cuda.cudadrv.devices._DeviceContextManager`, which acts as a
context manager for the selected device:
.. autoclass:: numba.cuda.cudadrv.devices._DeviceContextManager
The following three functions select a device (and with it a context), return
the current device, and list the available devices:
.. autofunction:: numba.cuda.select_device
.. autofunction:: numba.cuda.get_current_device
.. autofunction:: numba.cuda.list_devices
The :class:`numba.cuda.cudadrv.driver.Device` class can be used to enquire about
the functionality of the selected device:
.. class:: numba.cuda.cudadrv.driver.Device

   The device associated with a particular context.

   .. attribute:: compute_capability

      A tuple, *(major, minor)*, indicating the supported compute capability.

   .. attribute:: id

      The integer ID of the device.

   .. attribute:: name

      The name of the device (e.g. "GeForce GTX 970").

   .. attribute:: uuid

      The UUID of the device (e.g. "GPU-e6489c45-5b68-3b03-bab7-0e7c8e809643").

   .. method:: reset

      Delete the context for the device. This will destroy all memory
      allocations, events, and streams created within the context.

   .. attribute:: supports_float16

      ``True`` if the device supports float16 operations, ``False``
      otherwise.
Compilation
-----------
Numba provides an entry point for compiling a Python function without invoking
any of the driver API. This can be useful for:
- Generating PTX that is to be inlined into other PTX code (e.g. from outside
the Numba / Python ecosystem).
- Generating PTX or LTO-IR to link with objects from non-Python translation
units.
- Generating code when there is no device present.
- Generating code prior to a fork without initializing CUDA.
.. note:: It is the user's responsibility to manage any ABI issues arising from
   the use of compilation to PTX / LTO-IR. Passing the ``abi="c"`` keyword
   argument resolves most such issues - see :ref:`cuda-using-the-c-abi`.
.. autofunction:: numba.cuda.compile
The environment variable ``NUMBA_CUDA_DEFAULT_PTX_CC`` can be set to control
the default compute capability targeted by ``compile`` - see
:ref:`numba-envvars-gpu-support`. If code for the compute capability of the
current device is required, the ``compile_for_current_device`` function can
be used:
.. autofunction:: numba.cuda.compile_for_current_device
Numba also provides two functions, which may appear in legacy code, that
compile to PTX only:
.. autofunction:: numba.cuda.compile_ptx
.. autofunction:: numba.cuda.compile_ptx_for_current_device
Measurement
-----------
.. _cuda-profiling:
Profiling
~~~~~~~~~
The NVIDIA Visual Profiler can be used directly on executing CUDA Python code -
inserting calls to the functions below into user code is not required.
However, these functions can be used to restrict profiling to specific
portions of the code. For further information on profiling, see the `NVIDIA
Profiler User's Guide <https://docs.nvidia.com/cuda/profiler-users-guide/>`_.
.. autofunction:: numba.cuda.profile_start
.. autofunction:: numba.cuda.profile_stop
.. autofunction:: numba.cuda.profiling
.. _events:
Events
~~~~~~
Events can be used to monitor the progress of execution and to record the
timestamps of specific points being reached. Event creation returns immediately,
and the created event can be queried to determine if it has been reached. For
further information, see the `CUDA C Programming Guide Events section
<http://docs.nvidia.com/cuda/cuda-c-programming-guide/#events>`_.
The following functions are used for creating and measuring the time between
events:
.. autofunction:: numba.cuda.event
.. autofunction:: numba.cuda.event_elapsed_time
Events are instances of the :class:`numba.cuda.cudadrv.driver.Event` class:
.. autoclass:: numba.cuda.cudadrv.driver.Event
   :members: query, record, synchronize, wait
.. _streams:
Stream Management
-----------------
Streams allow concurrency of execution on a single device within a given
context. Queued work items in the same stream execute sequentially, but work
items in different streams may execute concurrently. Most operations involving a
CUDA device can be performed asynchronously using streams, including data
transfers and kernel execution. For further details on streams, see the `CUDA C
Programming Guide Streams section
<http://docs.nvidia.com/cuda/cuda-c-programming-guide/#streams>`_.
By default, Numba uses the legacy default stream as its default stream. The
per-thread default stream can be made the default stream by setting the
environment variable ``NUMBA_CUDA_PER_THREAD_DEFAULT_STREAM`` to ``1`` (see the
:ref:`CUDA Environment Variables section <numba-envvars-gpu-support>`).
Regardless of this setting, the objects representing the legacy and per-thread
default streams can be constructed using the functions below.
Streams are instances of :class:`numba.cuda.cudadrv.driver.Stream`:
.. autoclass:: numba.cuda.cudadrv.driver.Stream
   :members: synchronize, auto_synchronize, add_callback, async_done
To create a new stream:
.. autofunction:: numba.cuda.stream
To get the default stream:
.. autofunction:: numba.cuda.default_stream
To get the default stream with an explicit choice of whether it is the legacy
or per-thread default stream:
.. autofunction:: numba.cuda.legacy_default_stream
.. autofunction:: numba.cuda.per_thread_default_stream
To construct a Numba ``Stream`` object using a stream allocated elsewhere, the
``external_stream`` function is provided. Note that the lifetime of external
streams must be managed by the user - Numba will not deallocate an external
stream, and the stream must remain valid whilst the Numba ``Stream`` object is
in use.
.. autofunction:: numba.cuda.external_stream
Runtime
-------
Numba generally uses the Driver API, but it provides a simple wrapper to the
Runtime API so that the version of the runtime in use can be queried. This is
accessed through ``cuda.runtime``, which is an instance of the
:class:`numba.cuda.cudadrv.runtime.Runtime` class:
.. autoclass:: numba.cuda.cudadrv.runtime.Runtime
   :members: get_version, is_supported_version, supported_versions
Whether the current runtime is officially supported and tested with the current
version of Numba can also be queried:
.. autofunction:: numba.cuda.is_supported_version