Welcome to PyCUDA's documentation!
==================================

PyCUDA gives you easy, Pythonic access to `Nvidia <http://nvidia.com>`_'s `CUDA
<http://nvidia.com/cuda/>`_ parallel computation API. Several wrappers of the
CUDA API already exist, so why do we need PyCUDA?


* Object cleanup is tied to the lifetime of objects. This idiom, often
  called
  `RAII <http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization>`_
  in C++, makes it much easier to write correct, leak-free and crash-free
  code. PyCUDA knows about dependencies, too, so (for example) it won't
  detach from a context before all memory allocated in it is also freed.
* Convenience. Abstractions like :class:`pycuda.compiler.SourceModule` and
  :class:`pycuda.gpuarray.GPUArray` make CUDA programming even more
  convenient than with Nvidia's C-based runtime.
* Completeness. PyCUDA puts the full power of CUDA's driver API at your
  disposal, if you wish.
* Automatic error checking. All CUDA errors are automatically translated
  into Python exceptions.
* Speed. PyCUDA's base layer is written in C++, so all the niceties above
  come at virtually no cost.
* Helpful documentation. You're looking at it. ;)


Here's an example to give you an impression::

    import pycuda.autoinit
    import pycuda.driver as drv
    import numpy

    from pycuda.compiler import SourceModule

    # Compile a CUDA kernel that multiplies two arrays element by element.
    mod = SourceModule("""
    __global__ void multiply_them(float *dest, float *a, float *b)
    {
      const int i = threadIdx.x;
      dest[i] = a[i] * b[i];
    }
    """)

    multiply_them = mod.get_function("multiply_them")

    a = numpy.random.randn(400).astype(numpy.float32)
    b = numpy.random.randn(400).astype(numpy.float32)

    dest = numpy.zeros_like(a)
    # drv.In/drv.Out copy the host arrays to and from the device.
    multiply_them(
            drv.Out(dest), drv.In(a), drv.In(b),
            block=(400, 1, 1), grid=(1, 1))

    print(dest - a*b)

(This example is :file:`examples/hello_gpu.py` in the PyCUDA
source distribution.)

On the surface, this program will print a screenful of zeros. Behind
the scenes, a lot more interesting stuff is going on:

* PyCUDA has compiled the CUDA source code and uploaded it to the card.

  .. note:: This code doesn't have to be a constant. You can easily have
     Python generate the code you want to compile. See :ref:`metaprog`.

* PyCUDA's numpy interaction code has automatically allocated space on the
  device, copied the numpy arrays *a* and *b* over, launched a grid
  consisting of a single block of 400x1x1 threads, and copied *dest* back.

  Note that you can just as well keep your data on the card between kernel
  invocations. There is no need to copy data all the time.

* See how there's no cleanup code in the example? That's not because we
  were lazy and just skipped it. It simply isn't needed. PyCUDA will
  automatically infer what cleanup is necessary and do it for you.

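
Two points from the list above can be made concrete in plain Python. Because
the kernel source is just a string, Python code can generate it. And because
the example launches a single block of 400 threads, it only works while the
array fits within the per-block thread limit (1024 threads on devices of
compute capability 2.0 and later). The sketch below assumes simple
``str.format`` templating and a common multi-block pattern in which each
thread computes its global index and threads past the end of the array do
nothing. ``make_elementwise_source`` and ``launch_config`` are hypothetical
helpers written for this illustration, not PyCUDA APIs; PyCUDA's own code
generation support is described in :ref:`metaprog`.

.. code-block:: python

   # Hypothetical kernel template (not a PyCUDA API). Literal C braces are
   # doubled so that str.format leaves them alone.
   KERNEL_TEMPLATE = """
   __global__ void {name}({ctype} *dest, {ctype} *a, {ctype} *b, int n)
   {{
     const int i = blockIdx.x * blockDim.x + threadIdx.x;
     if (i < n)  // the last block may be only partly used
       dest[i] = a[i] {op} b[i];
   }}
   """


   def make_elementwise_source(name, ctype, op):
       """Return CUDA C source computing dest[i] = a[i] <op> b[i]."""
       # Only accept known operators so arbitrary text is not spliced in.
       if op not in ("+", "-", "*", "/"):
           raise ValueError("unsupported operator: %r" % (op,))
       return KERNEL_TEMPLATE.format(name=name, ctype=ctype, op=op)


   def launch_config(n, block_size=256):
       """Return 1-D (block, grid) tuples with enough threads for n elements."""
       if n <= 0 or block_size <= 0:
           raise ValueError("n and block_size must be positive")
       # Ceiling division, so that blocks * block_size >= n.
       blocks = (n + block_size - 1) // block_size
       return (block_size, 1, 1), (blocks, 1)


   src = make_elementwise_source("multiply_them", "float", "*")
   block, grid = launch_config(400, block_size=128)
   print(block, grid)  # (128, 1, 1) (4, 1)

To run this on the GPU, ``src`` would be passed to ``SourceModule`` and the
kernel fetched with ``get_function("multiply_them")`` as in the example above.
The call would then add the element count as a sized NumPy scalar, such as
``numpy.int32(400)``, and pass ``block=block, grid=grid``.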

Curious? Let's get started.

Contents
========

.. toctree::
    :maxdepth: 2

    install
    tutorial
    driver
    util
    gl
    array
    metaprog
    misc

Note that this guide will not explain CUDA programming and technology. Please
refer to Nvidia's `programming documentation
<http://www.nvidia.com/object/cuda_learn.html>`_ for that.

PyCUDA also has its own `web site <http://mathema.tician.de/software/pycuda>`_,
where you can find updates, new versions, documentation, and support.

Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`