.. role:: ref(emphasis)
.. _futhark-cuda(1):
==============
futhark-cuda
==============
SYNOPSIS
========
futhark cuda [options...] <program.fut>
DESCRIPTION
===========
``futhark cuda`` translates a Futhark program to C code invoking CUDA
kernels, and either compiles that C code with a C compiler to an
executable binary program, or produces a ``.h`` and ``.c`` file that
can be linked with other code. The standard Futhark optimisation
pipeline is used.
``futhark cuda`` uses ``-lcuda -lcudart -lnvrtc`` to link. If using
``--library``, you will need to do the same when linking the final
binary.
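As a rough sketch of the library workflow, the following shell fragment
assembles the two commands involved and prints them instead of running
them. The file names ``prog.fut`` and ``main.c`` are hypothetical, and
the sketch assumes that ``--library`` writes ``prog.c`` and ``prog.h``
next to the source file. Only the three ``-l`` flags are taken from the
text above.

```shell
# Hypothetical file names, for illustration only.
prog=prog
main=main.c

# Libraries that futhark cuda itself links against (from the text above).
cuda_libs="-lcuda -lcudart -lnvrtc"

# Step 1: generate a .c and .h file instead of an executable.
gen_cmd="futhark cuda --library ${prog}.fut"

# Step 2: link the generated C file with your own code and the same libraries.
# Assumption: the generated file is named after the program (prog.c).
link_cmd="${CC:-cc} -o main ${main} ${prog}.c ${cuda_libs}"

# Dry run: print the commands rather than executing them.
echo "$gen_cmd"
echo "$link_cmd"
```

The libraries are placed after the source files because some linkers
resolve symbols in command-line order.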
The generated CUDA code can be called from multiple CPU threads, as it
brackets every API operation with ``cuCtxPushCurrent()`` and
``cuCtxPopCurrent()``.
OPTIONS
=======
Accepts the same options as :ref:`futhark-c(1)`.
ENVIRONMENT VARIABLES
=====================
``CC``
The C compiler used to compile the program. Defaults to ``cc`` if
unset.
``CFLAGS``
Space-separated list of options passed to the C compiler. Defaults
to ``-O -std=c99`` if unset.
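The fallback behaviour described above can be illustrated with a small
shell sketch. The helper ``show_c_toolchain`` is hypothetical and only
mirrors the documented defaults; it does not call ``futhark``. The
choice of ``clang`` and ``-O2`` is an arbitrary example, and treating an
empty variable as "set" is an assumption.

```shell
# Hypothetical helper: report the compiler and flags that would apply,
# using the documented defaults when the variables are unset.
show_c_toolchain() {
    echo "CC=${CC-cc} CFLAGS=${CFLAGS--O -std=c99}"
}

# With both variables unset, the documented defaults apply.
defaults=$(unset CC CFLAGS; show_c_toolchain)

# Overriding both variables (done in a subshell, so the caller is unaffected).
override=$(CC=clang; CFLAGS="-O2 -std=c99"; show_c_toolchain)

echo "$defaults"
echo "$override"

# A real one-off override would look like:
#   CC=clang CFLAGS="-O2 -std=c99" futhark cuda prog.fut
```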
EXECUTABLE OPTIONS
==================
Generated executables accept the same options as those generated by
:ref:`futhark-c(1)`. The ``-t`` option behaves as with
:ref:`futhark-opencl(1)`.
The following additional options are accepted.
-h, --help
Print help text to standard output and exit.
--default-thread-block-size=INT
The default size of thread blocks that are launched. Capped to the
hardware limit if necessary.
--default-num-thread-blocks=INT
The default number of thread blocks that are launched.
--default-threshold=INT
The default parallelism threshold used for comparisons when
selecting between code versions generated by incremental flattening.
Intuitively, the amount of parallelism needed to saturate the GPU.
--default-tile-size=INT
The default tile size used when performing two-dimensional tiling
(the thread block size will be the square of the tile size).
--dump-cuda=FILE
Don't run the program, but instead dump the embedded CUDA kernels to
the indicated file. Useful if you want to see what is actually
being executed.
--dump-ptx=FILE
Don't run the program, but instead dump the PTX-compiled version of
the embedded kernels to the indicated file.
--load-cuda=FILE
Instead of using the embedded CUDA kernels, load them from the
indicated file.
--load-ptx=FILE
Load PTX code from the indicated file.
--nvrtc-option=OPT
Add an additional build option to the string passed to NVRTC. Refer
to the CUDA documentation for which options are supported. Be
careful: some options can easily produce invalid results.
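A possible way to combine these options is sketched below as a dry run
that only prints the command lines. The executable name ``./prog``, the
file name ``kernels.cu``, and the numeric values are hypothetical;
suitable sizes depend on the GPU in use.

```shell
# Hypothetical executable, assumed to have been built by futhark cuda.
exe=./prog

# Example values only; tune them for the actual hardware.
block_size=256
num_blocks=128

# Run with explicit launch defaults.
run_cmd="$exe --default-thread-block-size=$block_size --default-num-thread-blocks=$num_blocks"

# Dump the embedded kernels for inspection (the program itself is not run).
dump_cmd="$exe --dump-cuda=kernels.cu"

# Run again using the (possibly hand-edited) kernels from that file.
load_cmd="$exe --load-cuda=kernels.cu"

# Dry run: print the commands rather than executing them.
echo "$run_cmd"
echo "$dump_cmd"
echo "$load_cmd"
```

Dumping, editing, and reloading the kernels in this way can be handy
for experiments, since the program does not have to be recompiled.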
ENVIRONMENT
===========
If run without ``--library``, ``futhark cuda`` will invoke a C
compiler to compile the generated C program into a binary. This only
works if the C compiler can find the necessary CUDA libraries. On
most systems, CUDA is installed in ``/usr/local/cuda``, which is
usually not part of the default compiler search path. You may need to
set the following environment variables before running ``futhark
cuda``::
    LIBRARY_PATH=/usr/local/cuda/lib64
    LD_LIBRARY_PATH=/usr/local/cuda/lib64
    CPATH=/usr/local/cuda/include
At runtime the generated program must be able to find the CUDA
installation directory, which is normally located at
``/usr/local/cuda``. If you have CUDA installed elsewhere, set any of
the ``CUDA_HOME``, ``CUDA_ROOT``, or ``CUDA_PATH`` environment
variables to the proper directory.
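The settings described in this section could be collected in a shell
snippet such as the following sketch. It assumes that a non-default
installation uses the same ``lib64`` and ``include`` layout as
``/usr/local/cuda``, and the variable ``cuda_dir`` is introduced here
only for illustration. Existing search paths are kept by prepending
rather than overwriting.

```shell
# Use CUDA_HOME if already set, otherwise the usual default location.
cuda_dir="${CUDA_HOME:-/usr/local/cuda}"

# Compile-time search paths for the C compiler invoked by futhark cuda.
# Assumption: libraries live in lib64 and headers in include.
export LIBRARY_PATH="$cuda_dir/lib64${LIBRARY_PATH:+:$LIBRARY_PATH}"
export CPATH="$cuda_dir/include${CPATH:+:$CPATH}"

# Run-time search path for the dynamic linker.
export LD_LIBRARY_PATH="$cuda_dir/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Lets the generated program locate the CUDA installation at run time.
export CUDA_HOME="$cuda_dir"

echo "Using CUDA installation at $CUDA_HOME"
```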
SEE ALSO
========
:ref:`futhark-opencl(1)`