.. _usage:

Basic Usage
===========

Futhark contains several code generation backends. Each is provided
as a subcommand of the ``futhark`` binary. For example, ``futhark c``
compiles a Futhark program by translating it to sequential C code,
while ``futhark pyopencl`` generates Python code with calls to the
PyOpenCL library. The different compilers all share the same
frontend and optimisation pipeline; only the code generator is
different. They all provide roughly the same command line interface,
but there may be minor differences and quirks due to characteristics
of the specific backends.

There are three main ways of compiling a Futhark program: to an
ordinary executable (by using ``--executable``, which is the default),
to a *server executable* (``--server``), and to a library
(``--library``). Plain executables can be run immediately, but are
mostly useful for testing and benchmarking. Server executables are
discussed in :ref:`server-protocol`. Libraries can be called from
non-Futhark code.

.. _executable:

Compiling to Executable
-----------------------

A Futhark program is stored in a file with the extension ``.fut``. It
can be compiled to an executable program as follows::

    $ futhark c prog.fut

This makes use of the ``futhark c`` compiler, but any of the other
compilers will work as well. The compiler will automatically invoke
``cc`` to produce an executable binary called ``prog``. If we had used
``futhark python`` instead of ``futhark c``, the ``prog`` file would
instead have contained Python code, along with a `shebang`_ for easy
execution. In general, when compiling file ``foo.fut``, the result will
be written to a file ``foo`` (i.e. the extension will be stripped off).
This can be overridden using the ``-o`` option. For more details on
specific compilers, see their individual manual pages.

.. _shebang: https://en.wikipedia.org/wiki/Shebang_%28Unix%29

Executables generated by the various Futhark compilers share a common
command-line interface, but may also individually support more
options. When a Futhark program is run, execution starts at one of
its *entry points*. By default, the entry point named ``main`` is
run. An alternative entry point can be indicated by using the ``-e``
option. All entry point functions must be declared appropriately in
the program (see :ref:`entry-points`). If the entry point takes any
parameters, these will be read from standard input in a subset of the
Futhark syntax. A binary input format is also supported; see
:ref:`binary-data-format`. The result of the entry point is printed
to standard output.

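As an illustration of preparing such input, the following C sketch
formats a one-dimensional ``i32`` array as a textual Futhark value such
as ``[1i32, -2i32, 3i32]``, which could then be piped to the
executable's standard input. The helper ``format_i32_array`` is
hypothetical (it is not part of Futhark), and it assumes a non-empty
array, since empty arrays need a different literal form.

.. code-block:: c

    #include <inttypes.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical helper, not part of Futhark: write a non-empty i32
       array to buf as a value literal such as "[1i32, -2i32, 3i32]".
       Returns the number of characters written (excluding the NUL),
       or -1 if n == 0 or buf is too small. */
    int format_i32_array(char *buf, size_t bufsize,
                         const int32_t *xs, size_t n) {
      if (n == 0 || bufsize == 0) {
        return -1; /* empty arrays use a different syntax */
      }
      size_t used = 0;
      for (size_t i = 0; i < n; i++) {
        /* Opening bracket before the first element, ", " between
           elements, closing bracket after the last one. */
        int w = snprintf(buf + used, bufsize - used, "%s%" PRId32 "i32%s",
                         i == 0 ? "[" : "", xs[i], i + 1 == n ? "]" : ", ");
        if (w < 0 || (size_t)w >= bufsize - used) {
          return -1; /* output would have been truncated */
        }
        used += (size_t)w;
      }
      return (int)used;
    }

For example, the resulting string could be supplied as
``echo '[1i32, -2i32, 3i32]' | ./prog`` to a program whose ``main``
entry point takes a single ``[]i32`` parameter.
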
Only a subset of all Futhark values can be passed to an executable.
Specifically, only primitives and arrays of primitive types are
supported. In particular, nested tuples and arrays of tuples are not
permitted. Non-nested tuples are supported, and are treated simply as
flat sequences of values. This restriction is not present for Futhark
programs compiled to libraries. If an entry point *returns* any such
unsupported value, its printed representation is unspecified. As a
special case, an entry point is allowed to return a flat tuple.

Instead of compiling, there is also an interpreter, accessible as
``futhark run`` and ``futhark repl``. The latter is an interactive
prompt, useful for experimenting with Futhark expressions. Be aware
that the interpreter runs code very slowly.

.. _executable-options:

Executable Options
^^^^^^^^^^^^^^^^^^

All generated executables support the following options.

``-h/--help``
  Print help text to standard output and exit.

``-D/--debugging``
  Print debugging information on standard error. Exactly what is
  printed, and how it looks, depends on which Futhark compiler is
  used. This option may also enable more conservative (and slower)
  execution, such as frequently synchronising to check for errors.
  This implies ``--log``.

``-L/--log``
  Print low-overhead logging information during initialisation and
  during execution of entry points. Enabling this option should not
  affect program performance.

``--cache-file FILE``
  Create (if necessary) and use data in the provided cache file to
  speed up subsequent launches of the same program. The cache file
  is automatically updated by the running program as necessary. It
  is safe to delete at any time, and will be recreated as necessary.

``--print-params``
  Print a list of tuning parameters followed by their *parameter
  class* in parentheses, which indicates what they are used for.

``--param SIZE=VALUE``
  Set one of the tunable sizes to the given value. Using the
  ``--tuning`` option is more convenient.

``--tuning FILE``
  Load tuning options from the indicated *tuning file*. The file
  must contain lines of the form ``SIZE=VALUE``, where each *SIZE*
  must be one of the sizes listed by the ``--print-params`` option
  (without the parameter class), and the *VALUE* must be a
  non-negative integer. Extraneous spaces or blank lines are not
  allowed. A zero means to use the default size, whatever it may be.
  In case of duplicate assignments to the same size, the last one
  takes precedence. This is equivalent to passing each size setting
  on the command line using the ``--param`` option, but more
  convenient.

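To make the tuning file format concrete, the following C sketch
validates a single line according to the rules above. The function
``parse_tuning_line`` is a hypothetical helper, not Futhark's own
parser, and it does not check that *SIZE* is actually one of the names
listed by ``--print-params``.

.. code-block:: c

    #include <ctype.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper: parse one tuning-file line "SIZE=VALUE".
       On success, copies SIZE into name (NUL-terminated), stores VALUE
       in *value and returns 0. Returns -1 for malformed lines. A single
       trailing '\n' is accepted. SIZE is not checked against the names
       printed by --print-params. */
    int parse_tuning_line(const char *line, char *name, size_t name_cap,
                          uint64_t *value) {
      const char *eq = strchr(line, '=');
      if (eq == NULL || eq == line) {
        return -1; /* missing '=' (e.g. a blank line) or empty name */
      }
      size_t name_len = (size_t)(eq - line);
      if (name_len >= name_cap) {
        return -1; /* name does not fit in the output buffer */
      }
      for (size_t i = 0; i < name_len; i++) {
        if (isspace((unsigned char)line[i])) {
          return -1; /* extraneous spaces are not allowed */
        }
      }
      const char *p = eq + 1;
      if (!isdigit((unsigned char)*p)) {
        return -1; /* VALUE must be a non-negative integer */
      }
      uint64_t v = 0;
      while (isdigit((unsigned char)*p)) {
        uint64_t d = (uint64_t)(*p - '0');
        if (v > (UINT64_MAX - d) / 10) {
          return -1; /* value too large */
        }
        v = v * 10 + d;
        p++;
      }
      if (*p == '\n') {
        p++;
      }
      if (*p != '\0') {
        return -1; /* trailing characters, including spaces */
      }
      memcpy(name, line, name_len);
      name[name_len] = '\0';
      *value = v;
      return 0;
    }

Applying the parsed lines in file order, and overwriting any earlier
value stored for the same name, gives the last-assignment-wins
behaviour described above.
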
Non-Server Executable Options
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following options are only supported on non-server executables,
because they make no sense in a server context.

``-t/--write-runtime-to FILE``
  Write the time taken to execute the program to the indicated file,
  as an integral number of microseconds. The time taken to perform
  setup or teardown, including reading the input or writing the
  result, is not included in the measurement. See the documentation
  for specific compilers for exactly what is measured.

``-r/--runs RUNS``
  Run the specified entry point the given number of times (plus a
  warmup run). The program result is only printed once, after the
  last run. If combined with ``-t``, one measurement is printed per
  run. This is a good way to perform benchmarking.

``-b/--binary-output``
  Print the result using the binary data format
  (:ref:`binary-data-format`). For large outputs, this is
  significantly faster and takes up less space.

``-n/--no-print-result``
  Do not print the result of running the program.

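When ``-t`` and ``-r`` are combined, the runtime file can be
post-processed to summarise a benchmark. The C sketch below averages
the measurements, assuming they are written as whitespace-separated
integers (one per run); ``mean_runtime_us`` is a hypothetical helper,
not part of Futhark.

.. code-block:: c

    #include <stdio.h>

    /* Hypothetical post-processing helper: read whitespace-separated
       microsecond measurements from f and store their mean in *mean_us.
       Returns the number of measurements read, or 0 if the file is
       empty or contains a token that is not an integer. */
    long mean_runtime_us(FILE *f, double *mean_us) {
      long long x;
      long n = 0;
      double sum = 0.0;
      int r;
      while ((r = fscanf(f, "%lld", &x)) == 1) {
        sum += (double)x;
        n++;
      }
      if (r != EOF || n == 0) {
        return 0; /* malformed token, or no measurements at all */
      }
      *mean_us = sum / (double)n;
      return n;
    }

For example, after running ``./prog -r 10 -t runtimes.txt < input``,
the file could be opened with ``fopen()`` and passed to this function.
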
GPU Options
~~~~~~~~~~~

The following options are supported by executables generated with the
GPU backends (``opencl``, ``pyopencl``, ``hip``, and ``cuda``).

``-d/--device DEVICE``
  Pick the first device whose name contains the given string. The
  special string ``#k``, where ``k`` is an integer, can be used to
  pick the *k*-th device, numbered from zero.

``--default-thread-block-size INT``
  The default size of GPU thread blocks that are launched. Capped to
  the hardware limit if necessary.

``--default-num-thread-blocks INT``
  The default number of GPU thread blocks that are launched.

``-P/--profile``
  Measure the time taken by various GPU operations (such as kernels)
  and print a summary at the end. Unfortunately, it is currently
  nontrivial (and manual) to relate these operations back to source
  Futhark code.

``--unified-memory INT``
  Corresponds to
  :c:func:`futhark_context_config_set_unified_memory`.

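The device-selection rule for ``-d/--device`` can be modelled in a few
lines of C. The function ``select_device`` below is a hypothetical
illustration of the documented behaviour (substring match, or ``#k``
for the *k*-th device), not the runtime's actual implementation.

.. code-block:: c

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical model of -d: return the index of the selected
       device, or -1 if nothing matches. "#k" selects the k-th device
       (numbered from zero); any other string selects the first device
       whose name contains it as a substring. */
    int select_device(const char *const *names, int num_devices,
                      const char *pref) {
      if (pref[0] == '#') {
        char *end;
        long k = strtol(pref + 1, &end, 10);
        /* Reject "#", "#abc", "#1x", negative and out-of-range k. */
        if (end == pref + 1 || *end != '\0' || k < 0 || k >= num_devices) {
          return -1;
        }
        return (int)k;
      }
      for (int i = 0; i < num_devices; i++) {
        if (strstr(names[i], pref) != NULL) {
          return i; /* first device whose name contains pref */
        }
      }
      return -1;
    }

When using the ``#k`` form from a shell, quote the argument (for
example ``-d '#1'``), because common shells such as Bash treat an
unquoted word starting with ``#`` as the beginning of a comment.
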
OpenCL-specific Options
~~~~~~~~~~~~~~~~~~~~~~~

The following options are supported by executables generated with the
OpenCL backends (``opencl``, ``pyopencl``):

``-p/--platform PLATFORM``
  Pick the first OpenCL platform whose name contains the given
  string. The special string ``#k``, where ``k`` is an integer, can
  be used to pick the *k*-th platform, numbered from zero. If used
  in conjunction with ``-d``, only the devices from matching
  platforms are considered.

``--default-group-size INT``
  The default size of OpenCL workgroups that are launched. Capped
  to the hardware limit if necessary.

``--default-num-groups INT``
  The default number of OpenCL workgroups that are launched.

``--dump-opencl FILE``
  Don't run the program, but instead dump the embedded OpenCL
  program to the indicated file. Useful if you want to see what is
  actually being executed.

``--load-opencl FILE``
  Instead of using the embedded OpenCL program, load it from the
  indicated file. This is extremely unlikely to result in successful
  execution unless this file is the result of a previous call to
  ``--dump-opencl`` (perhaps lightly modified).

``--dump-opencl-binary FILE``
  Don't run the program, but instead dump the compiled version of
  the embedded OpenCL program to the indicated file. On NVIDIA
  platforms, this will be PTX code. If this option is set, no entry
  point will be run.

``--load-opencl-binary FILE``
  Load an OpenCL binary from the indicated file.

``--build-option OPT``
  Add an additional build option to the string passed to
  ``clBuildProgram()``. Refer to the OpenCL documentation for which
  options are supported. Be careful: some options can easily cause
  incorrect results.

``--list-devices``
  List all OpenCL devices and platforms available on the system.

There is rarely a need to use both ``-p`` and ``-d``. For example, to
run on the first available NVIDIA GPU, ``-p NVIDIA`` is sufficient, as
there is likely only a single device associated with this platform.
On \*nix (including macOS), the `clinfo
<https://github.com/Oblomov/clinfo>`_ tool (available in many package
managers) can be used to determine which OpenCL platforms and devices
are available on a given system.

CUDA-specific Options
~~~~~~~~~~~~~~~~~~~~~

The following options are supported by executables generated by the
``cuda`` backend:

``--dump-cuda FILE``
  Don't run the program, but instead dump the embedded CUDA program
  to the indicated file. Useful if you want to see what is actually
  being executed.

``--load-cuda FILE``
  Instead of using the embedded CUDA program, load it from the
  indicated file. This is extremely unlikely to result in successful
  execution unless this file is the result of a previous call to
  ``--dump-cuda`` (perhaps lightly modified).

``--dump-ptx FILE``
  As ``--dump-cuda``, but dumps the compiled PTX code instead.

``--load-ptx FILE``
  Instead of using the embedded CUDA program, load compiled PTX code
  from the indicated file.

``--nvrtc-option OPT``
  Add the given option to the command line used to compile CUDA
  kernels with NVRTC. The list of supported options varies with the
  CUDA version, but can be `found in the NVRTC documentation
  <https://docs.nvidia.com/cuda/nvrtc/index.html#group__options>`_.

For convenience, CUDA executables also accept the same
``--default-num-groups`` and ``--default-group-size`` options that the
OpenCL backend uses. These then refer to grid size and thread block
size, respectively.

Multicore Options
~~~~~~~~~~~~~~~~~

The following options are supported by executables generated by the
``multicore`` backend:

``--num-threads INT``
  The number of threads used to run parallel operations. If set to
  a value less than ``1``, then the runtime system will use one
  thread per detected core.

``-P/--profile``
  Measure the time taken by various parallel sections and print a
  summary at the end. Unfortunately, it is currently nontrivial
  (and manual) to relate these operations back to source Futhark
  code.

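The ``--num-threads`` rule can be sketched as follows.
``effective_num_threads`` is a hypothetical helper that only mirrors
the documented behaviour. It counts cores with
``sysconf(_SC_NPROCESSORS_ONLN)``, which is available on Linux and
macOS; this is an assumption about the platform, not a statement about
how the Futhark runtime detects cores.

.. code-block:: c

    #include <unistd.h>

    /* Hypothetical helper: the number of worker threads implied by a
       --num-threads value. Values below 1 mean one thread per detected
       core; sysconf(_SC_NPROCESSORS_ONLN) is used to count online cores
       on platforms that provide it. */
    long effective_num_threads(long requested) {
      if (requested >= 1) {
        return requested;
      }
      long cores = sysconf(_SC_NPROCESSORS_ONLN);
      return cores >= 1 ? cores : 1; /* fall back to one thread if unknown */
    }
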
Compiling to Library
--------------------

While compiling a Futhark program to an executable is useful for
testing, it is not suitable for production use. Instead, a Futhark
program should be compiled into a reusable library in some target
language, enabling integration into a larger program.

General Concerns
^^^^^^^^^^^^^^^^

Futhark entry points are mapped to some form of function or method in
the target language. Generally, an entry point taking *n* parameters
will result in a function taking *n* parameters. If the entry point
returns an *m*-element tuple, then the function will return *m* values
(although the tuple can be replaced with a single opaque value, see
below). Extra parameters may be added to pass in context data, or
*out*-parameters for writing the result, for target languages that do
not support multiple return values from functions.

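For illustration only, the following self-contained C sketch shows
this general shape for a hypothetical entry point returning a pair: a
context parameter first, then one *out*-parameter per returned value,
then the inputs, with an integer status code as the return value. Here
a plain pointer and length stand in for a Futhark array. All names
(``demo_context``, ``demo_entry_minmax``) are invented; consult the
generated header for the actual signatures.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical stand-in for a context object holding runtime state. */
    struct demo_context {
      const char *error; /* last error message, or NULL */
    };

    /* Shape of a function for an entry point returning (i32, i32):
       context first, one out-parameter per tuple element, then the
       inputs. Returns 0 on success and nonzero on failure (here: an
       empty input). All names are invented for illustration. */
    int demo_entry_minmax(struct demo_context *ctx,
                          int32_t *out0, int32_t *out1,
                          const int32_t *xs, int64_t n) {
      if (n <= 0) {
        ctx->error = "minmax: empty input";
        return 1;
      }
      int32_t lo = xs[0], hi = xs[0];
      for (int64_t i = 1; i < n; i++) {
        if (xs[i] < lo) lo = xs[i];
        if (xs[i] > hi) hi = xs[i];
      }
      *out0 = lo; /* first tuple element */
      *out1 = hi; /* second tuple element */
      ctx->error = NULL;
      return 0;
    }

Returning a status code while writing results through pointers is the
usual C idiom when a function must produce several values and also be
able to report failure.
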
The entry point should have a name that is also a valid identifier in
the target language (usually C).

Not all Futhark types can be mapped cleanly to the target language.
Arrays of tuples, for example, are a common issue. In such cases,
*opaque types* are used in the generated code. Values of these types
cannot be directly inspected, but can be passed back to Futhark entry
points. In the general case, these types will be named with a random
hash. However, if you insert an explicit type annotation (and the
type name contains only characters that are valid in identifiers for
the backend in use), the indicated name will be used. Note that array
types contain brackets, which are usually not valid in identifiers.
Defining and using a type abbreviation is the best way around this.

.. _valuemapping:

Value Mapping
~~~~~~~~~~~~~

The rules for how Futhark values are mapped to target language values
are as follows:

* Primitive types or arrays of primitive types are mapped
  transparently (although for the C backends, this still involves a
  distinct type for arrays).

* All other types are mapped to an opaque type. Use a type ascription
  with a type abbreviation to give it a specific name; otherwise one
  will be generated.

Return types follow these rules, with one addition:

* If the return type is an *m*-element tuple, then the function
  returns *m* values, mapped according to the rules above (but not
  including this one: nested tuples are not mapped directly). This
  rule does not apply when the entry point has been given a return
  type ascription that is not syntactically a tuple type.

.. _api-consumption:

Consumption and Aliasing
~~~~~~~~~~~~~~~~~~~~~~~~

Futhark's support for :ref:`in-place-updates` has implications for the
generated API. Unfortunately, the type system of most languages
(e.g. C) is not rich enough to express the rules, so they are not
checked statically (or currently even dynamically). Since Futhark
will never infer a unique/consuming type for an entry point parameter,
this section can be ignored unless uniqueness annotations have been
manually added to the entry point's parameter types. The rules are
essentially the same as in the language itself:

1. Each entry point input parameter is either *consuming* or
   *nonconsuming* (the default). This corresponds to unique and
   nonunique types in the original Futhark program. A value passed
   for a consuming parameter is considered *consumed*, now has an
   unspecified value, and may never be used again. It must still be
   manually freed, if applicable.

   Further, any *aliases* of that value are also considered consumed
   and may not be used.

2. Each entry point output is either *unique* or *nonunique*. A
   unique output has no aliases. A nonunique output aliases *every*
   nonconsuming input parameter.

Note that these distinctions are currently usually not visible in the
generated API, and so correct usage requires knowledge of the original
types in the Futhark function. The safest strategy is to not expose
unique types in entry points.

Generating C
^^^^^^^^^^^^

A Futhark program ``futlib.fut`` can be compiled to reusable C code
using either::

    $ futhark c --library futlib.fut

Or::

    $ futhark opencl --library futlib.fut

This produces three files in the current directory: ``futlib.c``,
``futlib.h``, and ``futlib.json`` (see :ref:`manifest` for more on
the latter).

If we wish (and are on a Unix system), we can then compile
``futlib.c`` to an object file like this::

    $ gcc futlib.c -c

This produces a file ``futlib.o`` that can then be linked with the
main application. Details of how to link the generated code with
other C code are highly system-dependent, and outside the scope of
this manual. On Unix, we can simply add ``futlib.o`` to the final
compiler or linker command line::

    $ gcc main.c -o main futlib.o

Depending on the Futhark backend you are using, you may need to add
some linker flags. For example, ``futhark opencl`` requires
``-lOpenCL`` (``-framework OpenCL`` on macOS). See the manual page
for each compiler for details.

It is also possible to simply add the generated ``.c`` file to the C
compiler command line used for compiling our whole program (here
``main.c``)::

    $ gcc main.c -o main futlib.c

The downside of this approach is that the generated ``.c`` file may
contain code that causes the C compiler to warn (for example, unused
support code that is not needed by the Futhark program).

The generated header file (here, ``futlib.h``) specifies the API, and
is intended to be human-readable. See :ref:`c-api` for more
information.

The basic usage revolves around creating a *configuration object*,
which can then be used to obtain a *context object*, which must be
passed whenever entry points are called.

The configuration object is created using the following function::

    struct futhark_context_config *futhark_context_config_new();

Depending on the backend, various functions are generated to modify
the configuration. The following is always available::

    void futhark_context_config_set_debugging(struct futhark_context_config *cfg,
                                              int flag);

A configuration object can be used to create a context with the
following function::

    struct futhark_context *futhark_context_new(struct futhark_context_config *cfg);

Context creation may fail. Immediately after
``futhark_context_new()``, call ``futhark_context_get_error()`` (see
below), which will return a non-NULL error string if context creation
failed. The API functions are all thread safe.

Memory management is entirely manual. Deallocation functions are
provided for all types defined in the header file. Everything
returned by an entry point must be manually deallocated.

For now, many internal errors, such as failure to allocate memory,
will cause the function to ``abort()`` rather than return an error
code. However, all application errors (such as bounds and array size
checks) will produce an error code.

C with OpenCL
~~~~~~~~~~~~~

When generating C code with ``futhark opencl``, you will need to link
against the OpenCL library when linking the final binary::

    $ gcc main.c -o main futlib.o -lOpenCL

When using the OpenCL backend, extra API functions are provided for
directly accessing or providing the OpenCL objects used by Futhark.
Take care when using these functions. In particular, a Futhark
context can now be configured with the command queue to use::

    void futhark_context_config_set_command_queue(struct futhark_context_config *cfg,
                                                  cl_command_queue queue);

As a ``cl_command_queue`` specifies an OpenCL device, this is also how
manual platform and device selection is possible. A function is also
provided for retrieving the command queue used by some Futhark
context::

    cl_command_queue futhark_context_get_command_queue(struct futhark_context *ctx);

This can be used to connect two separate Futhark contexts that have
been loaded dynamically.

The raw ``cl_mem`` object underlying a Futhark array can be accessed
with the function named ``futhark_values_raw_type``, where ``type``
depends on the array in question. For example::

    cl_mem futhark_values_raw_i32_1d(struct futhark_context *ctx,
                                     struct futhark_i32_1d *arr);

The array will be stored in row-major form in the returned memory
object. The function performs no copying, so the ``cl_mem`` still
belongs to Futhark, and may be reused for other purposes when the
corresponding array is freed. A dual function can be used to
construct a Futhark array from a ``cl_mem``::

    struct futhark_i32_1d *futhark_new_raw_i32_1d(struct futhark_context *ctx,
                                                  cl_mem data,
                                                  int offset,
                                                  int dim0);

This function *does* copy the provided memory into fresh internally
allocated memory. The array is assumed to be stored in row-major form
``offset`` bytes into the memory region.

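The row-major layout determines where each element lives in such a
memory region. As a plain-C sketch (no OpenCL calls involved), the
hypothetical helpers below compute the byte position of element
``(i, j)`` of a two-dimensional array with ``dim1`` columns stored
``offset`` bytes into a buffer, and read an ``i32`` element from a host
copy of such a buffer (for example, one obtained with
``clEnqueueReadBuffer()``).

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Byte position of element (i, j) in a row-major array with dim1
       columns, starting offset bytes into a buffer, with elem_size bytes
       per element. Rows are contiguous, so moving down one row skips
       dim1 elements: offset + (i * dim1 + j) * elem_size. */
    size_t row_major_byte_offset(size_t offset, size_t elem_size,
                                 int64_t dim1, int64_t i, int64_t j) {
      return offset + ((size_t)i * (size_t)dim1 + (size_t)j) * elem_size;
    }

    /* Read element (i, j) of an i32 array from a host buffer. memcpy
       avoids unaligned access when offset is not a multiple of 4. */
    int32_t read_i32_2d(const unsigned char *buf, size_t offset,
                        int64_t dim1, int64_t i, int64_t j) {
      int32_t x;
      memcpy(&x, buf + row_major_byte_offset(offset, sizeof x, dim1, i, j),
             sizeof x);
      return x;
    }
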
See also :ref:`futhark-opencl(1)`.

Generating Python
^^^^^^^^^^^^^^^^^

The ``futhark python`` and ``futhark pyopencl`` compilers both support
generating reusable Python code, although only the latter generates
code fast enough to be worthwhile. The following mentions options and
parameters only available for ``futhark pyopencl``. You will need at
least PyOpenCL version 2015.2.

We can use ``futhark pyopencl`` to translate the program
``futlib.fut`` into a Python module ``futlib.py`` with the following
command::

    $ futhark pyopencl --library futlib.fut

This will create a file ``futlib.py``, which contains Python code that
defines a class named ``futlib``. This class defines one method for
each entry point function (see :ref:`entry-points`) in the Futhark
program. The methods take one parameter for each parameter in the
corresponding entry point, and return a tuple containing a value for
every value returned by the entry point. For entry points returning a
single (non-tuple) value, just that value is returned (that is,
single-element tuples are not used).

After the class has been instantiated, these methods can be invoked to
run the corresponding Futhark function. The constructor for the class
takes various keyword parameters:

``interactive=BOOL``
  If ``True`` (the default is ``False``), show a menu of available
  OpenCL platforms and devices, and use the one chosen by the user.

``platform_pref=STR``
  Use the first platform whose name contains the given string.
  Similar to the ``-p`` option for executables.

``device_pref=STR``
  Use the first device whose name contains the given string. Similar
  to the ``-d`` option for executables.

Futhark arrays are mapped to either the Numpy ``ndarray`` type or the
`pyopencl.array <https://documen.tician.de/pyopencl/array.html>`_
type. Scalars are mapped to Numpy scalar types.

Reproducibility
---------------

The Futhark compiler is deterministic by design, meaning that
repeatedly compiling the *same program* with the *same compilation
flags* and using the *same version* of the compiler will produce
identical output every time.

Note that this only applies to the code generated by the Futhark
compiler itself. When compiling to an executable with one of the C
backends (see :ref:`executable`), Futhark will invoke a C compiler
that may not be perfectly reproducible. In such cases the generated
``.c`` and ``.h`` files will be reproducible, but the final executable
may not.