|
.. meta::
:description: hipFFT documentation and API reference library
:keywords: FFT, hipFFT, rocFFT, ROCm, API, documentation
.. _hipfft-api-usage:
********************************************************************
hipFFT API usage
********************************************************************
This section describes how to use the hipFFT library API. The hipFFT
API follows the NVIDIA CUDA `cuFFT`_ API.
.. _cuFFT: https://docs.nvidia.com/cuda/cufft/
Data types
==========
There are a few data structures that are internal to the library. The
pointer types to these structures are listed below. Use these types to
create handles and pass them between
different library functions.
.. doxygendefine:: HIPFFT_FORWARD
.. doxygendefine:: HIPFFT_BACKWARD
.. doxygenenum:: hipfftType
.. doxygentypedef:: hipfftHandle
.. doxygenenum:: hipfftResult
Simple plans
============
These planning routines allocate a plan for you. If execution of the
plan requires a work buffer, it will be created and destroyed
automatically.
.. doxygenfunction:: hipfftPlan1d
.. doxygenfunction:: hipfftPlan2d
.. doxygenfunction:: hipfftPlan3d
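As a minimal sketch of the simple API (the transform length, batch
count, and buffer contents here are illustrative), a one-dimensional
single-precision complex-to-complex transform can be planned and
executed as follows:

```cpp
#include <hipfft/hipfft.h>
#include <hip/hip_runtime_api.h>
#include <vector>

int main()
{
    const int N = 256;

    // Device buffer for an in-place complex-to-complex transform.
    hipfftComplex* d_data = nullptr;
    if (hipMalloc(&d_data, sizeof(hipfftComplex) * N) != hipSuccess)
        return 1;

    // Initialize input on the host and copy it to the device.
    std::vector<hipfftComplex> h_data(N, hipfftComplex{1.0f, 0.0f});
    hipMemcpy(d_data, h_data.data(), sizeof(hipfftComplex) * N,
              hipMemcpyHostToDevice);

    // The simple API allocates the plan, and any work buffer it
    // needs, in a single call.
    hipfftHandle plan;
    if (hipfftPlan1d(&plan, N, HIPFFT_C2C, 1) != HIPFFT_SUCCESS)
        return 1;

    // Execute a forward transform in place.
    hipfftExecC2C(plan, d_data, d_data, HIPFFT_FORWARD);
    hipDeviceSynchronize();

    hipfftDestroy(plan);
    hipFree(d_data);
    return 0;
}
```

Error handling is abbreviated for brevity; production code should
check every hipFFT and HIP return code.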
User managed simple plans
-------------------------
These planning routines assume that you have allocated a plan
(``hipfftHandle``) yourself and that you will manage a work area.
.. doxygenfunction:: hipfftCreate
.. doxygenfunction:: hipfftDestroy
.. doxygenfunction:: hipfftSetAutoAllocation
.. doxygenfunction:: hipfftMakePlan1d
.. doxygenfunction:: hipfftMakePlan2d
.. doxygenfunction:: hipfftMakePlan3d
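The user-managed flow separates handle creation, work-area sizing, and
work-area attachment. The sketch below disables automatic allocation
and attaches a caller-owned buffer with ``hipfftSetWorkArea`` (a
hipFFT call mirroring the cuFFT API; the transform size is
illustrative):

```cpp
#include <hipfft/hipfft.h>
#include <hip/hip_runtime_api.h>

int main()
{
    const int N = 1024;

    // Allocate the handle first, then disable automatic work-buffer
    // allocation before making the plan.
    hipfftHandle plan;
    if (hipfftCreate(&plan) != HIPFFT_SUCCESS)
        return 1;
    hipfftSetAutoAllocation(plan, 0);

    // Making the plan reports how large a work area it requires.
    size_t workSize = 0;
    if (hipfftMakePlan1d(plan, N, HIPFFT_C2C, 1, &workSize) != HIPFFT_SUCCESS)
        return 1;

    // Allocate and attach the work area ourselves.
    void* workArea = nullptr;
    if (workSize > 0)
    {
        if (hipMalloc(&workArea, workSize) != hipSuccess)
            return 1;
        hipfftSetWorkArea(plan, workArea);
    }

    // ... execute the plan as usual ...

    hipfftDestroy(plan);
    if (workArea)
        hipFree(workArea);
    return 0;
}
```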
Advanced plans
===================
.. doxygenfunction:: hipfftMakePlanMany
.. doxygenfunction:: hipfftXtMakePlanMany
Estimating work area sizes
==========================
These calls return an estimate of the work area required to support a
plan generated with the same parameters (with either the simple or
extensible API). Applications that manage the work area allocation
themselves must call these routines after plan generation, and again
after any subsequent ``hipfftSet*()`` calls that can alter the
required work area size.
.. doxygenfunction:: hipfftEstimate1d
.. doxygenfunction:: hipfftEstimate2d
.. doxygenfunction:: hipfftEstimate3d
.. doxygenfunction:: hipfftEstimateMany
Accurate work area sizes
------------------------
After plan generation is complete, an accurate work area size can be
obtained using these routines.
.. doxygenfunction:: hipfftGetSize1d
.. doxygenfunction:: hipfftGetSize2d
.. doxygenfunction:: hipfftGetSize3d
.. doxygenfunction:: hipfftGetSizeMany
.. doxygenfunction:: hipfftXtGetSizeMany
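The difference between the two families can be sketched as follows:
an estimate is available with no handle at all, while the accurate
query takes a created handle plus the intended plan parameters (the
transform length below is illustrative):

```cpp
#include <hipfft/hipfft.h>
#include <cstdio>

int main()
{
    const int N = 4096;
    size_t estimated = 0, accurate = 0;

    // An estimate is available before any plan handle exists.
    hipfftEstimate1d(N, HIPFFT_C2C, 1, &estimated);

    // The accurate size requires a handle; hipfftGetSize1d refines
    // the estimate for the given plan parameters.
    hipfftHandle plan;
    hipfftCreate(&plan);
    hipfftGetSize1d(plan, N, HIPFFT_C2C, 1, &accurate);

    std::printf("estimated: %zu bytes, accurate: %zu bytes\n",
                estimated, accurate);

    hipfftDestroy(plan);
    return 0;
}
```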
Executing plans
===============
After you have created an FFT plan, you can execute it using one of the
``hipfftExec*`` functions.
.. doxygenfunction:: hipfftExecC2C
.. doxygenfunction:: hipfftExecR2C
.. doxygenfunction:: hipfftExecC2R
.. doxygenfunction:: hipfftExecZ2Z
.. doxygenfunction:: hipfftExecD2Z
.. doxygenfunction:: hipfftExecZ2D
.. doxygenfunction:: hipfftXtExec
.. _hip-graph-support-for-hipfft:
HIP graph support for hipFFT
============================
hipFFT supports capturing kernels launched during FFT execution into
HIP graph nodes. This way, you can capture the FFT execution and other work
into a HIP graph and launch the work in the graph
multiple times.
The following hipFFT APIs can be used with graph capture:
* :cpp:func:`hipfftExecC2C`
* :cpp:func:`hipfftExecR2C`
* :cpp:func:`hipfftExecC2R`
* :cpp:func:`hipfftExecZ2Z`
* :cpp:func:`hipfftExecD2Z`
* :cpp:func:`hipfftExecZ2D`
.. note::
Each launch of a HIP graph provides the same arguments
to the kernels in the graph. This implies that all of
the parameters to the above APIs must remain valid while the HIP graph is
in use, including:
* The hipFFT plan
* The input and output buffers
hipFFT does not support capturing work performed by API
functions other than those listed above.
Callbacks
=========
.. doxygenfunction:: hipfftXtSetCallback
.. doxygenfunction:: hipfftXtClearCallback
.. doxygenfunction:: hipfftXtSetCallbackSharedSize
Single-process multi-GPU transforms
===================================
hipFFT offers experimental support for distributing a transform
across multiple GPUs in a single process.
To implement this functionality, use the API as follows:
#. Create a hipFFT plan handle using :cpp:func:`hipfftCreate`.
#. Associate a set of GPU devices to the plan by calling :cpp:func:`hipfftXtSetGPUs`.
#. Make the plan by calling one of:
* :cpp:func:`hipfftMakePlan1d`
* :cpp:func:`hipfftMakePlan2d`
* :cpp:func:`hipfftMakePlan3d`
* :cpp:func:`hipfftMakePlanMany`
* :cpp:func:`hipfftMakePlanMany64`
* :cpp:func:`hipfftXtMakePlanMany`
#. Allocate memory for the data on the devices with
:cpp:func:`hipfftXtMalloc`, which returns the allocated memory as
a :cpp:struct:`hipLibXtDesc` descriptor.
#. Copy data from the host to the descriptor with :cpp:func:`hipfftXtMemcpy`.
#. Execute the plan by calling one of:
* :cpp:func:`hipfftXtExecDescriptor`
* :cpp:func:`hipfftXtExecDescriptorC2C`
* :cpp:func:`hipfftXtExecDescriptorR2C`
* :cpp:func:`hipfftXtExecDescriptorC2R`
* :cpp:func:`hipfftXtExecDescriptorZ2Z`
* :cpp:func:`hipfftXtExecDescriptorD2Z`
* :cpp:func:`hipfftXtExecDescriptorZ2D`
Pass the descriptor as input and output.
#. Copy the output from the descriptor back to the host with :cpp:func:`hipfftXtMemcpy`.
#. Free the descriptor using :cpp:func:`hipfftXtFree`.
#. Clean up the plan by calling :cpp:func:`hipfftDestroy`.
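The steps above can be sketched as follows, assuming a system with
two visible GPUs (the device IDs and transform length are
illustrative, and error checking is omitted for brevity):

```cpp
#include <hipfft/hipfftXt.h>
#include <vector>

int main()
{
    const int N = 1 << 16;
    std::vector<hipfftComplex> h_data(N, hipfftComplex{1.0f, 0.0f});

    // 1. Create the handle and associate two GPUs with it.
    hipfftHandle plan;
    hipfftCreate(&plan);
    int gpus[] = {0, 1};
    hipfftXtSetGPUs(plan, 2, gpus);

    // 2. Make the plan across the selected devices; one work size is
    // reported per device.
    size_t workSizes[2];
    hipfftMakePlan1d(plan, N, HIPFFT_C2C, 1, workSizes);

    // 3. Allocate distributed device memory and copy the input over.
    hipLibXtDesc* desc = nullptr;
    hipfftXtMalloc(plan, &desc, HIPFFT_XT_FORMAT_INPLACE);
    hipfftXtMemcpy(plan, desc, h_data.data(), HIPFFT_COPY_HOST_TO_DEVICE);

    // 4. Execute with the descriptor as both input and output.
    hipfftXtExecDescriptorC2C(plan, desc, desc, HIPFFT_FORWARD);

    // 5. Copy the result back, then release everything.
    hipfftXtMemcpy(plan, h_data.data(), desc, HIPFFT_COPY_DEVICE_TO_HOST);
    hipfftXtFree(desc);
    hipfftDestroy(plan);
    return 0;
}
```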
.. doxygenfunction:: hipfftXtSetGPUs
.. doxygenstruct:: hipXtDesc
.. doxygenstruct:: hipLibXtDesc
.. doxygenfunction:: hipfftXtMalloc
.. doxygenfunction:: hipfftXtFree
.. doxygenfunction:: hipfftXtMemcpy
.. doxygengroup:: hipfftXtExecDescriptor
Multi-process transforms
========================
hipFFT has experimental support for transforms that are distributed across MPI (Message
Passing Interface) processes.
Support for MPI transforms was introduced in ROCm 6.4 as part of hipFFT 1.0.18.
MPI must be initialized before creating a multi-process hipFFT plan.
.. note::
hipFFT MPI support is only available when the library is built
with the ``HIPFFT_MPI_ENABLE`` CMake option enabled. By default, MPI support
is off.
In addition, hipFFT MPI support requires the backend FFT library
to also support MPI. This means that either an MPI-enabled rocFFT
library or cuFFTMp must be used.
Finally, hipFFT API calls made on different ranks might return
different values. You must take care to ensure that all ranks
have successfully created their plans before attempting to execute
a distributed transform. It's possible for one rank to fail
to create or execute a plan while the others succeed.
Built-in decomposition
----------------------
hipFFT can automatically decide on the data decomposition for
distributed transforms. The API usage is similar to the
single-process, multi-GPU case described above.
#. On all ranks in the MPI communicator:
#. Create a hipFFT plan handle with :cpp:func:`hipfftCreate`.
#. Attach the MPI communicator to the plan with :cpp:func:`hipfftMpAttachComm`.
#. Make the plan by calling one of:
* :cpp:func:`hipfftMakePlan1d`
* :cpp:func:`hipfftMakePlan2d`
* :cpp:func:`hipfftMakePlan3d`
* :cpp:func:`hipfftMakePlanMany`
* :cpp:func:`hipfftMakePlanMany64`
* :cpp:func:`hipfftXtMakePlanMany`
.. note::
Not all backend FFT libraries support distributing all
transforms. Check the documentation for the backend FFT library
for any restrictions on distributed transform types, placement,
sizes, or data layouts.
#. Copy data from the host to the descriptor using :cpp:func:`hipfftXtMemcpy`.
#. Execute the plan by calling one of:
* :cpp:func:`hipfftXtExec`
* :cpp:func:`hipfftXtExecDescriptorC2C`
* :cpp:func:`hipfftXtExecDescriptorR2C`
* :cpp:func:`hipfftXtExecDescriptorC2R`
* :cpp:func:`hipfftXtExecDescriptorZ2Z`
* :cpp:func:`hipfftXtExecDescriptorD2Z`
* :cpp:func:`hipfftXtExecDescriptorZ2D`
#. Copy the output from the descriptor back to the host with :cpp:func:`hipfftXtMemcpy`.
#. Free the descriptor with :cpp:func:`hipfftXtFree`.
#. On all ranks in the MPI communicator, clean up the plan by calling :cpp:func:`hipfftDestroy`.
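A sketch of the built-in decomposition flow is shown below, assuming
an MPI-enabled hipFFT build whose MPI entry points are declared in
``hipfft/hipfftMp.h`` (header name assumed); note the collective check
that all ranks created their plans successfully before executing:

```cpp
#include <mpi.h>
#include <hipfft/hipfftMp.h>  // MPI entry points (assumed header name)

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    MPI_Comm comm = MPI_COMM_WORLD;

    // All ranks create a plan handle and attach the communicator
    // before making the plan.
    hipfftHandle plan;
    hipfftCreate(&plan);
    hipfftMpAttachComm(plan, HIPFFT_COMM_MPI, &comm);

    const int nx = 256, ny = 256, nz = 256;
    size_t workSize = 0;
    hipfftResult res = hipfftMakePlan3d(plan, nx, ny, nz,
                                        HIPFFT_C2C, &workSize);

    // Plan creation can fail on one rank while succeeding on others,
    // so agree on the overall status before executing.
    int ok = (res == HIPFFT_SUCCESS), all_ok = 0;
    MPI_Allreduce(&ok, &all_ok, 1, MPI_INT, MPI_LAND, comm);

    if (all_ok)
    {
        hipLibXtDesc* desc = nullptr;
        hipfftXtMalloc(plan, &desc, HIPFFT_XT_FORMAT_INPLACE);
        // ... copy data in with hipfftXtMemcpy, then execute ...
        hipfftXtExecDescriptorC2C(plan, desc, desc, HIPFFT_FORWARD);
        // ... copy results out with hipfftXtMemcpy ...
        hipfftXtFree(desc);
    }

    hipfftDestroy(plan);
    MPI_Finalize();
    return all_ok ? 0 : 1;
}
```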
Custom decomposition
--------------------
hipFFT also allows an arbitrary decomposition of the FFT into 1D, 2D, or
3D bricks. Each MPI rank calls :cpp:func:`hipfftXtSetDistribution`
during plan creation to declare which input and output brick resides
on that rank.
The same API calls are made on each rank in the MPI communicator as follows:
#. Create a hipFFT plan handle with :cpp:func:`hipfftCreate`.
#. Attach the MPI communicator to the plan with :cpp:func:`hipfftMpAttachComm`.
#. Call :cpp:func:`hipfftXtSetDistribution` to specify the input and output brick for the current rank.
Bricks are specified by their lower and upper coordinates in
the input/output index space. The lower coordinate is
inclusive (contained within the brick) and the upper
coordinate is exclusive (first index past the end of the
brick).
Strides for the input/output data are also provided, to
describe how the bricks are laid out in physical memory.
Each coordinate and stride contains the same number of elements as
the number of dimensions in the FFT. This also implies
that batched FFTs are not supported when using MPI, because the
coordinates and strides do not contain information about the batch
dimension.
#. Make the plan by calling one of:
* :cpp:func:`hipfftMakePlan1d`
* :cpp:func:`hipfftMakePlan2d`
* :cpp:func:`hipfftMakePlan3d`
The "PlanMany" APIs enable batched FFTs and are not usable with
MPI.
.. note::
Not all backend FFT libraries support distributing all
transforms. Consult the documentation for the backend FFT library
for any restrictions on distributed transform types, placement,
sizes, or data layouts.
#. Call :cpp:func:`hipfftXtMalloc` with
:cpp:enum:`HIPFFT_XT_FORMAT_DISTRIBUTED_INPUT` to
allocate the input brick on the current rank. The allocated
memory is returned as a :cpp:struct:`hipLibXtDesc` descriptor.
#. Call :cpp:func:`hipfftXtMalloc` with
:cpp:enum:`HIPFFT_XT_FORMAT_DISTRIBUTED_OUTPUT` to
allocate the output brick on the current rank. The allocated
memory is returned as a :cpp:struct:`hipLibXtDesc` descriptor.
#. Initialize the memory pointed to by the descriptor.
#. Execute the plan by calling one of:
* :cpp:func:`hipfftXtExecDescriptor`
* :cpp:func:`hipfftXtExecDescriptorC2C`
* :cpp:func:`hipfftXtExecDescriptorR2C`
* :cpp:func:`hipfftXtExecDescriptorC2R`
* :cpp:func:`hipfftXtExecDescriptorZ2Z`
* :cpp:func:`hipfftXtExecDescriptorD2Z`
* :cpp:func:`hipfftXtExecDescriptorZ2D`
Pass the input descriptor as input and the output descriptor as output.
#. Use the transformed data pointed to by the output descriptor.
#. Free the descriptors with :cpp:func:`hipfftXtFree`.
#. Clean up the plan by calling :cpp:func:`hipfftDestroy`.
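The custom decomposition can be sketched as follows for a 2D
transform on exactly two ranks, with input slabs along the first
dimension and output slabs along the second. This is a sketch under
stated assumptions: the header name and the exact parameter order of
:cpp:func:`hipfftXtSetDistribution` should be confirmed against the
reference below, and all sizes are illustrative.

```cpp
#include <mpi.h>
#include <hipfft/hipfftMp.h>  // MPI entry points (assumed header name)

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    MPI_Comm comm = MPI_COMM_WORLD;
    int mpi_rank = 0;
    MPI_Comm_rank(comm, &mpi_rank);  // sketch assumes exactly 2 ranks

    const long long N = 512;
    const long long half = N / 2;

    hipfftHandle plan;
    hipfftCreate(&plan);
    hipfftMpAttachComm(plan, HIPFFT_COMM_MPI, &comm);

    // Brick bounds: lower coordinates are inclusive, upper exclusive.
    long long in_lower[2]  = {mpi_rank * half, 0};
    long long in_upper[2]  = {(mpi_rank + 1) * half, N};
    long long out_lower[2] = {0, mpi_rank * half};
    long long out_upper[2] = {N, (mpi_rank + 1) * half};

    // Contiguous row-major bricks: a row's stride equals its width.
    long long in_strides[2]  = {N, 1};
    long long out_strides[2] = {half, 1};

    // First argument after the plan is the FFT rank (2D here), not
    // the MPI rank.
    hipfftXtSetDistribution(plan, 2, in_lower, in_upper,
                            out_lower, out_upper,
                            in_strides, out_strides);

    size_t workSize = 0;
    hipfftMakePlan2d(plan, (int)N, (int)N, HIPFFT_C2C, &workSize);

    // Allocate this rank's input and output bricks.
    hipLibXtDesc* in_desc  = nullptr;
    hipLibXtDesc* out_desc = nullptr;
    hipfftXtMalloc(plan, &in_desc,  HIPFFT_XT_FORMAT_DISTRIBUTED_INPUT);
    hipfftXtMalloc(plan, &out_desc, HIPFFT_XT_FORMAT_DISTRIBUTED_OUTPUT);

    // ... initialize the input brick, then execute ...
    hipfftXtExecDescriptor(plan, in_desc, out_desc, HIPFFT_FORWARD);

    hipfftXtFree(in_desc);
    hipfftXtFree(out_desc);
    hipfftDestroy(plan);
    MPI_Finalize();
    return 0;
}
```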
.. doxygenfunction:: hipfftMpAttachComm
.. doxygenfunction:: hipfftXtSetDistribution
.. doxygenfunction:: hipfftXtSetSubformatDefault
|