File: fft-api-usage.rst

.. meta::
  :description: hipFFT documentation and API reference library
  :keywords: FFT, hipFFT, rocFFT, ROCm, API, documentation

.. _hipfft-api-usage:

********************************************************************
hipFFT API usage
********************************************************************

This section describes how to use the hipFFT library API. The hipFFT
API follows the NVIDIA CUDA `cuFFT`_ API.

.. _cuFFT: https://docs.nvidia.com/cuda/cufft/

Data types
==========

The library uses a few internal data structures. The pointer types to
these structures are listed below. Use these types to create handles
and pass them between library functions.

.. doxygendefine:: HIPFFT_FORWARD

.. doxygendefine:: HIPFFT_BACKWARD

.. doxygenenum:: hipfftType

.. doxygentypedef:: hipfftHandle

.. doxygenenum:: hipfftResult


Simple plans
============

These planning routines allocate a plan for you.  If execution of the
plan requires a work buffer, it will be created and destroyed
automatically.

.. doxygenfunction:: hipfftPlan1d

.. doxygenfunction:: hipfftPlan2d

.. doxygenfunction:: hipfftPlan3d
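
As an illustration, the following sketch creates a simple plan with
:cpp:func:`hipfftPlan1d`, runs one in-place forward transform, and destroys
the plan. The helper name ``run_simple_c2c`` and its ``bool`` error convention
are assumptions made for this example and are not part of hipFFT. Building it
typically requires ``hipcc`` and linking against the hipFFT library.

.. code-block:: cpp

   #include <hip/hip_runtime_api.h>
   #include <hipfft/hipfft.h>

   #include <vector>

   // Hypothetical helper: forward-transforms `host` in place on the current device.
   // Returns false if any HIP or hipFFT call fails.
   bool run_simple_c2c(std::vector<hipfftComplex>& host)
   {
       const size_t bytes = host.size() * sizeof(hipfftComplex);

       hipfftComplex* dev = nullptr;
       if(hipMalloc(reinterpret_cast<void**>(&dev), bytes) != hipSuccess)
           return false;

       bool ok = hipMemcpy(dev, host.data(), bytes, hipMemcpyHostToDevice) == hipSuccess;

       // hipfftPlan1d allocates the plan and any work buffer the plan needs.
       hipfftHandle plan{};
       const bool   have_plan
           = ok
             && hipfftPlan1d(&plan, static_cast<int>(host.size()), HIPFFT_C2C, 1)
                    == HIPFFT_SUCCESS;

       ok = have_plan
            && hipfftExecC2C(plan, dev, dev, HIPFFT_FORWARD) == HIPFFT_SUCCESS
            // This blocking copy on the default stream also waits for the transform.
            && hipMemcpy(host.data(), dev, bytes, hipMemcpyDeviceToHost) == hipSuccess;

       // Destroying the plan also releases its automatically created work buffer.
       if(have_plan)
           ok = (hipfftDestroy(plan) == HIPFFT_SUCCESS) && ok;
       (void)hipFree(dev);
       return ok;
   }

For example, passing a vector whose first element is ``1`` and whose other
elements are ``0`` should leave every element close to ``1 + 0i``.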


User managed simple plans
-------------------------

These planning routines assume that you have allocated a plan
(``hipfftHandle``) yourself and that you will manage a work area.

.. doxygenfunction:: hipfftCreate

.. doxygenfunction:: hipfftDestroy

.. doxygenfunction:: hipfftSetAutoAllocation

.. doxygenfunction:: hipfftMakePlan1d

.. doxygenfunction:: hipfftMakePlan2d

.. doxygenfunction:: hipfftMakePlan3d
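
The following sketch shows one way to combine these routines when the
application owns the work area. It is an illustration under stated assumptions,
not a reference implementation: the helper name
``make_plan_with_own_work_area`` is hypothetical, and ``hipfftSetWorkArea`` is a
hipFFT function that is not listed in this section.

.. code-block:: cpp

   #include <hip/hip_runtime_api.h>
   #include <hipfft/hipfft.h>

   // Hypothetical helper: builds a 1D C2C plan whose work area is owned by the caller.
   // On success, the caller must later call hipfftDestroy(*plan) and hipFree(*work_area).
   bool make_plan_with_own_work_area(int nx, hipfftHandle* plan, void** work_area)
   {
       *work_area = nullptr;
       if(hipfftCreate(plan) != HIPFFT_SUCCESS)
           return false;

       // Turn off automatic allocation before making the plan, so that plan
       // generation only reports the required size in work_size.
       size_t work_size = 0;
       bool   ok        = hipfftSetAutoAllocation(*plan, 0) == HIPFFT_SUCCESS
                 && hipfftMakePlan1d(*plan, nx, HIPFFT_C2C, 1, &work_size) == HIPFFT_SUCCESS;

       // Allocate the buffer ourselves and hand it to the plan.
       if(ok && work_size > 0)
           ok = hipMalloc(work_area, work_size) == hipSuccess
                && hipfftSetWorkArea(*plan, *work_area) == HIPFFT_SUCCESS;

       if(!ok)
       {
           (void)hipFree(*work_area); // freeing a null pointer is a no-op
           (void)hipfftDestroy(*plan);
           *work_area = nullptr;
       }
       return ok;
   }

The work area must outlive every execution of the plan, so this sketch leaves
both the plan and the buffer to the caller to release.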


Advanced plans
==============

.. doxygenfunction:: hipfftMakePlanMany
.. doxygenfunction:: hipfftXtMakePlanMany



Estimating work area sizes
==========================

These calls return estimates of the work area required to support a
plan generated with the same parameters (either with the simple or
extensible API). Applications that manage the work area allocation
themselves must call these routines after plan generation, and again
after any ``hipfftSet*()`` call made after plan generation that can
alter the required work space size.

.. doxygenfunction:: hipfftEstimate1d

.. doxygenfunction:: hipfftEstimate2d

.. doxygenfunction:: hipfftEstimate3d

.. doxygenfunction:: hipfftEstimateMany


Accurate work area sizes
------------------------

After plan generation is complete, an accurate work area size can be
obtained using these routines.

.. doxygenfunction:: hipfftGetSize1d

.. doxygenfunction:: hipfftGetSize2d

.. doxygenfunction:: hipfftGetSize3d

.. doxygenfunction:: hipfftGetSizeMany

.. doxygenfunction:: hipfftXtGetSizeMany
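
A minimal sketch of querying both values for the same transform is shown
below. The helper name ``query_work_sizes`` is hypothetical, and the example
assumes a single 1D complex-to-complex transform. It makes no claim about how
the estimate relates to the size reported after planning.

.. code-block:: cpp

   #include <hipfft/hipfft.h>

   // Hypothetical helper: reports a pre-planning estimate and the size obtained
   // after plan generation for one 1D C2C transform of length nx.
   bool query_work_sizes(int nx, size_t* estimate, size_t* actual)
   {
       // The estimate depends only on the transform parameters; no plan is needed.
       if(hipfftEstimate1d(nx, HIPFFT_C2C, 1, estimate) != HIPFFT_SUCCESS)
           return false;

       hipfftHandle plan{};
       if(hipfftCreate(&plan) != HIPFFT_SUCCESS)
           return false;

       size_t     planned = 0; // size reported by plan generation itself
       const bool ok = hipfftMakePlan1d(plan, nx, HIPFFT_C2C, 1, &planned) == HIPFFT_SUCCESS
                       // Query the size again now that plan generation is complete.
                       && hipfftGetSize1d(plan, nx, HIPFFT_C2C, 1, actual) == HIPFFT_SUCCESS;

       const bool destroyed = hipfftDestroy(plan) == HIPFFT_SUCCESS;
       return ok && destroyed;
   }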
		     

Executing plans
===============

After you have created an FFT plan, you can execute it using one of the
``hipfftExec*`` functions.

.. doxygenfunction:: hipfftExecC2C

.. doxygenfunction:: hipfftExecR2C

.. doxygenfunction:: hipfftExecC2R

.. doxygenfunction:: hipfftExecZ2Z

.. doxygenfunction:: hipfftExecD2Z

.. doxygenfunction:: hipfftExecZ2D

.. doxygenfunction:: hipfftXtExec
		     
.. _hip-graph-support-for-hipfft:

HIP graph support for hipFFT
============================

hipFFT supports capturing kernels launched during FFT execution into
HIP graph nodes. This way, you can capture the FFT execution and other work
into a HIP graph and launch the work in the graph
multiple times.

The following hipFFT APIs can be used with graph capture:

* :cpp:func:`hipfftExecC2C`

* :cpp:func:`hipfftExecR2C`

* :cpp:func:`hipfftExecC2R`

* :cpp:func:`hipfftExecZ2Z`

* :cpp:func:`hipfftExecD2Z`

* :cpp:func:`hipfftExecZ2D`

.. note::

   Each launch of a HIP graph provides the same arguments
   to the kernels in the graph. This implies that all of
   the parameters to the above APIs must remain valid while the HIP graph is
   in use, including:

   *  The hipFFT plan

   *  The input and output buffers

   hipFFT does not support capturing work performed by API
   functions other than those listed above.
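
The capture workflow can be sketched as follows, using the HIP stream capture
API. This is an illustration only: the helper name ``run_fft_in_graph`` is
hypothetical, and ``hipfftSetStream`` is a hipFFT function that is not listed
in this section. The plan is created outside the capture, because only the
execution functions listed above can be captured.

.. code-block:: cpp

   #include <hip/hip_runtime_api.h>
   #include <hipfft/hipfft.h>

   // Hypothetical helper: captures one in-place forward transform into a HIP graph
   // and launches that graph `launches` times. `plan` and `data` must stay valid
   // until this function returns.
   bool run_fft_in_graph(hipfftHandle plan, hipfftComplex* data, int launches)
   {
       hipStream_t stream = nullptr;
       if(hipStreamCreate(&stream) != hipSuccess)
           return false;

       hipGraph_t     graph = nullptr;
       hipGraphExec_t exec  = nullptr;

       // Direct the plan's kernels to the stream that is about to be captured.
       bool ok = hipfftSetStream(plan, stream) == HIPFFT_SUCCESS
                 && hipStreamBeginCapture(stream, hipStreamCaptureModeGlobal) == hipSuccess;
       if(ok)
       {
           // During capture, the kernels are recorded as graph nodes instead of running.
           const bool recorded
               = hipfftExecC2C(plan, data, data, HIPFFT_FORWARD) == HIPFFT_SUCCESS;
           // End the capture even if recording failed, so the stream is usable again.
           const bool ended = hipStreamEndCapture(stream, &graph) == hipSuccess;
           ok               = recorded && ended;
       }

       ok = ok && hipGraphInstantiate(&exec, graph, nullptr, nullptr, 0) == hipSuccess;

       // Every launch replays the same plan and buffer, so each launch transforms
       // the result of the previous one.
       for(int i = 0; ok && i < launches; ++i)
           ok = hipGraphLaunch(exec, stream) == hipSuccess;
       ok = ok && hipStreamSynchronize(stream) == hipSuccess;

       if(exec)
           (void)hipGraphExecDestroy(exec);
       if(graph)
           (void)hipGraphDestroy(graph);
       // Return the plan to the default stream before destroying the temporary one.
       ok = (hipfftSetStream(plan, nullptr) == HIPFFT_SUCCESS) && ok;
       (void)hipStreamDestroy(stream);
       return ok;
   }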

Callbacks
=========

.. doxygenfunction:: hipfftXtSetCallback
.. doxygenfunction:: hipfftXtClearCallback	     
.. doxygenfunction:: hipfftXtSetCallbackSharedSize

		     
Single-process multi-GPU transforms
===================================

hipFFT offers experimental support for distributing a transform
across multiple GPUs in a single process.

To implement this functionality, use the API as follows:

#. Create a hipFFT plan handle using :cpp:func:`hipfftCreate`.

#. Associate a set of GPU devices to the plan by calling :cpp:func:`hipfftXtSetGPUs`.

#. Make the plan by calling one of:

   * :cpp:func:`hipfftMakePlan1d`
   * :cpp:func:`hipfftMakePlan2d`
   * :cpp:func:`hipfftMakePlan3d`
   * :cpp:func:`hipfftMakePlanMany`
   * :cpp:func:`hipfftMakePlanMany64`
   * :cpp:func:`hipfftXtMakePlanMany`

#. Allocate memory for the data on the devices with
   :cpp:func:`hipfftXtMalloc`, which returns the allocated memory as
   a :cpp:struct:`hipLibXtDesc` descriptor.

#. Copy data from the host to the descriptor with :cpp:func:`hipfftXtMemcpy`.

#. Execute the plan by calling one of:

   * :cpp:func:`hipfftXtExecDescriptor`
   * :cpp:func:`hipfftXtExecDescriptorC2C`
   * :cpp:func:`hipfftXtExecDescriptorR2C`
   * :cpp:func:`hipfftXtExecDescriptorC2R`
   * :cpp:func:`hipfftXtExecDescriptorZ2Z`
   * :cpp:func:`hipfftXtExecDescriptorD2Z`
   * :cpp:func:`hipfftXtExecDescriptorZ2D`

   Pass the descriptor as input and output.

#. Copy the output from the descriptor back to the host with :cpp:func:`hipfftXtMemcpy`.

#. Free the descriptor using :cpp:func:`hipfftXtFree`.

#. Clean up the plan by calling :cpp:func:`hipfftDestroy`.
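
The steps above can be sketched as follows for an in-place 2D
complex-to-complex transform. This is an illustration under stated
assumptions, not a definitive implementation: the helper name
``run_multi_gpu_c2c`` is hypothetical, the work-size array is sized with one
entry per device by analogy with the cuFFT multi-GPU convention, and whether a
given size or device set is supported depends on the backend FFT library.

.. code-block:: cpp

   #include <hipfft/hipfft.h>
   #include <hipfft/hipfftXt.h>

   #include <vector>

   // Hypothetical helper: in-place forward 2D C2C transform of an nx-by-ny array
   // stored in `host`, distributed across the devices listed in `gpus`.
   bool run_multi_gpu_c2c(std::vector<hipfftComplex>& host, int nx, int ny, std::vector<int> gpus)
   {
       if(nx <= 0 || ny <= 0 || gpus.empty()
          || host.size() != static_cast<size_t>(nx) * static_cast<size_t>(ny))
           return false;

       hipfftHandle plan{};
       if(hipfftCreate(&plan) != HIPFFT_SUCCESS)
           return false;

       hipLibXtDesc* desc = nullptr;
       // Assumption: one work-size entry per device, as cuFFT documents for
       // multi-GPU plans. An array of this size is also safe if only one entry is written.
       std::vector<size_t> work_sizes(gpus.size(), 0);

       const bool ok
           // Associate the devices with the plan before making it.
           = hipfftXtSetGPUs(plan, static_cast<int>(gpus.size()), gpus.data()) == HIPFFT_SUCCESS
             && hipfftMakePlan2d(plan, nx, ny, HIPFFT_C2C, work_sizes.data()) == HIPFFT_SUCCESS
             // Allocate device memory, described by a hipLibXtDesc.
             && hipfftXtMalloc(plan, &desc, HIPFFT_XT_FORMAT_INPLACE) == HIPFFT_SUCCESS
             && hipfftXtMemcpy(plan, desc, host.data(), HIPFFT_COPY_HOST_TO_DEVICE)
                    == HIPFFT_SUCCESS
             // Pass the same descriptor as input and output.
             && hipfftXtExecDescriptorC2C(plan, desc, desc, HIPFFT_FORWARD) == HIPFFT_SUCCESS
             && hipfftXtMemcpy(plan, host.data(), desc, HIPFFT_COPY_DEVICE_TO_HOST)
                    == HIPFFT_SUCCESS;

       if(desc)
           (void)hipfftXtFree(desc);
       const bool destroyed = hipfftDestroy(plan) == HIPFFT_SUCCESS;
       return ok && destroyed;
   }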

.. doxygenfunction:: hipfftXtSetGPUs

.. doxygenstruct:: hipXtDesc
.. doxygenstruct:: hipLibXtDesc

.. doxygenfunction:: hipfftXtMalloc
.. doxygenfunction:: hipfftXtFree
.. doxygenfunction:: hipfftXtMemcpy
		     
.. doxygengroup:: hipfftXtExecDescriptor

Multi-process transforms
========================

hipFFT has experimental support for transforms that are distributed across MPI (Message 
Passing Interface) processes.

Support for MPI transforms was introduced in ROCm 6.4 as part of hipFFT 1.0.18.

MPI must be initialized before creating a multi-process hipFFT plan.

.. note::

   hipFFT MPI support is only available when the library is built
   with the ``HIPFFT_MPI_ENABLE`` CMake option enabled. By default, MPI support
   is off.

   In addition, hipFFT MPI support requires the backend FFT library
   to also support MPI. This means that either an MPI-enabled rocFFT
   library or cuFFTMp must be used.

   Finally, hipFFT API calls made on different ranks might return
   different values. You must take care to ensure that all ranks
   have successfully created their plans before attempting to execute
   a distributed transform. It's possible for one rank to fail
   to create and execute a plan while the others succeed.
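
One way to confirm that every rank created its plan is to reduce the per-rank
results across the communicator before executing, as in the sketch below. The
helper name ``make_mpi_plan_on_all_ranks`` is hypothetical. The example assumes
that ``hipfftMpAttachComm`` is declared in ``hipfft/hipfftMp.h`` and takes the
plan, the ``HIPFFT_COMM_MPI`` communicator type, and a pointer to the MPI
communicator, mirroring the cuFFTMp call it is modeled on. Check the function
reference for the exact signature.

.. code-block:: cpp

   #include <hipfft/hipfft.h>
   #include <hipfft/hipfftMp.h>
   #include <mpi.h>

   // Hypothetical helper: creates a distributed 3D C2C plan on every rank of *comm
   // and returns true only if plan creation succeeded on all ranks.
   // MPI must already be initialized. On success, the caller owns *plan.
   // Assumption: *comm must outlive the plan, so it is passed by pointer.
   bool make_mpi_plan_on_all_ranks(MPI_Comm* comm, int nx, int ny, int nz, hipfftHandle* plan)
   {
       size_t     work_size = 0;
       const bool created   = hipfftCreate(plan) == HIPFFT_SUCCESS;
       const bool planned
           = created && hipfftMpAttachComm(*plan, HIPFFT_COMM_MPI, comm) == HIPFFT_SUCCESS
             && hipfftMakePlan3d(*plan, nx, ny, nz, HIPFFT_C2C, &work_size) == HIPFFT_SUCCESS;

       // Combine the per-rank results: the minimum is 1 only if every rank succeeded.
       int local_ok = planned ? 1 : 0;
       int all_ok   = 0;
       MPI_Allreduce(&local_ok, &all_ok, 1, MPI_INT, MPI_MIN, *comm);

       // If any rank failed, release the local plan so that no rank goes on to execute.
       if(all_ok == 0 && created)
           (void)hipfftDestroy(*plan);
       return all_ok == 1;
   }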

Built-in decomposition
----------------------

hipFFT can automatically decide on the data decomposition for
distributed transforms. The API usage is similar to the
single-process, multi-GPU case described above.

#. On all ranks in the MPI communicator:

   #. Create a hipFFT plan handle with :cpp:func:`hipfftCreate`.

   #. Attach the MPI communicator to the plan with :cpp:func:`hipfftMpAttachComm`.

   #. Make the plan by calling one of:

      * :cpp:func:`hipfftMakePlan1d`
      * :cpp:func:`hipfftMakePlan2d`
      * :cpp:func:`hipfftMakePlan3d`
      * :cpp:func:`hipfftMakePlanMany`
      * :cpp:func:`hipfftMakePlanMany64`
      * :cpp:func:`hipfftXtMakePlanMany`

   .. note::

      Not all backend FFT libraries support distributing all
      transforms. Check the documentation for the backend FFT library
      for any restrictions on distributed transform types, placement,
      sizes, or data layouts.

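#. Allocate memory for the data on the devices with
   :cpp:func:`hipfftXtMalloc`, which returns the allocated memory as
   a :cpp:struct:`hipLibXtDesc` descriptor.

#. Copy data from the host to the descriptor using :cpp:func:`hipfftXtMemcpy`.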

#. Execute the plan by calling one of:

   * :cpp:func:`hipfftXtExec`
   * :cpp:func:`hipfftXtExecDescriptorC2C`
   * :cpp:func:`hipfftXtExecDescriptorR2C`
   * :cpp:func:`hipfftXtExecDescriptorC2R`
   * :cpp:func:`hipfftXtExecDescriptorZ2Z`
   * :cpp:func:`hipfftXtExecDescriptorD2Z`
   * :cpp:func:`hipfftXtExecDescriptorZ2D`

#. Copy the output from the descriptor back to the host with :cpp:func:`hipfftXtMemcpy`.

#. Free the descriptor with :cpp:func:`hipfftXtFree`.

#. On all ranks in the MPI communicator, clean up the plan by calling :cpp:func:`hipfftDestroy`.

Custom decomposition
--------------------

hipFFT also allows an arbitrary decomposition of the FFT into 1D, 2D, or
3D bricks. Each MPI rank calls :cpp:func:`hipfftXtSetDistribution`
during plan creation to declare which input and output brick resides
on that rank.

The same API calls are made on each rank in the MPI communicator as follows:

#. Create a hipFFT plan handle with :cpp:func:`hipfftCreate`.

#. Attach the MPI communicator to the plan with :cpp:func:`hipfftMpAttachComm`.

#. Call :cpp:func:`hipfftXtSetDistribution` to specify the input and output brick for the current rank.

   Bricks are specified by their lower and upper coordinates in
   the input/output index space. The lower coordinate is
   inclusive (contained within the brick) and the upper
   coordinate is exclusive (first index past the end of the
   brick).

   Strides for the input/output data are also provided, to
   describe how the bricks are laid out in physical memory.

   Each coordinate and stride contains the same number of elements as
   the number of dimensions in the FFT. This also implies
   that batched FFTs are not supported when using MPI, because the
   coordinates and strides do not contain information about the batch
   dimension.

#. Make the plan by calling one of:

   * :cpp:func:`hipfftMakePlan1d`
   * :cpp:func:`hipfftMakePlan2d`
   * :cpp:func:`hipfftMakePlan3d`

   The "PlanMany" APIs enable batched FFTs and are not usable with
   MPI.

   .. note::

      Not all backend FFT libraries support distributing all
      transforms. Consult the documentation for the backend FFT library
      for any restrictions on distributed transform types, placement,
      sizes, or data layouts.

#. Call :cpp:func:`hipfftXtMalloc` with
   :cpp:enum:`HIPFFT_XT_FORMAT_DISTRIBUTED_INPUT` to
   allocate the input brick on the current rank. The allocated
   memory is returned as a :cpp:struct:`hipLibXtDesc` descriptor.

#. Call :cpp:func:`hipfftXtMalloc` with
   :cpp:enum:`HIPFFT_XT_FORMAT_DISTRIBUTED_OUTPUT` to
   allocate the output brick on the current rank. The allocated
   memory is returned as a :cpp:struct:`hipLibXtDesc` descriptor.

#. Initialize the memory pointed to by the input descriptor.

#. Execute the plan by calling one of:

   * :cpp:func:`hipfftXtExecDescriptor`
   * :cpp:func:`hipfftXtExecDescriptorC2C`
   * :cpp:func:`hipfftXtExecDescriptorR2C`
   * :cpp:func:`hipfftXtExecDescriptorC2R`
   * :cpp:func:`hipfftXtExecDescriptorZ2Z`
   * :cpp:func:`hipfftXtExecDescriptorD2Z`
   * :cpp:func:`hipfftXtExecDescriptorZ2D`

   Pass the input descriptor as input and the output descriptor as output.

#. Use the transformed data pointed to by the output descriptor.

#. Free the descriptors with :cpp:func:`hipfftXtFree`.

#. Clean up the plan by calling :cpp:func:`hipfftDestroy`.
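
To make the brick convention concrete, the sketch below computes the lower
(inclusive) and upper (exclusive) coordinates and the strides of a slab for
each rank, splitting a 3D index space along its first dimension. It is plain
C++ and does not call hipFFT. The ``Brick`` type and ``slab_for_rank`` helper
are hypothetical, and the strides assume a contiguous brick with the last
dimension varying fastest. Confirm the expected ordering against the
:cpp:func:`hipfftXtSetDistribution` reference before passing such values to it.

.. code-block:: cpp

   #include <algorithm>
   #include <array>
   #include <cassert>

   // Hypothetical description of one brick of a 3D index space.
   struct Brick
   {
       std::array<long long, 3> lower; // inclusive: first index inside the brick
       std::array<long long, 3> upper; // exclusive: first index past the brick
       std::array<long long, 3> stride; // element strides of the brick in memory
   };

   // Splits dims[0] as evenly as possible over nranks and returns the slab
   // owned by `rank`. The first (dims[0] % nranks) ranks get one extra plane.
   Brick slab_for_rank(const std::array<long long, 3>& dims, int rank, int nranks)
   {
       assert(nranks > 0 && rank >= 0 && rank < nranks);

       const long long base  = dims[0] / nranks;
       const long long rem   = dims[0] % nranks;
       const long long begin = rank * base + std::min<long long>(rank, rem);
       const long long len   = base + (rank < rem ? 1 : 0);

       Brick b;
       b.lower = {begin, 0, 0};
       b.upper = {begin + len, dims[1], dims[2]};

       // Assumed layout: the brick is stored contiguously, last dimension fastest.
       const long long ny = b.upper[1] - b.lower[1];
       const long long nz = b.upper[2] - b.lower[2];
       b.stride           = {ny * nz, nz, 1};
       return b;
   }

With ``dims = {8, 4, 4}`` and three ranks, this yields slabs covering planes
``[0, 3)``, ``[3, 6)``, and ``[6, 8)`` of the first dimension, each with
strides ``{16, 4, 1}``.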

.. doxygenfunction:: hipfftMpAttachComm
.. doxygenfunction:: hipfftXtSetDistribution
.. doxygenfunction:: hipfftXtSetSubformatDefault