.. meta::
  :description: rocBLAS documentation and API reference library
  :keywords: rocBLAS, ROCm, API, Linear Algebra, documentation

.. _beta-features:

********************************************************************
rocBLAS beta features
********************************************************************

To allow for future growth and changes, the features in this section are not subject to the same
level of backwards compatibility and support as the normal rocBLAS API. These features are subject
to change or removal in future releases of rocBLAS.

To use the following beta API features, define ``ROCBLAS_BETA_FEATURES_API`` before including ``rocblas.h``.
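
As a minimal sketch of the ordering requirement above, the macro is defined ahead of the first inclusion of the
rocBLAS header in a translation unit. The ``rocblas/rocblas.h`` include path is an assumption about the installed
header layout; adjust it if your installation exposes ``rocblas.h`` directly.

.. code-block:: c++

      // Define the macro before the first inclusion of the rocBLAS header
      // so that the beta API declarations are made visible.
      #define ROCBLAS_BETA_FEATURES_API
      // Assumed include path; some installations expose "rocblas.h" directly.
      #include <rocblas/rocblas.h>

Passing ``-DROCBLAS_BETA_FEATURES_API`` on the compiler command line should have the same effect, because the
macro is then defined before any header is processed.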

rocblas_gemm_ex_get_solutions + batched, strided_batched
=========================================================

.. doxygenfunction:: rocblas_gemm_ex_get_solutions
.. doxygenfunction:: rocblas_gemm_ex_get_solutions_by_type
.. doxygenfunction:: rocblas_gemm_batched_ex_get_solutions
.. doxygenfunction:: rocblas_gemm_batched_ex_get_solutions_by_type
.. doxygenfunction:: rocblas_gemm_strided_batched_ex_get_solutions

rocblas_gemm_ex3 + batched, strided_batched
=========================================================

.. doxygenfunction:: rocblas_gemm_ex3
.. doxygenfunction:: rocblas_gemm_batched_ex3
.. doxygenfunction:: rocblas_gemm_strided_batched_ex3

Graph support for rocBLAS
=========================================================

Most rocBLAS functions can be captured into a graph node via the Graph Management HIP APIs,
except those listed in :ref:`Functions Unsupported with Graph Capture`.
For a list of graph-related HIP APIs, see
`Graph Management HIP API <https://rocm.docs.amd.com/projects/HIP/en/latest/doxygen/html/group___graph.html#graph-management>`_.

The following code creates a graph with a rocBLAS function call, shown by the placeholder ``rocblas_<function>``, as a graph node.

.. code-block:: c++

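      // Begin capturing the work submitted to the stream.
      CHECK_HIP_ERROR(hipStreamBeginCapture(stream, hipStreamCaptureModeGlobal));
      rocblas_<function>(<arguments>);
      // End the capture; the recorded work is returned in graph.
      CHECK_HIP_ERROR(hipStreamEndCapture(stream, &graph));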

The captured graph can be launched as shown below:

.. code-block:: c++

      CHECK_HIP_ERROR(hipGraphInstantiate(&instance, graph, NULL, NULL, 0));
      CHECK_HIP_ERROR(hipGraphLaunch(instance, stream));

Graph support requires asynchronous HIP APIs, so users must enable stream-order memory allocation.
For more details, see :ref:`stream order alloc`.

During stream capture, rocBLAS stores the allocated host and device memory in the handle.
The allocated memory is freed when the handle is destroyed.
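
The capture and launch steps above can be combined into one program. The following is a sketch rather than a
reference implementation, and it makes several assumptions: ``rocblas_saxpy`` is chosen only as an illustrative
Level-1 function that is not in the unsupported list, the two ``CHECK_*`` macros are hypothetical helpers
defined for this example, the header is assumed to be installed as ``rocblas/rocblas.h``, and stream-order memory
allocation is assumed to be enabled as described in :ref:`stream order alloc`.

.. code-block:: c++

      #include <hip/hip_runtime.h>
      #include <rocblas/rocblas.h> // assumed include path

      #include <cstdio>
      #include <cstdlib>
      #include <vector>

      // Hypothetical error-checking helpers for this sketch (not part of HIP or rocBLAS).
      #define CHECK_HIP_ERROR(expr)                                                 \
          do                                                                        \
          {                                                                         \
              hipError_t hip_status_ = (expr);                                      \
              if(hip_status_ != hipSuccess)                                         \
              {                                                                     \
                  std::fprintf(stderr, "HIP error: %s\n",                           \
                               hipGetErrorString(hip_status_));                     \
                  std::exit(EXIT_FAILURE);                                          \
              }                                                                     \
          } while(0)

      #define CHECK_ROCBLAS_ERROR(expr)                                             \
          do                                                                        \
          {                                                                         \
              rocblas_status blas_status_ = (expr);                                 \
              if(blas_status_ != rocblas_status_success)                            \
              {                                                                     \
                  std::fprintf(stderr, "rocBLAS error: %s\n",                       \
                               rocblas_status_to_string(blas_status_));             \
                  std::exit(EXIT_FAILURE);                                          \
              }                                                                     \
          } while(0)

      int main()
      {
          const rocblas_int  n     = 1024;
          const float        alpha = 2.0f;
          std::vector<float> hx(n, 1.0f), hy(n, 3.0f);
          const size_t       bytes = n * sizeof(float);

          // Allocate and fill device buffers before capture begins.
          float* dx = nullptr;
          float* dy = nullptr;
          CHECK_HIP_ERROR(hipMalloc(&dx, bytes));
          CHECK_HIP_ERROR(hipMalloc(&dy, bytes));
          CHECK_HIP_ERROR(hipMemcpy(dx, hx.data(), bytes, hipMemcpyHostToDevice));
          CHECK_HIP_ERROR(hipMemcpy(dy, hy.data(), bytes, hipMemcpyHostToDevice));

          // Create a stream and attach it to the rocBLAS handle.
          hipStream_t stream;
          CHECK_HIP_ERROR(hipStreamCreate(&stream));
          rocblas_handle handle;
          CHECK_ROCBLAS_ERROR(rocblas_create_handle(&handle));
          CHECK_ROCBLAS_ERROR(rocblas_set_stream(handle, stream));

          // Capture: the rocBLAS call is recorded into the graph, not executed.
          hipGraph_t graph;
          CHECK_HIP_ERROR(hipStreamBeginCapture(stream, hipStreamCaptureModeGlobal));
          CHECK_ROCBLAS_ERROR(rocblas_saxpy(handle, n, &alpha, dx, 1, dy, 1));
          CHECK_HIP_ERROR(hipStreamEndCapture(stream, &graph));

          // Instantiate and launch the captured graph, then wait for completion.
          hipGraphExec_t instance;
          CHECK_HIP_ERROR(hipGraphInstantiate(&instance, graph, nullptr, nullptr, 0));
          CHECK_HIP_ERROR(hipGraphLaunch(instance, stream));
          CHECK_HIP_ERROR(hipStreamSynchronize(stream));

          // Copy the result back and print the first element of y.
          CHECK_HIP_ERROR(hipMemcpy(hy.data(), dy, bytes, hipMemcpyDeviceToHost));
          std::printf("y[0] = %f\n", hy[0]);

          // Release graph objects, then the handle, stream, and device buffers.
          CHECK_HIP_ERROR(hipGraphExecDestroy(instance));
          CHECK_HIP_ERROR(hipGraphDestroy(graph));
          CHECK_ROCBLAS_ERROR(rocblas_destroy_handle(handle));
          CHECK_HIP_ERROR(hipStreamDestroy(stream));
          CHECK_HIP_ERROR(hipFree(dx));
          CHECK_HIP_ERROR(hipFree(dy));
          return 0;
      }

All device allocations and the handle creation happen before ``hipStreamBeginCapture`` in this sketch, so that
only the rocBLAS call itself is issued while the stream is capturing. The instantiated graph can be launched
repeatedly; each launch applies the captured operation again to the same buffers.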

.. _Functions Unsupported with Graph Capture:

Functions unsupported with Graph Capture
=========================================================

The following Level-1 functions place their results into host buffers when the pointer mode is host, which forces synchronization, so they are not supported with graph capture in that mode:

*  ``dot``
*  ``asum``
*  ``nrm2``
*  ``imax``
*  ``imin``

BLAS Level-3 and BLAS-EX functions in pointer mode device do not support HIP Graph. Support will be added in a future release.

HIP Graph known issues in rocBLAS
=========================================================

On the Windows platform, batched functions (Level-1, Level-2, and Level-3) produce incorrect results.