Package: rocblas / 5.5.1+dfsg-7

Metadata

Package | Version | Patches format
rocblas | 5.5.1+dfsg-7 | 3.0 (quilt)

Patch series

Patch File delta Description
0001 use generic blas for reference.patch | (download)

clients/CMakeLists.txt | 22 2 + 20 - 0 !
1 file changed, 2 insertions(+), 20 deletions(-)

 use generic blas for reference

The upstream project typically uses either the AOCL BLIS library or the
Netlib BLAS library as the reference implementation in the test suite on
Linux, while on Windows it uses the OpenBLAS library. It would be nice
to use OpenBLAS on Debian for performance reasons (the test suite is
heavily CPU-bound), but the Netlib implementation seems to be more
reliable for achieving a full suite of passing tests.

0002 remove use of pip and virtualenv.patch | (download)

CMakeLists.txt | 27 2 + 25 - 0 !
1 file changed, 2 insertions(+), 25 deletions(-)

 remove use of pip and virtualenv

The upstream project creates a virtualenv and uses pip to install the
Python dependencies during a build. In the Debian build, all the Python
dependencies are already provided by packages, so there's no need for
all that complexity.

When contributed upstream, this functionality was guarded behind the
CMake option -DBUILD_WITH_PIP=OFF. Tensile_ROOT can also be passed from
d/rules (if necessary), so this patch can be dropped with ROCm 5.7.

0003 use local mathjax.patch | (download)

docs/source/conf.py | 3 3 + 0 - 0 !
1 file changed, 3 insertions(+)

 use local mathjax

The sphinx.ext.mathjax extension defaults to loading mathjax from a
CDN, which results in the lintian warning 'privacy-breach-generic'.
Use a local copy of mathjax to prevent that problem.
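As a rough sketch of what such a change to docs/source/conf.py could look like: Sphinx reads the `mathjax_path` setting to decide where MathJax is loaded from. The path below is an assumption (the location a Debian MathJax package is expected to use), not a copy of the patch contents.

```python
# Hypothetical excerpt from a Sphinx conf.py; not the actual patch contents.
extensions = ["sphinx.ext.mathjax"]

# Assumed location of a locally installed MathJax copy. Pointing
# mathjax_path at a local file keeps the generated HTML from
# fetching JavaScript from a CDN.
mathjax_path = (
    "file:///usr/share/javascript/mathjax/MathJax.js"
    "?config=TeX-AMS-MML_HTMLorMML"
)
```

With a setting like this, the built documentation references only local files, which is what the lintian check looks for.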

0004 mark known bugs.patch | (download)

clients/gtest/known_bugs.yaml | 2 2 + 0 - 0 !
1 file changed, 2 insertions(+)

 mark known bugs

In ROCm 5.5, the FP16 High-Precision Accumulate checks are also offset
from the correct answer by margins slightly greater than those allowed.

0005 make openmp optional.patch | (download)

clients/common/blis_interface.cpp | 2 2 + 0 - 0 !
clients/common/cblas_interface.cpp | 12 12 + 0 - 0 !
clients/include/rocblas_init.hpp | 52 52 + 0 - 0 !
clients/samples/example_openmp.cpp | 2 2 + 0 - 0 !
tensile/HostLibraryTests/CachingLibrary_test.cpp | 2 2 + 0 - 0 !
tensile/HostLibraryTests/testlib/include/TestUtils.hpp | 7 6 + 1 - 0 !
6 files changed, 76 insertions(+), 1 deletion(-)

 make openmp optional

0006 move tensile library into versioned subdir.patch | (download)

library/src/tensile_host.cpp | 10 9 + 1 - 0 !
1 file changed, 9 insertions(+), 1 deletion(-)

 move tensile library into versioned subdir

The Tensile library contains optimized kernels that are loaded at
runtime by rocblas, and thus must be a part of the library package.

0007 remove references to dfsg violating kernels.patch | (download)

library/src/blas3/Tensile/Logic/asm_full/aldebaran/aldebaran_Cijk_Ailk_Bjlk_DB.yaml | 169126 0 + 169126 - 0 !
library/src/blas3/Tensile/Logic/asm_full/aldebaran/aldebaran_Cijk_Ailk_Bjlk_DB_GB.yaml | 152505 0 + 152505 - 0 !
library/src/blas3/Tensile/Logic/asm_full/aldebaran_104cu/aldebaran_Cijk_Ailk_Bjlk_DB.yaml | 155832 0 + 155832 - 0 !
library/src/blas3/Tensile/Logic/nonMFMA_legacy/aldebaran_Cijk_Ailk_Bjlk_DB.yaml | 144526 0 + 144526 - 0 !
4 files changed, 621989 deletions(-)

 remove references to dfsg-violating kernels

The DGEMM_Aldebaran_PKFixedAtomic512Latest and
DGEMM_Aldebaran_PKFixedAtomic512_104 kernels were removed for dfsg
reasons, and references to those kernels must be removed to fix the
build. This will result in a performance drop on MI200 GPUs because
the tuned assembly kernels will be replaced with fallback
implementations for these problems.

This problem has been reported upstream and they intend to supply a
better fix.

0008 ensure replacementkernels cov3 dir exists.patch | (download)

tensile/Tensile/ReplacementKernels-cov3/README.txt | 1 1 + 0 - 0 !
1 file changed, 1 insertion(+)

 ensure replacementkernels-cov3 dir exists

All files in this directory were removed for dfsg violations, but the
directory itself is still required for the build.

0009 hide kernel symbols.patch | (download)

library/src/blas2/rocblas_trsv_kernels.cpp | 2 1 + 1 - 0 !
library/src/include/macros.hpp | 4 2 + 2 - 0 !
2 files changed, 3 insertions(+), 3 deletions(-)

 hide kernel symbols

If not marked as static, the rocblas kernels would be weak public
symbols. They are not intended to be visible, but are not affected
by -fvisibility=hidden and cannot be entirely hidden except by being
marked as static.

Applied-Upstream: https://github.com/ROCmSoftwarePlatform/rocBLAS/commit/c311f3ce684368091acae744c924bcddea4add33

0010 fix sample includes.patch | (download)

clients/samples/example_c_dgeam.c | 2 1 + 1 - 0 !
clients/samples/example_hip_complex_her2.cpp | 2 1 + 1 - 0 !
clients/samples/example_sgemm_strided_batched.cpp | 2 1 + 1 - 0 !
3 files changed, 3 insertions(+), 3 deletions(-)

 fix sample includes

0011 disable stdc extension in header.patch | (download)

library/include/internal/rocblas-types.h | 3 0 + 3 - 0 !
1 file changed, 3 deletions(-)

 disable stdc extension in header

The request for any STDC extension should not be controlled by a header or
else the behaviour of the program will change depending on the order of the
includes. This define is being removed upstream in ROCm 6.0.

Bug: https://github.com/ROCmSoftwarePlatform/rocBLAS/issues/1301

0012 expand isa compatibility.patch | (download)

library/src/handle.cpp | 26 26 + 0 - 0 !
library/src/rocblas_auxiliary.cpp | 26 26 + 0 - 0 !
library/src/tensile_host.cpp | 26 26 + 0 - 0 !
tensile/Tensile/Source/lib/source/hip/HipHardware.cpp | 27 27 + 0 - 0 !
4 files changed, 105 insertions(+)

 expand isa compatibility

This is not an ideal solution, but there are a number of ISAs that are
subsets of gfx900, gfx1010 and gfx1030. The simplest way to get
rocBLAS and Tensile to load the compatible kernels when running on
architectures compatible with those ISAs is to simply report the
GPU as being of the supported type.

There is no way this patch would be accepted upstream as it is expected
that they will implement a better solution... eventually.
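The idea can be illustrated with a small lookup table. This is only a sketch of the logic in Python; the patch itself modifies C++ sources, and both the function name and the specific ISA entries below are assumptions chosen for illustration, not the list used by the patch.

```python
# Illustrative only: maps an ISA to the supported ISA it is assumed
# to be compatible with. The entries are hypothetical examples.
COMPATIBLE_ISA = {
    "gfx902": "gfx900",
    "gfx909": "gfx900",
    "gfx1011": "gfx1010",
    "gfx1012": "gfx1010",
    "gfx1031": "gfx1030",
    "gfx1032": "gfx1030",
}


def reported_isa(actual_isa):
    """Return the ISA name to report for kernel lookup.

    ISAs without an entry are reported unchanged.
    """
    return COMPATIBLE_ISA.get(actual_isa, actual_isa)
```

For example, `reported_isa("gfx1031")` yields `"gfx1030"`, so kernels built for gfx1030 would be selected, while an ISA that is not in the table is passed through as-is.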

0013 disable rotg nan check.patch | (download)

clients/gtest/blas1_gtest.yaml | 18 15 + 3 - 0 !
clients/include/blas1/testing_rotg.hpp | 202 91 + 111 - 0 !
clients/include/blas1/testing_rotg_batched.hpp | 229 108 + 121 - 0 !
clients/include/blas1/testing_rotg_strided_batched.hpp | 239 116 + 123 - 0 !
clients/include/lapack_utilities.hpp | 25 12 + 13 - 0 !
clients/include/type_dispatch.hpp | 19 12 + 7 - 0 !
library/src/blas1/rocblas_rotg_kernels.cpp | 14 7 + 7 - 0 !
7 files changed, 361 insertions(+), 385 deletions(-)

 [patch] refactor rotg_test code (#1632)

* refactor rotg_test code

* for rotg use alpha and beta in place of rotga and rotgb

* correct rotg initialization

Bug: https://github.com/ROCmSoftwarePlatform/rocBLAS/issues/1287

0014 spellcheck.patch | (download)

clients/include/rocblas_common.yaml | 2 1 + 1 - 0 !
library/src/check_numerics_vector.cpp | 4 2 + 2 - 0 !
tensile/Tensile/Source/lib/include/Tensile/PropertyMatching.hpp | 2 1 + 1 - 0 !
3 files changed, 4 insertions(+), 4 deletions(-)

 spellcheck

All fixes have been forwarded. Some were also fixed upstream in
https://github.com/ROCmSoftwarePlatform/rocBLAS/commit/53c8ce8d3eb2eee9c7ca6711522efbf882de1646

0015 move rocsblas test data to share.patch | (download)

clients/gtest/rocblas_gtest_main.cpp | 23 22 + 1 - 0 !
1 file changed, 22 insertions(+), 1 deletion(-)

 move rocsblas test data to share

The rocblas_test.data file is a binary file containing arguments to
test with rocblas functions (e.g., various combinations of matrix
sizes and other similar options). It is created by rocblas_gentest.py
and is architecture-independent (as it always uses network byte order).
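To show why a fixed byte order makes such a file architecture-independent, here is a minimal sketch using Python's `struct` module. The record layout (three integer dimensions and a double scalar) is purely hypothetical; the real layout of rocblas_test.data is not described here.

```python
import struct

# Hypothetical record: matrix dimensions M, N, K followed by a scalar.
# The "!" prefix selects network (big-endian) byte order with standard
# sizes and no padding, so the bytes are the same on every host.
RECORD = struct.Struct("!iiid")


def pack_record(m, n, k, alpha):
    """Serialize one hypothetical test-argument record."""
    return RECORD.pack(m, n, k, alpha)


def unpack_record(data):
    """Deserialize a record produced by pack_record."""
    return RECORD.unpack(data)


packed = pack_record(128, 64, 32, 1.0)
```

Because the byte order is stated explicitly rather than taken from the host CPU, a file written on one architecture can be read back on any other.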

0016 disable replacement kernels.patch | (download)

library/src/blas3/Tensile/Logic/archive/vega20_Cijk_Ailk_Bjlk_DB.yaml | 172 86 + 86 - 0 !
library/src/blas3/Tensile/Logic/asm_ci/vega20_Cijk_Ailk_Bjlk_DB.yaml | 172 86 + 86 - 0 !
library/src/blas3/Tensile/Logic/asm_full/aldebaran/aldebaran_Cijk_Alik_Bljk_SB_GB.yaml | 8 4 + 4 - 0 !
library/src/blas3/Tensile/Logic/asm_full/arcturus/arcturus_Cijk_Ailk_Bjlk_DB.yaml | 80 40 + 40 - 0 !
library/src/blas3/Tensile/Logic/asm_full/arcturus/arcturus_Cijk_Alik_Bljk_SB.yaml | 8 4 + 4 - 0 !
library/src/blas3/Tensile/Logic/asm_full/vega20/vega20_Cijk_Ailk_Bjlk_DB.yaml | 172 86 + 86 - 0 !
6 files changed, 306 insertions(+), 306 deletions(-)

 disable replacement kernels

The replacement kernels were removed for DFSG reasons. Attempting to
use them in rocBLAS anyway will cause non-deterministic errors at
build-time and run-time.

The upstream project is committed to eliminating the closed-source
kernels, so this should be a non-issue in the near future.

Bug-Debian: http://bugs.debian.org/1042036

0017 print kernel name for missing attribute error.patch | (download)

tensile/Tensile/KernelWriter.py | 2 2 + 0 - 0 !
1 file changed, 2 insertions(+)

 print kernel name for missing attribute error

0018 verbose tensile source kernel build.patch | (download)

tensile/Tensile/TensileCreateLibrary.py | 9 2 + 7 - 0 !
1 file changed, 2 insertions(+), 7 deletions(-)

 verbose tensile source kernel build

The build of the Tensile source kernels takes quite a long time, so it
may time out on slower machines if no output is produced for too long.
The verbose flag should add some output at the start of the build for
each offload architecture, which should help prevent timeouts.

0019 remove x86 intrinsics.patch | (download)

clients/include/rocblas_math.hpp | 1 0 + 1 - 0 !
1 file changed, 1 deletion(-)

 remove x86 intrinsics

The x86 intrinsics don't seem to be used.

0020 msgpack names.patch | (download)

tensile/Tensile/Source/lib/CMakeLists.txt | 2 1 + 1 - 0 !
1 file changed, 1 insertion(+), 1 deletion(-)

 [patch] fix for newer windows vcpkg msgpack (#1827)


0021 msgpack cxx support.patch | (download)

tensile/Tensile/Source/lib/CMakeLists.txt | 4 3 + 1 - 0 !
1 file changed, 3 insertions(+), 1 deletion(-)

 [patch] another vcpkg version package name fix (#1836)

* more vcpkg package options


0022 reserved identifiers.patch | (download)

library/include/internal/rocblas-auxiliary.h | 8 4 + 4 - 0 !
library/include/internal/rocblas-beta.h | 8 4 + 4 - 0 !
library/include/internal/rocblas-complex-types.h | 6 3 + 3 - 0 !
library/include/internal/rocblas-functions.h | 8 4 + 4 - 0 !
library/include/internal/rocblas-types.h | 8 4 + 4 - 0 !
library/include/internal/rocblas-version.h.in | 8 4 + 4 - 0 !
library/include/internal/rocblas_bfloat16.h | 6 3 + 3 - 0 !
library/include/rocblas.h | 8 4 + 4 - 0 !
8 files changed, 30 insertions(+), 30 deletions(-)

 [patch] fix reserved identifiers in include guards (#1600)

The include guards have been changed to the filename in uppercase
letters with all non-alphanumeric symbols replaced by underscore.
This include guard pattern matches the guard that is used for the
generated file rocsparse-export.h.

The C and C++ standards reserve all identifiers that begin with an
underscore followed by a capital letter [C99 7.1.3]
[C++11 17.6.4.3.2].
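The naming rule described above can be sketched in a few lines of Python. The helper names are hypothetical, and the reserved-identifier check covers only the underscore-plus-capital case mentioned here, not every reserved pattern in the standards.

```python
import re


def include_guard(filename):
    """Uppercase the filename and replace every character that is
    not an ASCII letter or digit with an underscore."""
    return re.sub(r"[^A-Za-z0-9]", "_", filename).upper()


def starts_reserved(identifier):
    """True if the identifier begins with an underscore followed by
    a capital letter, which the C and C++ standards reserve."""
    return re.match(r"_[A-Z]", identifier) is not None
```

For instance, `include_guard("rocblas-types.h")` gives `ROCBLAS_TYPES_H`, which `starts_reserved` accepts as unreserved, whereas a guard written in a style such as `_ROCBLAS_TYPES_H_` (a made-up example) would be flagged.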

0023 remove mf16c flag.patch | (download)

clients/benchmarks/CMakeLists.txt | 4 0 + 4 - 0 !
clients/gtest/CMakeLists.txt | 4 0 + 4 - 0 !
clients/samples/CMakeLists.txt | 2 0 + 2 - 0 !
library/src/CMakeLists.txt | 2 0 + 2 - 0 !
4 files changed, 12 deletions(-)

 [patch] remove mf16c flag as f16 intrinsics _cvtss_sh, _cvtsh_ss no
 longer used

Bug: https://github.com/ROCm/rocBLAS/issues/1422
Bug-Debian: https://bugs.debian.org/1075724

0024 use xnack specialized assembly kernels with gfx90a.patch | (download)

CMakeLists.txt | 5 4 + 1 - 0 !
1 file changed, 4 insertions(+), 1 deletion(-)

 use xnack-specialized assembly kernels with gfx90a

This change passes the xnack-specialized targets gfx90a:xnack- and
gfx90a:xnack+ for the Tensile architectures when rocBLAS is built for
the non-specialized gfx90a target. This helps to reduce the library
binary size without affecting the assembly kernels in Tensile.
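The substitution can be pictured as follows. This is a Python illustration of the logic only; the patch itself edits CMakeLists.txt, and the function name is hypothetical.

```python
def tensile_targets(gpu_targets):
    """Expand the generic gfx90a target into its two xnack-specialized
    variants for Tensile; every other target is passed through."""
    expanded = []
    for target in gpu_targets:
        if target == "gfx90a":
            expanded.extend(["gfx90a:xnack-", "gfx90a:xnack+"])
        else:
            expanded.append(target)
    return expanded
```

Given `["gfx900", "gfx90a"]`, the sketch returns `["gfx900", "gfx90a:xnack-", "gfx90a:xnack+"]`, while a target that is already specialized is left untouched.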

0025 spelling.patch | (download)

clients/benchmarks/client.cpp | 2 1 + 1 - 0 !
1 file changed, 1 insertion(+), 1 deletion(-)

 fix spelling