Package: rocblas / 6.4.4-1

Metadata

Package   Version   Patches format
rocblas   6.4.4-1   3.0 (quilt)

Patch series

Patch File delta Description
use generic blas for reference.patch | (download)

clients/CMakeLists.txt | 53 3 + 50 - 0 !
1 file changed, 3 insertions(+), 50 deletions(-)

 use generic blas for reference

The upstream project typically uses either the AOCL BLIS library or the
Netlib BLAS library as the reference implementation in the test suite on
Linux. However, the OpenBLAS library is used by upstream on Windows. It
would be nice to use OpenBLAS on Debian for performance reasons (as the
test suite is heavily CPU-bound), but the Netlib implementation seems
to be more reliable for achieving a full suite of passing tests.

remove use of pip and virtualenv.patch | (download)

CMakeLists.txt | 3 2 + 1 - 0 !
1 file changed, 2 insertions(+), 1 deletion(-)

 remove use of pip and virtualenv

The upstream project creates a virtualenv and uses pip to install the
Python dependencies during a build. In the Debian build, all the Python
dependencies are already provided by packages, so there's no need for
all that complexity.

When contributed upstream, this functionality was guarded behind the
cmake option -DBUILD_WITH_PIP=OFF. Tensile_ROOT can also be passed from
d/rules (if necessary) so this patch can be dropped with ROCm 5.7.

mark known bugs.patch | (download)

clients/gtest/known_bugs.yaml | 2 2 + 0 - 0 !
1 file changed, 2 insertions(+)

 mark known bugs

In ROCm 5.5, the FP16 High-Precision Accumulate checks are also offset
from the correct answer by margins slightly greater than those allowed.

make openmp optional.patch | (download)

tensile/HostLibraryTests/CachingLibrary_test.cpp | 2 2 + 0 - 0 !
tensile/HostLibraryTests/testlib/include/TestUtils.hpp | 7 6 + 1 - 0 !
2 files changed, 8 insertions(+), 1 deletion(-)

 make openmp optional

move tensile library into versioned subdir.patch | (download)

library/src/tensile_host.cpp | 10 9 + 1 - 0 !
1 file changed, 9 insertions(+), 1 deletion(-)

 move tensile library into versioned subdir

The Tensile library contains optimized kernels that are loaded at
runtime by rocblas, and thus must be a part of the library package.

expand isa compatibility.patch | (download)

library/src/handle.cpp | 1 1 + 0 - 0 !
library/src/rocblas_auxiliary.cpp | 46 46 + 0 - 0 !
library/src/tensile_host.cpp | 1 1 + 0 - 0 !
tensile/Tensile/Source/lib/source/hip/HipHardware.cpp | 49 49 + 0 - 0 !
4 files changed, 97 insertions(+)

 expand isa compatibility

This is not an ideal solution, but there are a number of ISAs that are
subsets of gfx900, gfx1010 and gfx1030. The simplest way to get
rocBLAS and Tensile to load the compatible kernels when running on
architectures compatible with those ISAs is to simply report the
GPU as being of the supported type.

There is no way this patch would be accepted upstream as it is expected
that they will implement a better solution... eventually.

Updated by @ckk to support HIP >= 6.
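
The aliasing idea described above can be illustrated with a small lookup
table. This is only a sketch, written in Python rather than the C++ that
the patch actually modifies. The function name effective_isa and the
table entries are assumptions chosen for illustration; they are not taken
from the patch, whose real mapping may differ.

```python
# Hypothetical illustration of ISA aliasing. The real patch is C++ and its
# exact table may differ; the entries below are assumptions for the example.
COMPATIBLE_ISA = {
    "gfx902": "gfx900",
    "gfx1012": "gfx1010",
    "gfx1031": "gfx1030",
}


def effective_isa(reported: str) -> str:
    """Return the ISA whose kernels should be loaded for a reported GPU."""
    # ISAs that are not in the table are passed through unchanged.
    return COMPATIBLE_ISA.get(reported, reported)


if __name__ == "__main__":
    for isa in ("gfx1031", "gfx900", "gfx942"):
        print(isa, "->", effective_isa(isa))
```

A table keeps the workaround easy to audit: each alias is one line, and
anything not listed behaves exactly as it did before.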

Enable changing directory for test data.patch | (download)

clients/gtest/CMakeLists.txt | 5 5 + 0 - 0 !
clients/gtest/rocblas_gtest_main.cpp | 10 9 + 1 - 0 !
2 files changed, 14 insertions(+), 1 deletion(-)

 enable changing directory for test data

On Debian, we install to a versioned directory based on the library
name.

print kernel name for missing attribute error.patch | (download)

tensile/Tensile/KernelWriter.py | 2 2 + 0 - 0 !
1 file changed, 2 insertions(+)

 print kernel name for missing attribute error

verbose tensile source kernel build.patch | (download)

tensile/Tensile/TensileCreateLibrary.py | 16 10 + 6 - 0 !
1 file changed, 10 insertions(+), 6 deletions(-)

 verbose tensile source kernel build

The build of the Tensile source kernels takes quite a long time, so it
may time out on slower machines if no output is produced for too long.
The verbose flag should add some output at the start of the build for
each offload architecture, which should help prevent a timeout.
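
The general technique, emitting a line before each long-running step so
that an inactivity watchdog sees progress, might look like the sketch
below. It is not the code from TensileCreateLibrary.py: the names
build_all and build_one are hypothetical, and build_one stands in for
whatever actually compiles the kernels for one architecture.

```python
import sys
import time


def build_all(architectures, build_one, out=sys.stdout):
    """Run build_one for each architecture, announcing each step first."""
    results = {}
    for arch in architectures:
        # Flush immediately so a watcher checking for inactivity sees output
        # before the slow step begins, not after it ends.
        print(f"Building source kernels for {arch} ...", file=out, flush=True)
        start = time.monotonic()
        results[arch] = build_one(arch)
        elapsed = time.monotonic() - start
        print(f"Finished {arch} in {elapsed:.1f}s", file=out, flush=True)
    return results
```

The explicit flush matters because output to a pipe is usually block
buffered, so an unflushed message may not reach the build log until much
later.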

Skip git requirement.patch | (download)

library/CMakeLists.txt | 15 8 + 7 - 0 !
1 file changed, 8 insertions(+), 7 deletions(-)

 skip git requirement

It appears to be used only for the git commit ID, which we can work
around.

Use local mathjax.patch | (download)

docs/conf.py | 3 3 + 0 - 0 !
1 file changed, 3 insertions(+)

 use local mathjax

The sphinx.ext.mathjax extension defaults to loading mathjax from a
CDN, which results in the lintian warning 'privacy-breach-generic'.
Use a local copy of mathjax to prevent that problem.
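
A change of this kind is typically a short addition to docs/conf.py using
the mathjax_path setting of sphinx.ext.mathjax. The fragment below is a
sketch, not the contents of the patch; the file path is an assumption
about where a locally packaged MathJax is installed and may differ on a
given system.

```python
# Sketch of a docs/conf.py fragment (not the actual patch contents).
# Assumption: a system-wide copy of MathJax is installed at this path.
LOCAL_MATHJAX = "/usr/share/javascript/mathjax/MathJax.js"

# sphinx.ext.mathjax reads mathjax_path; pointing it at a local file keeps
# the generated HTML from loading the script from a CDN.
mathjax_path = "file://" + LOCAL_MATHJAX
```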

Extend docs conf.py for offline build.patch | (download)

docs/conf.py | 2 2 + 0 - 0 !
1 file changed, 2 insertions(+)

 extend docs/conf.py for offline build

By setting these extra variables, we can suppress a remote call which
would cause the build to fail.

drop Cijk_Ailk_Bjlk_4xi8 as workaround.patch | (download)

library/src/blas3/Tensile/Logic/asm_full/aldebaran/aldebaran_Cijk_Ailk_Bjlk_4xi8II_BH.yaml | 12777 0 + 12777 - 0 !
library/src/blas3/Tensile/Logic/asm_full/aldebaran/aldebaran_Cijk_Ailk_Bjlk_4xi8II_BH_GB.yaml | 10163 0 + 10163 - 0 !
library/src/blas3/Tensile/Logic/asm_full/aldebaran/aldebaran_Cijk_Ailk_Bljk_4xi8II_BH.yaml | 21761 0 + 21761 - 0 !
library/src/blas3/Tensile/Logic/asm_full/aldebaran/aldebaran_Cijk_Ailk_Bljk_4xi8II_BH_GB.yaml | 21215 0 + 21215 - 0 !
library/src/blas3/Tensile/Logic/asm_full/aldebaran/aldebaran_Cijk_Alik_Bjlk_4xi8II_BH.yaml | 17197 0 + 17197 - 0 !
library/src/blas3/Tensile/Logic/asm_full/aldebaran/aldebaran_Cijk_Alik_Bjlk_4xi8II_BH_GB.yaml | 13657 0 + 13657 - 0 !
library/src/blas3/Tensile/Logic/asm_full/aldebaran/aldebaran_Cijk_Alik_Bljk_4xi8II_BH.yaml | 17713 0 + 17713 - 0 !
library/src/blas3/Tensile/Logic/asm_full/aldebaran/aldebaran_Cijk_Alik_Bljk_4xi8II_BH_GB.yaml | 17317 0 + 17317 - 0 !
library/src/blas3/Tensile/Logic/asm_full/aldebaran_104cu/aldebaran_Cijk_Ailk_Bjlk_4xi8II_BH.yaml | 11818 0 + 11818 - 0 !
library/src/blas3/Tensile/Logic/asm_full/aldebaran_104cu/aldebaran_Cijk_Ailk_Bljk_4xi8II_BH.yaml | 21561 0 + 21561 - 0 !
library/src/blas3/Tensile/Logic/asm_full/aldebaran_104cu/aldebaran_Cijk_Alik_Bljk_4xi8II_BH.yaml | 15612 0 + 15612 - 0 !
library/src/blas3/Tensile/Logic/asm_full/arcturus/arcturus_Cijk_Ailk_Bjlk_4xi8II_BH.yaml | 10112 0 + 10112 - 0 !
library/src/blas3/Tensile/Logic/asm_full/arcturus/arcturus_Cijk_Ailk_Bljk_4xi8II_BH.yaml | 21097 0 + 21097 - 0 !
library/src/blas3/Tensile/Logic/asm_full/arcturus/arcturus_Cijk_Alik_Bjlk_4xi8II_BH.yaml | 13584 0 + 13584 - 0 !
library/src/blas3/Tensile/Logic/asm_full/arcturus/arcturus_Cijk_Alik_Bljk_4xi8II_BH.yaml | 17223 0 + 17223 - 0 !
library/src/blas3/Tensile/Logic/asm_full/hip/hip_Cijk_Ailk_Bjlk_4xi8II_BH.yaml | 1924 0 + 1924 - 0 !
library/src/blas3/Tensile/Logic/asm_full/hip/hip_Cijk_Ailk_Bjlk_4xi8II_BH_GB.yaml | 1937 0 + 1937 - 0 !
library/src/blas3/Tensile/Logic/asm_full/hip/hip_Cijk_Ailk_Bljk_4xi8II_BH.yaml | 1635 0 + 1635 - 0 !
library/src/blas3/Tensile/Logic/asm_full/hip/hip_Cijk_Ailk_Bljk_4xi8II_BH_GB.yaml | 1646 0 + 1646 - 0 !
library/src/blas3/Tensile/Logic/asm_full/hip/hip_Cijk_Alik_Bjlk_4xi8II_BH.yaml | 1924 0 + 1924 - 0 !
library/src/blas3/Tensile/Logic/asm_full/hip/hip_Cijk_Alik_Bjlk_4xi8II_BH_GB.yaml | 1937 0 + 1937 - 0 !
library/src/blas3/Tensile/Logic/asm_full/hip/hip_Cijk_Alik_Bljk_4xi8II_BH.yaml | 1635 0 + 1635 - 0 !
library/src/blas3/Tensile/Logic/asm_full/hip/hip_Cijk_Alik_Bljk_4xi8II_BH_GB.yaml | 1646 0 + 1646 - 0 !
library/src/blas3/Tensile/Logic/asm_full/vega20/vega20_Cijk_Ailk_Bjlk_4xi8II_BH.yaml | 10112 0 + 10112 - 0 !
library/src/blas3/Tensile/Logic/asm_full/vega20/vega20_Cijk_Ailk_Bljk_4xi8II_BH.yaml | 21097 0 + 21097 - 0 !
library/src/blas3/Tensile/Logic/asm_full/vega20/vega20_Cijk_Alik_Bjlk_4xi8II_BH.yaml | 13584 0 + 13584 - 0 !
library/src/blas3/Tensile/Logic/asm_full/vega20/vega20_Cijk_Alik_Bljk_4xi8II_BH.yaml | 17223 0 + 17223 - 0 !
27 files changed, 319107 deletions(-)

 drop cijk_ailk_bjlk_4xi8 as workaround

This is a workaround for issues compiling these assembly files with LLVM 20
and newer.

Bug: https://github.com/llvm/llvm-project/issues/163647