File: extensions.md

OpenBLAS mostly implements the reference (Netlib) BLAS, CBLAS, LAPACK and
LAPACKE interfaces. It also provides a few OpenBLAS-specific functions, most
of which can be seen as "BLAS extensions".
This page documents those non-standard APIs.

## BLAS-like extensions

| Routine       | Data Types    | Description                                     |
| ------------- |:------------- | :-----------------------------------------------|
| ?axpby        | s,d,c,z       | like `axpy` with a multiplier for `y`           |
| ?gemm3m       | c,z           | `gemm` via the 3M algorithm (3 real multiplies) |
| ?imatcopy     | s,d,c,z       | in-place transposition/copying                  |
| ?omatcopy     | s,d,c,z       | out-of-place transposition/copying              |
| ?geadd        | s,d,c,z       | ATLAS-like matrix add `B = α*A+β*B`             |
| ?gemmt        | s,d,c,z       | `gemm` but only a triangular part updated       |
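
As a rough illustration of what two of these routines compute, here is a plain-C reference sketch of the `axpby` and `geadd` operations for doubles. The names `ref_daxpby` and `ref_dgeadd` are hypothetical helpers, not OpenBLAS functions. The sketch assumes unit strides and column-major storage, and leaves out the increments, layout arguments and optimized kernels of the real routines.

```c
#include <stddef.h>

/* Hypothetical reference helper (not the OpenBLAS API):
 * y[i] = alpha * x[i] + beta * y[i], unit stride assumed. */
static void ref_daxpby(size_t n, double alpha, const double *x,
                       double beta, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = alpha * x[i] + beta * y[i];
}

/* Hypothetical reference helper (not the OpenBLAS API):
 * B = alpha * A + beta * B for column-major matrices with
 * leading dimensions lda and ldb. */
static void ref_dgeadd(size_t rows, size_t cols,
                       double alpha, const double *a, size_t lda,
                       double beta, double *b, size_t ldb)
{
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            b[i + j * ldb] = alpha * a[i + j * lda] + beta * b[i + j * ldb];
}
```

Setting `beta` to 1 in `ref_daxpby` reduces it to the standard `axpy` update, which is why the table describes `axpby` as `axpy` with a multiplier for `y`.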


## bfloat16 functionality

BLAS-like and conversion functions for `bfloat16` (available when OpenBLAS was compiled with `BUILD_BFLOAT16=1`):

* `void cblas_sbstobf16` converts a float array to an array of bfloat16 values by rounding
* `void cblas_sbdtobf16` converts a double array to an array of bfloat16 values by rounding
* `void cblas_sbf16tos` converts a bfloat16 array to an array of floats
* `void cblas_dbf16tod` converts a bfloat16 array to an array of doubles
* `float cblas_sbdot` computes the dot product of two bfloat16 arrays and returns it as a float
* `void cblas_sbgemv` performs the matrix-vector operation of GEMV with the input matrix and X vector as bfloat16 and the Y vector as float
* `void cblas_sbgemm` performs the matrix-matrix operation of GEMM with both input matrices as bfloat16 and the output matrix as float
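
To make "by rounding" concrete, the sketch below converts between `float` and a 16-bit bfloat16 representation in plain C. It is not the OpenBLAS implementation: the type `bf16_t` and both helper names are hypothetical, and round-to-nearest-even is an assumption here, since this page does not state which rounding mode the library kernels use.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical storage type: bfloat16 is the upper 16 bits of an IEEE-754 float. */
typedef uint16_t bf16_t;

/* Convert float to bfloat16, assuming round-to-nearest-even. */
static bf16_t float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);          /* reinterpret without aliasing issues */

    if ((bits & 0x7fffffffu) > 0x7f800000u)  /* NaN: keep it a (quiet) NaN */
        return (bf16_t)((bits >> 16) | 0x0040u);

    uint32_t lsb = (bits >> 16) & 1u;        /* lowest bit that survives truncation */
    bits += 0x7fffu + lsb;                   /* ties round towards an even result */
    return (bf16_t)(bits >> 16);
}

/* Convert bfloat16 to float: exact, the low 16 mantissa bits become zero. */
static float bf16_to_float(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Widening is lossless while narrowing discards 16 mantissa bits, which is why only the float-to-bfloat16 and double-to-bfloat16 directions are described as rounding.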

## Utility functions

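* `int openblas_get_num_threads(void)` returns the number of threads OpenBLAS is currently set to use
* `void openblas_set_num_threads(int num_threads)` sets the number of threads OpenBLAS uses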
* `int openblas_get_num_procs(void)` returns the number of processors available on the system (may include "hyperthreading cores")
* `int openblas_get_parallel(void)` returns 0 for sequential use, 1 for platform-based threading and 2 for OpenMP-based threading
* `char * openblas_get_config(void)` returns a string listing the options OpenBLAS was built with, for example `NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell`
* `int openblas_set_affinity(int thread_index, size_t cpusetsize, cpu_set_t *cpuset)` sets the CPU affinity mask of the given thread
  to the provided cpuset. Only available on Linux, with semantics identical to `pthread_setaffinity_np`.
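
Since `openblas_set_affinity` takes the same kind of mask as `pthread_setaffinity_np`, the cpuset can be built with the usual glibc `CPU_*` macros. The sketch below shows one way to prepare a single-CPU mask. `single_cpu_set` is a hypothetical helper, not part of OpenBLAS, and the call in the trailing comment assumes a Linux build of OpenBLAS with `<cblas.h>` available.

```c
#define _GNU_SOURCE   /* exposes cpu_set_t and the CPU_* macros in glibc */
#include <sched.h>

/* Hypothetical helper (not part of OpenBLAS): fill `set` with exactly one CPU.
 * Returns 0 on success, -1 if `cpu` is outside the range a cpu_set_t can hold. */
static int single_cpu_set(int cpu, cpu_set_t *set)
{
    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(set);
    CPU_SET(cpu, set);
    return 0;
}

/* Usage sketch, assuming <cblas.h> is included and OpenBLAS is linked:
 *
 *     cpu_set_t set;
 *     if (single_cpu_set(2, &set) == 0)
 *         openblas_set_affinity(0, sizeof(cpu_set_t), &set);
 *
 * This would ask for the OpenBLAS thread with index 0 to be pinned to CPU 2.
 */
```

Passing `sizeof(cpu_set_t)` as `cpusetsize` matches how a statically allocated set is handed to `pthread_setaffinity_np`.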