# clpeak
[Build status](https://app.travis-ci.com/github/krrishnarraj/clpeak)
[Snap package](https://snapcraft.io/clpeak)
A synthetic benchmarking tool that measures the peak capabilities of OpenCL devices. It reports only the peak metrics achievable with vector operations, so the numbers do not represent real-world workloads.
## Building
```console
git submodule update --init --recursive --remote
mkdir build
cd build
cmake ..
cmake --build .
```
## Sample
```text
Platform: NVIDIA CUDA
  Device: Tesla V100-SXM2-16GB
    Driver version  : 390.77 (Linux x64)
    Compute units   : 80
    Clock frequency : 1530 MHz

    Global memory bandwidth (GBPS)
      float   : 767.48
      float2  : 810.81
      float4  : 843.06
      float8  : 726.12
      float16 : 735.98

    Single-precision compute (GFLOPS)
      float   : 15680.96
      float2  : 15674.50
      float4  : 15645.58
      float8  : 15583.27
      float16 : 15466.50

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 7859.49
      double2  : 7849.96
      double4  : 7832.96
      double8  : 7799.82
      double16 : 7740.88

    Integer compute (GIOPS)
      int   : 15653.47
      int2  : 15654.40
      int4  : 15655.21
      int8  : 15659.04
      int16 : 15608.65

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 10.64
      enqueueReadBuffer          : 11.92
      enqueueMapBuffer(for read) : 9.97
        memcpy from mapped ptr   : 8.62
      enqueueUnmap(after write)  : 11.04
        memcpy to mapped ptr     : 9.16

    Kernel launch latency : 7.22 us
```
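As a rough sanity check on the sample output above, the measured compute numbers can be compared against a back-of-the-envelope theoretical peak. The sketch below is only an illustration and is not part of clpeak: the function names are hypothetical, and it assumes 64 FP32 lanes and 32 FP64 lanes per compute unit for the Tesla V100 (values clpeak does not report) and counts one fused multiply-add as 2 FLOPs.

```python
# Values copied from the sample output above.
COMPUTE_UNITS = 80
CLOCK_MHZ = 1530

# Assumptions, not reported by clpeak: arithmetic lanes per compute unit
# on a Tesla V100, and 2 FLOPs per fused multiply-add.
FP32_LANES_PER_CU = 64
FP64_LANES_PER_CU = 32
FLOPS_PER_FMA = 2


def theoretical_gflops(compute_units, clock_mhz, lanes_per_cu):
    """Peak GFLOPS if every lane retires one FMA per clock cycle."""
    return compute_units * lanes_per_cu * FLOPS_PER_FMA * clock_mhz / 1000.0


def report(label, measured_gflops, lanes_per_cu):
    """Format measured vs. theoretical throughput for one data type."""
    peak = theoretical_gflops(COMPUTE_UNITS, CLOCK_MHZ, lanes_per_cu)
    ratio = measured_gflops / peak
    return (f"{label}: measured {measured_gflops:.2f}, "
            f"theoretical {peak:.2f}, ratio {ratio:.3f}")


if __name__ == "__main__":
    print(report("float", 15680.96, FP32_LANES_PER_CU))
    print(report("double", 7859.49, FP64_LANES_PER_CU))
```

Under these assumptions both ratios come out close to 1, which is consistent with the tool reporting near-peak rather than typical application throughput.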