Unified Backend {#unifiedbackend}
==========

[TOC]

# Introduction

The Unified backend was introduced in ArrayFire with version 3.2.
While this is not an independent backend, it allows the user to switch between
the different ArrayFire backends (CPU, CUDA and OpenCL) at runtime.

# Compiling with Unified

The steps to compile with the unified backend are the same as compiling with
any of the other backends.
The only change is that the executable needs to be linked with the __af__
library (`libaf.so` (Linux), `libaf.dylib` (OSX), `af.lib` (Windows)).

See the Using on [Linux](\ref using_on_linux), [OSX](\ref using_on_osx), and
[Windows](\ref using_on_windows) pages for more details.

To use with CMake, use the __ArrayFire_Unified_LIBRARIES__ variable.

# Using the Unified Backend

The Unified backend will try to dynamically load the backend libraries. The
priority of backends is __CUDA -> OpenCL -> CPU__.

The most important aspect to note here is that all the libraries the ArrayFire
libs depend on need to be in the environment paths:

* `LD_LIBRARY_PATH` -> Linux, Unix, OSX
* `DYLD_LIBRARY_PATH` -> OSX
* `PATH` -> Windows

If any of these libs are missing, the corresponding backend library will fail
to load and that backend will be marked as unavailable.

Optionally, the ArrayFire libs may be located through the `AF_PATH` or
`AF_BUILD_PATH` environment variables if their path is not in the system paths.
These are treated as fallback paths in case the files are not found in the
system paths.
However, all the other upstream libraries for ArrayFire libs must be present
in the system path variables shown above.

### Special Mention: CUDA NVVM
For the CUDA backend, ensure that the CUDA NVVM libs/dlls are in the path.
These can be easily missed since CUDA installation does not add the paths by default.

On Linux and OSX, add `/usr/local/cuda/nvvm/(lib or lib64)` to `LD_LIBRARY_PATH` or
`DYLD_LIBRARY_PATH`.

On Windows, you can set up a post-build event that copies the NVVM DLLs to
the executable directory by using the following commands:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.c}
echo copy "$(CUDA_PATH)\nvvm\bin\nvvm64*.dll" "$(OutDir)"
copy "$(CUDA_PATH)\nvvm\bin\nvvm64*.dll" "$(OutDir)"
if errorlevel 1 (
    echo "CUDA NVVM DLLs copy failed due to missing files."
    exit /B 0
)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This ensures that the NVVM DLLs are copied if present, but does not fail the
build if the copy fails. This is how ArrayFire ships its examples.

The other option is to set `%%CUDA_PATH%/nvvm/bin` in the PATH environment
variable.

# Switching Backends

The af_backend enum stores the possible backends.
To select a backend, call the af::setBackend function as shown below.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.c}
af::setBackend(AF_BACKEND_OPENCL);    // Sets OpenCL as current backend
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To get the count of the number of backends available (the number of `libaf*`
backend libraries loaded successfully), call the af::getBackendCount function.
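
As a rough illustration of how availability and priority can be combined, the sketch below picks a backend from a bitmask, following the CUDA -> OpenCL -> CPU order described earlier. It does not call ArrayFire and is not the loader's actual code. The `Backend` enum and the `pickBackend` helper are hypothetical names introduced here; the enum values are assumed to mirror `af_backend` (`AF_BACKEND_CPU` = 1, `AF_BACKEND_CUDA` = 2, `AF_BACKEND_OPENCL` = 4), and the mask is assumed to be a bitwise OR of those values, such as the one `af::getAvailableBackends` is expected to return.

```cpp
#include <string>

// Local stand-ins assumed to mirror the af_backend values
// (AF_BACKEND_CPU = 1, AF_BACKEND_CUDA = 2, AF_BACKEND_OPENCL = 4).
enum Backend {
    BACKEND_NONE   = 0,
    BACKEND_CPU    = 1,
    BACKEND_CUDA   = 2,
    BACKEND_OPENCL = 4
};

// Hypothetical helper: return the highest-priority backend whose bit is set
// in availableMask, using the documented order CUDA -> OpenCL -> CPU.
Backend pickBackend(int availableMask)
{
    const Backend priority[] = { BACKEND_CUDA, BACKEND_OPENCL, BACKEND_CPU };
    for (Backend b : priority) {
        if (availableMask & b) return b;
    }
    return BACKEND_NONE;  // no backend library could be loaded
}

// Human-readable name, useful for logging the choice.
std::string backendName(Backend b)
{
    switch (b) {
        case BACKEND_CPU:    return "CPU";
        case BACKEND_CUDA:   return "CUDA";
        case BACKEND_OPENCL: return "OpenCL";
        default:             return "none";
    }
}
```

In a real program the chosen value would be mapped to the matching `af_backend` constant and passed to af::setBackend, so that the application rather than the default order decides which backend is used.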

# Example

This example is a shortened form of [basic.cpp](\ref unified/basic.cpp).

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.c}
#include <arrayfire.h>
#include <cstdio>   // printf, fprintf

void testBackend()
{
    af::info();
    af_print(af::randu(5, 4));
}

int main()
{
    try {
        printf("Trying CPU Backend\n");
        af::setBackend(AF_BACKEND_CPU);
        testBackend();
    } catch (af::exception& e) {
        printf("Caught exception when trying CPU backend\n");
        fprintf(stderr, "%s\n", e.what());
    }

    try {
        printf("Trying CUDA Backend\n");
        af::setBackend(AF_BACKEND_CUDA);
        testBackend();
    } catch (af::exception& e) {
        printf("Caught exception when trying CUDA backend\n");
        fprintf(stderr, "%s\n", e.what());
    }

    try {
        printf("Trying OpenCL Backend\n");
        af::setBackend(AF_BACKEND_OPENCL);
        testBackend();
    } catch (af::exception& e) {
        printf("Caught exception when trying OpenCL backend\n");
        fprintf(stderr, "%s\n", e.what());
    }

    return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The output would be:

    Trying CPU Backend
    ArrayFire v3.2.0 (CPU, 64-bit Linux, build fc7630f)
    [0] Intel: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz Max threads(8)
    af::randu(5, 4)
    [5 4 1 1]
        0.0000     0.2190     0.3835     0.5297
        0.1315     0.0470     0.5194     0.6711
        0.7556     0.6789     0.8310     0.0077
        0.4587     0.6793     0.0346     0.3834
        0.5328     0.9347     0.0535     0.0668

    Trying CUDA Backend
    ArrayFire v3.2.0 (CUDA, 64-bit Linux, build fc7630f)
    Platform: CUDA Toolkit 7.5, Driver: 355.11
    [0] Quadro K5000, 4093 MB, CUDA Compute 3.0
    af::randu(5, 4)
    [5 4 1 1]
        0.7402     0.4464     0.7762     0.2920
        0.9210     0.6673     0.2948     0.3194
        0.0390     0.1099     0.7140     0.8109
        0.9690     0.4702     0.3585     0.1541
        0.9251     0.5132     0.6814     0.4452

    Trying OpenCL Backend
    ArrayFire v3.2.0 (OpenCL, 64-bit Linux, build fc7630f)
    [0] NVIDIA  : Quadro K5000
    -1- INTEL   : Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
    af::randu(5, 4)
    [5 4 1 1]
        0.4107     0.0081     0.6600     0.1046
        0.8224     0.3775     0.0764     0.8827
        0.9518     0.3027     0.0901     0.1647
        0.1794     0.6456     0.5933     0.8060
        0.4198     0.5591     0.1098     0.5938

# Dos and Don'ts

It is very easy to run into exceptions if you are not careful with the
switching of backends.

### Don't: Do not use arrays between different backends

ArrayFire checks the input arrays to functions for mismatches with the active
backend. If an array is created on one backend but used while another backend
is active, an exception with code 503 (`AF_ERR_ARR_BKND_MISMATCH`) is thrown.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.c}
#include <arrayfire.h>
#include <cstdio>   // fprintf

int main()
{
    try {
        af::setBackend(AF_BACKEND_CUDA);
        af::array A = af::randu(5, 5);

        af::setBackend(AF_BACKEND_OPENCL);
        af::array B = af::constant(10, 5, 5);
        af::array C = af::matmul(A, B);     // This will throw an exception

    } catch (af::exception& e) {
        fprintf(stderr, "%s\n", e.what());
    }

    return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

### Do: Use a naming scheme to track arrays and backends

We recommend adopting a convention to track which backend each array belongs
to. One suggested technique is to add a suffix of `_cpu`, `_cuda`, or `_opencl`
to the array names, so an array created on the CUDA backend would be named
`myarray_cuda`.

If you have not used the af::setBackend function anywhere in your code, then
you do not have to worry about this as all the arrays will be created on the
same default backend.
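
The naming convention above relies on discipline alone. One possible complement, sketched here without any ArrayFire calls, is to carry the backend as data next to the value so a mismatch can be caught before the library reports it. The `Backend` enum, the `Tagged` wrapper, and `requireSameBackend` are all hypothetical names introduced for this sketch and are not part of ArrayFire; in practice `T` would be `af::array`.

```cpp
#include <stdexcept>

// Hypothetical tag naming the backend a value was created on.
enum class Backend { CPU, CUDA, OpenCL };

// Hypothetical wrapper: pairs a value with the backend it belongs to.
template <typename T>
struct Tagged {
    Backend backend;
    T       value;
};

// Throws std::invalid_argument if the two operands carry different tags.
template <typename T>
void requireSameBackend(const Tagged<T>& a, const Tagged<T>& b)
{
    if (a.backend != b.backend) {
        throw std::invalid_argument("operands belong to different backends");
    }
}
```

Calling `requireSameBackend(a, b)` before an operation that combines two arrays turns a backend mix-up into an error at a point of your choosing, at the cost of wrapping every array.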

### Don't: Do not use custom kernels (CUDA/OpenCL) with the Unified backend

Custom kernels are another thing to avoid when using the Unified backend. They
are not recommended, mainly because the Unified backend is meant to be highly
portable and should use only ArrayFire and native CPU code.