File: interop_opencl.md

package info (click to toggle)
arrayfire 3.3.2%2Bdfsg1-4
  • links: PTS, VCS
  • area: main
  • in suites: stretch
  • size: 109,016 kB
  • sloc: cpp: 127,909; lisp: 6,878; python: 3,923; ansic: 1,051; sh: 347; makefile: 338; xml: 175
file content (189 lines) | stat: -rw-r--r-- 8,046 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
Interoperability with OpenCL {#interop_opencl}
========

As extensive as ArrayFire is, there are a few cases where you are still working
with custom [CUDA] (@ref interop_cuda) or [OpenCL] (@ref interop_opencl) kernels.
For example, you may want to integrate ArrayFire into an existing code base for
productivity or you may want to keep it around the old implementation for testing
purposes. Arrayfire provides a number of functions that allow it to work alongside
native OpenCL commands. In this tutorial we are going to talk about how to use
native OpenCL memory operations and custom OpenCL kernels alongside ArrayFire
in a seamless fashion.

# OpenCL Kernels with Arrayfire arrays
First, we will see how custom OpenCL kernels can be integrated into Arrayfire code.
Let's consider the following code and then break it down bit by bit.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
int main() {
    af::array x = randu(num);
    af::array y = randu(num);

    float *d_x = x.device<float>();
    float *d_y = y.device<float>();

    // Launch kernel to do the following operations
    // y = sin(x)^2 + cos(x)^2
    launch_simple_kernel(d_x, d_y, num);

    x.unlock();
    y.unlock();

    // check for errors, should be 0,
    // since sin(x)^2 + cos(x)^2 == 1
    float err = af::sum<float>(af::abs(y-1));
    printf("Error: %f\n", err);
    return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Breakdown
Most kernels require an input. In this case, we created a random uniform array `x`
We also go ahead and prepare the output array. The necessary memory required is
allocated in array `y` before the kernel launch.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
    af::array x = randu(num);
    af::array y = randu(num);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this example, the output is the same size as in the input. Note that the actual
output data type is not specified. For such cases, ArrayFire assumes the data type
is single precision floating point (\ref af::f32). If necessary, the data type can
be specified at the end of the array(..) constructor. Once you have the input and
output arrays, you will need to extract the device pointers / objects using
af::array::device() method in the following manner.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
    float *d_x = x.device<float>();
    float *d_y = y.device<float>();
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Accesing the device pointer in this manner internally sets a flag prohibiting
the arrayfire object from further managing the memory. Ownership will need to be
returned to the af::array object once we are finished using it.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
    // Launch kernel to do the following operations
    // y = sin(x)^2 + cos(x)^2
    launch_simple_kernel(d_x, d_y, num);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The function `launch_simple_kernel` handles the launching of your custom kernel.
We will have a look at the specific functions Arrayfire provides to interface with
OpenCL later in the post.

Once you have finished your computations, you have to tell ArrayFire to take control
of the memory objects.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
    x.unlock();
    y.unlock();
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is a very crucial step as ArrayFire believes the user is still in control
of the pointer. This means that ArrayFire will not perform garbage collection
on these objects resulting in memory leaks. You can now proceed with the rest of
the program. In our particular example, we are just performing an error check and exiting.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
    // check for errors, should be 0,
    // since sin(x)^2 + cos(x)^2 == 1
    float err = af::sum(af::abs(y-1));
    printf("Error: %f\n", err);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Launching an OpenCL kernel
If you are integrating an OpenCL kernel into your ArrayFire code base you will
need several additional steps to access Arrayfire's internal OpenCL context.
Once you have access to the same context ArrayFire is using, the rest of the
process is exactly the same as launching a stand alone OpenCL context.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
void inline launch_simple_kernel(float *d_y,
                                 const float *d_x,
                                 const int num)
{
    std::string simple_kernel_str = CONST_KERNEL_STRING;

    // Get OpenCL context from memory buffer and create a Queue
    cl::Context context(afcl::getContext(true));
    cl::CommandQueue queue(afcl::getQueue(true));

    //Build program and get the required kernel
    cl::Program prog = cl::Program(context, simple_kernel_str, true);
    cl::Kernel  kern = cl::Kernel(prog, "simple_kernel");

    //set global work dimensions
    static const cl::NDRange global(num);

    //prepare argumenst
    kern.setArg(0, d_y);
    kern.setArg(1, d_x);
    kern.setArg(2, num);

    //run kernel
    queue.enqueueNDRangeKernel(kern, cl::NullRange, global);
    queue.finish();

    return;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

First of all, to access to OpenCL and the interoperability functions we need to
include the appropriate headers.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
#include <af/opencl.h>
#include <CL/cl.hpp>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The opencl.h header includes a number of functions for getting and setting the
context, queue, and device ids used internally in Arrayfire. There are also a
number of methods to construct an af::array from an OpenCL `cl_mem` buffer
object.  There are both C and C++ versions of these functions, and the C++
versions are wrapped inside the \ref afcl namespace. See full datails of these
functions in the [af/opencl.h documentation] (\ref opencl_mat).


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
cl::Context context(afcl::getContext(true));
cl::CommandQueue queue(afcl::getQueue(true));
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We start to use these functions by getting Arrayfire's context and queue. For
the C++ api, a `true` flag must be passed for the retain parameter which calls
the `clRetainQueue()` and `clRetainContext()` functions before returning. This
allows us to use Arrayfire's internal OpenCL structures inside of the
cl::Context and cl::CommandQueue objects from the C++ api. Once we have them,
we can proceed to set up and enqueue the kernel like we would in any other
OpenCL program.  The kernel we are using is actually simple and can be seen
below.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
std::string CONST_KERNEL_STRING = R"(
__kernel
void simple_kernel(__global float *d_y,
                   __global const float *d_x,
                   const int num)
{
    const int id = get_global_id(0);

    if (id < num) {
        float x = d_x[id];
        float sin_x = sin(x);
        float cos_x = cos(x);
        d_y[id] = (sin_x * sin_x) + (cos_x * cos_x);
    }
}
)";
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# Reversing the workflow: Arrayfire arrays from OpenCL Memory
Unfortunately, Arrayfire's interoperability functions don't yet allow us to work with
external OpenCL contexts. This is currently an open issue and can be tracked here:
https://github.com/arrayfire/arrayfire/issues/1002.

Once the issue is addressed, it will be possible to take the reverse route and start with
completely custom OpenCL code, then transfer our results into af::array objects.