<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.6"/>
<title>ViennaCL - The Vienna Computing Library: Custom OpenCL Compute Kernels</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
  $(document).ready(initResizable);
  $(window).load(resizeHeight);
</script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/search.js"></script>
<script type="text/javascript">
  $(document).ready(function() { searchBox.OnSelectItem(0); });
</script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
 <tbody>
 <tr style="height: 56px;">
  <td style="padding-left: 0.5em;">
   <div id="projectname">ViennaCL - The Vienna Computing Library
   &#160;<span id="projectnumber">1.7.1</span>
   </div>
   <div id="projectbrief">Free open-source GPU-accelerated linear algebra and solver library.</div>
  </td>
   <td>        <div id="MSearchBox" class="MSearchBoxInactive">
        <span class="left">
          <img id="MSearchSelect" src="search/mag_sel.png"
               onmouseover="return searchBox.OnSearchSelectShow()"
               onmouseout="return searchBox.OnSearchSelectHide()"
               alt=""/>
          <input type="text" id="MSearchField" value="Search" accesskey="S"
               onfocus="searchBox.OnSearchFieldFocus(true)" 
               onblur="searchBox.OnSearchFieldFocus(false)" 
               onkeyup="searchBox.OnSearchFieldChange(event)"/>
          </span><span class="right">
            <a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
          </span>
        </div>
</td>
 </tr>
 </tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.6 -->
<script type="text/javascript">
var searchBox = new SearchBox("searchBox", "search",false,'Search');
</script>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
  <div id="nav-tree">
    <div id="nav-tree-contents">
      <div id="nav-sync" class="sync"></div>
    </div>
  </div>
  <div id="splitbar" style="-moz-user-select:none;" 
       class="ui-resizable-handle">
  </div>
</div>
<script type="text/javascript">
$(document).ready(function(){initNavTree('manual-custom-kernels.html','');});
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
     onmouseover="return searchBox.OnSearchSelectShow()"
     onmouseout="return searchBox.OnSearchSelectHide()"
     onkeydown="return searchBox.OnSearchSelectKey(event)">
<a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(0)"><span class="SelectionMark">&#160;</span>All</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(1)"><span class="SelectionMark">&#160;</span>Classes</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(2)"><span class="SelectionMark">&#160;</span>Namespaces</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(3)"><span class="SelectionMark">&#160;</span>Files</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(4)"><span class="SelectionMark">&#160;</span>Functions</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(5)"><span class="SelectionMark">&#160;</span>Variables</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(6)"><span class="SelectionMark">&#160;</span>Typedefs</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(7)"><span class="SelectionMark">&#160;</span>Enumerations</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(8)"><span class="SelectionMark">&#160;</span>Enumerator</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(9)"><span class="SelectionMark">&#160;</span>Friends</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(10)"><span class="SelectionMark">&#160;</span>Macros</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(11)"><span class="SelectionMark">&#160;</span>Pages</a></div>

<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="javascript:void(0)" frameborder="0" 
        name="MSearchResults" id="MSearchResults">
</iframe>
</div>

<div class="header">
  <div class="headertitle">
<div class="title">Custom OpenCL Compute Kernels </div>  </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p>For custom algorithms, the built-in functionality of ViennaCL may be insufficient or not fast enough. In such cases it can be desirable to write a custom OpenCL compute kernel, as explained in this chapter. The following steps are required and are explained one after another:</p>
<ul>
<li>Write the OpenCL source code</li>
<li>Compile the compute kernel</li>
<li>Launch the kernel</li>
</ul>
<p>A tutorial on this topic can be found at <code><a class="el" href="custom-kernels_8cpp.html">examples/tutorial/custom-kernels.cpp</a></code>.</p>
<h1><a class="anchor" id="manual-custom-kernels-opencl-source"></a>
Setting up the OpenCL Source Code</h1>
<p>The OpenCL source code has to be provided as a string. One can either write the source code directly into a string within C++ files, or one can read the OpenCL source from a file. For demonstration purposes, we write the source directly as a string constant: </p>
<div class="fragment"><div class="line"><span class="keyword">const</span> <span class="keywordtype">char</span> * my_compute_program =</div>
<div class="line"><span class="stringliteral">&quot;__kernel void elementwise_prod(\n&quot;</span></div>
<div class="line"><span class="stringliteral">&quot;          __global const float * vec1,\n&quot;</span></div>
<div class="line"><span class="stringliteral">&quot;          __global const float * vec2, \n&quot;</span></div>
<div class="line"><span class="stringliteral">&quot;          __global float * result,\n&quot;</span></div>
<div class="line"><span class="stringliteral">&quot;          unsigned int size) \n&quot;</span></div>
<div class="line"><span class="stringliteral">&quot;{ \n&quot;</span></div>
<div class="line"><span class="stringliteral">&quot;  for (unsigned int i = get_global_id(0); i &lt; size; i += get_global_size(0))\n&quot;</span></div>
<div class="line"><span class="stringliteral">&quot;    result[i] = vec1[i] * vec2[i];\n&quot;</span></div>
<div class="line"><span class="stringliteral">&quot;}\n&quot;</span>;</div>
</div><!-- fragment --><p> The kernel takes three vector arguments <code>vec1</code>, <code>vec2</code> and <code>result</code>, and the vector length <code>size</code>. It computes the entry-wise product of <code>vec1</code> and <code>vec2</code> and writes it to <code>result</code>. Each work item starts at its global ID and advances by the total number of work items (<code>get_global_size(0)</code>), so every entry is processed exactly once regardless of how many work items are launched. For a more detailed explanation of the OpenCL source code, please refer to the specification available at the Khronos Group webpage <a class="el" href="citelist.html#CITEREF_khronoscl">[18]</a>.</p>
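<p>As mentioned above, the OpenCL source may also be read from a file instead of being embedded as a string constant. The following minimal sketch assumes a plain-text source file; the helper name <code>load_kernel_source</code> is hypothetical and not part of ViennaCL. It reads the whole file, including all newlines, into a <code>std::string</code>:</p>

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of ViennaCL): loads the complete contents
// of an OpenCL source file into a std::string.
std::string load_kernel_source(const std::string & filename)
{
  std::ifstream file(filename.c_str(), std::ios::in | std::ios::binary);
  if (!file)
    throw std::runtime_error("Cannot open OpenCL source file: " + filename);

  // Copy the entire stream, whitespace and newlines included, into a string.
  return std::string(std::istreambuf_iterator<char>(file),
                     std::istreambuf_iterator<char>());
}
```

<p>The resulting string could then be compiled like the string constant, e.g. <code>viennacl::ocl::current_context().add_program(load_kernel_source("my_kernels.cl"), "my_compute_program");</code>, where <code>my_kernels.cl</code> is a hypothetical file name. Relative paths are resolved against the current working directory of the application, not against the location of the executable.</p>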
<h1><a class="anchor" id="manual-custom-kernels-opencl-build"></a>
Compilation of the OpenCL Source Code</h1>
<p>The source code in the string constant <code>my_compute_program</code> has to be compiled into an OpenCL program. An OpenCL program is a compilation unit and may contain several different compute kernels. For example, the same program could also include another kernel function <code>inplace_elementwise_prod</code> that writes the result directly to one of the two operands <code>vec1</code> or <code>vec2</code>. </p>
<div class="fragment"><div class="line"><a class="code" href="classviennacl_1_1ocl_1_1program.html">viennacl::ocl::program</a> &amp; my_prog = <a class="code" href="namespaceviennacl_1_1ocl.html#a16fc917445ce2010e7b16fbc968771b7">viennacl::ocl::current_context</a>().<a class="code" href="classviennacl_1_1ocl_1_1context.html#a9094ac71f0cdf80df698c0c84ebb483d">add_program</a>(my_compute_program, <span class="stringliteral">&quot;my_compute_program&quot;</span>);</div>
</div><!-- fragment --><p> The next step is to extract the kernel object <code>my_kernel</code> from the compiled program (prior to ViennaCL 1.5.0 an explicit kernel registration was required; this is no longer necessary): </p>
<div class="fragment"><div class="line"><a class="code" href="classviennacl_1_1ocl_1_1kernel.html">viennacl::ocl::kernel</a> &amp; my_kernel = my_prog.<a class="code" href="classviennacl_1_1ocl_1_1program.html#a117eef2c7894ab1d794bc115d2a23f4a">get_kernel</a>(<span class="stringliteral">&quot;elementwise_prod&quot;</span>);</div>
</div><!-- fragment --><p> Now, the kernel is set up to use the function <code>elementwise_prod</code> compiled into the program <code>my_prog</code>.</p>
<dl class="section warning"><dt>Warning</dt><dd>C++ references to kernels and programs may become invalid as other kernels or programs are added. </dd>
<dd>
Therefore, first allocate the required ViennaCL objects and compile/add all custom kernels before you start taking references to custom programs or kernels.</dd></dl>
<p>Instead of extracting references to programs and kernels directly at program compilation, one can obtain them by name elsewhere in the application source code: </p>
<div class="fragment"><div class="line"><a class="code" href="classviennacl_1_1ocl_1_1program.html">viennacl::ocl::program</a> &amp; prog      = <a class="code" href="namespaceviennacl_1_1ocl.html#a16fc917445ce2010e7b16fbc968771b7">viennacl::ocl::current_context</a>().<a class="code" href="classviennacl_1_1ocl_1_1context.html#a1b1339c711fd666c8bdc9104f47670d4">get_program</a>(<span class="stringliteral">&quot;my_compute_program&quot;</span>);</div>
<div class="line"><a class="code" href="classviennacl_1_1ocl_1_1kernel.html">viennacl::ocl::kernel</a>  &amp; my_kernel = prog.<a class="code" href="classviennacl_1_1ocl_1_1program.html#a117eef2c7894ab1d794bc115d2a23f4a">get_kernel</a>(<span class="stringliteral">&quot;elementwise_prod&quot;</span>);</div>
</div><!-- fragment --><p> This simplifies application development considerably, since no program and kernel objects need to be passed around.</p>
<h1><a class="anchor" id="manual-custom-kernels-opencl-launch"></a>
Launching the OpenCL Kernel</h1>
<p>Before launching the kernel, one may adjust the global and local work sizes (readers not familiar with these concepts are encouraged to read the OpenCL standard <a class="el" href="citelist.html#CITEREF_khronoscl">[18]</a>). The following code specifies a one-dimensional execution model with a local work size of 16 and a global work size of 128, i.e. eight work groups of 16 work items each: </p>
<div class="fragment"><div class="line">my_kernel.<a class="code" href="classviennacl_1_1ocl_1_1kernel.html#ae2b841c487f9ddbfcfb6297d648e2d7f">local_work_size</a>(0, 16);</div>
<div class="line">my_kernel.<a class="code" href="classviennacl_1_1ocl_1_1kernel.html#a0144c18ae9f07722e5b5697335b7cff5">global_work_size</a>(0, 128);</div>
</div><!-- fragment --><p> For a two-dimensional execution, the parameters for the second dimension are additionally set by </p>
<div class="fragment"><div class="line">my_kernel.<a class="code" href="classviennacl_1_1ocl_1_1kernel.html#ae2b841c487f9ddbfcfb6297d648e2d7f">local_work_size</a>(1, 16);</div>
<div class="line">my_kernel.<a class="code" href="classviennacl_1_1ocl_1_1kernel.html#a0144c18ae9f07722e5b5697335b7cff5">global_work_size</a>(1, 128);</div>
</div><!-- fragment --><p> However, for the simple kernel in this example it is not necessary to specify any work sizes at all. The default work sizes (which can be found in <code><a class="el" href="kernel_8hpp.html" title="Representation of an OpenCL kernel in ViennaCL. ">viennacl/ocl/kernel.hpp</a></code>) suffice for most cases. We recommend writing kernels that do <em>NOT</em> depend on a particular thread configuration, since such a dependence usually makes performance non-portable.</p>
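<p>The following host-side sketch illustrates why the kernel above does not depend on the thread configuration. It emulates the kernel loop serially in plain C++ (no OpenCL involved) for an arbitrary global work size; the helper names are hypothetical. If explicit work sizes are used, OpenCL 1.x requires the global work size to be a multiple of the local work size. Rounding the global size up is harmless for this kernel, because surplus work items fail the loop condition <code>i &lt; size</code> immediately:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: smallest multiple of local_size that is >= n.
// OpenCL 1.x requires the global work size to be divisible by the local work size.
std::size_t round_up_global_size(std::size_t n, std::size_t local_size)
{
  assert(local_size > 0);
  return (n + local_size - 1) / local_size * local_size;
}

// Hypothetical serial emulation of the elementwise_prod kernel.
// Simulated work item 'gid' starts at its global ID and advances by
// global_size, mirroring the for-loop in the OpenCL source.
std::vector<float> emulate_elementwise_prod(const std::vector<float> & vec1,
                                            const std::vector<float> & vec2,
                                            std::size_t global_size)
{
  assert(vec1.size() == vec2.size());
  assert(global_size > 0);
  std::vector<float> result(vec1.size(), 0.0f);
  for (std::size_t gid = 0; gid < global_size; ++gid)            // each simulated work item
    for (std::size_t i = gid; i < vec1.size(); i += global_size)  // grid-stride loop
      result[i] = vec1[i] * vec2[i];
  return result;
}
```

<p>For instance, a vector with 7 entries yields the same result for global sizes 1, 3, 8 (7 rounded up to a multiple of 4) and 128. On a real device the work items run concurrently, but since each entry is written by exactly one work item, the order of execution does not matter.</p>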
<p>Kernel arguments are passed in the same way as for ordinary function calls, in the order of the kernel's parameter list. We assume that three ViennaCL vectors <code>vec1</code>, <code>vec2</code> and <code>result</code> have already been set up: </p>
<div class="fragment"><div class="line"><a class="code" href="namespaceviennacl_1_1ocl.html#a5f2022f653ea1cf364d20e3ff84dcada">viennacl::ocl::enqueue</a>(my_kernel(vec1, vec2, result, cl_uint(vec1.size())));</div>
</div><!-- fragment --><p> By default, the kernel is enqueued in the first queue of the currently active device. A custom queue can be specified as an optional second argument.</p>
<dl class="section note"><dt>Note</dt><dd>Integer arguments need to be provided using the corresponding OpenCL types <code>cl_int</code>, <code>cl_uint</code>, etc. </dd>
<dd>
Do not pass arguments of type <code>size_t</code>, because the width of <code>size_t</code> may differ between the host and the compute device. </dd></dl>
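<p>When a host-side size is converted to <code>cl_uint</code>, a value above 2<sup>32</sup> &minus; 1 would be truncated silently. The following sketch of a checked conversion is an illustration under stated assumptions: <code>checked_cl_uint</code> is a hypothetical helper, and <code>std::uint32_t</code> stands in for <code>cl_uint</code> (which the OpenCL specification defines as a 32-bit unsigned integer) so that the code compiles without the OpenCL headers:</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Stand-in for OpenCL's cl_uint (a 32-bit unsigned integer per the OpenCL
// specification); real code would include the OpenCL headers instead.
typedef std::uint32_t cl_uint_standin;

// Hypothetical helper: converts a host-side std::size_t (often 64 bits wide)
// to a 32-bit kernel argument and rejects values that would be truncated.
cl_uint_standin checked_cl_uint(std::size_t n)
{
  if (n > static_cast<std::size_t>(std::numeric_limits<cl_uint_standin>::max()))
    throw std::overflow_error("value does not fit into a 32-bit cl_uint");
  return static_cast<cl_uint_standin>(n);
}
```

<p>In real code one would return <code>cl_uint</code> directly and write, for example, <code>my_kernel(vec1, vec2, result, checked_cl_uint(vec1.size()))</code> instead of the plain cast <code>cl_uint(vec1.size())</code>.</p>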
</div></div><!-- contents -->
</div><!-- doc-content -->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
  <ul>
    <li class="footer">Generated on Wed Jan 20 2016 22:32:44 for ViennaCL - The Vienna Computing Library by
    <a href="http://www.doxygen.org/index.html">
    <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.6 </li>
  </ul>
</div>
</body>
</html>