
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.6"/>
<title>ViennaCL - The Vienna Computing Library: Design Decisions</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
$(document).ready(initResizable);
$(window).load(resizeHeight);
</script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/search.js"></script>
<script type="text/javascript">
$(document).ready(function() { searchBox.OnSelectItem(0); });
</script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td style="padding-left: 0.5em;">
<div id="projectname">ViennaCL - The Vienna Computing Library
 <span id="projectnumber">1.7.1</span>
</div>
<div id="projectbrief">Free open-source GPU-accelerated linear algebra and solver library.</div>
</td>
<td> <div id="MSearchBox" class="MSearchBoxInactive">
<span class="left">
<img id="MSearchSelect" src="search/mag_sel.png"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
alt=""/>
<input type="text" id="MSearchField" value="Search" accesskey="S"
onfocus="searchBox.OnSearchFieldFocus(true)"
onblur="searchBox.OnSearchFieldFocus(false)"
onkeyup="searchBox.OnSearchFieldChange(event)"/>
</span><span class="right">
<a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
</span>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.6 -->
<script type="text/javascript">
var searchBox = new SearchBox("searchBox", "search",false,'Search');
</script>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
$(document).ready(function(){initNavTree('manual-design.html','');});
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
<a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(0)"><span class="SelectionMark"> </span>All</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(1)"><span class="SelectionMark"> </span>Classes</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(2)"><span class="SelectionMark"> </span>Namespaces</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(3)"><span class="SelectionMark"> </span>Files</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(4)"><span class="SelectionMark"> </span>Functions</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(5)"><span class="SelectionMark"> </span>Variables</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(6)"><span class="SelectionMark"> </span>Typedefs</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(7)"><span class="SelectionMark"> </span>Enumerations</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(8)"><span class="SelectionMark"> </span>Enumerator</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(9)"><span class="SelectionMark"> </span>Friends</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(10)"><span class="SelectionMark"> </span>Macros</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(11)"><span class="SelectionMark"> </span>Pages</a></div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="javascript:void(0)" frameborder="0"
name="MSearchResults" id="MSearchResults">
</iframe>
</div>
<div class="header">
<div class="headertitle">
<div class="title">Design Decisions </div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p>During the implementation of ViennaCL, several design decisions had to be made, most of them a trade-off between competing advantages and disadvantages. The following sections discuss these decisions and the reasoning behind them.</p>
<h1><a class="anchor" id="manual-design-transfer-scalars"></a>
Transfer CPU-GPU-CPU for Scalars</h1>
<p>The ViennaCL scalar type <code>scalar&lt;&gt;</code> essentially behaves like a CPU scalar in order to make access to GPU resources as simple as possible, for example: </p>
<pre class="fragment">float cpu_float = 1.0f;
viennacl::scalar&lt;float&gt; gpu_float = cpu_float;
gpu_float = gpu_float * gpu_float;
gpu_float -= cpu_float;
cpu_float = gpu_float;
</pre><p>As an alternative, the user could have been required to use <code>copy</code>, as for the vector and matrix classes, but this would have unnecessarily complicated many commonly used operations like </p>
<pre class="fragment">if (norm_2(gpu_vector) &lt; 1e-10) { ... }
</pre><p>or </p>
<pre class="fragment">gpu_vector[0] = 2.0f;
</pre><p>where one of the operands resides on the CPU and the other on the GPU. Initializing a separate type and then calling <code>copy</code> is certainly not desirable for the above examples.</p>
<p>However, the convenience comes at a price: every transfer between CPU and GPU carries a large overhead, which is paid in full even though a <code>scalar&lt;&gt;</code> holds only a single value.</p>
<dl class="section warning"><dt>Warning</dt><dd>Use <code>scalar&lt;&gt;</code> with care: operations may be much slower than with built-in types on the CPU!</dd></dl>
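<p>To make this cost model concrete, the following minimal sketch counts simulated transfers. The class <code>mock_gpu_scalar</code> and the function <code>accumulate_on_mock_device</code> are hypothetical names introduced for illustration only; they are not part of ViennaCL and no OpenCL is involved.</p>

```cpp
#include <cstddef>

// Hypothetical stand-in for a device scalar; NOT the ViennaCL implementation.
// Every read or write from the host side is counted as one simulated transfer.
class mock_gpu_scalar
{
public:
  explicit mock_gpu_scalar(float v = 0.0f) : device_value_(v) { ++transfers_; }

  // host -> device
  mock_gpu_scalar & operator=(float v)
  {
    device_value_ = v;
    ++transfers_;
    return *this;
  }

  // device -> host
  operator float() const
  {
    ++transfers_;
    return device_value_;
  }

  static std::size_t transfers() { return transfers_; }
  static void reset() { transfers_ = 0; }

private:
  float device_value_;
  static std::size_t transfers_;
};

std::size_t mock_gpu_scalar::transfers_ = 0;

// Adds 1.0f to the mock device scalar n times.
// Each loop iteration reads the value back and writes it again: two transfers.
float accumulate_on_mock_device(std::size_t n)
{
  mock_gpu_scalar s(0.0f);
  for (std::size_t i = 0; i < n; ++i)
    s = static_cast<float>(s) + 1.0f;
  return s;   // one final device -> host read
}
```

<p>Starting from a reset counter, <code>accumulate_on_mock_device(10)</code> records 22 simulated transfers (one for construction, two per iteration, one for the final read), whereas the same loop on a plain <code>float</code> needs none.</p>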
<h1><a class="anchor" id="manual-design-transfer-vectors"></a>
Transfer CPU-GPU-CPU for Vectors</h1>
<p>Vectors and matrices are currently transferred between CPU and GPU with the provided <code>copy</code> function, which is similar to its counterpart in the Standard Template Library (STL): </p>
<div class="fragment"><div class="line">std::vector&lt;float&gt; cpu_vector(10);</div>
<div class="line">viennacl::vector&lt;float&gt; gpu_vector(10);</div>
<div class="line"></div>
<div class="line"><span class="comment">// fill cpu_vector here</span></div>
<div class="line"></div>
<div class="line"><span class="comment">//transfer values to gpu:</span></div>
<div class="line"><a class="code" href="namespaceviennacl.html#a10b7f8cf6b8864a7aa196d670481a453">copy</a>(cpu_vector.begin(), cpu_vector.end(), gpu_vector.begin());</div>
<div class="line"></div>
<div class="line"><span class="comment">// compute something on GPU here</span></div>
<div class="line"></div>
<div class="line"><span class="comment">//transfer back to cpu:</span></div>
<div class="line"><a class="code" href="namespaceviennacl.html#a10b7f8cf6b8864a7aa196d670481a453">copy</a>(gpu_vector.begin(), gpu_vector.end(), cpu_vector.begin());</div>
</div><!-- fragment --><p> A first alternative approach would have been to overload the assignment operator like this: </p>
<div class="fragment"><div class="line"><span class="comment">//transfer values to gpu:</span></div>
<div class="line">gpu_vector = cpu_vector;</div>
<div class="line"></div>
<div class="line"><span class="comment">// compute something on GPU here</span></div>
<div class="line"></div>
<div class="line"><span class="comment">//transfer back to cpu:</span></div>
<div class="line">cpu_vector = gpu_vector;</div>
</div><!-- fragment --><p> The first assignment (CPU to GPU) could be implemented directly in the <code>vector</code> class provided by ViennaCL. The question then is how the overload should access the data in the <code>cpu_vector</code> object. <code>std::vector</code> and C arrays support the bracket operator but not the parenthesis operator, while other vector types may not provide a bracket operator at all. Using STL iterators is thus the more reliable variant.</p>
<p>The transfer from GPU to CPU would require overloading the assignment operator of the CPU class, which ViennaCL cannot do. The only option left within ViennaCL is to provide conversion operators. Since in principle many different CPU libraries could be used, such a conversion operator would have to be a template of the form </p>
<div class="fragment"><div class="line"><span class="keyword">template</span> &lt;<span class="keyword">typename</span> T&gt;</div>
<div class="line"><span class="keyword">operator</span> T() {</div>
<div class="line"> <span class="comment">// implementation here</span></div>
<div class="line">}</div>
</div><!-- fragment --><p> for the types in ViennaCL. However, this would permit even totally meaningless conversions, e.g. from a GPU vector to a CPU boolean, and may result in obscure, unexpected behavior.</p>
<p>Moreover, explicit <code>copy</code> calls make it much clearer at which points in the source code large amounts of data are transferred between CPU and GPU.</p>
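<p>The argument for iterators can be sketched as follows. The type <code>mock_device_buffer</code> and the functions <code>copy_to_device</code> and <code>copy_to_host</code> are hypothetical names chosen for this illustration and deliberately differ from the ViennaCL <code>copy</code> function; host memory stands in for device memory. The point is only that an iterator interface places no requirement on the host container beyond traversal.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a device vector; a real one would own an OpenCL buffer.
struct mock_device_buffer
{
  explicit mock_device_buffer(std::size_t n) : storage(n, 0.0f) {}
  std::vector<float> storage;
};

// Host -> device. Works for any host range that can be traversed with
// iterators: std::vector, std::list, plain C arrays (pointers), ...
template <typename InputIt>
void copy_to_device(InputIt first, InputIt last, mock_device_buffer & dst)
{
  std::size_t i = 0;
  for (; first != last; ++first, ++i)
  {
    assert(i < dst.storage.size() && "host range larger than device buffer");
    dst.storage[i] = static_cast<float>(*first);
  }
}

// Device -> host. The caller must provide room for all entries.
template <typename OutputIt>
void copy_to_host(const mock_device_buffer & src, OutputIt out)
{
  for (std::size_t i = 0; i < src.storage.size(); ++i, ++out)
    *out = src.storage[i];
}
```

<p>Neither function relies on a bracket or parenthesis operator of the host type, so a C array can be passed as <code>copy_to_device(raw, raw + 3, buffer)</code> and a <code>std::vector</code> as <code>copy_to_device(v.begin(), v.end(), buffer)</code> without any overload specific to either type.</p>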
<h1><a class="anchor" id="manual-design-solver"></a>
Solver Interface</h1>
<p>We decided to provide an interface compatible with Boost.uBLAS for dense matrix operations. For iterative solvers, the only possible generalization was to use the tagging facility: the desired iterative solver is specified by a tag.</p>
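<p>As a rough illustration of the tagging idea, the sketch below selects between two simple iterative methods through the type of a tag object that also carries the solver parameters. The tag types <code>jacobi_tag</code> and <code>gauss_seidel_tag</code>, the typedefs and both <code>solve</code> overloads are assumptions made for this example; they are not the solvers or tags shipped with ViennaCL.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<double> vec;
typedef std::vector<vec>    mat;   // dense, row-major; illustration only

// Hypothetical tags: the type names the method, the members carry its parameters.
struct jacobi_tag
{
  explicit jacobi_tag(double tol = 1e-12, std::size_t iters = 200)
    : tolerance(tol), max_iterations(iters) {}
  double      tolerance;
  std::size_t max_iterations;
};

struct gauss_seidel_tag
{
  explicit gauss_seidel_tag(double tol = 1e-12, std::size_t iters = 200)
    : tolerance(tol), max_iterations(iters) {}
  double      tolerance;
  std::size_t max_iterations;
};

// Overload chosen for jacobi_tag: all updates use the previous iterate.
vec solve(const mat & A, const vec & b, const jacobi_tag & tag)
{
  assert(A.size() == b.size());
  vec x(b.size(), 0.0), x_new(b.size(), 0.0);
  for (std::size_t it = 0; it < tag.max_iterations; ++it)
  {
    double change = 0.0;
    for (std::size_t i = 0; i < b.size(); ++i)
    {
      double sum = b[i];
      for (std::size_t j = 0; j < b.size(); ++j)
        if (j != i) sum -= A[i][j] * x[j];
      x_new[i] = sum / A[i][i];
      change = std::max(change, std::fabs(x_new[i] - x[i]));
    }
    x.swap(x_new);
    if (change < tag.tolerance) break;
  }
  return x;
}

// Overload chosen for gauss_seidel_tag: updated entries are used immediately.
vec solve(const mat & A, const vec & b, const gauss_seidel_tag & tag)
{
  assert(A.size() == b.size());
  vec x(b.size(), 0.0);
  for (std::size_t it = 0; it < tag.max_iterations; ++it)
  {
    double change = 0.0;
    for (std::size_t i = 0; i < b.size(); ++i)
    {
      double sum = b[i];
      for (std::size_t j = 0; j < b.size(); ++j)
        if (j != i) sum -= A[i][j] * x[j];
      const double updated = sum / A[i][i];
      change = std::max(change, std::fabs(updated - x[i]));
      x[i] = updated;
    }
    if (change < tag.tolerance) break;
  }
  return x;
}
```

<p>The call site stays the same for every method, e.g. <code>solve(A, b, jacobi_tag())</code> or <code>solve(A, b, gauss_seidel_tag(1e-8, 50))</code>; adding another solver only requires a new tag type and a matching overload.</p>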
<h1><a class="anchor" id="manual-design-iterators"></a>
Iterators</h1>
<p>Since the iterator-driven <code>copy</code> function is used for transfers between CPU and GPU, iterators have to be provided anyway. It bears repeating, however, that they are usually VERY slow, because each data access (i.e. each dereference) triggers a separate transfer between CPU and GPU. CPU-cached vector and matrix classes could be introduced in future releases of ViennaCL to mitigate this.</p>
<p>A remedy for quick iteration over the entries of e.g. a vector is the following: </p>
<div class="fragment"><div class="line">std::vector&lt;double&gt; temp(gpu_vector.size());</div>
<div class="line"><a class="code" href="namespaceviennacl.html#a10b7f8cf6b8864a7aa196d670481a453">copy</a>(gpu_vector.begin(), gpu_vector.end(), temp.begin());</div>
<div class="line"><span class="keywordflow">for</span> (std::vector&lt;double&gt;::iterator it = temp.begin();</div>
<div class="line"> it != temp.end();</div>
<div class="line"> ++it)</div>
<div class="line">{</div>
<div class="line"> <span class="comment">//do something with the data here</span></div>
<div class="line">}</div>
<div class="line"><a class="code" href="namespaceviennacl.html#a10b7f8cf6b8864a7aa196d670481a453">copy</a>(temp.begin(), temp.end(), gpu_vector.begin());</div>
</div><!-- fragment --><p> The three extra code lines can be wrapped into a separate iterator class by the library user, who also has to ensure data consistency during the loop.</p>
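<p>One way such a user-side wrapper might look is sketched below. It uses a hypothetical <code>mock_gpu_vector</code> that keeps its data in host memory and counts simulated bulk transfers; <code>host_cache</code> and <code>scale_by_two</code> are likewise names invented for this example and not part of ViennaCL. The wrapper copies the data to the host once on construction and writes it back once on destruction.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a GPU vector; counts simulated bulk transfers.
struct mock_gpu_vector
{
  explicit mock_gpu_vector(std::size_t n) : device_data(n, 0.0), bulk_transfers(0) {}
  std::vector<double> device_data;
  std::size_t         bulk_transfers;
};

// Copies the device data to the host once and writes it back on destruction.
// While a host_cache is alive, the device vector must not be modified elsewhere.
class host_cache
{
public:
  typedef std::vector<double>::iterator iterator;

  explicit host_cache(mock_gpu_vector & v) : gpu_(v), temp_(v.device_data)
  {
    ++gpu_.bulk_transfers;             // device -> host
  }

  ~host_cache()
  {
    assert(temp_.size() == gpu_.device_data.size());
    gpu_.device_data = temp_;          // host -> device
    ++gpu_.bulk_transfers;
  }

  iterator begin() { return temp_.begin(); }
  iterator end()   { return temp_.end(); }

private:
  host_cache(const host_cache &);              // non-copyable
  host_cache & operator=(const host_cache &);

  mock_gpu_vector &   gpu_;
  std::vector<double> temp_;
};

// Doubles every entry using exactly two bulk transfers.
void scale_by_two(mock_gpu_vector & v)
{
  host_cache cache(v);
  for (host_cache::iterator it = cache.begin(); it != cache.end(); ++it)
    *it *= 2.0;
}   // cache goes out of scope here and writes the data back
```

<p>Tying the write-back to the destructor keeps the loop body free of transfer code, but it also means that the device data is stale until the wrapper goes out of scope, which is exactly the consistency issue the user has to keep in mind.</p>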
<h1><a class="anchor" id="manual-design-init"></a>
Initialization of Compute Kernels</h1>
<p>Since OpenCL relies on passing the OpenCL source code to a built-in just-in-time compiler at run time, the necessary kernels have to be generated every time an application using ViennaCL is started.</p>
<p>One possibility was to require a mandatory </p>
<pre class="fragment">viennacl::init();
</pre><p>before using any other objects provided by ViennaCL, but this approach was discarded for the following two reasons:</p>
<ul>
<li>If <code>viennacl::init();</code> is accidentally forgotten by the user, the program will most likely terminate in a rather uncontrolled way.</li>
<li>It requires the user to remember and write one extra line of code, even if the default settings are fine.</li>
</ul>
<p>Initialization is instead done in a lazy manner when requesting OpenCL kernels. Kernels with similar functionality are grouped together in a common compilation unit. This allows fine-grained control over which source code to compile where and when. For example, there is no reason to compile the sparse matrix compute kernels at program startup if no sparse matrices are used at all.</p>
<p>Moreover, the just-in-time compilation of all available compute kernels in ViennaCL takes several seconds. Therefore, a request-based compilation is used to minimize any overhead due to just-in-time compilation.</p>
<p>The request-based compilation is a two-step process: At the first instantiation of an object of a particular type from ViennaCL, the full source code for all objects of the same type is compiled into an OpenCL program for that type. Each program contains many compute kernels, which are not yet initialized. A compute kernel initializes itself only when one of its arguments is set for the first time. Any subsequent call of that kernel reuses the already compiled and initialized compute kernel.</p>
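<p>The request-based scheme can be pictured with a small cache that compiles a group of kernels only when it is first asked for. This is a simplified sketch under stated assumptions: <code>lazy_program_cache</code> is a hypothetical class, the "compiled program" is just a string, and the call to the OpenCL just-in-time compiler is replaced by a counter.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry; a real implementation would hand the kernel sources
// to the OpenCL just-in-time compiler instead of building a string.
class lazy_program_cache
{
public:
  lazy_program_cache() : compile_count_(0) {}

  // Returns the program for a group of kernels,
  // compiling it only on the first request for that group.
  const std::string & get(const std::string & group)
  {
    assert(!group.empty());
    std::map<std::string, std::string>::iterator it = programs_.find(group);
    if (it == programs_.end())
    {
      ++compile_count_;   // the expensive step happens here, once per group
      it = programs_.insert(std::make_pair(group, "compiled:" + group)).first;
    }
    return it->second;
  }

  std::size_t compile_count() const { return compile_count_; }

private:
  std::map<std::string, std::string> programs_;
  std::size_t                        compile_count_;
};
```

<p>Requesting the same group twice triggers a single compilation, and a group that is never requested (for example the sparse matrix kernels in a program that uses only dense vectors) is never compiled at all.</p>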
<dl class="section note"><dt>Note</dt><dd>When benchmarking ViennaCL, a dummy call to the functionality of interest should be issued prior to taking timings. Otherwise, benchmark results include the just-in-time compilation time, which is a constant cost independent of the data size. </dd></dl>
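<p>A benchmark harness following this advice could be structured as in the sketch below. The helper <code>time_after_warmup</code> is a hypothetical name and uses only the C++11 standard library; the operation to be measured is passed in as a callable.</p>

```cpp
#include <cassert>
#include <chrono>

// Runs `work` once untimed (triggering any one-time setup such as
// just-in-time compilation), then returns the average wall-clock time
// in seconds over `repetitions` further runs.
template <typename Work>
double time_after_warmup(Work work, int repetitions)
{
  assert(repetitions > 0);

  work();   // warm-up call: excluded from the measurement

  const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
  for (int i = 0; i < repetitions; ++i)
    work();
  const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;

  return elapsed.count() / repetitions;
}
```

<p>Because OpenCL kernel launches are asynchronous, the callable passed to such a helper should presumably also wait for the device to finish its work before returning; otherwise only the time needed to enqueue the kernels is measured.</p>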
</div></div><!-- contents -->
</div><!-- doc-content -->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="footer">Generated on Wed Jan 20 2016 22:32:44 for ViennaCL - The Vienna Computing Library by
<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.6 </li>
</ul>
</div>
</body>
</html>