1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329
|
<!DOCTYPE html
PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!-- saved from url=(0014)about:internet -->
<html xmlns:MSHelp="http://www.microsoft.com/MSHelp/" lang="en-us" xml:lang="en-us"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="DC.Type" content="reference">
<meta name="DC.Title" content="parallel_scan Template Function">
<meta name="DC.subject" content="parallel_scan Template Function">
<meta name="keywords" content="parallel_scan Template Function">
<meta name="DC.Relation" scheme="URI" content="../../reference/algorithms.htm">
<meta name="DC.Relation" scheme="URI" content="../../reference/algorithms/parallel_scan_func/pre_scan_tag_and_final_scan_tag_clses.htm">
<meta name="DC.Relation" scheme="URI" content="parallel_scan_func/pre_scan_tag_and_final_scan_tag_clses.htm#pre_scan_tag_and_final_scan_tag_clses">
<meta name="DC.Format" content="XHTML">
<meta name="DC.Identifier" content="parallel_scan_func">
<meta name="DC.Language" content="en-US">
<link rel="stylesheet" type="text/css" href="../../intel_css_styles.css">
<title>parallel_scan Template Function</title>
<xml>
<MSHelp:Attr Name="DocSet" Value="Intel"></MSHelp:Attr>
<MSHelp:Attr Name="Locale" Value="kbEnglish"></MSHelp:Attr>
<MSHelp:Attr Name="TopicType" Value="kbReference"></MSHelp:Attr>
</xml>
</head>
<body id="parallel_scan_func">
<!-- ==============(Start:NavScript)================= -->
<script src="..\..\NavScript.js" language="JavaScript1.2" type="text/javascript"></script>
<script language="JavaScript1.2" type="text/javascript">WriteNavLink(2);</script>
<!-- ==============(End:NavScript)================= -->
<a name="parallel_scan_func"><!-- --></a>
<h1 class="topictitle1">parallel_scan Template Function</h1>
<div>
<div class="section"><h2 class="sectiontitle">Summary</h2> Template function that computes parallel prefix. </div>
<div class="section"><h2 class="sectiontitle">Header</h2>
<pre>#include "tbb/parallel_scan.h"</pre>
</div>
<div class="section"><h2 class="sectiontitle">Syntax</h2> <pre>
template<typename Range, typename Body>
void parallel_scan( const Range& range, Body& body );
template<typename Range, typename Body>
void parallel_scan( const Range& range, Body& body, const auto_partitioner& );
template<typename Range, typename Body>
void parallel_scan( const Range& range, Body& body, const simple_partitioner& );
</pre>
</div>
<div class="section"><h2 class="sectiontitle">Description</h2><p>A <samp class="codeph">parallel_scan</samp>(<em>range</em>,<em>body</em>)
computes a parallel prefix, also known as parallel scan. This computation is an advanced concept
in parallel computing that is sometimes useful in scenarios that appear to have inherently
serial dependences.</p>
<p>A mathematical definition of the parallel prefix is as follows. Let
× be an associative operation with left-identity element id<sub>×</sub>.
The parallel prefix of × over a sequence <em>z</em><sub>0</sub>, <em>z</em><sub>1</sub>, ...<em>z</em><sub>n-1</sub>
is a sequence <em>y</em><sub>0</sub>, <em>y</em><sub>1</sub>, <em>y</em><sub>2</sub>, ...<em>y</em><sub>n-1</sub>
where:</p>
<ul type="disc"> <li> y<sub>0</sub> = id<sub>×</sub> × z<sub>0</sub>
</li>
<li> y<sub>i</sub> = y<sub>i-1</sub> × z<sub>i</sub>
</li>
</ul>
<p>For example, if × is addition, the parallel prefix corresponds a running sum. A
serial implementation of parallel prefix
is:</p>
<pre>
T temp = id<sub>×</sub>;
for( int i=1; i<=n; ++i ) {
temp = temp × z[i];
y[i] = temp;
}
</pre><p>Parallel
prefix performs this in parallel by reassociating the application of × and using two
passes. It may invoke × up to twice as many times as the serial prefix algorithm. Given
the right grain size and sufficient hardware threads, it can out perform the serial prefix
because even though it does more work, it can distribute the work across more than one hardware
thread.</p>
<div class="Note"><h3 class="NoteTipHead">
Tip</h3>
<p> Because <samp class="codeph">parallel_scan</samp> needs two passes, systems with only two hardware
threads tend to exhibit small speedup. <samp class="codeph">parallel_scan</samp> is best considered a
glimpse of a technique for future systems with more than two cores. It is nonetheless of
interest because it shows how a problem that appears inherently sequential can be
parallelized.</p>
<p>The template <samp class="codeph">parallel_scan<Range,Body></samp> implements parallel prefix
generically. It requires the signatures described in the table below.</p>
</div>
<div class="tablenoborder"><table cellpadding="4" summary="" width="100%" frame="hsides" border="1" rules="all"><caption><span class="tablecap">parallel_scan Requirements</span></caption>
<thead align="left">
<tr>
<th class="cellrowborder" valign="top" width="50.24875621890548%" id="d7237e170">
<p>Pseudo-Signature </p>
</th>
<th class="row-nocellborder" valign="top" width="49.75124378109454%" id="d7237e176">
<p>Semantics </p>
</th>
</tr>
</thead>
<tbody>
<tr valign="top">
<td class="cellrowborder" valign="top" width="50.24875621890548%" headers="d7237e170 ">
<p><samp class="codeph">void Body::operator()( const Range& r, pre_scan_tag )
</samp>
</p>
</td>
<td class="row-nocellborder" valign="top" width="49.75124378109454%" headers="d7237e176 ">
<p> Accumulate summary for range <samp class="codeph">r</samp> . </p>
</td>
</tr>
<tr valign="top">
<td class="cellrowborder" valign="top" width="50.24875621890548%" headers="d7237e170 ">
<p><samp class="codeph">void Body::operator()( const Range& r, final_scan_tag )
</samp>
</p>
</td>
<td class="row-nocellborder" valign="top" width="49.75124378109454%" headers="d7237e176 ">
<p> Compute scan result and summary for range<samp class="codeph"> r</samp>.</p>
</td>
</tr>
<tr valign="top">
<td class="cellrowborder" valign="top" width="50.24875621890548%" headers="d7237e170 ">
<p><samp class="codeph">Body::Body( Body& b, split )</samp>
</p>
</td>
<td class="row-nocellborder" valign="top" width="49.75124378109454%" headers="d7237e176 ">
<p>Split <samp class="codeph">b</samp> so that <samp class="codeph">this</samp> and <samp class="codeph">b</samp> can
accumulate summaries separately. Body <samp class="codeph">*this</samp> is object <samp class="codeph">a</samp> in the table row
below.</p>
</td>
</tr>
<tr valign="top">
<td class="cellrowborder" valign="top" width="50.24875621890548%" headers="d7237e170 ">
<p><samp class="codeph"> void Body::reverse_join( Body& a )</samp>
</p>
</td>
<td class="row-nocellborder" valign="top" width="49.75124378109454%" headers="d7237e176 ">
<p> Merge summary accumulated by <samp class="codeph">a</samp> into summary accumulated by
<samp class="codeph">this</samp>, where <samp class="codeph">this</samp> was created earlier from <samp class="codeph">a</samp> by <samp class="codeph">a</samp>'s
splitting constructor. Body <samp class="codeph">*this</samp> is object <samp class="codeph">b</samp> in the table row above.</p>
</td>
</tr>
<tr valign="top">
<td class="cellrowborder" valign="top" width="50.24875621890548%" headers="d7237e170 ">
<p><samp class="codeph"> void Body::assign( Body& b )</samp></p>
</td>
<td class="row-nocellborder" valign="top" width="49.75124378109454%" headers="d7237e176 ">
<p>Assign summary of <samp class="codeph">b</samp> to <samp class="codeph">this</samp>.</p>
</td>
</tr>
</tbody>
</table>
</div>
<p>A summary contains enough information such that for two consecutive subranges <em>r</em>
and <em>s</em>:</p>
<ul type="disc">
<li> If <em>r</em> has no preceding subrange, the scan result for <em>s</em> can be computed from
knowing <em>s</em> and the summary for <em>r</em>.</li>
<li> A summary of <em>r</em> concatenated with <em>s</em> can be computed from the summaries of <em>r</em>
and <em>s</em>.</li>
</ul>
<p>For example, if computing a running sum of an array, the summary for a range <em>r</em> is
the sum of the array elements corresponding to <em>r</em>.</p>
<p>The figure below shows one way that
<samp class="codeph">parallel_scan</samp> might compute the running sum of an array containing the
integers 1-16. Time flows downwards in the diagram. Each color denotes a separate <samp class="codeph">Body</samp> object.
Summaries are shown in brackets.</p>
<ol>
<li> The first two steps split the original blue body into the pink and yellow bodies. Each body
operates on a quarter of the input array in parallel. The last quarter is processed later in
step 5.</li>
<li> The blue body computes the final scan and summary for 1-4. The pink and yellow bodies
compute their summaries by prescanning 5-8 and 9-12 respectively. </li>
<li> The pink body computes its summary for 1-8 by performing a reverse_join with the blue body. </li>
<li> The yellow body computes its summary for 1-12 by performing a reverse_join with the pink
body. </li>
<li> The blue, pink, and yellow bodies compute final scans and summaries for portions of the
array. </li>
<li> The yellow summary is assigned to the blue body. The pink and yellow bodies are destroyed.
</li>
</ol>
<p>Note that two quarters of the array were not prescanned. The
<samp class="codeph">parallel_scan</samp> template makes an effort to avoid prescanning where possible, to
improve performance when there are only a few or no extra worker threads. If no other workers
are available, <samp class="codeph">parallel_scan</samp> processes the subranges without any pre_scans, by processing the
subranges from left to right using final scans. That's why final scans must compute a summary as
well as the final scan result. The summary might be needed to process the next subrange if no
worker thread has prescanned it yet.</p>
<p><strong>Example Execution of
parallel_scan</strong></p>
<img src="../Resources/parll_scan.jpg"><p>The following code demonstrates how the
signatures could be implemented to use <samp class="codeph">parallel_scan</samp> to compute the same result as the earlier
sequential example involving ×.
</p>
<pre>
using namespace tbb;
class Body {
T sum;
T* const y;
const T* const z;
public:
Body( T y_[], const T z_[] ) : sum(id<sub>×</sub>), z(z_), y(y_) {}
T get_sum() const {return sum;}
template<typename Tag>
void operator()( const blocked_range<int>& r, Tag ) {
T temp = sum;
for( int i=r.begin(); i<r.end(); ++i ) {
temp = temp × z[i];
if( Tag::is_final_scan() )
y[i] = temp;
}
sum = temp;
}
Body( Body& b, split ) : z(b.z), y(b.y), sum(id<sub>×</sub>) {}
void reverse_join( Body& a ) { sum = a.sum ןƒ… sum;}
void assign( Body& b ) {sum = b.sum;}
};
float DoParallelScan( T y[], const T z[], int n ) {
Body body(y,z);
parallel_scan( blocked_range<int>(0,n), body );
return body.get_sum();
}
</pre><p>The
definition of <samp class="codeph">operator()</samp> demonstrates typical patterns when using
<samp class="codeph">parallel_scan</samp>.</p>
<ul type="disc">
<li> A single template defines both versions. Doing so is not required, but usually saves coding
effort, because the two versions are usually similar. The library defines static method
<samp class="codeph">is_final_scan(</samp>) to enable differentiation between the versions.</li>
<li> The prescan variant computes the × reduction, but does not update
<samp class="codeph">y</samp>. The prescan is used by <samp class="codeph">parallel_scan</samp> to generate
look-ahead partial reductions.</li>
<li> The final scan variant computes the × reduction and updates
<samp class="codeph">y</samp>.</li>
</ul>
<p>The operation <samp class="codeph">reverse_join</samp> is similar to the operation
<samp class="codeph">join</samp> used by <samp class="codeph">parallel_reduce</samp>, except that the arguments are
reversed. That is, <samp class="codeph">this</samp> is the <em>right</em> argument of ×.
Template function <samp class="codeph">parallel_scan</samp> decides if and when to generate parallel work.
It is thus crucial that × is associative and that the methods of Body faithfully represent it.
Operations such as floating-point addition that are somewhat associative can be used, with the
understanding that the results may be rounded differently depending upon the association used by
<samp class="codeph">parallel_scan</samp>. The reassociation may differ between runs even on the same
machine. However, if there are no worker threads available, execution associates identically to
the serial form shown at the beginning of this section.</p>
<p>If you change the example to use a
<samp class="codeph">simple_partitioner</samp>, be sure to provide a grainsize. The code below shows the
how to do this for a grainsize of
1000:</p>
<pre>parallel_scan(blocked_range<int>(0,n,1000), total,
simple_partitioner() );
</pre></div>
</div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="../../reference/algorithms.htm">Algorithms</a></div>
</div>
<div class="See Also">
<ul class="ullinks">
<li class="ulchildlink"><a href="../../reference/algorithms/parallel_scan_func/pre_scan_tag_and_final_scan_tag_clses.htm">pre_scan_tag and final_scan_tag Classes</a><br>
</li>
</ul>
<h2>See Also</h2>
<div class="linklist">
<div><a href="parallel_scan_func/pre_scan_tag_and_final_scan_tag_clses.htm#pre_scan_tag_and_final_scan_tag_clses">pre_scan_tag and final_scan_tag Classes</a> </div></div>
</div>
</body>
</html>
|