<!DOCTYPE html
PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!-- saved from url=(0014)about:internet -->
<html xmlns:MSHelp="http://www.microsoft.com/MSHelp/" lang="en-us" xml:lang="en-us"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="DC.Type" content="topic">
<meta name="DC.Title" content="parallel_for">
<meta name="DC.subject" content="parallel_for">
<meta name="keywords" content="parallel_for">
<meta name="DC.Relation" scheme="URI" content="../tbb_userguide/Parallelizing_Simple_Loops.htm">
<meta name="DC.Relation" scheme="URI" content="../tbb_userguide/Lambda_Expressions.htm">
<meta name="DC.Relation" scheme="URI" content="../tbb_userguide/Automatic_Chunking.htm">
<meta name="DC.Relation" scheme="URI" content="../tbb_userguide/Controlling_Chunking.htm">
<meta name="DC.Relation" scheme="URI" content="../tbb_userguide/Bandwidth_and_Cache_Affinity.htm">
<meta name="DC.Relation" scheme="URI" content="../tbb_userguide/Partitioner_Summary.htm">
<meta name="DC.Relation" scheme="URI" content="Advanced_Topic_Other_Kinds_of_Iteration_Spaces.htm#tutorial_Advanced_Topic_Other_Kinds_of_Iteration_Spaces">
<meta name="DC.Format" content="XHTML">
<meta name="DC.Identifier" content="tutorial_parallel_for">
<link rel="stylesheet" type="text/css" href="../intel_css_styles.css">
<title>parallel_for</title>
<xml>
<MSHelp:Attr Name="DocSet" Value="Intel"></MSHelp:Attr>
<MSHelp:Attr Name="Locale" Value="kbEnglish"></MSHelp:Attr>
<MSHelp:Attr Name="TopicType" Value="kbReference"></MSHelp:Attr>
</xml>
</head>
<body id="tutorial_parallel_for">
<!-- ==============(Start:NavScript)================= -->
<script src="..\NavScript.js" language="JavaScript1.2" type="text/javascript"></script>
<script language="JavaScript1.2" type="text/javascript">WriteNavLink(1);</script>
<!-- ==============(End:NavScript)================= -->
<a name="tutorial_parallel_for"><!-- --></a>
<h1 class="topictitle1">parallel_for</h1>
<div>
<p>Suppose you want to apply a function
<samp class="codeph">Foo</samp> to each element of an array, and it is safe to
process each element concurrently. Here is the sequential code to do this:
</p>
<pre>void SerialApplyFoo( float a[], size_t n ) {
    for( size_t i=0; i!=n; ++i )
        Foo(a[i]);
}</pre>
<p>The iteration space here is of type
<samp class="codeph">size_t</samp>, and goes from
<samp class="codeph">0</samp> to
<samp class="codeph">n-1</samp>. The template function
<span class="option">tbb::parallel_for</span> breaks this iteration space into chunks,
and runs the chunks concurrently on the available worker threads. The first step in parallelizing this
loop is to convert the loop body into a form that operates on a chunk. The form
is an STL-style function object, called the
<em>body</em> object, in which
<samp class="codeph">operator()</samp> processes a chunk. The following code declares
the body object. The extra code required for Intel® Threading Building Blocks
is shown in
<samp class="codeph"><span style="color:blue"><strong>bold font</strong></span></samp>.
</p>
<pre><span style="color:blue"><strong>#include "tbb/tbb.h"</strong></span>
<span style="color:blue"><strong>using namespace tbb;</strong></span>
<span style="color:blue"><strong>class ApplyFoo {</strong></span>
<span style="color:blue"><strong>    float *const my_a;</strong></span>
<span style="color:blue"><strong>public:</strong></span>
<span style="color:blue"><strong>    void operator()( const blocked_range&lt;size_t&gt;&amp; r ) const {</strong></span>
<span style="color:blue"><strong>        float *a = my_a;</strong></span>
        for( size_t i=r.begin(); i!=r.end(); ++i )
            Foo(a[i]);
<span style="color:blue"><strong>    }</strong></span>
<span style="color:blue"><strong>    ApplyFoo( float a[] ) :</strong></span>
<span style="color:blue"><strong>        my_a(a)</strong></span>
<span style="color:blue"><strong>    {}</strong></span>
<span style="color:blue"><strong>};</strong></span></pre>
<p>The
<samp class="codeph">using</samp> directive in the example enables you to use the
library identifiers without having to write out the namespace prefix
<samp class="codeph">tbb</samp> before each identifier. The rest of the examples
assume that such a
<samp class="codeph">using</samp> directive is present.
</p>
<p>Note the argument to
<samp class="codeph">operator()</samp>. A
<samp class="codeph">blocked_range&lt;T&gt;</samp> is a template class provided by
the library. It describes a one-dimensional iteration space over type
<samp class="codeph">T</samp>. Template function
<samp class="codeph">parallel_for</samp> works with other kinds of iteration spaces
too. The library provides
<samp class="codeph">blocked_range2d</samp> for two-dimensional spaces. You can
define your own spaces as explained in
<strong>Advanced Topic: Other Kinds of Iteration Spaces</strong>.
</p>
<p>An instance of
<samp class="codeph">ApplyFoo</samp> needs member fields that remember all the local
variables that were defined outside the original loop but used inside it.
Usually, the constructor for the body object will initialize these fields,
though
<samp class="codeph">parallel_for</samp> does not care how the body object is
created. Template function
<samp class="codeph">parallel_for</samp> requires that the body object have a copy
constructor, which is invoked to create a separate copy (or copies) for each
worker thread. It also invokes the destructor to destroy these copies. In most
cases, the implicitly generated copy constructor and destructor work correctly.
If they do not, then (as usual in C++) you almost always need to define
<em>both</em> so that they stay consistent.
</p>
<p>Because the body object might be copied, its
<samp class="codeph">operator()</samp> should not modify the body. Otherwise the
modification might or might not become visible to the thread that invoked
<samp class="codeph">parallel_for</samp>, depending upon whether
<samp class="codeph">operator()</samp> is acting on the original or a copy. As a
reminder of this nuance,
<samp class="codeph">parallel_for</samp> requires that the body object's
<samp class="codeph">operator()</samp> be declared
<samp class="codeph">const</samp>.
</p>
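<p>The following minimal sketch illustrates why a modification made through a
copy is lost. It uses only standard C++. The names
<samp class="codeph">CountingBody</samp>,
<samp class="codeph">RunOnCopy</samp>, and
<samp class="codeph">CallsSeenByCaller</samp> are hypothetical and are not part of the
library; <samp class="codeph">RunOnCopy</samp> only stands in for a scheduler that
works on a copy of the body, and is not how
<samp class="codeph">parallel_for</samp> is implemented.
</p>
<pre>#include &lt;cstddef&gt;

// Hypothetical body object used only for this illustration.
// The mutable counter lets a const operator() modify the body,
// which is exactly what the text above advises against.
class CountingBody {
public:
    mutable std::size_t calls = 0;
    void operator()( std::size_t /*i*/ ) const { ++calls; }
};

// Hypothetical stand-in for a scheduler that works on a copy of the body.
template&lt;typename Body&gt;
void RunOnCopy( const Body&amp; body, std::size_t n ) {
    Body copy(body);                  // copy constructor runs here
    for( std::size_t i=0; i!=n; ++i )
        copy(i);                      // updates copy.calls only
}                                     // destructor of the copy runs here

// Returns the number of calls visible to the caller after the run.
std::size_t CallsSeenByCaller( std::size_t n ) {
    CountingBody body;
    RunOnCopy(body, n);
    return body.calls;                // the caller's object was never modified
}</pre>
<p>In this sketch
<samp class="codeph">CallsSeenByCaller(10)</samp> returns 0, because every increment
went to the copy. A body that must report a result typically writes through a
pointer or reference that it holds, as
<samp class="codeph">ApplyFoo</samp> does with
<samp class="codeph">my_a</samp>.
</p>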
<p>The example
<samp class="codeph">operator()</samp> loads
<samp class="codeph">my_a</samp> into a local variable
<samp class="codeph">a</samp>. Though not necessary, there are two reasons for doing
this in the example:
</p>
<ul type="disc">
<li>
<p><strong>Style</strong>. It makes the loop body look more like the original.
</p>
</li>
<li>
<p><strong>Performance</strong>. Sometimes putting frequently accessed values
into local variables helps the compiler optimize the loop better, because local
variables are often easier for the compiler to track.
</p>
</li>
</ul>
<p>Once you have the loop body written as a body object, invoke the
template function
<samp class="codeph">parallel_for</samp>, as follows:
</p>
<pre>#include "tbb/tbb.h"

void ParallelApplyFoo( float a[], size_t n ) {
    parallel_for(blocked_range&lt;size_t&gt;(0,n), ApplyFoo(a));
}</pre>
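<p>For reference, the pieces of this topic can be combined into a single
translation unit, sketched below. The definition of
<samp class="codeph">Foo</samp> (which doubles an element in place) and the helper
<samp class="codeph">DoubledSequence</samp> are assumptions made for illustration only;
the original example does not define
<samp class="codeph">Foo</samp>. Building it requires the library headers and linking
against the library.
</p>
<pre>#include &lt;cstddef&gt;
#include &lt;vector&gt;
#include "tbb/tbb.h"
using namespace tbb;

// Assumed for illustration: Foo doubles an element in place.
void Foo( float&amp; x ) { x *= 2.0f; }

// Body object: processes one chunk of the iteration space.
class ApplyFoo {
    float *const my_a;
public:
    void operator()( const blocked_range&lt;std::size_t&gt;&amp; r ) const {
        float *a = my_a;
        for( std::size_t i=r.begin(); i!=r.end(); ++i )
            Foo(a[i]);
    }
    ApplyFoo( float a[] ) : my_a(a) {}
};

void ParallelApplyFoo( float a[], std::size_t n ) {
    parallel_for(blocked_range&lt;std::size_t&gt;(0,n), ApplyFoo(a));
}

// Hypothetical helper: fills a vector with 0,1,2,... and applies Foo in parallel.
std::vector&lt;float&gt; DoubledSequence( std::size_t n ) {
    std::vector&lt;float&gt; v(n);
    for( std::size_t i=0; i!=n; ++i )
        v[i] = static_cast&lt;float&gt;(i);
    ParallelApplyFoo(v.data(), n);
    return v;
}</pre>
<p>Each element is touched by exactly one chunk, so the result does not depend
on how the iteration space is divided.
</p>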
<p>The
<samp class="codeph">blocked_range</samp> constructed here represents the entire
iteration space from 0 to n-1, which
<samp class="codeph">parallel_for</samp> divides into subspaces for each processor.
The general form of the constructor is
<samp class="codeph">blocked_range&lt;T&gt;(<em>begin</em>,<em>end</em>,<em>grainsize</em>)</samp>.
The
<var>T</var> specifies the value type. The arguments
<var>begin</var> and
<samp class="codeph"><em>end</em></samp> specify the iteration space STL-style as a
half-open interval [<samp class="codeph"><em>begin</em></samp>,<var>end</var>). The
argument
<em>grainsize</em> is explained in the
<strong>Controlling Chunking
</strong>section. The example uses the default grainsize of 1 because by
default
<samp class="codeph">parallel_for</samp> applies a heuristic that works well with
the default grainsize.
</p>
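<p>To make the half-open interval and the role of the grainsize concrete, the
following sketch recursively halves a range until no piece is larger than the
grainsize. It uses only standard C++.
<samp class="codeph">SimpleRange</samp> and
<samp class="codeph">CollectChunks</samp> are hypothetical names, and the code is a
simplified illustration rather than the library's
<samp class="codeph">blocked_range</samp> or its scheduling logic. In particular, the
library's partitioners may stop splitting before the grainsize is reached.
</p>
<pre>#include &lt;cstddef&gt;
#include &lt;utility&gt;
#include &lt;vector&gt;

// Hypothetical half-open range [begin,end) with a grainsize.
struct SimpleRange {
    std::size_t begin, end, grainsize;
    std::size_t size() const { return end - begin; }
    // A range may be split only while it is larger than the grainsize.
    bool is_divisible() const { return size() &gt; grainsize; }
};

// Recursively halves r and appends the resulting chunks to out, in order.
// A real scheduler would hand the chunks to threads instead of storing them.
void CollectChunks( const SimpleRange&amp; r,
                    std::vector&lt;std::pair&lt;std::size_t,std::size_t&gt; &gt;&amp; out ) {
    if( r.begin == r.end )
        return;                                   // empty range: nothing to do
    if( !r.is_divisible() ) {
        out.push_back( std::make_pair(r.begin, r.end) );
        return;
    }
    std::size_t mid = r.begin + r.size()/2;
    CollectChunks( SimpleRange{r.begin, mid, r.grainsize}, out );
    CollectChunks( SimpleRange{mid, r.end, r.grainsize}, out );
}</pre>
<p>With this sketch, the range [0,10) with a grainsize of 3 yields the chunks
[0,2), [2,5), [5,7), and [7,10). Because each interval is half-open, the end of
one chunk is the beginning of the next, and no index is processed twice.
</p>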
</div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="../tbb_userguide/Parallelizing_Simple_Loops.htm">Parallelizing Simple Loops</a></div>
</div>
<div class="See Also">
<ul class="ullinks">
<li class="ulchildlink"><a href="../tbb_userguide/Lambda_Expressions.htm">Lambda Expressions</a><br>
</li>
<li class="ulchildlink"><a href="../tbb_userguide/Automatic_Chunking.htm">Automatic Chunking</a><br>
</li>
<li class="ulchildlink"><a href="../tbb_userguide/Controlling_Chunking.htm">Controlling Chunking</a><br>
</li>
<li class="ulchildlink"><a href="../tbb_userguide/Bandwidth_and_Cache_Affinity.htm">Bandwidth and Cache Affinity</a><br>
</li>
<li class="ulchildlink"><a href="../tbb_userguide/Partitioner_Summary.htm">Partitioner Summary</a><br>
</li>
</ul>
<h2>See Also</h2>
<div class="linklist">
<div><a href="Advanced_Topic_Other_Kinds_of_Iteration_Spaces.htm#tutorial_Advanced_Topic_Other_Kinds_of_Iteration_Spaces">Advanced Topic: Other Kinds of Iteration Spaces
</a></div></div>
</div>
</body>
</html>