1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267
|
<!DOCTYPE html
PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!-- saved from url=(0014)about:internet -->
<html xmlns:MSHelp="http://www.microsoft.com/MSHelp/" lang="en-us" xml:lang="en-us"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="DC.Type" content="topic">
<meta name="DC.Title" content="Reduction">
<meta name="DC.subject" content="Reduction">
<meta name="keywords" content="Reduction">
<meta name="DC.Relation" scheme="URI" content="../../tbb_userguide/Design_Patterns/Design_Patterns.htm">
<meta name="DC.Relation" scheme="URI" content="Agglomeration.htm#Agglomeration">
<meta name="DC.Relation" scheme="URI" content="Elementwise.htm#Elementwise">
<meta name="DC.Relation" scheme="URI" content="Divide_and_Conquer.htm#Divide_and_Conquer">
<meta name="DC.Format" content="XHTML">
<meta name="DC.Identifier" content="Reduction">
<link rel="stylesheet" type="text/css" href="../../intel_css_styles.css">
<title>Reduction</title>
<xml>
<MSHelp:Attr Name="DocSet" Value="Intel"></MSHelp:Attr>
<MSHelp:Attr Name="Locale" Value="kbEnglish"></MSHelp:Attr>
<MSHelp:Attr Name="TopicType" Value="kbReference"></MSHelp:Attr>
</xml>
</head>
<body id="Reduction">
<!-- ==============(Start:NavScript)================= -->
<script src="..\..\NavScript.js" language="JavaScript1.2" type="text/javascript"></script>
<script language="JavaScript1.2" type="text/javascript">WriteNavLink(2);</script>
<!-- ==============(End:NavScript)================= -->
<a name="Reduction"><!-- --></a>
<h1 class="topictitle1">Reduction</h1>
<div>
<div class="section"><h2 class="sectiontitle">Problem</h2>
<p>Perform an associative reduction operation across a data set.
</p>
</div>
<div class="section"><h2 class="sectiontitle">Context</h2>
<p>Many serial algorithms sweep over a set of items to collect summary
information.
</p>
</div>
<div class="section"><h2 class="sectiontitle">Forces</h2>
<p>The summary can be expressed as an associative operation over the data
set, or at least is close enough to associative that reassociation does not
matter.
</p>
</div>
<div class="section"><h2 class="sectiontitle">Solution</h2>
<p>Two solutions exist in Intel® Threading Building Blocks (Intel® TBB).
The choice on which to use depends upon several considerations:
</p>
<ul type="disc">
<li>
<p>Is the operation commutative as well as associative?
</p>
</li>
<li>
<p>Are instances of the reduction type expensive to construct and
destroy. For example, a floating point number is inexpensive to construct. A
sparse floating-point matrix might be very expensive to construct.
</p>
</li>
</ul>
<p>Use
<samp class="codeph">tbb::parallel_reduce</samp> when the objects are inexpensive
to construct. It works even if the reduction operation is not commutative. The
Intel® TBB Tutorial describes how to use
<samp class="codeph">tbb::parallel_reduce</samp> for basic reductions.
</p>
<p>Use
<samp class="codeph">tbb::parallel_for</samp> and
<samp class="codeph">tbb::combinable</samp> if the reduction operation is
commutative and instances of the type are expensive.
</p>
<p>If the operation is not precisely associative but a precisely
deterministic result is required, use recursive reduction and parallelize it
using
<samp class="codeph">tbb::parallel_invoke</samp>.
</p>
</div>
<div class="section"><h2 class="sectiontitle">Examples</h2>
<p>The examples presented here illustrate the various solutions and some
tradeoffs.
</p>
<p>The first example uses
<samp class="codeph">tbb::parallel_reduce</samp> to do a + reduction over sequence
of type <samp class="codeph">T</samp>. The sequence is defined by a half-open interval [first,last).
</p>
<pre>T AssocReduce( const T* first, const T* last, T identity ) {
return tbb::parallel_reduce(
// Index range for reduction
tbb::blocked_range<const T*>(first,last),
// Identity element
identity,
// Reduce a subrange and partial sum
[&]( tbb::blocked_range<const T*> r, T partial_sum )->float {
return std::accumulate( r.begin(), r.end(), partial_sum );
},
// Reduce two partial sums
std::plus<T>()
);
} </pre>
<p>The third and fourth arguments to this form of <samp class="codeph">parallel_reduce</samp> are a
built in form of the agglomeration pattern. If there is an elementwise action
to be performed before the reduction, incorporating it into the third argument
(reduction of a subrange) may improve performance because of better locality of
reference.
</p>
<p>The second example assumes the + is commutative on <samp class="codeph">T</samp>. It is a good
solution when <samp class="codeph">T</samp> objects are expensive to construct.
</p>
<pre>T CombineReduce( const T* first, const T* last, T identity ) {
tbb::combinable<T> sum(identity);
tbb::parallel_for(
tbb::blocked_range<const T*>(first,last),
[&]( tbb::blocked_range<const T*> r ) {
sum.local() += std::accumulate(r.begin(), r.end(), identity);
}
);
return sum.combine( []( const T& x, const T& y ) {return x+y;} );
}</pre>
<p>Sometimes it is desirable to destructively use the partial results to
generate the final result. For example, if the partial results are lists, they
can be spliced together to form the final result. In that case use class
<samp class="codeph">tbb::enumerable_thread_specific</samp> instead of
<samp class="codeph">combinable</samp>. The
<samp class="codeph">ParallelFindCollisions</samp>
example in <em>Divide amd Conquer</em> demonstrates the technique.
</p>
<p>Floating-point addition and multiplication are almost associative.
Reassociation can cause changes because of rounding effects. The techniques
shown so far reassociate terms non-deterministically. Fully deterministic
parallel reduction for a not quite associative operation requires using
deterministic reassociation. The code below demonstrates this in the form of a
template that does a + reduction over a sequence of values of type <samp class="codeph">T</samp>.
</p>
<pre>template<typename T>
T RepeatableReduce( const T* first, const T* last, T identity ) {
if( last-first<=1000 ) {
// Use serial reduction
return std::accumulate( first, last, identity );
} else {
// Do parallel divide-and-conquer reduction
const T* mid = first+(last-first)/2;
T left, right;
tbb::parallel_invoke(
[&]{left=RepeatableReduce(first,mid,identity);},
[&]{right=RepeatableReduce(mid,last,identity);}
);
return left+right;
}
}</pre>
<p>The outer if-else is an instance of the agglomeration pattern for
recursive computations. The reduction graph, though not a strict binary tree,
is fully deterministic. Thus the result will always be the same for a given
input sequence, assuming all threads do identical floating-point rounding.
</p>
<p>The final example shows how a problem that typically is not viewed as
a reduction can be parallelized by viewing it as a reduction. The problem is
retrieving floating-point exception flags for a computation across a data set.
The serial code might look something like:
</p>
<pre> feclearexcept(FE_ALL_EXCEPT);
for( int i=0; i<N; ++i )
C[i]=A[i]*B[i];
int flags = fetestexcept(FE_ALL_EXCEPT);
if (flags & FE_DIVBYZERO) ...;
if (flags & FE_OVERFLOW) ...;
...</pre>
<p>The code can be parallelized by computing chunks of the loop
separately, and merging floating-point flags from each chunk. To do this with
<samp class="codeph">tbb:parallel_reduce</samp>, first define a
"body" type, as shown below.
</p>
<pre>struct ComputeChunk {
int flags; // Holds floating-point exceptions seen so far.
void reset_fpe() {
flags=0;
feclearexcept(FE_ALL_EXCEPT);
}
ComputeChunk () {
reset_fpe();
}
// "Splitting constructor"called by parallel_reduce when splitting a range into subranges.
ComputeChunk ( const ComputeChunk&, tbb::split ) {
reset_fpe();
}
// Operates on a chunk and collects floating-point exception state into flags member.
void operator()( tbb::blocked_range<int> r ) {
int end=r.end();
for( int i=r.begin(); i!=end; ++i )
C[i] = A[i]/B[i];
// It is critical to do |= here, not =, because otherwise we
// might lose earlier exceptions from the same thread.
flags |= fetestexcept(FE_ALL_EXCEPT);
}
// Called by parallel_reduce when joining results from two subranges.
void join( Body& other ) {
flags |= other.flags;
}
};</pre>
<p>Then invoke it as follows:
</p>
<pre>// Construction of cc implicitly resets FP exception state.
ComputeChunk cc;
tbb::parallel_reduce( tbb::blocked_range<int>(0,N), cc );
if (cc.flags & FE_DIVBYZERO) ...;
if (cc.flags & FE_OVERFLOW) ...;
...</pre>
<p>
</p>
</div>
</div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="../../tbb_userguide/Design_Patterns/Design_Patterns.htm">Design Patterns</a></div>
</div>
<div class="See Also">
<h2>See Also</h2>
<div class="linklist">
<div><a href="Agglomeration.htm#Agglomeration">Agglomeration
</a></div>
<div><a href="Elementwise.htm#Elementwise">Elementwise
</a></div>
<div><a href="Divide_and_Conquer.htm#Divide_and_Conquer">Divide and Conquer
</a></div></div>
</div>
</body>
</html>
|