File: Wavefront.htm

package info (click to toggle)
tbb 4.2~20140122-5
  • links: PTS
  • area: main
  • in suites: jessie, jessie-kfreebsd
  • size: 21,492 kB
  • ctags: 21,278
  • sloc: cpp: 92,813; ansic: 9,775; asm: 1,070; makefile: 1,057; sh: 351; java: 226; objc: 98; pascal: 71; xml: 41
file content (225 lines) | stat: -rwxr-xr-x 8,824 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
<!DOCTYPE html
  PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!-- saved from url=(0014)about:internet -->
<html xmlns:MSHelp="http://www.microsoft.com/MSHelp/" lang="en-us" xml:lang="en-us"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

<meta name="DC.Type" content="topic">
<meta name="DC.Title" content="Wavefront">
<meta name="DC.subject" content="Wavefront">
<meta name="keywords" content="Wavefront">
<meta name="DC.Relation" scheme="URI" content="../../tbb_userguide/Design_Patterns/Design_Patterns.htm">
<meta name="DC.Relation" scheme="URI" content="General_References.htm#General_References">
<meta name="DC.Format" content="XHTML">
<meta name="DC.Identifier" content="Wavefront">
<link rel="stylesheet" type="text/css" href="../../intel_css_styles.css">
<title>Wavefront</title>
<xml>
<MSHelp:Attr Name="DocSet" Value="Intel"></MSHelp:Attr>
<MSHelp:Attr Name="Locale" Value="kbEnglish"></MSHelp:Attr>
<MSHelp:Attr Name="TopicType" Value="kbReference"></MSHelp:Attr>
</xml>
</head>
<body id="Wavefront">
 <!-- ==============(Start:NavScript)================= -->
 <script src="..\..\NavScript.js" language="JavaScript1.2" type="text/javascript"></script>
 <script language="JavaScript1.2" type="text/javascript">WriteNavLink(2);</script>
 <!-- ==============(End:NavScript)================= -->
<a name="Wavefront"><!-- --></a>

 
  <h1 class="topictitle1">Wavefront</h1>
 
   
  <div> 
	 <div class="section"><h2 class="sectiontitle">Problem</h2> 
		 
		<p>Perform computations on items in a data set, where the computation on
		  an item uses results from computations on predecessor items. 
		</p>
 
	 </div>
 
	 <div class="section"><h2 class="sectiontitle">Context</h2> 
		 
		<p>The dependences between computations form an acyclic graph. 
		</p>
 
	 </div>
 
	 <div class="section"><h2 class="sectiontitle">Forces</h2> 
		 
		<ul type="disc"> 
		  <li> 
			 <p>Dependence constraints between items form an acyclic graph. 
			 </p>
 
		  </li>
 
		  <li> 
			 <p>The number of immediate predecessors in the graph is known in
				advance, or can be determined some time before the last predecessor completes. 
			 </p>
 
		  </li>
 
		</ul>
 
	 </div>
 
	 <div class="section"><h2 class="sectiontitle">Solution</h2> 
		 
		<p>The solution is a parallel variant of topological sorting, using 
		  <samp class="codeph">tbb::parallel_do</samp> to process items. Associate an atomic
		  counter with each item. Initialize each counter to the number of predecessors.
		  Invoke tbb::parallel_do to process the items that have no predessors (have
		  counts of zero). After an item is processed, decrement the counters of its
		  successors. If a successor's counter reaches zero, add that successor to the 
		  <samp class="codeph">tbb::parallel_do</samp> via a "feeder". 
		</p>
 
		<p>If the number of predecessors for an item cannot be determined in
		  advance, treat the information "know number of predecessors" as an
		  additional predecessor. When the number of predecessors becomes known, treat
		  this conceptual predecessor as completed. 
		</p>
 
		<p>If the overhead of counting individual items is excessive, aggregate
		  items into blocks, and do the wavefront over the blocks. 
		</p>
 
	 </div>
 
	 <div class="section"><h2 class="sectiontitle">Example</h2> 
		 
		<p>Below is a serial kernel for the longest common subsequence algorithm.
		  The parameters are strings 
		  <samp class="codeph">x</samp> and 
		  <samp class="codeph">y</samp> with respective lengths 
		  <samp class="codeph">xlen</samp> and 
		  <samp class="codeph">ylen</samp>. 
		</p>
 
		<pre>int F[MAX_LEN+1][MAX_LEN+1];

void SerialLCS( const char* x, size_t xlen, const char* y, size_t ylen )
{
   for( size_t i=1; i&lt;=xlen; ++i )
       for( size_t j=1; j&lt;=ylen; ++j )
           F[i][j] = x[i-1]==y[j-1] ? F[i-1][j-1]+1:
                                      max(F[i][j-1],F[i-1][j]);
}</pre> 
		<p>The kernel sets 
		  <samp class="codeph">F[i][j]</samp> to the length of the longest common
		  subsequence shared by 
		  <samp class="codeph">x[0..i-1]</samp> and 
		  <samp class="codeph">y[0..j-1]</samp>. It assumes that F[0][0..ylen] and 
		  <samp class="codeph">F[0..xlen][0]</samp> have already been initialized to zero. 
		</p>
 
		<p>The following figure shows the data dependences for calculating 
		  <samp class="codeph">F[i][j]</samp>. 
		</p>
 
		<div class="fignone" id="fig3"><a name="fig3"><!-- --></a><span class="figcap">Data dependences for longest common substring
			 calculation.</span> 
		  <br><img width="122" height="122" src="Images/image005.jpg"><br> 
		</div>
 
		<p>The following figure shows the gray diagonal dependence is the
		  transitive closure of other dependencies. Thus for parallelization purposes it
		  is a redundant dependence that can be ignored. 
		</p>
 
		<div class="fignone" id="fig4"><a name="fig4"><!-- --></a><span class="figcap">Diagonal dependence is redundant.</span> 
		  <br><img width="122" height="122" src="Images/image006.jpg"><br> 
		</div>
 
		<p>It is generally good to remove redundant dependences from
		  consideration, because the atomic counting incurs a cost for each dependence
		  considered. 
		</p>
 
		<p>Another consideration is grain size. Scheduling each 
		  <samp class="codeph">F[i][j]</samp> element calculation separately is
		  prohibitively expensive. A good solution is to aggregate the elements into
		  contiguous blocks, and process the contents of a block serially. The blocks
		  have the same dependence pattern, but at a block scale. Hence scheduling
		  overheads can be amortized over blocks. 
		</p>
 
		<p>The parallel code follows. Each block consists of 
		  <samp class="codeph">N×N</samp> elements. Each block has an associated atomic
		  counter. Array 
		  <samp class="codeph">Count</samp> organizes these counters for easy lookup. The
		  code initializes the counters and then rolls a wavefront using 
		  <samp class="codeph">parallel_do</samp>, starting with the block at the origin
		  since it has no predecessors. 
		</p>
 
		<pre>const int N = 64;
tbb::atomic&lt;char&gt; Count[MAX_LEN/N+1][MAX_LEN/N+1];
&nbsp;
void ParallelLCS( const char* x, size_t xlen, const char* y, size_t ylen ) {
   // Initialize predecessor counts for blocks.
   size_t m = (xlen+N-1)/N;
   size_t n = (ylen+N-1)/N;
   for( int i=0; i&lt;m; ++i )
       for( int j=0; j&lt;n; ++j )
           Count[i][j] = (i&gt;0)+(j&gt;0);
   // Roll the wavefront from the origin.
   typedef pair&lt;size_t,size_t&gt; block;
   block origin(0,0);
   tbb::parallel_do( &amp;origin, &amp;origin+1,
       [=]( const block&amp; b, tbb::parallel_do_feeder&lt;block&gt;&amp;feeder ) {
           // Extract bounds on block
           size_t bi = b.first;
           size_t bj = b.second;
           size_t xl = N*bi+1;
           size_t xu = min(xl+N,xlen+1);
           size_t yl = N*bj+1;
           size_t yu = min(yl+N,ylen+1);
           // Process the block
           for( size_t i=xl; i&lt;xu; ++i )
               for( size_t j=yl; j&lt;yu; ++j )
                   F[i][j] = x[i-1]==y[j-1] ? F[i-1][j-1]+1:
                                              max(F[i][j-1],F[i-1][j]);
           // Account for successors
           if( bj+1&lt;n &amp;&amp; ‐‐Count[bi][bj+1]==0 )
               feeder.add( block(bi,bj+1) );
           if( bi+1&lt;m &amp;&amp; ‐‐Count[bi+1][bj]==0 )
               feeder.add( block(bi+1,bj) );       }
   );
}</pre> 
		<p>A regular structure simplifies implementation of the wavefront
		  pattern, but is not required. The parallel preorder traversal in 
		  <samp class="codeph">examples/parallel_do/parallel_preorder</samp> applies the
		  wavefront pattern to traverse each node of a graph in parallel, subject to the
		  constraint that a node is traversed after its predecessors are traversed. In
		  that example, each node in the graph stores its predecessor count. 
		</p>
 
	 </div>
 
	 <div class="section"><h2 class="sectiontitle">References</h2> 
		 
		<p>Eun-Gyu Kim and Mark Snir, "Wavefront Pattern",
		  http://www.cs.uiuc.edu/homes/snir/PPP/patterns/wavefront.pdf 
		</p>
 
	 </div>
 
  </div>
 
  
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong>&nbsp;<a href="../../tbb_userguide/Design_Patterns/Design_Patterns.htm">Design Patterns</a></div>
</div>
<div class="See Also">
<h2>See Also</h2>
<div class="linklist">
<div><a href="General_References.htm#General_References">General References 
		  </a></div></div>
</div> 

</body>
</html>