1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244
|
<!DOCTYPE html
PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!-- saved from url=(0014)about:internet -->
<html xmlns:MSHelp="http://www.microsoft.com/MSHelp/" lang="en-us" xml:lang="en-us"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="DC.Type" content="topic">
<meta name="DC.Title" content="How Task Scheduling Works">
<meta name="DC.subject" content="How Task Scheduling Works">
<meta name="keywords" content="How Task Scheduling Works">
<meta name="DC.Relation" scheme="URI" content="../tbb_userguide/The_Task_Scheduler.htm">
<meta name="DC.Relation" scheme="URI" content="Scheduler_Bypass.htm">
<meta name="DC.Relation" scheme="URI" content="Simple_Example_Fibonacci_Numbers.htm#tutorial_Simple_Example_Fibonacci_Numbers">
<meta name="DC.Relation" scheme="URI" content="Continuation_Passing.htm#tutorial_Continuation_Passing">
<meta name="DC.Format" content="XHTML">
<meta name="DC.Identifier" content="tutorial_How_Task_Scheduling_Works">
<link rel="stylesheet" type="text/css" href="../intel_css_styles.css">
<title>How Task Scheduling Works</title>
<xml>
<MSHelp:Attr Name="DocSet" Value="Intel"></MSHelp:Attr>
<MSHelp:Attr Name="Locale" Value="kbEnglish"></MSHelp:Attr>
<MSHelp:Attr Name="TopicType" Value="kbReference"></MSHelp:Attr>
</xml>
</head>
<body id="tutorial_How_Task_Scheduling_Works">
<!-- ==============(Start:NavScript)================= -->
<script src="..\NavScript.js" language="JavaScript1.2" type="text/javascript"></script>
<script language="JavaScript1.2" type="text/javascript">WriteNavLink(1);</script>
<!-- ==============(End:NavScript)================= -->
<a name="tutorial_How_Task_Scheduling_Works"><!-- --></a>
<h1 class="topictitle1">How Task Scheduling Works</h1>
<div>
<p>The scheduler evaluates a
<em>task graph.</em> The graph is a directed graph where each node is a
task. Each task points to its
<em>successor</em>, which is another task that is waiting on it to
complete, or NULL. Method
<span class="option">task::parent</span><samp class="codeph">()</samp>
gives you read-only access to the successor pointer. Each task has a
<em>refcount</em> that counts the number of tasks that have it as a successor.
The graph evolves over time.
</p>
<div class="fignone" id="fig7"><a name="fig7"><!-- --></a><span class="figcap">Snapshot of Task Graph for the Fibonacci Example</span>
<br><img src="Images/image014.jpg"><br>
</div>
<p> The figure above shows a snapshot of a task graph for the Fibonacci
example where:
</p>
<ul type="disc">
<li>
<p>Tasks A, B, and C spawned child tasks that they are waiting upon.
Their
<em>refcount</em> values are the number of children in flight plus one.
The extra one is part of a convention for explicit waiting that is explained
later in this section.
</p>
</li>
<li>
<p>Task D is running, but has not yet spawned any children, and so it has
not had to set its reference count yet.
</p>
</li>
<li>
<p>Tasks E, F, and G have been spawned, but have not yet started
executing.
</p>
</li>
</ul>
<p>The scheduler runs tasks in a way that tends to minimize both memory
demands and cross-thread communication. The intuition is that a balance must be
reached between depth-first and breadth-first execution. Assuming that the tree
is finite, depth-first is best for sequential execution for the following
reasons:
</p>
<ul type="disc">
<li>
<p><strong>Strike when the cache is hot</strong>. The deepest tasks are the most
recently created tasks, and therefore are hottest in cache. Also, if they can
complete, then task C can continue executing, and though not the hottest in
cache, it is still warmer than the older tasks above it.
</p>
</li>
<li>
<p><strong>Minimize space</strong>. Executing the shallowest task leads to
breadth-first unfolding of the tree. This creates an exponential number of
nodes that coexist simultaneously. In contrast, depth-first execution creates
the same number of nodes, but only a linear number have to exist at the same
time, because it stacks the other ready tasks (E, F, and G in the picture).
</p>
</li>
</ul>
<p>Though breadth-first execution has a severe problem with memory
consumption, it does maximize parallelism if you have an unlimited number of
physical threads. Since physical threads are limited, it is better to use only
enough breadth-first execution to keep the available processors busy.
</p>
<p>The scheduler implements a hybrid of depth-first and breadth-first
execution. Each thread has its own deque<a href="#ftn8"><sup><sup>[8]</sup></sup></a> of tasks that are ready to
run. When a thread spawns a task, it pushes it onto the bottom of its deque.
The following figure shows a snapshot of a thread's deque that corresponds to
the task graph in figure above.
</p>
<div class="fignone" id="fig8"><a name="fig8"><!-- --></a><span class="figcap">A Thread's Deque</span>
<br><img src="Images/image015.jpg" width="222" height="93"><br>
</div>
<p>When a thread participates in task graph evaluation, it continually
executes a task obtained by the first rule below that applies:
</p>
<ol>
<li>
<p id="Execution_Rules"><a name="Execution_Rules"><!-- --></a>Get the task returned by method
<samp class="codeph">execute</samp> for the previous task. This rule does not
apply if
<samp class="codeph">execute</samp> returned
<samp class="codeph">NULL</samp>.
</p>
</li>
<li>
<p>Pop a task from the
<em>bottom</em> of its
<em>own</em> deque. This rule does not apply if the deque is empty.
</p>
</li>
<li>
<p>Steal a task from the
<em>top</em> of
<em>another</em> randomly chosen deque. If the chosen deque is empty, the
thread tries this rule again until it succeeds.
</p>
</li>
</ol>
<p>Rule 1 is discussed in
<strong>Scheduler Bypass</strong>. The overall effect of rule 2 is to execute the
<em>youngest</em> task spawned by the thread, which causes depth-first
execution until the thread runs out of work. Then rule 3 applies. It steals the
<em>oldest</em> task spawned by another thread, which causes temporary
breadth-first execution that converts potential parallelism into actual
parallelism.
</p>
<p>Getting a task is always automatic; it happens as part of task graph
evaluation. Putting a task into a deque can be explicit or implicit. A thread
always pushes a task onto the bottom of its own deque, never another thread's
deque. Only theft can transfer a task spawned by one thread to another thread.
</p>
<p>There are three conditions that cause a thread to push a task onto its
deque:
</p>
<ul type="disc">
<li>
<p>The task is explicitly spawned by the thread, for example, by method
<samp class="codeph">spawn</samp>.
</p>
</li>
<li>
<p>A task has been marked for re-execution by method
<samp class="codeph">task::recycle_to_reexecute</samp>.
</p>
</li>
<li>
<p>The thread completes execution of the last predecessor task
<em>and</em> after doing so implicitly decrements the task's reference
count to zero. If so, the thread implicitly pushes the successor task onto the
bottom of its deque. Completing the last child does not cause the reference
count to become zero if the reference count includes extra references.
</p>
</li>
</ul>
<p>The example in
<strong>Fibonacci Numbers</strong> does not have implicit pushing, because it
explicitly waits for children to complete. It uses
<samp class="codeph">set_ref_count(3)</samp> for a task having only two children. The
extra reference protects the successor from being implicitly pushed.
<strong>Continuation Passing</strong> has a similar example that employs implicit
pushing. It uses
<samp class="codeph">set_ref_count(2)</samp> for a task having two children, so that
that task executes automatically when the children complete.
</p>
<p>To summarize, the task scheduler's fundamental strategy is 'breadth-first
theft and depth-first work". The breadth-first theft rule raises
parallelism sufficiently to keep threads busy. The depth-first work rule keeps
each thread operating efficiently once it has sufficient work to do.
</p>
</div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="../tbb_userguide/The_Task_Scheduler.htm">The Task Scheduler</a></div>
</div>
<div class="See Also">
<h2>See Also</h2>
<div class="linklist">
<div><a href="Scheduler_Bypass.htm">Scheduler Bypass
</a></div>
<div><a href="Simple_Example_Fibonacci_Numbers.htm#tutorial_Simple_Example_Fibonacci_Numbers">Simple Example: Fibonacci Numbers
</a></div>
<div><a href="Continuation_Passing.htm#tutorial_Continuation_Passing">
</a></div></div>
</div>
<p class="tfootnote"><a id="ftn8"><sup>[8]</sup></a> Double-ended queue.</p>
</body>
</html>
|