1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171
|
namespace tf {
/** @page ParallelReduction Parallel Reduction
%Taskflow provides template function that constructs a task to perform parallel reduction over a range of items.
@tableofcontents
@section ParallelReductionInclude Include the Header
You need to include the header file, <tt>taskflow/algorithm/reduce.hpp</tt>,
for creating a parallel-reduction task.
@code{.cpp}
#include <taskflow/algorithm/reduce.hpp>
@endcode
@section A2ParallelReduction Create a Parallel-Reduction Task
The reduction task created by
tf::Taskflow::reduce(B first, E last, T& result, O bop, P&& part) performs
parallel reduction over a range of elements specified by <tt>[first, last)</tt> using the binary operator @c bop and stores the reduced result in @c result.
It represents the parallel execution of the following reduction loop:
@code{.cpp}
for(auto itr=first; itr<last; itr++) {
result = bop(result, *itr);
}
@endcode
At runtime, the reduction task spawns a subflow to perform parallel reduction.
The reduced result is stored in @c result that will be captured by reference in the reduction task.
It is your responsibility to ensure @c result remains alive during the parallel execution.
@code{.cpp}
int sum = 100;
std::vector<int> vec = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
tf::Task task = taskflow.reduce(vec.begin(), vec.end(), sum,
[] (int l, int r) { return l + r; } // binary reducer operator
);
executor.run(taskflow).wait();
assert(sum == 100 + 55);
@endcode
The order in which the binary operator is applied to pairs of elements is @em unspecified.
In other words, the elements of the range may be grouped and rearranged in arbitrary order.
The result and the argument types of the binary operator must be consistent with the input data type.
@section ParallelReductionCaptureIteratorsByReference Capture Iterators by Reference
You can pass iterators by reference using @std_ref
to marshal parameter update between dependent tasks.
This is especially useful when the range is unknown
at the time of creating a parallel-reduction task,
but needs initialization from another task.
@code{.cpp}
int sum = 100;
std::vector<int> vec;
std::vector<int>::iterator first, last;
tf::Task init = taskflow.emplace([&](){
vec = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
first = vec.begin();
last = vec.end();
});
tf::Task task = taskflow.reduce(std::ref(first), std::ref(last), sum,
[] (int l, int r) { return l + r; } // binary reducer operator
);
// wrong! must use std::ref, or first and last are captured by copy
// tf::Task task = taskflow.reduce(first, last, sum, [] (int l, int r) {
// return l + r; // binary reducer operator
// });
init.precede(task);
executor.run(taskflow).wait();
assert(sum == 100 + 55);
@endcode
In the above example, when @c init finishes, @c vec has been initialized to 10 elements with
@c first and @c last pointing to the data range of @c vec.
The reduction task will then work on this initialized range
as a result of passing iterators by reference.
@section A2ParallelTransformationReduction Create a Parallel-Transform-Reduction Task
It is common to transform each element into a new data type and
then perform reduction on the transformed elements.
%Taskflow provides a method,
tf::Taskflow::transform_reduce(B first, E last, T& result, BOP bop, UOP uop, P&& part),
that applies @c uop to transform each element in the specified range
and then perform parallel reduction over @c result and transformed elements.
It represents the parallel execution of the following reduction loop:
@code{.cpp}
for(auto itr=first; itr<last; itr++) {
result = bop(result, uop(*itr));
}
@endcode
The example below transforms each digit in a string to an integer number
and then sums up all integers in parallel.
@code{.cpp}
std::string str = "12345678";
int sum {0};
tf::Task task = taskflow.transform_reduce(str.begin(), str.end(), sum,
[] (int a, int b) { // binary reduction operator
return a + b;
},
[] (char c) -> int { // unary transformation operator
return c - '0';
}
);
executor.run(taskflow).wait();
assert(sum == 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8); // sum will be 36
@endcode
The order in which we apply the binary operator on the transformed elements is @em unspecified.
It is possible that the binary operator will take @em r-value in both arguments, for example,
<tt>bop(uop(*itr1), uop(*itr2))</tt>, due to the transformed temporaries.
When data passing is expensive,
you may define the result type @c T to be move-constructible.
@section ParallelReductionCfigureAPartitioner Configure a Partitioner
You can configure a partitioner for parallel-reduction tasks to run with different
scheduling methods, such as guided partitioning, dynamic partitioning, and static partitioning.
The following example creates two parallel-reduction tasks using two different
partitioners, one with the static partitioning algorithm and
another one with the guided partitioning algorithm:
@code{.cpp}
tf::StaticPartitioner static_partitioner;
tf::GuidedPartitioner guided_partitioner;
int sum1 = 100, sum2 = 100;
std::vector<int> vec = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
// create a parallel-reduction task with static partitioner
taskflow.reduce(vec.begin(), vec.end(), sum1,
[] (int l, int r) { return l + r; },
static_partitioner
);
// create a parallel-reduction task with guided partitioner
taskflow.reduce(vec.begin(), vec.end(), sum2,
[] (int l, int r) { return l + r; },
guided_partitioner
);
@endcode
@attention
By default, parallel-reduction tasks use tf::DefaultPartitioner
if no partitioner is specified.
*/
}
|