namespace tf {
/** @page CUDASTDTransform Parallel Transforms
%Taskflow provides template methods for transforming ranges of items
to different outputs.
@tableofcontents
@section CUDASTDParallelTransformsIncludeTheHeader Include the Header
You need to include the header file, `%taskflow/cuda/algorithm/transform.hpp`,
to use the parallel-transform algorithm.
@code{.cpp}
#include <taskflow/cuda/algorithm/transform.hpp>
@endcode
@section CUDASTDTransformARangeOfItems Transform a Range of Items
The parallel-transform algorithm applies the given transform function to a range of items
specified by two iterators, @c first and @c last, and stores the results in another range
beginning at @c output.
The task created by tf::cuda_transform(P&& p, I first, I last, O output, C op)
represents a parallel execution for the following loop:
@code{.cpp}
while (first != last) {
  *output++ = op(*first++);
}
@endcode
The following example creates a transform kernel that transforms an input
range of @c N items to an output range by multiplying each item by 10.
@code{.cpp}
tf::cudaDefaultExecutionPolicy policy;
// output[i] = input[i]*10
tf::cuda_transform(
  policy, input, input + N, output, [] __device__ (int x) { return x*10; }
);
// synchronize the execution
policy.synchronize();
@endcode
Each iteration is independent of the others and is assigned one kernel thread
to run the callable.
The transform algorithm runs @em asynchronously through the stream specified
in the execution policy. You need to synchronize the stream to
obtain correct results.
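Once the stream is synchronized, one way to check the output is to compare it
against a sequential host-side reference, because the loop in this section has
the same semantics as `std::transform`.
The following is a minimal sketch under stated assumptions, not part of %Taskflow:
the helper names `times_ten_on_host` and `matches_reference` are hypothetical,
and the GPU output is assumed to have already been copied into a `std::vector<int>`
on the host.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical host-side reference (not part of Taskflow): computes
// output[i] = input[i]*10 sequentially with std::transform.
inline std::vector<int> times_ten_on_host(const std::vector<int>& input) {
  std::vector<int> output(input.size());
  std::transform(input.begin(), input.end(), output.begin(),
                 [](int x) { return x * 10; });
  return output;
}

// Hypothetical check: returns true when the items copied back from the GPU
// match the sequential reference. Both vectors are assumed to hold N items.
inline bool matches_reference(const std::vector<int>& input,
                              const std::vector<int>& gpu_output) {
  assert(input.size() == gpu_output.size());
  return gpu_output == times_ten_on_host(input);
}
```

Running such a check before synchronizing the stream may compare against
incomplete results, so it belongs after the call to `policy.synchronize()`.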
@section CUDASTDTransformTwoRangesOfItems Transform Two Ranges of Items
You can transform two ranges of items to an output range through a binary operator.
The task created by
tf::cuda_transform(P&& p, I1 first1, I1 last1, I2 first2, O output, C op)
represents a parallel execution for the following loop:
@code{.cpp}
while (first1 != last1) {
  *output++ = op(*first1++, *first2++);
}
@endcode
The following example creates a transform kernel that transforms two input
ranges of @c N items to an output range by summing each pair of items
in the input ranges.
@code{.cpp}
tf::cudaDefaultExecutionPolicy policy;
// output[i] = input1[i] + input2[i]
tf::cuda_transform(policy,
  input1, input1+N, input2, output, []__device__(int a, int b) { return a+b; }
);
// synchronize the execution
policy.synchronize();
@endcode
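The binary loop in this section corresponds to the two-range overload of
`std::transform`, which can serve as a host-side reference for the summing example.
The sketch below is an illustration under stated assumptions rather than the
%Taskflow implementation: the helper name `sum_pairs_on_host` is hypothetical,
and, as in the loop, the second range is assumed to provide at least as many
items as the first.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical host-side reference (not part of Taskflow): computes
// output[i] = input1[i] + input2[i] sequentially with std::transform.
inline std::vector<int> sum_pairs_on_host(const std::vector<int>& input1,
                                          const std::vector<int>& input2) {
  // Only the start of the second range is passed to the algorithm, so the
  // caller must guarantee it holds at least as many items as the first.
  assert(input2.size() >= input1.size());
  std::vector<int> output(input1.size());
  std::transform(input1.begin(), input1.end(), input2.begin(), output.begin(),
                 std::plus<int>{});
  return output;
}
```

Comparing this reference with the output range copied back after
`policy.synchronize()` gives a simple correctness check for the kernel.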
*/
}