File: cuda_std_transform.dox

namespace tf {

/** @page CUDASTDTransform Parallel Transforms

%Taskflow provides function templates that transform ranges of items
into output ranges in parallel on a CUDA GPU.

@tableofcontents

@section CUDASTDParallelTransformsIncludeTheHeader Include the Header

To use the parallel-transform algorithm, include the header file
`%taskflow/cuda/algorithm/transform.hpp`.

@code{.cpp}
#include <taskflow/cuda/algorithm/transform.hpp>
@endcode

@section CUDASTDTransformARangeOfItems Transform a Range of Items

The parallel-transform algorithm applies the given transform function to each item in the range specified 
by two iterators, @c first and @c last, and stores the results in another range beginning at @c output.
A call to tf::cuda_transform(P&& p, I first, I last, O output, C op) 
runs the following loop in parallel:
    
@code{.cpp}
while (first != last) {
  *output++ = op(*first++);
}
@endcode

The following example creates a transform kernel that transforms an input
range of @c N items to an output range by multiplying each item by 10.

@code{.cpp}
tf::cudaDefaultExecutionPolicy policy;

// output[i] = input[i]*10
tf::cuda_transform(
  policy, input, input + N, output, [] __device__ (int x) { return x*10; }
);

// synchronize the execution
policy.synchronize();
@endcode

Iterations are independent of one another, and each is assigned one kernel thread 
to run the callable.
The transform algorithm runs @em asynchronously through the stream specified
in the execution policy. Synchronize the stream before reading the output to
obtain correct results.
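
When checking results, it can help to compare the kernel's output against a
sequential host-side computation of the same loop.
The following is a minimal sketch in plain C++ rather than CUDA.
The helper names @c reference_times10 and @c matches_reference are hypothetical
and not part of %Taskflow, and the sketch assumes the output is already readable
on the host (for example, unified memory read after the policy has been synchronized).

@code{.cpp}
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not part of Taskflow): computes on the host what the
// kernel above is expected to write, i.e. expected[i] = input[i]*10.
std::vector<int> reference_times10(const std::vector<int>& input) {
  std::vector<int> expected(input.size());
  std::transform(input.begin(), input.end(), expected.begin(),
                 [](int x) { return x * 10; });
  return expected;
}

// Hypothetical helper: returns true if the first input.size() items of
// `output` equal the host-side reference. Assumes `output` is host-readable.
bool matches_reference(const std::vector<int>& input, const int* output) {
  assert(output != nullptr);
  const std::vector<int> expected = reference_times10(input);
  return std::equal(expected.begin(), expected.end(), output);
}
@endcode

Such a check only makes sense after synchronization, since the kernel may still
be writing to the output range before that point.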

@section CUDASTDTransformTwoRangesOfItems Transform Two Ranges of Items

You can transform two ranges of items to an output range through a binary operator.
A call to 
tf::cuda_transform(P&& p, I1 first1, I1 last1, I2 first2, O output, C op) 
runs the following loop in parallel:
    
@code{.cpp}
while (first1 != last1) {
  *output++ = op(*first1++, *first2++);
}
@endcode

The following example creates a transform kernel that transforms two input
ranges of @c N items to an output range by summing each pair of items 
in the input ranges.

@code{.cpp}
tf::cudaDefaultExecutionPolicy policy;

// output[i] = input1[i] + input2[i]
tf::cuda_transform(policy,
  input1, input1+N, input2, output, []__device__(int a, int b) { return a+b; }
); 

// synchronize the execution
policy.synchronize();
@endcode
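
Note that the second input range is given only by its first iterator, so the
loop reads as many items from it as the first range contains.
The following host-side sketch in plain C++ mirrors the binary loop and makes
that precondition explicit. The helper name @c reference_sum is hypothetical
and not part of %Taskflow.

@code{.cpp}
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not part of Taskflow): host-side equivalent of the
// binary loop above, computing expected[i] = input1[i] + input2[i].
std::vector<int> reference_sum(const std::vector<int>& input1,
                               const std::vector<int>& input2) {
  // The second range has no end iterator, so it must hold at least
  // as many items as the first range.
  assert(input2.size() >= input1.size());
  std::vector<int> expected(input1.size());
  std::transform(input1.begin(), input1.end(), input2.begin(),
                 expected.begin(), [](int a, int b) { return a + b; });
  return expected;
}
@endcode

The result has as many items as the first range. Any extra items in the second
range are ignored.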


*/
}