File: make_pipeline.md

package info (click to toggle)
libcudacxx 1.8.1-2
  • links: PTS, VCS
  • area: main
  • in suites: bookworm
  • size: 66,464 kB
  • sloc: cpp: 517,767; ansic: 9,474; python: 6,108; sh: 2,225; asm: 2,154; makefile: 7
file content (127 lines) | stat: -rw-r--r-- 4,402 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
---
grand_parent: Extended API
parent: Synchronization Primitives
---

# `cuda::make_pipeline`

Defined in header `<cuda/pipeline>`:

```cuda
// (1)
__host__ __device__
cuda::pipeline<cuda::thread_scope_thread> cuda::make_pipeline();

// (2)
template <typename Group,
          cuda::thread_scope Scope,
          cuda::std::uint8_t StagesCount>
__host__ __device__
cuda::pipeline<Scope>
cuda::make_pipeline(Group const& group,
                    cuda::pipeline_shared_state<Scope, StagesCount>* shared_state);

// (3)
template <typename Group,
          cuda::thread_scope Scope,
          cuda::std::uint8_t StagesCount>
__host__ __device__
cuda::pipeline<Scope>
cuda::make_pipeline(Group const& group,
                    cuda::pipeline_shared_state<Scope, StagesCount>* shared_state,
                    cuda::std::size_t producer_count);

// (4)
template <typename Group,
          cuda::thread_scope Scope,
          cuda::std::uint8_t StagesCount>
__host__ __device__
cuda::pipeline<Scope>
cuda::make_pipeline(Group const& group,
                    cuda::pipeline_shared_state<Scope, StagesCount>* shared_state,
                    cuda::pipeline_role role);
```

1. Creates a _unified pipeline_ such that the calling thread is the only
   participating thread and performs both producer and consumer actions.
2. Creates a _unified pipeline_ such that all the threads in `group` are
   performing both producer and consumer actions.
3. Creates a _partitioned pipeline_ such that `producer_threads` number of threads
   in `group` are performing producer actions while the others are performing
   consumer actions.
4. Creates a _partitioned pipeline_ where each thread's role is explicitly
   specified.

## Notes

All threads in `group` acquire collective ownership of the `shared_state`
  storage.

`make_pipeline` must be invoked by every threads in `group` such that
  `group::sync` may be invoked.

`shared_state` and `producer_count` must be the same across all threads in
  `group`, else the behavior is undefined.

`producer_count` must be strictly inferior to `group::size`, otherwise the
  behavior is undefined.

## Template Parameters

| `Group` | A type satisfying the [_ThreadGroup_] concept. |

## Parameters

| `group`          | The group of threads.                                                                                                    |
| `shared_state`   | A pointer to an object of type [`cuda::pipeline_shared_state<Scope>`] with `Scope` including all the threads in `group`. |
| `producer_count` | The number of _producer threads_ in the pipeline.                                                                        |
| `role`           | The role of the current thread in the pipeline.                                                                          |

## Return Value

A `cuda::pipeline` object.

## Example

```cuda
#include <cuda/pipeline>
#include <cooperative_groups.h>

// Disables `pipeline_shared_state` initialization warning.
#pragma diag_suppress static_var_with_dynamic_init

__global__ void example_kernel() {
  __shared__ cuda::pipeline_shared_state<cuda::thread_scope_block, 2> pss0;
  __shared__ cuda::pipeline_shared_state<cuda::thread_scope_block, 2> pss1;
  __shared__ cuda::pipeline_shared_state<cuda::thread_scope_block, 2> pss2;

  auto group = cooperative_groups::this_thread_block();

  // Create a single thread scoped pipeline.
  cuda::pipeline<cuda::thread_scope_thread> p0 = cuda::make_pipeline();

  // Create a unified block-scoped pipeline.
  cuda::pipeline<cuda::thread_scope_block> p1 = cuda::make_pipeline(group, &pss0);

  // Create a partitioned block-scoped pipeline where half the threads are producers.
  cuda::std::size_t producer_count = group.size() / 2;
  cuda::pipeline<cuda::thread_scope_block> p2
    = cuda::make_pipeline(group, &pss1, producer_count);

  // Create a partitioned block-scoped pipeline where all threads with an even
  // `thread_rank` are producers.
  auto thread_role = (group.thread_rank() % 2)
                     ? cuda::pipeline_role::producer
                    : cuda::pipeline_role::consumer;
  cuda::pipeline<cuda::thread_scope_block> p3
    = cuda::make_pipeline(group, &pss2, thread_role);
}
```

[See it on Godbolt](https://godbolt.org/z/aPcGEr64j){: .btn }


[_ThreadGroup_]: ../thread_groups.md

[`cuda::pipeline_shared_state<Scope>`]: ./pipeline_shared_state.md