File: annotated_ptr.md

package info (click to toggle)
libcudacxx 1.8.1-2
  • links: PTS, VCS
  • area: main
  • in suites: bookworm
  • size: 66,464 kB
  • sloc: cpp: 517,767; ansic: 9,474; python: 6,108; sh: 2,225; asm: 2,154; makefile: 7
file content (290 lines) | stat: -rw-r--r-- 11,582 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
---
parent: Memory access properties
grand_parent: Extended API
nav_order: 1
---

# `cuda::annotated_ptr`

Defined in header `<cuda/annotated_ptr>`:

```cuda
namespace cuda {
template <typename Type, typename Property>
class annotated_ptr<Type, Property>;
} // namespace cuda
```

**Mandates**: `Property` is one of:

* [`cuda::access_property::shared`],
* [`cuda::access_property::global`],
* [`cuda::access_property::persisting`],
* [`cuda::access_property::normal`],
* [`cuda::access_property::streaming`], or
* [`cuda::access_property`] (a type-erased property with a runtime value).

_Note_: if `Property` is [`cuda::access_property`], i.e. a dynamic property with a runtime value, then `sizeof(cuda::annotated_ptr<Type, cuda::access_property>) == 2 * sizeof(Type*)`. Otherwise, its size is `sizeof(Type*)`.

The class template [`cuda::annotated_ptr`] is a pointer annotated with an access property that _may_ be applied to memory operations performed through the [`cuda::annotated_ptr`].

In contrast with [`cuda::associate_access_property`], [`cuda::annotated_ptr`] maintains the association when passed through ABI boundaries, e.g., calling a non-inlined library function with a [`cuda::annotated_ptr`] argument.

It implements a pointer-like interface:

| Pointer Expression  | `cuda::annotated_ptr<T, P>`               | Description                                 |
|=====================|===========================================|=============================================|
| `T* a`              | `cuda::annotated_ptr<T, P> a`             | non-`const` pointer to non-`const` memory   |
| `T const * a`       | `cuda::annotated_ptr<T const, P> a`       | non-`const` pointer to `const` memory       |
| `T* const a`        | `const cuda::annotated_ptr<T, P> a`       | `const` pointer to non-`const` memory       |
| `T const* const a`  | `const cuda::annotated_ptr<T const, P> a` | `const` pointer to `const` memory           |
| `val = *a;`         | `val = *a;`                               | dereference operator to load an element     |
| `*a = val;`         | `*a = val;`                               | dereference operator to store an element    |
| `val = a[n];`       | `val = a[n];`                             | subscript operator to load an element       |
| `a[n] = val;`       | `a[n] = val;`                             | subscript operator to store an element      |
| `T* a = nullptr;`   | `annotated_ptr<T, P> a = nullptr;`        | `nullptr` initialization                    |
| `n = a - b;`        | `n = a - b;`                              | difference operator                         |
| `if (a) { ... }`    | `if (a) { ... }`                          | explicit bool conversion                    |

But it is not a drop-in replacement for pointers since, among others, it does not:

* model any [`Iterator`] concept,
* implement [`std::pointer_traits`], [`std::iterator_traits`], etc.
* have the same variance as pointer.

```cuda
namespace cuda {

template<class Type, class Property>
class annotated_ptr {
public:
  using value_type = Type;
  using size_type = std::size_t;
  using reference = value_type &;
  using pointer = value_type *;
  using const_pointer = value_type const *;
  using difference_type = std::ptrdiff_t;

  __host__ __device__ constexpr annotated_ptr() noexcept;
  __host__ __device__ constexpr annotated_ptr(annotated_ptr const&) noexcept = default;
  __host__ __device__ constexpr annotated_ptr& operator=(annotated_ptr const&) noexcept = default;
  __host__ __device__ explicit annotated_ptr(pointer);
  template <class RuntimeProperty>
  __host__ __device__ annotated_ptr(pointer, RuntimeProperty);
  template <class T, class P>
  __host__ __device__ annotated_ptr(annotated_ptr<T,P> const&);

  __host__ __device__ constexpr explicit operator bool() const noexcept;
  __host__ __device__ pointer get() const noexcept;

  __host__ __device__ reference operator*() const;
  __host__ __device__ pointer operator->() const;
  __host__ __device__ reference operator[](std::ptrdiff_t) const;
  __host__ __device__ constexpr difference_type operator-(annotated_ptr);

private:
  pointer ptr;   // exposition only
  Property prop; // exposition only
};

} // namespace cuda
```

## Constructors and assignment

### Default constructor

```cuda
constexpr annotated_ptr() noexcept;
```

**Effects**:  as if constructed by `annotated_ptr(nullptr)`;

### Constructor from pointer

```cuda
constexpr explicit annotated_ptr(pointer ptr);
```

**Preconditions**:

* if `Property` is [`cuda::access_property::shared`] then `ptr` must be a generic pointer that is valid to cast to a pointer to the shared memory address space.
* if `Property` is [`cuda::access_property::global`], [`cuda::access_property::normal`], [`cuda::access_property::streaming`], [`cuda::access_property::persisting`], or [`cuda::access_property`]  then `ptr` must be a generic pointer that is valid to cast to a pointer to the global memory address space.

**Effects**:  Constructs an `annotated_ptr` requesting associating `ptr` with `Property`. 
If `Property` is [`cuda::access_property`] then `prop` is initialized with [`cuda::access_property::global`].

**Note**: in **Preconditions** "valid" means that casting the generic pointer to the corresponding address space does not introduce undefined behavior.

### Constructor from pointer and access property

```cuda
template <class RuntimeProperty>
annotated_ptr(pointer ptr, RuntimeProperty prop);
```

**Mandates**:

* `Property` is [`cuda::access_property`].
* `RuntimeProperty` is any of [`cuda::access_property::global`], [`cuda::access_property::normal`], [`cuda::access_property::streaming`], [`cuda::access_property::persisting`], or [`cuda::access_property`].

**Preconditions**: `ptr` is a pointer to a valid allocation in the global memory address space.

**Effects**:  Constructs an `annotated_ptr` requesting the association of `ptr` with the property `prop`.

# Copy constructor from a different `annotated_ptr`

```cuda
template <class T, class P>
constexpr annotated_ptr(annotated_ptr<T,P> const& a);
```

**Mandates**:

* `annotated_ptr<Type, Property>::pointer` is assignable from `annotated_ptr<T, P>::pointer`.
* `Property` is either [`cuda::access_property`] or `P`.
* `Property` and `P` specify the same memory space.

**Preconditions**: `pointer` is compatible with `Property`.

**Effects**: Constructs an `annotated_ptr` for the same pointer as the input `annotated_ptr`.


## Explicit conversion operator to `bool`

```cuda
constexpr operator bool() const noexcept;
```

**Returns**: `false` if the pointer is a `nullptr`, `true` otherwise.


## Raw pointer access

```cuda
pointer get() const noexcept;
```

**Returns**: A pointer derived from the `annotated_ptr`.

## Operators

### Dereference

```cuda
reference operator*() const;
```

**Preconditions**: The `annotated_ptr` is not null and points to a valid `T` value.

**Returns**: [`*cuda::associate_access_property(ptr, prop)`][`cuda::associate_access_property`]

### Pointer-to-member

```cuda
pointer operator->() const;
```

**Preconditions**: the `annotated_ptr` is not null.

**Returns**: [`cuda::associate_access_property(ptr, prop)`][`cuda::associate_access_property`]

### Subscript

```cuda
reference operator[](ptrdiff_t i) const;
```

**Preconditions**: `ptr` points to a valid allocation of at least size `[ptr, ptr+i]`.

**Returns**: [`*cuda::associate_access_property(ptr+i,prop)`][`cuda::associate_access_property`]

### Pointer distance

```cuda
constexpr difference_type operator-(annotated_ptr p) const;
```

**Preconditions**: `ptr` and `p` point to the same allocation.

**Returns**: as-if `get() - p.get()`.

## Example

Given three input and output vectors `x`, `y`, and `z`, and two arrays of coefficients `a` and `b`, all of length `N`:

```cuda
size_t N;
int* x, *y, *z;
int* a, *b;
```

the grid-strided kernel:

```cuda
__global__ void update(int* const x, int const* const a, int const* const b, size_t N) {
    auto g = cooperative_groups::this_grid();
    for (int i = g.thread_rank(); idx < N; idx += g.size()) {
        x[i] = a[i] * x[i] + b[i];
    }
}
```

updates `x`, `y`, and `z` as follows:

```cuda
update<<<grid, block>>>(x, a, b, N);
update<<<grid, block>>>(y, a, b, N);
update<<<grid, block>>>(z, a, b, N);
```

The elements of `a` and `b` are used in all kernels.
If `N` is large enough, elements of `a` and `b` might be evicted from the L2 cache, requiring these to be re-loaded from memory in the next `update`.

We can make the `update` kernel generic to allow the caller to pass [`cuda::annotated_ptr`] objects that hint at how memory will be accessed:

```cuda
template <typename PointerX, typename PointerA, typename PointerB>
__global__ void update_template(PointerX x, PointerA a, PointerB b, size_t N) {
    auto g = cooperative_groups::this_grid();
    for (int idx = g.thread_rank(); idx < N; idx += g.size()) {
        x[idx] = a[idx] * x[idx] + b[idx];
    }
}
```

With [`cuda::annotated_ptr`], the caller can then specify the temporal locality of the memory accesses:

```cuda
// Frequent accesses to "a" and "b"; infrequent accesses to "x" and "y":
cuda::annotated_ptr<int const, cuda::access_property::persisting> a_p {a}, b_p{b};
cuda::annotated_ptr<int, cuda::access_property::streaming> x_s{x}, y_s{y};
update_template<<<grid, block>>>(x_s, a_p, b_p, N);
update_template<<<grid, block>>>(y_s, a_p, b_p, N);

// Infrequent accesses to "a" and "b"; frequent acceses to "z":
cuda::annotated_ptr<int const, cuda::access_property::streaming> a_s {a}, b_s{b};
cuda::annotated_ptr<int, cuda::access_property::persisting> z_p{z};
update_template<<<grid, block>>>(z_p, a_s, b_s, N);

// Different kernel, "update_z", uses "z" again one last time.
// Since "z" was accessed as "persisting" by the previous kernel,
// parts of it are more likely to have previously survived in the L2 cache.
update_z<<<grid, block>>>(z, ...);
```

Notice how the raw pointers to `a` and `b` can be wrapped by both `annotated_ptr<T, persistent>` and `annotated_ptr<T, streaming>`, and accesses through each pointer applies the corresponding access property.

[`Iterator`]: https://en.cppreference.com/w/cpp/iterator
[`std::pointer_traits`]: https://en.cppreference.com/w/cpp/memory/pointer_traits
[`std::iterator_traits`]: https://en.cppreference.com/w/cpp/iterator/iterator_traits

[`cuda::annotated_ptr`]: {{ "extended_api/memory_access_properties/annotated_ptr.html" | relative_url }}
[`cuda::access_propety`]: {{ "extended_api/memory_access_properties/access_property.html" | relative_url }}
[`cuda::associate_access_property`]: {{ "extended_api/memory_access_properties/associate_access_property.html" | relative_url }}
[`cuda::apply_access_property`]: {{ "extended_api/memory_access_properties/apply_access_property.html" | relative_url }}
[`cuda::access_property::shared`]: {{ "extended_api/memory_access_properties/access_property.html#kinds-of-access-properties" | relative_url }}
[`cuda::access_property::global`]: {{ "extended_api/memory_access_properties/access_property.html#kinds-of-access-properties" | relative_url }}
[`cuda::access_property::persisting`]: {{ "extended_api/memory_access_properties/access_property.html#kinds-of-access-properties" | relative_url }}
[`cuda::access_property::normal`]: {{ "extended_api/memory_access_properties/access_property.html#kinds-of-access-properties" | relative_url }}
[`cuda::access_property::streaming`]: {{ "extended_api/memory_access_properties/access_property.html#kinds-of-access-properties" | relative_url }}