1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290
|
---
parent: Memory access properties
grand_parent: Extended API
nav_order: 1
---
# `cuda::annotated_ptr`
Defined in header `<cuda/annotated_ptr>`:
```cuda
namespace cuda {
template <typename Type, typename Property>
class annotated_ptr<Type, Property>;
} // namespace cuda
```
**Mandates**: `Property` is one of:
* [`cuda::access_property::shared`],
* [`cuda::access_property::global`],
* [`cuda::access_property::persisting`],
* [`cuda::access_property::normal`],
* [`cuda::access_property::streaming`], or
* [`cuda::access_property`] (a type-erased property with a runtime value).
_Note_: if `Property` is [`cuda::access_property`], i.e. a dynamic property with a runtime value, then `sizeof(cuda::annotated_ptr<Type, cuda::access_property>) == 2 * sizeof(Type*)`. Otherwise, its size is `sizeof(Type*)`.
The class template [`cuda::annotated_ptr`] is a pointer annotated with an access property that _may_ be applied to memory operations performed through the [`cuda::annotated_ptr`].
In contrast with [`cuda::associate_access_property`], [`cuda::annotated_ptr`] maintains the association when passed through ABI boundaries, e.g., calling a non-inlined library function with a [`cuda::annotated_ptr`] argument.
It implements a pointer-like interface:
| Pointer Expression | `cuda::annotated_ptr<T, P>` | Description |
|=====================|===========================================|=============================================|
| `T* a` | `cuda::annotated_ptr<T, P> a` | non-`const` pointer to non-`const` memory |
| `T const * a` | `cuda::annotated_ptr<T const, P> a` | non-`const` pointer to `const` memory |
| `T* const a` | `const cuda::annotated_ptr<T, P> a` | `const` pointer to non-`const` memory |
| `T const* const a` | `const cuda::annotated_ptr<T const, P> a` | `const` pointer to `const` memory |
| `val = *a;` | `val = *a;` | dereference operator to load an element |
| `*a = val;` | `*a = val;` | dereference operator to store an element |
| `val = a[n];` | `val = a[n];` | subscript operator to load an element |
| `a[n] = val;` | `a[n] = val;` | subscript operator to store an element |
| `T* a = nullptr;` | `annotated_ptr<T, P> a = nullptr;` | `nullptr` initialization |
| `n = a - b;` | `n = a - b;` | difference operator |
| `if (a) { ... }` | `if (a) { ... }` | explicit bool conversion |
But it is not a drop-in replacement for pointers since, among others, it does not:
* model any [`Iterator`] concept,
* implement [`std::pointer_traits`], [`std::iterator_traits`], etc.
* have the same variance as pointer.
```cuda
namespace cuda {
template<class Type, class Property>
class annotated_ptr {
public:
using value_type = Type;
using size_type = std::size_t;
using reference = value_type &;
using pointer = value_type *;
using const_pointer = value_type const *;
using difference_type = std::ptrdiff_t;
__host__ __device__ constexpr annotated_ptr() noexcept;
__host__ __device__ constexpr annotated_ptr(annotated_ptr const&) noexcept = default;
__host__ __device__ constexpr annotated_ptr& operator=(annotated_ptr const&) noexcept = default;
__host__ __device__ explicit annotated_ptr(pointer);
template <class RuntimeProperty>
__host__ __device__ annotated_ptr(pointer, RuntimeProperty);
template <class T, class P>
__host__ __device__ annotated_ptr(annotated_ptr<T,P> const&);
__host__ __device__ constexpr explicit operator bool() const noexcept;
__host__ __device__ pointer get() const noexcept;
__host__ __device__ reference operator*() const;
__host__ __device__ pointer operator->() const;
__host__ __device__ reference operator[](std::ptrdiff_t) const;
__host__ __device__ constexpr difference_type operator-(annotated_ptr);
private:
pointer ptr; // exposition only
Property prop; // exposition only
};
} // namespace cuda
```
## Constructors and assignment
### Default constructor
```cuda
constexpr annotated_ptr() noexcept;
```
**Effects**: as if constructed by `annotated_ptr(nullptr)`;
### Constructor from pointer
```cuda
constexpr explicit annotated_ptr(pointer ptr);
```
**Preconditions**:
* if `Property` is [`cuda::access_property::shared`] then `ptr` must be a generic pointer that is valid to cast to a pointer to the shared memory address space.
* if `Property` is [`cuda::access_property::global`], [`cuda::access_property::normal`], [`cuda::access_property::streaming`], [`cuda::access_property::persisting`], or [`cuda::access_property`] then `ptr` must be a generic pointer that is valid to cast to a pointer to the global memory address space.
**Effects**: Constructs an `annotated_ptr` requesting associating `ptr` with `Property`.
If `Property` is [`cuda::access_property`] then `prop` is initialized with [`cuda::access_property::global`].
**Note**: in **Preconditions** "valid" means that casting the generic pointer to the corresponding address space does not introduce undefined behavior.
### Constructor from pointer and access property
```cuda
template <class RuntimeProperty>
annotated_ptr(pointer ptr, RuntimeProperty prop);
```
**Mandates**:
* `Property` is [`cuda::access_property`].
* `RuntimeProperty` is any of [`cuda::access_property::global`], [`cuda::access_property::normal`], [`cuda::access_property::streaming`], [`cuda::access_property::persisting`], or [`cuda::access_property`].
**Preconditions**: `ptr` is a pointer to a valid allocation in the global memory address space.
**Effects**: Constructs an `annotated_ptr` requesting the association of `ptr` with the property `prop`.
# Copy constructor from a different `annotated_ptr`
```cuda
template <class T, class P>
constexpr annotated_ptr(annotated_ptr<T,P> const& a);
```
**Mandates**:
* `annotated_ptr<Type, Property>::pointer` is assignable from `annotated_ptr<T, P>::pointer`.
* `Property` is either [`cuda::access_property`] or `P`.
* `Property` and `P` specify the same memory space.
**Preconditions**: `pointer` is compatible with `Property`.
**Effects**: Constructs an `annotated_ptr` for the same pointer as the input `annotated_ptr`.
## Explicit conversion operator to `bool`
```cuda
constexpr operator bool() const noexcept;
```
**Returns**: `false` if the pointer is a `nullptr`, `true` otherwise.
## Raw pointer access
```cuda
pointer get() const noexcept;
```
**Returns**: A pointer derived from the `annotated_ptr`.
## Operators
### Dereference
```cuda
reference operator*() const;
```
**Preconditions**: The `annotated_ptr` is not null and points to a valid `T` value.
**Returns**: [`*cuda::associate_access_property(ptr, prop)`][`cuda::associate_access_property`]
### Pointer-to-member
```cuda
pointer operator->() const;
```
**Preconditions**: the `annotated_ptr` is not null.
**Returns**: [`cuda::associate_access_property(ptr, prop)`][`cuda::associate_access_property`]
### Subscript
```cuda
reference operator[](ptrdiff_t i) const;
```
**Preconditions**: `ptr` points to a valid allocation of at least size `[ptr, ptr+i]`.
**Returns**: [`*cuda::associate_access_property(ptr+i,prop)`][`cuda::associate_access_property`]
### Pointer distance
```cuda
constexpr difference_type operator-(annotated_ptr p) const;
```
**Preconditions**: `ptr` and `p` point to the same allocation.
**Returns**: as-if `get() - p.get()`.
## Example
Given three input and output vectors `x`, `y`, and `z`, and two arrays of coefficients `a` and `b`, all of length `N`:
```cuda
size_t N;
int* x, *y, *z;
int* a, *b;
```
the grid-strided kernel:
```cuda
__global__ void update(int* const x, int const* const a, int const* const b, size_t N) {
auto g = cooperative_groups::this_grid();
for (int i = g.thread_rank(); idx < N; idx += g.size()) {
x[i] = a[i] * x[i] + b[i];
}
}
```
updates `x`, `y`, and `z` as follows:
```cuda
update<<<grid, block>>>(x, a, b, N);
update<<<grid, block>>>(y, a, b, N);
update<<<grid, block>>>(z, a, b, N);
```
The elements of `a` and `b` are used in all kernels.
If `N` is large enough, elements of `a` and `b` might be evicted from the L2 cache, requiring these to be re-loaded from memory in the next `update`.
We can make the `update` kernel generic to allow the caller to pass [`cuda::annotated_ptr`] objects that hint at how memory will be accessed:
```cuda
template <typename PointerX, typename PointerA, typename PointerB>
__global__ void update_template(PointerX x, PointerA a, PointerB b, size_t N) {
auto g = cooperative_groups::this_grid();
for (int idx = g.thread_rank(); idx < N; idx += g.size()) {
x[idx] = a[idx] * x[idx] + b[idx];
}
}
```
With [`cuda::annotated_ptr`], the caller can then specify the temporal locality of the memory accesses:
```cuda
// Frequent accesses to "a" and "b"; infrequent accesses to "x" and "y":
cuda::annotated_ptr<int const, cuda::access_property::persisting> a_p {a}, b_p{b};
cuda::annotated_ptr<int, cuda::access_property::streaming> x_s{x}, y_s{y};
update_template<<<grid, block>>>(x_s, a_p, b_p, N);
update_template<<<grid, block>>>(y_s, a_p, b_p, N);
// Infrequent accesses to "a" and "b"; frequent acceses to "z":
cuda::annotated_ptr<int const, cuda::access_property::streaming> a_s {a}, b_s{b};
cuda::annotated_ptr<int, cuda::access_property::persisting> z_p{z};
update_template<<<grid, block>>>(z_p, a_s, b_s, N);
// Different kernel, "update_z", uses "z" again one last time.
// Since "z" was accessed as "persisting" by the previous kernel,
// parts of it are more likely to have previously survived in the L2 cache.
update_z<<<grid, block>>>(z, ...);
```
Notice how the raw pointers to `a` and `b` can be wrapped by both `annotated_ptr<T, persistent>` and `annotated_ptr<T, streaming>`, and accesses through each pointer applies the corresponding access property.
[`Iterator`]: https://en.cppreference.com/w/cpp/iterator
[`std::pointer_traits`]: https://en.cppreference.com/w/cpp/memory/pointer_traits
[`std::iterator_traits`]: https://en.cppreference.com/w/cpp/iterator/iterator_traits
[`cuda::annotated_ptr`]: {{ "extended_api/memory_access_properties/annotated_ptr.html" | relative_url }}
[`cuda::access_propety`]: {{ "extended_api/memory_access_properties/access_property.html" | relative_url }}
[`cuda::associate_access_property`]: {{ "extended_api/memory_access_properties/associate_access_property.html" | relative_url }}
[`cuda::apply_access_property`]: {{ "extended_api/memory_access_properties/apply_access_property.html" | relative_url }}
[`cuda::access_property::shared`]: {{ "extended_api/memory_access_properties/access_property.html#kinds-of-access-properties" | relative_url }}
[`cuda::access_property::global`]: {{ "extended_api/memory_access_properties/access_property.html#kinds-of-access-properties" | relative_url }}
[`cuda::access_property::persisting`]: {{ "extended_api/memory_access_properties/access_property.html#kinds-of-access-properties" | relative_url }}
[`cuda::access_property::normal`]: {{ "extended_api/memory_access_properties/access_property.html#kinds-of-access-properties" | relative_url }}
[`cuda::access_property::streaming`]: {{ "extended_api/memory_access_properties/access_property.html#kinds-of-access-properties" | relative_url }}
|