File: atomic.md

package info (click to toggle)
libcudacxx 1.8.1-2
  • links: PTS, VCS
  • area: main
  • in suites: bookworm
  • size: 66,464 kB
  • sloc: cpp: 517,767; ansic: 9,474; python: 6,108; sh: 2,225; asm: 2,154; makefile: 7
file content (95 lines) | stat: -rw-r--r-- 3,506 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
---
grand_parent: Extended API
parent: Synchronization Primitives
nav_order: 0
---

# `cuda::atomic`

Defined in header `<cuda/atomic>`:

```cuda
template <typename T, cuda::thread_scope Scope = cuda::thread_scope_system>
class cuda::atomic;
```

The class template `cuda::atomic` is an extended form of [`cuda::std::atomic`]
  that takes an additional [`cuda::thread_scope`] argument, defaulted to
  `cuda::std::thread_scope_system`.
It has the same interface and semantics as [`cuda::std::atomic`], with the
  following additional operations.

## Atomic Fence Operations

| [`cuda::atomic_thread_fence`] | Memory order and scope dependent fence synchronization primitive. `(function)` |

## Atomic Extrema Operations

| [`cuda::atomic::fetch_min`] | Atomically find the minimum of the stored value and a provided value. `(member function)` |
| [`cuda::atomic::fetch_max`] | Atomically find the maximum of the stored value and a provided value. `(member function)` |

## Concurrency Restrictions

An object of type `cuda::atomic` or [`cuda::std::atomic`] shall not be accessed
  concurrently by CPU and GPU threads unless:
- it is in unified memory and the [`concurrentManagedAccess` property] is 1, or
- it is in CPU memory and the [`hostNativeAtomicSupported` property] is 1.

Note, for objects of scopes other than `cuda::thread_scope_system` this is a
  data-race, and thefore also prohibited regardless of memory characteristics.

Under CUDA Compute Capability 6 (Pascal), an object of type `atomic` may not be
  used:
- with automatic storage duration, or
- if `is_always_lock_free()` is `false`.

Under CUDA Compute Capability prior to 6 (Pascal), objects of type
  `cuda::atomic` or [`cuda::std::atomic`] may not be used.

## Implementation-Defined Behavior

For each type `T` and [`cuda::thread_scope`] `S`, the value of
  `cuda::atomic<T, S>::is_always_lock_free()` is as follows:

| Type `T` | [`cuda::thread_scope`] `S` | `cuda::atomic<T, S>::is_always_lock_free()` |
|----------|----------------------------|---------------------------------------------|
| Any      | Any                        | `sizeof(T) <= 8`                            |

## Example

```cuda
#include <cuda/atomic>

__global__ void example_kernel() {
  // This atomic is suitable for all threads in the system.
  cuda::atomic<int, cuda::thread_scope_system> a;

  // This atomic has the same type as the previous one (`a`).
  cuda::atomic<int> b;

  // This atomic is suitable for all threads on the current processor (e.g. GPU).
  cuda::atomic<int, cuda::thread_scope_device> c;

  // This atomic is suitable for threads in the same thread block.
  cuda::atomic<int, cuda::thread_scope_block> d;
}
```

[See it on Godbolt](https://godbolt.org/z/avo3Evbee){: .btn }


[`cuda::thread_scope`]: ../thread_scopes.md

[`cuda::atomic_thread_fence`]: ./atomic/atomic_thread_fence.md

[`cuda::atomic::fetch_min`]: ./atomic/fetch_min.md
[`cuda::atomic::fetch_max`]: ./atomic/fetch_max.md

[`cuda::std::atomic`]: https://en.cppreference.com/w/cpp/atomic/atomic

[atomics.types.int]: https://eel.is/c++draft/atomics.types.int
[atomics.types.pointer]: https://eel.is/c++draft/atomics.types.pointer

[`concurrentManagedAccess` property]: https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_116f9619ccc85e93bc456b8c69c80e78b
[`hostNativeAtomicSupported` property]: https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_1ef82fd7d1d0413c7d6f33287e5b6306f