1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149
|
---
grand_parent: Extended API
parent: Asynchronous Operations
---
# `cuda::memcpy_async`
Defined in header `<cuda/barrier>`:
```cuda
// (1)
template <typename Shape, cuda::thread_scope Scope, typename CompletionFunction>
__host__ __device__
void cuda::memcpy_async(void* destination, void const* source, Shape size,
cuda::barrier<Scope, CompletionFunction>& barrier);
// (2)
template <typename Group,
typename Shape, cuda::thread_scope Scope, typename CompletionFunction>
__host__ __device__
void cuda::memcpy_async(Group const& group,
void* destination, void const* source, Shape size,
cuda::barrier<Scope, CompletionFunction>& barrier);
```
Defined in header `<cuda/pipeline>`:
```cuda
// (3)
template <typename Shape, cuda::thread_scope Scope>
__host__ __device__
void cuda::memcpy_async(void* destination, void const* source, Shape size,
cuda::pipeline<Scope>& pipeline);
// (4)
template <typename Group, typename Shape, cuda::thread_scope Scope>
__host__ __device__
void cuda::memcpy_async(Group const& group,
void* destination, void const* source, Shape size,
cuda::pipeline<Scope>& pipeline);
```
Defined in header `<cuda/annotated_ptr>`:
```cuda
// (5)
template <typename Dst, typename Src, typename SrcProperty, typename Shape, typename Sync>
__host__ __device__
void memcpy_async(Dst* dst, cuda::annotated_ptr<Src, SrcProperty> src, Shape size, Sync& sync);
// (6)
template<typename Dst, typename DstProperty, typename Src, typename SrcProperty, typename Shape, typename Sync>
__host__ __device__
void memcpy_async(cuda::annotated_ptr<Dst, DstProperty> dst, cuda::annotated_ptr<Src, SrcProperty> src, Shape size, Sync& sync);
// (7)
template<typename Group, typename Dst, typename Src, typename SrcProperty, typename Shape, typename Sync>
__host__ __device__
void memcpy_async(Group const& group, Dst* dst, cuda::annotated_ptr<Src, SrcProperty> src, Shape size, Sync& sync);
// (8)
template<typename Group, typename Dst, typename DstProperty, typename Src, typename SrcProperty, typename Shape, typename Sync>
__host__ __device__
void memcpy_async(Group const& group, cuda::annotated_ptr<Dst, DstProperty> dst, cuda::annotated_ptr<Src, SrcProperty> src, Shape size, Sync& sync);
```
`cuda::memcpy_async` asynchronously copies `size` bytes from the memory
location pointed to by `source` to the memory location pointed to by
`destination`.
Both objects are reinterpreted as arrays of `unsigned char`.
1. Binds the asynchronous copy completion to `cuda::barrier` and issues the copy
in the current thread.
2. Binds the asynchronous copy completion to `cuda::barrier` and cooperatively
issues the copy across all threads in `group`.
3. Binds the asynchronous copy completion to `cuda::pipeline` and issues the copy
in the current thread.
4. Binds the asynchronous copy completion to `cuda::pipeline` and cooperatively
issues the copy across all threads in `group`.
5. 5-8: convenience wrappers using `cuda::annotated_ptr` where `Sync` is
either `cuda::barrier` or `cuda::pipeline`.
## Notes
`cuda::memcpy_async` have similar constraints to [`std::memcpy`], namely:
* If the objects overlap, the behavior is undefined.
* If either `destination` or `source` is an invalid or null pointer, the
behavior is undefined (even if `count` is zero).
* If the objects are [potentially-overlapping] the behavior is undefined.
* If the objects are not of [_TriviallyCopyable_] type the program is
ill-formed, no diagnostic required.
If _Shape_ is [`cuda::aligned_size_t`], `source` and `destination` are both
required to be aligned on `cuda::aligned_size_t::align`, else the behavior is
undefined.
If `cuda::pipeline` is in a _quitted state_ (see [`cuda::pipeline::quit`]), the
behavior is undefined.
For cooperative variants, if the parameters are not the same across all threads
in `group`, the behavior is undefined.
## Template Parameters
| `Group` | A type satisfying the [_Group_] concept. |
| `Shape` | Either [`cuda::std::size_t`] or [`cuda::aligned_size_t`]. |
## Parameters
| `group` | The group of threads. |
| `destination` | Pointer to the memory location to copy to. |
| `source` | Pointer to the memory location to copy from. |
| `size` | The number of bytes to copy. |
| `barrier` | The barrier object used to wait on the copy completion. |
| `pipeline` | The pipeline object used to wait on the copy completion. |
## Examples
```cuda
#include <cuda/barrier>
__global__ void example_kernel(char* dst, char* src) {
cuda::barrier<cuda::thread_scope_system> bar;
init(&bar, 1);
cuda::memcpy_async(dst, src, 1, bar);
cuda::memcpy_async(dst + 1, src + 8, 1, bar);
cuda::memcpy_async(dst + 2, src + 16, 1, bar);
cuda::memcpy_async(dst + 3, src + 24, 1, bar);
bar.arrive_and_wait();
}
```
[See it on Godbolt](https://godbolt.org/z/od6q9s8fq){: .btn }
[`std::memcpy`]: https://en.cppreference.com/w/cpp/string/byte/memcpy
[potentially-overlapping]: https://en.cppreference.com/w/cpp/language/object#Subobjects
[_TriviallyCopyable_]: https://en.cppreference.com/w/cpp/named_req/TriviallyCopyable
[_ThreadGroup_]: ./thread_group.md
[`cuda::std::size_t`]: https://en.cppreference.com/w/c/types/size_t
[`cuda::aligned_size_t`]: ./shapes/aligned_size_t.md
[`cuda::pipeline::quit`]: ./pipelines/pipeline/quit.md
|