.\" Automatically generated by Pandoc 2.9.2.1
.\"
.TH "fi_trigger" "3" "2022\-12\-09" "Libfabric Programmer\[cq]s Manual" "#VERSION#"
.hy
.SH NAME
.PP
fi_trigger \- Triggered operations
.SH SYNOPSIS
.IP
.nf
\f[C]
#include <rdma/fi_trigger.h>
\f[R]
.fi
.SH DESCRIPTION
.PP
Triggered operations allow an application to queue a data transfer
request that is deferred until a specified condition is met.
A typical use is to send a message only after receiving all input data.
Triggered operations can help reduce the latency needed to initiate a
transfer by removing the need to return control back to an application
prior to the data transfer starting.
.PP
An endpoint must be created with the FI_TRIGGER capability in order for
triggered operations to be specified.
Such an endpoint is referred to as a trigger-able endpoint.
A triggered operation is requested by specifying the FI_TRIGGER flag as
part of the operation.
.PP
Any data transfer operation is potentially trigger-able, subject to
provider constraints.
Trigger-able endpoints are initialized such that only the trigger-able
interfaces supported by the provider are available.
.PP
Triggered operations require that applications use struct
fi_triggered_context as their per operation context parameter, or, if the
provider requires the FI_CONTEXT2 mode, struct fi_triggered_context2.
The use of struct fi_triggered_context[2] replaces struct fi_context[2],
if required by the provider.
Although struct fi_triggered_context[2] is not opaque to the
application, the contents of the structure may be modified by the
provider once it has been submitted as an operation.
This structure has requirements similar to those of struct fi_context[2].
It must be allocated by the application and remain valid until the
corresponding operation completes or is successfully canceled.
.PP
Struct fi_triggered_context[2] is used to specify the condition that
must be met before the triggered data transfer is initiated.
If the condition is met when the request is made, then the data transfer
may be initiated immediately.
The format of struct fi_triggered_context[2] is described below.
.IP
.nf
\f[C]
struct fi_triggered_context {
    enum fi_trigger_event event_type;   /* trigger type */
    union {
        struct fi_trigger_threshold threshold;
        struct fi_trigger_xpu xpu;
        void *internal[3]; /* reserved */
    } trigger;
};

struct fi_triggered_context2 {
    enum fi_trigger_event event_type;   /* trigger type */
    union {
        struct fi_trigger_threshold threshold;
        struct fi_trigger_xpu xpu;
        void *internal[7]; /* reserved */
    } trigger;
};
\f[R]
.fi
.PP
The triggered context indicates the type of event assigned to the
trigger, along with a union of trigger details that is based on the
event type.
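.PP
As a rough illustration of how the event type and the union member go
together, the sketch below fills a context for a threshold trigger (the
FI_TRIGGER_THRESHOLD event described in the next section).
It is not libfabric code: every demo_ name is a hypothetical stand-in
that mirrors the structure shown above, so that the sketch builds without
libfabric installed, and the XPU union member is left out for brevity.
Real code would include <rdma/fi_trigger.h> and use the fi_ names.
.IP
.nf
\f[C]
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the libfabric types of the same name. */
struct demo_cntr { int unused; };   /* stands in for struct fid_cntr */

enum demo_trigger_event { DEMO_TRIGGER_THRESHOLD, DEMO_TRIGGER_XPU };

struct demo_trigger_threshold {
    struct demo_cntr *cntr;   /* event counter to check */
    size_t threshold;         /* threshold value */
};

struct demo_triggered_context {
    enum demo_trigger_event event_type;   /* trigger type */
    union {
        struct demo_trigger_threshold threshold;
        void *internal[3];    /* reserved */
    } trigger;
};

/* Fill a caller-owned context so that the operation submitted with it
 * is deferred until cntr reaches threshold. The context must remain
 * valid until the operation completes or is canceled. */
void demo_set_threshold(struct demo_triggered_context *ctx,
                        struct demo_cntr *cntr, size_t threshold)
{
    assert(ctx != NULL && cntr != NULL);
    ctx->event_type = DEMO_TRIGGER_THRESHOLD;
    ctx->trigger.threshold.cntr = cntr;
    ctx->trigger.threshold.threshold = threshold;
}
```
\f[R]
.fi
.PP
With the real structures, the filled context would then be passed as the
per operation context parameter of a data transfer call made with the
FI_TRIGGER flag.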
.SH COMPLETION BASED TRIGGERS
.PP
Completion based triggers defer a data transfer until one or more
related data transfers complete.
For example, a send operation may be deferred until a receive operation
completes, indicating that the data to be transferred is now available.
.PP
The following trigger event related to completion based transfers is
defined.
.TP
\f[I]FI_TRIGGER_THRESHOLD\f[R]
This indicates that the data transfer operation will be deferred until
an event counter crosses an application specified threshold value.
The threshold is specified using struct fi_trigger_threshold:
.IP
.nf
\f[C]
struct fi_trigger_threshold {
    struct fid_cntr *cntr; /* event counter to check */
    size_t threshold;      /* threshold value */
};
\f[R]
.fi
.PP
Threshold operations are triggered in the order of the threshold values.
This is true even if the counter increments by a value greater than 1.
If two triggered operations have the same threshold, they will be
triggered in the order in which they were submitted to the endpoint.
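.PP
The ordering rule can be pictured with a small host-side model.
The sketch below is not how any provider implements triggering; the
demo_ types and functions are hypothetical and exist only to show that
operations start in threshold order, and that equal thresholds keep
submission order, even when the counter jumps past several thresholds at
once.
.IP
.nf
\f[C]
```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_OPS 16

/* One queued triggered operation in this toy model. */
struct demo_op {
    size_t threshold;   /* counter value at which the op may start */
    int id;             /* caller-chosen label */
    int fired;          /* nonzero once the op has been started */
};

struct demo_queue {
    struct demo_op ops[DEMO_MAX_OPS];   /* kept in submission order */
    size_t count;
};

/* Queue an operation; returns 0 on success, -1 if the queue is full. */
int demo_submit(struct demo_queue *q, int id, size_t threshold)
{
    assert(q != NULL);
    if (q->count == DEMO_MAX_OPS)
        return -1;
    q->ops[q->count].threshold = threshold;
    q->ops[q->count].id = id;
    q->ops[q->count].fired = 0;
    q->count++;
    return 0;
}

/* Start every pending op whose threshold is <= counter.
 * Ops start in increasing threshold order. Ties keep submission order
 * because the scan replaces the candidate only on a strictly smaller
 * threshold. Ids of started ops go to out; the number started is
 * returned (at most out_len per call). */
size_t demo_fire(struct demo_queue *q, size_t counter,
                 int *out, size_t out_len)
{
    size_t n = 0;
    while (n < out_len) {
        struct demo_op *next = NULL;
        for (size_t i = 0; i < q->count; i++) {
            struct demo_op *op = &q->ops[i];
            if (op->fired || op->threshold > counter)
                continue;
            if (next == NULL || op->threshold < next->threshold)
                next = op;
        }
        if (next == NULL)
            break;
        next->fired = 1;
        out[n++] = next->id;
    }
    return n;
}
```
\f[R]
.fi
.PP
In this model, submitting operations with thresholds 3, 1, and 3 and
then moving the counter straight to 3 starts the threshold 1 operation
first, followed by the two threshold 3 operations in the order they were
submitted.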
.SH XPU TRIGGERS
.PP
XPU based triggers work in conjunction with heterogeneous memory (FI_HMEM
capability).
XPU triggers define a split execution model for specifying a data
transfer separately from initiating the transfer.
Unlike completion triggers, the user controls the timing of when the
transfer starts by writing data into a trigger variable location.
.PP
XPU transfers allow the requesting and triggering to occur on separate
computational domains.
For example, a process running on the host CPU can set up a data
transfer, with a compute kernel running on a GPU signaling the start of
the transfer.
XPU refers to a CPU, GPU, FPGA, or other acceleration device with some
level of computational ability.
.PP
Endpoints must be created with both the FI_TRIGGER and FI_XPU
capabilities to use XPU triggers.
XPU trigger enabled endpoints only support XPU triggered operations.
The behavior of mixing XPU triggered operations with normal data
transfers or non-XPU triggered operations is not defined by the API and
is subject to provider support and implementation.
.PP
The use of XPU triggers requires coordination between the fabric
provider, application, and submitting XPU.
The result is that hardware implementation details need to be conveyed
across the computational domains.
The XPU trigger API abstracts those details.
When submitting an XPU trigger operation, the user identifies the XPU
where the triggering will occur.
The triggering XPU must match with the location of the local memory
regions.
For example, if triggering will be done by a GPU kernel, the type of GPU
and its local identifier are given.
As output, the fabric provider will return a list of variables and
corresponding values.
The XPU signals that the data transfer is safe to initiate by writing
the given values to the specified variable locations.
The number of variables and their sizes are provider specific.
.PP
XPU trigger operations are submitted using the FI_TRIGGER flag with
struct fi_triggered_context or struct fi_triggered_context2, as required
by the provider.
The trigger event_type is:
.TP
\f[I]FI_TRIGGER_XPU\f[R]
Indicates that the data transfer operation will be deferred until the
user writes provider specified data to provider indicated memory
locations.
The user indicates which device will initiate the write.
The struct fi_trigger_xpu is used to convey both input and output data
regarding the signaling of the trigger.
.IP
.nf
\f[C]
struct fi_trigger_var {
    enum fi_datatype datatype;
    int count;
    void *addr;
    union {
        uint8_t val8;
        uint16_t val16;
        uint32_t val32;
        uint64_t val64;
        uint8_t *data;
    } value;
};

struct fi_trigger_xpu {
    int count;
    enum fi_hmem_iface iface;
    union {
        uint64_t reserved;
        int cuda;
        int ze;
    } device;
    struct fi_trigger_var *var;
};
\f[R]
.fi
.PP
On input to a triggered operation, the iface field indicates the
software interface that will be used to write the variables.
The device union specifies the device identifier.
For valid iface and device values, see \f[C]fi_mr\f[R](3).
The iface and device must match with the iface and device of any local
HMEM memory regions.
Count should be set to the number of fi_trigger_var structures
available, with the var field pointing to an array of struct
fi_trigger_var.
The user is responsible for ensuring that there are sufficient
fi_trigger_var structures available and of an appropriate size.
The count and size of fi_trigger_var structures can be obtained by
calling fi_getopt() on the endpoint with the FI_OPT_XPU_TRIGGER option.
See \f[C]fi_endpoint\f[R](3) for details.
.PP
Each fi_trigger_var structure referenced should have its datatype and
count fields initialized, with count set to the number of values
referenced by the struct fi_trigger_var.
If the count is 1, one of the val fields will be used to return the
necessary data (val8, val16, etc.).
If count > 1, the data field will return all necessary data used to
signal the trigger.
The data field must reference a buffer large enough to hold the returned
bytes.
.PP
On output, the provider will set the fi_trigger_xpu count to the number
of fi_trigger_var variables that must be signaled.
Count will be less than or equal to the input value.
The provider will initialize each valid fi_trigger_var entry with
information needed to signal the trigger.
The datatype indicates the size of the data that must be written.
Valid datatype values are FI_UINT8, FI_UINT16, FI_UINT32, and FI_UINT64.
For signal variables <= 64 bits, the count field will be 1.
If a trigger requires writing more than 64 bits, the datatype field will
be set to FI_UINT8, with count set to the number of bytes that must be
written.
The data that must be written to signal the start of an operation is
returned through either the value union val fields or data array.
.PP
Users signal the start of a transfer by writing the returned data to the
given memory address.
The write must occur from the specified input XPU location (based on the
iface and device fields).
If a transfer cannot be initiated for some reason, such as an error
occurring before the transfer can start, the triggered operation should
be canceled to release any allocated resources.
If multiple variables are specified, they must be updated in order.
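.PP
The signaling step described above can be sketched as a loop over the
returned variables.
This is a host-side model only: in a real application the writes would
be issued from the XPU identified by the iface and device fields, for
example from a GPU kernel.
The demo_ types are hypothetical stand-ins for enum fi_datatype and
struct fi_trigger_var, reduced to the four datatypes listed above, and
demo_signal is not a libfabric function.
.IP
.nf
\f[C]
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the libfabric types. */
enum demo_datatype { DEMO_UINT8, DEMO_UINT16, DEMO_UINT32, DEMO_UINT64 };

struct demo_trigger_var {
    enum demo_datatype datatype;
    int count;
    void *addr;
    union {
        uint8_t val8;
        uint16_t val16;
        uint32_t val32;
        uint64_t val64;
        uint8_t *data;
    } value;
};

/* Write each returned value to its address, in array order.
 * Returns 0 on success, -1 on an entry this sketch does not expect. */
int demo_signal(const struct demo_trigger_var *vars, int nvars)
{
    assert(vars != NULL || nvars == 0);
    for (int i = 0; i < nvars; i++) {
        const struct demo_trigger_var *v = &vars[i];
        if (v->count > 1) {
            /* More than 64 bits: bytes are taken from the data array. */
            if (v->datatype != DEMO_UINT8)
                return -1;
            memcpy(v->addr, v->value.data, (size_t)v->count);
            continue;
        }
        if (v->count != 1)
            return -1;
        switch (v->datatype) {
        case DEMO_UINT8:
            *(volatile uint8_t *)v->addr = v->value.val8;
            break;
        case DEMO_UINT16:
            *(volatile uint16_t *)v->addr = v->value.val16;
            break;
        case DEMO_UINT32:
            *(volatile uint32_t *)v->addr = v->value.val32;
            break;
        case DEMO_UINT64:
            *(volatile uint64_t *)v->addr = v->value.val64;
            break;
        default:
            return -1;
        }
    }
    return 0;
}
```
\f[R]
.fi
.PP
The loop walks the array from first to last entry so that multiple
variables are updated in order.
Any memory ordering or flushing that a particular device needs between
writes is outside the scope of this sketch.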
.PP
Note that the provider will not modify the fi_trigger_xpu or
fi_trigger_var structures after returning from the data transfer call.
.PP
In order to support multiple provider implementations, users should
trigger data transfer operations in the same order that they are queued
and should serialize the writing of triggers that reference the same
endpoint.
Providers may return the same trigger variable for multiple data
transfer requests.
.SH DEFERRED WORK QUEUES
.PP
The following feature and description are enhancements to triggered
operation support.
.PP
The deferred work queue interface provides primitive constructs that can
be used to implement application-level collective operations.
Deferred work requests are a more advanced form of triggered operation.
They allow an application to queue operations to a deferred work queue
that is associated with the domain.
Note that the deferred work queue is a conceptual construct, rather than
an implementation requirement.
Deferred work requests consist of three main components: an event or
condition that must first be met, an operation to perform, and a
completion notification.
.PP
Because deferred work requests are posted directly to the domain, they
can support a broader set of conditions and operations.
Deferred work requests are submitted using struct fi_deferred_work.
That structure, along with the corresponding operation structures
(referenced through the op union) used to describe the work, must remain
valid until the operation completes or is canceled.
The format of the deferred work request is as follows:
.IP
.nf
\f[C]
struct fi_deferred_work {
    struct fi_context2    context;

    uint64_t              threshold;
    struct fid_cntr       *triggering_cntr;
    struct fid_cntr       *completion_cntr;

    enum fi_trigger_op    op_type;

    union {
        struct fi_op_msg            *msg;
        struct fi_op_tagged         *tagged;
        struct fi_op_rma            *rma;
        struct fi_op_atomic         *atomic;
        struct fi_op_fetch_atomic   *fetch_atomic;
        struct fi_op_compare_atomic *compare_atomic;
        struct fi_op_cntr           *cntr;
    } op;
};
\f[R]
.fi
.PP
Once a work request has been posted to the deferred work queue, it will
remain on the queue until the triggering counter (success plus error
counter values) has reached the indicated threshold.
If the triggering condition has already been met at the time the work
request is queued, the operation will be initiated immediately.
.PP
On the completion of a deferred data transfer, the specified completion
counter will be incremented by one.
Note that deferred counter operations do not update the completion
counter; only the counter specified through the fi_op_cntr is modified.
The completion_cntr field must be NULL for counter operations.
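.PP
The triggering and completion rules above can be summarized in a small
model.
The sketch is illustrative only: the demo_ types are hypothetical and
much simpler than struct fi_deferred_work, the counter operation is
reduced to adding one to a target counter, and the assumption that a
data transfer may leave completion_cntr unset is made for the model, not
taken from the text above.
.IP
.nf
\f[C]
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy counter with separate success and error counts. */
struct demo_cntr {
    uint64_t success;
    uint64_t error;
};

enum demo_op_kind { DEMO_OP_TRANSFER, DEMO_OP_COUNTER };

struct demo_work {
    uint64_t threshold;
    struct demo_cntr *triggering_cntr;
    struct demo_cntr *completion_cntr;  /* must be NULL for counter ops */
    enum demo_op_kind kind;
    struct demo_cntr *target_cntr;      /* counter ops only */
};

/* A request may start once success plus error events on the triggering
 * counter have reached the threshold. Returns nonzero when ready. */
int demo_work_ready(const struct demo_work *w)
{
    assert(w != NULL && w->triggering_cntr != NULL);
    return w->triggering_cntr->success + w->triggering_cntr->error
           >= w->threshold;
}

/* Apply the completion rule for a finished request.
 * Returns 0 on success, -1 if the request is not valid in this model. */
int demo_work_complete(struct demo_work *w)
{
    assert(w != NULL);
    if (w->kind == DEMO_OP_COUNTER) {
        /* Counter ops touch only their target counter. */
        if (w->completion_cntr != NULL || w->target_cntr == NULL)
            return -1;
        w->target_cntr->success += 1;
        return 0;
    }
    /* Data transfers bump the completion counter by one. */
    if (w->completion_cntr != NULL)
        w->completion_cntr->success += 1;
    return 0;
}
```
\f[R]
.fi
.PP
In the model, a request with threshold 3 becomes ready when the
triggering counter holds two successes and one error, since both counts
contribute to the trigger condition.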
.PP
Because deferred work targets support of collective communication
operations, posted work requests do not generate any completions at the
endpoint by default.
For example, completed operations are not written to the EP\[cq]s
completion queue and do not update the EP counter (unless the EP counter
is explicitly referenced as the completion_cntr).
An application may request EP completions by specifying the
FI_COMPLETION flag as part of the operation.
.PP
It is the responsibility of the application to detect and handle
situations that could result in a deferred work request\[cq]s condition
not being met.
For example, if a work request is dependent upon the successful
completion of a data transfer operation, which fails, then the
application must cancel the work request.
.PP
To submit a deferred work request, applications should use the
domain\[cq]s fi_control function with command FI_QUEUE_WORK and struct
fi_deferred_work as the fi_control arg parameter.
To cancel a deferred work request, use fi_control with command
FI_CANCEL_WORK and the corresponding struct fi_deferred_work to cancel.
The fi_control command FI_FLUSH_WORK will cancel all queued work
requests.
FI_FLUSH_WORK may be used to flush all work queued to the domain, or may
be used to cancel all requests waiting on a specific triggering_cntr.
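.PP
The three commands can be pictured as operations on a per domain list of
pending requests.
The sketch below models that behavior in plain C and does not call
fi_control; all demo_ names are hypothetical.
In particular, using a NULL counter to mean flush everything is an
assumption made for this model and is not taken from the libfabric
interface.
.IP
.nf
\f[C]
```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_WORK 8

struct demo_cntr { int unused; };   /* stands in for struct fid_cntr */

struct demo_work {
    struct demo_cntr *triggering_cntr;
    int queued;                     /* nonzero while the request waits */
};

struct demo_domain {
    struct demo_work *pending[DEMO_MAX_WORK];
    size_t count;
};

/* Models FI_QUEUE_WORK. Returns 0 on success, -1 if the list is full. */
int demo_queue_work(struct demo_domain *d, struct demo_work *w)
{
    assert(d != NULL && w != NULL);
    if (d->count == DEMO_MAX_WORK)
        return -1;
    w->queued = 1;
    d->pending[d->count++] = w;
    return 0;
}

/* Models FI_CANCEL_WORK. Returns 0 if w was pending, -1 otherwise. */
int demo_cancel_work(struct demo_domain *d, struct demo_work *w)
{
    assert(d != NULL && w != NULL);
    for (size_t i = 0; i < d->count; i++) {
        if (d->pending[i] != w)
            continue;
        w->queued = 0;
        for (size_t j = i + 1; j < d->count; j++)
            d->pending[j - 1] = d->pending[j];
        d->count--;
        return 0;
    }
    return -1;
}

/* Models FI_FLUSH_WORK. Cancels every pending request, or only those
 * waiting on cntr when cntr is not NULL (model assumption).
 * Returns the number of requests canceled. */
size_t demo_flush_work(struct demo_domain *d, const struct demo_cntr *cntr)
{
    size_t kept = 0, dropped = 0;
    assert(d != NULL);
    for (size_t i = 0; i < d->count; i++) {
        struct demo_work *w = d->pending[i];
        if (cntr == NULL || w->triggering_cntr == cntr) {
            w->queued = 0;
            dropped++;
        } else {
            d->pending[kept++] = w;
        }
    }
    d->count = kept;
    return dropped;
}
```
\f[R]
.fi
.PP
In real code each of these steps is a single fi_control call on the
domain, as described above, and the struct fi_deferred_work passed in
must stay valid while the request is queued.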
.PP
Deferred work requests are not acted upon by the provider until the
associated event has occurred, although certain validation checks may
still occur when a request is submitted.
Referenced data buffers are not read or otherwise accessed.
However, the provider may validate fabric objects, such as endpoints and
counters, and check that input parameters fall within supported ranges.
If a specific request is not supported by the provider, it will fail the
operation with -FI_ENOSYS.
.SH SEE ALSO
.PP
\f[C]fi_getinfo\f[R](3), \f[C]fi_endpoint\f[R](3), \f[C]fi_mr\f[R](3),
\f[C]fi_alias\f[R](3), \f[C]fi_cntr\f[R](3)
.SH AUTHORS
OpenFabrics.