.\" Automatically generated by Pandoc 2.9.2.1
.\"
.TH "fi_poll" "3" "2022\-12\-09" "Libfabric Programmer\[cq]s Manual" "#VERSION#"
.hy
.SH NAME
.PP
fi_poll - Polling and wait set operations
.TP
fi_poll_open / fi_close
Open/close a polling set
.TP
fi_poll_add / fi_poll_del
Add/remove a completion queue or counter to/from a poll set.
.TP
fi_poll
Poll for progress and events across multiple completion queues and
counters.
.TP
fi_wait_open / fi_close
Open/close a wait set
.TP
fi_wait
Waits for one or more wait objects in a set to be signaled.
.TP
fi_trywait
Indicate when it is safe to block on wait objects using native OS calls.
.TP
fi_control
Control wait set operation or attributes.
.SH SYNOPSIS
.IP
.nf
\f[C]
#include <rdma/fi_domain.h>

int fi_poll_open(struct fid_domain *domain, struct fi_poll_attr *attr,
    struct fid_poll **pollset);

int fi_close(struct fid *pollset);

int fi_poll_add(struct fid_poll *pollset, struct fid *event_fid,
    uint64_t flags);

int fi_poll_del(struct fid_poll *pollset, struct fid *event_fid,
    uint64_t flags);

int fi_poll(struct fid_poll *pollset, void **context, int count);

int fi_wait_open(struct fid_fabric *fabric, struct fi_wait_attr *attr,
    struct fid_wait **waitset);

int fi_close(struct fid *waitset);

int fi_wait(struct fid_wait *waitset, int timeout);

int fi_trywait(struct fid_fabric *fabric, struct fid **fids, size_t count);

int fi_control(struct fid *waitset, int command, void *arg);
\f[R]
.fi
.SH ARGUMENTS
.TP
\f[I]fabric\f[R]
Fabric provider
.TP
\f[I]domain\f[R]
Resource domain
.TP
\f[I]pollset\f[R]
Event poll set
.TP
\f[I]waitset\f[R]
Wait object set
.TP
\f[I]attr\f[R]
Poll or wait set attributes
.TP
\f[I]context\f[R]
On success, an array of user context values associated with completion
queues or counters.
.TP
\f[I]fids\f[R]
An array of fabric descriptors, each one associated with a native wait
object.
.TP
\f[I]count\f[R]
Number of entries in context or fids array.
.TP
\f[I]timeout\f[R]
Time to wait for a signal, in milliseconds.
.TP
\f[I]command\f[R]
Command of control operation to perform on the wait set.
.TP
\f[I]arg\f[R]
Optional control argument.
.SH DESCRIPTION
.SS fi_poll_open
.PP
fi_poll_open creates a new polling set.
A poll set enables an optimized method for progressing asynchronous
operations across multiple completion queues and counters and checking
for their completions.
.PP
A poll set is defined with the following attributes.
.IP
.nf
\f[C]
struct fi_poll_attr {
    uint64_t             flags;     /* operation flags */
};
\f[R]
.fi
.TP
\f[I]flags\f[R]
Flags that set the default operation of the poll set.
The use of this field is reserved and must be set to 0 by the caller.
.SS fi_close
.PP
The fi_close call releases all resources associated with a poll set.
The poll set must not be associated with any other resources prior to
being closed; otherwise, the call will return -FI_EBUSY.
.SS fi_poll_add
.PP
Associates a completion queue or counter with a poll set.
.SS fi_poll_del
.PP
Removes a completion queue or counter from a poll set.
.SS fi_poll
.PP
Progresses all completion queues and counters associated with a poll set
and checks for events.
If events might have occurred, contexts associated with the completion
queues and/or counters are returned.
Completion queues will return their context if they are not empty.
The context associated with a counter will be returned if the
counter\[cq]s success value or error value has changed since the last
time fi_poll, fi_cntr_set, or fi_cntr_add was called.
The number of contexts is limited to the size of the context array,
indicated by the count parameter.
.PP
Note that fi_poll only indicates that events might be available.
In some cases, providers may consume such events internally, for example
to drive progress.
This can result in fi_poll returning false positives.
Applications should drive their progress based on the results of reading
events from a completion queue or reading counter values.
The fi_poll function will always return all completion queues and
counters that do have new events.
.SS fi_wait_open
.PP
fi_wait_open allocates a new wait set.
A wait set enables an optimized method of waiting for events across
multiple completion queues and counters.
Where possible, a wait set uses a single underlying wait object that is
signaled when a specified condition occurs on an associated completion
queue or counter.
.PP
The properties and behavior of a wait set are defined by struct
fi_wait_attr.
.IP
.nf
\f[C]
struct fi_wait_attr {
    enum fi_wait_obj     wait_obj;  /* requested wait object */
    uint64_t             flags;     /* operation flags */
};
\f[R]
.fi
.TP
\f[I]wait_obj\f[R]
Wait sets are associated with specific wait object(s).
Wait objects allow applications to block until the wait object is
signaled, indicating that an event is available to be read.
The following values may be used to specify the type of wait object
associated with a wait set: FI_WAIT_UNSPEC, FI_WAIT_FD,
FI_WAIT_MUTEX_COND, FI_WAIT_POLLFD, and FI_WAIT_YIELD.
.TP
- \f[I]FI_WAIT_UNSPEC\f[R]
Specifies that the user will only wait on the wait set using fabric
interface calls, such as fi_wait.
In this case, the underlying provider may select the most appropriate or
highest performing wait object available, including custom wait
mechanisms.
Applications that select FI_WAIT_UNSPEC are not guaranteed to retrieve
the underlying wait object.
.TP
- \f[I]FI_WAIT_FD\f[R]
Indicates that the wait set should use a single file descriptor as its
wait mechanism, as exposed to the application.
Internally, this may require the use of epoll in order to support
waiting on a single file descriptor.
File descriptor wait objects must be usable in the POSIX select(2) and
poll(2), and Linux epoll(7) routines (if available).
Providers signal an FD wait object by marking it as readable or with an
error.
.TP
- \f[I]FI_WAIT_MUTEX_COND\f[R]
Specifies that the wait set should use a pthread mutex and condition
variable as a wait object.
.TP
- \f[I]FI_WAIT_POLLFD\f[R]
This option is similar to FI_WAIT_FD, but allows the wait set to expose
multiple file descriptors to the application as its wait mechanism.
The use of FI_WAIT_POLLFD can eliminate the need for epoll to hide
multiple file descriptors behind a single one when waiting for events.
The file descriptors must be usable in the POSIX select(2) and poll(2)
routines, and are returned in a form that can be passed directly to
poll(2).
See the NOTES section below for details on using pollfd.
.TP
- \f[I]FI_WAIT_YIELD\f[R]
Indicates that the wait set will not use a wait object and will instead
yield on every wait.
.TP
\f[I]flags\f[R]
Flags that set the default operation of the wait set.
The use of this field is reserved and must be set to 0 by the caller.
.SS fi_close
.PP
The fi_close call releases all resources associated with a wait set.
The wait set must not be bound to any other opened resources prior to
being closed; otherwise, the call will return -FI_EBUSY.
.SS fi_wait
.PP
Waits on a wait set until one or more of its underlying wait objects is
signaled.
.SS fi_trywait
.PP
The fi_trywait call was introduced in libfabric version 1.3.
The behavior of using native wait objects without the use of fi_trywait
is provider specific and should be considered non-deterministic.
.PP
The fi_trywait() call is used in conjunction with native operating
system calls to block on wait objects, such as file descriptors.
The application must call fi_trywait and obtain a return value of
FI_SUCCESS prior to blocking on a native wait object.
Failure to do so may result in the wait object not being signaled, and
the application not observing the desired events.
The following pseudo-code demonstrates the use of fi_trywait in
conjunction with the OS select(2) call.
.IP
.nf
\f[C]
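struct fid *fids[1] = { &cq->fid };

/* Retrieve the file descriptor backing the CQ wait object. */
fi_control(&cq->fid, FI_GETWAIT, (void *) &fd);

while (1) {
    /* Block only when fi_trywait reports that it is safe to do so. */
    if (fi_trywait(fabric, fids, 1) == FI_SUCCESS) {
        /* select() modifies its fd sets, so rebuild them on each pass. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        efds = rfds;
        select(fd + 1, &rfds, NULL, &efds, &timeout);
    }

    /* Drain all queued completions before blocking again. */
    do {
        ret = fi_cq_read(cq, &comp, 1);
    } while (ret > 0);
}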
\f[R]
.fi
.PP
fi_trywait() will return FI_SUCCESS if it is safe to block on the wait
object(s) corresponding to the fabric descriptor(s), or -FI_EAGAIN if
there are events queued on the fabric descriptor or if blocking could
hang the application.
.PP
The call takes an array of fabric descriptors.
For each wait object that will be passed to the native wait routine, the
corresponding fabric descriptor should first be passed to fi_trywait.
All fabric descriptors passed into a single fi_trywait call must make
use of the same underlying wait object type.
.PP
The following types of fabric descriptors may be passed into fi_trywait:
event queues, completion queues, counters, and wait sets.
Applications that wish to use native wait calls should select specific
wait objects when allocating such resources, for example by setting the
wait_obj value in the item\[cq]s creation attributes to FI_WAIT_FD.
.PP
In the case the wait object to check belongs to a wait set, only the
wait set itself needs to be passed into fi_trywait.
The fabric resources associated with the wait set do not.
.PP
On receiving a return value of -FI_EAGAIN from fi_trywait, an
application should read all queued completions and events, and call
fi_trywait again before attempting to block.
Applications can make use of a fabric poll set to identify completion
queues and counters that may require processing.
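.PP
Once fi_trywait has returned FI_SUCCESS, the blocking step itself is an
ordinary OS call.
The following is a minimal sketch that uses poll(2) rather than
select(2).
It assumes that fd was retrieved with FI_GETWAIT from a resource created
with FI_WAIT_FD; the helper name wait_fd_signaled is hypothetical and is
not part of libfabric.
Because providers signal an FD wait object by marking it as readable or
with an error, the helper treats either condition as a wakeup.
.IP
.nf
\f[C]
```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper, not part of libfabric.
 * Blocks on fd for up to timeout_ms milliseconds.
 * Returns 1 if fd was signaled (readable or in error), 0 on timeout,
 * or -1 if poll(2) failed. */
static int wait_fd_signaled(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int ret;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Restart the wait, with the full timeout, if a signal interrupts it. */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;

    /* POLLERR and POLLHUP are reported even though only POLLIN was requested. */
    return (pfd.revents & (POLLIN | POLLERR | POLLHUP)) ? 1 : 0;
}
```
\f[R]
.fi
.PP
A caller would invoke such a helper only after fi_trywait succeeds, and
would then read all queued completions before calling fi_trywait again,
whatever the helper returned.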
.SS fi_control
.PP
The fi_control call is used to access provider or implementation
specific details of fids that support blocking calls, such as wait
sets, completion queues, counters, and event queues.
Access to the wait set or fid should be serialized across all calls when
fi_control is invoked, as it may redirect the implementation of wait set
operations.
The following control commands are usable with a wait set or fid.
.TP
\f[I]FI_GETWAIT (void **)\f[R]
This command allows the user to retrieve the low-level wait object
associated with a wait set or fid.
The format of the wait object is specified during wait set creation,
through the wait set attributes.
The fi_control arg parameter should be an address where a pointer to the
returned wait object will be written.
This should be an \[cq]int *\[cq] for FI_WAIT_FD, a \[cq]struct
fi_mutex_cond\[cq] for FI_WAIT_MUTEX_COND, or a \[cq]struct
fi_wait_pollfd\[cq] for FI_WAIT_POLLFD.
Support for FI_GETWAIT is provider specific.
.TP
\f[I]FI_GETWAITOBJ (enum fi_wait_obj *)\f[R]
This command returns the type of wait object associated with a wait set
or fid.
.SH RETURN VALUES
.PP
Returns FI_SUCCESS on success.
On error, a negative value corresponding to fabric errno is returned.
.PP
Fabric errno values are defined in \f[C]rdma/fi_errno.h\f[R].
.TP
fi_poll
On success, if events are available, returns the number of entries
written to the context array.
.SH NOTES
.PP
In many situations, blocking calls may need to wait on signals sent to a
number of file descriptors.
For example, this is the case for socket based providers, such as tcp
and udp, as well as utility providers such as multi-rail.
For simplicity, when epoll is available, it can be used to limit the
number of file descriptors that an application must monitor.
The use of epoll may also be required in order to support FI_WAIT_FD.
.PP
However, FI_WAIT_POLLFD supports waiting on multiple file descriptors on
systems where epoll is not available, or where the overhead of epoll may
negatively impact performance.
A significant difference between using POLLFD and FD wait objects is
that with FI_WAIT_POLLFD, the file descriptors may change dynamically.
As an example, the file descriptors associated with a completion
queue\[cq]s wait set may change as endpoint associations with the CQ are
added and removed.
.PP
Struct fi_wait_pollfd is used to retrieve all file descriptors for fids
using FI_WAIT_POLLFD to support blocking calls.
.IP
.nf
\f[C]
struct fi_wait_pollfd {
    uint64_t      change_index;
    size_t        nfds;
    struct pollfd *fd;
};
\f[R]
.fi
.TP
\f[I]change_index\f[R]
The change_index may be used to determine if there have been any changes
to the file descriptor list.
Anytime a file descriptor is added, removed, or its events are updated,
this field is incremented by the provider.
Applications wishing to wait on file descriptors directly should cache
the change_index value.
Before blocking on file descriptor events, the app should use
fi_control() to retrieve the current change_index and compare that
against its cached value.
If the values differ, then the app should update its file descriptor
list prior to blocking.
.TP
\f[I]nfds\f[R]
On input to fi_control(), this indicates the number of entries in the
struct pollfd * array.
On output, this will be set to the number of entries needed to store the
current number of file descriptors.
If the input value is smaller than the output value, fi_control() will
return the error -FI_ETOOSMALL.
Note that setting nfds = 0 provides an efficient way to check the
change_index.
.TP
\f[I]fd\f[R]
This points to an array of struct pollfd entries.
The number of entries is specified through the nfds field.
If the number of needed entries is less than or equal to the number of
entries available, the struct pollfd array will be filled out with a
list of file descriptors and corresponding events that can be used in
the select(2) and poll(2) calls.
.PP
The change_index is updated only when the file descriptors associated
with the pollfd file set have changed.
Checking the change_index is an additional step needed when working with
FI_WAIT_POLLFD wait objects directly.
The use of the fi_trywait() function is still required if accessing wait
objects directly.
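.PP
The refresh sequence described above might be sketched as follows.
This is a self-contained model, not libfabric code: struct
model_wait_pollfd copies the layout of struct fi_wait_pollfd, and
model_getwait() is a hypothetical stand-in for fi_control() with
FI_GETWAIT, assumed to update change_index and nfds even when it reports
that the array is too small.
The fd_cache type and refresh_fds() are likewise illustrative names.
.IP
.nf
\f[C]
```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as the documented struct fi_wait_pollfd. */
struct model_wait_pollfd {
    uint64_t      change_index;
    size_t        nfds;
    struct pollfd *fd;
};

/* Hypothetical provider state standing in for a real fid. */
static struct pollfd provider_fds[8];
static size_t provider_nfds;
static uint64_t provider_index;

/* Stand-in for fi_control(fid, FI_GETWAIT, wait); not a libfabric call.
 * Returns 0 on success, or -1 where fi_control would return
 * -FI_ETOOSMALL. */
static int model_getwait(struct model_wait_pollfd *wait)
{
    size_t avail = wait->nfds;

    wait->change_index = provider_index;
    wait->nfds = provider_nfds;
    if (avail < provider_nfds)
        return -1;
    if (provider_nfds > 0)
        memcpy(wait->fd, provider_fds, provider_nfds * sizeof(*wait->fd));
    return 0;
}

/* Application-side copy of the descriptor list. */
struct fd_cache {
    uint64_t      change_index;
    size_t        nfds;
    struct pollfd *fds;
    int           valid;
};

/* Returns 1 if the cache was refreshed, 0 if it was already current,
 * or -1 if memory could not be allocated. */
static int refresh_fds(struct fd_cache *cache)
{
    struct model_wait_pollfd wait = { 0, 0, NULL };
    struct pollfd *fds;

    /* nfds = 0 fetches change_index and the required array size. */
    (void) model_getwait(&wait);
    if (cache->valid && wait.change_index == cache->change_index)
        return 0;

    /* Retry in case the list grows between the two calls. */
    for (;;) {
        fds = realloc(cache->fds, (wait.nfds ? wait.nfds : 1) * sizeof(*fds));
        if (fds == NULL)
            return -1;
        cache->fds = fds;
        wait.fd = fds;
        if (model_getwait(&wait) == 0)
            break;
    }

    cache->change_index = wait.change_index;
    cache->nfds = wait.nfds;
    cache->valid = 1;
    return 1;
}
```
\f[R]
.fi
.PP
After a refresh, the cached array and its entry count can be passed to
poll(2).
In a real application, the fi_trywait() check would still precede the
blocking call.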
.SH SEE ALSO
.PP
\f[C]fi_getinfo\f[R](3), \f[C]fi_domain\f[R](3), \f[C]fi_cntr\f[R](3),
\f[C]fi_eq\f[R](3)
.SH AUTHORS
OpenFabrics.