.\" Copyright (C) 2025 Jens Axboe <axboe@kernel.dk>
.\" SPDX-License-Identifier: LGPL-2.0-or-later
.\"
.TH io_uring_multishot 7 "January 18, 2025" "Linux" "Linux Programmer's Manual"
.SH NAME
io_uring_multishot \- io_uring multishot requests overview
.SH DESCRIPTION
Multishot requests are a class of io_uring operations where a single
submission queue entry (SQE) can generate multiple completion queue
entries (CQEs). This is in contrast to normal "oneshot" operations where
each SQE produces exactly one CQE.
.SS Why use multishot requests?
Traditional I/O operations require submitting a new request for each
operation. For high-frequency operations like accepting connections or
receiving data, this creates overhead:
.IP \(bu 2
CPU cycles spent preparing and submitting SQEs
.IP \(bu
Memory bandwidth for SQE/CQE processing
.IP \(bu
Potential for gaps between completions and new submissions
.PP
Multishot requests avoid this overhead by keeping the operation
active after each completion. The kernel re-arms the operation
automatically and posts a new CQE when the next event occurs.
The internal poll mechanism also stays armed for the lifetime of the
request, so poll state does not have to be manipulated for each
operation.
.PP
Multishot operations are most beneficial for:
.IP \(bu 2
Network servers accepting many connections
.IP \(bu
Applications receiving data on long-lived connections
.IP \(bu
Event monitoring with poll
.IP \(bu
Any scenario with repeated identical operations
.SS How multishot works
When a multishot operation completes, the CQE has the
.B IORING_CQE_F_MORE
flag set in
.IR cqe->flags .
This indicates that the operation remains armed and that further
completions can be expected. The operation continues until:
.IP \(bu 2
An error occurs
.IP \(bu
The operation is explicitly canceled
.IP \(bu
A termination condition specific to the operation is met (e.g., buffer
exhaustion for receives)
.PP
The final CQE for a multishot operation will not have
.B IORING_CQE_F_MORE
set, indicating the operation has terminated.
.SS Multishot accept
.BR io_uring_prep_multishot_accept (3)
and
.BR io_uring_prep_multishot_accept_direct (3)
set up a multishot accept operation. Each incoming connection generates
a CQE with the new file descriptor in
.IR cqe->res .
.PP
.in +4n
.EX
struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
io_uring_prep_multishot_accept(sqe, listen_fd, NULL, NULL, 0);
.EE
.in
.PP
The operation continues accepting connections until an error occurs or
it is canceled. The direct variant uses
.B IORING_FILE_INDEX_ALLOC
to place accepted sockets directly into the fixed file table; in that
case
.I cqe->res
holds the allocated slot index rather than a regular file descriptor.
.SS Multishot receive
.BR io_uring_prep_recv_multishot (3)
sets up a multishot receive operation. Each time data arrives on the
socket, a CQE is generated. This is typically used with provided buffers
(see
.BR io_uring_provided_buffers (7)):
.PP
.in +4n
.EX
struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
io_uring_prep_recv_multishot(sqe, sockfd, NULL, 0, 0);
sqe->buf_group = bgid;
sqe->flags |= IOSQE_BUFFER_SELECT;
.EE
.in
.PP
Each completion includes:
.IP \(bu 2
.B IORING_CQE_F_MORE
if more completions will follow
.IP \(bu
.B IORING_CQE_F_BUFFER
indicating a buffer was selected
.IP \(bu
The buffer ID in the upper 16 bits of
.I cqe->flags
(shift right by
.BR IORING_CQE_BUFFER_SHIFT )
.IP \(bu
The number of bytes received in
.I cqe->res
.PP
The multishot receive terminates when an error occurs, the connection
closes, or the buffer ring is exhausted.
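.PP
As a minimal sketch of decoding these fields, the helper below unpacks
one completion. The
.I struct recv_event
type and the
.I decode_recv_cqe
name are hypothetical and exist only for illustration; the flag and
shift constants come from
.IR <liburing.h> :
.PP
.in +4n
.EX
#include <liburing.h>
#include <stdbool.h>

/* Hypothetical summary of one multishot receive completion. */
struct recv_event {
    int nbytes;   /* bytes received, or a negative errno value */
    int buf_id;   /* provided buffer ID, or -1 if none was selected */
    bool more;    /* true while the request stays armed */
};

static struct recv_event decode_recv_cqe(const struct io_uring_cqe *cqe)
{
    struct recv_event ev;

    ev.nbytes = cqe->res;
    ev.buf_id = -1;
    /* The buffer ID is only meaningful when F_BUFFER is set. */
    if (cqe->flags & IORING_CQE_F_BUFFER)
        ev.buf_id = (int)(cqe->flags >> IORING_CQE_BUFFER_SHIFT);
    ev.more = (cqe->flags & IORING_CQE_F_MORE) != 0;
    return ev;
}
.EE
.in
.PP
A caller would typically use
.I buf_id
to locate the received data and, once finished with it, return that
buffer to the ring.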
.SS Multishot recvmsg
.BR io_uring_prep_recvmsg_multishot (3)
is similar to multishot receive but uses the
.I msghdr
structure for scatter/gather I/O and ancillary data. A provided buffer
is used for each message, with the kernel writing a
.I struct io_uring_recvmsg_out
header at the start of the buffer containing the actual message
parameters.
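.PP
The buffer layout can be walked with the liburing helpers
.BR io_uring_recvmsg_validate (3),
.BR io_uring_recvmsg_payload (3)
and
.BR io_uring_recvmsg_payload_length (3).
The following is a sketch only; the
.I recvmsg_payload
wrapper is a hypothetical name, and
.I buf_len
is assumed to be the
.I cqe->res
value of a successful completion:
.PP
.in +4n
.EX
#include <liburing.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: locate the payload inside a provided buffer
 * filled by a multishot recvmsg. "msgh" must be the same msghdr that
 * was passed when the request was prepared. Returns NULL if the
 * buffer is too short to hold the header plus the name and control
 * areas described by "msgh".
 */
static void *recvmsg_payload(void *buf, int buf_len, struct msghdr *msgh,
                             unsigned int *payload_len)
{
    struct io_uring_recvmsg_out *out;

    out = io_uring_recvmsg_validate(buf, buf_len, msgh);
    if (!out)
        return NULL;
    *payload_len = io_uring_recvmsg_payload_length(out, buf_len, msgh);
    return io_uring_recvmsg_payload(out, msgh);
}
.EE
.in
.PP
The name and control areas sit between the header and the payload, so
the payload offset depends on the
.I msg_namelen
and
.I msg_controllen
values in the original
.IR msghdr .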
.SS Multishot read
.BR io_uring_prep_read_multishot (3)
sets up a multishot read operation, typically used with pipes or other
stream-oriented file descriptors. Like multishot receive, this is used
with provided buffers:
.PP
.in +4n
.EX
struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
io_uring_prep_read_multishot(sqe, fd, 0, 0, bgid);
.EE
.in
.PP
The operation generates a CQE each time data becomes available to read.
.SS Multishot poll
.BR io_uring_prep_poll_multishot (3)
sets up a multishot poll operation, or it can be done manually by
setting the
.B IORING_POLL_ADD_MULTI
flag:
.PP
.in +4n
.EX
struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
io_uring_prep_poll_multishot(sqe, fd, POLLIN);
/* or equivalently: */
io_uring_prep_poll_add(sqe, fd, POLLIN);
sqe->len |= IORING_POLL_ADD_MULTI;
.EE
.in
.PP
Each time the polled condition becomes true, a CQE is generated with
the triggered events in
.IR cqe->res .
Unlike oneshot poll, which is removed automatically after it triggers,
multishot poll remains active.
.PP
For level-triggered events, the application should handle the event
fully (e.g., read all available data) before the next poll completion;
otherwise spurious wakeups may occur.
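.PP
One way to read all available data when a poll completion reports
.B POLLIN
is to drain the descriptor until it would block. This is a sketch
under the assumption that the descriptor was opened or set
non-blocking; the
.I drain_fd
name is hypothetical, and a real application would process the data
instead of only counting it:
.PP
.in +4n
.EX
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: read from a non-blocking fd until it would
 * block. Returns the total bytes read, or -1 on a real error with
 * errno set.
 */
static long drain_fd(int fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));

        if (n > 0) {
            total += n;      /* process buf[0..n) here */
            continue;
        }
        if (n == 0)
            return total;    /* end of file */
        if (errno == EINTR)
            continue;        /* interrupted, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;    /* nothing left to read for now */
        return -1;
    }
}
.EE
.in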
.SS Multishot waitid
.BR io_uring_prep_waitid (3)
can operate in multishot mode by setting
.B IORING_ACCEPT_MULTISHOT
in the flags. This allows waiting for multiple child process state
changes with a single SQE.
.SS Handling multishot completions
Applications must check for
.B IORING_CQE_F_MORE
to determine if the operation is still active:
.PP
.in +4n
.EX
struct io_uring_cqe *cqe;

while (io_uring_peek_cqe(ring, &cqe) == 0) {
    if (cqe->res < 0) {
        /* Error occurred, operation terminated */
        handle_error(cqe->res);
    } else {
        process_completion(cqe);
    }

    if (!(cqe->flags & IORING_CQE_F_MORE)) {
        /* Operation terminated, may need to resubmit */
        rearm_if_needed();
    }

    io_uring_cqe_seen(ring, cqe);
}
.EE
.in
.SS Canceling multishot operations
Multishot operations can be canceled using
.BR io_uring_prep_cancel (3)
or related functions. The cancellation request generates its own CQE,
and the multishot operation generates a final CQE (typically with
.BR -ECANCELED )
without
.B IORING_CQE_F_MORE
set.
.PP
.in +4n
.EX
struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
io_uring_prep_cancel64(sqe, user_data, 0);
.EE
.in
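.PP
Because two CQEs are involved, the application has to tell them apart
by
.IR user_data .
The sketch below assumes the cancel SQE was given its own tag (for
example with
.BR io_uring_sqe_set_data64 (3))
that differs from the tag of the multishot request; the
.I multishot_finished
name is hypothetical:
.PP
.in +4n
.EX
#include <liburing.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true once the multishot request tagged
 * "target" has posted its final CQE. CQEs carrying any other
 * user_data, including the cancel request's own CQE, return false.
 */
static bool multishot_finished(const struct io_uring_cqe *cqe, __u64 target)
{
    if (cqe->user_data != target)
        return false;
    return !(cqe->flags & IORING_CQE_F_MORE);
}
.EE
.in
.PP
State associated with the multishot request should be kept until this
check succeeds, since completions that were already queued may still
arrive after the cancel request has been submitted.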
.SS Integration with provided buffers
Multishot receive and read operations are designed to work with provided
buffer rings (see
.BR io_uring_provided_buffers (7)).
Each completion consumes a buffer from the ring, and the application
must return buffers to the ring to keep the operation running.
.PP
If the buffer ring becomes empty, the multishot operation terminates
with
.BR -ENOBUFS .
Applications should ensure adequate buffers are available and promptly
return used buffers to the ring.
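.PP
Returning a buffer can be sketched with
.BR io_uring_buf_ring_add (3)
and
.BR io_uring_buf_ring_advance (3).
The
.I recycle_buffer
wrapper is a hypothetical name; it assumes
.I br
is a buffer ring that was already set up and registered, and that
.I entries
is the power-of-two size used at registration:
.PP
.in +4n
.EX
#include <liburing.h>

/*
 * Hypothetical helper: hand one buffer back to a provided buffer
 * ring. "bid" is the buffer ID taken from the completed CQE.
 */
static void recycle_buffer(struct io_uring_buf_ring *br, unsigned int entries,
                           void *addr, unsigned int len, unsigned short bid)
{
    /* Stage the buffer in the slot at the current tail (offset 0). */
    io_uring_buf_ring_add(br, addr, len, bid,
                          io_uring_buf_ring_mask(entries), 0);
    /* Publish it to the kernel by advancing the tail by one. */
    io_uring_buf_ring_advance(br, 1);
}
.EE
.in
.PP
When several buffers are returned at once, staging them at increasing
offsets and advancing the tail a single time keeps the number of
tail updates down.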
.SH NOTES
.IP \(bu 2
Always check
.B IORING_CQE_F_MORE
to know if a multishot operation is still active.
.IP \(bu
Multishot operations may generate many CQEs quickly. Ensure the CQ ring
is large enough to avoid overflow.
.IP \(bu
When using provided buffers with multishot receives, monitor buffer
availability to prevent premature termination.
.IP \(bu
Multishot operations are conceptually edge-triggered: they generate a
completion when an event occurs, not continuously while a condition
remains true.
.IP \(bu
Error completions from multishot operations do not have
.B IORING_CQE_F_MORE
set, indicating termination.
.SH SEE ALSO
.BR io_uring (7),
.BR io_uring_provided_buffers (7),
.BR io_uring_prep_multishot_accept (3),
.BR io_uring_prep_recv_multishot (3),
.BR io_uring_prep_recvmsg_multishot (3),
.BR io_uring_prep_read_multishot (3),
.BR io_uring_prep_poll_add (3),
.BR io_uring_prep_poll_multishot (3),
.BR io_uring_prep_cancel (3)