1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197
|
.\" Copyright (C) 2019 Jens Axboe <axboe@kernel.dk>
.\" Copyright (C) 2019 Red Hat, Inc.
.\"
.\" SPDX-License-Identifier: LGPL-2.0-or-later
.\"
.TH io_uring_register 2 2019-01-17 "Linux" "Linux Programmer's Manual"
.SH NAME
io_uring_register \- register files or user buffers for asynchronous I/O
.SH SYNOPSIS
.nf
.BR "#include <liburing.h>"
.PP
.BI "int io_uring_register(unsigned int " fd ", unsigned int " opcode ,
.BI " void *" arg ", unsigned int " nr_args );
.fi
.PP
.SH DESCRIPTION
.PP
The
.BR io_uring_register (2)
system call registers resources (e.g. user buffers, files, eventfd,
personality, restrictions) for use in an
.BR io_uring (7)
instance referenced by
.IR fd .
Registering files or user buffers allows the kernel to take long term
references to internal data structures or create long term mappings of
application memory, greatly reducing per-I/O overhead.
.I fd
is the file descriptor returned by a call to
.BR io_uring_setup (2).
If
.I opcode
has the flag
.B IORING_REGISTER_USE_REGISTERED_RING
ored into it,
.I fd
is instead the index of a registered ring fd.
.I opcode
can be one of:
.TP
.B IORING_REGISTER_BUFFERS
.I arg
points to a
.I struct iovec
array of
.I nr_args
entries. The buffers associated with the iovecs will be locked in
memory and charged against the user's
.B RLIMIT_MEMLOCK
resource limit. See
.BR getrlimit (2)
for more information. Additionally, there is a size limit of 1GiB per
buffer. Currently, the buffers must be anonymous, non-file-backed
memory, such as that returned by
.BR malloc (3)
or
.BR mmap (2)
with the
.B MAP_ANONYMOUS
flag set. It is expected that this limitation will be lifted in the
future. Huge pages are supported as well. Note that the entire huge
page will be pinned in the kernel, even if only a portion of it is
used.
After a successful call, the supplied buffers are mapped into the
kernel and eligible for I/O. To make use of them, the application
must specify the
.B IORING_OP_READ_FIXED
or
.B IORING_OP_WRITE_FIXED
opcodes in the submission queue entry (see the
.I struct io_uring_sqe
definition in
.BR io_uring_enter (2)),
and set the
.I buf_index
field to the desired buffer index. The memory range described by the
submission queue entry's
.I addr
and
.I len
fields must fall within the indexed buffer.
It is perfectly valid to setup a large buffer and then only use part
of it for an I/O, as long as the range is within the originally mapped
region.
An application can increase or decrease the size or number of
registered buffers by first unregistering the existing buffers, and
then issuing a new call to
.BR io_uring_register (2)
with the new buffers.
Note that before 5.13 registering buffers would wait for the ring to idle.
If the application currently has requests in-flight, the registration will
wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting
down the io_uring instance. Note, however, that shutdown processing may run
asynchronously within the kernel. As a result, it is not guaranteed that
pages are immediately unpinned in this case. Available since 5.1.
.TP
.B IORING_REGISTER_BUFFERS2
Register buffers for I/O. Similar to
.B IORING_REGISTER_BUFFERS
but aims to have a more extensible ABI.
.I arg
points to a
.I struct
.IR io_uring_rsrc_register ,
and
.I nr_args
should be set to the number of bytes in the structure.
.IP
.in +4n
.EX
struct io_uring_rsrc_register {
__u32 nr;
__u32 flags;
__u64 resv2;
__aligned_u64 data;
__aligned_u64 tags;
};
.EE
.in
.IP
The
.I data
field contains a pointer to a
.I struct iovec
array of
.I nr
entries.
The
.I tags
field should either be 0, then tagging is disabled, or point to an array
of
.I nr
"tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this
particular resource (a buffer in this case) is disabled. Otherwise, after the
resource had been unregistered and it's not used anymore, a CQE will be
posted with
.I user_data
set to the specified tag and all other fields zeroed.
The
.I flags
field supports the following flags:
.IP
.in +4n
.B IORING_RSRC_REGISTER_SPARSE
If set, io_uring will register
.I nr
empty buffers, which need to be updated before use. When this flag is set,
.I data
and
.I tags
must be NULL. Available since 5.19.
.in
.IP
Note that resource updates, e.g.
.BR IORING_REGISTER_BUFFERS_UPDATE ,
don't necessarily deallocate resources by the time it returns, but they might
be held alive until all requests using it complete.
Available since 5.13.
.TP
.B IORING_REGISTER_BUFFERS_UPDATE
Updates registered buffers with new ones, either turning a sparse entry into
a real one, or replacing an existing entry.
.I arg
must contain a pointer to a
.I struct
.IR io_uring_rsrc_update2 ,
which contains
an offset on which to start the update, and an array of
.I struct
.IR iovec .
.I tags
points to an array of tags.
.I nr
must contain the number of descriptors in the passed in arrays.
See
.B IORING_REGISTER_BUFFERS2
for the resource tagging description.
.PP
.in +8n
.EX
struct io_uring_rsrc_update2 {
__u32 offset;
__u32 resv;
__aligned_u64 data;
__aligned_u64 tags;
__u32 nr;
__u32 resv2;
};
.EE
.in
.PP
.in +8n
Available since 5.13.
.TP
.B IORING_UNREGISTER_BUFFERS
This operation takes no argument, and
.I arg
must be passed as NULL. All previously registered buffers associated
with the io_uring instance will be released synchronously. Available since 5.1.
.TP
.B IORING_REGISTER_FILES
Register files for I/O.
.I arg
contains a pointer to an array of
.I nr_args
file descriptors (signed 32 bit integers).
To make use of the registered files, the
.B IOSQE_FIXED_FILE
flag must be set in the
.I flags
member of the
.IR "struct io_uring_sqe" ,
and the
.I fd
member is set to the index of the file in the file descriptor array.
The file set may be sparse, meaning that the
.B fd
field in the array may be set to
.BR -1 .
See
.B IORING_REGISTER_FILES_UPDATE
for how to update files in place.
Note that before 5.13 registering files would wait for the ring to idle.
If the application currently has requests in-flight, the registration will
wait for those to finish before proceeding. See
.B IORING_REGISTER_FILES_UPDATE
for how to update an existing set without that limitation.
Files are automatically unregistered when the io_uring instance is
torn down. An application needs only unregister if it wishes to
register a new set of fds. Available since 5.1.
.TP
.B IORING_REGISTER_FILES2
Register files for I/O. Similar to
.BR IORING_REGISTER_FILES .
.I arg
points to a
.I struct
.IR io_uring_rsrc_register ,
and
.I nr_args
should be set to the number of bytes in the structure.
The
.I data
field contains a pointer to an array of
.I nr
file descriptors (signed 32 bit integers).
.I tags
field should either be 0 or or point to an array of
.I nr
"tags" (unsigned 64 bit integers). See
.B IORING_REGISTER_BUFFERS2
for more info on resource tagging.
Note that resource updates, e.g.
.BR IORING_REGISTER_FILES_UPDATE ,
don't necessarily deallocate resources, they might be held until all requests
using that resource complete.
Available since 5.13.
.TP
.B IORING_REGISTER_FILES_UPDATE
This operation replaces existing files in the registered file set with new
ones, either turning a sparse entry (one where fd is equal to
.BR -1 )
into a real one, removing an existing entry (new one is set to
.BR -1 ),
or replacing an existing entry with a new existing entry.
.I arg
must contain a pointer to a
.I struct
.IR io_uring_rsrc_update ,
which contains
an offset on which to start the update, and an array of file descriptors to
use for the update.
.I nr_args
must contain the number of descriptors in the passed in array. Available
since 5.5.
File descriptors can be skipped if they are set to
.BR IORING_REGISTER_FILES_SKIP .
Skipping an fd will not touch the file associated with the previous
fd at that index. Available since 5.12.
.TP
.B IORING_REGISTER_FILES_UPDATE2
Similar to
.BR IORING_REGISTER_FILES_UPDATE ,
replaces existing files in the
registered file set with new ones, either turning a sparse entry (one where
fd is equal to
.BR -1 )
into a real one, removing an existing entry (new one is set to
.BR -1 ),
or replacing an existing entry with a new existing entry.
.I arg
must contain a pointer to a
.I struct
.IR io_uring_rsrc_update2 ,
which contains
an offset on which to start the update, and an array of file descriptors to
use for the update stored in
.IR data .
.I tags
points to an array of tags.
.I nr
must contain the number of descriptors in the passed in arrays.
See
.B IORING_REGISTER_BUFFERS2
for the resource tagging description.
Available since 5.13.
.TP
.B IORING_UNREGISTER_FILES
This operation requires no argument, and
.I arg
must be passed as NULL. All previously registered files associated
with the io_uring instance will be unregistered. Available since 5.1.
.TP
.B IORING_REGISTER_EVENTFD
It's possible to use
.BR eventfd (2)
to get notified of completion events on an
io_uring instance. If this is desired, an eventfd file descriptor can be
registered through this operation.
.I arg
must contain a pointer to the eventfd file descriptor, and
.I nr_args
must be 1. Note that while io_uring generally takes care to avoid spurious
events, they can occur. Similarly, batched completions of CQEs may only trigger
a single eventfd notification even if multiple CQEs are posted. The application
should make no assumptions on number of events being available having a direct
correlation to eventfd notifications posted. An eventfd notification must thus
only be treated as a hint to check the CQ ring for completions. Available since
5.2.
An application can temporarily disable notifications, coming through the
registered eventfd, by setting the
.B IORING_CQ_EVENTFD_DISABLED
bit in the
.I flags
field of the CQ ring.
Available since 5.8.
.TP
.B IORING_REGISTER_EVENTFD_ASYNC
This works just like
.BR IORING_REGISTER_EVENTFD ,
except notifications are only posted for events that complete in an async
manner. This means that events that complete inline while being submitted
do not trigger a notification event. The arguments supplied are the same as
for
.BR IORING_REGISTER_EVENTFD .
Available since 5.6.
.TP
.B IORING_UNREGISTER_EVENTFD
Unregister an eventfd file descriptor to stop notifications. Since only one
eventfd descriptor is currently supported, this operation takes no argument,
and
.I arg
must be passed as NULL and
.I nr_args
must be zero. Available since 5.2.
.TP
.B IORING_REGISTER_PROBE
This operation returns a structure, io_uring_probe, which contains information
about the opcodes supported by io_uring on the running kernel.
.I arg
must contain a pointer to a struct io_uring_probe, and
.I nr_args
must contain the size of the ops array in that probe struct. The ops array
is of the type io_uring_probe_op, which holds the value of the opcode and
a flags field. If the flags field has
.B IO_URING_OP_SUPPORTED
set, then this opcode is supported on the running kernel. Available since 5.6.
.TP
.B IORING_REGISTER_PERSONALITY
This operation registers credentials of the running application with io_uring,
and returns an id associated with these credentials. Applications wishing to
share a ring between separate users/processes can pass in this credential id
in the sqe
.B personality
field. If set, that particular sqe will be issued with these credentials. Must
be invoked with
.I arg
set to NULL and
.I nr_args
set to zero. Available since 5.6.
.TP
.B IORING_UNREGISTER_PERSONALITY
This operation unregisters a previously registered personality with io_uring.
.I nr_args
must be set to the id in question, and
.I arg
must be set to NULL. Available since 5.6.
.TP
.B IORING_REGISTER_ENABLE_RINGS
This operation enables an io_uring ring started in a disabled state
.RB ( IORING_SETUP_R_DISABLED
was specified in the call to
.BR io_uring_setup (2)).
While the io_uring ring is disabled, submissions are not allowed and
registrations are not restricted.
After the execution of this operation, the io_uring ring is enabled:
submissions and registration are allowed, but they will
be validated following the registered restrictions (if any).
This operation takes no argument, must be invoked with
.I arg
set to NULL and
.I nr_args
set to zero. Available since 5.10.
.TP
.B IORING_REGISTER_RESTRICTIONS
.I arg
points to a
.I struct io_uring_restriction
array of
.I nr_args
entries.
With an entry it is possible to allow an
.BR io_uring_register (2)
.IR opcode ,
or specify which
.I opcode
and
.I flags
of the submission queue entry are allowed,
or require certain
.I flags
to be specified (these flags must be set on each submission queue entry).
All the restrictions must be submitted with a single
.BR io_uring_register (2)
call and they are handled as an allowlist (opcodes and flags not registered,
are not allowed).
Restrictions can be registered only if the io_uring ring started in a disabled
state
.RB ( IORING_SETUP_R_DISABLED
must be specified in the call to
.BR io_uring_setup (2)).
Available since 5.10.
.TP
.B IORING_REGISTER_IOWQ_AFF
By default, async workers created by io_uring will inherit the CPU mask of its
parent. This is usually all the CPUs in the system, unless the parent is being
run with a limited set. If this isn't the desired outcome, the application
may explicitly tell io_uring what CPUs the async workers may run on.
.I arg
must point to a
.B cpu_set_t
mask, and
.I nr_args
the byte size of that mask.
Available since 5.14.
.TP
.B IORING_UNREGISTER_IOWQ_AFF
Undoes a CPU mask previously set with
.BR IORING_REGISTER_IOWQ_AFF .
Must not have
.I arg
or
.I nr_args
set.
Available since 5.14.
.TP
.B IORING_REGISTER_IOWQ_MAX_WORKERS
By default, io_uring limits the unbounded workers created to the maximum
processor count set by
.I RLIMIT_NPROC
and the bounded workers is a function of the SQ ring size and the number
of CPUs in the system. Sometimes this can be excessive (or too little, for
bounded), and this command provides a way to change the count per ring (per NUMA
node) instead.
.I arg
must be set to an
.I unsigned int
pointer to an array of two values, with the values in the array being set to
the maximum count of workers per NUMA node. Index 0 holds the bounded worker
count, and index 1 holds the unbounded worker count. On successful return, the
passed in array will contain the previous maximum values for each type. If the
count being passed in is 0, then this command returns the current maximum values
and doesn't modify the current setting.
.I nr_args
must be set to 2, as the command takes two values.
Available since 5.15.
.TP
.B IORING_REGISTER_RING_FDS
Whenever
.BR io_uring_enter (2)
is called to submit request or wait for completions, the kernel must grab a
reference to the file descriptor. If the application using io_uring is threaded,
the file table is marked as shared, and the reference grab and put of the file
descriptor count is more expensive than it is for a non-threaded application.
Similarly to how io_uring allows registration of files, this allow registration
of the ring file descriptor itself. This reduces the overhead of the
.BR io_uring_enter (2)
system call.
.I arg
must be set to a pointer to an array of type
.I struct io_uring_rsrc_update
of
.I nr_args
number of entries. The
.B data
field of this struct must contain an io_uring file descriptor, and the
.B offset
field can be either
.B -1
or an explicit offset desired for the registered file descriptor value. If
.B -1
is used, then upon successful return of this system call, the field will
contain the value of the registered file descriptor to be used for future
.BR io_uring_enter (2)
system calls.
On successful completion of this request, the returned descriptors may be used
instead of the real file descriptor for
.BR io_uring_enter (2),
provided that
.B IORING_ENTER_REGISTERED_RING
is set in the
.I flags
for the system call. This flag tells the kernel that a registered descriptor
is used rather than a real file descriptor.
Each thread or process using a ring must register the file descriptor directly
by issuing this request.
The maximum number of supported registered ring descriptors is currently
limited to
.B 16.
Available since 5.18.
.TP
.B IORING_UNREGISTER_RING_FDS
Unregister descriptors previously registered with
.BR IORING_REGISTER_RING_FDS .
.I arg
must be set to a pointer to an array of type
.I struct io_uring_rsrc_update
of
.I nr_args
number of entries. Only the
.B offset
field should be set in the structure, containing the registered file descriptor
offset previously returned from
.B IORING_REGISTER_RING_FDS
that the application wishes to unregister.
Note that this isn't done automatically on ring exit, if the thread or task
that previously registered a ring file descriptor isn't exiting. It is
recommended to manually unregister any previously registered ring descriptors
if the ring is closed and the task persists. This will free up a registration
slot, making it available for future use.
Available since 5.18.
.TP
.B IORING_REGISTER_PBUF_RING
Registers a shared buffer ring to be used with provided buffers. This is a
newer alternative to using
.B IORING_OP_PROVIDE_BUFFERS
which is more efficient, to be used with request types that support the
.B IOSQE_BUFFER_SELECT
flag.
The
.I arg
argument must be filled in with the appropriate information. It looks as
follows:
.PP
.in +12n
.EX
struct io_uring_buf_reg {
__u64 ring_addr;
__u32 ring_entries;
__u16 bgid;
__u16 pad;
__u64 resv[3];
};
.EE
.in
.PP
.in +8n
The
.I ring_addr
field must contain the address to the memory allocated to fit this ring.
The memory must be page aligned and hence allocated appropriately using eg
.BR posix_memalign (3)
or similar. The size of the ring is the product of
.I ring_entries
and the size of
.IR "struct io_uring_buf" .
.I ring_entries
is the desired size of the ring, and must be a power-of-2 in size. The maximum
size allowed is 2^15 (32768).
.I bgid
is the buffer group ID associated with this ring. SQEs that select a buffer
have a buffer group associated with them in their
.I buf_group
field, and the associated CQEs will have
.B IORING_CQE_F_BUFFER
set in their
.I flags
member, which will also contain the specific ID of the buffer selected. The rest
of the fields are reserved and must be cleared to zero.
.I nr_args
must be set to 1.
Also see
.BR io_uring_register_buf_ring (3)
for more details. Available since 5.19.
.TP
.B IORING_UNREGISTER_PBUF_RING
Unregister a previously registered provided buffer ring.
.I arg
must be set to the address of a struct io_uring_buf_reg, with just the
.I bgid
field set to the buffer group ID of the previously registered provided buffer
group.
.I nr_args
must be set to 1. Also see
.BR IORING_REGISTER_PBUF_RING .
Available since 5.19.
.TP
.B IORING_REGISTER_SYNC_CANCEL
Performs a synchronous cancelation request, which works in a similar fashion to
.B IORING_OP_ASYNC_CANCEL
except it completes inline. This can be useful for scenarios where cancelations
should happen synchronously, rather than needing to issue an SQE and wait for
completion of that specific CQE.
.I arg
must be set to a pointer to a struct io_uring_sync_cancel_reg structure, with
the details filled in for what request(s) to target for cancelation. See
.BR io_uring_register_sync_cancel (3)
for details on that. The return values are the same, except they are passed
back synchronously rather than through the CQE
.I res
field.
.I nr_args
must be set to 1.
Available since 6.0.
.TP
.B IORING_REGISTER_FILE_ALLOC_RANGE
sets the allowable range for fixed file index allocations within the
kernel. When requests that can instantiate a new fixed file are used with
.BR IORING_FILE_INDEX_ALLOC ,
the application is asking the kernel to allocate a new fixed file descriptor
rather than pass in a specific value for one. By default, the kernel will
pick any available fixed file descriptor within the range available.
This effectively allows the application to set aside a range just for dynamic
allocations, with the remainder being used for specific values.
.I nr_args
must be set to 1 and
.I arg
must be set to a pointer to a struct io_uring_file_index_range:
.PP
.in +12n
.EX
struct io_uring_file_index_range {
__u32 off;
__u32 len;
__u64 resv;
};
.EE
.in
.PP
.in +8n
with
.I off
being set to the starting value for the range, and
.I len
being set to the number of descriptors. The reserved
.I resv
field must be cleared to zero.
The application must have registered a file table first.
Available since 6.0.
.TP
.B IORING_REGISTER_PBUF_STATUS
Can be used to retrieve the current head of a ringbuffer provided earlier via
.BR IORING_REGISTER_PBUF_RING .
.I arg
must point to a
.PP
.in +12
.EX
struct io_uring_buf_status {
__u32 buf_group; /* input */
__u32 head; /* output */
__u32 resv[8];
};
.EE
.in
.PP
.in +8
of which
.I arg->buf_group
should contain the buffer group ID for the buffer ring in question,
.I nr_args
should be set to 1 and
.I arg->resv
should be zeroed out.
The current head of the ringbuffer will be returned in
.IR arg->head .
Available since 6.8.
.TP
.B IORING_REGISTER_NAPI
Registers a napi instance with the io_uring instance of
.IR fd .
.I arg
should point to a
.PP
.in +12
.EX
struct io_uring_napi {
__u32 busy_poll_to;
__u8 prefer_busy_poll;
__u8 pad[3];
__u64 resv;
};
.EE
.in
.PP
.in +8
in which
.I arg->busy_poll_to
should contain the busy poll timeout in micro seconds and
.I arg->prefer_busy_poll
should specify whether busy polling should be used rather than IRQs.
.I nr_args
should be set to 1 and
.I arg->pad
and
.I arg->resv
should be zeroed out.
On successful return the
.I io_uring_napi
struct pointed to by
.I arg
will contain the previously used settings.
Available since 6.9.
.TP
.B IORING_UNREGISTER_NAPI
Unregisters a napi instance previously registered via
.B IORING_REGISTER_NAPI
to the io_uring instance of
.IR fd .
.I arg
should point to a
.I struct
.IR io_uring_napi .
On successful return the
.I io_uring_napi
struct pointed to by
.I arg
will contain the previously used settings.
Available since 6.9.
.TP
.B IORING_REGISTER_CLOCK
Specifies which clock id io_uring will use for timers while waiting for
completion events with
.BR IORING_ENTER_GETEVENTS .
It's only effective if the timeout argument in
.I struct io_uring_getevents_arg
is passed, ignored otherwise.
When used in conjunction with
.BR IORING_ENTER_ABS_TIMER ,
interprets the timeout argument as absolute time of the specified clock.
The default clock is
.BR CLOCK_MONOTONIC .
Available since 6.12 and supports
.B CLOCK_MONOTONIC
and
.BR CLOCK_BOOTTIME .
.TP
.B IORING_REGISTER_CLONE_BUFFERS
Supports cloning buffers from a source ring to a destination ring, duplicating
previously registered buffers from source to destination.
.IR arg
must be set to a pointer to a
.I struct io_uring_clone_buffers
and
.IR nr_args
must be set to
.B 1 .
.I struct io_uring_buf_reg
looks as follows:
.PP
.in +12n
.EX
struct io_uring_clone_buffers {
__u32 src_fd;
__u32 flags;
__u32 src_off;
__u32 dst_off;
__u32 nr;
__u32 pad[3];
};
.EE
.in
.TP
.PP
where
.IR src_fd
indicates the fd of the source ring,
.IR flags
are modifier flags for the operation,
.IR src_off
indicates the offset from where to start the cloning from the source ring,
.IR dst_off
indicates the offset from where to start the cloning into the destination ring,
and
.IR nr
indicates the number of buffers to clone at the given offsets.
.IR pad
must be zero filled.
Kernel 6.12 added support for full range cloning, where
.IR src_off ,
.IR dst_off ,
and
.IR nr
must all be set to 0, indicating cloning of the entire table in source to
destination. Kernel 6.13 added support for specifying the offsets and
how many buffers to clone. Additionally, it added support for cloning into
a previously registered table in the destination as well, 6.12 would fail
that operation with
.B -EBUSY
if attempted. To replace existing nodes, or clone into an existing table,
.B IORING_REGISTER_DST_REPLACE
must be set in the
.IR flags
member.
.TP
.B IORING_REGISTER_SEND_MSG_RING
Supports sending of the equivalent of a
.B IORING_OP_MSG_RING
request, but without having a source ring available. Takes a pointer to a
.IR struct io_uring_sqe
which must be prepared with
.BR io_uring_prep_msg_ring (3)
before being submitted. Only supports
.B IORING_MSG_DATA
type of requests. Available since kernel 6.13.
.TP
.B IORING_REGISTER_RESIZE_RINGS
Supports resizing the SQ and CQ rings. Takes a pointer to a
.IR struct io_uring_params
as the argument, where
.IR sq_entries
and
.IR cq_entries
may be set to the desired values. Only supports a limited set of flags set
in the
.IR struct io_uring_params
argument, notably
.B IORING_SETUP_CQSIZE
and
.B IORING_SETUP_CLAMP
to modify the CQ ring sizing. See
.BR io_uring_resize_rings (3)
for details. Note that while liburing takes care of the ring unmap and mapping
for a resize operation, manual users of this register syscall must perform
those operations, similarly to when a new ring is created. The
.IR struct io_uring_params
structure will get the necessary offsets copied back upon successful completion
of this system call, which can be used to memory map the ring just like how
a new ring would've been mapped. Available since kernel 6.13.
.TP
.B IORING_REGISTER_MEM_REGION
Supports registering multiple purposes memory regions, avoiding unnecessary
copying in of
.IR struct io_uring_getevents_arg
for wait operations that specify a timeout or minimum timeout. Takes a pointer
to a
.IR struct io_uring_mem_region_reg
structure, which looks as follows:
.PP
.in +12n
.EX
struct io_uring_mem_region_reg {
__u64 region_uptr;
__u64 flags;
__u64 __resv[2];
};
.EE
.in
.TP
.PP
where
.IR region_uptr
must be set to the region being registered as memory regions,
.IR flags
specifies modifier flags (must currently be
.B IORING_MEM_REGION_REG_WAIT_ARG ). The pad fields must all be cleared to
.B 0 .
Each memory regions looks as follows:
.PP
.in +12n
.EX
struct io_uring_region_desc {
__u64 user_addr;
__u64 size;
__u32 flags;
__u32 id;
__u64 mmap_offset;
__u64 __resv[4];
};
.EE
.in
.TP
.PP
where
.IR user_addr
points to userspace memory mappings,
.IR size
is the size of userspace memory. Current supported userspace memory regions
looks as follows:
.PP
.in +12n
.EX
struct io_uring_reg_wait {
struct __kernel_timespec ts;
__u32 min_wait_usec;
__u32 flags;
__u64 sigmask;
__u32 sigmask_sz;
__u32 pad[3];
__u64 pad2[2];
};
.EE
.in
.TP
.PP
where
.IR ts
holds the timeout information for this region
.IR flags
holds information about the timeout region,
.IR sigmask
is a pointer to a signal mask, if used, and
.IR sigmask_sz
is the size of that signal mask. The pad fields must all be cleared to
.B 0 .
Currently the only valid flag is
.B IORING_REG_WAIT_TS ,
which, if set, says that the values in
.IR ts
are valid and should be used for a timeout operation. The
.IR user_addr
field of
.IR struct io_uring_region_desc
must be set to an address of
.IR struct io_uring_reg_wait
members, an up to a page size can be mapped. At the size of 64 bytes per
region, that allows at least 64 individual regions on a 4k page size system.
The offsets of these regions are used for an
.BR io_uring_enter (2)
system call, with the first one being 0, second one 1, and so forth. After
registration of the wait regions,
.BR io_uring_enter (2)
may be used with the enter flag of
.B IORING_ENTER_EXT_ARG_REG and an
.IR argp
set to the wait region offset, rather than a pointer to a
.IR struct io_uring_getevent_arg
structure. If used with
.B IORING_ENTER_GETEVENTS ,
then the wait operation will use the information in the registered wait
region rather than needing a io_uring_getevent_arg structure copied for each
operation. For high frequency waits, this can save considerable CPU cycles.
Note: once a region has been registered, it cannot get unregistered. It lives
for the life of the ring. Individual wait region offset may be modified before
any
.BR io_uring_enter (2)
system call. Available since kernel 6.13.
.SH RETURN VALUE
On success,
.BR io_uring_register (2)
returns either 0 or a positive value, depending on the
.I opcode
used. On error, a negative error value is returned. The caller should not rely
on the
.I errno
variable.
.SH ERRORS
.TP
.B EACCES
The
.I opcode
field is not allowed due to registered restrictions.
.TP
.B EBADF
One or more fds in the
.I fd
array are invalid.
.TP
.B EBADFD
.B IORING_REGISTER_ENABLE_RINGS
or
.B IORING_REGISTER_RESTRICTIONS
was specified, but the io_uring ring is not disabled.
.TP
.B EBUSY
.B IORING_REGISTER_BUFFERS
or
.B IORING_REGISTER_FILES
or
.B IORING_REGISTER_RESTRICTIONS
was specified, but there were already buffers, files, or restrictions
registered.
.TP
.B EEXIST
The thread performing the registration is invalid.
.TP
.B EFAULT
buffer is outside of the process' accessible address space, or
.I iov_len
is greater than 1GiB.
.TP
.B EINVAL
.B IORING_REGISTER_BUFFERS
or
.B IORING_REGISTER_FILES
was specified, but
.I nr_args
is 0.
.TP
.B EINVAL
.B IORING_REGISTER_BUFFERS
was specified, but
.I nr_args
exceeds
.B UIO_MAXIOV
.TP
.B EINVAL
.B IORING_UNREGISTER_BUFFERS
or
.B IORING_UNREGISTER_FILES
was specified, and
.I nr_args
is non-zero or
.I arg
is non-NULL.
.TP
.B EINVAL
.B IORING_REGISTER_RESTRICTIONS
was specified, but
.I nr_args
exceeds the maximum allowed number of restrictions or restriction
.I opcode
is invalid.
.TP
.B EMFILE
.B IORING_REGISTER_FILES
was specified and
.I nr_args
exceeds the maximum allowed number of files in a fixed file set.
.TP
.B EMFILE
.B IORING_REGISTER_FILES
was specified and adding
.I nr_args
file references would exceed the maximum allowed number of files the user
is allowed to have according to the
.B RLIMIT_NOFILE
resource limit and the caller does not have
.B CAP_SYS_RESOURCE
capability. Note that this is a per user limit, not per process.
.TP
.B ENOMEM
Insufficient kernel resources are available, or the caller had a
non-zero
.B RLIMIT_MEMLOCK
soft resource limit, but tried to lock more memory than the limit
permitted. This limit is not enforced if the process is privileged
.RB ( CAP_IPC_LOCK ).
.TP
.B ENXIO
.B IORING_UNREGISTER_BUFFERS
or
.B IORING_UNREGISTER_FILES
was specified, but there were no buffers or files registered.
.TP
.B ENXIO
Attempt to register files or buffers on an io_uring instance that is already
undergoing file or buffer registration, or is being torn down.
.TP
.B EOPNOTSUPP
User buffers point to file-backed memory.
.TP
.B EFAULT
User buffers point to file-backed memory (newer kernels).
.TP
.B ENOENT
.B IORING_REGISTER_PBUF_STATUS
was specified, but
.I buf_group
did not refer to a currently valid buffer group.
.TP
.B EINVAL
.B IORING_REGISTER_PBUF_STATUS
was specified, but the valid buffer group specified by
.I buf_group
did not refer to a buffer group registered via
.BR IORING_REGISTER_PBUF_RING .
.TP
.B EINVAL
.B IORING_REGISTER_NAPI
was specified, but the ring associated with
.I fd
has not been created with
.BR IORING_SETUP_IOPOLL .
|