<?xml version='1.0'?> <!--*-nxml-*-->
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<!-- SPDX-License-Identifier: LGPL-2.1-or-later -->
<refentry id="systemd-nsresourced.service" conditional='ENABLE_NSRESOURCED'>
<refentryinfo>
<title>systemd-nsresourced.service</title>
<productname>systemd</productname>
</refentryinfo>
<refmeta>
<refentrytitle>systemd-nsresourced.service</refentrytitle>
<manvolnum>8</manvolnum>
</refmeta>
<refnamediv>
<refname>systemd-nsresourced.service</refname>
<refname>systemd-nsresourced</refname>
<refpurpose>User Namespace Resource Delegation Service</refpurpose>
</refnamediv>
<refsynopsisdiv>
<para><filename>systemd-nsresourced.service</filename></para>
<para><filename>/usr/lib/systemd/systemd-nsresourced</filename></para>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para><command>systemd-nsresourced</command> is a system service that permits transient delegation of a
UID/GID range to a user namespace (see <citerefentry
project='man-pages'><refentrytitle>user_namespaces</refentrytitle><manvolnum>7</manvolnum></citerefentry>)
allocated by a client, via a Varlink IPC API.</para>
<para>Unprivileged clients may allocate a user namespace and then request that a UID/GID range be assigned
to it via this service. The user namespace may then be used to run containers and other sandboxes, and/or
be applied to an ID-mapped mount.</para>
<para>Allocations of UIDs/GIDs made this way are transient: when a user namespace goes away, its UID/GID
range is returned to the pool of available ranges. To ensure that clients cannot gain persistency in
their transient UID/GID range, a BPF-LSM based policy is enforced: user namespaces set up this way can
only write to file systems they allocate themselves or that are explicitly allowlisted via
<command>systemd-nsresourced</command>.</para>
<para><command>systemd-nsresourced</command> automatically ensures that any registered UID ranges show up
in the system's NSS database via the <ulink url="https://systemd.io/USER_GROUP_API">User/Group Record
Lookup API via Varlink</ulink>.</para>
<para>Currently, only UID/GID ranges consisting of either exactly 1 or exactly 65536 UIDs/GIDs can be
registered with this service. Moreover, UIDs and GIDs are always allocated together, and
symmetrically.</para>
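<para>The size rule above can be sketched as a small client-side check. This is only an illustration
under stated assumptions: the helper names are hypothetical, and the base value passed in is an
arbitrary example, since the actual base of a range is chosen by the service.</para>
<programlisting language="python"># Range sizes the service accepts, as stated in the paragraph above.
ALLOWED_RANGE_SIZES = (1, 65536)


def is_valid_range_size(size):
    """Return True if a range of this many UIDs/GIDs could be registered."""
    return size in ALLOWED_RANGE_SIZES


def describe_allocation(base, size):
    """Return the inclusive UID and GID ranges for an allocation.

    UIDs and GIDs are allocated together and symmetrically, so both
    ranges are identical. 'base' is a hypothetical example value.
    """
    if not is_valid_range_size(size):
        raise ValueError("range size must be 1 or 65536")
    bounds = (base, base + size - 1)
    return {"uid": bounds, "gid": bounds}</programlisting>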
<para>The allocation API supports <emphasis>delegated ranges</emphasis>: additional UID/GID ranges that
are mapped 1:1 into the user namespace rather than being translated to a target UID/GID. These delegated
ranges enable nested user namespace scenarios where a container needs to create child user namespaces
with their own transient UID ranges. Normally, the kernel restricts which UIDs can be mapped into a user
namespace to those that are also mapped in the parent. Delegated ranges solve this by pre-allocating
additional ranges that are visible inside the user namespace and can be used by nested
<function>AllocateUserRange()</function> calls. Up to 16 delegated ranges can be requested per user
namespace, each of size 65536. The ranges are allocated from the container UID ranges as per
<ulink url="https://systemd.io/UIDS-GIDS">Users, Groups, UIDs and GIDs on systemd Systems</ulink>.</para>
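<para>As a rough sketch of what a 1:1 mapping means, the following Python snippet formats delegated
ranges in the <filename>uid_map</filename> line format described in
<citerefentry project='man-pages'><refentrytitle>user_namespaces</refentrytitle><manvolnum>7</manvolnum></citerefentry>
(ID inside the namespace, ID outside the namespace, length). The helper name and the range start values
are hypothetical; the actual ranges are picked by the service, and this is not necessarily how the
service writes the map.</para>
<programlisting language="python"># Limits stated in the paragraph above.
MAX_DELEGATED_RANGES = 16
DELEGATED_RANGE_SIZE = 65536


def delegated_map_lines(range_starts):
    """Format 1:1 map lines: the inside and outside IDs are the same."""
    if len(range_starts) > MAX_DELEGATED_RANGES:
        raise ValueError("at most 16 delegated ranges per user namespace")
    return [
        f"{start} {start} {DELEGATED_RANGE_SIZE}"
        for start in range_starts
    ]</programlisting>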
<para>The allocation API also supports <emphasis>identity mappings</emphasis>: instead of allocating a
transient UID/GID range, the user namespace can be configured to map the caller's UID/GID to root (UID
0) inside the namespace, or to itself. Identity mappings can be combined with delegated ranges: the
client first enters a privileged user namespace in which the container is set up, and the container then
runs in one of the delegated ranges. Unlike transient ranges, identity-mapped users are not subject to
the BPF-LSM write restrictions.</para>
<para>Additionally, the allocation API supports mapping the <emphasis>foreign UID range</emphasis> into
the user namespace. When this option is enabled, the foreign UID range is mapped 1:1 into the user
namespace, allowing processes inside to access and manipulate files owned by the foreign UID range.</para>
<para>The service provides API calls to allowlist mounts (referenced via their mount file descriptors as
per the Linux <function>fsmount()</function> API), to pass ownership of a cgroup subtree to the user
namespace, and to delegate a virtual Ethernet device pair to the user namespace. Used in combination,
these calls are sufficient to implement fully unprivileged container environments, as implemented by
<citerefentry><refentrytitle>systemd-nspawn</refentrytitle><manvolnum>1</manvolnum></citerefentry>, fully
unprivileged <varname>RootImage=</varname> (see
<citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry>) or
fully unprivileged disk image tools such as
<citerefentry><refentrytitle>systemd-dissect</refentrytitle><manvolnum>1</manvolnum></citerefentry>.</para>
<para>This service provides one <ulink url="https://varlink.org/">Varlink</ulink> service:
<constant>io.systemd.NamespaceResource</constant> allows registering user namespaces and assigning
mounts, cgroups and network interfaces to them.</para>
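<para>To illustrate the wire format, the following Python sketch serializes a Varlink method call as a
JSON object followed by a NUL byte, which is how Varlink frames messages. The interface name and
<function>AllocateUserRange()</function> come from this page; the parameter names
<varname>name</varname>, <varname>size</varname> and <varname>userNamespaceFileDescriptor</varname> are
assumptions made for illustration and should be checked against the interface definition. The sketch
only builds the message; it does not open a socket or pass the user namespace file descriptor.</para>
<programlisting language="python">import json

# Interface name as given in the text above.
INTERFACE = "io.systemd.NamespaceResource"


def encode_varlink_call(method, parameters):
    """Serialize a Varlink call: one JSON object terminated by a NUL byte."""
    message = {"method": f"{INTERFACE}.{method}", "parameters": parameters}
    return json.dumps(message).encode("utf-8") + b"\0"


# The parameter names below are assumptions, not confirmed by this page.
request = encode_varlink_call(
    "AllocateUserRange",
    {"name": "demo", "size": 65536, "userNamespaceFileDescriptor": 0},
)</programlisting>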
</refsect1>
<refsect1>
<title>See Also</title>
<para><simplelist type="inline">
<member><citerefentry><refentrytitle>systemd</refentrytitle><manvolnum>1</manvolnum></citerefentry></member>
<member><citerefentry><refentrytitle>systemd-mountfsd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry></member>
<member><citerefentry><refentrytitle>systemd-nspawn</refentrytitle><manvolnum>1</manvolnum></citerefentry></member>
<member><citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry></member>
<member><citerefentry><refentrytitle>systemd-dissect</refentrytitle><manvolnum>1</manvolnum></citerefentry></member>
<member><citerefentry project='man-pages'><refentrytitle>user_namespaces</refentrytitle><manvolnum>7</manvolnum></citerefentry></member>
</simplelist></para>
</refsect1>
</refentry>