Package: runc / 1.0.0~rc93+ds1-5+deb11u5

Metadata

Package | Version | Patches format
runc | 1.0.0~rc93+ds1-5+deb11u5 | 3.0 (quilt)

Patch series

Patch | File delta | Description
0001 skip test hugetlb_test.go random failures on ppc64el.patch | (download)

libcontainer/cgroups/fs/hugetlb_test.go | 4 4 + 0 - 0 !
1 file changed, 4 insertions(+)

 skip test: hugetlb_test.go, random failures on ppc64el, s390x

0002 skip privileged test TestFactoryNewTmpfs.patch | (download)

libcontainer/factory_linux_test.go | 1 1 + 0 - 0 !
1 file changed, 1 insertion(+)

 skip privileged test: testfactorynewtmpfs

0003 fix gccgo.patch | (download)

libcontainer/stacktrace/capture.go | 21 12 + 9 - 0 !
libcontainer/stacktrace/capture_test.go | 4 2 + 2 - 0 !
libcontainer/stacktrace/frame.go | 15 5 + 10 - 0 !
3 files changed, 19 insertions(+), 21 deletions(-)

 fix gccgo

0004 skip privileged test nsenter_test.go.patch | (download)

libcontainer/nsenter/nsenter_test.go | 2 2 + 0 - 0 !
1 file changed, 2 insertions(+)

 skip privileged test: nsenter_test.go


0005 skip privileged test fs_test.go.patch | (download)

libcontainer/cgroups/fs/fs_test.go | 2 1 + 1 - 0 !
1 file changed, 1 insertion(+), 1 deletion(-)

 skip privileged test: fs_test.go


0006 skip privileged test fscommon_test.go.patch | (download)

libcontainer/cgroups/fscommon/fscommon_test.go | 2 1 + 1 - 0 !
1 file changed, 1 insertion(+), 1 deletion(-)

 skip privileged test: fscommon_test.go


0007 skip test cgroups_test.go fail when cgroups is not m.patch | (download)

libcontainer/cgroups/cgroups_test.go | 2 1 + 1 - 0 !
1 file changed, 1 insertion(+), 1 deletion(-)

 skip test: cgroups_test.go, fail when cgroups is not mounted


0008 fix patchpbf test on 32 bit.patch | (download)

libcontainer/seccomp/patchbpf/enosys_linux_test.go | 17 10 + 7 - 0 !
1 file changed, 10 insertions(+), 7 deletions(-)

 fix patchbpf test on 32-bit

0009 skip integration when no dev kmsg.patch | (download)

tests/integration/dev.bats | 4 4 + 0 - 0 !
1 file changed, 4 insertions(+)

 skip integration when no /dev/kmsg

By default, a privileged LXC container doesn't have /dev/kmsg.

0010 Ensure the seccomp pipe is being read while exportin.patch | (download)

libcontainer/seccomp/patchbpf/enosys_linux.go | 15 14 + 1 - 0 !
libcontainer/seccomp/patchbpf/enosys_linux_test.go | 20 20 + 0 - 0 !
2 files changed, 34 insertions(+), 1 deletion(-)

 ensure the seccomp pipe is being read while exporting bpf

CVE 2021 30465/rc93 0001 libct newInitConfig nit.patch | (download)

libcontainer/container_linux.go | 7 4 + 3 - 0 !
1 file changed, 4 insertions(+), 3 deletions(-)

 [patch 1/5] libct/newinitconfig: nit

Move the initialization of Console* fields as they are unconditional.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

CVE 2021 30465/rc93 0002 libct rootfs introduce and use mountConfig.patch | (download)

libcontainer/rootfs_linux.go | 42 26 + 16 - 0 !
1 file changed, 26 insertions(+), 16 deletions(-)

 [patch 2/5] libct/rootfs: introduce and use mountconfig

The code is already passing three parameters around from
mountToRootfs to mountCgroupV* to mountToRootfs again.

I am about to add another parameter, so let's introduce and
use struct mountConfig to pass around.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

CVE 2021 30465/rc93 0003 libct rootfs mountCgroupV2 minor refactor.patch | (download)

libcontainer/rootfs_linux.go | 10 6 + 4 - 0 !
1 file changed, 6 insertions(+), 4 deletions(-)

 [patch 3/5] libct/rootfs/mountcgroupv2: minor refactor

1. s/cgroupPath/dest/

2. don't hardcode /sys/fs/cgroup

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

CVE 2021 30465/rc93 0004 Fix cgroup2 mount for rootless case.patch | (download)

libcontainer/container_linux.go | 3 3 + 0 - 0 !
libcontainer/init_linux.go | 1 1 + 0 - 0 !
libcontainer/rootfs_linux.go | 28 21 + 7 - 0 !
libcontainer/specconv/example.go | 18 9 + 9 - 0 !
4 files changed, 34 insertions(+), 16 deletions(-)

 [patch 4/5] fix cgroup2 mount for rootless case

In case of rootless, cgroup2 mount is not possible (see [1] for more
details), so since commit 9c81440fb5a7 runc bind-mounts the whole
/sys/fs/cgroup into container.

Problem is, if cgroupns is enabled, /sys/fs/cgroup inside the container
is supposed to show the cgroup files for this cgroup, not the root one.

The fix is to pass through and use the cgroup path in case cgroup2
mount failed, cgroupns is enabled, and the path is non-empty.

Surely this requires the /sys/fs/cgroup mount in the spec, so modify
runc spec --rootless to keep it.

Before:

	$ ./runc run aaa
	# find /sys/fs/cgroup/ -type d
	/sys/fs/cgroup
	/sys/fs/cgroup/user.slice
	/sys/fs/cgroup/user.slice/user-1000.slice
	/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service
	...
	# ls -l /sys/fs/cgroup/cgroup.controllers
	-r--r--r--    1 nobody   nogroup          0 Feb 24 02:22 /sys/fs/cgroup/cgroup.controllers
	# wc -w /sys/fs/cgroup/cgroup.procs
	142 /sys/fs/cgroup/cgroup.procs
	# cat /sys/fs/cgroup/memory.current
	cat: can't open '/sys/fs/cgroup/memory.current': No such file or directory

After:

	# find /sys/fs/cgroup/ -type d
	/sys/fs/cgroup/
	# ls -l /sys/fs/cgroup/cgroup.controllers
	-r--r--r--    1 root     root             0 Feb 24 02:43 /sys/fs/cgroup/cgroup.controllers
	# wc -w /sys/fs/cgroup/cgroup.procs
	2 /sys/fs/cgroup/cgroup.procs
	# cat /sys/fs/cgroup/memory.current
	577536

[1] https://github.com/opencontainers/runc/issues/2158

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

CVE 2021 30465/rc93 0005 rootfs add mount destination validation.patch | (download)

libcontainer/container_linux.go | 1 0 + 1 - 0 !
libcontainer/rootfs_linux.go | 251 124 + 127 - 0 !
libcontainer/utils/utils.go | 54 54 + 0 - 0 !
libcontainer/utils/utils_test.go | 35 35 + 0 - 0 !
4 files changed, 213 insertions(+), 128 deletions(-)

 [patch 5/5] rootfs: add mount destination validation

Because the target of a mount is inside a container (which may be a
volume that is shared with another container), there exists a race
condition where the target of the mount may change to a path containing
a symlink after we have sanitised the path -- resulting in us
inadvertently mounting the path outside of the container.

This is not immediately useful because we are in a mount namespace with
MS_SLAVE mount propagation applied to "/", so we cannot mount on top of
host paths in the host namespace. However, if any subsequent mountpoints
in the configuration use a subdirectory of that host path as a source,
those subsequent mounts will use an attacker-controlled source path
(resolved within the host rootfs) -- allowing the bind-mounting of "/"
into the container.

While arguably configuration issues like this are not entirely within
runc's threat model, within the context of Kubernetes (and possibly
other container managers that provide semi-arbitrary container creation
privileges to untrusted users) this is a legitimate issue. Since we
cannot block mounting from the host into the container, we need to block
the first stage of this attack (mounting onto a path outside the
container).

The long-term plan to solve this would be to migrate to libpathrs, but
as a stop-gap we implement libpathrs-like path verification through
readlink(/proc/self/fd/$n) and then do mount operations through the
procfd once it's been verified to be inside the container. The target
could move after we've checked it, but if it is inside the container
then we can assume that it is safe for the same reason that libpathrs
operations would be safe.

A slight wrinkle is the "copyup" functionality we provide for tmpfs,
which is the only case where we want to do a mount on the host
filesystem. To facilitate this, I split out the copy-up functionality
entirely so that the logic isn't interspersed with the regular tmpfs
logic. In addition, all dependencies on m.Destination being overwritten
have been removed since that pattern was just begging to be a source of
more mount-target bugs (we do still have to modify m.Destination for
tmpfs-copyup but we only do it temporarily).

Fixes: CVE-2021-30465
Reported-by: Etienne Champetier <champetier.etienne@gmail.com>
Co-authored-by: Noah Meyerhans <nmeyerha@amazon.com>

default_retno.patch | (download)

libcontainer/configs/config.go | 7 4 + 3 - 0 !
libcontainer/seccomp/patchbpf/enosys_linux.go | 5 5 + 0 - 0 !
libcontainer/seccomp/seccomp_linux.go | 2 1 + 1 - 0 !
libcontainer/specconv/spec_linux.go | 1 1 + 0 - 0 !
tests/integration/seccomp.bats | 12 12 + 0 - 0 !
tests/integration/testdata/seccomp_syscall_test2.c | 12 12 + 0 - 0 !
tests/integration/testdata/seccomp_syscall_test2.json | 356 356 + 0 - 0 !
7 files changed, 391 insertions(+), 4 deletions(-)

---
CVE 2022 29162.patch | (download)

exec.go | 1 0 + 1 - 0 !
libcontainer/README.md | 16 0 + 16 - 0 !
libcontainer/integration/exec_test.go | 2 0 + 2 - 0 !
libcontainer/integration/template_test.go | 16 0 + 16 - 0 !
libcontainer/specconv/example.go | 5 0 + 5 - 0 !
5 files changed, 40 deletions(-)

---
CVE 2024 21626/0018 Fix File to Close.patch | (download)

libcontainer/cgroups/fs/fs.go | 1 1 + 0 - 0 !
update.go | 1 1 + 0 - 0 !
2 files changed, 2 insertions(+)

 fix file to close

(This is a cherry-pick of 937ca107c3d22da77eb8e8030f2342253b980980.)

Signed-off-by: hang.jiang <hang.jiang@daocloud.io>
Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

CVE 2024 21626/0019 init verify after chdir that cwd is inside the conta.patch | (download)

libcontainer/init_linux.go | 31 31 + 0 - 0 !
libcontainer/integration/seccomp_test.go | 20 10 + 10 - 0 !
2 files changed, 41 insertions(+), 10 deletions(-)

 init: verify after chdir that cwd is inside the container

If a file descriptor of a directory in the host's mount namespace is
leaked to runc init, a malicious config.json could use /proc/self/fd/...
as a working directory to allow for host filesystem access after the
container runs. This can also be exploited by a container process if it
knows that an administrator will use "runc exec --cwd" and the target
--cwd (the attacker can change that cwd to be a symlink pointing to
/proc/self/fd/... and wait for the process to exec and then snoop on
/proc/$pid/cwd to get access to the host). The former issue can lead to
a critical vulnerability in Docker and Kubernetes, while the latter is a
container breakout.

We can (ab)use the fact that getcwd(2) on Linux detects this exact case,
and getcwd(3) and Go's Getwd() return an error as a result. Thus, if we
just do os.Getwd() after chdir we can easily detect this case and error
out.

In runc 1.1, a /sys/fs/cgroup handle happens to be leaked to "runc
init", making this exploitable. On runc main it just so happens that the
leaked /sys/fs/cgroup gets clobbered and thus this is only consistently
exploitable for runc 1.1.

Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626
Co-developed-by: lifubang <lifubang@acmcoder.com>
Signed-off-by: lifubang <lifubang@acmcoder.com>
[refactored the implementation and added more comments]
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

CVE 2024 21626/0020 setns init do explicit lookup of execve argument ear.patch | (download)

libcontainer/setns_init_linux.go | 14 13 + 1 - 0 !
1 file changed, 13 insertions(+), 1 deletion(-)

 setns init: do explicit lookup of execve argument early

(This is a partial backport of a minor change included in commit
dac41717465462b21fab5b5942fe4cb3f47d7e53.)

This mirrors the logic in standard_init_linux.go, and also ensures that
we do not call exec.LookPath in the final execve step.

While this is okay for regular binaries, it seems exec.LookPath calls
os.Getenv which tries to emit a log entry to the test harness when
running in "go test" mode. In a future patch (in order to fix
CVE-2024-21626), we will close all of the file descriptors immediately
before execve, which would mean the file descriptor for test harness
logging would be closed at execve time. So, moving exec.LookPath earlier
is necessary.

Ref: dac417174654 ("runc-dmz: reduce memfd binary cloning cost with small C binary")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

CVE 2024 21626/0021 init close internal fds before execve.patch | (download)

libcontainer/logs/logs.go | 9 9 + 0 - 0 !
libcontainer/setns_init_linux.go | 20 20 + 0 - 0 !
libcontainer/standard_init_linux.go | 20 20 + 0 - 0 !
libcontainer/utils/utils_unix.go | 72 64 + 8 - 0 !
4 files changed, 113 insertions(+), 8 deletions(-)

 init: close internal fds before execve

If we leak a file descriptor referencing the host filesystem, an
attacker could use a /proc/self/fd magic-link as the source for execve
to execute a host binary in the container. This would allow the binary
itself (or a process inside the container in the 'runc exec' case) to
write to a host binary, leading to a container escape.

The simple solution is to make sure we close all file descriptors
immediately before the execve(2) step. Doing this earlier can lead to very
serious issues in Go (as file descriptors can be reused, any (*os.File)
CVE 2024 21626/0022 cgroup plug leaks of sys fs cgroup handle.patch | (download)

libcontainer/cgroups/fscommon/open.go | 19 10 + 9 - 0 !
1 file changed, 10 insertions(+), 9 deletions(-)

 cgroup: plug leaks of /sys/fs/cgroup handle

We auto-close this file descriptor in the final exec step, but it's
probably a good idea to not possibly leak the file descriptor to "runc
init" (we've had issues like this in the past) especially since it is a
directory handle from the host mount namespace.

In practice, on runc 1.1 this does leak to "runc init" but on main the
handle has a low enough file descriptor that it gets clobbered by the
ForkExec of "runc init".

OPEN_TREE_CLONE would let us protect this handle even further, but the
performance impact of creating an anonymous mount namespace is probably
not worth it.

Also, switch to using an *os.File for the handle so if it goes out of
scope during setup (i.e. an error occurs during setup) it will get
cleaned up by the GC.

Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

CVE 2024 21626/0023 libcontainer mark all non stdio fds O_CLOEXEC before.patch | (download)

libcontainer/container_linux.go | 10 10 + 0 - 0 !
1 file changed, 10 insertions(+)

 libcontainer: mark all non-stdio fds o_cloexec before spawning init

Given the core issue in GHSA-xr7r-f8xq-vfvv was that we were unknowingly
leaking file descriptors to "runc init", it seems prudent to make sure
we proactively prevent this in the future. The solution is to simply
mark all non-stdio file descriptors as O_CLOEXEC before we spawn "runc
init".

For libcontainer library users, this could result in unrelated files
being marked as O_CLOEXEC -- however (for the same reason we are doing
this for runc), for security reasons those files should've been marked
as O_CLOEXEC anyway.

Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

CVE 2024 21626/0024 init don t special case logrus fds.patch | (download)

libcontainer/logs/logs.go | 9 0 + 9 - 0 !
libcontainer/utils/utils_unix.go | 8 0 + 8 - 0 !
2 files changed, 17 deletions(-)

 init: don't special-case logrus fds

We close the logfd before execve so there's no need to special case it.
In addition, it turns out that (*os.File).Fd() doesn't handle the case
where the file was closed and so it seems suspect to use that kind of
check.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

0025 Fix busybox tarball url in integration test.patch | (download)

tests/integration/multi-arch.bash | 4 2 + 2 - 0 !
1 file changed, 2 insertions(+), 2 deletions(-)

 fix busybox tarball url in integration test

https://github.com/opencontainers/runc/blob/main/tests/integration/get-images.sh

CVE 2021 43784.patch | (download)

libcontainer/container_linux.go | 20 19 + 1 - 0 !
libcontainer/message_linux.go | 10 10 + 0 - 0 !
2 files changed, 29 insertions(+), 1 deletion(-)

 fix cve-2021-43784

When writing netlink messages, it is possible to have a byte array
larger than UINT16_MAX which would result in the length field
overflowing and allowing user-controlled data to be parsed as control
characters (such as creating custom mount points, changing which set of
namespaces to allow, and so on).

0027 Fix test for newer kernels.patch | (download)

tests/integration/no_pivot.bats | 4 3 + 1 - 0 !
1 file changed, 3 insertions(+), 1 deletion(-)

 [patch] tests/int/no_pivot: fix for new kernels

The test is failing like this:

	not ok 70 runc run --no-pivot must not expose bare /proc
	# (in test file tests/integration/no_pivot.bats, line 20)
	#   `[[ "$output" == *"mount: permission denied"* ]]' failed
	# runc spec (status=0):
	#
	# runc run --no-pivot test_no_pivot (status=1):
	# unshare: write error: Operation not permitted

Apparently, a recent kernel commit db2e718a47984b9d prevents
root from doing unshare -r unless it has CAP_SETFCAP.

Add the capability for this specific test.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

CVE 2023 25809.patch | (download)

libcontainer/rootfs_linux.go | 53 34 + 19 - 0 !
tests/integration/mounts.bats | 17 17 + 0 - 0 !
2 files changed, 51 insertions(+), 19 deletions(-)

 [patch] rootless: fix /sys/fs/cgroup mounts

It was found that rootless runc makes `/sys/fs/cgroup` writable in the following conditions:

1. when runc is executed inside the user namespace, and the config.json does not specify the cgroup namespace to be unshared
   (e.g., `(docker|podman|nerdctl) run --cgroupns=host`, with Rootless Docker/Podman/nerdctl)
2. or, when runc is executed outside the user namespace, and `/sys` is mounted with `rbind, ro`
   (e.g., `runc spec --rootless`; this condition is very rare)

A container may gain write access to the user-owned cgroup hierarchy `/sys/fs/cgroup/user.slice/...` on the host.
Other users' cgroup hierarchies are not affected.

To fix the issue, this commit does the following:
1. Remount `/sys/fs/cgroup` to apply `MS_RDONLY` when it is being bind-mounted
2. Mask `/sys/fs/cgroup` when the bind source is unavailable

Fix CVE-2023-25809 (GHSA-m8cg-xc2p-r3fc)

Co-authored-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
(cherry picked from commit df4eae457b8ccffa619c659c2def5c777d8ff507)
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>

CVE 2023 27561 and CVE 2023 28642.patch | (download)

libcontainer/rootfs_linux.go | 23 17 + 6 - 0 !
tests/integration/mask.bats | 19 19 + 0 - 0 !
2 files changed, 36 insertions(+), 6 deletions(-)

 [patch] prohibit /proc and /sys to be symlinks

Commit 3291d66b9844 introduced a check for /proc and /sys, making sure
the destination (dest) is a directory (and not e.g. a symlink).

Later, a hunk from commit 0ca91f44f switched from using filepath.Join
to SecureJoin for dest. As SecureJoin follows and resolves symlinks,
the check whether dest is a symlink no longer works.

To fix, do the check without/before using SecureJoin.

Add integration tests to make sure we won't regress.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 0d72adf96dda1b687815bf89bb245b937a2f603c)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

This patch fixes both CVE-2023-27561 and CVE-2023-28642.