Package: runc / 1.0.0~rc93+ds1-5+deb11u5

Metadata

Package	Version	Patches format
runc	1.0.0~rc93+ds1-5+deb11u5	3.0 (quilt)

Patch series

Patch	File delta	Description
0001 skip test hugetlb_test.go random failures on ppc64el.patch \| (download)	libcontainer/cgroups/fs/hugetlb_test.go \| 4 4 + 0 - 0 ! 1 file changed, 4 insertions(+)	skip test: hugetlb_test.go, random failures on ppc64el, s390x
0002 skip privileged test TestFactoryNewTmpfs.patch \| (download)	libcontainer/factory_linux_test.go \| 1 1 + 0 - 0 ! 1 file changed, 1 insertion(+)	skip privileged test: testfactorynewtmpfs
0003 fix gccgo.patch \| (download)	libcontainer/stacktrace/capture.go \| 21 12 + 9 - 0 ! libcontainer/stacktrace/capture_test.go \| 4 2 + 2 - 0 ! libcontainer/stacktrace/frame.go \| 15 5 + 10 - 0 ! 3 files changed, 19 insertions(+), 21 deletions(-)	fix gccgo
0004 skip privileged test nsenter_test.go.patch \| (download)	libcontainer/nsenter/nsenter_test.go \| 2 2 + 0 - 0 ! 1 file changed, 2 insertions(+)	skip privileged test: nsenter_test.go
0005 skip privileged test fs_test.go.patch \| (download)	libcontainer/cgroups/fs/fs_test.go \| 2 1 + 1 - 0 ! 1 file changed, 1 insertion(+), 1 deletion(-)	skip privileged test: fs_test.go
0006 skip privileged test fscommon_test.go.patch \| (download)	libcontainer/cgroups/fscommon/fscommon_test.go \| 2 1 + 1 - 0 ! 1 file changed, 1 insertion(+), 1 deletion(-)	skip privileged test: fscommon_test.go
0007 skip test cgroups_test.go fail when cgroups is not m.patch \| (download)	libcontainer/cgroups/cgroups_test.go \| 2 1 + 1 - 0 ! 1 file changed, 1 insertion(+), 1 deletion(-)	skip test: cgroups_test.go, fail when cgroups is not mounted
0008 fix patchpbf test on 32 bit.patch \| (download)	libcontainer/seccomp/patchbpf/enosys_linux_test.go \| 17 10 + 7 - 0 ! 1 file changed, 10 insertions(+), 7 deletions(-)	fix patchbpf test on 32-bit
0009 skip integration when no dev kmsg.patch \| (download)	tests/integration/dev.bats \| 4 4 + 0 - 0 ! 1 file changed, 4 insertions(+)	skip integration when no /dev/kmsg By default, privileged lxc container doesn't have /dev/kmsg
0010 Ensure the seccomp pipe is being read while exportin.patch \| (download)	libcontainer/seccomp/patchbpf/enosys_linux.go \| 15 14 + 1 - 0 ! libcontainer/seccomp/patchbpf/enosys_linux_test.go \| 20 20 + 0 - 0 ! 2 files changed, 34 insertions(+), 1 deletion(-)	ensure the seccomp pipe is being read while exporting bpf
CVE 2021 30465/rc93 0001 libct newInitConfig nit.patch \| (download)	libcontainer/container_linux.go \| 7 4 + 3 - 0 ! 1 file changed, 4 insertions(+), 3 deletions(-)	[patch 1/5] libct/newinitconfig: nit Move the initialization of Console* fields as they are unconditional. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
CVE 2021 30465/rc93 0002 libct rootfs introduce and use mountConfig.patch \| (download)	libcontainer/rootfs_linux.go \| 42 26 + 16 - 0 ! 1 file changed, 26 insertions(+), 16 deletions(-)	[patch 2/5] libct/rootfs: introduce and use mountconfig The code is already passing three parameters around from mountToRootfs to mountCgroupV* to mountToRootfs again. I am about to add another parameter, so let's introduce and use struct mountConfig to pass around. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
CVE 2021 30465/rc93 0003 libct rootfs mountCgroupV2 minor refactor.patch \| (download)	libcontainer/rootfs_linux.go \| 10 6 + 4 - 0 ! 1 file changed, 6 insertions(+), 4 deletions(-)	[patch 3/5] libct/rootfs/mountcgroupv2: minor refactor 1. s/cgroupPath/dest/ 2. don't hardcode /sys/fs/cgroup Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
CVE 2021 30465/rc93 0004 Fix cgroup2 mount for rootless case.patch \| (download)	libcontainer/container_linux.go \| 3 3 + 0 - 0 ! libcontainer/init_linux.go \| 1 1 + 0 - 0 ! libcontainer/rootfs_linux.go \| 28 21 + 7 - 0 ! libcontainer/specconv/example.go \| 18 9 + 9 - 0 ! 4 files changed, 34 insertions(+), 16 deletions(-)	[patch 4/5] fix cgroup2 mount for rootless case In case of rootless, cgroup2 mount is not possible (see [1] for more details), so since commit 9c81440fb5a7 runc bind-mounts the whole /sys/fs/cgroup into container. Problem is, if cgroupns is enabled, /sys/fs/cgroup inside the container is supposed to show the cgroup files for this cgroup, not the root one. The fix is to pass through and use the cgroup path in case cgroup2 mount failed, cgroupns is enabled, and the path is non-empty. Surely this requires the /sys/fs/cgroup mount in the spec, so modify runc spec --rootless to keep it. Before: $ ./runc run aaa # find /sys/fs/cgroup/ -type d /sys/fs/cgroup /sys/fs/cgroup/user.slice /sys/fs/cgroup/user.slice/user-1000.slice /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service ... # ls -l /sys/fs/cgroup/cgroup.controllers -r--r--r-- 1 nobody nogroup 0 Feb 24 02:22 /sys/fs/cgroup/cgroup.controllers # wc -w /sys/fs/cgroup/cgroup.procs 142 /sys/fs/cgroup/cgroup.procs # cat /sys/fs/cgroup/memory.current cat: can't open '/sys/fs/cgroup/memory.current': No such file or directory After: # find /sys/fs/cgroup/ -type d /sys/fs/cgroup/ # ls -l /sys/fs/cgroup/cgroup.controllers -r--r--r-- 1 root root 0 Feb 24 02:43 /sys/fs/cgroup/cgroup.controllers # wc -w /sys/fs/cgroup/cgroup.procs 2 /sys/fs/cgroup/cgroup.procs # cat /sys/fs/cgroup/memory.current 577536 [1] https://github.com/opencontainers/runc/issues/2158 Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
CVE 2021 30465/rc93 0005 rootfs add mount destination validation.patch \| (download)	libcontainer/container_linux.go \| 1 0 + 1 - 0 ! libcontainer/rootfs_linux.go \| 251 124 + 127 - 0 ! libcontainer/utils/utils.go \| 54 54 + 0 - 0 ! libcontainer/utils/utils_test.go \| 35 35 + 0 - 0 ! 4 files changed, 213 insertions(+), 128 deletions(-)	[patch 5/5] rootfs: add mount destination validation Because the target of a mount is inside a container (which may be a volume that is shared with another container), there exists a race condition where the target of the mount may change to a path containing a symlink after we have sanitised the path -- resulting in us inadvertently mounting the path outside of the container. This is not immediately useful because we are in a mount namespace with MS_SLAVE mount propagation applied to "/", so we cannot mount on top of host paths in the host namespace. However, if any subsequent mountpoints in the configuration use a subdirectory of that host path as a source, those subsequent mounts will use an attacker-controlled source path (resolved within the host rootfs) -- allowing the bind-mounting of "/" into the container. While arguably configuration issues like this are not entirely within runc's threat model, within the context of Kubernetes (and possibly other container managers that provide semi-arbitrary container creation privileges to untrusted users) this is a legitimate issue. Since we cannot block mounting from the host into the container, we need to block the first stage of this attack (mounting onto a path outside the container). The long-term plan to solve this would be to migrate to libpathrs, but as a stop-gap we implement libpathrs-like path verification through readlink(/proc/self/fd/$n) and then do mount operations through the procfd once it's been verified to be inside the container. The target could move after we've checked it, but if it is inside the container then we can assume that it is safe for the same reason that libpathrs operations would be safe. A slight wrinkle is the "copyup" functionality we provide for tmpfs, which is the only case where we want to do a mount on the host filesystem. To facilitate this, I split out the copy-up functionality entirely so that the logic isn't interspersed with the regular tmpfs logic. In addition, all dependencies on m.Destination being overwritten have been removed since that pattern was just begging to be a source of more mount-target bugs (we do still have to modify m.Destination for tmpfs-copyup but we only do it temporarily). Fixes: CVE-2021-30465 Reported-by: Etienne Champetier <champetier.etienne@gmail.com> Co-authored-by: Noah Meyerhans <nmeyerha@amazon.com>
default_retno.patch \| (download)	libcontainer/configs/config.go \| 7 4 + 3 - 0 ! libcontainer/seccomp/patchbpf/enosys_linux.go \| 5 5 + 0 - 0 ! libcontainer/seccomp/seccomp_linux.go \| 2 1 + 1 - 0 ! libcontainer/specconv/spec_linux.go \| 1 1 + 0 - 0 ! tests/integration/seccomp.bats \| 12 12 + 0 - 0 ! tests/integration/testdata/seccomp_syscall_test2.c \| 12 12 + 0 - 0 ! tests/integration/testdata/seccomp_syscall_test2.json \| 356 356 + 0 - 0 ! 7 files changed, 391 insertions(+), 4 deletions(-)	---
CVE 2022 29162.patch \| (download)	exec.go \| 1 0 + 1 - 0 ! libcontainer/README.md \| 16 0 + 16 - 0 ! libcontainer/integration/exec_test.go \| 2 0 + 2 - 0 ! libcontainer/integration/template_test.go \| 16 0 + 16 - 0 ! libcontainer/specconv/example.go \| 5 0 + 5 - 0 ! 5 files changed, 40 deletions(-)	---
CVE 2024 21626/0018 Fix File to Close.patch \| (download)	libcontainer/cgroups/fs/fs.go \| 1 1 + 0 - 0 ! update.go \| 1 1 + 0 - 0 ! 2 files changed, 2 insertions(+)	fix file to close (This is a cherry-pick of 937ca107c3d22da77eb8e8030f2342253b980980.) Signed-off-by: hang.jiang <hang.jiang@daocloud.io> Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626 Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
CVE 2024 21626/0019 init verify after chdir that cwd is inside the conta.patch \| (download)	libcontainer/init_linux.go \| 31 31 + 0 - 0 ! libcontainer/integration/seccomp_test.go \| 20 10 + 10 - 0 ! 2 files changed, 41 insertions(+), 10 deletions(-)	init: verify after chdir that cwd is inside the container If a file descriptor of a directory in the host's mount namespace is leaked to runc init, a malicious config.json could use /proc/self/fd/... as a working directory to allow for host filesystem access after the container runs. This can also be exploited by a container process if it knows that an administrator will use "runc exec --cwd" and the target --cwd (the attacker can change that cwd to be a symlink pointing to /proc/self/fd/... and wait for the process to exec and then snoop on /proc/$pid/cwd to get access to the host). The former issue can lead to a critical vulnerability in Docker and Kubernetes, while the latter is a container breakout. We can (ab)use the fact that getcwd(2) on Linux detects this exact case, and getcwd(3) and Go's Getwd() return an error as a result. Thus, if we just do os.Getwd() after chdir we can easily detect this case and error out. In runc 1.1, a /sys/fs/cgroup handle happens to be leaked to "runc init", making this exploitable. On runc main it just so happens that the leaked /sys/fs/cgroup gets clobbered and thus this is only consistently exploitable for runc 1.1. Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626 Co-developed-by: lifubang <lifubang@acmcoder.com> Signed-off-by: lifubang <lifubang@acmcoder.com> [refactored the implementation and added more comments] Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
CVE 2024 21626/0020 setns init do explicit lookup of execve argument ear.patch \| (download)	libcontainer/setns_init_linux.go \| 14 13 + 1 - 0 ! 1 file changed, 13 insertions(+), 1 deletion(-)	setns init: do explicit lookup of execve argument early (This is a partial backport of a minor change included in commit dac41717465462b21fab5b5942fe4cb3f47d7e53.) This mirrors the logic in standard_init_linux.go, and also ensures that we do not call exec.LookPath in the final execve step. While this is okay for regular binaries, it seems exec.LookPath calls os.Getenv which tries to emit a log entry to the test harness when running in "go test" mode. In a future patch (in order to fix CVE-2024-21626), we will close all of the file descriptors immediately before execve, which would mean the file descriptor for test harness logging would be closed at execve time. So, moving exec.LookPath earlier is necessary. Ref: dac417174654 ("runc-dmz: reduce memfd binary cloning cost with small C binary") Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
CVE 2024 21626/0021 init close internal fds before execve.patch \| (download)	libcontainer/logs/logs.go \| 9 9 + 0 - 0 ! libcontainer/setns_init_linux.go \| 20 20 + 0 - 0 ! libcontainer/standard_init_linux.go \| 20 20 + 0 - 0 ! libcontainer/utils/utils_unix.go \| 72 64 + 8 - 0 ! 4 files changed, 113 insertions(+), 8 deletions(-)	init: close internal fds before execve If we leak a file descriptor referencing the host filesystem, an attacker could use a /proc/self/fd magic-link as the source for execve to execute a host binary in the container. This would allow the binary itself (or a process inside the container in the 'runc exec' case) to write to a host binary, leading to a container escape. The simple solution is to make sure we close all file descriptors immediately before the execve(2) step. Doing this earlier can lead to very serious issues in Go (as file descriptors can be reused, any (*os.File)
CVE 2024 21626/0022 cgroup plug leaks of sys fs cgroup handle.patch \| (download)	libcontainer/cgroups/fscommon/open.go \| 19 10 + 9 - 0 ! 1 file changed, 10 insertions(+), 9 deletions(-)	cgroup: plug leaks of /sys/fs/cgroup handle We auto-close this file descriptor in the final exec step, but it's probably a good idea to not possibly leak the file descriptor to "runc init" (we've had issues like this in the past) especially since it is a directory handle from the host mount namespace. In practice, on runc 1.1 this does leak to "runc init" but on main the handle has a low enough file descriptor that it gets clobbered by the ForkExec of "runc init". OPEN_TREE_CLONE would let us protect this handle even further, but the performance impact of creating an anonymous mount namespace is probably not worth it. Also, switch to using an *os.File for the handle so if it goes out of scope during setup (i.e. an error occurs during setup) it will get cleaned up by the GC. Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626 Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
CVE 2024 21626/0023 libcontainer mark all non stdio fds O_CLOEXEC before.patch \| (download)	libcontainer/container_linux.go \| 10 10 + 0 - 0 ! 1 file changed, 10 insertions(+)	libcontainer: mark all non-stdio fds o_cloexec before spawning init Given the core issue in GHSA-xr7r-f8xq-vfvv was that we were unknowingly leaking file descriptors to "runc init", it seems prudent to make sure we proactively prevent this in the future. The solution is to simply mark all non-stdio file descriptors as O_CLOEXEC before we spawn "runc init". For libcontainer library users, this could result in unrelated files being marked as O_CLOEXEC -- however (for the same reason we are doing this for runc), for security reasons those files should've been marked as O_CLOEXEC anyway. Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626 Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
CVE 2024 21626/0024 init don t special case logrus fds.patch \| (download)	libcontainer/logs/logs.go \| 9 0 + 9 - 0 ! libcontainer/utils/utils_unix.go \| 8 0 + 8 - 0 ! 2 files changed, 17 deletions(-)	init: don't special-case logrus fds We close the logfd before execve so there's no need to special case it. In addition, it turns out that (*os.File).Fd() doesn't handle the case where the file was closed and so it seems suspect to use that kind of check. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
0025 Fix busybox tarball url in integration test.patch \| (download)	tests/integration/multi-arch.bash \| 4 2 + 2 - 0 ! 1 file changed, 2 insertions(+), 2 deletions(-)	fix busybox tarball url in integration test https://github.com/opencontainers/runc/blob/main/tests/integration/get-images.sh
CVE 2021 43784.patch \| (download)	libcontainer/container_linux.go \| 20 19 + 1 - 0 ! libcontainer/message_linux.go \| 10 10 + 0 - 0 ! 2 files changed, 29 insertions(+), 1 deletion(-)	fix cve-2021-43784 When writing netlink messages, it is possible to have a byte array larger than UINT16_MAX which would result in the length field overflowing and allowing user-controlled data to be parsed as control characters (such as creating custom mount points, changing which set of namespaces to allow, and so on).
0027 Fix test for newer kernels.patch \| (download)	tests/integration/no_pivot.bats \| 4 3 + 1 - 0 ! 1 file changed, 3 insertions(+), 1 deletion(-)	[patch] tests/int/no_pivot: fix for new kernels The test is failing like this: not ok 70 runc run --no-pivot must not expose bare /proc # (in test file tests/integration/no_pivot.bats, line 20) # `[[ "$output" == "mount: permission denied" ]]' failed # runc spec (status=0): # # runc run --no-pivot test_no_pivot (status=1): # unshare: write error: Operation not permitted Apparently, a recent kernel commit db2e718a47984b9d prevents root from doing unshare -r unless it has CAP_SETFPCAP. Add the capability for this specific test. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
CVE 2023 25809.patch \| (download)	libcontainer/rootfs_linux.go \| 53 34 + 19 - 0 ! tests/integration/mounts.bats \| 17 17 + 0 - 0 ! 2 files changed, 51 insertions(+), 19 deletions(-)	[patch] rootless: fix /sys/fs/cgroup mounts It was found that rootless runc makes `/sys/fs/cgroup` writable in the following conditions: 1. when runc is executed inside the user namespace, and the config.json does not specify the cgroup namespace to be unshared (e.g., `(docker\|podman\|nerdctl) run --cgroupns=host`, with Rootless Docker/Podman/nerdctl) 2. or, when runc is executed outside the user namespace, and `/sys` is mounted with `rbind, ro` (e.g., `runc spec --rootless`; this condition is very rare) A container may gain write access to the user-owned cgroup hierarchy `/sys/fs/cgroup/user.slice/...` on the host. Other users' cgroup hierarchies are not affected. To fix the issue, this commit does: 1. Remount `/sys/fs/cgroup` to apply `MS_RDONLY` when it is being bind-mounted 2. Mask `/sys/fs/cgroup` when the bind source is unavailable Fix CVE-2023-25809 (GHSA-m8cg-xc2p-r3fc) Co-authored-by: Kir Kolyshkin <kolyshkin@gmail.com> Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp> (cherry picked from commit df4eae457b8ccffa619c659c2def5c777d8ff507) Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
CVE 2023 27561 and CVE 2023 28642.patch \| (download)	libcontainer/rootfs_linux.go \| 23 17 + 6 - 0 ! tests/integration/mask.bats \| 19 19 + 0 - 0 ! 2 files changed, 36 insertions(+), 6 deletions(-)	[patch] prohibit /proc and /sys to be symlinks Commit 3291d66b9844 introduced a check for /proc and /sys, making sure the destination (dest) is a directory (and not e.g. a symlink). Later, a hunk from commit 0ca91f44f switched from using filepath.Join to SecureJoin for dest. As SecureJoin follows and resolves symlinks, the check whether dest is a symlink no longer works. To fix, do the check without/before using SecureJoin. Add integration tests to make sure we won't regress. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> (cherry picked from commit 0d72adf96dda1b687815bf89bb245b937a2f603c) Signed-off-by: Sebastiaan van Stijn <github@gone.nl> This patch fixes both, CVE-2023-27561 and CVE-2023-28642
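
The CVE 2024 21626/0019 entry in the table above relies on Getwd returning an error once the working directory is no longer reachable from the process root. A minimal Go sketch of that idea follows. The function body is a simplification written for illustration and is not the code the patch adds to libcontainer/init_linux.go; it assumes it runs immediately after chdir.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// verifyCwd is an illustrative sketch, not the patched runc function.
// It is meant to be called right after chdir into the container's cwd.
func verifyCwd() error {
	wd, err := os.Getwd()
	if err != nil {
		// Per the patch description, Getwd reports an error when the
		// working directory lies outside the current root.
		return fmt.Errorf("cannot resolve working directory, refusing to continue: %w", err)
	}
	if !filepath.IsAbs(wd) {
		// Extra defensive check (an assumption of this sketch).
		return fmt.Errorf("working directory %q is not an absolute path", wd)
	}
	return nil
}
```

A caller would treat any non-nil result as fatal and abort before the container process is executed.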
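
The CVE 2024 21626/0023 entry describes marking every non-stdio file descriptor as O_CLOEXEC before spawning "runc init". One way this could look in Go on Linux is sketched below. `markCloexecFrom` is a hypothetical name, and enumerating descriptors through /proc/self/fd is an assumption of this sketch rather than something the table states.

```go
package main

import (
	"os"
	"strconv"
	"syscall"
)

// markCloexecFrom is a hypothetical, Linux-only helper. It sets the
// close-on-exec flag on every open descriptor numbered minFd or higher,
// discovered by listing /proc/self/fd.
func markCloexecFrom(minFd int) error {
	entries, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		return err
	}
	for _, e := range entries {
		fd, err := strconv.Atoi(e.Name())
		if err != nil || fd < minFd {
			continue // skip non-numeric names and descriptors below minFd
		}
		// CloseOnExec has no return value; a descriptor that has already
		// been closed (such as the one used for the listing) is ignored.
		syscall.CloseOnExec(fd)
	}
	return nil
}
```

Calling `markCloexecFrom(3)` leaves stdin, stdout, and stderr untouched, which matches the "non-stdio" wording of the patch description.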
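
The CVE 2021 43784 entry describes a payload longer than UINT16_MAX overflowing a 16-bit length field. The following Go sketch shows the kind of bounds check that prevents the wraparound. `attrLen` is a hypothetical helper introduced for illustration; it is not the function the patch adds to libcontainer/message_linux.go.

```go
package main

import (
	"fmt"
	"math"
)

// attrLen is a hypothetical helper (not runc's actual code). It returns the
// payload length as a uint16, or an error when the length does not fit and
// would otherwise wrap around in a 16-bit length field.
func attrLen(payload []byte) (uint16, error) {
	if len(payload) > math.MaxUint16 {
		return 0, fmt.Errorf("payload of %d bytes exceeds the 16-bit length field (max %d)",
			len(payload), math.MaxUint16)
	}
	return uint16(len(payload)), nil
}
```

Without such a check, a plain `uint16(len(payload))` conversion silently truncates: a 65536-byte payload would be recorded with length 0, and the bytes that follow would be parsed as further messages.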
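
The CVE 2023 27561 and CVE 2023 28642 entry explains that the symlink check on /proc and /sys has to happen before any symlink-resolving join. A short Go sketch of such a check is given below. `checkMountDest` is a hypothetical name and the error text is invented for illustration; the sketch only shows a lexical join followed by Lstat, which does not follow a symlink in the final path component.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// checkMountDest is a hypothetical helper, not the patched runc code. It
// joins rootfs and dest lexically (no symlink resolution) and uses Lstat so
// that a symlink at the destination is seen as a symlink, not as its target.
func checkMountDest(rootfs, dest string) error {
	path := filepath.Join(rootfs, dest)
	fi, err := os.Lstat(path)
	if err != nil {
		if os.IsNotExist(err) {
			return nil // nothing exists there yet, so nothing to reject
		}
		return err
	}
	if !fi.IsDir() {
		return fmt.Errorf("%s exists but is not an ordinary directory", path)
	}
	return nil
}
```

On a typical Linux host, `checkMountDest("/", "/proc")` returns nil, while `checkMountDest("/", "/proc/self")` returns an error because /proc/self is a symlink.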

0027 Fix test for newer kernels.patch \| (download)	tests/integration/no_pivot.bats \| 4 3 + 1 - 0 ! 1 file changed, 3 insertions(+), 1 deletion(-)	[patch] tests/int/no_pivot: fix for new kernels The test is failing like this: not ok 70 runc run --no-pivot must not expose bare /proc # (in test file tests/integration/no_pivot.bats, line 20) # `[[ "$output" == "mount: permission denied" ]]' failed # runc spec (status=0): # # runc run --no-pivot test_no_pivot (status=1): # unshare: write error: Operation not permitted Apparently, a recent kernel commit db2e718a47984b9d prevents root from doing unshare -r unless it has CAP_SETFPCAP. Add the capability for this specific test. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
CVE 2023 25809.patch \| (download)	libcontainer/rootfs_linux.go \| 53 34 + 19 - 0 ! tests/integration/mounts.bats \| 17 17 + 0 - 0 ! 2 files changed, 51 insertions(+), 19 deletions(-)	[patch] rootless: fix /sys/fs/cgroup mounts It was found that rootless runc makes `/sys/fs/cgroup` writable in the following conditions: 1. when runc is executed inside the user namespace, and the config.json does not specify the cgroup namespace to be unshared (e.g., `(docker\|podman\|nerdctl) run --cgroupns=host`, with Rootless Docker/Podman/nerdctl) 2. or, when runc is executed outside the user namespace, and `/sys` is mounted with `rbind, ro` (e.g., `runc spec --rootless`; this condition is very rare) A container may gain write access to the user-owned cgroup hierarchy `/sys/fs/cgroup/user.slice/...` on the host. Other users' cgroup hierarchies are not affected. To fix the issue, this commit does: 1. Remount `/sys/fs/cgroup` to apply `MS_RDONLY` when it is being bind-mounted 2. Mask `/sys/fs/cgroup` when the bind source is unavailable Fix CVE-2023-25809 (GHSA-m8cg-xc2p-r3fc) Co-authored-by: Kir Kolyshkin <kolyshkin@gmail.com> Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp> (cherry picked from commit df4eae457b8ccffa619c659c2def5c777d8ff507) Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
CVE 2023 27561 and CVE 2023 28642.patch \| (download)	libcontainer/rootfs_linux.go \| 23 17 + 6 - 0 ! tests/integration/mask.bats \| 19 19 + 0 - 0 ! 2 files changed, 36 insertions(+), 6 deletions(-)	[patch] prohibit /proc and /sys to be symlinks Commit 3291d66b9844 introduced a check for /proc and /sys, making sure the destination (dest) is a directory (and not e.g. a symlink). Later, a hunk from commit 0ca91f44f switched from using filepath.Join to SecureJoin for dest. As SecureJoin follows and resolves symlinks, the check whether dest is a symlink no longer works. To fix, do the check without/before using SecureJoin. Add integration tests to make sure we won't regress. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> (cherry picked from commit 0d72adf96dda1b687815bf89bb245b937a2f603c) Signed-off-by: Sebastiaan van Stijn <github@gone.nl> This patch fixes both CVE-2023-27561 and CVE-2023-28642.