Adam Iwaniuk and Borys Popławski discovered that an attacker can compromise the
runC host binary from inside a privileged runC container. As a result, this
could be exploited to gain root access on the host. runC is used as the default
runtime for containers with Docker, containerd, Podman, and CRI-O.
The attack can be made when attaching to a running container or when starting a
container running a specially crafted image. For example, when runC attaches
to a container the attacker can trick it into executing itself. This could be
done by replacing the target binary inside the container with a custom binary
pointing back at the runC binary itself. As an example, if the target binary
was /bin/bash, this could be replaced with an executable script specifying the
interpreter path #!/proc/self/exe (/proc/self/exe is a symbolic link created
by the kernel for every process which points to the binary that was executed
for that process). As such, when /bin/bash is executed inside the container,
the target of /proc/self/exe is executed instead, which points to the
runC binary on the host. The attacker can then proceed to write to the target
of /proc/self/exe to try and overwrite the runC binary on the host. However in
general, this will not succeed as the kernel will not permit it to be
overwritten whilst runC is executing. To overcome this, the attacker can
instead open a file descriptor to /proc/self/exe using the O_PATH flag and then
proceed to reopen the binary as O_WRONLY through /proc/self/fd/<nr> and try to
write to it in a busy loop from a separate process. Ultimately it will succeed
when the runC binary exits. After this the runC binary is compromised and can
be used to attack other containers or the host itself.
This attack is only possible with privileged containers since it requires root
privilege on the host to overwrite the runC binary. Unprivileged containers
with a non-identity ID mapping do not have the permission to write to the host
binary and therefore are unaffected by this attack.
LXC is also impacted in a similar manner by this vulnerability. However, as the
LXC project considers privileged containers to be unsafe, no CVE has been
assigned for this issue for LXC. Quoting from the project's security
information page (https://linuxcontainers.org/lxc/security/):
"As privileged containers are considered unsafe, we typically will not consider
new container escape exploits to be security issues worthy of a CVE and quick
fix. We will however try to mitigate those issues so that accidental damage to
the host is prevented."
To prevent this attack, LXC has been patched to create a temporary copy of the
calling binary itself when it starts or attaches to containers. To do this LXC
creates an anonymous, in-memory file using the memfd_create() system call and
copies itself into the temporary in-memory file, which is then sealed to
prevent further modifications. LXC then executes this sealed, in-memory file
instead of the original on-disk binary. Any compromising write operations from
a privileged container to the host LXC binary will then write to the temporary
in-memory binary and not to the host binary on-disk, preserving the integrity
of the host LXC binary. In addition, as the temporary in-memory LXC binary is
sealed, writes to it will fail as well.
Note: memfd_create() was added to the Linux kernel in the 3.17 release.
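
The copy-and-seal step described above can be sketched roughly as follows.
This is a minimal illustration, not the code LXC actually uses: the helper
name sealed_self_copy() and the memfd name "self-copy" are hypothetical, and
it assumes a libc that exposes memfd_create() and the F_SEAL_* constants
(glibc 2.27 or newer).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not LXC's actual function): copy the running
 * binary into an anonymous in-memory file and seal it.
 * Returns the sealed file descriptor, or -1 on error.
 */
int sealed_self_copy(void)
{
	char buf[4096];
	ssize_t nread;
	int src, memfd;

	src = open("/proc/self/exe", O_RDONLY | O_CLOEXEC);
	if (src < 0)
		return -1;

	/* Without MFD_ALLOW_SEALING no seals can be added later. */
	memfd = memfd_create("self-copy", MFD_CLOEXEC | MFD_ALLOW_SEALING);
	if (memfd < 0) {
		close(src);
		return -1;
	}

	/* Copy the on-disk binary into the in-memory file. */
	while ((nread = read(src, buf, sizeof(buf))) > 0) {
		ssize_t off = 0;

		while (off < nread) {
			ssize_t nwritten = write(memfd, buf + off, nread - off);

			if (nwritten < 0)
				goto on_error;
			off += nwritten;
		}
	}
	if (nread < 0)
		goto on_error;

	/* After sealing, the contents can no longer be changed or resized. */
	if (fcntl(memfd, F_ADD_SEALS, F_SEAL_SEAL | F_SEAL_SHRINK |
				      F_SEAL_GROW | F_SEAL_WRITE) < 0)
		goto on_error;

	close(src);
	return memfd;

on_error:
	close(src);
	close(memfd);
	return -1;
}
```

A caller could then execute the sealed copy through the returned descriptor,
for example with fexecve(), instead of executing the on-disk binary. A write
to the sealed descriptor is expected to fail with EPERM.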
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Co-Developed-by: Aleksa Sarai <asarai@suse.de>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Warn when macros __TIME__, __DATE__ or __TIMESTAMP__ are encountered as
they might prevent bit-wise-identical reproducible compilations.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Warn about left shift overflows. This warning is enabled by default in
C99 and C++11 modes (and newer).
-Wshift-overflow=2
This warning level also warns about left-shifting 1 into the sign bit,
unless C++14 mode (or newer) is active.
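
As an illustration, a constant expression such as 1 << 31 with a 32-bit int
is the kind of code this level is expected to flag in C. A minimal sketch of
the usual way to avoid it, using an unsigned operand (the helper name
bit_mask() is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Return a 32-bit mask with only bit n set; n must be in 0..31. */
uint32_t bit_mask(unsigned int n)
{
	assert(n < 32);

	/*
	 * Shifting the signed constant 1 by 31 would move a bit into the
	 * sign bit of a 32-bit int. An unsigned 32-bit operand has no sign
	 * bit, so the shift is well defined for every n below 32.
	 */
	return UINT32_C(1) << n;
}
```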
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-Wimplicit-fallthrough=5 doesn’t recognize any comments as fallthrough
comments, only attributes disable the warning.
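
A minimal sketch of what this means for code that falls through on purpose:
a comment such as /* fallthrough */ is no longer enough, so the attribute
has to be spelled out. The function remaining_steps() is hypothetical, and
the attribute syntax assumes gcc 7 or newer (or a compatible compiler).

```c
#include <assert.h>

/* Count the steps still to run when starting at the given stage (0..2). */
int remaining_steps(int stage)
{
	int steps = 0;

	assert(stage >= 0 && stage <= 2);

	switch (stage) {
	case 0:
		steps++;
		__attribute__((fallthrough)); /* a comment would not silence the warning */
	case 1:
		steps++;
		__attribute__((fallthrough));
	case 2:
		steps++;
		break;
	}

	return steps;
}
```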
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Enable -Wformat plus additional format checks. Currently equivalent to
-Wformat -Wformat-nonliteral -Wformat-security -Wformat-y2k.
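
As a rough illustration of how code can cooperate with these checks, a
printf-style wrapper can carry the format attribute so that its callers are
checked like callers of printf(). The wrapper name format_msg() is
hypothetical and not taken from the LXC tree.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Format into buf like snprintf(). The attribute tells the compiler that
 * argument 3 is a printf-style format and that the variadic arguments
 * start at position 4, so mismatched calls can be diagnosed.
 */
__attribute__((format(printf, 3, 4)))
int format_msg(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int ret;

	assert(buf != NULL && size > 0);

	va_start(ap, fmt);
	ret = vsnprintf(buf, size, fmt, ap);
	va_end(ap);

	return ret;
}
```

With the attribute in place, passing an untrusted string directly as fmt is
the kind of call the security check is meant to catch; passing it as an
argument to a literal "%s" format avoids the problem.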
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Warn whenever a function is defined with a return type that defaults to
int. Also warn about any return statement with no return value in a
function whose return type is not void (falling off the end of the
function body is considered returning without a value).
For C only, warn about a return statement with an expression in a
function whose return type is void, unless the expression type is also
void. As a GNU extension, the latter case is accepted without a warning
unless -Wpedantic is used.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Warn about functions that might be candidates for the attributes pure, const,
noreturn or malloc.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Warn about uninitialized variables that are initialized with themselves.
Note this option can only be used with the -Wuninitialized option.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Warn if an old-style function definition is used. A warning is given
even if there is a previous prototype.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Warn if a user-supplied include directory does not exist.
This already surfaced a bug that is fixed by this commit.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Compiler-based hardening (including -fstack-protector-strong) has been
enabled since version 3.0.3 and commit
2268c27754
However, some compilers could miss the needed library (-lssp or
-lssp_nonshared) at the linking step, so use ax_check_link_flag instead of
ax_check_compile_flag.
Fixes:
- http://autobuild.buildroot.org/results/0b90e7dca2984652842832a41abad93ac49a9b86
Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
The gcc implementation and the C standard are not to be considered sane
in this respect. We don't want to risk reordering of writes when the
compiler incorrectly *thinks* two types do not alias each other.
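
For illustration, the kind of code affected is type punning through pointer
casts. The text above does not name the compiler flag involved; independent
of any flag, a copy through memcpy() is a common way to reinterpret bytes
without relying on aliasing assumptions. A minimal sketch, with the
hypothetical helper float_bits() and assuming a 32-bit IEEE 754 float:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t),
	      "this sketch assumes a 32-bit float");

/* Return the raw bit pattern of f. */
uint32_t float_bits(float f)
{
	uint32_t bits;

	/*
	 * Reading f through a cast such as *(uint32_t *)&f is the pattern
	 * that lets a compiler assume the two types never alias. Copying
	 * the bytes avoids that assumption entirely.
	 */
	memcpy(&bits, &f, sizeof(bits));

	return bits;
}
```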
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Fix #2592 by defining -Wvla -std=gnu11 even if --disable-werror is set.
As -std=gnu11 is always set, bump the requirement on gcc from 4.6 to 4.7
(see https://gcc.gnu.org/projects/cxx-status.html#cxx11).
Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Fail the build if --enable-thread-safety is passed and the environment cannot
guarantee thread-safety.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
We line up with the Linux kernel and won't support any compiler under 4.6.
Additionally, we also require at least gnu99 so this is due anyway.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
This copies lxd's apparmor profile generation. It tries to
detect features such as cgroup namespaces, apparmor
namespaces and stacking support, and conditionally includes
profile parts for unprivileged containers.
This introduces the following changes to the configuration:

  lxc.apparmor.profile = generated
    The fixed value 'generated' will cause this
    functionality to be used, otherwise there should be no
    functional changes happening unless specifically
    requested with the next key:

  lxc.apparmor.allow_nesting
    This is a boolean which, if enabled, causes the
    following changes: When generated apparmor profiles are
    used, they will contain the necessary changes to allow
    creating a nested container. In addition to the usual
    mount points, /dev/.lxc/proc and /dev/.lxc/sys will
    contain procfs and sysfs mount points without the lxcfs
    overlays, which, if generated apparmor profiles are
    being used, will not be read/writable directly.

  lxc.apparmor.raw
    A list of raw apparmor profile lines to append to the
    profile. Only valid when using generated profiles.

The following apparmor profile lines have not been copied
from lxd:

  mount /var/lib/lxd/shmounts/ -> /var/lib/lxd/shmounts/,
  mount none -> /var/lib/lxd/shmounts/,
  mount options=bind /var/lib/lxd/shmounts/** -> /var/lib/lxd/**,

They should be added via lxc.apparmor.raw entries by lxd.

In order for apparmor_parser's cache to be of use, this adds
a --with-apparmor-cache-dir ./configure option.
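
As an illustration, a container configuration using the new keys might look
like the fragment below. This is a sketch only: the value 1 for the boolean
key and the choice of raw line (taken from the lxd lines listed above) are
assumptions.

```
# Use a generated profile instead of a fixed, named one.
lxc.apparmor.profile = generated

# Allow creating nested containers (boolean, assumed to accept 0 or 1).
lxc.apparmor.allow_nesting = 1

# Append a raw line to the generated profile.
lxc.apparmor.raw = mount none -> /var/lib/lxd/shmounts/,
```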
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Commit c06ed219c4 has broken
compilation with a static libcap and a shared gnutls.
This results in a build failure on init_lxc_static if gnutls is
a shared library, as init_lxc_static is built with the -all-static option
(see src/lxc/Makefile.am) and AC_CHECK_LIB adds gnutls to LIBS.
This commit fixes the issue by removing the default behavior of AC_CHECK_LIB
and handling GNUTLS_LIBS and HAVE_LIBGNUTLS manually.
Fixes:
- http://autobuild.buildroot.net/results/b655d6853c25a195df28d91512b3ffb6c654fc90
Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>