mirror of https://git.proxmox.com/git/mirror_lxc synced 2025-08-04 08:57:20 +00:00

Christian Brauner 6400238d08 CVE-2019-5736 (runC): rexec callers as memfd	2019-02-11 13:59:21 +01:00

Adam Iwaniuk and Borys Popławski discovered that an attacker can compromise the runC host binary from inside a privileged runC container. As a result, this could be exploited to gain root access on the host. runC is used as the default runtime for containers with Docker, containerd, Podman, and CRI-O.

The attack can be made when attaching to a running container or when starting a container running a specially crafted image. For example, when runC attaches to a container the attacker can trick it into executing itself. This could be done by replacing the target binary inside the container with a custom binary pointing back at the runC binary itself. As an example, if the target binary was /bin/bash, this could be replaced with an executable script specifying the interpreter path #!/proc/self/exe (/proc/self/exe is a symbolic link created by the kernel for every process which points to the binary that was executed for that process). As such, when /bin/bash is executed inside the container, the target of /proc/self/exe will be executed instead - which will point to the runC binary on the host.

The attacker can then proceed to write to the target of /proc/self/exe to try and overwrite the runC binary on the host. However, in general this will not succeed, as the kernel will not permit the binary to be overwritten whilst runC is executing. To overcome this, the attacker can instead open a file descriptor to /proc/self/exe using the O_PATH flag, then reopen the binary as O_WRONLY through /proc/self/fd/<nr> and try to write to it in a busy loop from a separate process. Ultimately the write will succeed when the runC binary exits. After this the runC binary is compromised and can be used to attack other containers or the host itself.

This attack is only possible with privileged containers since it requires root privilege on the host to overwrite the runC binary. Unprivileged containers with a non-identity ID mapping do not have the permission to write to the host binary and therefore are unaffected by this attack.

LXC is also impacted in a similar manner by this vulnerability. However, as the LXC project considers privileged containers to be unsafe, no CVE has been assigned for this issue for LXC. Quoting from the project's Security information page at https://linuxcontainers.org/lxc/security/:

"As privileged containers are considered unsafe, we typically will not consider new container escape exploits to be security issues worthy of a CVE and quick fix. We will however try to mitigate those issues so that accidental damage to the host is prevented."

To prevent this attack, LXC has been patched to create a temporary copy of the calling binary itself when it starts or attaches to containers. To do this LXC creates an anonymous, in-memory file using the memfd_create() system call and copies itself into the temporary in-memory file, which is then sealed to prevent further modifications. LXC then executes this sealed, in-memory file instead of the original on-disk binary. Any compromising write operations from a privileged container to the host LXC binary will then write to the temporary in-memory binary and not to the host binary on-disk, preserving the integrity of the host LXC binary. Also, as the temporary, in-memory LXC binary is sealed, writes to it will fail as well.

Note: memfd_create() was added to the Linux kernel in the 3.17 release.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Co-Developed-by: Aleksa Sarai <asarai@suse.de>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
.github	issue template: fix typo	2017-04-23 22:04:54 +02:00
coccinelle	coccinelle: use standard exit identifiers	2019-02-09 11:23:54 +01:00
config	apparmor: allow various remount,bind options	2018-11-16 12:17:30 +01:00
doc	doc: Add lxc.seccomp.allow_nesting to Japanese lxc.container.conf(5)	2019-01-28 19:01:40 +09:00
hooks	spelling: passphrase	2018-10-30 07:45:15 +00:00
src	CVE-2019-5736 (runC): rexec callers as memfd	2019-02-11 13:59:21 +01:00
templates	/etc/resolv.conf grows indefinitely	2019-01-27 13:46:48 +01:00
.gitignore	apparmor: account for specified rootfs path (closes #2617 )	2018-09-20 15:56:05 -07:00
.travis.yml	coverity: move to separate branch	2018-11-02 12:35:08 +01:00
AUTHORS	Initial revision	2008-08-06 14:32:29 +00:00
autogen.sh	Use libtool for liblxc.so	2016-10-21 18:32:18 -04:00
CODING_STYLE.md	autotools: add -Wimplicit-fallthrough	2018-09-21 15:24:14 +02:00
configure.ac	CVE-2019-5736 (runC): rexec callers as memfd	2019-02-11 13:59:21 +01:00
CONTRIBUTING	spelling: libraries	2018-10-30 07:18:08 +00:00
COPYING	Minor documentation updates	2012-12-06 00:02:36 -05:00
INSTALL	Minor documentation updates	2012-12-06 00:02:36 -05:00
lxc.pc.in	Revert "Add a prefix to the lxc.pc"	2017-06-23 19:47:12 +08:00
lxc.spec.in	fix rpm packaging for bash completion directory.	2019-02-05 17:10:20 +00:00
MAINTAINERS	MAINTAINERS: add Wolfgang Bumiller	2018-08-07 15:01:19 +02:00
Makefile.am	tree-wide: remove python3 bindings	2018-02-28 10:05:33 +01:00
NEWS	Initial revision	2008-08-06 14:32:29 +00:00
README	repo: add new README	2017-05-29 02:22:01 +02:00
README.md	README: add LGTM	2019-02-06 12:23:17 +01:00

README.md

LXC

LXC is the well-known and heavily tested low-level Linux container runtime. It has been in active development since 2008 and has proven itself in critical production environments world-wide. Some of its core contributors are the same people who helped to implement various well-known containerization features inside the Linux kernel.

Status

Type	Service	Status
CI (Linux)	Jenkins
CI (Linux)	Travis
Project status	CII Best Practices
Code Quality	LGTM
Static Analysis	Coverity

System Containers

LXC's main focus is system containers. That is, containers which offer an environment as close as possible to the one you'd get from a VM but without the overhead that comes with running a separate kernel and simulating all the hardware.

This is achieved through a combination of kernel security features such as namespaces, mandatory access control and control groups.

Unprivileged Containers

Unprivileged containers are containers that are run without any privilege. This requires support for user namespaces in the kernel that the container is run on. LXC was the first runtime to support unprivileged containers after user namespaces were merged into the mainline kernel.

In essence, user namespaces isolate given sets of UIDs and GIDs. This is achieved by establishing a mapping between a range of UIDs and GIDs on the host to a different (unprivileged) range of UIDs and GIDs in the container. The kernel will translate this mapping in such a way that inside the container all UIDs and GIDs appear as you would expect from the host whereas on the host these UIDs and GIDs are in fact unprivileged. For example, a process running as UID and GID 0 inside the container might appear as UID and GID 100000 on the host. The implementation and working details can be gathered from the corresponding user namespace man page.
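The translation described above can be sketched in a few lines of Python. This is only an illustration, not LXC code: it assumes the three-column map format documented in the user namespace man page (first ID inside the namespace, first ID outside, length), and the names `IdMapEntry`, `parse_id_map` and `to_host_id` are hypothetical helpers introduced here.

```python
from typing import List, NamedTuple, Optional


class IdMapEntry(NamedTuple):
    container_start: int  # first ID as seen inside the container
    host_start: int       # first ID as seen on the host
    count: int            # number of consecutive IDs covered


def parse_id_map(text: str) -> List[IdMapEntry]:
    """Parse lines of the form 'inside-start outside-start count'."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        inside, outside, count = (int(f) for f in fields)
        entries.append(IdMapEntry(inside, outside, count))
    return entries


def to_host_id(entries: List[IdMapEntry], container_id: int) -> Optional[int]:
    """Return the host ID for a container ID, or None if it is unmapped."""
    for e in entries:
        if e.container_start <= container_id < e.container_start + e.count:
            return e.host_start + (container_id - e.container_start)
    return None


# Example map (an assumption matching the README's example):
# container IDs 0..65535 are backed by host IDs 100000..165535.
example = parse_id_map("0 100000 65536\n")
print(to_host_id(example, 0))      # 100000
print(to_host_id(example, 1000))   # 101000
print(to_host_id(example, 70000))  # None (outside the mapped range)
```

With this map, root inside the container (UID 0) is an ordinary unprivileged UID 100000 on the host, which is the property the paragraph above relies on.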

Since unprivileged containers are a security enhancement they naturally come with a few restrictions enforced by the kernel. In order to provide a fully functional unprivileged container LXC interacts with 3 pieces of setuid code:

lxc-user-nic (setuid helper to create a veth pair and bridge it on the host)
newuidmap (from the shadow package, sets up a uid map)
newgidmap (from the shadow package, sets up a gid map)

Everything else is run as your own user or as a uid which your user owns.

In general, LXC's goal is to make use of every security feature available in the kernel. This means LXC's configuration management will allow experienced users to intricately tune LXC to their needs.

A more detailed introduction to LXC security can be found at the following link:

https://linuxcontainers.org/lxc/security/

Removing all Privilege

In principle LXC can be run without any of these tools provided the correct configuration is applied. However, the usefulness of such containers is usually quite restricted. Just to highlight the two most common problems:

Network: Without relying on a setuid helper to set up appropriate network devices for an unprivileged user (see LXC's lxc-user-nic binary) the only option is to share the network namespace with the host. Although this should be secure in principle, sharing the host's network namespace is still one step of isolation less and increases the attack surface. Furthermore, when host and container share the same network namespace the kernel will refuse any sysfs mounts. This usually means that the init binary inside of the container will not be able to boot up correctly.
User Namespaces: As outlined above, user namespaces are a big security enhancement. However, without relying on privileged helpers, users who are unprivileged on the host are only permitted to map their own UID into a container. A standard POSIX system, however, requires 65536 UIDs and GIDs to be available to guarantee full functionality.

Configuration

LXC is configured via a simple set of keys. For example,

lxc.rootfs.path
lxc.mount.entry

LXC namespaces configuration keys by using single dots. This means complex configuration keys such as lxc.net.0 expose various subkeys such as lxc.net.0.type, lxc.net.0.link, lxc.net.0.ipv6.address, and others for even more fine-grained configuration.
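To illustrate how dotted keys group into subkeys, here is a minimal Python sketch. It is not LXC's own parser. It assumes the common `key = value` line format with `#` comments, and the sample values (the rootfs path and `lxcbr0`) as well as the helper names `parse_config` and `subkeys` are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_config(text: str) -> List[Tuple[str, str]]:
    """Return (key, value) pairs in file order; keys may repeat."""
    pairs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a 'key = value' line
        pairs.append((key.strip(), value.strip()))
    return pairs


def subkeys(pairs: List[Tuple[str, str]], prefix: str) -> Dict[str, str]:
    """Collect everything below 'prefix.' with the prefix stripped."""
    dotted = prefix + "."
    return {k[len(dotted):]: v for k, v in pairs if k.startswith(dotted)}


SAMPLE = """
# hypothetical container configuration
lxc.rootfs.path = /var/lib/lxc/demo/rootfs
lxc.net.0.type = veth
lxc.net.0.link = lxcbr0
lxc.net.1.type = empty
"""

pairs = parse_config(SAMPLE)
print(subkeys(pairs, "lxc.net.0"))  # {'type': 'veth', 'link': 'lxcbr0'}
```

Pairs are kept in a list rather than a dictionary so that a key which appears more than once in a file is not silently collapsed by the parser.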

LXC is used as the default runtime for LXD, a container hypervisor exposing a well-designed and stable REST API on top of it.

Kernel Requirements

LXC runs on any kernel from 2.6.32 onwards. All it requires is a functional C compiler. LXC works on all architectures that provide the necessary kernel features. This includes (but isn't limited to):

i686
x86_64
ppc, ppc64, ppc64le
s390x
armv7l, arm64
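As a rough illustration of the minimum kernel version stated above, a release string can be compared against 2.6.32 as follows. This is a sketch only: it looks at the leading numeric part of the release string and does not test whether the individual kernel features LXC needs are enabled. The helper names are hypothetical.

```python
import platform
import re
from typing import Tuple

MIN_LXC_KERNEL = (2, 6, 32)  # minimum version stated in this README


def parse_kernel_release(release: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a string like '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def meets_minimum(release: str,
                  minimum: Tuple[int, int, int] = MIN_LXC_KERNEL) -> bool:
    """True if the release is at least the given minimum version."""
    return parse_kernel_release(release) >= minimum


def current_kernel_supported() -> bool:
    """Check the running kernel, as reported by platform.release()."""
    return meets_minimum(platform.release())


print(meets_minimum("5.15.0-91-generic"))  # True
print(meets_minimum("2.6.18-194.el5"))     # False
```

Passing a different `minimum`, for example `(3, 17, 0)`, gives the same kind of check for the kernel release in which memfd_create() was added.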

LXC also supports at least the following C standard libraries:

glibc
musl
bionic (Android's libc)

Backwards Compatibility

LXC has always focused on strong backwards compatibility. In fact, the API hasn't been broken from release 1.0.0 onwards. Main LXC is currently at version 2.*.*.

Reporting Security Issues

The LXC project has a good reputation for handling security issues quickly and efficiently. If you think you've found a potential security issue, please report it by e-mail to all of the following persons:

serge (at) hallyn (dot) com
stgraber (at) ubuntu (dot) com
christian.brauner (at) ubuntu (dot) com

For further details please have a look at

https://linuxcontainers.org/lxc/security/

Becoming Active in LXC development

We always welcome new contributors and are happy to provide guidance when necessary. LXC follows the kernel coding conventions. This means we only require that each commit includes a Signed-off-by line. The coding style we use is identical to the one used by the Linux kernel. You can find a detailed introduction at:

https://www.kernel.org/doc/html/v4.10/process/coding-style.html

You should also take a look at the CONTRIBUTING file in this repo.

If you want to become more active it is usually also a good idea to show up in the LXC IRC channel #lxc-dev on Freenode. We try to do all development out in the open and discussion of new features or bugs is done either in appropriate GitHub issues or on IRC.

When thinking about making security critical contributions or substantial changes it is usually a good idea to ping the developers first and ask whether a PR would be accepted.

Semantic Versioning

LXC and its related projects strictly adhere to a semantic versioning scheme.

Downloading the current source code

Source for the latest released version can always be downloaded from

https://linuxcontainers.org/downloads/

You can browse the up-to-the-minute source code and change history online at

https://github.com/lxc/lxc

Building LXC

Without considering distribution-specific details, a simple

./autogen.sh && ./configure && make && sudo make install

is usually sufficient.

In order to test current git master of LXC it is usually a good idea to compile with

./autogen.sh && ./configure && make

in a convenient directory and set LD_LIBRARY_PATH="${BUILD_DIR}"/lxc/src/lxc/.libs.

Getting help

When you find you need help, the LXC project provides you with several options.

Discuss Forum

We maintain a discussion forum at

https://discuss.linuxcontainers.org/

where you can get support.

IRC

You can find support by joining #lxcontainers on Freenode.

Mailing Lists

You can check out one of the two LXC mailing list archives and register if interested: