![]() Adam Iwaniuk and Borys Popławski discovered that an attacker can compromise the runC host binary from inside a privileged runC container. As a result, this could be exploited to gain root access on the host. runC is used as the default runtime for containers with Docker, containerd, Podman, and CRI-O. The attack can be made when attaching to a running container or when starting a container running a specially crafted image. For example, when runC attaches to a container the attacker can trick it into executing itself. This could be done by replacing the target binary inside the container with a custom binary pointing back at the runC binary itself. As an example, if the target binary was /bin/bash, this could be replaced with an executable script specifying the interpreter path #!/proc/self/exe (/proc/self/exec is a symbolic link created by the kernel for every process which points to the binary that was executed for that process). As such when /bin/bash is executed inside the container, instead the target of /proc/self/exe will be executed - which will point to the runc binary on the host. The attacker can then proceed to write to the target of /proc/self/exe to try and overwrite the runC binary on the host. However in general, this will not succeed as the kernel will not permit it to be overwritten whilst runC is executing. To overcome this, the attacker can instead open a file descriptor to /proc/self/exe using the O_PATH flag and then proceed to reopen the binary as O_WRONLY through /proc/self/fd/<nr> and try to write to it in a busy loop from a separate process. Ultimately it will succeed when the runC binary exits. After this the runC binary is compromised and can be used to attack other containers or the host itself. This attack is only possible with privileged containers since it requires root privilege on the host to overwrite the runC binary. Unprivileged containers with a non-identity ID mapping do not have the permission to write to the host binary and therefore are unaffected by this attack. LXC is also impacted in a similar manner by this vulnerability, however as the LXC project considers privileged containers to be unsafe no CVE has been assigned for this issue for LXC. Quoting from the https://linuxcontainers.org/lxc/security/ project's Security information page: "As privileged containers are considered unsafe, we typically will not consider new container escape exploits to be security issues worthy of a CVE and quick fix. We will however try to mitigate those issues so that accidental damage to the host is prevented." To prevent this attack, LXC has been patched to create a temporary copy of the calling binary itself when it starts or attaches to containers. To do this LXC creates an anonymous, in-memory file using the memfd_create() system call and copies itself into the temporary in-memory file, which is then sealed to prevent further modifications. LXC then executes this sealed, in-memory file instead of the original on-disk binary. Any compromising write operations from a privileged container to the host LXC binary will then write to the temporary in-memory binary and not to the host binary on-disk, preserving the integrity of the host LXC binary. Also as the temporary, in-memory LXC binary is sealed, writes to this will also fail. Note: memfd_create() was added to the Linux kernel in the 3.17 release. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Co-Developed-by: Alesa Sarai <asarai@suse.de> Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> |
||
---|---|---|
.github | ||
coccinelle | ||
config | ||
doc | ||
hooks | ||
src | ||
templates | ||
.gitignore | ||
.travis.yml | ||
AUTHORS | ||
autogen.sh | ||
CODING_STYLE.md | ||
configure.ac | ||
CONTRIBUTING | ||
COPYING | ||
INSTALL | ||
lxc.pc.in | ||
lxc.spec.in | ||
MAINTAINERS | ||
Makefile.am | ||
NEWS | ||
README | ||
README.md |
LXC
LXC is the well-known and heavily tested low-level Linux container runtime. It is in active development since 2008 and has proven itself in critical production environments world-wide. Some of its core contributors are the same people that helped to implement various well-known containerization features inside the Linux kernel.
Status
Type | Service | Status |
---|---|---|
CI (Linux) | Jenkins | |
CI (Linux) | Travis | |
Project status | CII Best Practices | |
Code Quality | LGTM | |
Static Analysis | Coverity | |
System Containers
LXC's main focus is system containers. That is, containers which offer an environment as close as possible as the one you'd get from a VM but without the overhead that comes with running a separate kernel and simulating all the hardware.
This is achieved through a combination of kernel security features such as namespaces, mandatory access control and control groups.
Unprivileged Containers
Unprivileged containers are containers that are run without any privilege. This requires support for user namespaces in the kernel that the container is run on. LXC was the first runtime to support unprivileged containers after user namespaces were merged into the mainline kernel.
In essence, user namespaces isolate given sets of UIDs and GIDs. This is achieved by establishing a mapping between a range of UIDs and GIDs on the host to a different (unprivileged) range of UIDs and GIDs in the container. The kernel will translate this mapping in such a way that inside the container all UIDs and GIDs appear as you would expect from the host whereas on the host these UIDs and GIDs are in fact unprivileged. For example, a process running as UID and GID 0 inside the container might appear as UID and GID 100000 on the host. The implementation and working details can be gathered from the corresponding user namespace man page.
Since unprivileged containers are a security enhancement they naturally come with a few restrictions enforced by the kernel. In order to provide a fully functional unprivileged container LXC interacts with 3 pieces of setuid code:
- lxc-user-nic (setuid helper to create a veth pair and bridge it on the host)
- newuidmap (from the shadow package, sets up a uid map)
- newgidmap (from the shadow package, sets up a gid map)
Everything else is run as your own user or as a uid which your user owns.
In general, LXC's goal is to make use of every security feature available in the kernel. This means LXC's configuration management will allow experienced users to intricately tune LXC to their needs.
A more detailed introduction into LXC security can be found under the following link
Removing all Privilege
In principle LXC can be run without any of these tools provided the correct configuration is applied. However, the usefulness of such containers is usually quite restricted. Just to highlight the two most common problems:
-
Network: Without relying on a setuid helper to setup appropriate network devices for an unprivileged user (see LXC's
lxc-user-nic
binary) the only option is to share the network namespace with the host. Although this should be secure in principle, sharing the host's network namespace is still one step of isolation less and increases the attack vector. Furthermore, when host and container share the same network namespace the kernel will refuse any sysfs mounts. This usually means that the init binary inside of the container will not be able to boot up correctly. -
User Namespaces: As outlined above, user namespaces are a big security enhancement. However, without relying on privileged helpers users who are unprivileged on the host are only permitted to map their own UID into a container. A standard POSIX system however, requires 65536 UIDs and GIDs to be available to guarantee full functionality.
Configuration
LXC is configured via a simple set of keys. For example,
lxc.rootfs.path
lxc.mount.entry
LXC namespaces configuration keys by using single dots. This means complex
configuration keys such as lxc.net.0
expose various subkeys such as
lxc.net.0.type
, lxc.net.0.link
, lxc.net.0.ipv6.address
, and others for
even more fine-grained configuration.
LXC is used as the default runtime for LXD, a container hypervisor exposing a well-designed and stable REST-api on top of it.
Kernel Requirements
LXC runs on any kernel from 2.6.32 onwards. All it requires is a functional C compiler. LXC works on all architectures that provide the necessary kernel features. This includes (but isn't limited to):
- i686
- x86_64
- ppc, ppc64, ppc64le
- s390x
- armvl7, arm64
LXC also supports at least the following C standard libraries:
- glibc
- musl
- bionic (Android's libc)
Backwards Compatibility
LXC has always focused on strong backwards compatibility. In fact, the API
hasn't been broken from release 1.0.0
onwards. Main LXC is currently at
version 2.*.*
.
Reporting Security Issues
The LXC project has a good reputation in handling security issues quickly and efficiently. If you think you've found a potential security issue, please report it by e-mail to all of the following persons:
- serge (at) hallyn (dot) com
- stgraber (at) ubuntu (dot) com
- christian.brauner (at) ubuntu (dot) com
For further details please have a look at
Becoming Active in LXC development
We always welcome new contributors and are happy to provide guidance when
necessary. LXC follows the kernel coding conventions. This means we only
require that each commit includes a Signed-off-by
line. The coding style we
use is identical to the one used by the Linux kernel. You can find a detailed
introduction at:
and should also take a look at the CONTRIBUTING file in this repo.
If you want to become more active it is usually also a good idea to show up in
the LXC IRC channel #lxc-dev
on Freenode
. We try to do all development out
in the open and discussion of new features or bugs is done either in
appropriate GitHub issues or on IRC.
When thinking about making security critical contributions or substantial changes it is usually a good idea to ping the developers first and ask whether a PR would be accepted.
Semantic Versioning
LXC and its related projects strictly adhere to a semantic versioning scheme.
Downloading the current source code
Source for the latest released version can always be downloaded from
You can browse the up to the minute source code and change history online
Building LXC
Without considering distribution specific details a simple
./autogen.sh && ./configure && make && sudo make install
is usually sufficient.
In order to test current git master of LXC it is usually a good idea to compile with
./autogen.sh && ./configure && make
in a convenient directory and set LD_LIBRARY_PATH="${BUILD_DIR}"/lxc/src/lxc/.libs
.
Getting help
When you find you need help, the LXC projects provides you with several options.
Discuss Forum
We maintain an discuss forum at
where you can get support.
IRC
You can find support by joining #lxcontainers
on Freenode
.
Mailing Lists
You can check out one of the two LXC mailing list archives and register if interested: