While lxc-copy is under review, let users benefit (reboot survival etc.) from the
new lxc.ephemeral option already in lxc-start-ephemeral. This way we can remove
the lxc.hook.post-stop script.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
A bit of pedantry usually doesn't hurt. The code should be easier to follow now
and avoids some repetitions.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Enable aarch64 seccomp support for LXC containers running on ARM64
architectures. Tested with libseccomp 2.2.0 and the default seccomp
policy example files delivered with the LXC package.
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
On Ubuntu 15.04, lxc-start-ephemeral's call to pwd.getpwnam always
fails. While I haven't been able to prove it or track down an exact
cause, I strongly suspect that glibc does not guarantee that you can
call NSS functions after a context switch without re-execing. (Running
"id root" in a subprocess from the same point works fine.)
It's safer to use getent to extract the relevant line from the passwd
file and parse it directly.
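
A minimal sketch of the getent approach described above, not the actual patch. The helper names parse_passwd_line and getent_passwd and the PasswdEntry tuple are hypothetical; the only external command assumed is "getent passwd <user>", which prints one passwd(5) line with seven colon-separated fields.

```python
import subprocess
from collections import namedtuple

# Hypothetical record type mirroring the seven fields of a passwd(5) line.
PasswdEntry = namedtuple("PasswdEntry", "name passwd uid gid gecos home shell")


def parse_passwd_line(line):
    """Split one passwd(5) line into its seven colon-separated fields."""
    fields = line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError("malformed passwd line: %r" % line)
    name, passwd, uid, gid, gecos, home, shell = fields
    return PasswdEntry(name, passwd, int(uid), int(gid), gecos, home, shell)


def getent_passwd(user):
    """Look the user up in a getent subprocess instead of calling pwd.getpwnam().

    Raises subprocess.CalledProcessError if getent exits non-zero
    (for example when the user does not exist).
    """
    out = subprocess.check_output(["getent", "passwd", user],
                                  universal_newlines=True)
    return parse_passwd_line(out.splitlines()[0])
```

Running the lookup in a subprocess sidesteps whatever NSS state the calling process is in, which matches the observation that "id root" in a subprocess keeps working.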
Signed-off-by: Colin Watson <cjwatson@ubuntu.com>
When a container starts up, lxc sets up the container's initial fstree
by doing a bunch of mounting, guided by the container configuration
file. The container config is owned by the admin or user on the host,
so we do not try to guard against bad entries. However, since the
mount target is in the container, it's possible that the container admin
could divert the mount with symbolic links. This could bypass proper
container startup (i.e. confinement of a root-owned container by the
restrictive apparmor policy, by diverting the required write to
/proc/self/attr/current), or bypass the (path-based) apparmor policy
by diverting, say, /proc to /mnt in the container.
To prevent this,
1. do not allow mounts to paths containing symbolic links
2. do not allow bind mounts from relative paths containing symbolic
links.
Details:
Define safe_mount which ensures that the container has not inserted any
symbolic links into any mount targets for mounts to be done during
container setup.
The host's mount path may contain symbolic links. As it is under the
control of the administrator, that's ok. So safe_mount begins the check
for symbolic links after the rootfs->mount, by opening that directory.
It opens each directory along the path using openat() relative to the
parent directory using O_NOFOLLOW. When the target is reached, it
mounts onto /proc/self/fd/<targetfd>.
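
The directory walk described above can be sketched as follows. This is an illustration in Python rather than the C implementation of safe_mount; the names open_without_symlinks and proc_fd_path are hypothetical, and the mount call itself is left out because it needs privileges. Rejecting ".." components is an extra assumption of this sketch.

```python
import os


def open_without_symlinks(rootfs, target):
    """Walk `target` below `rootfs` one component at a time, refusing symlinks.

    Returns an open file descriptor for the final component; the caller
    is responsible for closing it.
    """
    # The host-side rootfs path is trusted, so symlinks in it are followed.
    dirfd = os.open(rootfs, os.O_RDONLY | os.O_DIRECTORY)
    try:
        for part in target.split("/"):
            if part in ("", "."):
                continue
            if part == "..":
                raise ValueError("'..' is not allowed in a mount target")
            # O_NOFOLLOW makes the open fail if `part` is a symbolic link;
            # dir_fd makes the lookup relative to the parent we just opened.
            nextfd = os.open(part, os.O_RDONLY | os.O_NOFOLLOW, dir_fd=dirfd)
            os.close(dirfd)
            dirfd = nextfd
    except BaseException:
        os.close(dirfd)
        raise
    return dirfd


def proc_fd_path(fd):
    """Path that refers to the already-opened target, usable as a mount target."""
    return "/proc/self/fd/%d" % fd
```

Because every component is opened relative to its already-verified parent, a symlink swapped in by the container after the check cannot redirect the final mount.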
Use safe_mount() in mount_entry(), when mounting container proc,
and when needed. In particular, safe_mount() need not be used in
any case where:
1. the mount is done in the container's namespace
2. the mount is for the container's rootfs
3. the mount is relative to a tmpfs or proc/sysfs which we have
just safe_mount()ed ourselves
Since we were using proc/net as a temporary placeholder for /proc/sys/net
during container startup, and proc/net is a symbolic link, use proc/tty
instead.
Update the lxc.container.conf manpage with details about the new
restrictions.
Finally, add a testcase to test some symbolic link possibilities.
Reported-by: Roman Fiedler
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Freeing memory when calloc() fails doesn't make sense.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CAP_BLOCK_SUSPEND (since Linux 3.5)
Employ features that can block system suspend (epoll(7) EPOLLWAKEUP, /proc/sys/wake_lock).
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CAP_AUDIT_READ (since Linux 3.16)
Allow reading the audit log via a multicast netlink socket.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
The dpkg architecture isn't relevant to LXC, only the kernel arch is.
Signed-off-by: Gergely Szasz <szaszg@hu.inter.net>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
handler->conf can't be null because we checked handler->conf->ephemeral
before calling lxc_destroy_container_on_signal()
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Since we want to use null-terminated abstract sockets, let's compute the length
of them correctly.
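
As an illustration of the length arithmetic only (not the patched C code): an abstract socket address starts with a NUL byte in sun_path, and the length passed to bind() or connect() decides how many bytes of the name count. The helper names below are hypothetical, and the offset of 2 assumes Linux, where sa_family_t is a 2-byte field at the start of struct sockaddr_un.

```python
# Assumptions (Linux): sun_path starts 2 bytes into struct sockaddr_un
# and is 108 bytes long.
SUN_PATH_OFFSET = 2
SUN_PATH_MAX = 108


def abstract_address(name, terminated=True):
    """Bytes that go into sun_path: leading NUL, the name, optional trailing NUL."""
    if isinstance(name, str):
        name = name.encode()
    addr = b"\0" + name + (b"\0" if terminated else b"")
    if len(addr) > SUN_PATH_MAX:
        raise ValueError("abstract socket name too long")
    return addr


def abstract_addr_len(name, terminated=True):
    """Address length to hand to bind()/connect() for this abstract name."""
    return SUN_PATH_OFFSET + len(abstract_address(name, terminated))
```

Both peers have to agree on whether the trailing NUL is counted, since in the abstract namespace every byte covered by the length is part of the name.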
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
systemd wants it. It doesn't seem to be a big deal, but it's
one fewer error msg.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
I've noticed that a bunch of the code we've included over the past few
weeks has been using 8-spaces rather than tabs, making it all very hard
to read depending on your tabstop setting.
This commit attempts to revert all of that back to proper tabs and fix a
few more cases I've noticed here and there.
No functional changes are included in this commit.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Otherwise the kernel will umount when it gets around to it, but
on lxc_destroy we may race with it and fail the rmdir of
the overmounted (BUSY) rootfs.
This makes lxc-test-snapshot pass for me again.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
(This *should* fix the lxc-test-snapshot testcase, but doesn't seem
to by itself.)
If it doesn't exist, we may as well start with an empty one. This
is needed when creating an overlayfs snapshot.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
We're asked to delete it, don't fail if it doesn't exist.
This stops lxc-destroy from failing when the container isn't fully
built.
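
The "don't fail if it doesn't exist" pattern, sketched for a plain file. The commit does not say which object is being deleted, so the file and the helper name remove_if_present are assumptions for illustration.

```python
import errno
import os


def remove_if_present(path):
    """Delete `path`; treat "already gone" as success.

    Returns True if something was removed, False if it did not exist.
    Any other error (permissions, busy, ...) is still raised.
    """
    try:
        os.unlink(path)
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise
        return False
    return True
```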
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
On shutdown ephemeral containers will be destroyed. We use mod_all_rdeps() from
lxccontainer.c to update the lxc_snapshots file of the original container. We
also include lxclock.h to lock the container when mod_all_rdeps() is called to
avoid races.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Here's some more config options that we do actually require to be able to
boot containers.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
I have no idea what this file is, but the build system seems to be
generating it, so let's ignore it.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Use pwrite() to write terminating \0-byte
This allows us to use standard string handling functions and we can avoid using
the GNU-extension memmem(). This simplifies removing the container from the
lxc_snapshots file. Wrap strstr() in a while loop to remove duplicate entries.
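
The strstr() loop can be pictured like this. It is a Python analogue, not the C code: it assumes the lxc_snapshots contents are already in memory as bytes and that an entry is stored as "<lxcpath>\n<name>\n"; the helper name remove_all_entries is hypothetical. The pwrite() step is specific to C string handling and is not reproduced here.

```python
def remove_all_entries(contents, entry):
    """Remove every occurrence of `entry` from `contents` (both bytes).

    Searching again from the start after each removal mirrors wrapping
    strstr() in a while loop, so duplicate entries are removed as well.
    """
    if not entry:
        raise ValueError("entry must not be empty")
    pos = contents.find(entry)
    while pos != -1:
        contents = contents[:pos] + contents[pos + len(entry):]
        pos = contents.find(entry)
    return contents
```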
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
When lxc.ephemeral is set to 1 in the container's config, the container will be
destroyed on shutdown.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
static do_bdev_destroy() and bdev_destroy_wrapper() from lxccontainer.c become
public bdev_destroy() and bdev_destroy_wrapper() in bdev.c and bdev.h
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Closes #655
We can't rsync the delta as unpriv user because we can't create
the chardevs representing a whiteout. We can however rsync the
rootfs and have the kernel create the whiteouts for us.
do_rsync: pass --delete
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Test edge cases (removing first and last entries in lxc_snapshots and the very
last snapshot) and make sure original container isn't destroyed while there are
snapshots, and is when there are none.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Newer kernels have added a new restriction: if /proc or /sys on the
host has files or non-empty directories which are over-mounted, and
there is no /proc which is fully visible, then the kernel assumes there is
a "security" reason for this. It prevents anyone in a non-initial user
namespace from creating a new proc or sysfs mount.
To work around this, this patch adds a new 'nesting.conf' which can be
lxc.include'd from a container configuration file. It adds a
non-overmounted mount of /proc and /sys under /dev/.lxc, so that the
kernel can see that we're not trying to *hide* things like /proc/uptime
and /sys/devices/virtual/net. If the host adds this to the config file
for container w1, then container w1 will support unprivileged child
containers.
The nesting.conf file also sets the apparmor profile to the with-nesting
variant, since that is required anyway. This actually means that
supporting nesting isn't really more work than it used to be, just
different. Instead of adding
lxc.aa_profile = lxc-container-default-with-nesting
you now just need to add
lxc.include = /usr/share/lxc/config/nesting.conf
(Look, fewer characters :)
Finally, in order to maintain the current apparmor protections on
proc and sys, we make /dev/.lxc/{proc,sys} non-read/writeable.
We don't need to be able to use them, we're just showing the
kernel what's what.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>