The lxc monitor does not store the container's cgroups, rather it
recalculates them whenever needed.
Systemd moves itself into a /init.scope cgroup for the systemd
controller.
It might be worth changing that (by storing all cgroup info in the
lxc_handler), but for now go the hacky route and chop off any
trailing /init.scope.
I definately thinkg we want to switch to storing as that will be
more bullet-proof, but for now we need a quick backportable fix
for systemd 226 guests.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
The mount_entry_create_*_dirs() functions currently assume that the rootfs of
the container is actually named "rootfs". This has the consequence that
del = strstr(lxcpath, "/rootfs");
if (!del) {
free(lxcpath);
lxc_free_array((void **)opts, free);
return -1;
}
*del = '\0';
will return NULL when the rootfs of a container is not actually named "rootfs".
This means the we return -1 and do not create the necessary upperdir/workdir
directories required for the overlay/aufs mount to work. Hence, let's not make
that assumption. We now pass lxc_path and lxc_name to
mount_entry_create_*_dirs() and create the path directly. To prevent failure we
also have mount_entry_create_*_dirs() check that lxc_name and lxc_path are not
empty when they are passed in.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
It causes trouble when importing from different paths and will always be
included ahead of time anyway.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
When users wanted to mount overlay directories with lxc.mount.entry they had to
create upperdirs and workdirs beforehand in order to mount them. To create it
for them we add the functions mount_entry_create_overlay_dirs() and
mount_entry_create_aufs_dirs() which do this for them. User can now simply
specify e.g.:
lxc.mount.entry = /lower merged overlay lowerdir=/lower,upper=/upper,workdir=/workdir,create=dir
and /upper and /workdir will be created for them. /upper and /workdir need to
be absolute paths to directories which are created under the containerdir (e.g.
under $lxcpath/CONTAINERNAME/). Relative mountpoints, mountpoints outside the
containerdir, and mountpoints within the container's rootfs are ignored. (The
latter *might* change in the future should it be considered safe/useful.)
Specifying
lxc.mount.entry = /lower merged overlay lowerdir=/lower:/lower2,create=dir
will lead to a read-only overlay mount in accordance with the
kernel-documentation.
Specifying
lxc.mount.entry = /lower merged overlay lowerdir=/lower,create=dir
will fail when no upperdir and workdir options are given.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
The default_mounts[i].destination is never NULL except in the last
'stop here' entry. Coverity doesn't know about that and so is spewing
a warning. In any case, let's add a more stringent check in case someone
accidentally adds a NULL there later.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
This would have caught the regression last night.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
While lxc-copy is under review let users benefit (reboot survival etc.) from the
new lxc.ephemeral option already in lxc-start-ephemeral. This way we can remove
the lxc.hook.post-stop script-
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
A bit of pedantry usually doesn't hurt. The code should be easier to follow now
and avoids some repetitions.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Enable aarch64 seccomp support for LXC containers running on ARM64
architectures. Tested with libseccomp 2.2.0 and the default seccomp
policy example files delivered with the LXC package.
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
On Ubuntu 15.04, lxc-start-ephemeral's call to pwd.getpwnam always
fails. While I haven't been able to prove it or track down an exact
cause, I strongly suspect that glibc does not guarantee that you can
call NSS functions after a context switch without re-execing. (Running
"id root" in a subprocess from the same point works fine.)
It's safer to use getent to extract the relevant line from the passwd
file and parse it directly.
Signed-off-by: Colin Watson <cjwatson@ubuntu.com>
When a container starts up, lxc sets up the container's inital fstree
by doing a bunch of mounting, guided by the container configuration
file. The container config is owned by the admin or user on the host,
so we do not try to guard against bad entries. However, since the
mount target is in the container, it's possible that the container admin
could divert the mount with symbolic links. This could bypass proper
container startup (i.e. confinement of a root-owned container by the
restrictive apparmor policy, by diverting the required write to
/proc/self/attr/current), or bypass the (path-based) apparmor policy
by diverting, say, /proc to /mnt in the container.
To prevent this,
1. do not allow mounts to paths containing symbolic links
2. do not allow bind mounts from relative paths containing symbolic
links.
Details:
Define safe_mount which ensures that the container has not inserted any
symbolic links into any mount targets for mounts to be done during
container setup.
The host's mount path may contain symbolic links. As it is under the
control of the administrator, that's ok. So safe_mount begins the check
for symbolic links after the rootfs->mount, by opening that directory.
It opens each directory along the path using openat() relative to the
parent directory using O_NOFOLLOW. When the target is reached, it
mounts onto /proc/self/fd/<targetfd>.
Use safe_mount() in mount_entry(), when mounting container proc,
and when needed. In particular, safe_mount() need not be used in
any case where:
1. the mount is done in the container's namespace
2. the mount is for the container's rootfs
3. the mount is relative to a tmpfs or proc/sysfs which we have
just safe_mount()ed ourselves
Since we were using proc/net as a temporary placeholder for /proc/sys/net
during container startup, and proc/net is a symbolic link, use proc/tty
instead.
Update the lxc.container.conf manpage with details about the new
restrictions.
Finally, add a testcase to test some symbolic link possibilities.
Reported-by: Roman Fiedler
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Freeing memory when calloc() fails doesn't make sense
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CAP_BLOCK_SUSPEND (since Linux 3.5)
Employ features that can block system suspend (epoll(7) EPOLLWAKEUP, /proc/sys/wake_lock).
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CAP_AUDIT_READ (since Linux 3.16)
Allow reading the audit log via a multicast netlink socket.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
The dpkg architecture isn't relevant to LXC, only the kernel arch is.
Signed-off-by: Gergely Szasz <szaszg@hu.inter.net>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
handler->conf can't be null bc we checked handler->conf->epheemral
before calling lxc_destroy_container_on_signal()
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Since we want to use null-terminated abstract sockets, let's compute the length
of them correctly.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
systemd wants it. It doesn't seem to be a big deal, but it's
one fewer error msg.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
I've noticed that a bunch of the code we've included over the past few
weeks has been using 8-spaces rather than tabs, making it all very hard
to read depending on your tabstop setting.
This commit attempts to revert all of that back to proper tabs and fix a
few more cases I've noticed here and there.
No functional changes are included in this commit.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>