Commit Graph

3598 Commits

Author SHA1 Message Date
Christian Brauner
7111ed68cb Make lxc-start-ephemeral use lxc.ephemeral
While lxc-copy is under review let users benefit (reboot survival etc.) from the
new lxc.ephemeral option already in lxc-start-ephemeral. This way we can remove
the lxc.hook.post-stop script-

Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-10-05 11:49:03 +01:00
Christian Brauner
4f64d0db3f Cleanup parts of lxc-destroy
A bit of pedantry usually doesn't hurt. The code should be easier to follow now
and avoids some repetitions.

Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-10-05 11:47:19 +01:00
Christian Brauner
4e6eb26bf0 Add lxc.ephemeral to lxc.container.conf manpage
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-10-05 11:47:16 +01:00
Bogdan Purcareata
9d291dd226 seccomp: add aarch64 support
Enable aarch64 seccomp support for LXC containers running on ARM64
architectures. Tested with libseccomp 2.2.0 and the default seccomp
policy example files delivered with the LXC package.

Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-10-05 11:47:12 +01:00
Stéphane Graber
570bea4eed Merge pull request #666 from Ponce/slackware-template
Add a slackware template.
2015-09-30 13:58:01 -04:00
Stéphane Graber
e209cddb66 Merge pull request #667 from cjwatson/ephemeral-parse-passwd
lxc-start-ephemeral: Parse passwd directly
2015-09-30 13:56:50 -04:00
Colin Watson
c6be89f857 lxc-start-ephemeral: Parse passwd directly
On Ubuntu 15.04, lxc-start-ephemeral's call to pwd.getpwnam always
fails.  While I haven't been able to prove it or track down an exact
cause, I strongly suspect that glibc does not guarantee that you can
call NSS functions after a context switch without re-execing.  (Running
"id root" in a subprocess from the same point works fine.)

It's safer to use getent to extract the relevant line from the passwd
file and parse it directly.

Signed-off-by: Colin Watson <cjwatson@ubuntu.com>
2015-09-30 13:52:32 +01:00
Stéphane Graber
4928c7186c
Define O_PATH and O_NOFOLLOW for Android
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2015-09-29 14:59:28 -04:00
Matteo Bernardini
3a05a669c1 Add a slackware template.
Requires pkgtools and slackpkg (from the slackware-current tree).

Signed-off-by: Matteo Bernardini <ponce@slackbuilds.org>
2015-09-29 17:35:25 +02:00
Serge Hallyn
592fd47a62 CVE-2015-1335: Protect container mounts against symlinks
When a container starts up, lxc sets up the container's inital fstree
by doing a bunch of mounting, guided by the container configuration
file.  The container config is owned by the admin or user on the host,
so we do not try to guard against bad entries.  However, since the
mount target is in the container, it's possible that the container admin
could divert the mount with symbolic links.  This could bypass proper
container startup (i.e. confinement of a root-owned container by the
restrictive apparmor policy, by diverting the required write to
/proc/self/attr/current), or bypass the (path-based) apparmor policy
by diverting, say, /proc to /mnt in the container.

To prevent this,

1. do not allow mounts to paths containing symbolic links

2. do not allow bind mounts from relative paths containing symbolic
links.

Details:

Define safe_mount which ensures that the container has not inserted any
symbolic links into any mount targets for mounts to be done during
container setup.

The host's mount path may contain symbolic links.  As it is under the
control of the administrator, that's ok.  So safe_mount begins the check
for symbolic links after the rootfs->mount, by opening that directory.

It opens each directory along the path using openat() relative to the
parent directory using O_NOFOLLOW.  When the target is reached, it
mounts onto /proc/self/fd/<targetfd>.

Use safe_mount() in mount_entry(), when mounting container proc,
and when needed.  In particular, safe_mount() need not be used in
any case where:

1. the mount is done in the container's namespace
2. the mount is for the container's rootfs
3. the mount is relative to a tmpfs or proc/sysfs which we have
   just safe_mount()ed ourselves

Since we were using proc/net as a temporary placeholder for /proc/sys/net
during container startup, and proc/net is a symbolic link, use proc/tty
instead.

Update the lxc.container.conf manpage with details about the new
restrictions.

Finally, add a testcase to test some symbolic link possibilities.

Reported-by: Roman Fiedler
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-09-29 09:52:55 -04:00
Christian Brauner
f2e4dddd71 Remove unnecessary call to free()
Freeing memory when calloc() fails doesn't make sense

Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-09-28 17:05:00 -04:00
Kaarle Ritvanen
5afb809607 lxc-alpine: use getopt to parse options
Signed-off-by: Kaarle Ritvanen <kaarle.ritvanen@datakunkku.fi>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-09-28 15:50:01 -04:00
Kaarle Ritvanen
0b8cdc1034 lxc-alpine: avoid GNU BRE extensions for better portability
Signed-off-by: Kaarle Ritvanen <kaarle.ritvanen@datakunkku.fi>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-09-28 15:49:59 -04:00
Christian Brauner
196a808645 Free allocated memory on failure (v2)
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-09-28 15:48:36 -04:00
Christian Brauner
2b54359b24 Add CAP_BLOCK_SUSPEND
CAP_BLOCK_SUSPEND (since Linux 3.5)
    Employ features that can block system suspend (epoll(7) EPOLLWAKEUP, /proc/sys/wake_lock).

Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-09-28 15:47:25 -04:00
Christian Brauner
57b837e247 Add CAP_AUDIT_READ
CAP_AUDIT_READ (since Linux 3.16)
    Allow reading the audit log via a multicast netlink socket.

Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-09-28 15:47:22 -04:00
Christian Brauner
d539a2b2a6 Check return value of snprintf in mount_proc_if_needed()
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-09-28 15:47:20 -04:00
Christian Brauner
793e387a9f Check return value of snprintf
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-09-28 15:47:18 -04:00
Stéphane Graber
4963978bb6
lxc-debian: We should only check the kernel architecture.
The dpkg architecture isn't relevant to LXC, only the kernel arch is.

Signed-off-by: Gergely Szasz <szaszg@hu.inter.net>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-09-28 15:43:10 -04:00
Serge Hallyn
df31363a0f coverity: remove useless check
handler->conf can't be null bc we checked handler->conf->epheemral
before calling lxc_destroy_container_on_signal()

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2015-09-26 16:03:53 -05:00
Serge Hallyn
d7c8805c10 coverity: drop second (redundant) block
Don't proceed to try the mount if we failed to create the
target if it didn't exist.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2015-09-26 14:44:40 -05:00
Tycho Andersen
6f2944c172 cmds: fix abstract socket length problem
Since we want to use null-terminated abstract sockets, let's compute the length
of them correctly.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-09-25 18:52:32 -04:00
Serge Hallyn
8fafe2de03 ubuntu.common.conf: mount /dev/mqueue
systemd wants it.  It doesn't seem to be a big deal, but it's
one fewer error msg.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-09-25 18:52:29 -04:00
Stéphane Graber
d028235de9
Fix indentation
I've noticed that a bunch of the code we've included over the past few
weeks has been using 8-spaces rather than tabs, making it all very hard
to read depending on your tabstop setting.

This commit attempts to revert all of that back to proper tabs and fix a
few more cases I've noticed here and there.

No functional changes are included in this commit.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2015-09-21 16:25:47 -04:00
Serge Hallyn
270261b90e ovl_rsync: make sure to umount
Otherwise the kernel will umount when it gets around to it, but
that on lxc_destroy we may race with it and fail the rmdir of
the overmounted (BUSY) rootfs.

This makes lxc-test-snapshot pass for me again.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-09-21 15:43:38 -04:00
Serge Hallyn
a93488dfcb overlayfs_mount: create delta dir if it doesn't exist
(This *should* fix the lxc-test-snapshot testcase, but doesn't seem
to by itself.)

If it doesn't exist, we may as well start with an empty one.  This
is needed when creating an overlayfs snapshot.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-09-21 13:25:28 -04:00
Serge Hallyn
067650d0b6 lxc_rmdir_onedev: don't fail if path doesn't exist
We're asked to delete it, don't fail if it doesn't exist.

This stops lxc-destroy from failing when the container isn't fully
built.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-09-21 13:25:25 -04:00
Christian Brauner
776d170c5e Make ephemeral containers survive reboots
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-09-21 13:25:23 -04:00
Christian Brauner
f01f7975c1 Remove ephemeral containers from lxc_snapshots
On shutdown ephemeral containers will be destroyed. We use mod_all_rdeps() from
lxccontainer.c to update the lxc_snapshots file of the original container. We
also include lxclock.h to lock the container when mod_all_rdeps() is called to
avoid races.

Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-09-21 11:47:41 -04:00
Christian Brauner
d825fff3ca Make mod_all_rdeps() public It will now also be called from start.c
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-09-21 11:47:37 -04:00
Tycho Andersen
aee755ee52 lxc-checkconfig: add some more config options
Here's some more config options that we do actually require to be able to
boot containers.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-09-21 11:45:01 -04:00
Tycho Andersen
80a706b361 gitignore: add Korean man page output
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-09-21 11:43:16 -04:00
Tycho Andersen
3b63f6e487 gitignore: add strange lxc@.service file
I have no idea what this file is, but the build system seems to be
generating it, so let's ignore it.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-09-21 11:43:09 -04:00
Christian Brauner
42342bed25 Ensure that mmap()ed memory is \0-terminated (v3)
Use pwrite() to write terminating \0-byte

This allows us to use standard string handling functions and we can avoid using
the GNU-extension memmem(). This simplifies removing the container from the
lxc_snapshots file. Wrap strstr() in a while loop to remove duplicate entries.

Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-09-21 11:41:43 -04:00
Stephane Nguyen
af651aa9e1 Fixing MTU calculation in instantiate_veth()]
Signed-off-by: Stephane Nguyen <stephminh@yahoo.es>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-09-21 11:40:57 -04:00
Christian Brauner
28272964d4 Enable lxc_fini() to destroy container on shutdown
When lxc.ephemeral is set to 1 in the containers config it will be destroyed on
shutdown.

Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-09-21 11:40:15 -04:00
Christian Brauner
297c2d5893 Destroy bdevs using bdev_destroy() from bdev.h
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-09-21 11:40:11 -04:00
Christian Brauner
339c6f1fb4 Add bdev_destroy() and bdev_destroy_wrapper()
static do_bdev_destroy() and bdev_destroy_wrapper() from lxccontainer.c become
public bdev_destroy() and bdev_destroy_wrapper() in bdev.c and bdev.h

Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-09-21 11:40:06 -04:00
Christian Brauner
8796becf36 Add lxc.ephemeral lxc.ephemeral indicates whether a container will be destroyed on shutdown Can be 0 for non-ephemeral and 1 for ephemeral.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-09-21 11:40:02 -04:00
Serge Hallyn
186bef0024 overlayfs_clone: rsync the mounted rootfs
Closes #655

We can't rsync the delta as unpriv user because we can't create
the chardevs representing a whiteout.  We can however rsync the
rootfs and have the kernel create the whiteouts for us.

do_rsync: pass --delete

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-09-21 11:37:31 -04:00
Christian Brauner
ffe9a25a03 Fix reallocation calculation
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-09-21 11:36:49 -04:00
Serge Hallyn
c4532a2036 Add tests for snapshot clone dependencies
Test edge cases (removing first and last entries in lxc_snapshots and the very
last snapshot) and make sure original container isn't destroyed while there are
snapshots, and is when there are none.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-09-21 11:35:32 -04:00
Serge Hallyn
108b88ce31 Add a nesting.conf which can be included to support nesting containers (v2)
Newer kernels have added a new restriction:  if /proc or /sys on the
host has files or non-empty directories which are over-mounted, and
there is no /proc which fully visible, then it assumes there is a
"security" reason for this.  It prevents anyone in a non-initial user
namespace from creating a new proc or sysfs mount.

To work around this, this patch adds a new 'nesting.conf' which can be
lxc.include'd from a container configuration file.  It adds a
non-overmounted mount of /proc and /sys under /dev/.lxc, so that the
kernel can see that we're not trying to *hide* things like /proc/uptime.
and /sys/devices/virtual/net.  If the host adds this to the config file
for container w1, then container w1 will support unprivileged child
containers.

The nesting.conf file also sets the apparmor profile to the with-nesting
variant, since that is required anyway.  This actually means that
supporting nesting isn't really more work than it used to be, just
different.  Instead of adding

lxc.aa_profile = lxc-container-default-with-nesting

you now just need to

lxc.include = /usr/share/lxc/config/nesting.conf

(Look, fewer characters :)

Finally, in order to maintain the current apparmor protections on
proc and sys, we make /dev/.lxc/{proc,sys} non-read/writeable.
We don't need to be able to use them, we're just showing the
kernel what's what.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-09-21 11:35:29 -04:00
Sungbae Yoo
76072aec5c doc: Update Korean lxc-snapshot(1) for newname option
Update for commit dedd4f6

Signed-off-by: Sungbae Yoo <sungbae.yoo@samsung.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-09-21 11:32:07 -04:00
Sungbae Yoo
5033e12328 doc: Add lxc.init_(uid|gid) in Korean lxc.container.conf(5)
update for commit dbca923

Signed-off-by: Sungbae Yoo <sungbae.yoo@samsung.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-09-21 11:31:43 -04:00
Sungbae Yoo
3703aa9e73 doc: Update Korean lxc.cgroup.use in lxc.system.conf(5)
Update for commit 2d8632d

Signed-off-by: Sungbae Yoo <sungbae.yoo@samsung.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-09-21 11:31:22 -04:00
Sungbae Yoo
0a05624e82 doc: Add the rename option to lxc-clone(1) in Korean manual
Update for commit 585f3c6

Signed-off-by: Sungbae Yoo <sungbae.yoo@samsung.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-09-21 11:30:45 -04:00
Sungbae Yoo
b7349f15f7 doc: Add LXC-specific mount option in Korean lxc.container.conf(5)
Update for commit f5b67b3

Signed-off-by: Sungbae Yoo <sungbae.yoo@samsung.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-09-21 11:30:14 -04:00
KATOH Yasufumi
7c3d395052 doc: Update Japanese lxc-snapshot(1) for newname option
Update for commit dedd4f6

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-09-21 11:29:49 -04:00
KATOH Yasufumi
7ee64c0f21 doc: Add lxc.init_(uid|gid) in Japanese lxc.container.conf(5)
update for commit dbca923

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-09-21 11:29:25 -04:00