Commit Graph

2272 Commits

Author SHA1 Message Date
Serge Hallyn
13353dc420 daemonized start: exit children on failure, don't return
When starting a daemonized container, only the original parent
thread should return to the caller.  The first forked child
immediately exits after forking, but the grandchild
was in some places returning on error, causing a second instance
of the calling function to run.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
2015-06-12 16:11:53 -05:00
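
The rule described in 13353dc420 (only the original caller returns, every forked child leaves through exit) can be sketched as follows. This is a minimal illustration, not the liblxc code: `start_daemonized`, `always_fails` and the callback signature are hypothetical names chosen for the example.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not liblxc code: run work() in a daemonized
 * grandchild. Only the original caller ever returns from this function.
 * Returns 0 if the first child was forked and reaped cleanly, -1 otherwise.
 */
int start_daemonized(int (*work)(void))
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid > 0) {
		/* Original parent: reap the short-lived first child. */
		int status;

		if (waitpid(pid, &status, 0) < 0)
			return -1;
		return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
	}

	/* First child: fork again and exit at once. */
	pid = fork();
	if (pid < 0)
		_exit(EXIT_FAILURE);
	if (pid > 0)
		_exit(EXIT_SUCCESS);

	/* Grandchild: every path, including errors, ends in _exit(). */
	if (setsid() < 0)
		_exit(EXIT_FAILURE);
	_exit(work() == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
}

/* Example callback that fails; the failure must not reach the caller. */
int always_fails(void)
{
	return -1;
}
```

Calling `start_daemonized(always_fails)` returns once, in the original process. A `return` in the grandchild's error path would instead resume the caller's code a second time in a different process.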
Tycho Andersen
69aeabac1a uniformly nullify std fds
In various places throughout the code, we want to "nullify" the std fds,
opening them to /dev/null or zero or so. Instead, let's unify this code and do
it in such a way that Coverity (probably) won't complain.

v2: use /dev/null for stdin as well
v3: add a comment about use of C's short circuiting
v4: axe comment, check errors on dup2, s/quiet/need_null_stdfds

Reported-by: Coverity
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2015-06-10 23:04:51 -05:00
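
A unified helper of the kind 69aeabac1a describes might look roughly like the sketch below. The name `null_fds` and its array-based signature are assumptions made for illustration and are not taken from the patch. What it does show is checking the result of every `dup2()` call, as the v4 note mentions.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: point each descriptor in fds[] at /dev/null.
 * Returns 0 on success, -1 if open() or any dup2() fails.
 */
int null_fds(const int *fds, size_t nfds)
{
	int ret = 0, keep = 0;
	int devnull = open("/dev/null", O_RDWR);

	if (devnull < 0)
		return -1;

	for (size_t i = 0; i < nfds; i++) {
		if (fds[i] == devnull) {
			keep = 1;	/* already /dev/null; must not be closed below */
			continue;
		}
		if (dup2(devnull, fds[i]) < 0) {
			ret = -1;
			break;
		}
	}

	if (!keep)
		close(devnull);
	return ret;
}
```

For the standard descriptors, a caller would pass `{STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO}` and a count of 3.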
Tycho Andersen
5b72de5fd3 move utils.h #endif to end of file
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2015-06-10 23:04:47 -05:00
Tycho Andersen
bd9e78f570 c/r: remove unused variable mnts
Reported-by: Coverity
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2015-06-10 23:04:45 -05:00
Tycho Andersen
3158ab5b9e c/r: use fclose instead of close
We're leaking the FILE* here while closing the underlying fd; let's just
close the file and thus close both.

Reported-by: Coverity
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2015-06-10 23:04:43 -05:00
Daniel Golle
f58ad87a3f fix build on mpc85xx
Initialize ret to 0 so compiler no longer complains about
monitor.c: In function 'lxc_monitor_open':
monitor.c:212:5: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized]

https://github.com/openwrt/packages/issues/1356

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
2015-06-09 12:58:12 +02:00
Serge Hallyn
d9b32b0900 coverity: don't risk exec()ing NULL
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2015-06-08 10:37:55 -05:00
Serge Hallyn
17d252a822 coverity: fix use-after-free in cgmanager.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2015-06-08 10:33:22 -05:00
Stéphane Graber
212bc24189 Fix bdev.h
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2015-06-03 21:45:23 -04:00
Stéphane Graber
c2af52cf52 Revert bdev.h to the way it was
Instead of re-defining MS_ options all over the place, just revert the
last change to bdev.h so we have all the defines in there again.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2015-06-03 19:37:59 -04:00
Stéphane Graber
54c0610037 Define MS_RELATIME for Android
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2015-06-03 17:08:11 -04:00
Stéphane Graber
c37ebdc49a Define MS_REC and MS_SLAVE for Android in bdev.c
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2015-06-03 15:07:08 -04:00
Serge Hallyn
a70a69e8a0 don't dereference a NULL c->lxc_conf
Commit 37cf711b added a destroy hook, but when it checks
at destroy time whether that hook exists, it assumes that
c->lxc_conf is good.  In fact lxc_conf can be NULL, so check
for that.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-06-03 14:09:51 -04:00
Tycho Andersen
755fa45300 don't hardcode the path to criu when checking versions
We use the right path when actually execing criu to checkpoint and restore, but
when checking versions we didn't. Let's use the right path.

Reported-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-06-03 10:37:30 -04:00
Serge Hallyn
a041127564 detect whether cgmanager_list_controllers is available
and don't use it if not. This fixes failure to build with older
cgmanager.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-06-03 10:37:27 -04:00
Serge Hallyn
454ec0abc7 api_start: always close fds 0-2 when daemonized
commit 507cee3618 moved the close and re-open of fds 0-2 into
do_start.  But this means that the lxc monitor itself keeps the
caller's fds 0-2 open, which is wrong for daemonized containers.

Closes #548

Reported-by: Mathieu Le Marec - Pasquet <kiorky@cryptelium.net>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-06-02 19:05:37 -04:00
Serge Hallyn
27be573155 cgmanager: attach: never use 'all' controller
We were using the 'all' controller if the current task was in the
same cgroup for every controller.  That doesn't suffice: we'd also
have to check the target.  At that point we may as well just attach
controller by controller.

An optimization to consider is to check the /proc/initpid/cgroup
for all identical controllers.  Let's start by just getting it
right.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-06-02 19:05:22 -04:00
Tycho Andersen
59c2d40689 c/r: remember to clean up pidfile
When restoring, we didn't clean up the pidfile that criu uses to pass us the
init pid on error or success; let's do that.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-06-02 19:04:23 -04:00
Stéphane Graber
d24095e46a Fix ABI compatibility
Until we bump the SONAME to liblxc2, only symbol additions and struct
member additions are allowed.

Adding struct members in the middle of the struct breaks backward
compatibility.

This commit makes it clear when struct members were added and moves a
few members that were added in the middle of the 1.0 struct to the end
of it.

Note that unfortunately that means we're breaking backward compatibility
between LXC 1.1.0 and the state after this commit.  Given that 1.1 is
reasonably new, this is the least damaging way of fixing the problem.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-06-02 19:04:20 -04:00
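
Why a member added mid-struct breaks old binaries, as d24095e46a explains, can be shown with `offsetof`. The structs below are hypothetical and far smaller than the real `struct lxc_container`. The sketch also assumes the library, not the caller, allocates the struct, so its total size may grow.

```c
#include <stddef.h>

/* Hypothetical layout as shipped in the first release. */
struct api_v1 {
	char *name;
	int (*start)(struct api_v1 *c);
	int (*stop)(struct api_v1 *c);
};

/* Breaks old callers: 'freeze' shifts the offset of 'stop'. */
struct api_v2_broken {
	char *name;
	int (*start)(struct api_v2_broken *c);
	int (*freeze)(struct api_v2_broken *c);
	int (*stop)(struct api_v2_broken *c);
};

/* Compatible: existing members keep their offsets, the new one goes last. */
struct api_v2_ok {
	char *name;
	int (*start)(struct api_v2_ok *c);
	int (*stop)(struct api_v2_ok *c);
	/* Added in the (hypothetical) second release. */
	int (*freeze)(struct api_v2_ok *c);
};

/* Returns 1 if a binary built against v1 still finds 'stop' in v2_ok. */
int stop_offset_preserved(void)
{
	return offsetof(struct api_v2_ok, stop) == offsetof(struct api_v1, stop);
}
```

A program compiled against `api_v1` that calls `c->stop(c)` would jump through the `freeze` slot if handed an `api_v2_broken` object.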
KATOH Yasufumi
31a882ef3a aufs: Support unprivileged clone, mount
Current aufs supports FS_USERNS_MOUNT by using allow_userns module
parameter. It allows root in userns to mount aufs.

This patch allows an unprivileged container to use aufs. The value of
xino option is changed to /dev/shm/aufs.xino that an unpriv user can
write.

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-05-26 16:01:28 -04:00
Serge Hallyn
fe44788608 proc update - don't assume we are pid 1
(I erred in the first patch, causing every lxc-attach to unmount the
container's /proc)

Since we now use mount_proc_if_needed() from attach, as opposed to only
from start, we cannot assume we are pid 1.  So fix the check for whether
to mount a new proc.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-05-26 13:07:01 -04:00
Serge Hallyn
ced03a017b attach: mount a sane proc for LSM setup
To set lsm labels, a namespace-local proc mount is needed.

If a container does not have a lxc.mount.auto = proc set, then
tasks in the container do not have a correct /proc mount until
init feels like doing the mount.  At startup we handle this
by mounting a temporary /proc if needed.  We weren't doing this
at attach, though, so that

lxc-start -n $container
lxc-wait -t 5 -s RUNNING -n $container
lxc-attach -n $container -- uname -a

could in a racy way fail with something like

lxc-attach: lsm/apparmor.c: apparmor_process_label_set: 183 No such file or directory - failed to change apparmor profile to lxc-container-default

Thanks to Chris Townsend for finding this bug at
https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1452451

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-05-26 13:06:59 -04:00
Tycho Andersen
4eae405138 c/r: complain when criu isn't exec()'d correctly
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-05-26 11:16:29 -04:00
Serge Hallyn
7f7948206b Use 'cgm listcontrollers' list rather than /proc/self/cgroups
to populate the list of subsystems to use.

Cgmanager can be started with some subsystems disabled (e.g.
cgmanager -M cpuset).  If lxc using cgmanager then uses the
/proc/self/cgroup output to determine which controllers to use,
it will fail when trying to do things to cpuset.  Instead, ask
cgmanager which controllers to use.

This still defers (per patch 1/1) to the lxc.cgroup.use values.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-05-26 11:14:27 -04:00
Serge Hallyn
cb6d63a7aa make cgmanager follow lxc.cgroup.use
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-05-26 11:14:25 -04:00
Serge Hallyn
4295c5de9e lxc-destroy: remove btrfs subvolumes
Doing this requires some btrfs functions from bdev to be used in
utils.c.  Because utils.h is imported by lxc_init.c, I had to create
a new initutils.[ch] which are used by both lxc_init.c and utils.c.
We could instead put the btrfs functions into utils.c, which would
be a shorter patch, but they really don't belong there.  So I went
the other way, figuring there may be more such cases coming up of
fns in utils.c needing code from bdev.c which can't go into lxc_init.

Currently, if we detect a btrfs subvolume we just remove it.  The
st_dev on that dir is different, so we cannot easily detect if this is
bound in from another fs.  If we care, we should check
whether this is a mountpoint; this patch doesn't do that.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-05-26 11:14:22 -04:00
Stéphane Graber
1e2eb3f4e6 Merge pull request #536 from regit/passthru-v1.2
Passthru v1.2
2015-05-25 11:51:07 -04:00
Stéphane Graber
fc2d798a90 Merge pull request #522 from ysbnim/master
config : add lxc.hook.destroy option
2015-05-25 11:07:10 -04:00
Stéphane Graber
378da5aa9f Merge pull request #526 from Azendale/master
Change lxc-clone to use 'rsync -aH' instead of just 'rsync -a'
2015-05-25 11:06:07 -04:00
Stéphane Graber
02d25a9ea5 Easy to read timestamp in log
Signed-off-by: Gyeongmin Kim <gyeongmintwo@gmail.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-05-25 11:04:33 -04:00
Eric Leblond
9985416197 macvlan: add 'passthru' mode
In setups where we want to sniff with an IDS from inside a container
we can use the 'passthru' mode of macvlan. This was not accessible
from the config and this patch fixes the issue.

Signed-off-by: Eric Leblond <eric@regit.org>
2015-05-23 17:53:20 +02:00
Serge Hallyn
a73077478d coverity: free 'result' in error case.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2015-05-17 07:30:57 -05:00
Erik B. Andersen
6a69816295 Change lxc-clone to use 'rsync -aH' instead of just 'rsync -a' for cloning to fix Launchpad Bug #1441307.
Signed-off-by: Erik B. Andersen <erik.b.andersen@gmail.com>
2015-05-14 21:39:57 -07:00
Sungbae Yoo
37cf711b28 config : add lxc.hook.destroy option
Signed-off-by: Sungbae Yoo <sungbae.yoo@samsung.com>
2015-05-14 09:00:35 +09:00
Stéphane Graber
6ad27c4282 Merge pull request #504 from thmo/lua53
Fix Lua 5.3 compatibility code.
2015-05-11 18:09:54 +00:00
Stéphane Graber
ae829be398 Merge pull request #498 from brauner/master
Make lxc-checkconfig work with kernel versions > 3
2015-05-11 18:03:09 +00:00
Kien Truong
365d180a39 Properly free memory of sorted cgroup settings
We need to use lxc_list_for_each_safe, otherwise de-allocation
will fail with a list size bigger than 2. The pointer to the head
of the list also needs freeing after we've freed all other elements
of the list.

Signed-off-by: Kien Truong <duckientruong@gmail.com>
2015-05-05 00:22:00 +01:00
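
The pattern behind 365d180a39 is to save the next pointer before freeing the current node, then free the head last. The list below is a minimal stand-in written for this example. It is not liblxc's `struct lxc_list`, and the function names are hypothetical.

```c
#include <stdlib.h>

/* Minimal circular list, a stand-in for liblxc's struct lxc_list. */
struct node {
	void *elem;
	struct node *next, *prev;
};

struct node *list_new(void)
{
	struct node *head = malloc(sizeof(*head));

	if (head) {
		head->elem = NULL;
		head->next = head->prev = head;
	}
	return head;
}

int list_add_tail(struct node *head, void *elem)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->elem = elem;
	n->next = head;
	n->prev = head->prev;
	head->prev->next = n;
	head->prev = n;
	return 0;
}

/*
 * Frees every node, then the head. Elements stay owned by the caller.
 * Returns the number of nodes freed (the head is not counted).
 */
size_t list_free_all(struct node *head)
{
	size_t freed = 0;
	struct node *it = head->next;

	while (it != head) {
		/* Save 'next' first: reading it->next after free(it) is a use-after-free. */
		struct node *next = it->next;

		free(it);
		freed++;
		it = next;
	}
	free(head);
	return freed;
}
```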
Kien Truong
fac7c66386 Check malloc failure when sorting cgroup settings.
Signed-off-by: Kien Truong <duckientruong@gmail.com>
2015-05-05 00:21:59 +01:00
Kien Truong
aaf2683052 Sort the cgroup memory settings before applying.
Add a function to sort the cgroup settings before applying.
Currently, the function will put memory.memsw.limit_in_bytes after
memory.limit_in_bytes setting so the container will start
regardless of the order specified in the input. Fix #453

Signed-off-by: Kien Truong <duckientruong@gmail.com>
2015-05-05 00:21:59 +01:00
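
The reordering in aaf2683052 can be illustrated on a plain array of setting names. The function below is a sketch with a hypothetical name and signature. The actual patch works on liblxc's cgroup settings list, which is not reproduced here.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: if memory.memsw.limit_in_bytes appears before
 * memory.limit_in_bytes in keys[], move it to directly after it.
 * All other entries keep their relative order.
 */
void order_memory_limits(const char **keys, size_t n)
{
	size_t memsw = n, limit = n;

	for (size_t i = 0; i < n; i++) {
		if (memsw == n && strcmp(keys[i], "memory.memsw.limit_in_bytes") == 0)
			memsw = i;
		else if (limit == n && strcmp(keys[i], "memory.limit_in_bytes") == 0)
			limit = i;
	}

	if (memsw == n || limit == n || memsw > limit)
		return;	/* one is absent or the order is already fine */

	/* Shift entries (memsw, limit] left by one, then place memsw after limit. */
	const char *moved = keys[memsw];

	memmove(&keys[memsw], &keys[memsw + 1], (limit - memsw) * sizeof(*keys));
	keys[limit] = moved;
}
```

Given `{"memory.memsw.limit_in_bytes", "cpuset.cpus", "memory.limit_in_bytes"}`, the array ends up with the plain memory limit applied before the memory+swap limit.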
Serge Hallyn
44481bff6b overlay: create workdir if it doesn't exist
Otherwise a container created before we needed workdir will fail
to start after a kernel+lxc update.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
2015-05-04 08:12:18 -05:00
Tycho Andersen
85c50991da c/r: check for criu images in the checkpoint directory
CRIU can get confused if there are two dumps that are written to the same
directory, so we make some minimal effort to prevent people from doing this.
This is a better alternative than forcing liblxc to create the directory, since
it is mostly race free (and neither solution is bullet proof anyway if someone
rsyncs some bad images over the top of the good ones).

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-04-28 08:24:39 +02:00
Thomas Moschny
98088cfbee Fix Lua 5.3 compatibility code.
If Lua 5.3 is compiled with LUA_COMPAT_5_2 defined, the
luaL_checkunsigned compatibility macro is already defined
in lauxlib.h.

Signed-off-by: Thomas Moschny <thomas.moschny@gmx.de>
2015-04-26 23:26:27 +02:00
Christian Brauner
56983b40c7 Make lxc-checkconfig work with kernel versions > 3
(1) Add test for kernel version greater than 3.
(2) Use && and || instead of -a and -o as suggested in
    http://www.unix.com/man-page/posix/1p/test/.

lxc-checkconfig will currently report "missing" on "Cgroup memory controller"
for kernel versions greater than 3. This happens because the script, before
checking for the corresponding memory variable in the kernel config, currently
tests whether we have a major kernel version greater than or equal to 3 and a
minor kernel version greater than or equal to 6. This adds an additional test
for whether we have a major kernel version greater than 3.

Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
2015-04-25 10:05:07 +02:00
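
lxc-checkconfig is a shell script, so the following is only a C rendering of the corrected comparison from 56983b40c7, with a hypothetical function name. The old test amounted to `major >= 3 && minor >= 6`, which rejects 4.0.

```c
/* Returns 1 if major.minor is at least 3.6, else 0. */
int kernel_at_least_3_6(int major, int minor)
{
	/* Any major above 3 qualifies; the minor only matters for 3.x. */
	return major > 3 || (major == 3 && minor >= 6);
}
```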
Serge Hallyn
2366b8a769 use poll instead of select
Particularly when using the go-lxc api with lots of threads, it
happens that if the open files limit is > 1024, we will try to
select on fd > 1024 which breaks on glibc.

So use poll instead of select.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-04-22 11:55:33 -05:00
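
The problem 2366b8a769 addresses comes from `fd_set` being a fixed-size bitmap (`FD_SETSIZE`, 1024 on glibc), so `select()` cannot watch higher-numbered descriptors. `poll()` takes the descriptor as a plain integer and has no such limit. A minimal single-descriptor wait, with a hypothetical helper name, could look like this:

```c
#include <poll.h>

/*
 * Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable (or at EOF), 0 on timeout, -1 on error,
 * including interruption by a signal.
 */
int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0)
		return -1;
	if (ret == 0)
		return 0;
	return (pfd.revents & (POLLIN | POLLHUP)) ? 1 : -1;
}
```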
Serge Hallyn
858377e4d9 logs: introduce a thread-local 'current' lxc_config (v2)
The logging code uses a global log_fd and log_level to direct
logging (ERROR(), etc).  While the container configuration file allows
for lxc.loglevel and lxc.logfile, those are only used at configuration
file read time to set the global variables.  This works ok in the
lxc front-end programs, but becomes a problem with threaded API users.

The simplest solution would be to not allow per-container configuration
files, but it'd be nice to avoid that.

Passing a logfd or lxc_conf into every ERROR/INFO/etc call is "possible",
but would be a huge complication as there are many functions, including
struct member functions and callbacks, which don't have that info and
would need to get it from somewhere.

So the approach I'm taking here is to say that all real container work
is done inside api calls, and therefore the API calls themselves can
set a thread-local variable indicating which log info to use.  If
unset, then use the global values.  The lxc-* programs, when called
with a '-o logfile' argument, set a global variable to indicate that
the user-specified value should be used.

In this patch:

If the lxc container configuration specifies a loglevel/logfile, only
set the lxc_config's logfd and loglevel according to those, not the
global values.

Each API call is wrapped to set/unset the current_config.  (The few
exceptions are calls which do not result in any log actions)

Update logfile appender to use the logfile specified in lxc_conf if (a)
current_config is set and (b) the lxc-* command did not override it.

Changelog (2015-04-21):
	. always re-set current_config to NULL at end of an API
	  call, rather than storing the previous value.  We don't
	  nest API calls.
	. remove the log_lock stuff which wasn't used
	. lxc_conf_free: if the config is current_config, set
	  current_config to NULL.  (It can't be another thread's
	  current_config, or we wouldn't be freeing it)
	. lxc_check_inherited: don't close fd if it is the
	  current_config->logfd.  Note this is only called when
	  starting a container, so we have no other threads at
	  this point.

Changelog (2015-04-22):
	. Unset the per-container logfd on destroy
	. Do so before we rm the containerdir.  Otherwise if the logfile is set
	  to $lxcpath/$name/log, the containerdir won't be fully deleted.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2015-04-22 11:54:46 -05:00
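
The mechanism described in 858377e4d9 (a thread-local "current" config set around each API call, with a fallback to the global log settings) can be sketched as below. All names are hypothetical simplifications, not the liblxc definitions, and `_Thread_local` is the C11 spelling of what GCC also accepts as `__thread`.

```c
#include <stddef.h>

/* Hypothetical, pared-down stand-in for liblxc's per-container config. */
struct conf {
	int logfd;	/* -1 means "no per-container log file" */
};

static int global_logfd = 2;		/* process-wide default */
static int global_override = 0;		/* set when '-o logfile' was given */
static _Thread_local struct conf *current_config;	/* NULL outside API calls */

/* Pick the descriptor a log call on this thread should write to. */
int effective_logfd(void)
{
	if (!global_override && current_config && current_config->logfd >= 0)
		return current_config->logfd;
	return global_logfd;
}

/* Shape of a wrapped API call: set on entry, always reset to NULL on exit. */
int api_call(struct conf *c, int (*impl)(struct conf *))
{
	int ret;

	current_config = c;
	ret = impl(c);
	current_config = NULL;
	return ret;
}

/* Example implementation: reports which fd logging would use mid-call. */
int report_logfd(struct conf *c)
{
	(void)c;
	return effective_logfd();
}
```

Because the variable is thread-local, two threads working on different containers each see their own `current_config`, which is what lets per-container log files work for threaded API users.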
Tycho Andersen
507cee3618 c/r: re-open fds after clone()
If we don't re-open these after clone, the init process has a pointer to the
parent's /dev/{zero,null}. CRIU sees these and wants to dump the parent's
mount namespace, which is unnecessary. Instead, we should just re-open
stdin/out/err after we do the clone and pivot root, to ensure that we have
pointers to the devices in init's rootfs instead of the host's.

v2: Only close fds if the container was daemonized. This didn't turn out as
    nicely as described on the list because lxc_start() doesn't actually have
    the struct lxc_container, so it can't see the flag. Instead, we just pass it
    down everywhere.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-04-22 12:30:32 -04:00
Tycho Andersen
dd62857af3 c/r: enable hugetlbfs in criu
In vivid containers hugetlbfs is mounted, but it is not one of the hardcoded
fses in criu, so we need to tell criu that it is okay to automount it.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-04-22 12:30:29 -04:00
Tycho Andersen
8ba5ced736 c/r: check version of criu
Note that we allow either a tagged version or a git build that has sufficient
patches for the features we require.

v2: close criu's stderr too

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-04-22 12:29:51 -04:00
Tycho Andersen
e29fe1dd21 c/r: move criu code to its own file
Trying to cage the beast that is lxccontainer.c.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-04-22 12:29:48 -04:00
Tycho Andersen
cba98d127b c/r: use criu option instead of lxc-restore-net
As of criu 1.5, the --veth-pair argument supports an additional parameter that
is the bridge name to attach to. This enables us to get rid of the goofy
action-script hack that passed bridge names as environment variables.

This patch is on top of the systemd/lxcfs mount rework patch, as we probably
want to wait to use 1.5 options until it has been out for a while and is in
distros.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2015-04-22 12:29:46 -04:00