gettimeofday() is not async-signal-safe, so let's switch to clock_gettime() to
be on the safe side.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
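A minimal sketch of the difference in practice: POSIX lists clock_gettime()
among the async-signal-safe functions (gettimeofday() is not on that list), so
a signal handler may call it directly. The names on_signal(),
install_handler(), got_signal and last_signal_ts are hypothetical, and SIGUSR1
is an arbitrary choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <time.h>

/* Hypothetical globals filled in by the handler below. */
static volatile sig_atomic_t got_signal;
static struct timespec last_signal_ts;

/* Signal handler: only calls clock_gettime(), which POSIX lists as
 * async-signal-safe (gettimeofday() is not on that list). */
static void on_signal(int signo)
{
	(void)signo;
	clock_gettime(CLOCK_REALTIME, &last_signal_ts);
	got_signal = 1;
}

/* Install on_signal() for SIGUSR1; returns 0 on success, -1 on error. */
static int install_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_signal;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGUSR1, &sa, NULL);
}
```

After install_handler(), a raise(SIGUSR1) leaves got_signal set and
last_signal_ts holding the moment the handler ran.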
- Single-digit months, days, hours, minutes, and seconds should always be
preceded by a 0.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
This allows us to generate nice timestamps in a thread-safe manner without
relying on locale-touching functions from any libc.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Converts a Unix Epoch time given as a struct timespec to a UTC string usable
in our logging functions. This may be expanded to allow for more generic
formatting.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
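One way to do such a conversion without gmtime_r() is pure integer arithmetic
on the day count. The sketch below uses Howard Hinnant's civil_from_days()
algorithm (a swapped-in technique; the commit does not say which algorithm it
uses) and emits the YYYYMMDDhhmmss.mmm layout seen in lxc log lines such as
"20161125170200.619", zero-padding every field. The function names are
hypothetical, and the sketch assumes a non-negative tv_sec and a year below
10000.

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Write v into buf as exactly `width` digits, left-padded with zeros
 * (e.g. 5 -> "05" for width 2). No locale, no global state. */
static void put_digits(char *buf, unsigned long v, int width)
{
	for (int i = width - 1; i >= 0; i--) {
		buf[i] = (char)('0' + v % 10);
		v /= 10;
	}
}

/* Convert days since 1970-01-01 to a proleptic Gregorian date using
 * Howard Hinnant's civil_from_days() algorithm (integer math only). */
static void civil_from_days(int64_t days, int64_t *y, unsigned *m, unsigned *d)
{
	days += 719468;
	int64_t era = (days >= 0 ? days : days - 146096) / 146097;
	unsigned doe = (unsigned)(days - era * 146097);                        /* [0, 146096] */
	unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365; /* [0, 399] */
	unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);               /* [0, 365] */
	unsigned mp = (5 * doy + 2) / 153;                                    /* [0, 11] */

	*d = doy - (153 * mp + 2) / 5 + 1;
	*m = mp < 10 ? mp + 3 : mp - 9;
	*y = (int64_t)yoe + era * 400 + (*m <= 2);
}

/* Format ts as "YYYYMMDDhhmmss.mmm" (UTC) into buf, which must hold at
 * least 19 bytes. Assumes tv_sec >= 0 and a year below 10000. */
static void timespec_to_utc(const struct timespec *ts, char buf[19])
{
	int64_t secs = (int64_t)ts->tv_sec;
	int64_t rem = secs % 86400;
	int64_t year;
	unsigned month, day;

	civil_from_days(secs / 86400, &year, &month, &day);
	put_digits(buf, (unsigned long)year, 4);
	put_digits(buf + 4, month, 2);
	put_digits(buf + 6, day, 2);
	put_digits(buf + 8, (unsigned long)(rem / 3600), 2);
	put_digits(buf + 10, (unsigned long)(rem % 3600 / 60), 2);
	put_digits(buf + 12, (unsigned long)(rem % 60), 2);
	buf[14] = '.';
	put_digits(buf + 15, (unsigned long)(ts->tv_nsec / 1000000), 3);
	buf[18] = '\0';
}
```

For example, tv_sec = 1234567890 with tv_nsec = 5000000 yields
"20090213233130.005".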
Our log functions need to make extra sure that they are thread-safe. We had
some problems with that before. This especially involves time-conversion
functions. I don't want to find any localtime() or gmtime() functions or
relatives in here. Not even localtime_r() or gmtime_r() or relatives. They all
fiddle with global variables and locking in various libcs. They cause deadlocks
when liblxc is used multi-threaded and no matter how smart you think you are,
you __will__ cause trouble using them.
(As a short example of how this can cause trouble: LXD uses forkstart to fork
off a new process that runs the container. At the same time, the Go runtime
LXD relies on does its own multi-threading, which we can't control. The
fork()ing + threading then seems to mess with the locking states in these time
functions, causing deadlocks.)
The current solution is to be good old Unix people and use the Epoch as our
reference point, simply using the seconds and nanoseconds that have passed
since then. This relies on clock_gettime(), which is explicitly marked MT-Safe
with no restrictions! This way, anyone who is really strongly invested in
getting the actual time the log entry was created can just convert it
themselves. Our logging is mostly done for debugging purposes, so don't try to
make it pretty. Pretty might cost you thread-safety.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
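A minimal sketch of the Epoch-based approach, assuming the log prefix is simply
"<seconds>.<nanoseconds>": the value comes from clock_gettime() and is turned
into digits with plain integer arithmetic, so no libc time-conversion or locale
code is involved. format_epoch_stamp(), u64_to_dec() and epoch_stamp_now() are
hypothetical names, and the sketch assumes tv_sec is non-negative.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Write the decimal digits of v into buf (no terminator) and return the
 * number of characters written. Pure arithmetic: no locale, no globals. */
static size_t u64_to_dec(uint64_t v, char *buf)
{
	char tmp[20]; /* UINT64_MAX has 20 decimal digits */
	size_t n = 0, i;

	do {
		tmp[n++] = (char)('0' + v % 10);
		v /= 10;
	} while (v);
	for (i = 0; i < n; i++)
		buf[i] = tmp[n - 1 - i];
	return n;
}

/* Format ts as "<seconds since Epoch>.<9-digit nanoseconds>" into buf,
 * which must hold at least 31 bytes (20 + 1 + 9 + NUL). */
static void format_epoch_stamp(const struct timespec *ts, char *buf)
{
	size_t n = u64_to_dec((uint64_t)ts->tv_sec, buf);
	uint64_t nsec = (uint64_t)ts->tv_nsec;

	buf[n++] = '.';
	for (int i = 8; i >= 0; i--) { /* zero-padded nanoseconds */
		buf[n + (size_t)i] = (char)('0' + nsec % 10);
		nsec /= 10;
	}
	buf[n + 9] = '\0';
}

/* Timestamp "now" via clock_gettime(); returns 0 on success, -1 on error. */
static int epoch_stamp_now(char *buf)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_REALTIME, &ts) < 0)
		return -1;
	format_epoch_stamp(&ts, buf);
	return 0;
}
```

With this layout, { .tv_sec = 1480093320, .tv_nsec = 619000000 } is rendered
as "1480093320.619000000"; turning that into a calendar date is left to
whoever reads the log.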
This macro can be used to set or allocate a string buffer that can hold any
number representable in 64 bits.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
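A sketch of how such a macro can be sized and used. The name NUMSTRLEN64 and
the helpers below are hypothetical; the value 21 follows from the fact that
both UINT64_MAX ("18446744073709551615") and INT64_MIN
("-9223372036854775808") are 20 characters long, plus one byte for the
terminating NUL.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Room for the longest 64-bit decimal representation plus the NUL. */
#define NUMSTRLEN64 21

/* Format a signed 64-bit value into a NUMSTRLEN64-byte buffer. Returns the
 * number of characters written (excluding NUL), or -1 on truncation/error. */
static int int64_to_str(int64_t v, char buf[NUMSTRLEN64])
{
	int ret = snprintf(buf, NUMSTRLEN64, "%" PRId64, v);

	return (ret < 0 || ret >= NUMSTRLEN64) ? -1 : ret;
}

/* Same for unsigned 64-bit values. */
static int uint64_to_str(uint64_t v, char buf[NUMSTRLEN64])
{
	int ret = snprintf(buf, NUMSTRLEN64, "%" PRIu64, v);

	return (ret < 0 || ret >= NUMSTRLEN64) ? -1 : ret;
}
```

A caller can then write `char buf[NUMSTRLEN64];` on the stack or
`malloc(NUMSTRLEN64)` and never worry about truncation.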
The thread-unsafe function strsignal() is called in run_buffer(), which in turn
is called in run_buffer_argv(), which is responsible for running __all__ lxc
hooks. This is pretty dangerous for multi-threaded users like LXD.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
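A minimal sketch of reporting how a hook process ended without strsignal():
the wait status is described with numbers only, and snprintf() writes into a
caller-provided buffer instead of a shared static one. describe_status() is a
hypothetical helper, not the actual run_buffer() code.

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Describe a waitpid() status using only numbers, so no call to the
 * thread-unsafe strsignal() is needed. */
static void describe_status(int status, char *buf, size_t len)
{
	if (WIFEXITED(status))
		snprintf(buf, len, "exited with status %d", WEXITSTATUS(status));
	else if (WIFSIGNALED(status))
		snprintf(buf, len, "killed by signal %d", WTERMSIG(status));
	else
		snprintf(buf, len, "unknown wait status %d", status);
}
```

A hook killed by SIGKILL is then logged as "killed by signal 9" on Linux,
which is less friendly than a signal name but cannot race with other threads.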
Previously, lxc_monitord called lxc_monitord_cleanup() from a signal handler.
This function calls a bunch of async-signal-unsafe functions and basically begs
for deadlocks. This commit switches lxc-monitord to using sigsetjmp() and
siglongjmp(): the signal handler jumps to a cleanup label that calls
lxc_monitord_cleanup(). This way, we avoid calling async-signal-unsafe
functions from the signal handler.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
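A minimal sketch of the pattern, with hypothetical names (monitord_run(),
monitord_cleanup(), on_term()) standing in for the lxc-monitord code: the
handler only calls siglongjmp(), and the cleanup runs in normal context after
sigsetjmp() returns a second time. Passing a non-zero savemask to sigsetjmp()
makes siglongjmp() restore the signal mask, so SIGTERM is not left blocked.
Per signal-safety(7), jumping out of a handler is only safe if the interrupted
code was not itself inside an async-signal-unsafe function, so the loop that
runs between sigsetjmp() and the signal should stick to safe calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf cleanup_env; /* target of the jump */
static volatile sig_atomic_t cleaned_up;

/* Stand-in for lxc_monitord_cleanup(): it runs outside the signal handler,
 * so it is no longer constrained to async-signal-safe calls. */
static void monitord_cleanup(void)
{
	cleaned_up = 1;
}

/* The handler does nothing except jump back to the cleanup path. */
static void on_term(int signo)
{
	siglongjmp(cleanup_env, signo);
}

/* Hypothetical main loop: returns the signal that ended it, or -1. */
static int monitord_run(void)
{
	struct sigaction sa;
	int sig;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_term;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGTERM, &sa, NULL) < 0)
		return -1;

	/* Returns 0 now, and the signal number when on_term() jumps here. */
	sig = sigsetjmp(cleanup_env, 1);
	if (sig != 0) {
		monitord_cleanup();
		return sig;
	}

	for (;;) {
		/* A real daemon would block in its event loop here;
		 * raise() simulates a SIGTERM arriving. */
		raise(SIGTERM);
	}
}
```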
Setting loglevel to DEBUG will allow us to retrieve more useful information in
case something goes wrong. The total size of the log will not increase
significantly.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Execing lxc-monitord is quite a crucial step, so let's be very obsessive about
logging possible errors to guide us in debugging.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
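A sketch of one way to log every failure around the exec, assuming plain
fork() + execv() (the actual lxc code path may differ): a close-on-exec pipe
lets the parent learn the exact errno of a failed execv() and log it from
normal context, while a successful exec simply closes the pipe.
spawn_and_wait() is a hypothetical name. In a multi-threaded caller, pipe2()
with O_CLOEXEC would avoid the window between pipe() and fcntl() in which
another thread could fork and inherit the write end.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, exec argv[0] and log every failure. Returns the child's exit
 * status, or -1 on error. */
static int spawn_and_wait(char *const argv[])
{
	int pipefd[2], status, child_errno = 0;
	ssize_t n;
	pid_t pid;

	if (pipe(pipefd) < 0) {
		fprintf(stderr, "pipe() failed: errno %d\n", errno);
		return -1;
	}
	/* The write end must vanish on a successful exec. */
	if (fcntl(pipefd[1], F_SETFD, FD_CLOEXEC) < 0) {
		fprintf(stderr, "fcntl(FD_CLOEXEC) failed: errno %d\n", errno);
		close(pipefd[0]);
		close(pipefd[1]);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		fprintf(stderr, "fork() failed: errno %d\n", errno);
		close(pipefd[0]);
		close(pipefd[1]);
		return -1;
	}
	if (pid == 0) {
		close(pipefd[0]);
		execv(argv[0], argv);
		/* Only reached if execv() failed: hand errno to the parent. */
		child_errno = errno;
		if (write(pipefd[1], &child_errno, sizeof(child_errno)) < 0) {
			/* Nothing more to report; exit code 127 still signals failure. */
		}
		_exit(127);
	}

	close(pipefd[1]);
	do /* EOF (n == 0) means the exec succeeded and closed the pipe. */
		n = read(pipefd[0], &child_errno, sizeof(child_errno));
	while (n < 0 && errno == EINTR);
	close(pipefd[0]);
	if (n == (ssize_t)sizeof(child_errno))
		fprintf(stderr, "execv(\"%s\") failed: errno %d\n", argv[0], child_errno);

	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR) {
			fprintf(stderr, "waitpid() failed: errno %d\n", errno);
			return -1;
		}
	}
	if (!WIFEXITED(status)) {
		fprintf(stderr, "child %d did not exit normally\n", (int)pid);
		return -1;
	}
	return WEXITSTATUS(status);
}
```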
Previously, we used tmpfile() to write out mount entries for the container.
This requires a writable /tmp file system, which can be a problem on systems
where this file system is not present. This commit switches from tmpfile() to
the memfd_create() syscall. It allows us to create an anonymous tmpfs file
(somewhat similar to mmap()) that is automatically deleted as soon as all
references to it are dropped. If we detect that the syscall is not
implemented, we fall back to using tmpfile().
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
localtime_r() can lead to deadlocks because it calls __tzset() and
__tzconvert() internally. The deadlock stems from an interaction between these
functions and the functions in monitor.c and commands.{c,h}. The latter
functions write to the log independently of the container thread that is
currently running. Since the monitor is fork()ed, the child appears to inherit
the mutex states of the time functions mentioned above, causing the deadlock.
As a short-term fix, I suggest simply disabling retrieval of the time when
monitor.c or commands.{c,h} functions are called. This should be OK, since the
[lxc monitor] will only emit a few messages and thread-safety is currently more
important than beautiful logs. The rest of the log stays the same as it was
before.
Here is an example output from logs where I printed the pid and tid of the
process that is currently writing to the log:
lxc 20161125170200.619 INFO lxc_start: 18695-18695: - start.c:lxc_check_inherited:243 - Closed inherited fd: 23.
lxc 20161125170200.640 DEBUG lxc_start: 18677-18677: - start.c:__lxc_start:1334 - Not dropping CAP_SYS_BOOT or watching utmp.
lxc 20161125170200.640 INFO lxc_cgroup: 18677-18677: - cgroups/cgroup.c:cgroup_init:68 - cgroup driver cgroupfs-ng initing for lxc-test-concurrent-0
----------> lxc 20150427012246.000 INFO lxc_monitor: 13017-18622: - monitor.c:lxc_monitor_sock_name:178 - using monitor sock name lxc/ad055575fe28ddd5//var/lib/lxc
lxc 20161125170200.662 DEBUG lxc_cgfsng: 18677-18677: - cgroups/cgfsng.c:filter_and_set_cpus:478 - No isolated cpus detected.
lxc 20161125170200.662 DEBUG lxc_cgfsng: 18677-18677: - cgroups/cgfsng.c:handle_cpuset_hierarchy:648 - "cgroup.clone_children" was already set to "1".
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
This fixes a race in liblxc logging which can lead to deadlocks. To reproduce
the issue without this fix, compile with --enable-tests and then run:
lxc-test-concurrent -j 20 -m create,start,stop,destroy -D
which should deadlock.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
So far, we opened a file descriptor referring to proc on the host inside the
host namespace and handed that fd to the attached process in
attach_child_main(). This was done to ensure that LSM labels were correctly
set up. However, by exploiting a potential kernel bug, ptrace could be used to
prevent the file descriptor from being closed, which in turn could be used by
an unprivileged container to gain access to the host namespace. Aside from the
upstream kernel fix this needs, we should make sure that we don't pass the fd
for proc itself to the attached process. However, we cannot avoid passing an
fd completely, as the attached process needs to be able to change its AppArmor
profile by writing to /proc/self/attr/exec or /proc/self/attr/current. To
minimize the attack surface, we only send the fd for /proc/self/attr/exec or
/proc/self/attr/current to the attached process. To do this, we introduce a
little more IPC between the child and the parent:
* IPC mechanism: (X is receiver)
*   initial process        intermediate          attached
*        X           <---  send pid of
*                          attached proc,
*                          then exit
*    send 0 ------------------------------------>    X
*                                                [do initialization]
*        X  <------------------------------------  send 1
*   [add to cgroup, ...]
*    send 2 ------------------------------------>    X
*                                                [set LXC_ATTACH_NO_NEW_PRIVS]
*        X  <------------------------------------  send 3
*   [open LSM label fd]
*    send 4 ------------------------------------>    X
*                                                [set LSM label]
*   close socket                                 close socket
*                                                run program
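The numbered steps in the diagram can be sketched as a pair of helpers, under
the assumption that each step is a single int written to the AF_UNIX socket
pair and that the receiver aborts unless it reads exactly the value it expects.
sync_send() and sync_wait() are hypothetical names, not necessarily the lxc
functions.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send sequence number seq to the peer; returns 0 on success, -1 on error. */
static int sync_send(int sock, int seq)
{
	ssize_t n;

	do
		n = write(sock, &seq, sizeof(seq));
	while (n < 0 && errno == EINTR);
	return n == (ssize_t)sizeof(seq) ? 0 : -1;
}

/* Wait for the peer's next step; fails on error, EOF, a short read, or an
 * out-of-order sequence number. */
static int sync_wait(int sock, int expected)
{
	int seq;
	ssize_t n;

	do
		n = read(sock, &seq, sizeof(seq));
	while (n < 0 && errno == EINTR);
	if (n != (ssize_t)sizeof(seq) || seq != expected)
		return -1;
	return 0;
}
```

In this sketch the initial process would call sync_send(sock, 0) and then
sync_wait(sock, 1), while the attached process mirrors it with
sync_wait(sock, 0) and sync_send(sock, 1), and so on up to step 4.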
The attached child tells the parent when it is ready to have its LSM labels set
up. The parent then opens an appropriate fd for the child PID to
/proc/<pid>/attr/exec or /proc/<pid>/attr/current and sends it via SCM_RIGHTS
to the child. The child can then set its LSM label. Both sides then close the
socket fds and the child execs the requested process.
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>
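A minimal sketch of the fd handover with SCM_RIGHTS over the AF_UNIX socket
pair; send_fd() and recv_fd() are hypothetical names. As unix(7) requires for
SOCK_STREAM sockets, one byte of real data accompanies the ancillary message.
In the flow described above, the parent would open /proc/<pid>/attr/current
(or attr/exec) for writing and pass that fd to send_fd(); the child receives it
with recv_fd() and writes its label through it.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one fd over a connected AF_UNIX socket; returns 0 or -1. */
static int send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union { /* union guarantees cmsghdr alignment of the buffer */
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by send_fd(); returns the new fd, or -1. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

Only the single fd for the attr file crosses the socket, which is the point of
the change: the child never holds a descriptor for the host's /proc itself.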