When the running kernel supports cgroup namespaces and users want to manually
set up cgroups via lxc.hook.mount before the init binary starts the cgroup
namespace needs to be already unshared. Otherwise the view on the cgroup mounts
is wrong. This commit places the call to lxc_setup() after the
LXC_SYNC_POST_CGROUP barrier.
Before this commit, the tty fds we allocate from a fresh devpts instance in the
container's namespaces before the init binary starts were referring to the
host's cgroup namespace since lxc_setup() was called before
unshare(CLONE_NEWCGROUP). Although not a security risk at this point since
setns() restricts its calls to /proc/<self>/ns files it's still better to do it
*after* the cgroup namespace has been unshared.
Adding a Suggested-by line for the lxc.mount.hook fix for Quentin.
Closes#1597.
Suggested-by: Quentin Dufour <quentin@dufour.tk>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
In the case the container has a console with a valid slave pty file descriptor
we duplicate std{in,out,err} to the slave file descriptor so console logging
works correctly. When the container does not have a valid slave pty file
descriptor for its console and is started daemonized we should dup to
/dev/null.
Closes#1646.
Signed-off-by: Li Feng <lifeng68@huawei.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
This adds a little more flexibility to the state server. The idea is to have a
command socket function "lxc_cmd_add_state_client()" whose only task is to add
a new state client to the container's in-memory handler. This function returns
either the state of the container if it is already in the requested state or it
will return the newly registered client's fd in one of its arguments to the
caller. We then provide a separate helper function "lxc_cmd_sock_rcv_state()"
which can be passed the returned client fd and listens on the fd for the
requested state.
This is useful when we want to first register a client, then send a signal to
the container and wait for a state. This ensure that the client fd is
registered before the signal can have any effect and can e.g. be used to catch
something like the "STOPPING" state that is very ephemeral.
Additionally we provide a convenience function "lxc_cmd_sock_get_state()" which
combines both tasks and is used in e.g. "lxc_wait()".
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Add a test to see if we can start daemonized containers that have a very
short-lived init process. The point of this is to see whether we can correctly
retrieve the state.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Since we killed lxc-monitord we rely on the container's command socket to wait
for the container. This doesn't work nicely on daemonized startup since a
container's init process might be something that is so short-lived that we
won't even be able to add a state client before the mainloop closes. But the
container might still have been RUNNING and executed the init binary correctly.
In this case we would erroneously report that the container failed to start
when it actually started just fine.
This commit ensures that we really all cases where the container successfully
ran by switching to a short-lived per-container anonymous unix socket pair that
uses credentials to pass container states around. It is immediately closed once
the container has started successfully.
This should also make daemonized container start way more robust since we don't
rely on the command socket handler to be running.
For the experienced developer: Yes, I did think about utilizing the command
socket directly for this. The problem is that when the mainloop starts it may
end up end accept()ing the connection that we want
do_wait_on_daemonized_start() to accept() so this won't work and might cause us
to hang indefinitely. The same problem arises when the container fails to start
before the mainloop is created. In this case we would hang indefinitely as
well.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>