Commit Graph

6489 Commits

Author SHA1 Message Date
Serge Hallyn
2b33c8bf12
Merge pull request #2062 from brauner/2017-12-25/capture_output_of_short_lived_init_process
mainloop: capture output of short-lived init procs
2017-12-30 17:27:48 -06:00
Christian Brauner
12c2798ed1
mainloop: use epoll_create1(EPOLL_CLOEXEC)
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-30 23:33:55 +01:00
Christian Brauner
a63fade55b
console: do not allow non-pty devices on open()
We don't allow non-pty devices anyway so don't let open() create unneeded
files.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-30 23:33:54 +01:00
Christian Brauner
1cc8bd4b61
start: properly cleanup mainloop
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-30 23:33:49 +01:00
Christian Brauner
22840b791d
Merge pull request #2063 from marcosps/lxcconfig_help
lxc_config: Add -h and --help flags handler
2017-12-30 21:05:41 +01:00
Marcos Paulo de Souza
f63ac53e31
lxc_config: Add -h and --help flags handler
As the other tools already do, show the usage message when -h or --help
is used.

Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>
2017-12-30 16:35:52 -02:00
Christian Brauner
3c319edbb0
mainloop: capture output of short-lived init procs
The handler for the signal fd will detect when the init process of a container
has exited and cause the mainloop to close. However, this can happen before the
console handlers - or any other events for that matter - are handled. So in the
case of init exiting we still need to allow for all buffered input to the
console to be handled before exiting. This allows us to capture output from
short-lived init processes.

This is conceptually equivalent to my implementation of ExecReaderToChannel()
https://github.com/lxc/lxd/blob/master/shared/util_linux.go#L527

Closes #1694.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-26 11:47:15 +01:00
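
The idea described in this commit message can be illustrated outside of LXC with a plain pipe: even after the exit of the child has been observed, output it wrote earlier may still be buffered and has to be drained. This is only a minimal sketch under that assumption. It does not use LXC's mainloop or console handlers, and `capture_short_lived_child()` is a hypothetical helper name.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not part of LXC: run a short-lived child that writes
 * msg to a pipe, wait for it to exit, and only then drain the pipe.
 * msg must fit into the pipe buffer, otherwise the child would block.
 * Returns the number of bytes captured in buf, or -1 on error. */
ssize_t capture_short_lived_child(const char *msg, char *buf, size_t size)
{
	int fds[2];
	pid_t pid;
	size_t total = 0;

	if (pipe(fds) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: write the message and exit immediately. */
		close(fds[0]);
		if (write(fds[1], msg, strlen(msg)) < 0)
			_exit(1);
		_exit(0);
	}

	close(fds[1]);

	/* The exit is observed first, comparable to the signal fd handler. */
	if (waitpid(pid, NULL, 0) < 0) {
		close(fds[0]);
		return -1;
	}

	/* Buffered output is still pending, so read until EOF before leaving. */
	while (total < size) {
		ssize_t n = read(fds[0], buf + total, size - total);

		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
			break;
		total += (size_t)n;
	}

	close(fds[0]);
	return (ssize_t)total;
}
```

Returning right after `waitpid()` without the read loop would lose the message, which is the behaviour the commit addresses for the console.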
Christian Brauner
a529bc25cd
mainloop: add mainloop macros
This makes it clearer which value each handler returns and why.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-26 11:31:39 +01:00
Serge Hallyn
c326c1967f
Merge pull request #2058 from brauner/2017-12-22/bugfixes
start: fix death signal
2017-12-22 16:10:14 -06:00
Christian Brauner
18225d1985
start: handle setting death signal smarter
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-22 22:52:42 +01:00
Christian Brauner
912314fc9b
start: fix death signal
On set{g,u}id() the kernel does:

	/* dumpability changes */
	if (!uid_eq(old->euid, new->euid) ||
	    !gid_eq(old->egid, new->egid) ||
	    !uid_eq(old->fsuid, new->fsuid) ||
	    !gid_eq(old->fsgid, new->fsgid) ||
	    !cred_cap_issubset(old, new)) {
		if (task->mm)
			set_dumpable(task->mm, suid_dumpable);
		task->pdeath_signal = 0;
		smp_wmb();
	}

which means we need to re-enable the death signal after the set{g,u}id().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-22 22:17:44 +01:00
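
The ordering this commit message calls for might look roughly like the sketch below: set the parent death signal, switch ids, then set the signal again because the kernel may have cleared it. The helper names `switch_ids_keep_pdeathsig()` and `current_pdeathsig()` are hypothetical and this is not the code from the commit.

```c
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not LXC's code: change gid and uid, then set the
 * parent death signal again because the kernel may have reset it to 0. */
int switch_ids_keep_pdeathsig(uid_t uid, gid_t gid, int sig)
{
	if (prctl(PR_SET_PDEATHSIG, (unsigned long)sig, 0UL, 0UL, 0UL) < 0)
		return -1;

	/* Drop the group first, while we may still be allowed to do so. */
	if (setgid(gid) < 0)
		return -1;

	if (setuid(uid) < 0)
		return -1;

	/* Credentials may have changed: re-enable the death signal. */
	if (prctl(PR_SET_PDEATHSIG, (unsigned long)sig, 0UL, 0UL, 0UL) < 0)
		return -1;

	return 0;
}

/* Read back the currently configured parent death signal, or -1 on error. */
int current_pdeathsig(void)
{
	int sig = -1;

	if (prctl(PR_GET_PDEATHSIG, &sig, 0UL, 0UL, 0UL) < 0)
		return -1;

	return sig;
}
```

Calling `current_pdeathsig()` right after the id switch is a simple way to check that the signal survived.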
Serge Hallyn
715584350e
Merge pull request #2057 from brauner/2017-12-22/bugfixes
start: simplify cgroup namespace preservation
2017-12-22 13:50:59 -06:00
Christian Brauner
8bf3abfbd0
start: simplify cgroup namespace preservation
Since we are now dumpable we can open /proc/<child-pid>/ns/cgroup so let's
avoid the overhead of sending around fds.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-22 17:23:46 +01:00
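
Opening the namespace file of a child directly, instead of passing a file descriptor over a socket, could be sketched as follows. The helpers `format_ns_path()` and `open_child_ns()` are hypothetical names chosen for illustration. Whether the open succeeds depends on the caller's permissions and on the kernel supporting the requested namespace.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: build "/proc/<pid>/ns/<ns>" in buf.
 * Returns 0 on success, -1 if the path does not fit. */
int format_ns_path(char *buf, size_t size, pid_t pid, const char *ns)
{
	int ret = snprintf(buf, size, "/proc/%d/ns/%s", (int)pid, ns);

	if (ret < 0 || (size_t)ret >= size)
		return -1;

	return 0;
}

/* Hypothetical helper: open the namespace file of a child process.
 * Returns a file descriptor, or -1 with errno set by open(). */
int open_child_ns(pid_t pid, const char *ns)
{
	char path[64];

	if (format_ns_path(path, sizeof(path), pid, ns) < 0)
		return -1;

	return open(path, O_RDONLY | O_CLOEXEC);
}
```

For the case in this commit the call would be something like `open_child_ns(child_pid, "cgroup")`, with `child_pid` being the pid of the cloned child.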
Christian Brauner
4b826b1fdc
start: make us dumpable
When we call set{u,g}id() the kernel will make us undumpable. This is unnecessary
since we can guarantee that whatever is running inside the child process at
this point is fully trusted by the parent. Making us dumpable lets users
use debuggers on the child process before the exec as well and also allows us
to open /proc/<child-pid> files in lieu of the child.
Note that we only need to perform the prctl(PR_SET_DUMPABLE, ...) if our
effective uid on the host is not 0. If our effective uid on the host is 0 then
we will keep all capabilities in the child user namespace across set{g,u}id().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-22 17:23:45 +01:00
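
The condition described in this commit message can be sketched as a small helper. The name `make_dumpable_if_unprivileged()` and its parameter are assumptions for illustration. Only `prctl(PR_SET_DUMPABLE, ...)` and the effective-uid condition come from the commit message.

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: restore the dumpable flag after set{g,u}id() when the
 * effective uid on the host is not 0. Returns 0 on success, -1 on error. */
int make_dumpable_if_unprivileged(uid_t host_euid)
{
	/* With host euid 0 nothing needs to be done, per the commit message. */
	if (host_euid == 0)
		return 0;

	if (prctl(PR_SET_DUMPABLE, 1UL, 0UL, 0UL, 0UL) < 0)
		return -1;

	return 0;
}
```

`prctl(PR_GET_DUMPABLE, ...)` returns the current value of the flag and can be used to verify the result.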
Serge Hallyn
150901398d
Merge pull request #2042 from brauner/2017-12-15/bugfixes
start: tweaks + bugfixes
2017-12-21 16:30:11 -06:00
Serge Hallyn
da5f5e3fbb
Merge pull request #2052 from brauner/2017-12-19/unprivileged_btrfs_regression
btrfs: fix unprivileged snapshot creation
2017-12-21 16:08:18 -06:00
Christian Brauner
c3184275ec
start: log closing cmd socket and STOPPED state
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-21 23:01:04 +01:00
Christian Brauner
ac2ba69621
start: use lxc_raw_clone_cb() where possible
This way we can rely on the kernel's copy-on-write support similar to fork().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-21 23:01:04 +01:00
Christian Brauner
0c2a98bdc8
namespace: add lxc_raw_clone_cb()
This is a copy-on-write (no stack passed) variant of lxc_clone().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-21 23:01:04 +01:00
Christian Brauner
718dbb8c2a
namespace: comment lxc_{raw_}clone()
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-21 23:01:04 +01:00
Christian Brauner
0059379ff4
tree-wide: s/getpid()/lxc_raw_getpid()/g
This is to avoid bad surprises caused by older glibc's pid cache (up to 2.25)
when using clone().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-21 23:01:03 +01:00
Christian Brauner
bb196a1aa0
namespace: add lxc_raw_getpid()
Because of older glibc's pid cache (up to 2.25) whenever clone() is called the
child must retrieve its own pid via lxc_raw_getpid().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-21 23:00:22 +01:00
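
Bypassing a cached pid amounts to asking the kernel directly. A minimal sketch, using the hypothetical name `raw_getpid()` rather than the function added by the commit:

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch: issue the getpid syscall directly so that a pid cached
 * by the C library cannot be returned in a freshly cloned child. */
pid_t raw_getpid(void)
{
	return (pid_t)syscall(SYS_getpid);
}
```

In a process that has not been cloned behind the C library's back, `raw_getpid()` and `getpid()` return the same value.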
Christian Brauner
b01b36e9ad
tests: expand lxc_raw_clone() tests
- test CLONE_VFORK
- test CLONE_FILES

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-21 23:00:20 +01:00
Serge Hallyn
b5b200c627
Merge pull request #2047 from brauner/2017-12-18/attach_lsm_confinement
attach: simplify significantly
2017-12-21 15:56:51 -06:00
Christian Brauner
57de839fd5
attach: handle /proc with hidepid={1,2} property
Receive fd for LSM security module before we set{g,u}id(). The reason is that
on set{g,u}id() the kernel will a) make us undumpable and b) we will change our
effective uid. This means our effective uid will be different from the
effective uid of the process that created us which means that this process no
longer has capabilities in our namespace including CAP_SYS_PTRACE. This means
we will not be able to read any /proc/<pid> files for the process anymore when
/proc is mounted with hidepid={1,2}. So let's get the lsm label fd before the
set{g,u}id().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-21 08:00:35 +01:00
Christian Brauner
a998454a2a
attach: use lxc_raw_clone()
This lets us simplify the whole file a lot and makes things way clearer. It
also lets us avoid the infamous pid cache.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-21 00:42:26 +01:00
Christian Brauner
94ac256fbb
attach: simplify significantly
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-20 22:17:56 +01:00
Christian Brauner
6c049d3a26
Merge pull request #2055 from marcosps/cgfsng_debug
cgfsng: Add new macro to print errors
2017-12-20 14:19:57 +01:00
Christian Brauner
d1de8ddad1
Merge pull request #2013 from 3XX0/oci-dhcp-improvements
Improve the dhclient hook for OCI compat
2017-12-20 02:48:04 +01:00
Marcos Paulo de Souza
65d78313f2
cgfsng: Add new macro to print errors
At this point, macros such as DEBUG or ERROR do not take effect because
this code is called from cgroup_ops_init() (cgroup.c), which runs with
__attribute__((constructor)), before any log level is set from any tool
like lxc-start, so these messages are lost.

From now on, use the same LXC_DEBUG_CGFSNG environment variable to
control these messages.

Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>
2017-12-19 23:43:47 -02:00
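
A debug macro gated by an environment variable, usable before any log level has been configured, might be sketched like this. Only the variable name LXC_DEBUG_CGFSNG comes from the commit message. The helper `cgfsng_debug_enabled()`, the macro body, and the use of stderr are assumptions, and `##__VA_ARGS__` is a GNU extension supported by GCC and Clang.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: debugging is on whenever the variable is set at all. */
int cgfsng_debug_enabled(void)
{
	return getenv("LXC_DEBUG_CGFSNG") != NULL;
}

/* Sketch of a macro that prints directly instead of going through a logging
 * subsystem that is not yet initialized when constructors run. */
#define CGFSNG_DEBUG(format, ...)                                           \
	do {                                                                \
		if (cgfsng_debug_enabled())                                 \
			fprintf(stderr, "cgfsng: " format, ##__VA_ARGS__);  \
	} while (0)
```

Running a tool as `LXC_DEBUG_CGFSNG=1 lxc-start ...` would then make such messages visible, assuming the tool's cgroup code uses a macro of this kind.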
Jonathan Calmels
9a962dc622
lxc-oci: add DHCP option leveraging dhclient hooks
Signed-off-by: Jonathan Calmels <jcalmels@nvidia.com>
2017-12-19 15:18:28 -08:00
Jonathan Calmels
1689c7cf90
lxc-oci: read configuration from oci.common.conf if available
Signed-off-by: Jonathan Calmels <jcalmels@nvidia.com>
2017-12-19 15:18:28 -08:00
Jonathan Calmels
bbb8e190f1
lxc-net: add LXC_DHCP_PING boolean option
Excerpt from dnsmasq(8):
By default, the DHCP server will attempt to ensure that an address is not
in use before allocating it to a host. It does this by sending an ICMP echo
request (aka "ping") to the address in question. If it gets a reply, then the
address must already be in use, and another is tried. This flag disables this check.

This is useful if one expects all the containers to get an IP address
from the LXC authoritative DHCP server and wants to speed up the process
of getting a lease.

Signed-off-by: Jonathan Calmels <jcalmels@nvidia.com>
2017-12-19 15:18:28 -08:00
Jonathan Calmels
84bf5645ed
hooks: dhclient hook improvements
- Merge dhclient-start and dhclient-stop into a single hook.
- Wait for a lease before returning from the hook.
- Generate a logfile when LXC log level is either DEBUG or TRACE.
- Rely on namespace file descriptors for the stop hook.
- Use settings from /<sysconf>/lxc/dhclient.conf if available.
- Attempt to cleanup if dhclient fails to shutdown properly.

Signed-off-by: Jonathan Calmels <jcalmels@nvidia.com>
2017-12-19 15:18:28 -08:00
Christian Brauner
90f20db15f
Merge pull request #2048 from duguhaotian/master
[monitor] wrong statement of break
2017-12-19 15:09:41 +01:00
Christian Brauner
0720664d93
Merge pull request #2015 from flx42/nvidia-mount-hook
hooks: add mount hook to configure access to NVIDIA GPUs
2017-12-19 15:06:20 +01:00
Christian Brauner
92b17705d0
Merge pull request #2050 from tanyifeng/small_fix
conf.c: small fix for args of mount_entry
2017-12-19 14:24:40 +01:00
Christian Brauner
5305675314
Merge pull request #2053 from tenforward/japanese
Update Japanese lxc.container.conf(5)
2017-12-19 12:07:09 +01:00
KATOH Yasufumi
a0a4f759b2
doc: Add relative option for lxc.mount.entry to Japanese lxc.container.conf(5)
and:
* remove empty paragraph in English man
* untabify in Japanese man

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
2017-12-19 20:02:46 +09:00
KATOH Yasufumi
b6feb9db85
doc: Translate the hook of network into Japanese in lxc.container.conf(5)
Update for commit 14a7b0f

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
2017-12-19 20:02:37 +09:00
KATOH Yasufumi
efcbd1a05a
doc: Add the description of new style hook to Japanese lxc.container.conf(5)
Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
2017-12-19 20:02:14 +09:00
KATOH Yasufumi
4eeecbdb08
doc: Add proc section to Japanese lxc.container.conf(5)
Update for commit 61d7a73

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
2017-12-19 20:02:02 +09:00
KATOH Yasufumi
b45e48f097
doc: Add sysctl section to Japanese lxc.container.conf(5)
Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
2017-12-19 20:01:41 +09:00
Christian Brauner
4aaf9b81e9
btrfs: fix unprivileged snapshot creation
We already fixed privileged btrfs snapshot creation in:

commit 1c7222c084
Author: Christian Brauner <christian.brauner@ubuntu.com>
Date:   Tue Nov 28 13:51:03 2017 +0100

    btrfs: fix btrfs_snapshot()

    Closes #1956.

    Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
    Signed-off-by: Adrian Reber <areber@redhat.com>

but missed unprivileged btrfs snapshot creation. Fix it too.

Follow-up to #1956.
Closes #2051.

Reported-by: Oleg Freedhom overlayfs@gmail.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-19 11:59:52 +01:00
Yifeng Tan
d6bec4ab7b
conf.c: small fix for args of mount_entry
Signed-off-by: Yifeng Tan <tanyifeng1@huawei.com>
2017-12-19 17:35:01 +08:00
独孤昊天
94bc08e9ed
[monitor] wrong statement of break
If lxc_abstract_unix_connect() fails and returns -1, this code never jumps to retry.

Signed-off-by: liuhao <liuhao27@huawei.com>
2017-12-19 16:51:35 +08:00
Felix Abecassis
58e29e9bf1
hooks: add mount hook to configure access to NVIDIA GPUs
This hook requires the nvidia-container-cli tool provided by libnvidia-container:
https://github.com/nvidia/libnvidia-container

For containers that do not have CUDA_VERSION or NVIDIA_VISIBLE_DEVICES
set in the environment, the hook will be a no-op.

To enable in the configuration file:
lxc.hook.mount = /usr/local/share/lxc/hooks/nvidia

Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>
2017-12-18 16:17:23 -08:00
Serge Hallyn
9668d2cd15
Merge pull request #2049 from brauner/2017-12-18/start_reap_attacher_process
start: reap intermediate process
2017-12-18 10:49:50 -06:00
Christian Brauner
4e23246652
start: reap intermediate process
When we inherit namespaces we need to reap the attaching process.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-18 14:08:54 +01:00
Christian Brauner
9aff2c83e4
Merge pull request #2031 from tanyifeng/mask_and_readonly_path
conf.c: add relative option for lxc.mount.entry
2017-12-18 12:12:59 +01:00