Commit Graph

6330 Commits

Author SHA1 Message Date
Christian Brauner
18225d1985
start: handle setting death signal smarter
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-22 22:52:42 +01:00
Christian Brauner
912314fc9b
start: fix death signal
On set{g,u}id() the kernel does:

	/* dumpability changes */
	if (!uid_eq(old->euid, new->euid) ||
	    !gid_eq(old->egid, new->egid) ||
	    !uid_eq(old->fsuid, new->fsuid) ||
	    !gid_eq(old->fsgid, new->fsgid) ||
	    !cred_cap_issubset(old, new)) {
		if (task->mm)
			set_dumpable(task->mm, suid_dumpable);
		task->pdeath_signal = 0;
		smp_wmb();
	}

which means we need to re-enable the death signal after the set{g,u}id().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-22 22:17:44 +01:00
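
To illustrate the "start: fix death signal" message above: a minimal sketch, assuming Linux and glibc, of setting the parent-death signal again after changing credentials. The helper names (set_death_signal, get_death_signal, switch_ids_keep_death_signal) are hypothetical and are not taken from the LXC sources; only prctl(), setgid() and setuid() are real interfaces.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel to send `sig` to this process when its parent dies.
 * Passing 0 clears the setting. Returns 0 on success, -1 on error. */
static int set_death_signal(int sig)
{
	assert(sig >= 0);
	return prctl(PR_SET_PDEATHSIG, (unsigned long)sig, 0, 0, 0);
}

/* Read back the configured parent-death signal (0 means none set). */
static int get_death_signal(void)
{
	int sig = 0;

	if (prctl(PR_GET_PDEATHSIG, &sig, 0, 0, 0) < 0)
		return -1;
	return sig;
}

/* Hypothetical helper: switch ids first, then set the death signal again,
 * because the kernel may have reset pdeath_signal to 0 during the switch.
 * Real code would usually also handle supplementary groups. */
static int switch_ids_keep_death_signal(uid_t uid, gid_t gid, int sig)
{
	if (setgid(gid) < 0 || setuid(uid) < 0)
		return -1;
	return set_death_signal(sig);
}
```

A caller would typically invoke switch_ids_keep_death_signal(uid, gid, SIGKILL) in the child before the exec; which signal to use is a choice of the caller, not something the commit message specifies.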
Serge Hallyn
715584350e
Merge pull request #2057 from brauner/2017-12-22/bugfixes
start: simplify cgroup namespace preservation
2017-12-22 13:50:59 -06:00
Christian Brauner
8bf3abfbd0
start: simplify cgroup namespace preservation
Since we are now dumpable we can open /proc/<child-pid>/ns/cgroup so let's
avoid the overhead of sending around fds.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-22 17:23:46 +01:00
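
As a rough illustration of the "start: simplify cgroup namespace preservation" message above: a sketch, assuming a Linux /proc filesystem, of how a parent could open the child's cgroup namespace file directly. The names ns_path and open_cgroup_ns are hypothetical and not LXC's own.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>

/* Format "/proc/<pid>/ns/<ns>" into buf.
 * Returns 0 on success, -1 if the result would not fit. */
static int ns_path(pid_t pid, const char *ns, char *buf, size_t size)
{
	int n;

	assert(ns && buf && size > 0);
	n = snprintf(buf, size, "/proc/%d/ns/%s", (int)pid, ns);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}

/* Open the child's cgroup namespace file.
 * Returns a file descriptor, or -1 with errno set by open(). */
static int open_cgroup_ns(pid_t child_pid)
{
	char path[64];

	if (ns_path(child_pid, "cgroup", path, sizeof(path)) < 0)
		return -1;
	return open(path, O_RDONLY | O_CLOEXEC);
}
```

The open() only succeeds if the caller is allowed to inspect the child, which is why the commit depends on the child being dumpable.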
Christian Brauner
4b826b1fdc
start: make us dumpable
On set{u,g}id() the kernel will make us undumpable. This is unnecessary
since we can guarantee that whatever is running inside the child process at
this point is fully trusted by the parent. Making us dumpable lets users
use debuggers on the child process before the exec as well and also allows us
to open /proc/<child-pid> files in lieu of the child.
Note that we only need to perform the prctl(PR_SET_DUMPABLE, ...) if our
effective uid on the host is not 0. If our effective uid on the host is 0 then
we will keep all capabilities in the child user namespace across set{g,u}id().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-22 17:23:45 +01:00
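
The "start: make us dumpable" message above could be sketched roughly as follows, assuming Linux. The helper names and the host_euid parameter are hypothetical; the commit message does not say how the host effective uid is obtained, so the caller is assumed to know it.

```c
#include <assert.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Mark the calling process dumpable again. Returns 0 or -1. */
static int make_dumpable(void)
{
	return prctl(PR_SET_DUMPABLE, 1, 0, 0, 0);
}

/* Returns the current dumpable flag as reported by the kernel. */
static int is_dumpable(void)
{
	return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}

/* Hypothetical helper: switch ids, then restore dumpability only when the
 * effective uid on the host (supplied by the caller) is not 0. */
static int switch_ids_stay_dumpable(uid_t host_euid, uid_t uid, gid_t gid)
{
	if (setgid(gid) < 0 || setuid(uid) < 0)
		return -1;
	if (host_euid != 0 && make_dumpable() < 0)
		return -1;
	return 0;
}
```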
Serge Hallyn
150901398d
Merge pull request #2042 from brauner/2017-12-15/bugfixes
start: tweaks + bugfixes
2017-12-21 16:30:11 -06:00
Serge Hallyn
da5f5e3fbb
Merge pull request #2052 from brauner/2017-12-19/unprivileged_btrfs_regression
btrfs: fix unprivileged snapshot creation
2017-12-21 16:08:18 -06:00
Christian Brauner
c3184275ec
start: log closing cmd socket and STOPPED state
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-21 23:01:04 +01:00
Christian Brauner
ac2ba69621
start: use lxc_raw_clone_cb() where possible
This way we can rely on the kernel's copy-on-write support similar to fork().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-21 23:01:04 +01:00
Christian Brauner
0c2a98bdc8
namespace: add lxc_raw_clone_cb()
This is a copy-on-write (no stack passed) variant of lxc_clone().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-21 23:01:04 +01:00
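
A callback variant like the one described in "namespace: add lxc_raw_clone_cb()" above might look like the sketch below. It is an illustration under assumptions, not the LXC implementation: the names raw_clone_cb, child_main and wait_exit_status are hypothetical, and the clone argument order assumes a common architecture such as x86-64 (a few architectures order the first two arguments differently).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run fn(args) in a copy-on-write child created without passing a stack.
 * Returns the child pid in the parent, -1 on error. */
static pid_t raw_clone_cb(int (*fn)(void *), void *args, unsigned long flags)
{
	pid_t pid;

	assert(fn);
	/* Without a separate stack the child must not share our memory. */
	if (flags & CLONE_VM) {
		errno = EINVAL;
		return -1;
	}

	pid = (pid_t)syscall(SYS_clone, flags | SIGCHLD, 0L, 0L, 0L, 0L);
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(fn(args));	/* child: exit with the callback's result */
	return pid;
}

/* Example callback: exit with the integer the parent pointed us at. */
static int child_main(void *arg)
{
	return *(int *)arg;
}

/* Wait for pid and return its exit status, or -1 on failure. */
static int wait_exit_status(pid_t pid)
{
	int status;

	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child gets a copy-on-write view of the parent's memory, the callback can read args directly, as it could after fork().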
Christian Brauner
718dbb8c2a
namespace: comment lxc_{raw_}clone()
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-21 23:01:04 +01:00
Christian Brauner
0059379ff4
tree-wide: s/getpid()/lxc_raw_getpid()/g
This is to avoid bad surprises caused by older glibc's pid cache (up to 2.25)
when using clone().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-21 23:01:03 +01:00
Christian Brauner
bb196a1aa0
namespace: add lxc_raw_getpid()
Because of older glibc's pid cache (up to 2.25) whenever clone() is called the
child must retrieve its own pid via lxc_raw_getpid().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-21 23:00:22 +01:00
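
For the "namespace: add lxc_raw_getpid()" message above, one plausible shape of such a helper is a direct system call that bypasses any value cached by the C library. This is a sketch assuming Linux; the name raw_getpid is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for our pid directly instead of going through the
 * getpid() wrapper, which older glibc versions answered from a cache. */
static pid_t raw_getpid(void)
{
	return (pid_t)syscall(SYS_getpid);
}
```

In a process that was not created through a raw clone() call, raw_getpid() and getpid() return the same value.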
Christian Brauner
b01b36e9ad
tests: expand lxc_raw_clone() tests
- test CLONE_VFORK
- test CLONE_FILES

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-21 23:00:20 +01:00
Serge Hallyn
b5b200c627
Merge pull request #2047 from brauner/2017-12-18/attach_lsm_confinement
attach: simplify significantly
2017-12-21 15:56:51 -06:00
Christian Brauner
57de839fd5
attach: handle /proc with hidepid={1,2} property
Receive fd for LSM security module before we set{g,u}id(). The reason is that
on set{g,u}id() a) the kernel will make us undumpable and b) our effective uid
will change. This means our effective uid will be different from the
effective uid of the process that created us which means that this process no
longer has capabilities in our namespace including CAP_SYS_PTRACE. This means
we will not be able to read any /proc/<pid> files for the process anymore when
/proc is mounted with hidepid={1,2}. So let's get the lsm label fd before the
set{g,u}id().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-21 08:00:35 +01:00
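
The "attach: handle /proc with hidepid={1,2} property" message above does not show how the fd is transferred. One common mechanism on Linux is SCM_RIGHTS over an AF_UNIX socket, sketched below as an assumption; send_fd and recv_fd are hypothetical names, not LXC functions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;	/* forces correct alignment */
		char buf[CMSG_SPACE(sizeof(int))];
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd or -1 on failure. */
static int recv_fd(int sock)
{
	char byte;
	int fd = -1;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

Following the ordering the commit describes, the attached process would call something like recv_fd() first and only then setgid()/setuid(), so the parent opens the /proc file while it still has access.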
Christian Brauner
a998454a2a
attach: use lxc_raw_clone()
This lets us simplify the whole file a lot and makes things way clearer. It
also lets us avoid the infamous pid cache.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-21 00:42:26 +01:00
Christian Brauner
94ac256fbb
attach: simplify significantly
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-20 22:17:56 +01:00
Christian Brauner
6c049d3a26
Merge pull request #2055 from marcosps/cgfsng_debug
cgfsng: Add new macro to print errors
2017-12-20 14:19:57 +01:00
Christian Brauner
d1de8ddad1
Merge pull request #2013 from 3XX0/oci-dhcp-improvements
Improve the dhclient hook for OCI compat
2017-12-20 02:48:04 +01:00
Marcos Paulo de Souza
65d78313f2 cgfsng: Add new macro to print errors
At this point, macros such as DEBUG or ERROR do not take effect because
this code is called from cgroup_ops_init() (cgroup.c), which runs with
__attribute__((constructor)), before any log level is set from any tool
like lxc-start, so these messages are lost.

From now on, use the same LXC_DEBUG_CGFSNG environment variable to
control these messages.

Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>
2017-12-19 23:43:47 -02:00
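
To make the "cgfsng: Add new macro to print errors" message above concrete: a sketch, assuming GCC or Clang for __attribute__((constructor)), of an early logger gated on the LXC_DEBUG_CGFSNG environment variable. The function cgfsng_log and the constructor early_init are hypothetical stand-ins, not the macro the commit adds.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Print to stderr only if LXC_DEBUG_CGFSNG is set in the environment.
 * Returns the number of characters written for the formatted part,
 * or 0 if logging is disabled or the write failed. */
static int cgfsng_log(const char *format, ...)
{
	va_list ap;
	int n;

	assert(format);
	if (!getenv("LXC_DEBUG_CGFSNG"))
		return 0;

	fputs("cgfsng: ", stderr);
	va_start(ap, format);
	n = vfprintf(stderr, format, ap);
	va_end(ap);
	return n < 0 ? 0 : n;
}

/* Runs before main(), so no log level has been configured by any tool yet.
 * The environment is already available, which is why it can act as a gate. */
__attribute__((constructor)) static void early_init(void)
{
	cgfsng_log("constructor ran before main()\n");
}
```

Running a program built with this code as LXC_DEBUG_CGFSNG=1 ./prog would print the constructor message; without the variable nothing is printed.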
Jonathan Calmels
9a962dc622 lxc-oci: add DHCP option leveraging dhclient hooks
Signed-off-by: Jonathan Calmels <jcalmels@nvidia.com>
2017-12-19 15:18:28 -08:00
Jonathan Calmels
1689c7cf90 lxc-oci: read configuration from oci.common.conf if available
Signed-off-by: Jonathan Calmels <jcalmels@nvidia.com>
2017-12-19 15:18:28 -08:00
Jonathan Calmels
bbb8e190f1 lxc-net: add LXC_DHCP_PING boolean option
Excerpt from dnsmasq(8):
By default, the DHCP server will attempt to ensure that an address is not
in use before allocating it to a host. It does this by sending an ICMP echo
request (aka "ping") to the address in question. If it gets a reply, then the
address must already be in use, and another is tried. This flag disables this check.

This is useful if one expects all the containers to get an IP address
from the LXC authoritative DHCP server and wants to speed up the process
of getting a lease.

Signed-off-by: Jonathan Calmels <jcalmels@nvidia.com>
2017-12-19 15:18:28 -08:00
Jonathan Calmels
84bf5645ed hooks: dhclient hook improvements
- Merge dhclient-start and dhclient-stop into a single hook.
- Wait for a lease before returning from the hook.
- Generate a logfile when LXC log level is either DEBUG or TRACE.
- Rely on namespace file descriptors for the stop hook.
- Use settings from /<sysconf>/lxc/dhclient.conf if available.
- Attempt to clean up if dhclient fails to shut down properly.

Signed-off-by: Jonathan Calmels <jcalmels@nvidia.com>
2017-12-19 15:18:28 -08:00
Christian Brauner
90f20db15f
Merge pull request #2048 from duguhaotian/master
[monitor] wrong statement of break
2017-12-19 15:09:41 +01:00
Christian Brauner
0720664d93
Merge pull request #2015 from flx42/nvidia-mount-hook
hooks: add mount hook to configure access to NVIDIA GPUs
2017-12-19 15:06:20 +01:00
Christian Brauner
92b17705d0
Merge pull request #2050 from tanyifeng/small_fix
conf.c: small fix for args of mount_entry
2017-12-19 14:24:40 +01:00
Christian Brauner
5305675314
Merge pull request #2053 from tenforward/japanese
Update Japanese lxc.container.conf(5)
2017-12-19 12:07:09 +01:00
KATOH Yasufumi
a0a4f759b2 doc: Add relative option for lxc.mount.entry to Japanese lxc.container.conf(5)
and:
* remove empty paragraph in English man
* untabify in Japanese man

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
2017-12-19 20:02:46 +09:00
KATOH Yasufumi
b6feb9db85 doc: Translate the hook of network into Japanese in lxc.container.conf(5)
Update for commit 14a7b0f

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
2017-12-19 20:02:37 +09:00
KATOH Yasufumi
efcbd1a05a doc: Add the description of new style hook to Japanese lxc.container.conf(5)
Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
2017-12-19 20:02:14 +09:00
KATOH Yasufumi
4eeecbdb08 doc: Add proc section to Japanese lxc.container.conf(5)
Update for commit 61d7a73

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
2017-12-19 20:02:02 +09:00
KATOH Yasufumi
b45e48f097 doc: Add sysctl section to Japanese lxc.container.conf(5)
Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
2017-12-19 20:01:41 +09:00
Christian Brauner
4aaf9b81e9
btrfs: fix unprivileged snapshot creation
We already fixed privileged btrfs snapshot creation in:

commit 1c7222c084
Author: Christian Brauner <christian.brauner@ubuntu.com>
Date:   Tue Nov 28 13:51:03 2017 +0100

    btrfs: fix btrfs_snapshot()

    Closes #1956.

    Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
    Signed-off-by: Adrian Reber <areber@redhat.com>

but missed unprivileged btrfs snapshot creation. Fix it too.

Follow-up to #1956.
Closes #2051.

Reported-by: Oleg Freedhom <overlayfs@gmail.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-19 11:59:52 +01:00
Yifeng Tan
d6bec4ab7b conf.c: small fix for args of mount_entry
Signed-off-by: Yifeng Tan <tanyifeng1@huawei.com>
2017-12-19 17:35:01 +08:00
独孤昊天
94bc08e9ed [monitor] wrong statement of break
If lxc_abstract_unix_connect() fails and returns -1, this code never goes to retry.

Signed-off-by: liuhao <liuhao27@huawei.com>
2017-12-19 16:51:35 +08:00
Felix Abecassis
58e29e9bf1 hooks: add mount hook to configure access to NVIDIA GPUs
This hook requires the nvidia-container-cli tool provided by libnvidia-container:
https://github.com/nvidia/libnvidia-container

For containers that do not have CUDA_VERSION or NVIDIA_VISIBLE_DEVICES
set in the environment, the hook will be a no-op.

To enable in the configuration file:
lxc.hook.mount = /usr/local/share/lxc/hooks/nvidia

Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>
2017-12-18 16:17:23 -08:00
Serge Hallyn
9668d2cd15
Merge pull request #2049 from brauner/2017-12-18/start_reap_attacher_process
start: reap intermediate process
2017-12-18 10:49:50 -06:00
Christian Brauner
4e23246652
start: reap intermediate process
When we inherit namespaces we need to reap the attaching process.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-18 14:08:54 +01:00
Christian Brauner
9aff2c83e4
Merge pull request #2031 from tanyifeng/mask_and_readonly_path
conf.c: add relative option for lxc.mount.entry
2017-12-18 12:12:59 +01:00
Yifeng Tan
181437fd53 conf.c: add relative option for lxc.mount.entry
Signed-off-by: Yifeng Tan <tanyifeng1@huawei.com>
2017-12-19 01:07:46 +08:00
Christian Brauner
72c94ff968
tools: add UNPRIVILEGED field in fancy output mode
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-16 13:10:47 +01:00
Serge Hallyn
e44465303c
Merge pull request #2040 from brauner/2017-12-14/bugfixes
lxc_init: fix cgroup parsing
2017-12-14 20:10:39 -06:00
Serge Hallyn
f76d0ecb47
Merge pull request #2034 from brauner/2017-12-14/use_clone_in_run_command
utils: use lxc_raw_clone() in run_command()
2017-12-14 16:29:04 -06:00
Christian Brauner
1933b53f59
lxc_init: fix cgroup parsing
coverity: #1426132
coverity: #1426133

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-14 23:28:53 +01:00
Christian Brauner
f4bdebfd8e
tools: add missing break to lxc-execute
coverity: #1426131

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-14 23:28:44 +01:00
Serge Hallyn
389c46753b
Merge pull request #2039 from brauner/2017-12-14/fix_command_socket_race
commands: fix race when open()/close() cmd socket
2017-12-14 15:56:24 -06:00
Christian Brauner
2d728b2fd6
utils: use lxc_raw_clone() in run_command()
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-14 22:18:28 +01:00
Christian Brauner
8ab93249a0
namespace: add lxc_raw_clone()
This is based on raw_clone in systemd but adapted to our needs. The main reason
is that we need an implementation of fork()/clone() that guarantees that no
pthread_atfork() handlers are run. Since clone() in glibc currently doesn't
run pthread_atfork() handlers we should be fine for now, but there's no
guarantee that this won't change in the future. So let's do the syscall
directly - or as directly as we can. An additional nice feature is that we get
fork() behavior,
i.e. lxc_raw_clone() returns 0 in the child and the child pid in the parent.

Our implementation tries to make sure that we cover all cases according to
kernel sources. Note that we are not interested in any arguments that could be
passed after the stack.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-12-14 22:18:28 +01:00
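
The fork()-like behavior described in "namespace: add lxc_raw_clone()" above can be sketched as below. This is an illustration under assumptions rather than the LXC code: raw_clone and child_exit_status are hypothetical names, the argument order assumes a common architecture such as x86-64, and the set of rejected flags is a conservative guess at what cannot work without a stack or tid pointers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Issue the clone system call directly so that no C library fork handlers
 * can run. Returns 0 in the child, the child pid in the parent, -1 on error. */
static pid_t raw_clone(unsigned long flags)
{
	/* These flags need a stack, TLS or tid pointers we do not pass. */
	if (flags & (CLONE_VM | CLONE_SETTLS | CLONE_PARENT_SETTID |
		     CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID)) {
		errno = EINVAL;
		return -1;
	}
	return (pid_t)syscall(SYS_clone, flags | SIGCHLD, 0L, 0L, 0L, 0L);
}

/* Demo: create a child that exits with `code` and return what the parent
 * observes via waitpid(), or -1 on failure. */
static int child_exit_status(int code)
{
	int status;
	pid_t pid;

	assert(code >= 0 && code <= 255);
	pid = raw_clone(0);
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(code);	/* child: leave without running atexit() handlers */

	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child calls _exit() rather than exit() so that it does not flush stdio buffers or run exit handlers inherited from the parent.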