Commit Graph

10680 Commits

Author SHA1 Message Date
Christian Brauner
e3e69becce
af_unix: report error when no fd is to be sent
Fixes: #3624
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-07-15 21:55:08 +02:00
Christian Brauner
0b9a29b541
terminal: log TIOCGPTPEER failure less alarmingly
This is not a fatal error and the fallback codepath is equally safe.
When we use TIOCGPTPEER we're using a stashed fd to the container's
devpts mount's ptmx device and allocating a new fd non-path based
through this ioctl. If this ioctl can't be used we're falling back to
allocating a pts device from the host's devpts mount's ptmx device which
is path-based but is not under control of the container and so that's
safe. The difference is just that the first method gets you a nice
native terminal with all the pleasantries of having tty and friends
working whereas the latter method does not.

Fixes: #3625
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-07-15 21:55:06 +02:00
Christian Brauner
401e36705c
sync: fix log message
Fixes: #3875
Suggested-by: Hank.shi <shk242673@163.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-07-15 21:55:04 +02:00
Christian Brauner
c18430b001
start: fix logging message
Fixes: #3875
Suggested-by: Hank.shi <shk242673@163.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-07-15 21:55:00 +02:00
Christian Brauner
115c823151
initutils: include pthread.h
Otherwise we might end up with implicit function declaration warnings.

Link: https://jenkins.linuxcontainers.org/job/lxc-build-android/8915/console
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-07-15 18:13:37 +02:00
Serge Hallyn
7b784065a9
doc/common_options: add trace and alert loglevels
Signed-off-by: Serge Hallyn <serge@hallyn.com>
2021-07-15 18:13:32 +02:00
Christian Brauner
946e8385da
file_utils: surface ENOENT when falling back to openat()
Link: https://discuss.linuxcontainers.org/t/error-failed-to-retrieve-pid-of-executing-child-process
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-07-12 17:17:53 +02:00
Christian Brauner
2d7e6a7f0b
lxc_unshare: fix network device handling
We were passing the wrong PID. Fix this!

Link: https://discuss.linuxcontainers.org/t/problem-with-moving-interface-new-network-namespace-in-lxc-unshare
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-07-12 17:17:52 +02:00
Christian Brauner
37324c231c
lxc_unshare: make mount table private
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-07-12 17:17:51 +02:00
Wolfgang Bumiller
37f188d9ac
confile: allow including nonexisting directories
If an include directive ends with a trailing slash, we now
always assume it is a directory and do not treat the
non-existence as an error.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2021-07-12 17:17:50 +02:00
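
The trailing-slash rule described in commit 37f188d9ac above can be sketched roughly as follows. This is only an illustration under assumptions: the helper names `sketch_include_is_dir()` and `sketch_missing_include_is_error()` are hypothetical and are not taken from LXC's confile code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: an include path ending in '/' is assumed to name a directory. */
static bool sketch_include_is_dir(const char *path)
{
	size_t len;

	assert(path != NULL);

	len = strlen(path);
	return len > 0 && path[len - 1] == '/';
}

/*
 * Hypothetical helper: a missing include target only counts as an error
 * when the path was not written as a directory (no trailing slash).
 */
static bool sketch_missing_include_is_error(const char *path)
{
	return !sketch_include_is_dir(path);
}
```

With this rule, a directive that points at a directory such as userns.conf.d/ would be skipped silently when the directory is absent, while a missing plain file would still be reported.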
Wolfgang Bumiller
76bdf15acd
conf: userns.conf: include userns.conf.d
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2021-07-12 17:17:48 +02:00
KATOH Yasufumi
c0152679f1
doc: Fix typo in English lxc.container.conf(5)
Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
2021-07-12 17:17:45 +02:00
KATOH Yasufumi
0d2a619d1c
doc: Add new idmap= option to Japanese lxc.container.conf(5)
Update for commit 1852be9048

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
2021-07-12 17:17:43 +02:00
KATOH Yasufumi
d7d93fb104
doc: Append description of net type field
Update for commit 320061b34f

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
2021-07-12 17:17:41 +02:00
KATOH Yasufumi
a14a6e9092
doc: Add eBPF-based device controller semantics to Japanese man page
Update for commit 5025f3a690

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
2021-07-12 17:17:37 +02:00
Christian Brauner
01dd32bf95
cmd/lxc-checkconfig: list cgroup namespaces and rename confusing ns_cgroup entry
Link: https://discuss.linuxcontainers.org/t/cgroup-namespace-required-in-lxc-checkconfig-and-config-cgroup-ns
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-07-01 17:13:56 +02:00
Christian Brauner
49f1fbec16
terminal: ensure newlines are turned into newlines+carriage return for terminal output
Fixes: #3879
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-07-01 17:13:55 +02:00
Christian Brauner
ff4b545f5e
cgroups: handle funky cgroup layouts
Old versions of Docker emulate a cgroup namespace by bind-mounting the
container's cgroup over the corresponding controller:

/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7d4424e6_bb13_42f4_a47a_45a4828bf54d.slice/docker-d0b3604b67ac7930dd34ba3a796627e3e4717d12309e90a4afe3f38b6816ac98.scope /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime master:11 - cgroup cgroup rw,xattr,name=systemd
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7d4424e6_bb13_42f4_a47a_45a4828bf54d.slice/docker-d0b3604b67ac7930dd34ba3a796627e3e4717d12309e90a4afe3f38b6816ac98.scope /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime master:15 - cgroup cgroup rw,net_cls,net_prio
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7d4424e6_bb13_42f4_a47a_45a4828bf54d.slice/docker-d0b3604b67ac7930dd34ba3a796627e3e4717d12309e90a4afe3f38b6816ac98.scope /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime master:16 - cgroup cgroup rw,cpu,cpuacct
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7d4424e6_bb13_42f4_a47a_45a4828bf54d.slice/docker-d0b3604b67ac7930dd34ba3a796627e3e4717d12309e90a4afe3f38b6816ac98.scope /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime master:17 - cgroup cgroup rw,memory
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7d4424e6_bb13_42f4_a47a_45a4828bf54d.slice/docker-d0b3604b67ac7930dd34ba3a796627e3e4717d12309e90a4afe3f38b6816ac98.scope /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime master:18 - cgroup cgroup rw,devices
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7d4424e6_bb13_42f4_a47a_45a4828bf54d.slice/docker-d0b3604b67ac7930dd34ba3a796627e3e4717d12309e90a4afe3f38b6816ac98.scope /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime master:19 - cgroup cgroup rw,hugetlb
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7d4424e6_bb13_42f4_a47a_45a4828bf54d.slice/docker-d0b3604b67ac7930dd34ba3a796627e3e4717d12309e90a4afe3f38b6816ac98.scope /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime master:20 - cgroup cgroup rw,perf_event
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7d4424e6_bb13_42f4_a47a_45a4828bf54d.slice/docker-d0b3604b67ac7930dd34ba3a796627e3e4717d12309e90a4afe3f38b6816ac98.scope /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime master:21 - cgroup cgroup rw,cpuset
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7d4424e6_bb13_42f4_a47a_45a4828bf54d.slice/docker-d0b3604b67ac7930dd34ba3a796627e3e4717d12309e90a4afe3f38b6816ac98.scope /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime master:22 - cgroup cgroup rw,blkio
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7d4424e6_bb13_42f4_a47a_45a4828bf54d.slice/docker-d0b3604b67ac7930dd34ba3a796627e3e4717d12309e90a4afe3f38b6816ac98.scope /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime master:23 - cgroup cgroup rw,pids
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7d4424e6_bb13_42f4_a47a_45a4828bf54d.slice/docker-d0b3604b67ac7930dd34ba3a796627e3e4717d12309e90a4afe3f38b6816ac98.scope /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime master:24 - cgroup cgroup rw,freezer

New versions of LXC always stash a file descriptor for the root of the
cgroup mount at /sys/fs/cgroup and then resolve the current cgroup
parsed from /proc/{1,self}/cgroup relative to that file descriptor. This
doesn't work when the caller's cgroup is mounted over the controllers.
Older versions of LXC simply counted such layouts as having no cgroups
available for delegation at all and moved on provided no cgroup limits
were requested. But mainline LXC would fail such layouts. While I would
argue that failing such layouts is the semantically clean approach, we
shouldn't regress users, so make mainline LXC treat such cgroup layouts
as having no cgroups available for delegation.

Fixes: #3890
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-07-01 17:13:53 +02:00
Christian Brauner
d50378b422
tests: add tests for read-only /sys with read-write /sys/devices/virtual/net
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-07-01 17:13:51 +02:00
Christian Brauner
e250f278bb
conf: improve read-only /sys with read-write /sys/devices/virtual/net
Some tools require /sys/devices/virtual/net to be read-write. At the
same time we want all other parts of /sys to be read-only. To do this we
created a layout where we had a read-only instance of sysfs mounted on
top of a read-write instance of sysfs:

`-/sys                                  sysfs                                                        sysfs      rw,nosuid,nodev,noexec,relatime
  `-/sys                                sysfs                                                        sysfs      ro,nosuid,nodev,noexec,relatime
    |-/sys/devices/virtual/net          sysfs                                                        sysfs      rw,relatime
    | `-/sys/devices/virtual/net        sysfs[/devices/virtual/net]                                  sysfs      rw,nosuid,nodev,noexec,relatime

This causes issues for systemd services that create a separate mount
namespace as they get confused as to which mount options need to be
respected.

Simplify our mounting logic so we end up with a single read-only mount
of sysfs on /sys and a read-write bind-mount of /sys/devices/virtual/net:

├─/sys                                sysfs                                                                                  sysfs         ro,nosuid,nodev,noexec,relatime
│ ├─/sys/devices/virtual/net          sysfs[/devices/virtual/net]                                                            sysfs         rw,nosuid,nodev,noexec,relatime

Link: systemd/systemd#20032
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-07-01 17:13:49 +02:00
Simon Deziel
c73a232555
initutils: close dirfd in error path
Signed-off-by: Simon Deziel <simon.deziel@canonical.com>
2021-07-01 17:13:48 +02:00
Christian Brauner
0a9531960a
execute: ensure parent is notified about child exec and close all unneeded fds
lxc_container_init() creates the container payload process as its child
so lxc_container_init() itself never really exits and thus the parent
isn't notified about the child exec'ing since the sync file descriptor
is never closed. Make sure it's closed to notify the parent about the
child's exec.

In addition we're currently leaking all file descriptors associated with
the handler into the stub init. Make sure that all file descriptors
other than stderr are closed.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-07-01 17:13:46 +02:00
Christian Brauner
0089d71762
network: log network devices while sending
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-07-01 17:13:45 +02:00
Christian Brauner
91ee6c8bfe
initutils: use vfork() in lxc_container_init()
We can let the child finish calling exec before continuing in the
parent.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-07-01 17:13:43 +02:00
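
The vfork() behaviour relied on in commit 91ee6c8bfe above (the parent is suspended until the child has called exec or exited) can be illustrated with a small sketch. The function name `sketch_vfork_exec()` is hypothetical and this is not the code of lxc_container_init().

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] via vfork() + execvp() and return its
 * exit status, or -1 on failure. After vfork() the child only calls an
 * exec function or _exit(), and the parent resumes once that has happened.
 */
static int sketch_vfork_exec(char *const argv[])
{
	int status;
	pid_t pid;

	assert(argv != NULL && argv[0] != NULL);

	pid = vfork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127); /* exec failed; never return from the vfork() child */
	}

	if (waitpid(pid, &status, 0) != pid)
		return -1;

	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The exit code 127 for a failed exec follows the usual shell convention and is an assumption of this sketch.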
Tycho Andersen
a429519676
execute: don't exec init, call it
Instead of having a statically linked init that we put on the host fs
somewhere via packaging, and having to either bind mount it in or detect fexecve()
functionality, let's just call it as a library function. This way we don't
have to do any of that.

This also fixes up a bunch of conditions from:

if (quiet)
    fprintf(stderr, "log message");

to

if (!quiet)
    fprintf(stderr, "log message");

:)

and it drops all the code for fexecve() detection and bind mounting our
init in, since we no longer need any of that.

A couple other thoughts:

* I left the lxc-init binary in since we ship it, so someone could be using
  it outside of the internal uses.
* There are lots of unused arguments to lxc-init (including presumably
  --quiet, since nobody noticed the above); those may be part of the API
  though and so we don't want to drop them.

Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
2021-07-01 17:13:40 +02:00
Tomasz Blaszczak
7cb9565c7f
When an item is added to an array, then the array is realloc()ed (to size+1),
and the item is copied (strdup()) to the array.
Thus, when an item is removed from an array, memory allocated for that item
should be freed, successive items should be left-shifted and the array
realloc()ed again (size-1).

Additional changes:
- If strdup() fails in add_to_array(), then an array should be
  realloc()ed again to original size.
- Initialize an array in list_all_containers().

Signed-off-by: Tomasz Blaszczak <tomasz.blaszczak@consult.red>
2021-06-29 09:39:01 +02:00
Christian Brauner
13facae3d7
cgroups: verify that hierarchies are non-empty
Fixes: #3881
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-06-29 09:38:59 +02:00
Stéphane Graber
3efa0cf345
lxc-download: Switch GPG server
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2021-06-29 09:38:58 +02:00
Tomasz Blaszczak
7c2e8e16be
Resize array in remove_from_array() and fix a crash
When an item is added to an array, then the array is realloc()ed (to size+1),
and the item is copied (strdup()) to the array.
Thus, when an item is removed from an array, the allocated memory pointed to by
the item (not the item itself) should be freed, successive items should
be left-shifted and the array realloc()ed again (size-1).

Additional changes:
- Initialize an array in list_all_containers().

Signed-off-by: Tomasz Blaszczak <tomasz.blaszczak@consult.red>
2021-06-29 09:38:57 +02:00
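
The add/remove scheme described in commit 7c2e8e16be above (strdup() on add, free, left-shift and realloc() on remove) can be sketched as below. This is a minimal illustration, not LXC's add_to_array()/remove_from_array(): the `sketch_` names and signatures are hypothetical, and the add path copies the string before growing the array, so no shrink-back is needed when strdup() fails.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append a heap copy of item, growing the array by one slot. */
static bool sketch_add_to_array(char ***arr, size_t *len, const char *item)
{
	char **grown, *copy;

	assert(arr != NULL && len != NULL && item != NULL);

	copy = strdup(item);
	if (!copy)
		return false;

	grown = realloc(*arr, (*len + 1) * sizeof(char *));
	if (!grown) {
		free(copy);
		return false;
	}

	grown[*len] = copy;
	*arr = grown;
	(*len)++;
	return true;
}

/* Hypothetical helper: remove the first entry equal to item. */
static bool sketch_remove_from_array(char ***arr, size_t *len, const char *item)
{
	assert(arr != NULL && len != NULL && item != NULL);

	for (size_t i = 0; i < *len; i++) {
		char **shrunk;

		if (strcmp((*arr)[i], item) != 0)
			continue;

		/* Free the string the slot points to, not the slot itself. */
		free((*arr)[i]);

		/* Left-shift the successive entries over the removed slot. */
		memmove(&(*arr)[i], &(*arr)[i + 1], (*len - i - 1) * sizeof(char *));
		(*len)--;

		if (*len == 0) {
			free(*arr);
			*arr = NULL;
			return true;
		}

		/* Shrink by one; if realloc() fails the old block stays valid. */
		shrunk = realloc(*arr, *len * sizeof(char *));
		if (shrunk)
			*arr = shrunk;
		return true;
	}

	return false;
}
```

A caller would start from `char **arr = NULL; size_t len = 0;`, which corresponds to the array initialization the commit message mentions.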
Tomasz Blaszczak
dfdf49268e
When an item is added to an array, then the array is realloc()ed (to size+1),
and the item is copied (strdup()) to the array.
Thus, when an item is removed from an array, memory allocated for that item
should be freed, successive items should be left-shifted and the array
realloc()ed again (size-1).

Additional changes:
- If strdup() fails in add_to_array(), then an array should be
  realloc()ed again to original size.
- Initialize an array in list_all_containers().

Signed-off-by: Tomasz Blaszczak <tomasz.blaszczak@consult.red>
2021-06-29 09:38:56 +02:00
Christian Brauner
b88eabf0bc
cgroups: use stable ordering for co-mounted v1 controllers
Fixes: #3703
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-06-29 09:38:52 +02:00
Christian Brauner
b9bf2e27fe
tree-wide: replace problematic terminology
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-06-29 09:38:51 +02:00
Christian Brauner
111277a543
tree-wide: replace problematic terminology
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-06-29 09:38:50 +02:00
Christian Brauner
53dfebff46
tree-wide: replace problematic terminology
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-06-29 09:38:49 +02:00
Christian Brauner
b519ca6e78
tree-wide: remove problematic terminology
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-06-29 09:38:48 +02:00
Christian Brauner
4d376816e3
seccomp: replace problematic terminology
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-06-29 09:38:48 +02:00
Christian Brauner
fd8222a011
common.conf: replace problematic terminology
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-06-29 09:38:45 +02:00
Ruben Jenster
ae3fb5405b
Add support for LISTEN_FDS environment variable.
The LISTEN_FDS environment variable defines the number of
file descriptors that should be inherited by the container,
in addition to stdio.
The LISTEN_FDS environment variable is defined in the OCI spec
and used to support socket activation.

Refs #3845

Signed-off-by: Ruben Jenster <r.jenster@drachenfels.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-06-14 14:23:16 +02:00
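
As a rough illustration of commit ae3fb5405b above, the sketch below reads LISTEN_FDS and computes the highest file descriptor that would be kept open. It assumes the socket-activation convention that passed descriptors start at 3, directly after stdio; the function names and the constant are hypothetical and not LXC's implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Assumption: inherited descriptors start right after stdin/stdout/stderr. */
#define SKETCH_LISTEN_FDS_START 3

/*
 * Hypothetical helper: return the number of descriptors announced via
 * LISTEN_FDS, 0 if the variable is unset or empty, -1 if it is malformed.
 */
static int sketch_listen_fds_count(void)
{
	const char *val = getenv("LISTEN_FDS");
	char *end = NULL;
	long n;

	if (!val || *val == '\0')
		return 0;

	errno = 0;
	n = strtol(val, &end, 10);
	if (errno != 0 || end == val || *end != '\0')
		return -1;
	if (n < 0 || n > INT_MAX - SKETCH_LISTEN_FDS_START)
		return -1;

	return (int)n;
}

/* Hypothetical helper: highest descriptor to keep (stdio plus listen_fds more). */
static int sketch_highest_inherited_fd(int listen_fds)
{
	assert(listen_fds >= 0);

	return SKETCH_LISTEN_FDS_START - 1 + listen_fds;
}
```

For example, with LISTEN_FDS=2 the descriptors 3 and 4 would be kept in addition to 0, 1 and 2.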
LiFeng
176a249d21
string utils: Make sure we don't return uninitialized memory.
The functions lxc_string_split_quoted and lxc_string_split_and_trim use
realloc to reduce the memory. But the result may be NULL, in which case the
returned memory will be uninitialized.

Signed-off-by: LiFeng <lifeng68@huawei.com>
2021-06-14 14:23:12 +02:00
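
The general hazard named in commit 176a249d21 above is that a shrinking realloc() may return NULL while the original block is still valid. A defensive pattern is sketched below; `sketch_shrink_array()` is a hypothetical helper, and the actual fix in the LXC string utilities may look different.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: try to shrink arr to new_count pointer slots.
 * If realloc() fails, the original (larger, still valid) block is
 * returned instead of NULL, so the caller never loses its data.
 */
static char **sketch_shrink_array(char **arr, size_t new_count)
{
	char **shrunk;

	assert(arr != NULL && new_count > 0);

	shrunk = realloc(arr, new_count * sizeof(char *));
	return shrunk ? shrunk : arr;
}
```

Keeping the old pointer on failure wastes a little memory but keeps the first new_count entries intact either way.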
Christian Brauner
19b18b6970
confile: backport lxc.init.groups config key
This is needed for lxcri.

Fixes: #3862
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-06-14 12:10:46 +02:00
Christian Brauner
b781fb3a31
api_extensions: introduce idmapped_mounts_v2 api extension
This indicates that LXC supports idmapping the rootfs and
idmapped lxc.mount.entry entries.

Link: https://github.com/lxc/lxd/issues/8870
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-06-14 12:02:28 +02:00
Christian Brauner
8c89dd0cfd
tools/lxc_autostart: fix failed count
Don't include skipped containers in the failed count.

Fixes: #3857
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-06-14 12:02:22 +02:00
Christian Brauner
ef68581ce4
lsm/apparmor: actually report an error when we fail to wire AppArmor profile
Link: https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1931064
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-06-08 14:49:08 +02:00
Christian Brauner
88a5ffc936
lxc: add lpthread to lxc.pc
Fixes: #3853
Suggested-by: Tycho Andersen <tycho@tycho.pizza>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-06-08 14:49:05 +02:00
Pablo Correa Gómez
71def1ad00
Update lxc-net to support nftables
Closes #3093
Closes #3602

Add support for nftables firewall rules if the `nft` command-line
interface is available on the system.

Signed-off-by: Pablo Correa Gómez <ablocorrea@hotmail.com>
2021-06-08 14:49:01 +02:00
Christian Brauner
d41b0293f5
network: please broken compilers
Some users report that compilation fails because of reports that this
variable can be used uninitialized. Initialize it to silence the
compiler.

Fixes: https://github.com/lxc/lxc/issues/3850
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-05-28 12:48:39 +02:00
Stéphane Graber
7cf81ec6f1
README: Update IRC
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2021-05-28 12:48:38 +02:00
Christian Brauner
93be53b39d
start: simplify startup synchronization
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-05-28 12:48:37 +02:00
Christian Brauner
5b5c4e0c9c
start: reorder START_SYNC_POST_CONFIGURE
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-05-28 12:48:36 +02:00
Christian Brauner
837f9fe51e
start: use barrier instead of wake/wait pair
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-05-28 12:48:35 +02:00