Commit Graph

11513 Commits

Author SHA1 Message Date
Stéphane Graber
cb8e38aca2
Release LXC 5.0.3
Signed-off-by: Stéphane Graber <stgraber@stgraber.org>
2023-07-25 18:00:11 -04:00
Stéphane Graber
85d3f4b1df
github: Update for main branch
Signed-off-by: Stéphane Graber <stgraber@stgraber.org>
2023-07-25 12:29:23 -04:00
Serge Hallyn
d195603e3f
CONTRIBUTING: add a note on AI generated code
Signed-off-by: Serge Hallyn <shallyn@cisco.com>
2023-07-25 12:29:19 -04:00
Serge Hallyn
54227bdb15
get_hierarchy: don't WARN about no usable controller
If I start a container with loglevel WARN and (on a fairly
stock Ubuntu) run lxc-info -n $c, I get:

lxc-start media 20230706233337.765 WARN     cgfsng - cgroups/cgfsng.c:get_hierarchy:142 - There is no useable cpuacct controller
lxc-start media 20230706233337.765 WARN     cgfsng - cgroups/cgfsng.c:get_hierarchy:142 - There is no useable blkio controller

I don't think that's worth WARNing about, so change it to
INFO.

Signed-off-by: Serge Hallyn <shallyn@cisco.com>
2023-07-25 12:29:15 -04:00
Stéphane Graber
be7efff356
github: Add DCO/target tests
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2023-07-25 12:28:49 -04:00
Anatolii Gryzlov
8751cd2085
explicitly convert *mainloop_handler to __u64
GCC treats such a conversion as a warning, while Clang 15 aborts compilation.

Signed-off-by: Anatolii Gryzlov <agryzlov.mosbrew@gmail.com>
2023-07-25 12:28:45 -04:00
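A minimal sketch of an explicit pointer-to-__u64 conversion of the kind this commit describes. It assumes the handler pointer is stored in a 64-bit user-data field, such as an io_uring SQE's user_data. struct demo_handler and both helpers are hypothetical names. uint64_t stands in for the kernel's __u64 (also a 64-bit unsigned type) so the example stays self-contained. Routing the cast through uintptr_t makes the conversion explicit, so the compiler no longer has to warn about or reject an implicit one.

```c
#include <stdint.h>

/* Hypothetical stand-in for a mainloop handler object. */
struct demo_handler {
	int fd;
};

/* Pack a handler pointer into a 64-bit field (e.g. io_uring user_data).
 * The intermediate uintptr_t cast makes the pointer-to-integer
 * conversion explicit instead of implicit. */
static inline uint64_t handler_to_u64(struct demo_handler *h)
{
	return (uint64_t)(uintptr_t)h;
}

/* Recover the original pointer from the 64-bit field. */
static inline struct demo_handler *u64_to_handler(uint64_t v)
{
	return (struct demo_handler *)(uintptr_t)v;
}
```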
Magali Lemes
c16bb5b71e
tests: fix parse_config_file seccomp test
Link: https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1980218
Check whether seccomp is enabled before throwing an error.

Signed-off-by: Magali Lemes <magali.lemes@canonical.com>
2023-07-25 12:28:43 -04:00
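One way to check whether seccomp is available is to look at /proc/self/status. The kernel only prints the "Seccomp:" field there when it was built with CONFIG_SECCOMP. The sketch below assumes that signal is sufficient. It is not necessarily what the fixed test does, and both function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Return 1 if the stream (formatted like /proc/<pid>/status) contains
 * a "Seccomp:" line, 0 otherwise. */
int stream_has_seccomp_line(FILE *f)
{
	char line[256];

	while (fgets(line, sizeof(line), f)) {
		if (strncmp(line, "Seccomp:", strlen("Seccomp:")) == 0)
			return 1;
	}
	return 0;
}

/* Returns 1 if the running kernel reports seccomp support, 0 if not,
 * -1 if /proc/self/status cannot be opened. */
int seccomp_supported(void)
{
	FILE *f = fopen("/proc/self/status", "r");
	int ret;

	if (!f)
		return -1;
	ret = stream_has_seccomp_line(f);
	fclose(f);
	return ret;
}
```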
Stéphane Graber
95ef57c73b
src/tests: Fix container creation errors
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2023-07-25 12:28:41 -04:00
Serge Hallyn
30c79f8a7d
rename functions which clash with libsystemd's
If statically linking against both liblxc and libsystemd, some
function names conflict:

mkdir_p fd_cloexec path_simplify is_dir is_fs_type

Rename each of them to lxc_<name> with:

for sym in mkdir_p fd_cloexec path_simplify is_dir is_fs_type; do
	git grep "$sym" | awk -F: '{ print $1 }' | sort | uniq | xargs sed -i "s/$sym/lxc_$sym/g"
done

(the above loop wrongly replaces is_dir in meson.build, but
c'est la vie)

Signed-off-by: Serge Hallyn <shallyn@cisco.com>
2023-07-25 12:28:34 -04:00
Alexander Mikhalitsyn
3801a6a3dd
mainloop: io_uring: disable IORING_POLL_ADD_MULTI
Let's disable IORING_POLL_ADD_MULTI to work around an issue
with false-positive POLLIN events in the CQ.

In my local setup I managed to fix the issue without this
by making terminal FDs non-blocking, but a full
test-suite run in Jenkins showed that the issue
still persists. So let's add this ugly workaround too.

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2023-07-25 12:28:32 -04:00
Alexander Mikhalitsyn
fba0ae0717
terminal: make terminal FDs non-blocking
Let's prevent freezes on read(2) by making terminal FDs non-blocking.

It was discovered that there is an issue with the io_uring mainloop when
multishot poll (IORING_POLL_ADD_MULTI) mode is enabled. Sometimes
false-positive poll events are put into the CQ. A subsequent read(2) then
gets stuck forever and blocks all mainloop processing indefinitely.

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2023-07-25 12:28:29 -04:00
Alexander Mikhalitsyn
1af412d2f9
file_utils: add fd_make_nonblocking helper
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2023-07-25 12:28:27 -04:00
Alexander Mikhalitsyn
eaaf041f68
file_utils: rename fd_make_nonblocking to fd_make_blocking
Currently, fd_make_nonblocking does exactly the opposite:
it clears the O_NONBLOCK flag and makes the fd blocking.

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2023-07-25 12:28:02 -04:00
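This rename and the fd_make_nonblocking helper added in the commit listed just above it can be illustrated with a minimal fcntl(2)-based sketch. One helper sets O_NONBLOCK and the other clears it, and both preserve the remaining file status flags. The demo_ names are hypothetical and this is not LXC's actual code.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Set O_NONBLOCK, keeping the other status flags.
 * Returns 0 on success, -errno on failure. */
int demo_fd_make_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -errno;
	if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		return -errno;
	return 0;
}

/* Clear O_NONBLOCK: what the old, misnamed helper actually did. */
int demo_fd_make_blocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -errno;
	if (fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0)
		return -errno;
	return 0;
}
```

With O_NONBLOCK set, a spurious readiness event makes read(2) fail with EAGAIN instead of blocking the mainloop.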
Solar Designer
d05fb8a453
setproctitle(): Handle potential NULL return from strrchr()
Signed-off-by: Solar Designer <solar@openwall.com>
2023-07-25 12:28:00 -04:00
Tycho Andersen
709d42691d
make setproctitle()'s /proc/pid/stat parsing safe
It turns out that our parsing of /proc/pid/stat was not safe in general
(though probably safe for lxc, since our executable names do not contain
spaces).

Let's fix this by searching backwards through the file for the last ')'
and continuing from there.

This was reported to me by Solar Designer, who pointed me to this thread:
https://twitter.com/solardiz/status/1634204168545001473

Indeed, this is a lot of tap dancing to work around the kernel's
16-character executable name limit. Perhaps I'll send a kernel patch to
raise that limit next.

Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
2023-07-25 12:27:58 -04:00
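A minimal sketch of the parsing approach described here, combined with the NULL check from the setproctitle() commit listed above it. The steps are: find the last ')' with strrchr(), bail out if there is none, then parse the fields that follow it. parse_stat_tail() is a hypothetical name and reads only the state and ppid fields.

```c
#include <stdio.h>
#include <string.h>

/* Parse the fields after "comm" in a /proc/<pid>/stat line. comm is
 * wrapped in parentheses but may itself contain spaces and ')', so
 * anchor on the LAST ')' instead of splitting on spaces.
 * Returns 0 on success, -1 if the line is malformed. */
int parse_stat_tail(const char *line, char *state, int *ppid)
{
	const char *end = strrchr(line, ')');

	/* strrchr() returns NULL when there is no ')' at all. */
	if (!end)
		return -1;

	if (sscanf(end + 1, " %c %d", state, ppid) != 2)
		return -1;

	return 0;
}
```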
Serge Hallyn
b183d437b4
console-log test: make sure container is stopped before restarting
Closes #4237

Signed-off-by: Serge Hallyn <serge@hallyn.com>
2023-07-25 12:27:55 -04:00
Alexander Mikhalitsyn
d638d5951b
tree-wide: convert fcntl(FD_CLOEXEC) to SOCK_CLOEXEC
- replace accept() + fcntl(FD_CLOEXEC) with accept4(..., SOCK_CLOEXEC)
- remove fcntl(FD_CLOEXEC) in lxc_server_init(), as we already set
  SOCK_CLOEXEC in lxc_abstract_unix_open().

See also: ad9429e52 ("tree-wide: make socket SOCK_CLOEXEC")
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2023-07-25 12:27:39 -04:00
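A minimal sketch of the accept4(..., SOCK_CLOEXEC) pattern on an abstract AF_UNIX socket. The socket name and accept4_cloexec_demo() are hypothetical. Passing SOCK_CLOEXEC sets the flag atomically when the descriptor is created. That leaves no window between accept() and fcntl(FD_CLOEXEC) in which a concurrent fork() and exec() could inherit the descriptor.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Accept one connection with accept4(..., SOCK_CLOEXEC) and report
 * whether the new fd has FD_CLOEXEC. Returns 1/0, or -1 on error. */
int accept4_cloexec_demo(void)
{
	struct sockaddr_un addr;
	socklen_t len;
	int srv, cli = -1, conn = -1, flags, ret = -1;

	/* Abstract socket: sun_path[0] stays '\0'; the name is made up. */
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	snprintf(addr.sun_path + 1, sizeof(addr.sun_path) - 1,
		 "accept4-demo-%d", (int)getpid());
	len = offsetof(struct sockaddr_un, sun_path) + 1 +
	      strlen(addr.sun_path + 1);

	srv = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
	if (srv < 0)
		return -1;
	if (bind(srv, (struct sockaddr *)&addr, len) < 0 || listen(srv, 1) < 0)
		goto out;

	/* On AF_UNIX a connect() into a non-full backlog completes at once. */
	cli = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
	if (cli < 0 || connect(cli, (struct sockaddr *)&addr, len) < 0)
		goto out;

	/* One syscall: the fd never exists without close-on-exec. */
	conn = accept4(srv, NULL, NULL, SOCK_CLOEXEC);
	if (conn < 0)
		goto out;

	flags = fcntl(conn, F_GETFD);
	if (flags >= 0)
		ret = (flags & FD_CLOEXEC) ? 1 : 0;
out:
	if (conn >= 0)
		close(conn);
	if (cli >= 0)
		close(cli);
	close(srv);
	return ret;
}
```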
Scott Moser
c12c0acb04
Allow fuse mounts in apparmor start-container.
Unprivileged users should be able to do FUSE mounts during start-container.
Specifically, this solves the problem of unprivileged FUSE mounting via a
pre-hook.

Signed-off-by: Scott Moser <smoser@brickies.net>
2023-07-25 12:27:37 -04:00
Scott Moser
c93418d985
Add support for squashfs images in oci via atomfs
This adds support to the oci template for squashfs images.
It uses 'atomfs' from [1] to accomplish this.

Squashfs images (media type
application/vnd.stacker.image.layer.squashfs+zstd+verity) have several
benefits compared to tar+gz:

 * immediately mountable
 * read-only filesystem
 * verity data present in the OCI manifest.

I presented this at FOSDEM 2023 [2].

The 'atomfs' program can be replaced by passing the '--mount-helper'
argument to the oci template:

    mount-helper mount oci:<oci_dir>:<oci_name> <mountpoint>
    mount-helper umount <mountpoint>

[1] https://github.com/project-machine/atomfs
[2] https://fosdem.org/2023/schedule/event/container_secure_storage/

Signed-off-by: Scott Moser <smoser@brickies.net>
2023-07-25 12:27:34 -04:00
Wolfgang Bumiller
3754e803fd
apparmor: don't try to mmap empty files
If empty profile files linger somehow (e.g. after a power loss, or the
OOM killer striking between creating and writing the file), we
tried to call mmap() with a length of 0, which is invalid.
Let's treat such a file as if it did not exist.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2023-07-25 12:27:25 -04:00
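A minimal sketch of treating an empty file like a missing one before calling mmap(). It assumes the caller maps whole files read-only. map_profile() is a hypothetical helper, not LXC's code. Note that mmap() reports failure with MAP_FAILED, not NULL, so that is the value the sketch compares against.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. Returns NULL (with *len == 0) if the file
 * is missing, empty, or cannot be mapped; mmap() with length 0 would
 * fail with EINVAL, so empty files are rejected before the call. */
void *map_profile(const char *path, size_t *len)
{
	struct stat st;
	void *p;
	int fd;

	*len = 0;
	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return NULL;

	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return NULL;
	}

	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd); /* the mapping remains valid after close() */
	if (p == MAP_FAILED)
		return NULL;

	*len = (size_t)st.st_size;
	return p;
}
```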
Alexander Mikhalitsyn
706ee25cda
initutils: use PRIu64 for uint64_t in setproctitle
Kernel UAPI provides us with the following declaration:
/*
 * This structure provides new memory descriptor
 * map which mostly modifies /proc/pid/stat[m]
 * output for a task. This mostly done in a
 * sake of checkpoint/restore functionality.
 */
struct prctl_mm_map {
	__u64	start_code;		/* code section bounds */
	__u64	end_code;
	__u64	start_data;		/* data section bounds */
	__u64	end_data;
	__u64	start_brk;		/* heap for brk() syscall */
	__u64	brk;
	__u64	start_stack;		/* stack starts at */
	__u64	arg_start;		/* command line arguments bounds */
	__u64	arg_end;
	__u64	env_start;		/* environment variables bounds */
	__u64	env_end;
	__u64	*auxv;			/* auxiliary vector */
	__u32	auxv_size;		/* vector size */
	__u32	exe_fd;			/* /proc/$pid/exe link file */
};

Let's use appropriate types/format specifiers everywhere.

Issue #4268

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2023-07-25 12:27:23 -04:00
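A minimal sketch, assuming the values are held in uint64_t variables. The PRIu64 macro from <inttypes.h> expands to the conversion that matches uint64_t on the target ABI. A hard-coded "%lu" is only correct where uint64_t happens to be unsigned long. format_range() is a hypothetical helper.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h> /* strcmp(), used when checking the output */

/* Print a "start-end" range of 64-bit values into buf.
 * Returns snprintf()'s result. */
int format_range(char *buf, size_t size, uint64_t start, uint64_t end)
{
	return snprintf(buf, size, "%" PRIu64 "-%" PRIu64, start, end);
}
```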
Quentin Lyons
3cdd5078c3
lxc-net.in: fix nftables syntax for IPv6 NAT
The nftables masquerade rule for IPv6 was using the IPv4 syntax. This
resulted in the following error when starting lxc-net.service with
LXC_IPV6_NAT="true" and nftables:

    Feb 11 18:54:54 pc lxc-net[4936]: Error: conflicting protocols specified: ip6 vs. ip
    Feb 11 18:54:54 pc lxc-net[4936]:                              ^^^^^^^^
    Feb 11 18:54:54 pc lxc-net[4917]: Failed to setup lxc-net.
    Feb 11 18:54:54 pc systemd[1]: lxc-net.service: Main process exited, code=exited, status=1/FAILURE
    Feb 11 18:54:54 pc systemd[1]: lxc-net.service: Failed with result 'exit-code'.
    Feb 11 18:54:54 pc systemd[1]: Failed to start LXC network bridge setup.

Signed-off-by: Quentin Lyons <36303164+n0p90@users.noreply.github.com>
2023-07-25 12:27:20 -04:00
Ariel Miculas
97bf622471
Fix strlcat's return value checks
Alternatively, we could have used safe_strlcat, but it's not used
anywhere, and there is also no safe_strlcpy.

Signed-off-by: Ariel Miculas <amiculas@cisco.com>
2023-07-25 12:27:18 -04:00
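strlcat() returns the length of the string it tried to create: the initial length of dst plus the length of src. Truncation is therefore detected by comparing that return value against the buffer size. The minimal sketch below uses a local demo_strlcat() as a stand-in for the BSD function, because older glibc does not ship one. demo_strlcat() and append_checked() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

/* BSD-style strlcat(): append src to dst (total buffer size 'size'),
 * NUL-terminating when there is room. Returns strlen(initial dst) +
 * strlen(src), i.e. the length it tried to create. */
size_t demo_strlcat(char *dst, const char *src, size_t size)
{
	size_t dlen = strnlen(dst, size);
	size_t slen = strlen(src);

	if (dlen == size) /* dst not terminated within size: nothing appended */
		return size + slen;

	size_t n = slen < size - dlen - 1 ? slen : size - dlen - 1;
	memcpy(dst + dlen, src, n);
	dst[dlen + n] = '\0';
	return dlen + slen;
}

/* Truncation happened iff the return value is >= size. */
int append_checked(char *dst, const char *src, size_t size)
{
	return demo_strlcat(dst, src, size) >= size ? -1 : 0;
}
```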
Ariel Miculas
7c81572af7
Fix typo: bev_type -> bdev_type
Signed-off-by: Ariel Miculas <amiculas@cisco.com>
2023-07-25 12:27:11 -04:00
Serge Hallyn
727adc0522
drop broken lxc-test-fuzzers
Closes #4261

Signed-off-by: Serge Hallyn <serge@hallyn.com>
2023-01-25 16:45:35 -05:00
Stéphane Graber
d571736812
Release LXC 5.0.2
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2023-01-16 16:08:50 -05:00
Mathias Gibbens
17c85aac63
Fix build error on sparc64 caused by using the gold linker
Signed-off-by: Mathias Gibbens <gibmat@debian.org>
2023-01-16 16:06:26 -05:00
Serge Hallyn
b7dfb1312a
lxc-default-cgns apparmor profile: allow overlay mounts
Signed-off-by: Serge Hallyn <serge@hallyn.com>
2023-01-16 16:06:24 -05:00
Alexander Mikhalitsyn
5cde898f45
lxc_user_nic: fix get_mtu() error handling
get_mtu() returns int, but the "mtu" variable is an unsigned int.
This leads to a logic error in error handling that can end in a
strange -EINVAL error from lxc_veth_create(): the (mtu > 0) condition
is met because a negative return value wraps to a huge unsigned value,
which is too large to set as the MTU of a network device.

Issue #4232

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2023-01-06 17:15:44 -05:00
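A minimal sketch of the signedness bug and its fix. fake_get_mtu() is a hypothetical stand-in that returns either an MTU or a negative errno. Returning 0 for "no usable MTU" in mtu_fixed() is an assumption made only for this example.

```c
#include <errno.h>

/* Hypothetical stand-in for get_mtu(): an MTU, or a negative errno. */
int fake_get_mtu(int fail)
{
	return fail ? -ENODEV : 1500;
}

/* Buggy pattern: the negative error wraps to a huge unsigned value,
 * so "mtu > 0" holds and the error is used as if it were an MTU. */
unsigned int mtu_buggy(int fail)
{
	unsigned int mtu = fake_get_mtu(fail);

	return mtu > 0 ? mtu : 0;
}

/* Fixed pattern: keep the result signed until it has been checked. */
unsigned int mtu_fixed(int fail)
{
	int ret = fake_get_mtu(fail);

	return ret > 0 ? (unsigned int)ret : 0;
}
```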
Maher Azzouzi
80553b5b41
Patching an incoming CVE (CVE-2022-47952)
lxc-user-nic in lxc through 5.0.1 is installed setuid root, and may
allow local users to infer whether any file exists, even within a
protected directory tree, because "Failed to open" often indicates
that a file does not exist, whereas "does not refer to a network
namespace path" often indicates that a file exists. NOTE: this is
different from CVE-2018-6556 because the CVE-2018-6556 fix design was
based on the premise that "we will report back to the user that the
open() failed but the user has no way of knowing why it failed";
however, in many realistic cases, there are no plausible reasons for
failing except that the file does not exist.

PoC:
> % ls /l
> ls: cannot open directory '/l': Permission denied
> % /usr/lib/x86_64-linux-gnu/lxc/lxc-user-nic delete lol lol /l/h/tt h h
> cmd/lxc_user_nic.c: 1096: main: Failed to open "/l/h/tt" <----- file does not exist.
> % /usr/lib/x86_64-linux-gnu/lxc/lxc-user-nic delete lol lol /l/h/t h h
> cmd/lxc_user_nic.c: 1101: main: Path "/l/h/t" does not refer to a network namespace path <---- file exists!

Signed-off-by: MaherAzzouzi <maherazz04@gmail.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
2023-01-06 17:15:32 -05:00
Christian Brauner
1089f49c58
build: force linking against liblxc
We really need to split up our code into better chunks so we avoid all of this
duplicated compilation.

Fixes: https://github.com/lxc/lxc/issues/4249
Signed-off-by: Christian Brauner (Microsoft) <christian.brauner@ubuntu.com>
2023-01-06 17:15:12 -05:00
Stéphane Graber
0d2a031185
checkconfig: Fix filesystem capability check
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2023-01-06 17:15:10 -05:00
Stéphane Graber
e174295805
checkconfig: Tweak cgroup handling
Only run the cgroup v1 checks if we're not on a fully functional cgroup
v2 system.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2023-01-06 17:15:08 -05:00
Stéphane Graber
4ab76611df
checkconfig: Tweak layout
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2023-01-06 17:15:06 -05:00
Stéphane Graber
0bca9bb18a
checkconfig: Hide version if no lxc-start
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2023-01-06 17:15:05 -05:00
Stéphane Graber
957e0a5d9b
checkconfig: Fix mixed tabs/spaces
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2023-01-06 17:15:04 -05:00
Fabrice Fontaine
4916a16bd1
src/lxc/meson.build: fix build without apparmor
Don't build lsm/apparmor.c if AppArmor is explicitly disabled by the
user, to avoid the following build failure with GCC 4.8:

/home/buildroot/autobuild/run/instance-3/output-1/host/arm-buildroot-linux-gnueabi/sysroot/usr/include/bits/fcntl2.h: In function '__apparmor_process_label_open.isra.0':
/home/buildroot/autobuild/run/instance-3/output-1/host/arm-buildroot-linux-gnueabi/sysroot/usr/include/bits/fcntl2.h:50:24: error: call to '__open_missing_mode' declared with attribute error: open with O_CREAT in second argument needs 3 arguments
    __open_missing_mode ();
                        ^

Fixes:
 - http://autobuild.buildroot.org/results/c9f05ad264543adf429badb99310905427092772

Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
2023-01-06 17:15:01 -05:00
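The quoted error comes from glibc's fortified open() wrapper in bits/fcntl2.h. The commit itself avoids the problem by not compiling lsm/apparmor.c at all. For reference, a minimal sketch of the call form that wrapper expects when O_CREAT is passed, with the mode as a mandatory third argument. create_private_file() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* With O_CREAT the mode argument is required: without it open() reads
 * a mode from an argument that was never passed. */
int create_private_file(const char *path)
{
	return open(path, O_WRONLY | O_CREAT | O_CLOEXEC, S_IRUSR | S_IWUSR);
}
```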
Aleksa Sarai
a330126b45
build: use cc.get_define to detect FS_CONFIG_* symbols
For some reason, openSUSE has a very strange layout in sys/mount.h where
the definitions of all the FS_CONFIG_* identifiers are present but are
#ifdef'd out in such a way that they will never be defined in an actual
build:

  #define FSOPEN_CLOEXEC          0x00000001
  /* ... */
  #ifndef FSOPEN_CLOEXEC
  enum fsconfig_command
  {
    FSCONFIG_SET_FLAG       = 0,    /* Set parameter, supplying no value */
  # define FSCONFIG_SET_FLAG FSCONFIG_SET_FLAG
  /* ... */
  };
  #endif

Unfortunately, while cc.has_header_symbol is faster, it cannot handle
this, which results in compilation errors on openSUSE: the FS_CONFIG_*
symbols are not actually defined when compiling, even though the
identifier is present in the header. Switching to cc.get_define fixes
this issue.

Fixes: cbabe8abf1 ("build: check for FS_CONFIG_* header symbol in sys/mount.h")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2022-12-16 11:42:22 -05:00
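A minimal, self-contained illustration of the header layout above, reproduced in simplified form. Because FSOPEN_CLOEXEC is defined just before the #ifndef, FSCONFIG_SET_FLAG appears in the header text but never ends up defined. Meson's cc.get_define returns a macro's value as seen by the preprocessor, so it observes the same outcome. fsconfig_set_flag_defined() is a hypothetical name.

```c
/* Simplified copy of the openSUSE sys/mount.h layout quoted above. */
#define FSOPEN_CLOEXEC 0x00000001
/* ... */
#ifndef FSOPEN_CLOEXEC
enum fsconfig_command {
	FSCONFIG_SET_FLAG = 0, /* Set parameter, supplying no value */
#define FSCONFIG_SET_FLAG FSCONFIG_SET_FLAG
};
#endif

/* Ask the preprocessor directly: the identifier is present in the text,
 * but the macro is never defined. */
int fsconfig_set_flag_defined(void)
{
#ifdef FSCONFIG_SET_FLAG
	return 1;
#else
	return 0;
#endif
}
```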
Alexander Mikhalitsyn
c89be8325d
cgroups: fix cgroup layout detection in __initialize_cgroups
It looks like we made a mistake while detecting the cgroup layout:
we always set the CGFSNG_LAYOUT_UNIFIED bit.

Reported-by: coverity (CID #1497115)
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2022-12-16 11:36:51 -05:00
Alexander Mikhalitsyn
7802f3647e
state: additional check in lxc_wait to prevent OOB
I can't see a real problem here, but let's add a check
just in case.

Reported-by: coverity (CID #1517314)
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2022-12-16 11:36:49 -05:00
Alexander Mikhalitsyn
4b434bf52f
cgroups: check snprintf retval in unpriv_systemd_create_scope
Reported-by: coverity (CID #1517315)
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2022-12-16 11:36:47 -05:00
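A minimal sketch of checking snprintf()'s return value. snprintf() returns the length the full output would have had. A negative value therefore signals an output error, and a value greater than or equal to the buffer size signals truncation. build_path() and the example paths are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h> /* strcmp(), used when checking the output */

/* Build "<base>/<name>" into buf. Returns 0 on success, -1 on error
 * or truncation (buf then holds only a partial path). */
int build_path(char *buf, size_t size, const char *base, const char *name)
{
	int ret = snprintf(buf, size, "%s/%s", base, name);

	if (ret < 0 || (size_t)ret >= size)
		return -1;

	return 0;
}
```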
Alexander Mikhalitsyn
0eca8d2ea7
cgroups: fix buffer out-of-bounds access in enable_controllers_delegation
Reported-by: coverity (CID #1517317)
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2022-12-16 11:36:45 -05:00
Alexander Mikhalitsyn
4ce8345d68
network: always initialize struct nl_handler
Although struct nl_handler is zero-filled in netlink_open(),
there are two functions with exit paths that can be taken
before netlink_open() is called.

At the same time, a cleaner is registered:
call_cleaner(netlink_close)

The two cases:
- netdev_get_flag
- lxc_ipvlan_create

If we exit from these functions before netlink_open() is
called, we close a random file descriptor read from
(struct nl_handler)->fd.

Let's properly initialize this structure in all cases
to prevent this bug in the future.

Reported-by: coverity (CID #1517319 and #1517316)
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2022-12-16 11:36:43 -05:00
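A minimal sketch of the pattern. It assumes call_cleaner() attaches a function that runs on every scope exit, expressed here directly with the GCC/Clang cleanup attribute. struct demo_nl_handler, demo_netlink_close() and demo_netdev_op() are hypothetical names. The sketch uses -1 as its "not open" marker, which is an assumption. A zero fd would be indistinguishable from stdin.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical, cut-down stand-in for struct nl_handler. */
struct demo_nl_handler {
	int fd;
};

/* Last fd closed by the cleaner, so the behaviour can be observed. */
int demo_closed_fd = -2;

/* Runs automatically when the annotated variable goes out of scope. */
static void demo_netlink_close(struct demo_nl_handler *h)
{
	if (h->fd >= 0) {
		demo_closed_fd = h->fd;
		close(h->fd);
		h->fd = -1;
	}
}

int demo_netdev_op(int bail_out_early)
{
	/* Initialise before any exit path; otherwise an early return would
	 * hand uninitialised stack contents to close(). */
	struct demo_nl_handler nlh __attribute__((cleanup(demo_netlink_close))) = { .fd = -1 };

	if (bail_out_early)
		return -1; /* exit taken before the "open" step */

	nlh.fd = open("/dev/null", O_RDONLY); /* stand-in for netlink_open() */
	if (nlh.fd < 0)
		return -1;

	return 0;
}
```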
Alexander Mikhalitsyn
28a1591cd5
apparmor: properly check lxc_strmmap ret value
Reported-by: coverity (CID #1517320)
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2022-12-16 11:36:41 -05:00
Alexander Mikhalitsyn
bd56c89ea3
github: fix coverity (add libpam-dev)
Should fix
meson.build:494:0: ERROR: C header 'security/pam_modules.h' not found

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2022-12-16 11:36:37 -05:00
Alexander Mikhalitsyn
a1ead0dccd
github: fix coverity build
1. install meson (ninja is a dependency)
2. run meson setup before ninja build

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2022-12-16 11:36:34 -05:00
Christian Brauner
9e35b3ecd3
conf: ensure mount tunnel is a dependent mount
Signed-off-by: Christian Brauner (Microsoft) <christian.brauner@ubuntu.com>
2022-12-16 11:36:31 -05:00
Christian Brauner
2ff447445b
apparmor: allow shared mounts in start-container.in
Signed-off-by: Christian Brauner (Microsoft) <christian.brauner@ubuntu.com>
2022-12-16 11:36:29 -05:00
Christian Brauner
58e878209c
conf: create separate peer group for container's root
Finally, we turn the rootfs into a shared mount. Note that this
doesn't re-establish mount propagation with the host's mount
namespace. Instead, we create a new peer group.

We're doing this because most workloads rely on the rootfs being
a shared mount. For example, systemd daemons like systemd-udevd run in
their own mount namespace. Their mount namespace has been made a
dependent mount (MS_SLAVE) with the host rootfs as its dominating
mount. This means new mounts on the host propagate into the
respective services.

This breaks if we leave the container's rootfs as a dependent mount:
both the container's rootfs and the service's rootfs would be
dependent mounts with the host's rootfs as their dominating mount.
So if you were to mount over the rootfs from the host, it would not
just propagate into the container's mount namespace, it would also
propagate into the service. Those are nonsensical semantics for
nearly all relevant use cases. Instead, establish the container's
rootfs as a separate peer group, mirroring the behavior on the host.

Signed-off-by: Christian Brauner (Microsoft) <christian.brauner@ubuntu.com>
2022-12-16 11:36:24 -05:00
Christian Brauner
06b4612eec
cgroups: only allocate user namespace if we have to
If the monitor runs as root we can assume it's able to remove the cgroups it
created when the container started.

Fixes: https://github.com/lxc/lxd/issues/11108
Signed-off-by: Christian Brauner (Microsoft) <christian.brauner@ubuntu.com>
2022-12-16 11:36:23 -05:00