Compare commits

150 Commits

Author SHA1 Message Date
Stéphane Graber
6dc1208ded
Release LXC 4.0.3
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2020-06-28 11:20:38 -04:00
Christian Brauner
53838b018d
commands: don't flood logs
We're ignoring commands that we don't know about. They used to be fatal. Not
anymore.

Closes: #3459.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-06-25 13:07:50 -04:00
Stéphane Graber
e72336a52c
lxc-net: Set broadcast
Closes #3457

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2020-06-25 13:07:48 -04:00
Robert Vogelgesang
a94c4a6dfb
lxccontainer: fix non-blocking container stop
Stopping an LXC container without waiting on it was broken in master. This
patch fixes it.

Signed-off-by: Robert Vogelgesang <vogel@folz.de>
2020-06-25 13:07:47 -04:00
Christian Brauner
04e0ad4e95
test: update terminology
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-06-19 20:43:36 -04:00
Christian Brauner
0332ef2c17
doc: update terminology
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-06-19 20:43:34 -04:00
Christian Brauner
b15eb500ce
CODING_STYLE: adapt code example
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-06-19 20:43:32 -04:00
Christian Brauner
1478a2fcbc
openpty: adapt variable naming
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-06-19 20:43:30 -04:00
Stéphane Graber
51eccacbcf
network: Rename primary to master
The previous change made things confusing by implying there may be a
secondary when VLAN/IPVLAN/bridge members can only have a single parent
device.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2020-06-19 20:43:21 -04:00
Christian Brauner
8254704dab
tree-wide: use "primary" in networking code
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-06-18 10:25:31 -04:00
Christian Brauner
2e5e77c522
tree-wide: wipe references to questionable apis from our public logs
We can't do anything about the established kernel API but we can at least not
propagate the terminology.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-06-18 10:23:24 -04:00
Christian Brauner
148e709eda
tree-wide: use "ptmx" and "pts" as terminal terms
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-06-18 10:23:23 -04:00
Gaurav Singh
e84f3ab7f7
containertests: fix null pointer dereference
Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
2020-06-15 12:52:01 -04:00
Christian Brauner
2989eb15e8
lxccontainer: remove pointless string duplication
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-06-15 12:51:59 -04:00
Christian Brauner
a993d4f1ad
conf: kill old chown_mapped_root()
It's now a wrapper around userns_exec_mapped_root(), which allows us to avoid
fork() + exec() of lxc-usernsexec and makes things way nicer to test with ASAN etc.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-06-15 12:51:54 -04:00
Christian Brauner
dd6ed3b07b
conf: add some more logging to userns_exec_mapped_root()
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-06-15 12:51:52 -04:00
Christian Brauner
d3d1c6e112
conf: always use target_fd in userns_exec_mapped_root()
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-06-15 12:51:48 -04:00
Christian Brauner
18adfa20e0
conf: remove faulty flags
If we set O_RDWR we won't be able to open directories and if we set O_PATH we
won't be able to chown.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-06-15 12:51:44 -04:00
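
The two failure modes named in "conf: remove faulty flags" can be reproduced outside LXC. The following is a minimal sketch, not LXC code: `open_errno()` and `fchown_errno()` are hypothetical helper names, and the fallback value for `O_PATH` is an assumption that holds on common architectures such as x86 and arm64. It relies on Linux semantics, where opening a directory for writing fails with EISDIR and fchown() on an O_PATH descriptor fails with EBADF.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes O_PATH in <fcntl.h> */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_PATH
#define O_PATH 010000000 /* assumption: asm-generic value, used only if the libc headers hide it */
#endif

/* Hypothetical helper: returns the errno from open(path, flags), or 0 on success. */
int open_errno(const char *path, int flags)
{
	int fd = open(path, flags | O_CLOEXEC);

	if (fd < 0)
		return errno;

	close(fd);
	return 0;
}

/* Hypothetical helper: returns the errno from a no-op fchown() on an fd opened with flags. */
int fchown_errno(const char *path, int flags)
{
	int fd, ret, saved_errno;

	fd = open(path, flags | O_CLOEXEC);
	if (fd < 0)
		return errno;

	/* (uid_t)-1 and (gid_t)-1 ask the kernel to leave owner and group unchanged. */
	ret = fchown(fd, (uid_t)-1, (gid_t)-1);
	saved_errno = ret < 0 ? errno : 0;

	close(fd);
	return saved_errno;
}
```

On Linux, `open_errno("/", O_RDWR)` reports EISDIR and `fchown_errno("/", O_PATH)` reports EBADF, which is why neither flag suits a descriptor that may refer to a directory and must later be chowned.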
Christian Brauner
1bb0804961
cgroups: initialize lxc.pivot cpuset
Closes: #3443.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-06-15 12:51:38 -04:00
Thomas Parrott
5d2ce0b6db
network: Removes unused ip_proxy_args
Signed-off-by: Thomas Parrott <thomas.parrott@canonical.com>
2020-06-15 12:51:20 -04:00
Thomas Parrott
ebd26a1972
network: Updates netlink_open handling in lxc_ipvlan_create
Signed-off-by: Thomas Parrott <thomas.parrott@canonical.com>
2020-06-15 12:50:45 -04:00
Thomas Parrott
f36a519af9
network: Adds check for bridge link interface existence in instantiate_veth
To avoid misleading errors about openvswitch when a non-existent bridge link interface is specified.

Signed-off-by: Thomas Parrott <thomas.parrott@canonical.com>
2020-06-15 12:49:50 -04:00
Thomas Parrott
b4c9fc149e
macro: Adds UINT_TO_PTR and PTR_TO_USHORT helpers
Signed-off-by: Thomas Parrott <thomas.parrott@canonical.com>
2020-06-15 12:49:17 -04:00
Thomas Parrott
9d81b99a14
.gitignore: Ignores COPYING file created by make
Signed-off-by: Thomas Parrott <thomas.parrott@canonical.com>
2020-06-04 10:51:32 -04:00
Scott Moser
e0344cfa43
lxc-test-usernsexec: If user is root, then create and use non-root user.
Previously if the user was root, then the test would just skip
running (and exit 0).  The lxc test environment is run as root.
So, instead of never doing anything there, we create a user,
make sure it is in /etc/sub{ug}id and then execute the test as that
user.

If user is already non-root, then just execute the tests as before.

Signed-off-by: Scott Moser <smoser@brickies.net>
2020-06-04 10:51:30 -04:00
Scott Moser
bfbd606e6f
Add test of lxc-usernsexec
The test executes lxc-usernsexec to create some files and chmod them.
Then makes assertions on the uid and gid of those files from outside.

Signed-off-by: Scott Moser <smoser@brickies.net>
2020-06-01 21:06:50 -04:00
Christian Brauner
c5acbe98bc
api_extensions: add "pidfd"
Somehow it's documented but wasn't ever added.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-28 11:23:04 -04:00
Christian Brauner
53fbc128f3
commands: make limiting cgroup callbacks unreachable
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-25 09:17:56 +02:00
Christian Brauner
202e017e59
cgroups: be less alarming when creating cgroups
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-25 09:15:40 +02:00
Wolfgang Bumiller
8bbfacd2b4
improve LXC_CMD_GET_CGROUP compatibility
When a newer lxc library communicates with an older one
(such as running an lxc 4.0 lxc-freeze on a longer running
container which was started while lxc was still at version
3), the LXC_CMD_GET_LIMITING_CGROUP command is not
available, causing the remote to just close the socket.
Catch this and try the previous command instead.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2020-05-25 09:14:40 +02:00
Wolfgang Bumiller
3537c31640
cgroup isolation: handle devices cgroup early
Otherwise we cannot use an 'a' entry in devices.deny/allow
as these are not permitted once a subdirectory was created.

Without isolation we initialize the devices cgroup
particularly late, so there are probably cases which cannot
work with isolation.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2020-05-25 09:14:06 +02:00
Christian Brauner
bba910b2ff
cgroups: remove unused variable
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-25 09:12:46 +02:00
Wolfgang Bumiller
dc89b0d795
introduce lxc.cgroup.dir.{monitor,container,container.inner}
This is a new approach to #1302 with a container-side
configuration instead of a global boolean flag.

Contrary to the previous PR using an optional additional
parameter for the get-cgroup command, this introduces two
new additional commands to get the limiting cgroup path and
cgroup2 file descriptor. If the limiting option is not in
use, these behave identically to their full-path counterparts.

If these variables are used the payload will end up in the
concatenation of lxc.cgroup.dir.container and
lxc.cgroup.dir.container.inner (which may be empty), and the
monitor will end up in lxc.cgroup.dir.monitor. The
directories are fixed, no retry count logic is applied,
failing to create these directories will simply be a hard
error.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-25 09:11:12 +02:00
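
The payload path rule described in this commit (lxc.cgroup.dir.container followed by an optional lxc.cgroup.dir.container.inner) can be sketched as a small helper. This is an illustration under assumptions, not the LXC implementation: `payload_cgroup_path()` is a hypothetical name, and joining the two values with a "/" separator is an assumption, since the commit message only says "concatenation".

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: joins the lxc.cgroup.dir.container value with the
 * optional lxc.cgroup.dir.container.inner value.
 * Returns 0 on success, -1 if an argument is invalid or buf is too small.
 */
int payload_cgroup_path(const char *container, const char *inner,
			char *buf, size_t size)
{
	int ret;

	if (!container || strlen(container) == 0 || !buf || size == 0)
		return -1;

	/* An empty or missing inner part leaves the container path as is. */
	if (!inner || strlen(inner) == 0)
		ret = snprintf(buf, size, "%s", container);
	else
		ret = snprintf(buf, size, "%s/%s", container, inner);

	/* Treat truncation as a hard error, mirroring the "no retry" rule. */
	if (ret < 0 || (size_t)ret >= size)
		return -1;

	return 0;
}
```

For example, a container value of `lxc.payload/c1` and an inner value of `ns` would yield `lxc.payload/c1/ns` under these assumptions, while an empty inner value yields `lxc.payload/c1`.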
Stéphane Graber
0c9e185c96
travis: Restrict coverity to gcc on bionic on amd64
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2020-05-24 21:33:47 -04:00
Christian Brauner
b8025217f7
lxc-usernsexec: don't fail on setgroups()
We can fail to setgroups() when "deny" has been set which we need to set when
we are a fully unprivileged user.

Closes: 3420.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-24 21:33:45 -04:00
Christian Brauner
323a156937
lxc-usernsexec: dumb down from error to warning message
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-24 21:33:44 -04:00
Christian Brauner
92a8d6e061
network: use __instantiate_ns_common() in instantiate_ns_phys() too
Fixes: https://lists.linuxcontainers.org/pipermail/lxc-users/2020-May/015245.html
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-24 21:33:42 -04:00
Christian Brauner
d7df095654
bionic: s/lxc_raw_execveat()/execveat()/g
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-24 21:33:41 -04:00
Christian Brauner
fc0a8697b4
network: fix {mac,ip,v}lan device creation
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-24 21:33:40 -04:00
Christian Brauner
df7d58b75a
network: restore old behavior
I introduced a regression: when users didn't specify a specific name via
lxc.net.<idx>.name then the device would retain the random name it received
when we created it. Before we would use the "eth%d" syntax to get the kernel to
assign a fixed name. Restore that behavior.

Closes: #3407.
Fixes: 8bf64b77ac ("network: rework network device creation")
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-24 21:33:38 -04:00
Christian Brauner
7ebcd704be
process_utils: make lxc use clone3() whenever possible
No more weird api quirks between architectures and cool new features.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-24 21:33:33 -04:00
Wolfgang Bumiller
0842a4652e
cgfsng: use EPOLLPRI when polling cgroup.events
EPOLLIN will always be true and therefore end up
busy-looping

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2020-05-24 21:33:30 -04:00
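
The difference between EPOLLIN and EPOLLPRI that this commit relies on can be sketched with a small epoll wrapper. `ready_events()` is a hypothetical helper, not an LXC function. Because cgroup.events only exists on a cgroup2 mount, the usage note below uses a pipe as a stand-in to show that readable data alone does not raise EPOLLPRI.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper: registers fd for the given events and returns how
 * many events epoll reports within timeout_ms (0 or 1), or -1 on error.
 */
int ready_events(int fd, uint32_t events, int timeout_ms)
{
	struct epoll_event ev = { .events = events };
	struct epoll_event out;
	int epfd, ret;

	epfd = epoll_create1(EPOLL_CLOEXEC);
	if (epfd < 0)
		return -1;

	ev.data.fd = fd;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
		close(epfd);
		return -1;
	}

	ret = epoll_wait(epfd, &out, 1, timeout_ms);
	close(epfd);
	return ret;
}
```

For a pipe holding unread data, `ready_events(fd, EPOLLIN, 0)` returns 1 while `ready_events(fd, EPOLLPRI, 0)` returns 0. A loop that waits on EPOLLIN for a file that is always readable would therefore wake up immediately on every iteration.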
Wolfgang Bumiller
64df0b2f36
cgfsng: deduplicate freeze code
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2020-05-24 21:02:30 -04:00
Wolfgang Bumiller
c3d189153f
mainloop: add lxc_mainloop_add_handler_events
in order to be able to listen for EPOLLPRI

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2020-05-24 21:02:08 -04:00
Christian Brauner
a62eb3aa12
process_utils: add clone3() support
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-24 21:02:05 -04:00
Christian Brauner
6aefab38c1
process_utils: introduce new process_utils.{c,h}
This will be the central place for all process management helpers. This also
removes raw_syscalls.{c,h}.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-24 21:02:01 -04:00
Christian Brauner
ef301301c6
syscall_numbers: add clone3()
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-24 21:01:58 -04:00
Christian Brauner
5524656d86
syscall_numbers: handle ia64 syscall numbers correctly
They are offset by 1024.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-24 21:01:56 -04:00
Christian Brauner
de4d585ee4
console: only create detached mount when a console is requested
otherwise weird things might happen.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-24 21:01:53 -04:00
Christian Brauner
3f924551c9
log: cleanup syslog handling
Disable and enable syslog around lxc_check_inherited().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-13 21:11:09 -04:00
Christian Brauner
e758df8570
start: cleanup file descriptor inheritance
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-13 21:11:07 -04:00
Christian Brauner
755d1e1fec
start: fix container reboot
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-13 21:11:05 -04:00
Christian Brauner
3f96727b45
lxccontainer: use close_prot_errno_disarm() on state_socket_pair
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-13 21:11:04 -04:00
Christian Brauner
dd2f1aad65
start: remove unused lxc_zero_handler()
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-13 21:11:01 -04:00
Christian Brauner
ea2a67a6b0
lxccontainer: small cleanup to lxc_check_inherited() calls
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-13 21:10:59 -04:00
Christian Brauner
cf52a093d1
confile: fix order independence of network keys
We need to make sure we don't overwrite values when they have already been set.

Closes: #3405.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-13 21:10:58 -04:00
Christian Brauner
7821133aab
tools/lxc-ls: shut up lgtm more
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-13 21:10:56 -04:00
Christian Brauner
ab398a1bb9
tools/lxc-ls: shutup lgtm
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-13 21:10:55 -04:00
Christian Brauner
1cbdec6a1b
yum: remove unused module
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-13 21:10:53 -04:00
Christian Brauner
b467fc3591
tree-wide: this is all rather TODO than FIXME
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-13 21:10:51 -04:00
Christian Brauner
52d2862cf6
compiler: support new access attributes
which will allow us to catch more oob accesses.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-05 10:16:32 -04:00
Christian Brauner
c91e492a17
gcc: add -Warray-bounds, -Wrestrict, -Wreturn-local-addr, -Wstringop-overflow
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-05 10:16:29 -04:00
Christian Brauner
63910a2228
terminal: remove unneeded if condition
Fixes: Coverity 1461742.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-04 22:28:49 -04:00
Christian Brauner
0baff7b7f5
conf: support console setup on containers without rootfs
This depends on the new mount api.

Closes #3164.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-04 22:28:47 -04:00
Christian Brauner
0343423e57
conf: introduce userns_exec_mapped_root()
to avoid the overhead of calling to lxc-usernsexec whenever we can.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-04 22:28:41 -04:00
Christian Brauner
8cce8b5930
cgroups: premount cgroups on cgroup2-only systems
Fixes: #3183
Cc: Thomas Moschny <thomas.moschny@gmx.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-04 22:28:37 -04:00
Christian Brauner
6001872d08
common.conf: add cgroup2 default device limits
Fixes: #3183
Cc: Thomas Moschny <thomas.moschny@gmx.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-04 22:28:27 -04:00
Christian Brauner
ba9eab74b8
cgroups: ignore cgroup2 limits on non-cgroup2 layouts
Mixing cgroup2 and legacy cgroup systems such that some controllers are enabled
in legacy cgroup hierarchies and other controllers in the unified hierarchies
is simply not something we're supporting. Even systemd's hybrid layout (crazy)
doesn't bind controllers to the unified cgroup hierarchy.

Fixes: #3183
Cc: Thomas Moschny <thomas.moschny@gmx.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-04 22:28:23 -04:00
Thomas Parrott
3a4031f036
src/lxc/network: Fixes netlink attribute type 1 has an invalid length message
Fixes #3386

Signed-off-by: Thomas Parrott <thomas.parrott@canonical.com>
2020-04-23 22:08:31 -04:00
Stéphane Graber
d51d0df41e
apparmor: Allow boot_id
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2020-04-23 22:08:28 -04:00
Stéphane Graber
eaf3c66b93
Release LXC 4.0.2
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2020-04-16 13:32:29 -04:00
Christian Brauner
378b64054c
configure: fix coverity builds
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-16 13:31:37 -04:00
Christian Brauner
f2f25719b7
cgroups: fix cgroup limit braino
Fixes: https://discuss.linuxcontainers.org/t/memory-limits-no-longer-being-applied/7429/7
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 17:39:37 -04:00
Christian Brauner
04a7c46e1f
travis: coverity gets confused about the %m printf extension in glibc
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 17:13:16 -04:00
Christian Brauner
da23a3c5eb
log: set GNU_SOURCE as it might help coverity along
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 17:13:14 -04:00
Christian Brauner
f576850def
conf: correctly cleanup memory in get_minimal_idmap()
Fixes: Coverity 1461760.
Fixes: Coverity 1461762.
Fixes: Coverity 1461763.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 17:12:57 -04:00
Christian Brauner
82057b132c
rexec: free argv array on failure
Fixes: Coverity 1461736.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 17:12:55 -04:00
Christian Brauner
264d40e507
attach: move check for valid config earlier
Fixes: Coverity 1461735.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 17:12:54 -04:00
Christian Brauner
10b15ed006
log: restore non-local value
Fixes: Coverity 1461734.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 17:12:52 -04:00
Christian Brauner
dfa49f0d04
network: log warning on network deconfiguration failures
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 17:12:49 -04:00
Christian Brauner
00d87eb12f
commands: add additional check to lxc_cmd_sock_get_state()
to please Coverity.

Fixes: Coverity 1461732.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 17:12:48 -04:00
Christian Brauner
a1232a5727
zfs: fix resource leak
Fixes: Coverity 1461730.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 17:12:46 -04:00
Christian Brauner
c89533f402
criu: make explicit that we're ignoring rmdir() return value
Fixes: Coverity 1461726.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 17:12:44 -04:00
Christian Brauner
e052e6d006
conf: don't double free in get_minimal_idmap()
Fixes: Coverity 1461725.
Fixes: Coverity 1461727.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 17:12:42 -04:00
Christian Brauner
6a40ccf591
cgroups: use correct NULL pointer check
Fixes: Coverity 1461722.
Fixes: Coverity 1461737.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 17:12:40 -04:00
Christian Brauner
9c0e255177
rexec: avoid double-close
Fixes: Coverity 1461721.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 17:12:39 -04:00
Christian Brauner
bc15baacf5
cgroups: fix cgroup2 devices
Fixes: Coverity 1461748.
Fixes: Coverity 1461746.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 17:12:37 -04:00
Christian Brauner
a4ccd3a752
uuid: close fd
Fixes: Coverity 1461751.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 17:12:34 -04:00
Christian Brauner
86652cfb15
cgroups: do not pass NULL pointer
Fixes: Coverity 1461752.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 17:12:31 -04:00
Christian Brauner
6810da4484
conf: fix tty cleanup
Fixes: Coverity 1461755.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 17:12:29 -04:00
Christian Brauner
c7c55c1a21
memory_utils: directly NULL ptr in free_disarm()
This should keep coverity happy.

Fixes: Coverity 1461757.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 17:12:26 -04:00
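
The idea of resetting a pointer in the same macro that frees it can be sketched as follows. `free_and_null` is a hypothetical name used for illustration; it is not claimed to match the actual definition of free_disarm() in LXC.

```c
#include <stdlib.h>

/*
 * Sketch of a free-and-reset macro (hypothetical name).
 * Assigning NULL directly makes a later free() of the same variable a
 * harmless no-op and makes the state obvious to static analysers.
 */
#define free_and_null(ptr)    \
	do {                  \
		free(ptr);    \
		(ptr) = NULL; \
	} while (0)
```

Calling the macro twice on the same variable is safe, because the second call passes NULL to free().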
Christian Brauner
3769a87ba2
travis: add back coverity
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 17:12:24 -04:00
LiFeng
4a2484801d
cgroup: fix wrong use of cgfd_con in cgroup_exit
Signed-off-by: LiFeng <lifeng68@huawei.com>
2020-04-13 22:53:06 -04:00
Toni Ylenius
a772323447
Fix lxc-oci template with loop backingstore
Move the content of rootfs inside OCI package to rootfs instead of
replacing it, as the directory is used as the mountpoint.

Tested with directory and loop backingstore.

Signed-off-by: Toni Ylenius <toni.ylenius@iki.fi>
2020-04-13 22:52:55 -04:00
Christian Brauner
16ccd6eb26
cgroups: ignore legacy limits on pure cgroup2 systems
Link: https://github.com/lxc/lxc/issues/3183#issuecomment-612462322
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-13 22:52:52 -04:00
Stéphane Graber
ec84b86e7d
tests/no-new-privs: Don't mess with /etc/lxc
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2020-04-13 22:52:48 -04:00
Stéphane Graber
43ff9c6862
lxc-update-config: Fix bad handling of lxc.logfile
Closes #3369

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2020-04-10 16:38:38 -04:00
Christian Brauner
2a4fed96b0
conf: move_ptr() in all cases in mapped_hostid_add()
Closes #3366.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-09 18:28:51 -04:00
Christian Brauner
37fcb9bc3e
conf: use macros all around in lxc_map_ids()
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-09 18:28:48 -04:00
Christian Brauner
3b7f02fa67
conf: tweak get_minimal_idmap()
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-09 18:28:45 -04:00
Thomas Parrott
53cffd7537
src/lxc/network: ipvlan comment and code style tweak
Signed-off-by: Thomas Parrott <thomas.parrott@canonical.com>
2020-04-09 18:28:39 -04:00
KUWAZAWA Takuya
b2722ecbc4
network: Make it possible to set the mode of IPVLAN to L2
Signed-off-by: KUWAZAWA Takuya <albatross0@gmail.com>
2020-04-09 18:28:34 -04:00
Christian Brauner
c2e3e9a4b4
seccomp: newer kernels require the buffer to be zeroed
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-08 23:31:37 -04:00
Christian Brauner
eb8d7c09f7
cgroups: whitespace fixes
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-08 23:31:33 -04:00
Christian Brauner
571694003e
lxc_user_nic: continue when we failed to find a group
Closes #3361.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-08 23:31:25 -04:00
Christian Brauner
e8bb9e4f94
lxc_user_nic: simplify group retrieval
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-08 23:31:22 -04:00
Christian Brauner
468797a31f
syscall_numbers: handle riscv
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-08 23:31:18 -04:00
Christian Brauner
850c0659ce
start: ensure all file descriptors are closed during exec
Closes https://github.com/checkpoint-restore/criu/issues/1011.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-08 23:31:13 -04:00
Stéphane Graber
98613f618b
Release LXC 4.0.1
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2020-04-06 15:14:40 -04:00
Wolfgang Bumiller
d33bb0fe90
Revert "start: remove unnecessary check for valid cgroup_ops"
This reverts commit 52520e4f79.

This can be NULL when there's a pre-start hook which fails.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2020-04-06 11:43:27 -04:00
Christian Brauner
7e67b81d36
lxccontainer: poll takes millisecond not seconds
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-02 12:22:37 -04:00
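
The unit mismatch fixed here is easy to reproduce: poll() expects its timeout in milliseconds, so a value held in seconds must be converted first. The sketch below is illustrative only; `sec_to_poll_timeout()` and `wait_readable()` are hypothetical names, not LXC functions.

```c
#include <limits.h>
#include <poll.h>

/* Hypothetical helper: converts a timeout in seconds to poll()'s milliseconds. */
int sec_to_poll_timeout(int timeout_sec)
{
	if (timeout_sec < 0)
		return -1; /* poll() treats a negative timeout as "wait forever" */

	if (timeout_sec > INT_MAX / 1000)
		return INT_MAX; /* clamp instead of overflowing */

	return timeout_sec * 1000;
}

/* Hypothetical helper: waits until fd is readable or timeout_sec elapses. */
int wait_readable(int fd, int timeout_sec)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	return poll(&pfd, 1, sec_to_poll_timeout(timeout_sec));
}
```

Passing a value in seconds straight to poll() would shorten a 2 second wait to 2 milliseconds, which is the kind of bug the commit title describes.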
Aleksa Sarai
b5d3501f3c
cgroups: fix build warning on GCC 7
GCC 7 appears to be clever enough to detect that transient_len is
uninitialised but not that it won't be used despite [1]. Just initialise
it to zero to stop the complaining, and allow LXC to build on openSUSE
Leap.

[1]: 346830421a ("cgroups: fix "uninitialized transient_len" warning")

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2020-04-02 12:22:34 -04:00
Christian Brauner
05bec1919f
utils: use setres{u,g}id() in lxc_switch_uid_gid()
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-02 12:22:27 -04:00
Christian Brauner
9ae5594834
utils: rework fix_stdio_permissions()
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-02 12:22:23 -04:00
Aleksa Sarai
256d4d0144
cgroups: fix "uninitialized transient_len" warning
Without this change, a build error is triggered if you compile with
-Werror=maybe-uninitialized.

 cgroups/cgfsng.c: In function 'cgfsng_monitor_enter':
 groups/cgfsng.c:1387:9: error: 'transient_len' may be used uninitialized in this function
    ret = lxc_writeat(h->cgfd_mon, "cgroup.procs", transient, transient_len);
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The issue is that if handler->transient_pid is 0, then transient_len is
uninitialised but lxc_writeat(..., transient_len) still gets called.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2020-04-02 12:22:19 -04:00
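
The pattern behind this warning can be sketched in isolation: a length that is only assigned on one branch should be given a defined value up front. `format_pid()` is a hypothetical helper written for this illustration and is not taken from cgfsng.c.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: formats pid into buf and returns the string length.
 * Returns 0 when pid is not a valid process id or buf is too small, so the
 * caller always receives a defined length.
 */
size_t format_pid(pid_t pid, char *buf, size_t size)
{
	size_t len = 0; /* initialised so every path has a defined value */

	if (pid > 0) {
		int ret = snprintf(buf, size, "%d", (int)pid);

		if (ret > 0 && (size_t)ret < size)
			len = (size_t)ret;
	}

	return len;
}
```

A caller can then skip the write when the returned length is 0 instead of passing an indeterminate value along.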
gaohuatao
b9d0812941
fix non-root user cannot write /dev/stdout
Signed-off-by: gaohuatao <gaohuatao@huawei.com>
2020-04-02 12:22:17 -04:00
Stéphane Graber
fa7132aef6
systemd: Add Documentation key
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2020-04-01 17:08:42 -04:00
Christian Brauner
e6c5d2e494
autotools: don't install run-coccinelle.sh
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-01 17:08:14 -04:00
Wolfgang Bumiller
4e43c4fb10
apparmor: generate ro,bind,remount rule list
and update to changes based on lxd

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2020-04-01 17:08:10 -04:00
Wolfgang Bumiller
5697d2c6d5
init: add ExecReload to lxc.service to only reload profiles
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2020-04-01 17:06:56 -04:00
Christian Brauner
46340ce2f1
start: remove unnecessary check for valid cgroup_ops
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-01 17:05:30 -04:00
Christian Brauner
179e2bf8e0
cgroups: send two fds to attach to unified cgroup
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-30 14:18:58 -04:00
Christian Brauner
7e6deea341
cgroups: send two attach fds
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-30 14:18:56 -04:00
Christian Brauner
73e7bdfcdc
start: log error when failing to create cgroup
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-30 14:18:55 -04:00
Christian Brauner
2f232c5311
cgroups: handle older kernels (e.g. v4.9)
On older kernels the restrictions to move processes between cgroups are
different than they are on newer kernels. Specifically, we're running into the
following check:

if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
    !uid_eq(cred->euid, tcred->uid) &&
    !uid_eq(cred->euid, tcred->suid))
        ret = -EACCES;

which dictates that in order to move a process into a cgroup one either needs
to be global root (no restrictions apply) or the effective uid of the process
trying to move the process and the {saved}uid of the process that is supposed
to be moved need to be identical. The new attaching logic we did didn't
fulfill this criterion because it's not present on new kernels.

Closes https://github.com/lxc/lxd/issues/7104.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-30 14:18:53 -04:00
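
The kernel check quoted in this commit can be restated as a userspace predicate to make the rule easier to reason about. This is a simplified sketch: `legacy_may_move_task()` is a hypothetical name, and it compares raw uids while ignoring user namespace translation, which the kernel's uid_eq() on kuid_t values takes into account.

```c
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical restatement of the old-kernel rule: the mover must be
 * global root, or its effective uid must match the target's uid or
 * saved uid.
 */
bool legacy_may_move_task(uid_t mover_euid, uid_t target_uid, uid_t target_suid)
{
	if (mover_euid == 0)
		return true; /* global root: no restrictions apply */

	return mover_euid == target_uid || mover_euid == target_suid;
}
```

Under this rule an unprivileged mover with euid 1000 may move a task whose uid or saved uid is 1000, but not one whose uid and saved uid are both 100000.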
Wolfgang Bumiller
a1a847dbc3
verify cgroup controller name
validate that a cgroup controller name is a valid
zero-terminated string before passing it to
`cgroup_ops->get_cgroup()`.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2020-03-30 14:18:52 -04:00
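
A check of this kind can be sketched as a helper that confirms a received buffer contains a terminating zero byte before it is treated as a C string. `is_terminated_string()` is a hypothetical name, and the exact check LXC performs may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns true only if the first len bytes of data
 * contain a '\0', so that string functions cannot read past the buffer.
 */
bool is_terminated_string(const void *data, size_t len)
{
	if (!data || len == 0)
		return false;

	return memchr(data, '\0', len) != NULL;
}
```

A request carrying the bytes `cpu` without the trailing zero byte would be rejected, while the same bytes followed by a zero byte would pass.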
Christian Brauner
d45c0d9658
tree-wide: s/recursive_destroy/lxc_rm_rf/g
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-28 12:54:20 -04:00
Christian Brauner
9b15778188
cgroups: better helper naming
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-28 12:54:18 -04:00
Christian Brauner
16a3be601f
cgroups: move check for valid monitor process up
Cc: cenxianlong <cenxianlong@huawei.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-28 12:54:17 -04:00
cenxianlong
e5da28dd00
monitor process exited by signal SIGKILL, clean cgroup resource by third party
Writing the value 0 to a cgroup.procs file causes the
writing process to be moved to the corresponding cgroup

Signed-off-by: cenxianlong <cenxianlong@huawei.com>
2020-03-28 12:54:16 -04:00
Christian Brauner
7457a8b871
cgroups: please compilers
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-28 12:54:14 -04:00
Christian Brauner
cafffc3d2b
cgroups: use hidden directory for attaching cgroup
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-28 12:54:13 -04:00
Christian Brauner
2f1a5e772a
conf: simplify userns_exec_minimal()
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-27 17:11:39 -04:00
Christian Brauner
f95c658c1c
conf: introduce and use userns_exec_minimal()
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-27 17:11:36 -04:00
Christian Brauner
38d12ae68e
Revert "cgroups: fix unified cgroup attach"
This reverts commit ba7ca43b0b.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-27 17:11:34 -04:00
Wolfgang Bumiller
3e9a732621
fixup i/o handler return values
Particularly important for lxc_cmd_handler(), which handles client
input and should not be capable of canceling the main loop:
some syscall return values leaked through, overlapping with
LXC_MAINLOOP_ERROR, causing unauthorized clients connecting
to the command socket to shut down the main loop.

In turn, signal_handler() receiving unexpected
`signalfd_siginfo` struct sizes seems like a reason to bail
(since it's a kernel interface).

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-27 11:04:15 -04:00
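
The class of bug described here, where a raw syscall result doubles as a loop control code, can be sketched as follows. The `SKETCH_LOOP_*` constants and `client_io_result()` are hypothetical and chosen for illustration; the sketch assumes the loop's error code is -1, the same value a failing syscall returns.

```c
#include <sys/types.h>

/* Hypothetical loop control codes; the error code overlaps with a failed syscall. */
enum {
	SKETCH_LOOP_ERROR    = -1, /* stops the whole main loop */
	SKETCH_LOOP_CONTINUE = 0,  /* keep serving this client */
	SKETCH_LOOP_CLOSE    = 1,  /* drop only this client */
};

/*
 * Hypothetical helper: maps the result of a client read to an explicit
 * control code instead of returning the raw value. A failed or empty read
 * closes that one client and never reaches the loop as an error.
 */
int client_io_result(ssize_t io_ret)
{
	if (io_ret <= 0)
		return SKETCH_LOOP_CLOSE;

	return SKETCH_LOOP_CONTINUE;
}
```

Returning the raw -1 from a client handler would be indistinguishable from `SKETCH_LOOP_ERROR`, so a misbehaving client could stop the loop; the explicit mapping keeps client errors local.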
Christian Brauner
5c70927b93
cgroups: fix unified cgroup attach
There's a fundamental problem with futexes and setid calls and the go runtime.
POSIX requires that when one thread setids all threads must setids and it uses
futexes and signals to synchronize the state across threads. This causes
deadlocks which means we can't use the pretty solution I first implemented.
Instead we need to chown after we create the directory. I might come up with
something smarter later but for now this will do.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-27 11:04:13 -04:00
Christian Brauner
04435b805c
cgroups: remove unused variable
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-27 11:04:12 -04:00
Christian Brauner
54b4c13726
attach: use close_prot_errno_disarm()
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-27 11:04:10 -04:00
Christian Brauner
2bc38e68ee
cgroups: rework __cg_unified_attach()
We didn't account for cgroup_attach() succeeding and just tried to attach to
the same cgroup again which doesn't make sense.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-27 11:04:08 -04:00
Christian Brauner
17b12f319b
cgroups: move pointer dereference after check
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-27 11:04:07 -04:00
Christian Brauner
c82fb6b3c7
commands: log actual errno when lxc_cmd_get_cgroup2_fd() fails
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-27 11:04:05 -04:00
Christian Brauner
8dca61dec4
conf: rework and fix leak in userns_exec_1()
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-27 11:04:03 -04:00
Christian Brauner
d8d38da1cc
cgroups: fix attaching to the unified cgroup
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-26 15:34:46 -04:00
Christian Brauner
d06e1513bd
dir: improve dir backend
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-26 15:34:44 -04:00
Christian Brauner
53209ca485
dir: use cleanup macro in dir_mount()
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-26 15:34:43 -04:00
Christian Brauner
039f2a9111
tree-wide: harden mount option parsing
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-26 15:34:41 -04:00
Pierre-Elliott Bécue
f3151f06ae
[lxc.service] Starts after remote-fs.target to allow containers relying on remote FS to work
Signed-off-by: Pierre-Elliott Bécue <becue@crans.org>
2020-03-26 15:34:37 -04:00
Christian Brauner
ca65b13068
lxc_init: add missing O_CLOEXEC
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-26 15:34:29 -04:00
Christian Brauner
1ef2b5f476
lxc_init: move main() down
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-26 15:34:28 -04:00
90 changed files with 3160 additions and 1545 deletions
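
One commit message in the list above explains that changing ids in one thread forces every thread to follow, so the directory is created first and chowned afterwards. A minimal sketch of that ordering follows. The helper name `sketch_mkdir_chown` is hypothetical and does not appear in the LXC tree.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not an LXC function: create the directory with the
 * caller's current credentials, then hand it over with chown(). No thread
 * has to change its own ids, so no cross-thread setid synchronization runs.
 */
int sketch_mkdir_chown(const char *path, mode_t mode, uid_t uid, gid_t gid)
{
	int saved_errno;

	if (mkdir(path, mode) < 0)
		return -1;

	if (chown(path, uid, gid) < 0) {
		/* Do not leave a directory with the wrong owner behind. */
		saved_errno = errno;
		(void)rmdir(path);
		errno = saved_errno;
		return -1;
	}

	return 0;
}
```

Between the two calls the directory briefly belongs to the creating user, which is the price of avoiding the id switch.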

1
.gitignore vendored
@ -16,6 +16,7 @@
Makefile.in Makefile.in
Makefile Makefile
COPYING
aclocal.m4 aclocal.m4
autom4te.cache autom4te.cache

@ -22,3 +22,21 @@ notifications:
recipients: recipients:
- lxc-devel@lists.linuxcontainers.org - lxc-devel@lists.linuxcontainers.org
webhooks: https://linuxcontainers.org/webhook-lxcbot/ webhooks: https://linuxcontainers.org/webhook-lxcbot/
env:
global:
- secure: "HlNoguS2Sjyj7Mbb644wrHZqdp/p7I7gX00XoUzLRcFosmVdYpHo6Ix8pt9ddC5tDfX05pl5x8OBwrccY+picb9NDNCt7C5TlNcuyyDROnMJW5q33j4EZRI91sBQdmn2uorMzi/CnHEtvUw20+sjBOqIqvpnUV2SMaZiWGC1Eec="
addons:
coverity_scan:
build_script_url: https://dl.stgraber.org/coverity_travis.sh
project:
name: lxc/lxc
description: "LXC - Linux Containers https://linuxcontainers.org/lxc"
# Where email notification of build analysis results will be sent
notification_email: christian.brauner@ubuntu.com
build_command_prepend: "./autogen.sh && mkdir build && cd build && ../configure --enable-coverity-build --enable-tests --with-distro=unknown --disable-rpath --enable-tests --enable-memfd-rexec --enable-seccomp --enable-static --enable-werror"
build_command: "make -j4"
branch_pattern: master

@ -733,11 +733,11 @@ __do_closedir __attribute__((__cleanup__(__auto_closedir__)))
``` ```
For example: For example:
```c ```c
void remount_all_slave(void) void turn_into_dependent_mounts(void)
{ {
__do_free char *line = NULL; __do_free char *line = NULL;
__do_fclose FILE *f = NULL; __do_fclose FILE *f = NULL;
__do_close_prot_errno int memfd = -EBADF, mntinfo_fd = -EBADF; __do_close int memfd = -EBADF, mntinfo_fd = -EBADF;
int ret; int ret;
ssize_t copied; ssize_t copied;
size_t len = 0; size_t len = 0;
@ -780,7 +780,7 @@ again:
return; return;
} }
f = fdopen(memfd, "r"); f = fdopen(memfd, "re");
if (!f) { if (!f) {
SYSERROR("Failed to open copy of \"/proc/self/mountinfo\" to mark all shared. Continuing"); SYSERROR("Failed to open copy of \"/proc/self/mountinfo\" to mark all shared. Continuing");
return; return;
@ -810,12 +810,11 @@ again:
null_endofword(target); null_endofword(target);
ret = mount(NULL, target, NULL, MS_SLAVE, NULL); ret = mount(NULL, target, NULL, MS_SLAVE, NULL);
if (ret < 0) { if (ret < 0) {
SYSERROR("Failed to make \"%s\" MS_SLAVE", target); SYSERROR("Failed to recursively turn old root mount tree into dependent mount. Continuing...");
ERROR("Continuing...");
continue; continue;
} }
TRACE("Remounted \"%s\" as MS_SLAVE", target); TRACE("Recursively turned old root mount tree into dependent mount");
} }
TRACE("Remounted all mount table entries as MS_SLAVE"); TRACE("Turned all mount table entries into dependent mount");
} }
``` ```

@ -3,5 +3,3 @@
EXTRA_DIST = exit.cocci \ EXTRA_DIST = exit.cocci \
run-coccinelle.sh \ run-coccinelle.sh \
while-true.cocci while-true.cocci
bin_SCRIPTS = run-coccinelle.sh

@ -21,6 +21,8 @@
# allow pre-mount hooks to stage mounts under /var/lib/lxc/<container>/ # allow pre-mount hooks to stage mounts under /var/lib/lxc/<container>/
mount -> /var/lib/lxc/{**,}, mount -> /var/lib/lxc/{**,},
mount /dev/.lxc-boot-id -> /proc/sys/kernel/random/boot_id,
# required for some pre-mount hooks # required for some pre-mount hooks
mount fstype=overlayfs, mount fstype=overlayfs,
mount fstype=aufs, mount fstype=aufs,

@ -46,7 +46,7 @@ _ifdown() {
_ifup() { _ifup() {
MASK=`_netmask2cidr ${LXC_NETMASK}` MASK=`_netmask2cidr ${LXC_NETMASK}`
CIDR_ADDR="${LXC_ADDR}/${MASK}" CIDR_ADDR="${LXC_ADDR}/${MASK}"
ip addr add ${CIDR_ADDR} dev ${LXC_BRIDGE} ip addr add ${CIDR_ADDR} broadcast + dev ${LXC_BRIDGE}
ip link set dev ${LXC_BRIDGE} address $LXC_BRIDGE_MAC ip link set dev ${LXC_BRIDGE} address $LXC_BRIDGE_MAC
ip link set dev ${LXC_BRIDGE} up ip link set dev ${LXC_BRIDGE} up
} }
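
With `broadcast +`, `ip` derives the broadcast address by setting all host bits of the prefix to one. The arithmetic can be sketched as follows. The function name `sketch_broadcast` and the address in the usage note are illustrative assumptions, not part of the lxc-net script.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: write the IPv4 broadcast address for addr/prefix
 * into out. Returns 0 on success and -1 on invalid input.
 */
int sketch_broadcast(const char *addr, unsigned int prefix, char *out,
		     size_t outlen)
{
	struct in_addr in;
	uint32_t host, mask;

	if (prefix > 32 || inet_pton(AF_INET, addr, &in) != 1)
		return -1;

	host = ntohl(in.s_addr);
	/* A zero prefix means no network bits; avoid shifting by 32. */
	mask = prefix ? ~(uint32_t)0 << (32 - prefix) : 0;

	/* Keep the network bits and set every host bit. */
	in.s_addr = htonl(host | ~mask);

	return inet_ntop(AF_INET, &in, out, (socklen_t)outlen) ? 0 : -1;
}
```

For example, `sketch_broadcast("10.0.3.1", 24, buf, sizeof(buf))` stores `10.0.3.255` in `buf`.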

@ -2,6 +2,7 @@
Description=LXC network bridge setup Description=LXC network bridge setup
After=network-online.target After=network-online.target
Before=lxc.service Before=lxc.service
Documentation=man:lxc
[Service] [Service]
Type=oneshot Type=oneshot

@ -1,6 +1,6 @@
[Unit] [Unit]
Description=LXC Container Initialization and Autoboot Code Description=LXC Container Initialization and Autoboot Code
After=network.target lxc-net.service After=network.target lxc-net.service remote-fs.target
Wants=lxc-net.service Wants=lxc-net.service
Documentation=man:lxc-autostart man:lxc Documentation=man:lxc-autostart man:lxc
@ -10,6 +10,7 @@ RemainAfterExit=yes
ExecStartPre=@LIBEXECDIR@/lxc/lxc-apparmor-load ExecStartPre=@LIBEXECDIR@/lxc/lxc-apparmor-load
ExecStart=@LIBEXECDIR@/lxc/lxc-containers start ExecStart=@LIBEXECDIR@/lxc/lxc-containers start
ExecStop=@LIBEXECDIR@/lxc/lxc-containers stop ExecStop=@LIBEXECDIR@/lxc/lxc-containers stop
ExecReload=@LIBEXECDIR@/lxc/lxc-apparmor-load
# Environment=BOOTUP=serial # Environment=BOOTUP=serial
# Environment=CONSOLETYPE=serial # Environment=CONSOLETYPE=serial
Delegate=yes Delegate=yes

@ -15,6 +15,8 @@ lxc.cap.drop = mac_admin mac_override sys_time sys_module sys_rawio
# Ensure hostname is changed on clone # Ensure hostname is changed on clone
lxc.hook.clone = @LXCHOOKDIR@/clonehostname lxc.hook.clone = @LXCHOOKDIR@/clonehostname
# Default legacy cgroup configuration
#
# CGroup whitelist # CGroup whitelist
lxc.cgroup.devices.deny = a lxc.cgroup.devices.deny = a
## Allow any mknod (but not reading/writing the node) ## Allow any mknod (but not reading/writing the node)
@ -42,6 +44,35 @@ lxc.cgroup.devices.allow = c 136:* rwm
### fuse ### fuse
lxc.cgroup.devices.allow = c 10:229 rwm lxc.cgroup.devices.allow = c 10:229 rwm
# Default unified cgroup configuration
#
# CGroup whitelist
lxc.cgroup2.devices.deny = a
## Allow any mknod (but not reading/writing the node)
lxc.cgroup2.devices.allow = c *:* m
lxc.cgroup2.devices.allow = b *:* m
## Allow specific devices
### /dev/null
lxc.cgroup2.devices.allow = c 1:3 rwm
### /dev/zero
lxc.cgroup2.devices.allow = c 1:5 rwm
### /dev/full
lxc.cgroup2.devices.allow = c 1:7 rwm
### /dev/tty
lxc.cgroup2.devices.allow = c 5:0 rwm
### /dev/console
lxc.cgroup2.devices.allow = c 5:1 rwm
### /dev/ptmx
lxc.cgroup2.devices.allow = c 5:2 rwm
### /dev/random
lxc.cgroup2.devices.allow = c 1:8 rwm
### /dev/urandom
lxc.cgroup2.devices.allow = c 1:9 rwm
### /dev/pts/*
lxc.cgroup2.devices.allow = c 136:* rwm
### fuse
lxc.cgroup2.devices.allow = c 10:229 rwm
# Setup the default mounts # Setup the default mounts
lxc.mount.auto = cgroup:mixed proc:mixed sys:mixed lxc.mount.auto = cgroup:mixed proc:mixed sys:mixed
lxc.mount.entry = /sys/fs/fuse/connections sys/fs/fuse/connections none bind,optional 0 0 lxc.mount.entry = /sys/fs/fuse/connections sys/fs/fuse/connections none bind,optional 0 0

@ -1,7 +1,15 @@
# CAP_SYS_ADMIN in init-user-ns is required for cgroup.devices # CAP_SYS_ADMIN in init-user-ns is required for cgroup.devices
#
# Default legacy cgroup configuration
#
lxc.cgroup.devices.deny = lxc.cgroup.devices.deny =
lxc.cgroup.devices.allow = lxc.cgroup.devices.allow =
# Default unified cgroup configuration
#
lxc.cgroup2.devices.deny =
lxc.cgroup2.devices.allow =
# Start with a full set of capabilities in user namespaces. # Start with a full set of capabilities in user namespaces.
lxc.cap.drop = lxc.cap.drop =
lxc.cap.keep = lxc.cap.keep =

@ -24,7 +24,6 @@
import os import os
from fnmatch import fnmatch from fnmatch import fnmatch
from yum.plugins import TYPE_INTERACTIVE from yum.plugins import TYPE_INTERACTIVE
from yum.plugins import PluginYumExit
requires_api_version = '2.0' requires_api_version = '2.0'
plugin_type = (TYPE_INTERACTIVE,) plugin_type = (TYPE_INTERACTIVE,)

@ -3,7 +3,7 @@ AC_PREREQ([2.69])
m4_define([lxc_devel], 0) m4_define([lxc_devel], 0)
m4_define([lxc_version_major], 4) m4_define([lxc_version_major], 4)
m4_define([lxc_version_minor], 0) m4_define([lxc_version_minor], 0)
m4_define([lxc_version_micro], 0) m4_define([lxc_version_micro], 3)
m4_define([lxc_version_beta], []) m4_define([lxc_version_beta], [])
m4_define([lxc_abi_major], 1) m4_define([lxc_abi_major], 1)
@ -622,7 +622,10 @@ AC_CHECK_HEADER([ifaddrs.h],
AC_HEADER_MAJOR AC_HEADER_MAJOR
# Check for some syscalls functions # Check for some syscalls functions
AC_CHECK_FUNCS([setns pivot_root sethostname unshare rand_r confstr faccessat gettid memfd_create]) AC_CHECK_FUNCS([setns pivot_root sethostname unshare rand_r confstr faccessat gettid memfd_create move_mount open_tree execveat clone3])
AC_CHECK_TYPES([struct clone_args], [], [], [[#include <linux/sched.h>]])
AC_CHECK_MEMBERS([struct clone_args.set_tid],[],[],[[#include <linux/sched.h>]])
AC_CHECK_MEMBERS([struct clone_args.cgroup],[],[],[[#include <linux/sched.h>]])
# Check for strerror_r() support. Defines: # Check for strerror_r() support. Defines:
# - HAVE_STRERROR_R if available # - HAVE_STRERROR_R if available
@ -753,11 +756,15 @@ AX_CHECK_COMPILE_FLAG([-Wnested-externs], [CFLAGS="$CFLAGS -Wnested-externs"],,[
AX_CHECK_COMPILE_FLAG([-fasynchronous-unwind-tables], [CFLAGS="$CFLAGS -fasynchronous-unwind-tables"],,[-Werror]) AX_CHECK_COMPILE_FLAG([-fasynchronous-unwind-tables], [CFLAGS="$CFLAGS -fasynchronous-unwind-tables"],,[-Werror])
AX_CHECK_COMPILE_FLAG([-pipe], [CFLAGS="$CFLAGS -pipe"],,[-Werror]) AX_CHECK_COMPILE_FLAG([-pipe], [CFLAGS="$CFLAGS -pipe"],,[-Werror])
AX_CHECK_COMPILE_FLAG([-fexceptions], [CFLAGS="$CFLAGS -fexceptions"],,[-Werror]) AX_CHECK_COMPILE_FLAG([-fexceptions], [CFLAGS="$CFLAGS -fexceptions"],,[-Werror])
AX_CHECK_COMPILE_FLAG([-Warray-bounds], [CFLAGS="$CFLAGS -Warray-bounds"],,[-Werror])
AX_CHECK_COMPILE_FLAG([-Wrestrict], [CFLAGS="$CFLAGS -Wrestrict"],,[-Werror])
AX_CHECK_COMPILE_FLAG([-Wreturn-local-addr], [CFLAGS="$CFLAGS -Wreturn-local-addr"],,[-Werror])
AX_CHECK_COMPILE_FLAG([-Wstringop-overflow], [CFLAGS="$CFLAGS -Wstringop-overflow"],,[-Werror])
AX_CHECK_LINK_FLAG([-z relro], [LDFLAGS="$LDFLAGS -z relro"],,[]) AX_CHECK_LINK_FLAG([-z relro], [LDFLAGS="$LDFLAGS -z relro"],,[])
AX_CHECK_LINK_FLAG([-z now], [LDFLAGS="$LDFLAGS -z now"],,[]) AX_CHECK_LINK_FLAG([-z now], [LDFLAGS="$LDFLAGS -z now"],,[])
CFLAGS="$CFLAGS -Wvla -std=gnu11" CFLAGS="$CFLAGS -Wvla -std=gnu11 -fms-extensions"
if test "x$enable_werror" = "xyes"; then if test "x$enable_werror" = "xyes"; then
CFLAGS="$CFLAGS -Werror" CFLAGS="$CFLAGS -Werror"
fi fi
@ -766,6 +773,23 @@ AC_ARG_ENABLE([thread-safety],
[AS_HELP_STRING([--enable-thread-safety], [enforce thread-safety otherwise fail the build [default=yes]])], [AS_HELP_STRING([--enable-thread-safety], [enforce thread-safety otherwise fail the build [default=yes]])],
[enable_thread_safety=$enableval], [enable_thread_safety=yes]) [enable_thread_safety=$enableval], [enable_thread_safety=yes])
AM_CONDITIONAL([ENFORCE_THREAD_SAFETY], [test "x$enable_thread_safety" = "xyes"]) AM_CONDITIONAL([ENFORCE_THREAD_SAFETY], [test "x$enable_thread_safety" = "xyes"])
if test "x$enable_thread_safety" = "xyes"; then
AC_DEFINE([ENFORCE_THREAD_SAFETY], 1, [enforce thread-safety otherwise fail the build])
AC_MSG_RESULT([yes])
else
AC_MSG_RESULT([no])
fi
AC_ARG_ENABLE([coverity-build],
[AS_HELP_STRING([--enable-coverity-build], [build for use with Coverity [default=no]])],
[enable_coverity_build=$enableval], [enable_coverity_build=no])
AM_CONDITIONAL([ENABLE_COVERITY_BUILD], [test "x$enable_coverity_build" = "xyes"])
if test "x$enable_coverity_build" = "xyes"; then
AC_DEFINE([ENABLE_COVERITY_BUILD], 1, [build for use with Coverity])
AC_MSG_RESULT([yes])
else
AC_MSG_RESULT([no])
fi
AC_ARG_ENABLE([dlog], AC_ARG_ENABLE([dlog],
[AS_HELP_STRING([--enable-dlog], [enable dlog support [default=no]])], [AS_HELP_STRING([--enable-dlog], [enable dlog support [default=no]])],
@ -1037,9 +1061,10 @@ Documentation:
- user documentation: $enable_doc - user documentation: $enable_doc
Debugging: Debugging:
- tests: $enable_tests
- ASAN: $enable_asan - ASAN: $enable_asan
- Coverity: $enable_coverity_build
- mutex debugging: $enable_mutex_debugging - mutex debugging: $enable_mutex_debugging
- tests: $enable_tests
Paths: Paths:
- Logs in configpath: $enable_configpath_log - Logs in configpath: $enable_configpath_log
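
For every function named in `AC_CHECK_FUNCS` that the C library provides, configure defines a matching `HAVE_<FUNCTION>` macro in `config.h`. C code can then fall back to a raw system call when the wrapper is missing. A sketch of that pattern for `memfd_create` follows. The `sketch_memfd_create` name is hypothetical, and `config.h` is deliberately not included here, so the fallback branch is the one that gets compiled.

```c
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MFD_CLOEXEC
#define MFD_CLOEXEC 0x0001U
#endif

#ifdef HAVE_MEMFD_CREATE
/* libc provides the wrapper; assumes config.h also enables _GNU_SOURCE. */
#include <sys/mman.h>
#define sketch_memfd_create memfd_create
#else
/* No libc wrapper detected: issue the system call directly. */
static int sketch_memfd_create(const char *name, unsigned int flags)
{
#ifdef __NR_memfd_create
	return (int)syscall(__NR_memfd_create, name, flags);
#else
	(void)name;
	(void)flags;
	errno = ENOSYS;
	return -1;
#endif
}
#endif
```

Callers use one name in both cases and only need to handle `ENOSYS` for kernels that lack the system call.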

@ -713,25 +713,25 @@ by KATOH Yasufumi <karma at jazz.email.ne.jp>
modes are <option>l3</option>, <option>l3s</option> and modes are <option>l3</option>, <option>l3s</option> and
<option>l2</option>. It defaults to <option>l3</option> mode. <option>l2</option>. It defaults to <option>l3</option> mode.
In <option>l3</option> mode TX processing up to L3 happens on the stack instance In <option>l3</option> mode TX processing up to L3 happens on the stack instance
attached to the slave device and packets are switched to the stack instance of the attached to the dependent device and packets are switched to the stack instance of the
master device for the L2 processing and routing from that instance will be parent device for the L2 processing and routing from that instance will be
used before packets are queued on the outbound device. In this mode the slaves used before packets are queued on the outbound device. In this mode the dependent devices
will not receive nor can send multicast / broadcast traffic. will not receive nor can send multicast / broadcast traffic.
In <option>l3s</option> mode TX processing is very similar to the L3 mode except that In <option>l3s</option> mode TX processing is very similar to the L3 mode except that
iptables (conn-tracking) works in this mode and hence it is L3-symmetric (L3s). iptables (conn-tracking) works in this mode and hence it is L3-symmetric (L3s).
This will have slightly less performance but that shouldn't matter since you are This will have slightly less performance but that shouldn't matter since you are
choosing this mode over plain-L3 mode to make conn-tracking work. choosing this mode over plain-L3 mode to make conn-tracking work.
In <option>l2</option> mode TX processing happens on the stack instance attached to In <option>l2</option> mode TX processing happens on the stack instance attached to
the slave device and packets are switched and queued to the master device to send the dependent device and packets are switched and queued to the parent device to send
out. In this mode the slaves will RX/TX multicast and broadcast (if applicable) as well. out. In this mode the dependent devices will RX/TX multicast and broadcast (if applicable) as well.
<option>lxc.net.[i].ipvlan.isolation</option> specifies the isolation mode. <option>lxc.net.[i].ipvlan.isolation</option> specifies the isolation mode.
The accepted isolation values are <option>bridge</option>, The accepted isolation values are <option>bridge</option>,
<option>private</option> and <option>vepa</option>. <option>private</option> and <option>vepa</option>.
It defaults to <option>bridge</option>. It defaults to <option>bridge</option>.
In <option>bridge</option> isolation mode slaves can cross-talk among themselves In <option>bridge</option> isolation mode dependent devices can cross-talk among themselves
apart from talking through the master device. apart from talking through the parent device.
In <option>private</option> isolation mode the port is set in private mode. In <option>private</option> isolation mode the port is set in private mode.
i.e. port won't allow cross communication between slaves. i.e. port won't allow cross communication between dependent devices.
In <option>vepa</option> isolation mode the port is set in VEPA mode. In <option>vepa</option> isolation mode the port is set in VEPA mode.
i.e. port will offload switching functionality to the external entity as i.e. port will offload switching functionality to the external entity as
described in 802.1Qbg. described in 802.1Qbg.
@ -1548,7 +1548,7 @@ by KATOH Yasufumi <karma at jazz.email.ne.jp>
fstab フォーマットの一行と同じフォーマットのマウントポイントの指定をします。 fstab フォーマットの一行と同じフォーマットのマウントポイントの指定をします。
<!-- <!--
Moreover lxc supports mount propagation, such as rslave or Moreover lxc supports mount propagation, such as rshared or
rprivate, and adds three additional mount options. rprivate, and adds three additional mount options.
<option>optional</option> don't fail if mount does not work. <option>optional</option> don't fail if mount does not work.
<option>create=dir</option> or <option>create=file</option> <option>create=dir</option> or <option>create=file</option>
@ -1556,7 +1556,7 @@ by KATOH Yasufumi <karma at jazz.email.ne.jp>
<option>relative</option> source path is taken to be relative to <option>relative</option> source path is taken to be relative to
the mounted container root. For instance, the mounted container root. For instance,
--> -->
加えて、LXC では rslave や rprivate といったマウント・プロパゲーションオプションと、独自の 3 つのマウントオプションが使えます。 加えて、LXC では rshared や rprivate といったマウント・プロパゲーションオプションと、独自の 3 つのマウントオプションが使えます。
<option>optional</option> は、マウントが失敗しても失敗を返さずに無視します。 <option>optional</option> は、マウントが失敗しても失敗を返さずに無視します。
<option>create=dir</option> と <option>create=file</option> は、マウントポイントをマウントする際にディレクトリもしくはファイルを作成します。 <option>create=dir</option> と <option>create=file</option> は、マウントポイントをマウントする際にディレクトリもしくはファイルを作成します。
<option>relative</option> を指定すると、マウントされたコンテナルートからの相対パスとして取得されます。 <option>relative</option> を指定すると、マウントされたコンテナルートからの相対パスとして取得されます。

@ -530,25 +530,25 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
modes are <option>l3</option>, <option>l3s</option> and modes are <option>l3</option>, <option>l3s</option> and
<option>l2</option>. It defaults to <option>l3</option> mode. <option>l2</option>. It defaults to <option>l3</option> mode.
In <option>l3</option> mode TX processing up to L3 happens on the stack instance In <option>l3</option> mode TX processing up to L3 happens on the stack instance
attached to the slave device and packets are switched to the stack instance of the attached to the dependent device and packets are switched to the stack instance of the
master device for the L2 processing and routing from that instance will be parent device for the L2 processing and routing from that instance will be
used before packets are queued on the outbound device. In this mode the slaves used before packets are queued on the outbound device. In this mode the dependent devices
will not receive nor can send multicast / broadcast traffic. will not receive nor can send multicast / broadcast traffic.
In <option>l3s</option> mode TX processing is very similar to the L3 mode except that In <option>l3s</option> mode TX processing is very similar to the L3 mode except that
iptables (conn-tracking) works in this mode and hence it is L3-symmetric (L3s). iptables (conn-tracking) works in this mode and hence it is L3-symmetric (L3s).
This will have slightly less performance but that shouldn't matter since you are This will have slightly less performance but that shouldn't matter since you are
choosing this mode over plain-L3 mode to make conn-tracking work. choosing this mode over plain-L3 mode to make conn-tracking work.
In <option>l2</option> mode TX processing happens on the stack instance attached to In <option>l2</option> mode TX processing happens on the stack instance attached to
the slave device and packets are switched and queued to the master device to send the dependent device and packets are switched and queued to the parent device to send
out. In this mode the slaves will RX/TX multicast and broadcast (if applicable) as well. out. In this mode the dependent devices will RX/TX multicast and broadcast (if applicable) as well.
<option>lxc.net.[i].ipvlan.isolation</option> specifies the isolation mode. <option>lxc.net.[i].ipvlan.isolation</option> specifies the isolation mode.
The accepted isolation values are <option>bridge</option>, The accepted isolation values are <option>bridge</option>,
<option>private</option> and <option>vepa</option>. <option>private</option> and <option>vepa</option>.
It defaults to <option>bridge</option>. It defaults to <option>bridge</option>.
In <option>bridge</option> isolation mode slaves can cross-talk among themselves In <option>bridge</option> isolation mode dependent devices can cross-talk among themselves
apart from talking through the master device. apart from talking through the parent device.
In <option>private</option> isolation mode the port is set in private mode. In <option>private</option> isolation mode the port is set in private mode.
i.e. port won't allow cross communication between slaves. i.e. port won't allow cross communication between dependent devices.
In <option>vepa</option> isolation mode the port is set in VEPA mode. In <option>vepa</option> isolation mode the port is set in VEPA mode.
i.e. port will offload switching functionality to the external entity as i.e. port will offload switching functionality to the external entity as
described in 802.1Qbg. described in 802.1Qbg.
@ -1164,7 +1164,7 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Specify a mount point corresponding to a line in the Specify a mount point corresponding to a line in the
fstab format. fstab format.
Moreover lxc supports mount propagation, such as rslave or Moreover lxc supports mount propagation, such as rshared or
rprivate, and adds three additional mount options. rprivate, and adds three additional mount options.
<option>optional</option> don't fail if mount does not work. <option>optional</option> don't fail if mount does not work.
<option>create=dir</option> or <option>create=file</option> <option>create=dir</option> or <option>create=file</option>

@ -29,7 +29,7 @@
#include <fcntl.h> #include <fcntl.h>
#include "config.h" #include "config.h"
#include "macro.h" #include "macro.h"
#include "raw_syscalls.h" #include "process_utils.h"
int fexecve(int fd, char *const argv[], char *const envp[]) int fexecve(int fd, char *const argv[], char *const envp[])
{ {
@ -41,11 +41,9 @@ int fexecve(int fd, char *const argv[], char *const envp[])
return -1; return -1;
} }
#ifdef __NR_execveat execveat(fd, "", argv, envp, AT_EMPTY_PATH);
lxc_raw_execveat(fd, "", argv, envp, AT_EMPTY_PATH);
if (errno != ENOSYS) if (errno != ENOSYS)
return -1; return -1;
#endif
ret = snprintf(procfd, sizeof(procfd), "/proc/self/fd/%d", fd); ret = snprintf(procfd, sizeof(procfd), "/proc/self/fd/%d", fd);
if (ret < 0 || (size_t)ret >= sizeof(procfd)) { if (ret < 0 || (size_t)ret >= sizeof(procfd)) {

@ -34,43 +34,43 @@
#define _PATH_DEVPTMX "/dev/ptmx" #define _PATH_DEVPTMX "/dev/ptmx"
int openpty (int *amaster, int *aslave, char *name, struct termios *termp, int openpty (int *aptmx, int *apts, char *name, struct termios *termp,
struct winsize *winp) struct winsize *winp)
{ {
char buf[PATH_MAX]; char buf[PATH_MAX];
int master, slave; int ptmx, pts;
master = open(_PATH_DEVPTMX, O_RDWR); ptmx = open(_PATH_DEVPTMX, O_RDWR);
if (master == -1) if (ptmx == -1)
return -1; return -1;
if (grantpt(master)) if (grantpt(ptmx))
goto fail; goto fail;
if (unlockpt(master)) if (unlockpt(ptmx))
goto fail; goto fail;
if (ptsname_r(master, buf, sizeof buf)) if (ptsname_r(ptmx, buf, sizeof buf))
goto fail; goto fail;
slave = open(buf, O_RDWR | O_NOCTTY); pts = open(buf, O_RDWR | O_NOCTTY);
if (slave == -1) if (pts == -1)
goto fail; goto fail;
/* XXX Should we ignore errors here? */ /* XXX Should we ignore errors here? */
if (termp) if (termp)
tcsetattr(slave, TCSAFLUSH, termp); tcsetattr(pts, TCSAFLUSH, termp);
if (winp) if (winp)
ioctl(slave, TIOCSWINSZ, winp); ioctl(pts, TIOCSWINSZ, winp);
*amaster = master; *aptmx = ptmx;
*aslave = slave; *apts = pts;
if (name != NULL) if (name != NULL)
strcpy(name, buf); strcpy(name, buf);
return 0; return 0;
fail: fail:
close(master); close(ptmx);
return -1; return -1;
} }

@ -27,10 +27,12 @@
#include <termios.h> #include <termios.h>
#include <sys/ioctl.h> #include <sys/ioctl.h>
/* Create pseudo tty master slave pair with NAME and set terminal /*
attributes according to TERMP and WINP and return handles for both * Create pseudo tty ptmx pts pair with @__name and set terminal
ends in AMASTER and ASLAVE. */ * attributes according to @__termp and @__winp and return handles for both
extern int openpty (int *__amaster, int *__aslave, char *__name, * ends in @__aptmx and @__apts.
*/
extern int openpty (int *__aptmx, int *__apts, char *__name,
const struct termios *__termp, const struct termios *__termp,
const struct winsize *__winp); const struct winsize *__winp);

@ -27,11 +27,13 @@
#include "strlcpy.h" #include "strlcpy.h"
#endif #endif
size_t strlcat(char *d, const char *s, size_t n) size_t strlcat(char *src, const char *append, size_t len)
{ {
size_t l = strnlen(d, n); size_t src_len;
if (l == n)
return l + strlen(s);
return l + strlcpy(d + l, s, n - l); src_len = strnlen(src, len);
if (src_len == len)
return src_len + strlen(append);
return src_len + strlcpy(src + src_len, append, len - src_len);
} }
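
`strlcat()` returns the length the combined string would have had, so a result greater than or equal to the buffer size signals truncation. The sketch below repeats the function from the hunk above under a `sketch_` prefix, adds a matching `strlcpy` that is assumed rather than taken from this diff, and wraps both in a hypothetical `sketch_append` helper.

```c
#include <stddef.h>
#include <string.h>

/* Assumed strlcpy: copy at most size - 1 bytes, always NUL-terminate. */
static size_t sketch_strlcpy(char *dest, const char *src, size_t size)
{
	size_t ret = strlen(src);

	if (size) {
		size_t len = (ret >= size) ? size - 1 : ret;

		memcpy(dest, src, len);
		dest[len] = '\0';
	}

	return ret;
}

/* Same logic as the strlcat() shown in the diff. */
static size_t sketch_strlcat(char *src, const char *append, size_t len)
{
	size_t src_len;

	src_len = strnlen(src, len);
	if (src_len == len)
		return src_len + strlen(append);

	return src_len + sketch_strlcpy(src + src_len, append, len - src_len);
}

/* Hypothetical helper: 0 if suffix fit completely, -1 if it was cut off. */
int sketch_append(char *buf, size_t size, const char *suffix)
{
	return sketch_strlcat(buf, suffix, size) >= size ? -1 : 0;
}
```

An 8-byte buffer holding `"lxc"` accepts `"-net"` exactly, and any further append reports truncation while leaving the buffer NUL-terminated.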

@ -24,6 +24,6 @@
#include <stdio.h> #include <stdio.h>
extern size_t strlcat(char *d, const char *s, size_t n); extern size_t strlcat(char *src, const char *append, size_t len);
#endif #endif

@ -27,7 +27,7 @@ noinst_HEADERS = api_extensions.h \
memory_utils.h \ memory_utils.h \
monitor.h \ monitor.h \
namespace.h \ namespace.h \
raw_syscalls.h \ process_utils.h \
rexec.h \ rexec.h \
start.h \ start.h \
state.h \ state.h \
@ -128,7 +128,7 @@ liblxc_la_SOURCES = af_unix.c af_unix.h \
network.c network.h \ network.c network.h \
monitor.c monitor.h \ monitor.c monitor.h \
parse.c parse.h \ parse.c parse.h \
raw_syscalls.c raw_syscalls.h \ process_utils.c process_utils.h \
ringbuf.c ringbuf.h \ ringbuf.c ringbuf.h \
rtnl.c rtnl.h \ rtnl.c rtnl.h \
state.c state.h \ state.c state.h \
@ -384,7 +384,7 @@ init_lxc_SOURCES = cmd/lxc_init.c \
initutils.c initutils.h \ initutils.c initutils.h \
memory_utils.h \ memory_utils.h \
parse.c parse.h \ parse.c parse.h \
raw_syscalls.c raw_syscalls.h \ process_utils.c process_utils.h \
syscall_numbers.h \ syscall_numbers.h \
string_utils.c string_utils.h string_utils.c string_utils.h
@ -395,7 +395,7 @@ lxc_monitord_SOURCES = cmd/lxc_monitord.c \
log.c log.h \ log.c log.h \
mainloop.c mainloop.h \ mainloop.c mainloop.h \
monitor.c monitor.h \ monitor.c monitor.h \
raw_syscalls.c raw_syscalls.h \ process_utils.c process_utils.h \
syscall_numbers.h \ syscall_numbers.h \
utils.c utils.h utils.c utils.h
lxc_user_nic_SOURCES = cmd/lxc_user_nic.c \ lxc_user_nic_SOURCES = cmd/lxc_user_nic.c \
@ -404,7 +404,7 @@ lxc_user_nic_SOURCES = cmd/lxc_user_nic.c \
memory_utils.h \ memory_utils.h \
network.c network.h \ network.c network.h \
parse.c parse.h \ parse.c parse.h \
raw_syscalls.c raw_syscalls.h \ process_utils.c process_utils.h \
syscall_numbers.h \ syscall_numbers.h \
file_utils.c file_utils.h \ file_utils.c file_utils.h \
string_utils.c string_utils.h \ string_utils.c string_utils.h \

@ -18,7 +18,7 @@
#include "log.h" #include "log.h"
#include "macro.h" #include "macro.h"
#include "memory_utils.h" #include "memory_utils.h"
#include "raw_syscalls.h" #include "process_utils.h"
#include "utils.h" #include "utils.h"
#ifndef HAVE_STRLCPY #ifndef HAVE_STRLCPY
@ -189,7 +189,7 @@ static int lxc_abstract_unix_recv_fds_iov(int fd, int *recvfds, int num_recvfds,
msg.msg_iovlen = iovlen; msg.msg_iovlen = iovlen;
do { do {
ret = recvmsg(fd, &msg, 0); ret = recvmsg(fd, &msg, MSG_CMSG_CLOEXEC);
} while (ret < 0 && errno == EINTR); } while (ret < 0 && errno == EINTR);
if (ret < 0 || ret == 0) if (ret < 0 || ret == 0)
return ret; return ret;

@ -7,22 +7,35 @@
#include <sys/socket.h> #include <sys/socket.h>
#include <sys/un.h> #include <sys/un.h>
#include "compiler.h"
/* does not enforce \0-termination */ /* does not enforce \0-termination */
extern int lxc_abstract_unix_open(const char *path, int type, int flags); extern int lxc_abstract_unix_open(const char *path, int type, int flags);
extern void lxc_abstract_unix_close(int fd); extern void lxc_abstract_unix_close(int fd);
/* does not enforce \0-termination */ /* does not enforce \0-termination */
extern int lxc_abstract_unix_connect(const char *path); extern int lxc_abstract_unix_connect(const char *path);
extern int lxc_abstract_unix_send_fds(int fd, int *sendfds, int num_sendfds, extern int lxc_abstract_unix_send_fds(int fd, int *sendfds, int num_sendfds,
void *data, size_t size); void *data, size_t size)
extern int lxc_abstract_unix_send_fds_iov(int fd, int *sendfds, __access_r(2, 3) __access_r(4, 5);
int num_sendfds, struct iovec *iov,
size_t iovlen); extern int lxc_abstract_unix_send_fds_iov(int fd, int *sendfds, int num_sendfds,
struct iovec *iov, size_t iovlen)
__access_r(2, 3);
extern int lxc_abstract_unix_recv_fds(int fd, int *recvfds, int num_recvfds,
void *data, size_t size)
__access_r(2, 3) __access_r(4, 5);
extern int lxc_unix_send_fds(int fd, int *sendfds, int num_sendfds, void *data, extern int lxc_unix_send_fds(int fd, int *sendfds, int num_sendfds, void *data,
size_t size); size_t size);
extern int lxc_abstract_unix_recv_fds(int fd, int *recvfds, int num_recvfds,
void *data, size_t size); extern int lxc_abstract_unix_send_credential(int fd, void *data, size_t size)
extern int lxc_abstract_unix_send_credential(int fd, void *data, size_t size); __access_r(2, 3);
extern int lxc_abstract_unix_rcv_credential(int fd, void *data, size_t size);
extern int lxc_abstract_unix_rcv_credential(int fd, void *data, size_t size)
__access_w(2, 3);
extern int lxc_unix_sockaddr(struct sockaddr_un *ret, const char *path); extern int lxc_unix_sockaddr(struct sockaddr_un *ret, const char *path);
extern int lxc_unix_connect(struct sockaddr_un *addr); extern int lxc_unix_connect(struct sockaddr_un *addr);
extern int lxc_unix_connect_type(struct sockaddr_un *addr, int type); extern int lxc_unix_connect_type(struct sockaddr_un *addr, int type);
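
The `__access_r(2, 3)` and `__access_w(2, 3)` annotations pair a pointer argument with the argument that holds its size, which lets newer GCC versions diagnose out-of-bounds accesses at call sites. The definitions live in `compiler.h`, which this excerpt does not show. The sketch below therefore only assumes they map to GCC's `access` function attribute, and uses hypothetical `sketch_` names.

```c
#include <stddef.h>

#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Assumed mapping: arguments are 1-based positions of pointer and size. */
#if __has_attribute(access)
#define sketch_access_r(ptr, size) __attribute__((access(read_only, ptr, size)))
#define sketch_access_w(ptr, size) __attribute__((access(write_only, ptr, size)))
#else
#define sketch_access_r(ptr, size)
#define sketch_access_w(ptr, size)
#endif

/* Argument 1 is only read, and argument 2 gives its length in elements. */
size_t sketch_count_nonzero(const unsigned char *data, size_t size)
	sketch_access_r(1, 2);

size_t sketch_count_nonzero(const unsigned char *data, size_t size)
{
	size_t i, n = 0;

	for (i = 0; i < size; i++)
		if (data[i])
			n++;

	return n;
}
```

On compilers without the attribute the macros expand to nothing, so the annotated declarations stay portable.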

@ -38,6 +38,7 @@ static char *api_extensions[] = {
"cgroup2_devices", "cgroup2_devices",
#endif #endif
"cgroup2", "cgroup2",
"pidfd",
}; };
static size_t nr_api_extensions = sizeof(api_extensions) / sizeof(*api_extensions); static size_t nr_api_extensions = sizeof(api_extensions) / sizeof(*api_extensions);

@ -40,7 +40,7 @@
#include "mainloop.h" #include "mainloop.h"
#include "memory_utils.h" #include "memory_utils.h"
#include "namespace.h" #include "namespace.h"
#include "raw_syscalls.h" #include "process_utils.h"
#include "syscall_wrappers.h" #include "syscall_wrappers.h"
#include "terminal.h" #include "terminal.h"
#include "utils.h" #include "utils.h"
@ -194,12 +194,8 @@ int lxc_attach_remount_sys_proc(void)
if (ret < 0) if (ret < 0)
return log_error_errno(-1, errno, "Failed to unshare mount namespace"); return log_error_errno(-1, errno, "Failed to unshare mount namespace");
if (detect_shared_rootfs()) { if (detect_shared_rootfs() && mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL))
if (mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL)) { SYSERROR("Failed to recursively turn root mount tree into dependent mount. Continuing...");
SYSERROR("Failed to make / rslave");
ERROR("Continuing...");
}
}
/* Assume /proc is always mounted, so remount it. */ /* Assume /proc is always mounted, so remount it. */
ret = umount2("/proc", MNT_DETACH); ret = umount2("/proc", MNT_DETACH);
@ -629,7 +625,7 @@ static signed long get_personality(const char *name, const char *lxcpath)
struct attach_clone_payload { struct attach_clone_payload {
int ipc_socket; int ipc_socket;
int terminal_slave_fd; int terminal_pts_fd;
lxc_attach_options_t *options; lxc_attach_options_t *options;
struct lxc_proc_context_info *init_ctx; struct lxc_proc_context_info *init_ctx;
lxc_attach_exec_t exec_function; lxc_attach_exec_t exec_function;
@ -639,7 +635,7 @@ struct attach_clone_payload {
static void lxc_put_attach_clone_payload(struct attach_clone_payload *p) static void lxc_put_attach_clone_payload(struct attach_clone_payload *p)
{ {
close_prot_errno_disarm(p->ipc_socket); close_prot_errno_disarm(p->ipc_socket);
close_prot_errno_disarm(p->terminal_slave_fd); close_prot_errno_disarm(p->terminal_pts_fd);
if (p->init_ctx) { if (p->init_ctx) {
lxc_proc_put_context_info(p->init_ctx); lxc_proc_put_context_info(p->init_ctx);
p->init_ctx = NULL; p->init_ctx = NULL;
@ -860,13 +856,13 @@ static int attach_child_main(struct attach_clone_payload *payload)
} }
if (options->attach_flags & LXC_ATTACH_TERMINAL) { if (options->attach_flags & LXC_ATTACH_TERMINAL) {
ret = lxc_terminal_prepare_login(payload->terminal_slave_fd); ret = lxc_terminal_prepare_login(payload->terminal_pts_fd);
if (ret < 0) { if (ret < 0) {
SYSERROR("Failed to prepare terminal file descriptor %d", payload->terminal_slave_fd); SYSERROR("Failed to prepare terminal file descriptor %d", payload->terminal_pts_fd);
goto on_error; goto on_error;
} }
TRACE("Prepared terminal file descriptor %d", payload->terminal_slave_fd); TRACE("Prepared terminal file descriptor %d", payload->terminal_pts_fd);
} }
/* Avoid unnecessary syscalls. */ /* Avoid unnecessary syscalls. */
@ -876,6 +872,11 @@ static int attach_child_main(struct attach_clone_payload *payload)
if (new_gid == ns_root_gid) if (new_gid == ns_root_gid)
new_gid = LXC_INVALID_GID; new_gid = LXC_INVALID_GID;
/* Make sure that the process's STDIO is correctly owned by the user that we are switching to */
ret = fix_stdio_permissions(new_uid);
if (ret)
WARN("Failed to adjust stdio permissions");
if (!lxc_switch_uid_gid(new_uid, new_gid)) if (!lxc_switch_uid_gid(new_uid, new_gid))
goto on_error; goto on_error;
@ -931,14 +932,14 @@ static int lxc_attach_terminal_mainloop_init(struct lxc_terminal *terminal,
return 0; return 0;
} }
static inline void lxc_attach_terminal_close_master(struct lxc_terminal *terminal) static inline void lxc_attach_terminal_close_ptmx(struct lxc_terminal *terminal)
{ {
close_prot_errno_disarm(terminal->master); close_prot_errno_disarm(terminal->ptmx);
} }
static inline void lxc_attach_terminal_close_slave(struct lxc_terminal *terminal) static inline void lxc_attach_terminal_close_pts(struct lxc_terminal *terminal)
{ {
close_prot_errno_disarm(terminal->slave); close_prot_errno_disarm(terminal->pts);
} }
static inline void lxc_attach_terminal_close_peer(struct lxc_terminal *terminal) static inline void lxc_attach_terminal_close_peer(struct lxc_terminal *terminal)
@ -1013,6 +1014,8 @@ int lxc_attach(struct lxc_container *container, lxc_attach_exec_t exec_function,
} }
} }
conf = init_ctx->container->lxc_conf; conf = init_ctx->container->lxc_conf;
if (!conf)
return log_error_errno(-EINVAL, EINVAL, "Missing container config");
if (!fetch_seccomp(init_ctx->container, options)) if (!fetch_seccomp(init_ctx->container, options))
WARN("Failed to get seccomp policy"); WARN("Failed to get seccomp policy");
@ -1166,7 +1169,7 @@ int lxc_attach(struct lxc_container *container, lxc_attach_exec_t exec_function,
free(cwd); free(cwd);
lxc_proc_close_ns_fd(init_ctx); lxc_proc_close_ns_fd(init_ctx);
if (options->attach_flags & LXC_ATTACH_TERMINAL) if (options->attach_flags & LXC_ATTACH_TERMINAL)
lxc_attach_terminal_close_slave(&terminal); lxc_attach_terminal_close_pts(&terminal);
/* Attach to cgroup, if requested. */ /* Attach to cgroup, if requested. */
if (options->attach_flags & LXC_ATTACH_MOVE_TO_CGROUP) { if (options->attach_flags & LXC_ATTACH_MOVE_TO_CGROUP) {
@ -1174,7 +1177,7 @@ int lxc_attach(struct lxc_container *container, lxc_attach_exec_t exec_function,
* If this is the unified hierarchy cgroup_attach() is * If this is the unified hierarchy cgroup_attach() is
* enough. * enough.
*/ */
ret = cgroup_attach(name, lxcpath, pid); ret = cgroup_attach(conf, name, lxcpath, pid);
if (ret) { if (ret) {
call_cleaner(cgroup_exit) struct cgroup_ops *cgroup_ops = NULL; call_cleaner(cgroup_exit) struct cgroup_ops *cgroup_ops = NULL;
@ -1182,7 +1185,7 @@ int lxc_attach(struct lxc_container *container, lxc_attach_exec_t exec_function,
if (!cgroup_ops) if (!cgroup_ops)
goto on_error; goto on_error;
if (!cgroup_ops->attach(cgroup_ops, name, lxcpath, pid)) if (!cgroup_ops->attach(cgroup_ops, conf, name, lxcpath, pid))
goto on_error; goto on_error;
} }
TRACE("Moved intermediate process %d into container's cgroups", pid); TRACE("Moved intermediate process %d into container's cgroups", pid);
@ -1270,7 +1273,7 @@ int lxc_attach(struct lxc_container *container, lxc_attach_exec_t exec_function,
TRACE("Sent LSM label file descriptor %d to child", labelfd); TRACE("Sent LSM label file descriptor %d to child", labelfd);
} }
if (conf && conf->seccomp.seccomp) { if (conf->seccomp.seccomp) {
ret = lxc_seccomp_recv_notifier_fd(&conf->seccomp, ipc_sockets[0]); ret = lxc_seccomp_recv_notifier_fd(&conf->seccomp, ipc_sockets[0]);
if (ret < 0) if (ret < 0)
goto close_mainloop; goto close_mainloop;
@ -1326,11 +1329,10 @@ int lxc_attach(struct lxc_container *container, lxc_attach_exec_t exec_function,
} }
/* close unneeded file descriptors */ /* close unneeded file descriptors */
close(ipc_sockets[0]); close_prot_errno_disarm(ipc_sockets[0]);
ipc_sockets[0] = -EBADF;
if (options->attach_flags & LXC_ATTACH_TERMINAL) { if (options->attach_flags & LXC_ATTACH_TERMINAL) {
lxc_attach_terminal_close_master(&terminal); lxc_attach_terminal_close_ptmx(&terminal);
lxc_attach_terminal_close_peer(&terminal); lxc_attach_terminal_close_peer(&terminal);
lxc_attach_terminal_close_log(&terminal); lxc_attach_terminal_close_log(&terminal);
} }
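The hunk above replaces a bare `close()` followed by a manual `-EBADF` assignment with a single `close_prot_errno_disarm()` call. The sketch below shows what a helper of that kind plausibly does: close the descriptor, preserve the caller's `errno`, and mark the variable as closed. It is an assumption-based illustration; `example_close_disarm` is a hypothetical function and not the LXC macro itself.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the LXC macro): close *fd if it is valid,
 * keep the caller's errno intact, and mark the descriptor as closed
 * so that a second call becomes a no-op.
 */
static void example_close_disarm(int *fd)
{
	if (*fd >= 0) {
		int saved_errno = errno;

		close(*fd);
		errno = saved_errno;
		*fd = -EBADF;
	}
}
```

Folding the reset into the helper makes an accidental double close harmless, which is likely why the separate `ipc_sockets[0] = -EBADF;` line could be dropped.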
@ -1375,7 +1377,7 @@ int lxc_attach(struct lxc_container *container, lxc_attach_exec_t exec_function,
payload.ipc_socket = ipc_sockets[1]; payload.ipc_socket = ipc_sockets[1];
payload.options = options; payload.options = options;
payload.init_ctx = init_ctx; payload.init_ctx = init_ctx;
payload.terminal_slave_fd = terminal.slave; payload.terminal_pts_fd = terminal.pts;
payload.exec_function = exec_function; payload.exec_function = exec_function;
payload.exec_payload = exec_payload; payload.exec_payload = exec_payload;
@ -1405,7 +1407,7 @@ int lxc_attach(struct lxc_container *container, lxc_attach_exec_t exec_function,
} }
if (options->attach_flags & LXC_ATTACH_TERMINAL) if (options->attach_flags & LXC_ATTACH_TERMINAL)
lxc_attach_terminal_close_slave(&terminal); lxc_attach_terminal_close_pts(&terminal);
/* Tell grandparent the pid of the newly created child. */ /* Tell grandparent the pid of the newly created child. */
ret = lxc_write_nointr(ipc_sockets[1], &pid, sizeof(pid)); ret = lxc_write_nointr(ipc_sockets[1], &pid, sizeof(pid));


@ -26,7 +26,7 @@ enum {
/* The following are off by default: */ /* The following are off by default: */
LXC_ATTACH_REMOUNT_PROC_SYS = 0x00010000, /*!< Remount /proc filesystem */ LXC_ATTACH_REMOUNT_PROC_SYS = 0x00010000, /*!< Remount /proc filesystem */
LXC_ATTACH_LSM_NOW = 0x00020000, /*!< FIXME: unknown */ LXC_ATTACH_LSM_NOW = 0x00020000, /*!< TODO: currently unused */
/* Set PR_SET_NO_NEW_PRIVS to block execve() gainable privileges. */ /* Set PR_SET_NO_NEW_PRIVS to block execve() gainable privileges. */
LXC_ATTACH_NO_NEW_PRIVS = 0x00040000, /*!< PR_SET_NO_NEW_PRIVS */ LXC_ATTACH_NO_NEW_PRIVS = 0x00040000, /*!< PR_SET_NO_NEW_PRIVS */
LXC_ATTACH_TERMINAL = 0x00080000, /*!< Allocate new terminal for attached process. */ LXC_ATTACH_TERMINAL = 0x00080000, /*!< Allocate new terminal for attached process. */
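The attach options in this enum are single-bit flags, and the attach code tests them with expressions such as `options->attach_flags & LXC_ATTACH_TERMINAL`. A minimal sketch of that test follows; the values are copied from the hunk above, while the `EXAMPLE_` prefix and the helper name are assumptions used to keep the snippet independent of the real header.

```c
#include <stdbool.h>

/* Values copied from the hunk above; the EXAMPLE_ prefix marks them as local to this sketch. */
enum {
	EXAMPLE_ATTACH_REMOUNT_PROC_SYS = 0x00010000,
	EXAMPLE_ATTACH_LSM_NOW          = 0x00020000,
	EXAMPLE_ATTACH_NO_NEW_PRIVS     = 0x00040000,
	EXAMPLE_ATTACH_TERMINAL         = 0x00080000,
};

/* Return true if the caller asked for a new terminal to be allocated. */
static bool example_wants_terminal(int attach_flags)
{
	return (attach_flags & EXAMPLE_ATTACH_TERMINAL) != 0;
}
```

Since each flag occupies its own bit, several can be combined with `|` and checked independently.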


@ -27,9 +27,11 @@
#include <stdio.h> #include <stdio.h>
#include <stdlib.h> #include <stdlib.h>
#include <string.h> #include <string.h>
#include <sys/epoll.h>
#include <sys/types.h> #include <sys/types.h>
#include <unistd.h> #include <unistd.h>
#include "af_unix.h"
#include "caps.h" #include "caps.h"
#include "cgroup.h" #include "cgroup.h"
#include "cgroup2_devices.h" #include "cgroup2_devices.h"
@ -724,6 +726,7 @@ static struct hierarchy *add_hierarchy(struct hierarchy ***h, char **clist, char
new->container_base_path = container_base_path; new->container_base_path = container_base_path;
new->version = type; new->version = type;
new->cgfd_con = -EBADF; new->cgfd_con = -EBADF;
new->cgfd_limit = -EBADF;
new->cgfd_mon = -EBADF; new->cgfd_mon = -EBADF;
newentry = append_null_to_list((void ***)h); newentry = append_null_to_list((void ***)h);
@ -945,7 +948,7 @@ static void lxc_cgfsng_print_basecg_debuginfo(char *basecginfo, char **klist,
TRACE("named subsystem %d: %s", k, *it); TRACE("named subsystem %d: %s", k, *it);
} }
static int cgroup_rmdir(struct hierarchy **hierarchies, static int cgroup_tree_remove(struct hierarchy **hierarchies,
const char *container_cgroup) const char *container_cgroup)
{ {
if (!container_cgroup || !hierarchies) if (!container_cgroup || !hierarchies)
@ -955,13 +958,15 @@ static int cgroup_rmdir(struct hierarchy **hierarchies,
struct hierarchy *h = hierarchies[i]; struct hierarchy *h = hierarchies[i];
int ret; int ret;
if (!h->container_full_path) if (!h->container_limit_path)
continue; continue;
ret = recursive_destroy(h->container_full_path); ret = lxc_rm_rf(h->container_limit_path);
if (ret < 0) if (ret < 0)
WARN("Failed to destroy \"%s\"", h->container_full_path); WARN("Failed to destroy \"%s\"", h->container_limit_path);
if (h->container_limit_path != h->container_full_path)
free_disarm(h->container_limit_path);
free_disarm(h->container_full_path); free_disarm(h->container_full_path);
} }
@ -976,7 +981,7 @@ struct generic_userns_exec_data {
char *path; char *path;
}; };
static int cgroup_rmdir_wrapper(void *data) static int cgroup_tree_remove_wrapper(void *data)
{ {
struct generic_userns_exec_data *arg = data; struct generic_userns_exec_data *arg = data;
uid_t nsuid = (arg->conf->root_nsuid_map != NULL) ? 0 : arg->conf->init_uid; uid_t nsuid = (arg->conf->root_nsuid_map != NULL) ? 0 : arg->conf->init_uid;
@ -996,7 +1001,7 @@ static int cgroup_rmdir_wrapper(void *data)
return log_error_errno(-1, errno, "Failed to setresuid(%d, %d, %d)", return log_error_errno(-1, errno, "Failed to setresuid(%d, %d, %d)",
(int)nsuid, (int)nsuid, (int)nsuid); (int)nsuid, (int)nsuid, (int)nsuid);
return cgroup_rmdir(arg->hierarchies, arg->container_cgroup); return cgroup_tree_remove(arg->hierarchies, arg->container_cgroup);
} }
__cgfsng_ops static void cgfsng_payload_destroy(struct cgroup_ops *ops, __cgfsng_ops static void cgfsng_payload_destroy(struct cgroup_ops *ops,
@ -1035,10 +1040,10 @@ __cgfsng_ops static void cgfsng_payload_destroy(struct cgroup_ops *ops,
.hierarchies = ops->hierarchies, .hierarchies = ops->hierarchies,
.origuid = 0, .origuid = 0,
}; };
ret = userns_exec_1(handler->conf, cgroup_rmdir_wrapper, &wrap, ret = userns_exec_1(handler->conf, cgroup_tree_remove_wrapper,
"cgroup_rmdir_wrapper"); &wrap, "cgroup_tree_remove_wrapper");
} else { } else {
ret = cgroup_rmdir(ops->hierarchies, ops->container_cgroup); ret = cgroup_tree_remove(ops->hierarchies, ops->container_cgroup);
} }
if (ret < 0) if (ret < 0)
SYSWARN("Failed to destroy cgroups"); SYSWARN("Failed to destroy cgroups");
@ -1077,25 +1082,37 @@ __cgfsng_ops static void cgfsng_monitor_destroy(struct cgroup_ops *ops,
for (int i = 0; ops->hierarchies[i]; i++) { for (int i = 0; ops->hierarchies[i]; i++) {
__do_free char *pivot_path = NULL; __do_free char *pivot_path = NULL;
struct hierarchy *h = ops->hierarchies[i]; struct hierarchy *h = ops->hierarchies[i];
size_t offset;
int ret; int ret;
if (!h->monitor_full_path) if (!h->monitor_full_path)
continue; continue;
if (conf && conf->cgroup_meta.dir) /* Monitor might have died before we entered the cgroup. */
pivot_path = must_make_path(h->mountpoint, if (handler->monitor_pid <= 0) {
h->container_base_path, WARN("No valid monitor process found while destroying cgroups");
conf->cgroup_meta.dir, goto try_lxc_rm_rf;
CGROUP_PIVOT, NULL); }
if (conf && conf->cgroup_meta.monitor_dir)
pivot_path = must_make_path(h->mountpoint, h->container_base_path,
conf->cgroup_meta.monitor_dir, CGROUP_PIVOT, NULL);
else if (conf && conf->cgroup_meta.dir)
pivot_path = must_make_path(h->mountpoint, h->container_base_path,
conf->cgroup_meta.dir, CGROUP_PIVOT, NULL);
else else
pivot_path = must_make_path(h->mountpoint, pivot_path = must_make_path(h->mountpoint, h->container_base_path,
h->container_base_path,
CGROUP_PIVOT, NULL); CGROUP_PIVOT, NULL);
offset = strlen(h->mountpoint) + strlen(h->container_base_path);
if (cg_legacy_handle_cpuset_hierarchy(h, pivot_path + offset))
SYSWARN("Failed to initialize cpuset %s/" CGROUP_PIVOT, pivot_path);
ret = mkdir_p(pivot_path, 0755); ret = mkdir_p(pivot_path, 0755);
if (ret < 0 && errno != EEXIST) { if (ret < 0 && errno != EEXIST) {
ERROR("Failed to create %s", pivot_path); ERROR("Failed to create %s", pivot_path);
goto try_recursive_destroy; goto try_lxc_rm_rf;
} }
ret = lxc_write_openat(pivot_path, "cgroup.procs", pidstr, len); ret = lxc_write_openat(pivot_path, "cgroup.procs", pidstr, len);
@ -1104,8 +1121,8 @@ __cgfsng_ops static void cgfsng_monitor_destroy(struct cgroup_ops *ops,
continue; continue;
} }
try_recursive_destroy: try_lxc_rm_rf:
ret = recursive_destroy(h->monitor_full_path); ret = lxc_rm_rf(h->monitor_full_path);
if (ret < 0) if (ret < 0)
WARN("Failed to destroy \"%s\"", h->monitor_full_path); WARN("Failed to destroy \"%s\"", h->monitor_full_path);
} }
@ -1133,16 +1150,18 @@ static int mkdir_eexist_on_last(const char *dir, mode_t mode)
ret = mkdir(makeme, mode); ret = mkdir(makeme, mode);
if (ret < 0 && ((errno != EEXIST) || (orig_len == cur_len))) if (ret < 0 && ((errno != EEXIST) || (orig_len == cur_len)))
return log_error_errno(-1, errno, "Failed to create directory \"%s\"", makeme); return log_warn_errno(-1, errno, "Failed to create directory \"%s\"", makeme);
} while (tmp != dir); } while (tmp != dir);
return 0; return 0;
} }
static bool create_cgroup_tree(struct hierarchy *h, const char *cgroup_tree, static bool cgroup_tree_create(struct cgroup_ops *ops, struct lxc_conf *conf,
const char *cgroup_leaf, bool payload) struct hierarchy *h, const char *cgroup_tree,
const char *cgroup_leaf, bool payload,
const char *cgroup_limit_dir)
{ {
__do_free char *path = NULL; __do_free char *path = NULL, *limit_path = NULL;
int ret, ret_cpuset; int ret, ret_cpuset;
path = must_make_path(h->mountpoint, h->container_base_path, cgroup_leaf, NULL); path = must_make_path(h->mountpoint, h->container_base_path, cgroup_leaf, NULL);
@ -1153,6 +1172,37 @@ static bool create_cgroup_tree(struct hierarchy *h, const char *cgroup_tree,
if (ret_cpuset < 0) if (ret_cpuset < 0)
return log_error_errno(false, errno, "Failed to handle legacy cpuset controller"); return log_error_errno(false, errno, "Failed to handle legacy cpuset controller");
if (payload && cgroup_limit_dir) {
/* with isolation both parts need to not already exist */
limit_path = must_make_path(h->mountpoint,
h->container_base_path,
cgroup_limit_dir, NULL);
ret = mkdir_eexist_on_last(limit_path, 0755);
if (ret < 0)
return log_debug_errno(false,
errno, "Failed to create %s limiting cgroup",
limit_path);
h->cgfd_limit = lxc_open_dirfd(limit_path);
if (h->cgfd_limit < 0)
return log_error_errno(false, errno,
"Failed to open %s", limit_path);
h->container_limit_path = move_ptr(limit_path);
/*
* With isolation the devices legacy cgroup needs to be
* initialized early, as it typically contains an 'a' (all)
* line, which is not possible once a subdirectory has been
* created.
*/
if (string_in_list(h->controllers, "devices")) {
ret = ops->setup_limits_legacy(ops, conf, true);
if (ret < 0)
return ret;
}
}
ret = mkdir_eexist_on_last(path, 0755); ret = mkdir_eexist_on_last(path, 0755);
if (ret < 0) { if (ret < 0) {
/* /*
@ -1161,7 +1211,7 @@ static bool create_cgroup_tree(struct hierarchy *h, const char *cgroup_tree,
* directory for us to ensure correct initialization. * directory for us to ensure correct initialization.
*/ */
if (ret_cpuset != 1 || cgroup_tree) if (ret_cpuset != 1 || cgroup_tree)
return log_error_errno(false, errno, "Failed to create %s cgroup", path); return log_debug_errno(false, errno, "Failed to create %s cgroup", path);
} }
if (payload) { if (payload) {
@ -1169,6 +1219,10 @@ static bool create_cgroup_tree(struct hierarchy *h, const char *cgroup_tree,
if (h->cgfd_con < 0) if (h->cgfd_con < 0)
return log_error_errno(false, errno, "Failed to open %s", path); return log_error_errno(false, errno, "Failed to open %s", path);
h->container_full_path = move_ptr(path); h->container_full_path = move_ptr(path);
if (h->cgfd_limit < 0)
h->cgfd_limit = h->cgfd_con;
if (!h->container_limit_path)
h->container_limit_path = h->container_full_path;
} else { } else {
h->cgfd_mon = lxc_open_dirfd(path); h->cgfd_mon = lxc_open_dirfd(path);
if (h->cgfd_mon < 0) if (h->cgfd_mon < 0)
@ -1179,13 +1233,17 @@ static bool create_cgroup_tree(struct hierarchy *h, const char *cgroup_tree,
return true; return true;
} }
static void cgroup_remove_leaf(struct hierarchy *h, bool payload) static void cgroup_tree_leaf_remove(struct hierarchy *h, bool payload)
{ {
__do_free char *full_path = NULL; __do_free char *full_path = NULL, *__limit_path = NULL;
char *limit_path = NULL;
if (payload) { if (payload) {
__lxc_unused __do_close int fd = move_fd(h->cgfd_con); __lxc_unused __do_close int fd = move_fd(h->cgfd_con);
full_path = move_ptr(h->container_full_path); full_path = move_ptr(h->container_full_path);
limit_path = move_ptr(h->container_limit_path);
if (limit_path != full_path)
__limit_path = limit_path;
} else { } else {
__lxc_unused __do_close int fd = move_fd(h->cgfd_mon); __lxc_unused __do_close int fd = move_fd(h->cgfd_mon);
full_path = move_ptr(h->monitor_full_path); full_path = move_ptr(h->monitor_full_path);
@ -1193,6 +1251,38 @@ static void cgroup_remove_leaf(struct hierarchy *h, bool payload)
if (full_path && rmdir(full_path)) if (full_path && rmdir(full_path))
SYSWARN("Failed to rmdir(\"%s\") cgroup", full_path); SYSWARN("Failed to rmdir(\"%s\") cgroup", full_path);
if (limit_path && rmdir(limit_path))
SYSWARN("Failed to rmdir(\"%s\") cgroup", limit_path);
}
/*
* Check that lxc.cgroup.dir is not combined with the more specific
* lxc.cgroup.dir.payload/monitor options, and that, if any of the more
* specific options is set, both the payload and monitor directories are set.
*
* Returns true if the configuration is consistent, false otherwise.
*/
static bool check_cgroup_dir_config(struct lxc_conf *conf)
{
const char *monitor_dir = conf->cgroup_meta.monitor_dir,
*container_dir = conf->cgroup_meta.container_dir,
*namespace_dir = conf->cgroup_meta.namespace_dir;
/* none of the new options are set, all is fine */
if (!monitor_dir && !container_dir && !namespace_dir)
return true;
/* some are set, make sure lxc.cgroup.dir is not also set*/
if (conf->cgroup_meta.dir)
return log_error_errno(false, EINVAL,
"lxc.cgroup.dir conflicts with lxc.cgroup.dir.payload/monitor");
/* make sure both monitor and payload are set */
if (!monitor_dir || !container_dir)
return log_error_errno(false, EINVAL,
"lxc.cgroup.dir.payload and lxc.cgroup.dir.monitor must both be set");
/* namespace_dir may be empty */
return true;
} }
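The rules enforced by `check_cgroup_dir_config()` can be read as a small decision table. The sketch below restates them over a reduced, hypothetical struct so the accepted and rejected combinations can be tried in isolation; `struct example_cgroup_meta` and `example_check_dirs` are illustration-only names, and the logging from the original is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the cgroup_meta fields used in the check. */
struct example_cgroup_meta {
	const char *dir;
	const char *monitor_dir;
	const char *container_dir;
	const char *namespace_dir;
};

/* Mirror of the validation order in the hunk above, without logging. */
static bool example_check_dirs(const struct example_cgroup_meta *m)
{
	/* None of the new options are set: nothing to validate. */
	if (!m->monitor_dir && !m->container_dir && !m->namespace_dir)
		return true;

	/* The new options cannot be mixed with the old catch-all directory. */
	if (m->dir)
		return false;

	/* Monitor and payload directories must be given together. */
	if (!m->monitor_dir || !m->container_dir)
		return false;

	/* namespace_dir is optional. */
	return true;
}
```

For example, setting only the old directory passes, while setting a monitor directory without a payload directory is rejected.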
__cgfsng_ops static inline bool cgfsng_monitor_create(struct cgroup_ops *ops, __cgfsng_ops static inline bool cgfsng_monitor_create(struct cgroup_ops *ops,
@ -1203,7 +1293,7 @@ __cgfsng_ops static inline bool cgfsng_monitor_create(struct cgroup_ops *ops,
int idx = 0; int idx = 0;
int i; int i;
size_t len; size_t len;
char *suffix; char *suffix = NULL;
struct lxc_conf *conf; struct lxc_conf *conf;
if (!ops) if (!ops)
@ -1220,7 +1310,13 @@ __cgfsng_ops static inline bool cgfsng_monitor_create(struct cgroup_ops *ops,
conf = handler->conf; conf = handler->conf;
if (conf->cgroup_meta.dir) { if (!check_cgroup_dir_config(conf))
return false;
if (conf->cgroup_meta.monitor_dir) {
cgroup_tree = NULL;
monitor_cgroup = strdup(conf->cgroup_meta.monitor_dir);
} else if (conf->cgroup_meta.dir) {
cgroup_tree = conf->cgroup_meta.dir; cgroup_tree = conf->cgroup_meta.dir;
monitor_cgroup = must_concat(&len, conf->cgroup_meta.dir, "/", monitor_cgroup = must_concat(&len, conf->cgroup_meta.dir, "/",
DEFAULT_MONITOR_CGROUP_PREFIX, DEFAULT_MONITOR_CGROUP_PREFIX,
@ -1244,27 +1340,31 @@ __cgfsng_ops static inline bool cgfsng_monitor_create(struct cgroup_ops *ops,
if (!monitor_cgroup) if (!monitor_cgroup)
return ret_set_errno(false, ENOMEM); return ret_set_errno(false, ENOMEM);
if (!conf->cgroup_meta.monitor_dir) {
suffix = monitor_cgroup + len - CGROUP_CREATE_RETRY_LEN; suffix = monitor_cgroup + len - CGROUP_CREATE_RETRY_LEN;
*suffix = '\0'; *suffix = '\0';
}
do { do {
if (idx) if (idx && suffix)
sprintf(suffix, "-%d", idx); sprintf(suffix, "-%d", idx);
for (i = 0; ops->hierarchies[i]; i++) { for (i = 0; ops->hierarchies[i]; i++) {
if (create_cgroup_tree(ops->hierarchies[i], cgroup_tree, monitor_cgroup, false)) if (cgroup_tree_create(ops, handler->conf,
ops->hierarchies[i], cgroup_tree,
monitor_cgroup, false, NULL))
continue; continue;
ERROR("Failed to create cgroup \"%s\"", ops->hierarchies[i]->monitor_full_path ?: "(null)"); DEBUG("Failed to create cgroup \"%s\"", ops->hierarchies[i]->monitor_full_path ?: "(null)");
for (int j = 0; j < i; j++) for (int j = 0; j < i; j++)
cgroup_remove_leaf(ops->hierarchies[j], false); cgroup_tree_leaf_remove(ops->hierarchies[j], false);
idx++; idx++;
break; break;
} }
} while (ops->hierarchies[i] && idx > 0 && idx < 1000); } while (ops->hierarchies[i] && idx > 0 && idx < 1000 && suffix);
if (idx == 1000) if (idx == 1000 || (!suffix && idx != 0))
return ret_set_errno(false, ERANGE); return log_error_errno(false, ERANGE, "Failed to create monitor cgroup");
ops->monitor_cgroup = move_ptr(monitor_cgroup); ops->monitor_cgroup = move_ptr(monitor_cgroup);
return log_info(true, "The monitor process uses \"%s\" as cgroup", ops->monitor_cgroup); return log_info(true, "The monitor process uses \"%s\" as cgroup", ops->monitor_cgroup);
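The creation loop above retries with a `-%d` suffix for up to 1000 attempts, but skips the retry entirely when an explicit directory was configured (`suffix` stays `NULL`). The following is a simplified sketch of that control flow under stated assumptions: the callback type, the stand-in `example_try_create`, and the name `demo` are hypothetical, and the real code edits the suffix in place rather than formatting into a separate buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: returns true if the name could be created. */
typedef bool (*example_try_create_t)(const char *name);

/* Stand-in for cgroup creation: pretend "demo" and "demo-1" already exist. */
static bool example_try_create(const char *name)
{
	return strcmp(name, "demo") != 0 && strcmp(name, "demo-1") != 0;
}

/*
 * Try "base", then "base-1", "base-2", ... for at most max_tries attempts.
 * When allow_suffix is false only the plain name is attempted.
 * On success the chosen name is left in buf and its index is returned;
 * on failure -1 is returned.
 */
static int example_create_with_retry(const char *base, bool allow_suffix,
				     int max_tries,
				     example_try_create_t try_create,
				     char *buf, size_t buflen)
{
	for (int idx = 0; idx < max_tries; idx++) {
		int n;

		if (idx == 0)
			n = snprintf(buf, buflen, "%s", base);
		else
			n = snprintf(buf, buflen, "%s-%d", base, idx);
		if (n < 0 || (size_t)n >= buflen)
			return -1;

		if (try_create(buf))
			return idx;

		/* An explicitly configured name must not be silently renamed. */
		if (!allow_suffix)
			break;
	}

	return -1;
}
```

With the stand-in callback, a suffixed run settles on the third candidate, while a run without suffixes fails after the first attempt, matching the `(!suffix && idx != 0)` error path.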
@ -1277,12 +1377,14 @@ __cgfsng_ops static inline bool cgfsng_monitor_create(struct cgroup_ops *ops,
__cgfsng_ops static inline bool cgfsng_payload_create(struct cgroup_ops *ops, __cgfsng_ops static inline bool cgfsng_payload_create(struct cgroup_ops *ops,
struct lxc_handler *handler) struct lxc_handler *handler)
{ {
__do_free char *container_cgroup = NULL, *__cgroup_tree = NULL; __do_free char *container_cgroup = NULL,
*__cgroup_tree = NULL,
*limiting_cgroup = NULL;
const char *cgroup_tree; const char *cgroup_tree;
int idx = 0; int idx = 0;
int i; int i;
size_t len; size_t len;
char *suffix; char *suffix = NULL;
struct lxc_conf *conf; struct lxc_conf *conf;
if (!ops) if (!ops)
@ -1299,7 +1401,25 @@ __cgfsng_ops static inline bool cgfsng_payload_create(struct cgroup_ops *ops,
conf = handler->conf; conf = handler->conf;
if (conf->cgroup_meta.dir) { if (!check_cgroup_dir_config(conf))
return false;
if (conf->cgroup_meta.container_dir) {
cgroup_tree = NULL;
limiting_cgroup = strdup(conf->cgroup_meta.container_dir);
if (!limiting_cgroup)
return ret_set_errno(false, ENOMEM);
if (conf->cgroup_meta.namespace_dir) {
container_cgroup = must_make_path(limiting_cgroup,
conf->cgroup_meta.namespace_dir,
NULL);
} else {
/* explicit paths but without isolation */
container_cgroup = move_ptr(limiting_cgroup);
}
} else if (conf->cgroup_meta.dir) {
cgroup_tree = conf->cgroup_meta.dir; cgroup_tree = conf->cgroup_meta.dir;
container_cgroup = must_concat(&len, cgroup_tree, "/", container_cgroup = must_concat(&len, cgroup_tree, "/",
DEFAULT_PAYLOAD_CGROUP_PREFIX, DEFAULT_PAYLOAD_CGROUP_PREFIX,
@ -1323,27 +1443,32 @@ __cgfsng_ops static inline bool cgfsng_payload_create(struct cgroup_ops *ops,
if (!container_cgroup) if (!container_cgroup)
return ret_set_errno(false, ENOMEM); return ret_set_errno(false, ENOMEM);
if (!conf->cgroup_meta.container_dir) {
suffix = container_cgroup + len - CGROUP_CREATE_RETRY_LEN; suffix = container_cgroup + len - CGROUP_CREATE_RETRY_LEN;
*suffix = '\0'; *suffix = '\0';
}
do { do {
if (idx) if (idx && suffix)
sprintf(suffix, "-%d", idx); sprintf(suffix, "-%d", idx);
for (i = 0; ops->hierarchies[i]; i++) { for (i = 0; ops->hierarchies[i]; i++) {
if (create_cgroup_tree(ops->hierarchies[i], cgroup_tree, container_cgroup, true)) if (cgroup_tree_create(ops, handler->conf,
ops->hierarchies[i], cgroup_tree,
container_cgroup, true,
limiting_cgroup))
continue; continue;
ERROR("Failed to create cgroup \"%s\"", ops->hierarchies[i]->container_full_path ?: "(null)"); DEBUG("Failed to create cgroup \"%s\"", ops->hierarchies[i]->container_full_path ?: "(null)");
for (int j = 0; j < i; j++) for (int j = 0; j < i; j++)
cgroup_remove_leaf(ops->hierarchies[j], true); cgroup_tree_leaf_remove(ops->hierarchies[j], true);
idx++; idx++;
break; break;
} }
} while (ops->hierarchies[i] && idx > 0 && idx < 1000); } while (ops->hierarchies[i] && idx > 0 && idx < 1000 && suffix);
if (idx == 1000) if (idx == 1000 || (!suffix && idx != 0))
return ret_set_errno(false, ERANGE); return log_error_errno(false, ERANGE, "Failed to create container cgroup");
ops->container_cgroup = move_ptr(container_cgroup); ops->container_cgroup = move_ptr(container_cgroup);
INFO("The container process uses \"%s\" as cgroup", ops->container_cgroup); INFO("The container process uses \"%s\" as cgroup", ops->container_cgroup);
@ -1353,7 +1478,7 @@ __cgfsng_ops static inline bool cgfsng_payload_create(struct cgroup_ops *ops,
__cgfsng_ops static bool cgfsng_monitor_enter(struct cgroup_ops *ops, __cgfsng_ops static bool cgfsng_monitor_enter(struct cgroup_ops *ops,
struct lxc_handler *handler) struct lxc_handler *handler)
{ {
int monitor_len, transient_len; int monitor_len, transient_len = 0;
char monitor[INTTYPE_TO_STRLEN(pid_t)], char monitor[INTTYPE_TO_STRLEN(pid_t)],
transient[INTTYPE_TO_STRLEN(pid_t)]; transient[INTTYPE_TO_STRLEN(pid_t)];
@ -1381,7 +1506,7 @@ __cgfsng_ops static bool cgfsng_monitor_enter(struct cgroup_ops *ops,
if (ret) if (ret)
return log_error_errno(false, errno, "Failed to enter cgroup \"%s\"", h->monitor_full_path); return log_error_errno(false, errno, "Failed to enter cgroup \"%s\"", h->monitor_full_path);
if (handler->transient_pid < 0) if (handler->transient_pid <= 0)
return true; return true;
ret = lxc_writeat(h->cgfd_mon, "cgroup.procs", transient, transient_len); ret = lxc_writeat(h->cgfd_mon, "cgroup.procs", transient, transient_len);
@ -1710,6 +1835,19 @@ __cgfsng_ops static bool cgfsng_mount(struct cgroup_ops *ops,
wants_force_mount = !in_caplist(CAP_SYS_ADMIN, &handler->conf->keepcaps); wants_force_mount = !in_caplist(CAP_SYS_ADMIN, &handler->conf->keepcaps);
else else
wants_force_mount = in_caplist(CAP_SYS_ADMIN, &handler->conf->caps); wants_force_mount = in_caplist(CAP_SYS_ADMIN, &handler->conf->caps);
/*
* Most recent distro versions currently have init system that
* do support cgroup2 but do not mount it by default unless
* explicitly told so even if the host is cgroup2 only. That
* means they often will fail to boot. Fix this by pre-mounting
* cgroup2 by default. We will likely need to be doing this a
* few years until all distros have switched over to cgroup2 at
* which point we can safely assume that their init systems
* will mount it themselves.
*/
if (pure_unified_layout(ops))
wants_force_mount = true;
} }
has_cgns = cgns_supported(); has_cgns = cgns_supported();
@ -1908,7 +2046,11 @@ static int freezer_cgroup_events_cb(int fd, uint32_t events, void *cbdata,
return LXC_MAINLOOP_CONTINUE; return LXC_MAINLOOP_CONTINUE;
} }
static int cg_unified_freeze(struct cgroup_ops *ops, int timeout) static int cg_unified_freeze_do(struct cgroup_ops *ops, int timeout,
const char *state_string,
int state_num,
const char *epoll_error,
const char *wait_error)
{ {
__do_close int fd = -EBADF; __do_close int fd = -EBADF;
call_cleaner(lxc_mainloop_close) struct lxc_epoll_descr *descr_ptr = NULL; call_cleaner(lxc_mainloop_close) struct lxc_epoll_descr *descr_ptr = NULL;
@ -1933,26 +2075,33 @@ static int cg_unified_freeze(struct cgroup_ops *ops, int timeout)
ret = lxc_mainloop_open(&descr); ret = lxc_mainloop_open(&descr);
if (ret) if (ret)
return log_error_errno(-1, errno, "Failed to create epoll instance to wait for container freeze"); return log_error_errno(-1, errno, "%s", epoll_error);
/* automatically cleaned up now */ /* automatically cleaned up now */
descr_ptr = &descr; descr_ptr = &descr;
ret = lxc_mainloop_add_handler(&descr, fd, freezer_cgroup_events_cb, INT_TO_PTR((int){1})); ret = lxc_mainloop_add_handler_events(&descr, fd, EPOLLPRI, freezer_cgroup_events_cb, INT_TO_PTR(state_num));
if (ret < 0) if (ret < 0)
return log_error_errno(-1, errno, "Failed to add cgroup.events fd handler to mainloop"); return log_error_errno(-1, errno, "Failed to add cgroup.events fd handler to mainloop");
} }
ret = lxc_write_openat(h->container_full_path, "cgroup.freeze", "1", 1); ret = lxc_write_openat(h->container_full_path, "cgroup.freeze", state_string, 1);
if (ret < 0) if (ret < 0)
return log_error_errno(-1, errno, "Failed to open cgroup.freeze file"); return log_error_errno(-1, errno, "Failed to open cgroup.freeze file");
if (timeout != 0 && lxc_mainloop(&descr, timeout)) if (timeout != 0 && lxc_mainloop(&descr, timeout))
return log_error_errno(-1, errno, "Failed to wait for container to be frozen"); return log_error_errno(-1, errno, "%s", wait_error);
return 0; return 0;
} }
static int cg_unified_freeze(struct cgroup_ops *ops, int timeout)
{
return cg_unified_freeze_do(ops, timeout, "1", 1,
"Failed to create epoll instance to wait for container freeze",
"Failed to wait for container to be frozen");
}
__cgfsng_ops static int cgfsng_freeze(struct cgroup_ops *ops, int timeout) __cgfsng_ops static int cgfsng_freeze(struct cgroup_ops *ops, int timeout)
{ {
if (!ops->hierarchies) if (!ops->hierarchies)
@ -1978,47 +2127,9 @@ static int cg_legacy_unfreeze(struct cgroup_ops *ops)
static int cg_unified_unfreeze(struct cgroup_ops *ops, int timeout) static int cg_unified_unfreeze(struct cgroup_ops *ops, int timeout)
{ {
__do_close int fd = -EBADF; return cg_unified_freeze_do(ops, timeout, "0", 0,
call_cleaner(lxc_mainloop_close)struct lxc_epoll_descr *descr_ptr = NULL; "Failed to create epoll instance to wait for container unfreeze",
int ret; "Failed to wait for container to be unfrozen");
struct lxc_epoll_descr descr;
struct hierarchy *h;
h = ops->unified;
if (!h)
return ret_set_errno(-1, ENOENT);
if (!h->container_full_path)
return ret_set_errno(-1, EEXIST);
if (timeout != 0) {
__do_free char *events_file = NULL;
events_file = must_make_path(h->container_full_path, "cgroup.events", NULL);
fd = open(events_file, O_RDONLY | O_CLOEXEC);
if (fd < 0)
return log_error_errno(-1, errno, "Failed to open cgroup.events file");
ret = lxc_mainloop_open(&descr);
if (ret)
return log_error_errno(-1, errno, "Failed to create epoll instance to wait for container unfreeze");
/* automatically cleaned up now */
descr_ptr = &descr;
ret = lxc_mainloop_add_handler(&descr, fd, freezer_cgroup_events_cb, INT_TO_PTR((int){0}));
if (ret < 0)
return log_error_errno(-1, errno, "Failed to add cgroup.events fd handler to mainloop");
}
ret = lxc_write_openat(h->container_full_path, "cgroup.freeze", "0", 1);
if (ret < 0)
return log_error_errno(-1, errno, "Failed to open cgroup.freeze file");
if (timeout != 0 && lxc_mainloop(&descr, timeout))
return log_error_errno(-1, errno, "Failed to wait for container to be unfrozen");
return 0;
} }
__cgfsng_ops static int cgfsng_unfreeze(struct cgroup_ops *ops, int timeout) __cgfsng_ops static int cgfsng_unfreeze(struct cgroup_ops *ops, int timeout)
@ -2032,8 +2143,8 @@ __cgfsng_ops static int cgfsng_unfreeze(struct cgroup_ops *ops, int timeout)
return cg_unified_unfreeze(ops, timeout); return cg_unified_unfreeze(ops, timeout);
} }
__cgfsng_ops static const char *cgfsng_get_cgroup(struct cgroup_ops *ops, static const char *cgfsng_get_cgroup_do(struct cgroup_ops *ops,
const char *controller) const char *controller, bool limiting)
{ {
struct hierarchy *h; struct hierarchy *h;
@ -2042,11 +2153,28 @@ __cgfsng_ops static const char *cgfsng_get_cgroup(struct cgroup_ops *ops,
return log_warn_errno(NULL, ENOENT, "Failed to find hierarchy for controller \"%s\"", return log_warn_errno(NULL, ENOENT, "Failed to find hierarchy for controller \"%s\"",
controller ? controller : "(null)"); controller ? controller : "(null)");
if (limiting)
return h->container_limit_path
? h->container_limit_path + strlen(h->mountpoint)
: NULL;
return h->container_full_path return h->container_full_path
? h->container_full_path + strlen(h->mountpoint) ? h->container_full_path + strlen(h->mountpoint)
: NULL; : NULL;
} }
__cgfsng_ops static const char *cgfsng_get_cgroup(struct cgroup_ops *ops,
const char *controller)
{
return cgfsng_get_cgroup_do(ops, controller, false);
}
__cgfsng_ops static const char *cgfsng_get_limiting_cgroup(struct cgroup_ops *ops,
const char *controller)
{
return cgfsng_get_cgroup_do(ops, controller, true);
}
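
Note on the hunk above: the `limiting` flag in `cgfsng_get_cgroup_do()` only selects which stored path is returned, after skipping the mountpoint prefix. A minimal sketch of that pointer arithmetic follows. It assumes a cut-down, hypothetical `struct demo_hierarchy` (not the real `struct hierarchy`) and that the stored paths always begin with the mountpoint.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the three fields of struct hierarchy used here. */
struct demo_hierarchy {
	const char *mountpoint;
	const char *container_full_path;
	const char *container_limit_path;
};

/*
 * Return the selected cgroup path relative to the mountpoint, or NULL if
 * that path has not been set. Assumes the path starts with the mountpoint.
 */
static const char *demo_get_cgroup(const struct demo_hierarchy *h, int limiting)
{
	const char *path = limiting ? h->container_limit_path
				    : h->container_full_path;

	return path ? path + strlen(h->mountpoint) : NULL;
}
```

The returned pointer aliases the stored string, so the caller must not free it.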
/* Given a cgroup path returned from lxc_cmd_get_cgroup_path, build a full path, /* Given a cgroup path returned from lxc_cmd_get_cgroup_path, build a full path,
* which must be freed by the caller. * which must be freed by the caller.
*/ */
@ -2057,21 +2185,20 @@ static inline char *build_full_cgpath_from_monitorpath(struct hierarchy *h,
return must_make_path(h->mountpoint, inpath, filename, NULL); return must_make_path(h->mountpoint, inpath, filename, NULL);
} }
static int cgroup_attach_leaf(int unified_fd, int64_t pid) static int cgroup_attach_leaf(const struct lxc_conf *conf, int unified_fd, pid_t pid)
{ {
int idx = 1; int idx = 1;
int ret; int ret;
char pidstr[INTTYPE_TO_STRLEN(int64_t) + 1]; char pidstr[INTTYPE_TO_STRLEN(int64_t) + 1];
char attach_cgroup[STRLITERALLEN("lxc-1000/cgroup.procs") + 1];
size_t pidstr_len; size_t pidstr_len;
/* Create leaf cgroup. */ /* Create leaf cgroup. */
ret = mkdirat(unified_fd, "lxc", 0755); ret = mkdirat(unified_fd, ".lxc", 0755);
if (ret < 0 && errno != EEXIST) if (ret < 0 && errno != EEXIST)
return log_error_errno(-1, errno, "Failed to create leaf cgroup \"lxc\""); return log_error_errno(-1, errno, "Failed to create leaf cgroup \".lxc\"");
pidstr_len = sprintf(pidstr, INT64_FMT, pid); pidstr_len = sprintf(pidstr, INT64_FMT, (int64_t)pid);
ret = lxc_writeat(unified_fd, "lxc/cgroup.procs", pidstr, pidstr_len); ret = lxc_writeat(unified_fd, ".lxc/cgroup.procs", pidstr, pidstr_len);
if (ret < 0) if (ret < 0)
ret = lxc_writeat(unified_fd, "cgroup.procs", pidstr, pidstr_len); ret = lxc_writeat(unified_fd, "cgroup.procs", pidstr, pidstr_len);
if (ret == 0) if (ret == 0)
@ -2082,15 +2209,22 @@ static int cgroup_attach_leaf(int unified_fd, int64_t pid)
return log_error_errno(-1, errno, "Failed to attach to unified cgroup"); return log_error_errno(-1, errno, "Failed to attach to unified cgroup");
do { do {
bool rm = false;
char attach_cgroup[STRLITERALLEN(".lxc-1000/cgroup.procs") + 1];
char *slash; char *slash;
sprintf(attach_cgroup, "lxc-%d/cgroup.procs", idx); ret = snprintf(attach_cgroup, sizeof(attach_cgroup), ".lxc-%d/cgroup.procs", idx);
if (ret < 0 || (size_t)ret >= sizeof(attach_cgroup))
return ret_errno(EIO);
slash = &attach_cgroup[ret] - STRLITERALLEN("/cgroup.procs"); slash = &attach_cgroup[ret] - STRLITERALLEN("/cgroup.procs");
*slash = '\0'; *slash = '\0';
ret = mkdirat(unified_fd, attach_cgroup, 0755); ret = mkdirat(unified_fd, attach_cgroup, 0755);
if (ret < 0 && errno != EEXIST) if (ret < 0 && errno != EEXIST)
return log_error_errno(-1, errno, "Failed to create cgroup %s", attach_cgroup); return log_error_errno(-1, errno, "Failed to create cgroup %s", attach_cgroup);
if (ret == 0)
rm = true;
*slash = '/'; *slash = '/';
@ -2098,6 +2232,9 @@ static int cgroup_attach_leaf(int unified_fd, int64_t pid)
if (ret == 0) if (ret == 0)
return 0; return 0;
if (rm && unlinkat(unified_fd, attach_cgroup, AT_REMOVEDIR))
SYSERROR("Failed to remove cgroup \"%d(%s)\"", unified_fd, attach_cgroup);
/* this is a non-leaf node */ /* this is a non-leaf node */
if (errno != EBUSY) if (errno != EBUSY)
return log_error_errno(-1, errno, "Failed to attach to unified cgroup"); return log_error_errno(-1, errno, "Failed to attach to unified cgroup");
@ -2108,15 +2245,132 @@ static int cgroup_attach_leaf(int unified_fd, int64_t pid)
return log_error_errno(-1, errno, "Failed to attach to unified cgroup"); return log_error_errno(-1, errno, "Failed to attach to unified cgroup");
} }
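
Note on the `sprintf` to `snprintf` change above: the new code rejects truncated output and then derives the position of the `/` from the return value, so the same buffer can serve as both the directory name and the `cgroup.procs` path. A minimal sketch of that pattern, using hypothetical `demo_` names and `sizeof` in place of LXC's `STRLITERALLEN` macro:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define DEMO_SUFFIX "/cgroup.procs"

/*
 * Format ".lxc-<idx>/cgroup.procs" into buf and report where the directory
 * part ends. Returns 0 on success, -EIO if the output was truncated.
 */
static int demo_format_leaf(char *buf, size_t size, int idx, char **slash)
{
	int ret;

	ret = snprintf(buf, size, ".lxc-%d" DEMO_SUFFIX, idx);
	if (ret < 0 || (size_t)ret >= size)
		return -EIO;

	/* The suffix length is a compile-time constant, so step back over it. */
	*slash = &buf[ret] - (sizeof(DEMO_SUFFIX) - 1);
	return 0;
}
```

Writing `'\0'` to `*slash` leaves just the directory name for `mkdirat()`; writing `'/'` back restores the full path, which is what the loop in the diff does.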
int cgroup_attach(const char *name, const char *lxcpath, int64_t pid) static int cgroup_attach_create_leaf(const struct lxc_conf *conf,
int unified_fd, int *sk_fd)
{
__do_close int sk = *sk_fd, target_fd0 = -EBADF, target_fd1 = -EBADF;
int target_fds[2];
ssize_t ret;
/* Create leaf cgroup. */
ret = mkdirat(unified_fd, ".lxc", 0755);
if (ret < 0 && errno != EEXIST)
return log_error_errno(-1, errno, "Failed to create leaf cgroup \".lxc\"");
target_fd0 = openat(unified_fd, ".lxc/cgroup.procs", O_WRONLY | O_CLOEXEC | O_NOFOLLOW);
if (target_fd0 < 0)
return log_error_errno(-errno, errno, "Failed to open \".lxc/cgroup.procs\"");
target_fds[0] = target_fd0;
target_fd1 = openat(unified_fd, "cgroup.procs", O_WRONLY | O_CLOEXEC | O_NOFOLLOW);
if (target_fd1 < 0)
return log_error_errno(-errno, errno, "Failed to open \".lxc/cgroup.procs\"");
target_fds[1] = target_fd1;
ret = lxc_abstract_unix_send_fds(sk, target_fds, 2, NULL, 0);
if (ret <= 0)
return log_error_errno(-errno, errno, "Failed to send \".lxc/cgroup.procs\" fds %d and %d",
target_fd0, target_fd1);
return log_debug(0, "Sent target cgroup fds %d and %d", target_fd0, target_fd1);
}
static int cgroup_attach_move_into_leaf(const struct lxc_conf *conf,
int *sk_fd, pid_t pid)
{
__do_close int sk = *sk_fd, target_fd0 = -EBADF, target_fd1 = -EBADF;
int target_fds[2];
char pidstr[INTTYPE_TO_STRLEN(int64_t) + 1];
size_t pidstr_len;
ssize_t ret;
ret = lxc_abstract_unix_recv_fds(sk, target_fds, 2, NULL, 0);
if (ret <= 0)
return log_error_errno(-1, errno, "Failed to receive target cgroup fd");
target_fd0 = target_fds[0];
target_fd1 = target_fds[1];
pidstr_len = sprintf(pidstr, INT64_FMT, (int64_t)pid);
ret = lxc_write_nointr(target_fd0, pidstr, pidstr_len);
if (ret > 0 && ret == pidstr_len)
return log_debug(0, "Moved process into target cgroup via fd %d", target_fd0);
ret = lxc_write_nointr(target_fd1, pidstr, pidstr_len);
if (ret > 0 && ret == pidstr_len)
return log_debug(0, "Moved process into target cgroup via fd %d", target_fd1);
return log_debug_errno(-1, errno, "Failed to move process into target cgroup via fd %d and %d",
target_fd0, target_fd1);
}
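
Note on the two helpers above: one side opens the `cgroup.procs` files and passes the descriptors across a socketpair, and the other side writes the pid through the received descriptors. The internals of `lxc_abstract_unix_send_fds()` and `lxc_abstract_unix_recv_fds()` are not part of this diff, so the sketch below is an assumption about the general technique only: plain `sendmsg()`/`recvmsg()` with an `SCM_RIGHTS` control message, for a single descriptor, with hypothetical `demo_` names.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union demo_cmsg_buf {
	struct cmsghdr align;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Send one file descriptor over a connected AF_UNIX socket. */
static int demo_send_fd(int sk, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union demo_cmsg_buf u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	/* One payload byte must accompany the control message. */
	return sendmsg(sk, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd or -1 on failure. */
static int demo_recv_fd(int sk)
{
	char byte;
	int fd = -1;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union demo_cmsg_buf u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sk, &msg, 0) <= 0)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The received descriptor refers to the same open file description as the sender's, so access checks made at `open()` time on the sending side carry over to writes on the receiving side.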
struct userns_exec_unified_attach_data {
const struct lxc_conf *conf;
int unified_fd;
int sk_pair[2];
pid_t pid;
};
static int cgroup_unified_attach_child_wrapper(void *data)
{
struct userns_exec_unified_attach_data *args = data;
if (!args->conf || args->unified_fd < 0 || args->pid <= 0 ||
args->sk_pair[0] < 0 || args->sk_pair[1] < 0)
return ret_errno(EINVAL);
close_prot_errno_disarm(args->sk_pair[0]);
return cgroup_attach_create_leaf(args->conf, args->unified_fd,
&args->sk_pair[1]);
}
static int cgroup_unified_attach_parent_wrapper(void *data)
{
struct userns_exec_unified_attach_data *args = data;
if (!args->conf || args->unified_fd < 0 || args->pid <= 0 ||
args->sk_pair[0] < 0 || args->sk_pair[1] < 0)
return ret_errno(EINVAL);
close_prot_errno_disarm(args->sk_pair[1]);
return cgroup_attach_move_into_leaf(args->conf, &args->sk_pair[0],
args->pid);
}
int cgroup_attach(const struct lxc_conf *conf, const char *name,
const char *lxcpath, pid_t pid)
{ {
__do_close int unified_fd = -EBADF; __do_close int unified_fd = -EBADF;
int ret;
if (!conf || !name || !lxcpath || pid <= 0)
return ret_errno(EINVAL);
unified_fd = lxc_cmd_get_cgroup2_fd(name, lxcpath); unified_fd = lxc_cmd_get_cgroup2_fd(name, lxcpath);
if (unified_fd < 0) if (unified_fd < 0)
return -1; return ret_errno(EBADF);
return cgroup_attach_leaf(unified_fd, pid); if (!lxc_list_empty(&conf->id_map)) {
struct userns_exec_unified_attach_data args = {
.conf = conf,
.unified_fd = unified_fd,
.pid = pid,
};
ret = socketpair(PF_LOCAL, SOCK_STREAM | SOCK_CLOEXEC, 0, args.sk_pair);
if (ret < 0)
return -errno;
ret = userns_exec_minimal(conf,
cgroup_unified_attach_parent_wrapper,
&args,
cgroup_unified_attach_child_wrapper,
&args);
} else {
ret = cgroup_attach_leaf(conf, unified_fd, pid);
}
return ret;
} }
/* Technically, we're always at a delegation boundary here (This is especially /* Technically, we're always at a delegation boundary here (This is especially
@ -2128,33 +2382,63 @@ int cgroup_attach(const char *name, const char *lxcpath, int64_t pid)
* created when we started the container in the latter case we create our own * created when we started the container in the latter case we create our own
* cgroup for the attaching process. * cgroup for the attaching process.
*/ */
static int __cg_unified_attach(const struct hierarchy *h, const char *name, static int __cg_unified_attach(const struct hierarchy *h,
const struct lxc_conf *conf, const char *name,
const char *lxcpath, pid_t pid, const char *lxcpath, pid_t pid,
const char *controller) const char *controller)
{ {
__do_close int unified_fd = -EBADF; __do_close int unified_fd = -EBADF;
__do_free char *path = NULL, *cgroup = NULL;
int ret; int ret;
ret = cgroup_attach(name, lxcpath, pid); if (!conf || !name || !lxcpath || pid <= 0)
if (ret < 0) { return ret_errno(EINVAL);
__do_free char *path = NULL, *cgroup = NULL;
ret = cgroup_attach(conf, name, lxcpath, pid);
if (ret == 0)
return log_trace(0, "Attached to unified cgroup via command handler");
if (ret != -EBADF)
return log_error_errno(ret, errno, "Failed to attach to unified cgroup");
/* Fall back to retrieving the path for the unified cgroup. */
cgroup = lxc_cmd_get_cgroup_path(name, lxcpath, controller); cgroup = lxc_cmd_get_cgroup_path(name, lxcpath, controller);
/* not running */ /* not running */
if (!cgroup) if (!cgroup)
return 0; return 0;
path = must_make_path(h->mountpoint, cgroup, NULL); path = must_make_path(h->mountpoint, cgroup, NULL);
unified_fd = open(path, O_DIRECTORY | O_RDONLY | O_CLOEXEC);
} unified_fd = open(path, O_PATH | O_DIRECTORY | O_CLOEXEC);
if (unified_fd < 0) if (unified_fd < 0)
return -1; return ret_errno(EBADF);
return cgroup_attach_leaf(unified_fd, pid); if (!lxc_list_empty(&conf->id_map)) {
struct userns_exec_unified_attach_data args = {
.conf = conf,
.unified_fd = unified_fd,
.pid = pid,
};
ret = socketpair(PF_LOCAL, SOCK_STREAM | SOCK_CLOEXEC, 0, args.sk_pair);
if (ret < 0)
return -errno;
ret = userns_exec_minimal(conf,
cgroup_unified_attach_parent_wrapper,
&args,
cgroup_unified_attach_child_wrapper,
&args);
} else {
ret = cgroup_attach_leaf(conf, unified_fd, pid);
} }
__cgfsng_ops static bool cgfsng_attach(struct cgroup_ops *ops, const char *name, return ret;
const char *lxcpath, pid_t pid) }
__cgfsng_ops static bool cgfsng_attach(struct cgroup_ops *ops,
const struct lxc_conf *conf,
const char *name, const char *lxcpath,
pid_t pid)
{ {
int len, ret; int len, ret;
char pidstr[INTTYPE_TO_STRLEN(pid_t)]; char pidstr[INTTYPE_TO_STRLEN(pid_t)];
@ -2174,7 +2458,7 @@ __cgfsng_ops static bool cgfsng_attach(struct cgroup_ops *ops, const char *name,
struct hierarchy *h = ops->hierarchies[i]; struct hierarchy *h = ops->hierarchies[i];
if (h->version == CGROUP2_SUPER_MAGIC) { if (h->version == CGROUP2_SUPER_MAGIC) {
ret = __cg_unified_attach(h, name, lxcpath, pid, ret = __cg_unified_attach(h, conf, name, lxcpath, pid,
h->controllers[0]); h->controllers[0]);
if (ret < 0) if (ret < 0)
return false; return false;
@ -2219,7 +2503,7 @@ __cgfsng_ops static int cgfsng_get(struct cgroup_ops *ops, const char *filename,
if (p) if (p)
*p = '\0'; *p = '\0';
path = lxc_cmd_get_cgroup_path(name, lxcpath, controller); path = lxc_cmd_get_limiting_cgroup_path(name, lxcpath, controller);
/* not running */ /* not running */
if (!path) if (!path)
return -1; return -1;
@ -2384,7 +2668,7 @@ __cgfsng_ops static int cgfsng_set(struct cgroup_ops *ops,
return 0; return 0;
} }
path = lxc_cmd_get_cgroup_path(name, lxcpath, controller); path = lxc_cmd_get_limiting_cgroup_path(name, lxcpath, controller);
/* not running */ /* not running */
if (!path) if (!path)
return -1; return -1;
@ -2442,6 +2726,9 @@ static int device_cgroup_rule_parse_devpath(struct device_item *device,
return ret_set_errno(-1, EINVAL); return ret_set_errno(-1, EINVAL);
} }
if (!mode)
return ret_errno(EINVAL);
if (device_cgroup_parse_access(device, mode) < 0) if (device_cgroup_parse_access(device, mode) < 0)
return -1; return -1;
@ -2494,7 +2781,7 @@ static int convert_devpath(const char *invalue, char *dest)
* we created the cgroups. * we created the cgroups.
*/ */
static int cg_legacy_set_data(struct cgroup_ops *ops, const char *filename, static int cg_legacy_set_data(struct cgroup_ops *ops, const char *filename,
const char *value) const char *value, bool is_cpuset)
{ {
__do_free char *controller = NULL; __do_free char *controller = NULL;
char *p; char *p;
@ -2520,7 +2807,12 @@ static int cg_legacy_set_data(struct cgroup_ops *ops, const char *filename,
if (!h) if (!h)
return log_error_errno(-ENOENT, ENOENT, "Failed to setup limits for the \"%s\" controller. The controller seems to be unused by \"cgfsng\" cgroup driver or not enabled on the cgroup hierarchy", controller); return log_error_errno(-ENOENT, ENOENT, "Failed to setup limits for the \"%s\" controller. The controller seems to be unused by \"cgfsng\" cgroup driver or not enabled on the cgroup hierarchy", controller);
return lxc_write_openat(h->container_full_path, filename, value, strlen(value)); if (is_cpuset) {
int ret = lxc_write_openat(h->container_full_path, filename, value, strlen(value));
if (ret)
return ret;
}
return lxc_write_openat(h->container_limit_path, filename, value, strlen(value));
} }
__cgfsng_ops static bool cgfsng_setup_limits_legacy(struct cgroup_ops *ops, __cgfsng_ops static bool cgfsng_setup_limits_legacy(struct cgroup_ops *ops,
@ -2546,6 +2838,9 @@ __cgfsng_ops static bool cgfsng_setup_limits_legacy(struct cgroup_ops *ops,
if (!ops->hierarchies) if (!ops->hierarchies)
return ret_set_errno(false, EINVAL); return ret_set_errno(false, EINVAL);
if (pure_unified_layout(ops))
return log_warn_errno(true, EINVAL, "Ignoring legacy cgroup limits on pure cgroup2 system");
sorted_cgroup_settings = sort_cgroup_settings(cgroup_settings); sorted_cgroup_settings = sort_cgroup_settings(cgroup_settings);
if (!sorted_cgroup_settings) if (!sorted_cgroup_settings)
return false; return false;
@ -2554,7 +2849,7 @@ __cgfsng_ops static bool cgfsng_setup_limits_legacy(struct cgroup_ops *ops,
cg = iterator->elem; cg = iterator->elem;
if (do_devices == !strncmp("devices", cg->subsystem, 7)) { if (do_devices == !strncmp("devices", cg->subsystem, 7)) {
if (cg_legacy_set_data(ops, cg->subsystem, cg->value)) { if (cg_legacy_set_data(ops, cg->subsystem, cg->value, strncmp("cpuset", cg->subsystem, 6) == 0)) {
if (do_devices && (errno == EACCES || errno == EPERM)) { if (do_devices && (errno == EACCES || errno == EPERM)) {
SYSWARN("Failed to set \"%s\" to \"%s\"", cg->subsystem, cg->value); SYSWARN("Failed to set \"%s\" to \"%s\"", cg->subsystem, cg->value);
continue; continue;
@ -2623,9 +2918,12 @@ __cgfsng_ops static bool cgfsng_setup_limits(struct cgroup_ops *ops,
return ret_set_errno(false, EINVAL); return ret_set_errno(false, EINVAL);
conf = handler->conf; conf = handler->conf;
if (lxc_list_empty(&conf->cgroup2))
return true;
cgroup_settings = &conf->cgroup2; cgroup_settings = &conf->cgroup2;
if (lxc_list_empty(cgroup_settings))
return true;
if (!pure_unified_layout(ops))
return log_warn_errno(true, EINVAL, "Ignoring cgroup2 limits on legacy cgroup system");
if (!ops->unified) if (!ops->unified)
return false; return false;
@ -2639,7 +2937,7 @@ __cgfsng_ops static bool cgfsng_setup_limits(struct cgroup_ops *ops,
ret = bpf_device_cgroup_prepare(ops, conf, cg->subsystem, ret = bpf_device_cgroup_prepare(ops, conf, cg->subsystem,
cg->value); cg->value);
} else { } else {
ret = lxc_write_openat(h->container_full_path, ret = lxc_write_openat(h->container_limit_path,
cg->subsystem, cg->value, cg->subsystem, cg->value,
strlen(cg->value)); strlen(cg->value));
if (ret < 0) if (ret < 0)
@ -2715,7 +3013,7 @@ __cgfsng_ops bool cgfsng_devices_activate(struct cgroup_ops *ops,
return log_error_errno(false, ENOMEM, "Failed to finalize bpf program"); return log_error_errno(false, ENOMEM, "Failed to finalize bpf program");
ret = bpf_program_cgroup_attach(devices, BPF_CGROUP_DEVICE, ret = bpf_program_cgroup_attach(devices, BPF_CGROUP_DEVICE,
unified->container_full_path, unified->container_limit_path,
BPF_F_ALLOW_MULTI); BPF_F_ALLOW_MULTI);
if (ret) if (ret)
return log_error_errno(false, ENOMEM, "Failed to attach bpf program"); return log_error_errno(false, ENOMEM, "Failed to attach bpf program");
@ -3160,6 +3458,7 @@ struct cgroup_ops *cgfsng_ops_init(struct lxc_conf *conf)
cgfsng_ops->chown = cgfsng_chown; cgfsng_ops->chown = cgfsng_chown;
cgfsng_ops->mount = cgfsng_mount; cgfsng_ops->mount = cgfsng_mount;
cgfsng_ops->devices_activate = cgfsng_devices_activate; cgfsng_ops->devices_activate = cgfsng_devices_activate;
cgfsng_ops->get_limiting_cgroup = cgfsng_get_limiting_cgroup;
return move_ptr(cgfsng_ops); return move_ptr(cgfsng_ops);
} }


@ -79,7 +79,7 @@ void cgroup_exit(struct cgroup_ops *ops)
free((*it)->container_base_path); free((*it)->container_base_path);
free((*it)->container_full_path); free((*it)->container_full_path);
free((*it)->monitor_full_path); free((*it)->monitor_full_path);
if ((*it)->cgfd_mon >= 0) if ((*it)->cgfd_con >= 0)
close((*it)->cgfd_con); close((*it)->cgfd_con);
if ((*it)->cgfd_mon >= 0) if ((*it)->cgfd_mon >= 0)
close((*it)->cgfd_mon); close((*it)->cgfd_mon);


@ -54,7 +54,11 @@ typedef enum {
* init's cgroup (if root). * init's cgroup (if root).
* *
* @container_full_path * @container_full_path
* - The full path to the containers cgroup. * - The full path to the container's cgroup.
*
* @container_limit_path
* - The full path to the container's limiting cgroup. May simply point to
* container_full_path.
* *
* @monitor_full_path * @monitor_full_path
* - The full path to the monitor's cgroup. * - The full path to the monitor's cgroup.
@ -77,15 +81,18 @@ struct hierarchy {
char *mountpoint; char *mountpoint;
char *container_base_path; char *container_base_path;
char *container_full_path; char *container_full_path;
char *container_limit_path;
char *monitor_full_path; char *monitor_full_path;
int version; int version;
/* cgroup2 only */ /* cgroup2 only */
unsigned int bpf_device_controller:1; unsigned int bpf_device_controller:1;
/* monitor cgroup fd */
int cgfd_con;
/* container cgroup fd */ /* container cgroup fd */
int cgfd_con;
/* limiting cgroup fd (may be equal to cgfd_con if not separated) */
int cgfd_limit;
/* monitor cgroup fd */
int cgfd_mon; int cgfd_mon;
}; };
@ -160,8 +167,8 @@ struct cgroup_ops {
struct lxc_conf *conf, bool with_devices); struct lxc_conf *conf, bool with_devices);
bool (*setup_limits)(struct cgroup_ops *ops, struct lxc_handler *handler); bool (*setup_limits)(struct cgroup_ops *ops, struct lxc_handler *handler);
bool (*chown)(struct cgroup_ops *ops, struct lxc_conf *conf); bool (*chown)(struct cgroup_ops *ops, struct lxc_conf *conf);
bool (*attach)(struct cgroup_ops *ops, const char *name, bool (*attach)(struct cgroup_ops *ops, const struct lxc_conf *conf,
const char *lxcpath, pid_t pid); const char *name, const char *lxcpath, pid_t pid);
bool (*mount)(struct cgroup_ops *ops, struct lxc_handler *handler, bool (*mount)(struct cgroup_ops *ops, struct lxc_handler *handler,
const char *root, int type); const char *root, int type);
bool (*devices_activate)(struct cgroup_ops *ops, bool (*devices_activate)(struct cgroup_ops *ops,
@ -169,6 +176,7 @@ struct cgroup_ops {
bool (*monitor_delegate_controllers)(struct cgroup_ops *ops); bool (*monitor_delegate_controllers)(struct cgroup_ops *ops);
bool (*payload_delegate_controllers)(struct cgroup_ops *ops); bool (*payload_delegate_controllers)(struct cgroup_ops *ops);
void (*payload_finalize)(struct cgroup_ops *ops); void (*payload_finalize)(struct cgroup_ops *ops);
const char *(*get_limiting_cgroup)(struct cgroup_ops *ops, const char *controller);
}; };
extern struct cgroup_ops *cgroup_init(struct lxc_conf *conf); extern struct cgroup_ops *cgroup_init(struct lxc_conf *conf);
@ -178,7 +186,8 @@ define_cleanup_function(struct cgroup_ops *, cgroup_exit);
extern void prune_init_scope(char *cg); extern void prune_init_scope(char *cg);
extern int cgroup_attach(const char *name, const char *lxcpath, int64_t pid); extern int cgroup_attach(const struct lxc_conf *conf, const char *name,
const char *lxcpath, pid_t pid);
static inline bool pure_unified_layout(const struct cgroup_ops *ops) static inline bool pure_unified_layout(const struct cgroup_ops *ops)
{ {


@ -167,7 +167,7 @@ struct bpf_program *bpf_program_new(uint32_t prog_type)
{ {
__do_free struct bpf_program *prog = NULL; __do_free struct bpf_program *prog = NULL;
prog = calloc(1, sizeof(struct bpf_program)); prog = zalloc(sizeof(struct bpf_program));
if (!prog) if (!prog)
return NULL; return NULL;
@ -183,9 +183,6 @@ struct bpf_program *bpf_program_new(uint32_t prog_type)
int bpf_program_init(struct bpf_program *prog) int bpf_program_init(struct bpf_program *prog)
{ {
if (!prog)
return ret_set_errno(-1, EINVAL);
const struct bpf_insn pre_insn[] = { const struct bpf_insn pre_insn[] = {
/* load device type to r2 */ /* load device type to r2 */
BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct bpf_cgroup_dev_ctx, access_type)), BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct bpf_cgroup_dev_ctx, access_type)),
@ -202,19 +199,17 @@ int bpf_program_init(struct bpf_program *prog)
BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1, offsetof(struct bpf_cgroup_dev_ctx, minor)), BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1, offsetof(struct bpf_cgroup_dev_ctx, minor)),
}; };
if (!prog)
return ret_set_errno(-1, EINVAL);
return bpf_program_add_instructions(prog, pre_insn, ARRAY_SIZE(pre_insn)); return bpf_program_add_instructions(prog, pre_insn, ARRAY_SIZE(pre_insn));
} }
int bpf_program_append_device(struct bpf_program *prog, struct device_item *device) int bpf_program_append_device(struct bpf_program *prog, struct device_item *device)
{ {
int ret;
int jump_nr = 1; int jump_nr = 1;
struct bpf_insn bpf_access_decision[] = { int access_mask, device_type, ret;
BPF_MOV64_IMM(BPF_REG_0, device->allow), struct bpf_insn bpf_access_decision[2];
BPF_EXIT_INSN(),
};
int access_mask;
int device_type;
if (!prog || !device) if (!prog || !device)
return ret_set_errno(-1, EINVAL); return ret_set_errno(-1, EINVAL);
@ -285,6 +280,8 @@ int bpf_program_append_device(struct bpf_program *prog, struct device_item *devi
return log_error_errno(-1, errno, "Failed to add instructions to bpf cgroup program"); return log_error_errno(-1, errno, "Failed to add instructions to bpf cgroup program");
} }
bpf_access_decision[0] = BPF_MOV64_IMM(BPF_REG_0, device->allow);
bpf_access_decision[1] = BPF_EXIT_INSN();
ret = bpf_program_add_instructions(prog, bpf_access_decision, ret = bpf_program_add_instructions(prog, bpf_access_decision,
ARRAY_SIZE(bpf_access_decision)); ARRAY_SIZE(bpf_access_decision));
if (ret) if (ret)
@ -295,10 +292,7 @@ int bpf_program_append_device(struct bpf_program *prog, struct device_item *devi
int bpf_program_finalize(struct bpf_program *prog) int bpf_program_finalize(struct bpf_program *prog)
{ {
struct bpf_insn ins[] = { struct bpf_insn ins[2];
BPF_MOV64_IMM(BPF_REG_0, prog->device_list_type),
BPF_EXIT_INSN(),
};
if (!prog) if (!prog)
return ret_set_errno(-1, EINVAL); return ret_set_errno(-1, EINVAL);
@ -307,6 +301,9 @@ int bpf_program_finalize(struct bpf_program *prog)
prog->device_list_type == LXC_BPF_DEVICE_CGROUP_BLACKLIST prog->device_list_type == LXC_BPF_DEVICE_CGROUP_BLACKLIST
? "blacklist" ? "blacklist"
: "whitelist"); : "whitelist");
ins[0] = BPF_MOV64_IMM(BPF_REG_0, prog->device_list_type);
ins[1] = BPF_EXIT_INSN();
return bpf_program_add_instructions(prog, ins, ARRAY_SIZE(ins)); return bpf_program_add_instructions(prog, ins, ARRAY_SIZE(ins));
} }
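
Note on the `bpf_program_finalize()` hunk above: the old array initializer read `prog->device_list_type` before the `if (!prog)` check ran, so the check could not protect against a NULL argument. The new code declares the array uninitialized and fills it after the check. A minimal sketch of the same ordering, with hypothetical `demo_` types standing in for `struct bpf_program` and `struct bpf_insn`:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real structs carry more fields. */
struct demo_prog {
	int device_list_type;
};

struct demo_insn {
	int code;
	int imm;
};

/* Build the two trailing instructions; returns 0 or -EINVAL. */
static int demo_finalize(const struct demo_prog *prog, struct demo_insn out[2])
{
	struct demo_insn ins[2]; /* no initializer: nothing reads *prog yet */

	if (!prog || !out)
		return -EINVAL;

	/* Safe to dereference prog only from this point on. */
	ins[0] = (struct demo_insn){ .code = 1, .imm = prog->device_list_type };
	ins[1] = (struct demo_insn){ .code = 2, .imm = 0 };

	out[0] = ins[0];
	out[1] = ins[1];
	return 0;
}
```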
@ -340,12 +337,12 @@ static int bpf_program_load_kernel(struct bpf_program *prog, char *log_buf,
int bpf_program_cgroup_attach(struct bpf_program *prog, int type, int bpf_program_cgroup_attach(struct bpf_program *prog, int type,
const char *path, uint32_t flags) const char *path, uint32_t flags)
{ {
__do_free char *copy = NULL;
__do_close int fd = -EBADF; __do_close int fd = -EBADF;
__do_free char *copy = NULL;
union bpf_attr attr; union bpf_attr attr;
int ret; int ret;
if (!prog) if (!path || !prog)
return ret_set_errno(-1, EINVAL); return ret_set_errno(-1, EINVAL);
if (flags & ~(BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI)) if (flags & ~(BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI))
@ -395,8 +392,8 @@ int bpf_program_cgroup_attach(struct bpf_program *prog, int type,
int bpf_program_cgroup_detach(struct bpf_program *prog) int bpf_program_cgroup_detach(struct bpf_program *prog)
{ {
int ret;
__do_close int fd = -EBADF; __do_close int fd = -EBADF;
int ret;
if (!prog) if (!prog)
return 0; return 0;
@ -444,6 +441,9 @@ int bpf_list_add_device(struct lxc_conf *conf, struct device_item *device)
__do_free struct device_item *new_device = NULL; __do_free struct device_item *new_device = NULL;
struct lxc_list *it; struct lxc_list *it;
if (!conf || !device)
return ret_errno(EINVAL);
lxc_list_for_each(it, &conf->devices) { lxc_list_for_each(it, &conf->devices) {
struct device_item *cur = it->elem; struct device_item *cur = it->elem;
@ -502,12 +502,11 @@ int bpf_list_add_device(struct lxc_conf *conf, struct device_item *device)
bool bpf_devices_cgroup_supported(void) bool bpf_devices_cgroup_supported(void)
{ {
__do_bpf_program_free struct bpf_program *prog = NULL;
const struct bpf_insn dummy[] = { const struct bpf_insn dummy[] = {
BPF_MOV64_IMM(BPF_REG_0, 1), BPF_MOV64_IMM(BPF_REG_0, 1),
BPF_EXIT_INSN(), BPF_EXIT_INSN(),
}; };
__do_bpf_program_free struct bpf_program *prog = NULL;
int ret; int ret;
if (geteuid() != 0) if (geteuid() != 0)
@ -515,7 +514,7 @@ bool bpf_devices_cgroup_supported(void)
"The bpf device cgroup requires real root"); "The bpf device cgroup requires real root");
prog = bpf_program_new(BPF_PROG_TYPE_CGROUP_DEVICE); prog = bpf_program_new(BPF_PROG_TYPE_CGROUP_DEVICE);
if (prog < 0) if (!prog)
return log_trace(false, "Failed to allocate new bpf device cgroup program"); return log_trace(false, "Failed to allocate new bpf device cgroup program");
ret = bpf_program_add_instructions(prog, dummy, ARRAY_SIZE(dummy)); ret = bpf_program_add_instructions(prog, dummy, ARRAY_SIZE(dummy));


@ -74,7 +74,7 @@ sed -i \
-e 's/\([[:blank:]*]\|#*\)\(lxc\.stopsignal\)\([[:blank:]*]\|=\)/\1lxc\.signal\.stop\3/g' \ -e 's/\([[:blank:]*]\|#*\)\(lxc\.stopsignal\)\([[:blank:]*]\|=\)/\1lxc\.signal\.stop\3/g' \
-e 's/\([[:blank:]*]\|#*\)\(lxc\.syslog\)\([[:blank:]*]\|=\)/\1lxc\.log\.syslog\3/g' \ -e 's/\([[:blank:]*]\|#*\)\(lxc\.syslog\)\([[:blank:]*]\|=\)/\1lxc\.log\.syslog\3/g' \
-e 's/\([[:blank:]*]\|#*\)\(lxc\.loglevel\)\([[:blank:]*]\|=\)/\1lxc\.log\.level\3/g' \ -e 's/\([[:blank:]*]\|#*\)\(lxc\.loglevel\)\([[:blank:]*]\|=\)/\1lxc\.log\.level\3/g' \
-e 's/\([[:blank:]*]\|#*\)\(lxc\.logfile\)\([[:blank:]*]\|=\)/1lxc\.log\.file\3/g' \ -e 's/\([[:blank:]*]\|#*\)\(lxc\.logfile\)\([[:blank:]*]\|=\)/\1lxc\.log\.file\3/g' \
-e 's/\([[:blank:]*]\|#*\)\(lxc\.init_cmd\)\([[:blank:]*]\|=\)/\1lxc\.init\.cmd\3/g' \ -e 's/\([[:blank:]*]\|#*\)\(lxc\.init_cmd\)\([[:blank:]*]\|=\)/\1lxc\.init\.cmd\3/g' \
-e 's/\([[:blank:]*]\|#*\)\(lxc\.init_uid\)\([[:blank:]*]\|=\)/\1lxc\.init\.uid\3/g' \ -e 's/\([[:blank:]*]\|#*\)\(lxc\.init_uid\)\([[:blank:]*]\|=\)/\1lxc\.init\.uid\3/g' \
-e 's/\([[:blank:]*]\|#*\)\(lxc\.init_gid\)\([[:blank:]*]\|=\)/\1lxc\.init\.gid\3/g' \ -e 's/\([[:blank:]*]\|#*\)\(lxc\.init_gid\)\([[:blank:]*]\|=\)/\1lxc\.init\.gid\3/g' \


@ -28,7 +28,7 @@
#include "initutils.h" #include "initutils.h"
#include "memory_utils.h" #include "memory_utils.h"
#include "parse.h" #include "parse.h"
#include "raw_syscalls.h" #include "process_utils.h"
#include "string_utils.h" #include "string_utils.h"
/* option keys for long only options */ /* option keys for long only options */
@ -70,9 +70,6 @@ struct arguments {
int argc; int argc;
}; };
static int arguments_parse(struct arguments *my_args, int argc,
char *const argv[]);
static struct arguments my_args = { static struct arguments my_args = {
.options = long_options, .options = long_options,
.shortopts = short_options .shortopts = short_options
@ -90,7 +87,8 @@ static void prevent_forking(void)
return; return;
while (getline(&line, &len, f) != -1) { while (getline(&line, &len, f) != -1) {
int fd, ret; __do_close int fd = -EBADF;
int ret;
char *p, *p2; char *p, *p2;
p = strchr(line, ':'); p = strchr(line, ':');
@ -121,7 +119,7 @@ static void prevent_forking(void)
return; return;
} }
fd = open(path, O_WRONLY); fd = open(path, O_WRONLY | O_CLOEXEC);
if (fd < 0) { if (fd < 0) {
if (my_args.quiet) if (my_args.quiet)
fprintf(stderr, "Failed to open \"%s\"\n", path); fprintf(stderr, "Failed to open \"%s\"\n", path);
@ -132,7 +130,6 @@ static void prevent_forking(void)
if (ret != 1 && !my_args.quiet) if (ret != 1 && !my_args.quiet)
fprintf(stderr, "Failed to write to \"%s\"\n", path); fprintf(stderr, "Failed to write to \"%s\"\n", path);
close(fd);
return; return;
} }
} }
@ -191,6 +188,99 @@ static void remove_self(void)
return; return;
} }
__noreturn static void print_usage_exit(const struct option longopts[])
{
fprintf(stderr, "Usage: lxc-init [-n|--name=NAME] [-h|--help] [--usage] [--version]\n\
[-q|--quiet] [-P|--lxcpath=LXCPATH]\n");
exit(EXIT_SUCCESS);
}
__noreturn static void print_version_exit(void)
{
printf("%s\n", LXC_VERSION);
exit(EXIT_SUCCESS);
}
static void print_help(void)
{
fprintf(stderr, "\
Usage: lxc-init --name=NAME -- COMMAND\n\
\n\
lxc-init start a COMMAND as PID 2 inside a container\n\
\n\
Options :\n\
-n, --name=NAME NAME of the container\n\
-q, --quiet Don't produce any output\n\
-P, --lxcpath=PATH Use specified container path\n\
-?, --help Give this help list\n\
--usage Give a short usage message\n\
--version Print the version number\n\
\n\
Mandatory or optional arguments to long options are also mandatory or optional\n\
for any corresponding short options.\n\
\n\
See the lxc-init man page for further information.\n\n");
}
static int arguments_parse(struct arguments *args, int argc,
char *const argv[])
{
for (;;) {
int c;
int index = 0;
c = getopt_long(argc, argv, args->shortopts, args->options, &index);
if (c == -1)
break;
switch (c) {
case 'n':
args->name = optarg;
break;
case 'o':
break;
case 'l':
break;
case 'q':
args->quiet = true;
break;
case 'P':
remove_trailing_slashes(optarg);
args->lxcpath = optarg;
break;
case OPT_USAGE:
print_usage_exit(args->options);
case OPT_VERSION:
print_version_exit();
case '?':
print_help();
exit(EXIT_FAILURE);
case 'h':
print_help();
exit(EXIT_SUCCESS);
}
}
/*
* Reclaim the remaining command arguments
*/
args->argv = &argv[optind];
args->argc = argc - optind;
/* If no lxcpath was given, use default */
if (!args->lxcpath)
args->lxcpath = lxc_global_config_value("lxc.lxcpath");
/* Check the command options */
if (!args->name) {
if (!args->quiet)
fprintf(stderr, "lxc-init: missing container name, use --name option\n");
return -1;
}
return 0;
}
int main(int argc, char *argv[]) int main(int argc, char *argv[])
{ {
int i, logfd, ret; int i, logfd, ret;
@ -426,96 +516,3 @@ out:
exit(EXIT_FAILURE); exit(EXIT_FAILURE);
exit(exit_with); exit(exit_with);
} }
__noreturn static void print_usage_exit(const struct option longopts[])
{
fprintf(stderr, "Usage: lxc-init [-n|--name=NAME] [-h|--help] [--usage] [--version]\n\
[-q|--quiet] [-P|--lxcpath=LXCPATH]\n");
exit(EXIT_SUCCESS);
}
__noreturn static void print_version_exit(void)
{
printf("%s\n", LXC_VERSION);
exit(EXIT_SUCCESS);
}
static void print_help(void)
{
fprintf(stderr, "\
Usage: lxc-init --name=NAME -- COMMAND\n\
\n\
lxc-init start a COMMAND as PID 2 inside a container\n\
\n\
Options :\n\
-n, --name=NAME NAME of the container\n\
-q, --quiet Don't produce any output\n\
-P, --lxcpath=PATH Use specified container path\n\
-?, --help Give this help list\n\
--usage Give a short usage message\n\
--version Print the version number\n\
\n\
Mandatory or optional arguments to long options are also mandatory or optional\n\
for any corresponding short options.\n\
\n\
See the lxc-init man page for further information.\n\n");
}
static int arguments_parse(struct arguments *args, int argc,
char *const argv[])
{
for (;;) {
int c;
int index = 0;
c = getopt_long(argc, argv, args->shortopts, args->options, &index);
if (c == -1)
break;
switch (c) {
case 'n':
args->name = optarg;
break;
case 'o':
break;
case 'l':
break;
case 'q':
args->quiet = true;
break;
case 'P':
remove_trailing_slashes(optarg);
args->lxcpath = optarg;
break;
case OPT_USAGE:
print_usage_exit(args->options);
case OPT_VERSION:
print_version_exit();
case '?':
print_help();
exit(EXIT_FAILURE);
case 'h':
print_help();
exit(EXIT_SUCCESS);
}
}
/*
* Reclaim the remaining command arguments
*/
args->argv = &argv[optind];
args->argc = argc - optind;
/* If no lxcpath was given, use default */
if (!args->lxcpath)
args->lxcpath = lxc_global_config_value("lxc.lxcpath");
/* Check the command options */
if (!args->name) {
if (!args->quiet)
fprintf(stderr, "lxc-init: missing container name, use --name option\n");
return -1;
}
return 0;
}


@ -28,7 +28,7 @@
#include "log.h" #include "log.h"
#include "mainloop.h" #include "mainloop.h"
#include "monitor.h" #include "monitor.h"
#include "raw_syscalls.h" #include "process_utils.h"
#include "utils.h" #include "utils.h"
#define CLIENTFDS_CHUNK 64 #define CLIENTFDS_CHUNK 64


@ -36,7 +36,7 @@
#include "memory_utils.h" #include "memory_utils.h"
#include "network.h" #include "network.h"
#include "parse.h" #include "parse.h"
#include "raw_syscalls.h" #include "process_utils.h"
#include "string_utils.h" #include "string_utils.h"
#include "syscall_wrappers.h" #include "syscall_wrappers.h"
#include "utils.h" #include "utils.h"
@ -133,26 +133,14 @@ static char *get_username(void)
return strdup(pwent.pw_name); return strdup(pwent.pw_name);
} }
static void free_groupnames(char **groupnames)
{
int i;
if (!groupnames)
return;
for (i = 0; groupnames[i]; i++)
free(groupnames[i]);
free(groupnames);
}
static char **get_groupnames(void) static char **get_groupnames(void)
{ {
__do_free char *buf = NULL; __do_free char *buf = NULL;
__do_free gid_t *group_ids = NULL; __do_free gid_t *group_ids = NULL;
__do_free_string_list char **groupnames = NULL;
int ngroups; int ngroups;
int ret, i; int ret, i;
char **groupnames;
struct group grent; struct group grent;
struct group *grentp = NULL; struct group *grentp = NULL;
size_t bufsize; size_t bufsize;
@ -161,10 +149,11 @@ static char **get_groupnames(void)
if (ngroups < 0) { if (ngroups < 0) {
CMD_SYSERROR("Failed to get number of groups the user belongs to\n"); CMD_SYSERROR("Failed to get number of groups the user belongs to\n");
return NULL; return NULL;
} else if (ngroups == 0) {
return NULL;
} }
if (ngroups == 0)
return NULL;
group_ids = malloc(sizeof(gid_t) * ngroups); group_ids = malloc(sizeof(gid_t) * ngroups);
if (!group_ids) { if (!group_ids) {
CMD_SYSERROR("Failed to allocate memory while getting groups the user belongs to\n"); CMD_SYSERROR("Failed to allocate memory while getting groups the user belongs to\n");
@ -177,66 +166,53 @@ static char **get_groupnames(void)
return NULL; return NULL;
} }
groupnames = malloc(sizeof(char *) * (ngroups + 1)); groupnames = zalloc(sizeof(char *) * (ngroups + 1));
if (!groupnames) { if (!groupnames) {
CMD_SYSERROR("Failed to allocate memory while getting group names\n"); CMD_SYSERROR("Failed to allocate memory while getting group names\n");
return NULL; return NULL;
} }
memset(groupnames, 0, sizeof(char *) * (ngroups + 1));
bufsize = sysconf(_SC_GETGR_R_SIZE_MAX); bufsize = sysconf(_SC_GETGR_R_SIZE_MAX);
if (bufsize == -1) if (bufsize == -1)
bufsize = 1024; bufsize = 1024;
buf = malloc(bufsize); buf = malloc(bufsize);
if (!buf) { if (!buf) {
free_groupnames(groupnames);
CMD_SYSERROR("Failed to allocate memory while getting group names\n"); CMD_SYSERROR("Failed to allocate memory while getting group names\n");
return NULL; return NULL;
} }
for (i = 0; i < ngroups; i++) { for (i = 0; i < ngroups; i++) {
while ((ret = getgrgid_r(group_ids[i], &grent, buf, bufsize, &grentp)) == ERANGE) { while ((ret = getgrgid_r(group_ids[i], &grent, buf, bufsize, &grentp)) == ERANGE) {
char *new_buf;
bufsize <<= 1; bufsize <<= 1;
if (bufsize > MAX_GRBUF_SIZE) { if (bufsize > MAX_GRBUF_SIZE) {
usernic_error("Failed to get group members: %u\n", usernic_error("Failed to get group members: %u\n", group_ids[i]);
group_ids[i]);
free(buf);
free(group_ids);
free_groupnames(groupnames);
return NULL; return NULL;
} }
char *new_buf = realloc(buf, bufsize);
new_buf = realloc(buf, bufsize);
if (!new_buf) { if (!new_buf) {
usernic_error("Failed to allocate memory while getting group " usernic_error("Failed to allocate memory while getting group names: %s\n",
"names: %s\n",
strerror(errno)); strerror(errno));
free(buf);
free(group_ids);
free_groupnames(groupnames);
return NULL; return NULL;
} }
buf = new_buf; buf = new_buf;
} }
if (!grentp) {
if (ret == 0)
usernic_error("%s", "Could not find matched group record\n");
CMD_SYSERROR("Failed to get group name: %u\n", group_ids[i]); /* If a group is not found, just ignore it. */
free_groupnames(groupnames); if (!grentp)
return NULL; continue;
}
groupnames[i] = strdup(grent.gr_name); groupnames[i] = strdup(grent.gr_name);
if (!groupnames[i]) { if (!groupnames[i]) {
usernic_error("Failed to copy group name \"%s\"", grent.gr_name); usernic_error("Failed to copy group name \"%s\"", grent.gr_name);
free_groupnames(groupnames);
return NULL; return NULL;
} }
} }
return groupnames; return move_ptr(groupnames);
} }
static bool name_is_in_groupnames(char *name, char **groupnames) static bool name_is_in_groupnames(char *name, char **groupnames)
@ -325,9 +301,9 @@ static int get_alloted(char *me, char *intype, char *link,
{ {
__do_free char *line = NULL; __do_free char *line = NULL;
__do_fclose FILE *fin = NULL; __do_fclose FILE *fin = NULL;
__do_free_string_list char **groups = NULL;
int n, ret; int n, ret;
char name[100], type[100], br[100]; char name[100], type[100], br[100];
char **groups;
int count = 0; int count = 0;
size_t len = 0; size_t len = 0;
@ -379,8 +355,6 @@ static int get_alloted(char *me, char *intype, char *link,
count += n; count += n;
} }
free_groupnames(groups);
/* Now return the total number of nics that this user can create. */ /* Now return the total number of nics that this user can create. */
return count; return count;
} }
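The get_groupnames() hunk above drops the hand-written free_groupnames() calls on every error path in favour of a scope-bound cleanup annotation, and hands the finished list to the caller through move_ptr(). A minimal sketch of that pattern follows, assuming GCC or Clang (the cleanup attribute and statement expressions are compiler extensions). The names auto_free_string_list, take_ptr, copy_string and dup_names are hypothetical stand-ins for illustration, not LXC's actual macros or helpers.

```c
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated array of heap strings; a NULL list is a no-op. */
static void free_string_list(char **list)
{
	if (!list)
		return;

	for (size_t i = 0; list[i]; i++)
		free(list[i]);
	free(list);
}

/* Cleanup handler: the compiler passes a pointer to the annotated variable. */
static void free_string_list_cleanup(char ***list)
{
	free_string_list(*list);
}

/* Hypothetical stand-in for an annotation like __do_free_string_list. */
#define auto_free_string_list \
	__attribute__((__cleanup__(free_string_list_cleanup)))

/* Hypothetical stand-in for move_ptr(): yield the pointer, clear the local. */
#define take_ptr(ptr)                         \
	({                                    \
		__typeof__(ptr) tmp_ = (ptr); \
		(ptr) = NULL;                 \
		tmp_;                         \
	})

/* Heap-allocate a copy of s; returns NULL on allocation failure. */
static char *copy_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Duplicate the first n entries of src into a NULL-terminated list. */
static char **dup_names(const char *const *src, size_t n)
{
	auto_free_string_list char **names = NULL;

	/* Zeroed allocation, so a partially filled list is always terminated. */
	names = calloc(n + 1, sizeof(char *));
	if (!names)
		return NULL;

	for (size_t i = 0; i < n; i++) {
		names[i] = copy_string(src[i]);
		if (!names[i])
			return NULL; /* names is released by the cleanup handler */
	}

	/* The local becomes NULL, so the cleanup handler frees nothing. */
	return take_ptr(names);
}
```

The zeroed allocation matters here for the same reason the hunk switches from malloc() plus memset() to zalloc(): the cleanup handler walks the list until it hits a NULL entry, so every early return must leave the list terminated.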


@ -61,7 +61,7 @@ static void opentty(const char *tty, int which)
fd = open(tty, O_RDWR | O_NONBLOCK); fd = open(tty, O_RDWR | O_NONBLOCK);
if (fd < 0) { if (fd < 0) {
CMD_SYSERROR("Failed to open tty"); CMD_SYSINFO("Failed to open tty");
return; return;
} }
@ -87,11 +87,11 @@ static int do_child(void *vargv)
int ret; int ret;
char **argv = (char **)vargv; char **argv = (char **)vargv;
/* Assume we want to become root */ if (!lxc_setgroups(0, NULL))
if (!lxc_switch_uid_gid(0, 0))
return -1; return -1;
if (!lxc_setgroups(0, NULL)) /* Assume we want to become root */
if (!lxc_switch_uid_gid(0, 0))
return -1; return -1;
ret = unshare(CLONE_NEWNS); ret = unshare(CLONE_NEWNS);
@ -103,7 +103,7 @@ static int do_child(void *vargv)
if (detect_shared_rootfs()) { if (detect_shared_rootfs()) {
ret = mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL); ret = mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL);
if (ret < 0) { if (ret < 0) {
CMD_SYSINFO("Failed to make \"/\" rslave"); CMD_SYSINFO("Failed to recursively turn root mount tree into dependent mount");
return -1; return -1;
} }
} }


@ -84,6 +84,8 @@ static const char *lxc_cmd_str(lxc_cmd_t cmd)
[LXC_CMD_UNFREEZE] = "unfreeze", [LXC_CMD_UNFREEZE] = "unfreeze",
[LXC_CMD_GET_CGROUP2_FD] = "get_cgroup2_fd", [LXC_CMD_GET_CGROUP2_FD] = "get_cgroup2_fd",
[LXC_CMD_GET_INIT_PIDFD] = "get_init_pidfd", [LXC_CMD_GET_INIT_PIDFD] = "get_init_pidfd",
[LXC_CMD_GET_LIMITING_CGROUP] = "get_limiting_cgroup",
[LXC_CMD_GET_LIMITING_CGROUP2_FD] = "get_limiting_cgroup2_fd",
}; };
if (cmd >= LXC_CMD_MAX) if (cmd >= LXC_CMD_MAX)
@ -106,7 +108,7 @@ static const char *lxc_cmd_str(lxc_cmd_t cmd)
* stored directly in data and datalen will be 0. * stored directly in data and datalen will be 0.
* *
* As a special case, the response for LXC_CMD_CONSOLE is created * As a special case, the response for LXC_CMD_CONSOLE is created
* here as it contains an fd for the master pty passed through the * here as it contains an fd for the ptmx pty passed through the
* unix socket. * unix socket.
*/ */
static int lxc_cmd_rsp_recv(int sock, struct lxc_cmd_rr *cmd) static int lxc_cmd_rsp_recv(int sock, struct lxc_cmd_rr *cmd)
@ -137,12 +139,14 @@ static int lxc_cmd_rsp_recv(int sock, struct lxc_cmd_rr *cmd)
ENOMEM, "Failed to receive response for command \"%s\"", ENOMEM, "Failed to receive response for command \"%s\"",
lxc_cmd_str(cmd->req.cmd)); lxc_cmd_str(cmd->req.cmd));
rspdata->masterfd = move_fd(fd_rsp); rspdata->ptmxfd = move_fd(fd_rsp);
rspdata->ttynum = PTR_TO_INT(rsp->data); rspdata->ttynum = PTR_TO_INT(rsp->data);
rsp->data = rspdata; rsp->data = rspdata;
} }
if (cmd->req.cmd == LXC_CMD_GET_CGROUP2_FD) { if (cmd->req.cmd == LXC_CMD_GET_CGROUP2_FD ||
cmd->req.cmd == LXC_CMD_GET_LIMITING_CGROUP2_FD)
{
int cgroup2_fd = move_fd(fd_rsp); int cgroup2_fd = move_fd(fd_rsp);
rsp->data = INT_TO_PTR(cgroup2_fd); rsp->data = INT_TO_PTR(cgroup2_fd);
} }
@ -325,6 +329,34 @@ int lxc_try_cmd(const char *name, const char *lxcpath)
return 0; return 0;
} }
/*
* Validate that the input is a proper string parameter. If not,
* send an EINVAL response and return -1.
*
* Precondition: there is non-zero-length data available.
*/
static int validate_string_request(int fd, const struct lxc_cmd_req *req)
{
int ret;
size_t maxlen = req->datalen - 1;
const char *data = req->data;
if (data[maxlen] == 0 && strnlen(data, maxlen) == maxlen)
return 0;
struct lxc_cmd_rsp rsp = {
.ret = -EINVAL,
.datalen = 0,
.data = NULL,
};
ret = lxc_cmd_rsp_send(fd, &rsp);
if (ret < 0)
return LXC_CMD_REAP_CLIENT_FD;
return -1;
}
/* Implementations of the commands and their callbacks */ /* Implementations of the commands and their callbacks */
/* /*
@ -455,6 +487,50 @@ static int lxc_cmd_get_clone_flags_callback(int fd, struct lxc_cmd_req *req,
return 0; return 0;
} }
static char *lxc_cmd_get_cgroup_path_do(const char *name, const char *lxcpath,
const char *subsystem,
lxc_cmd_t command)
{
int ret, stopped;
struct lxc_cmd_rr cmd = {
.req = {
.cmd = command,
.data = subsystem,
.datalen = 0,
},
};
cmd.req.data = subsystem;
cmd.req.datalen = 0;
if (subsystem)
cmd.req.datalen = strlen(subsystem) + 1;
ret = lxc_cmd(name, &cmd, &stopped, lxcpath, NULL);
if (ret < 0)
return NULL;
if (ret == 0) {
if (command == LXC_CMD_GET_LIMITING_CGROUP) {
/*
* This may indicate that the container was started
* under an earlier version, before
* `cgroup_advanced_isolation` was implemented. There
* it sees an unknown command and just closes the
* socket, sending us an EOF.
*/
return lxc_cmd_get_cgroup_path_do(name, lxcpath,
subsystem,
LXC_CMD_GET_CGROUP);
}
return NULL;
}
if (cmd.rsp.ret < 0 || cmd.rsp.datalen < 0)
return NULL;
return cmd.rsp.data;
}
/* /*
* lxc_cmd_get_cgroup_path: Calculate a container's cgroup path for a * lxc_cmd_get_cgroup_path: Calculate a container's cgroup path for a
* particular subsystem. This is the cgroup path relative to the root * particular subsystem. This is the cgroup path relative to the root
@ -470,46 +546,57 @@ static int lxc_cmd_get_clone_flags_callback(int fd, struct lxc_cmd_req *req,
char *lxc_cmd_get_cgroup_path(const char *name, const char *lxcpath, char *lxc_cmd_get_cgroup_path(const char *name, const char *lxcpath,
const char *subsystem) const char *subsystem)
{ {
int ret, stopped; return lxc_cmd_get_cgroup_path_do(name, lxcpath, subsystem,
struct lxc_cmd_rr cmd = { LXC_CMD_GET_CGROUP);
.req = {
.cmd = LXC_CMD_GET_CGROUP,
.data = subsystem,
.datalen = 0,
},
};
cmd.req.data = subsystem;
cmd.req.datalen = 0;
if (subsystem)
cmd.req.datalen = strlen(subsystem) + 1;
ret = lxc_cmd(name, &cmd, &stopped, lxcpath, NULL);
if (ret < 0)
return NULL;
if (ret == 0)
return NULL;
if (cmd.rsp.ret < 0 || cmd.rsp.datalen < 0)
return NULL;
return cmd.rsp.data;
} }
static int lxc_cmd_get_cgroup_callback(int fd, struct lxc_cmd_req *req, /*
* lxc_cmd_get_limiting_cgroup_path: Calculate a container's limiting cgroup
* path for a particular subsystem. This is the cgroup path relative to the
* root of the cgroup filesystem. This may be the same as the path returned by
* lxc_cmd_get_cgroup_path if the container doesn't have a limiting path prefix
* set.
*
* @name : name of container to connect to
* @lxcpath : the lxcpath in which the container is running
* @subsystem : the subsystem being asked about
*
* Returns the path on success, NULL on failure. The caller must free() the
* returned path.
*/
char *lxc_cmd_get_limiting_cgroup_path(const char *name, const char *lxcpath,
const char *subsystem)
{
return lxc_cmd_get_cgroup_path_do(name, lxcpath, subsystem,
LXC_CMD_GET_LIMITING_CGROUP);
}
static int lxc_cmd_get_cgroup_callback_do(int fd, struct lxc_cmd_req *req,
struct lxc_handler *handler, struct lxc_handler *handler,
struct lxc_epoll_descr *descr) struct lxc_epoll_descr *descr,
bool limiting_cgroup)
{ {
int ret; int ret;
const char *path; const char *path;
const void *reqdata;
struct lxc_cmd_rsp rsp; struct lxc_cmd_rsp rsp;
struct cgroup_ops *cgroup_ops = handler->cgroup_ops; struct cgroup_ops *cgroup_ops = handler->cgroup_ops;
const char *(*get_fn)(struct cgroup_ops *ops, const char *controller);
if (req->datalen > 0) {
ret = validate_string_request(fd, req);
if (ret != 0)
return ret;
reqdata = req->data;
} else {
reqdata = NULL;
}
get_fn = (limiting_cgroup ? cgroup_ops->get_cgroup
: cgroup_ops->get_limiting_cgroup);
path = get_fn(cgroup_ops, reqdata);
if (req->datalen > 0)
path = cgroup_ops->get_cgroup(cgroup_ops, req->data);
else
path = cgroup_ops->get_cgroup(cgroup_ops, NULL);
if (!path) if (!path)
return -1; return -1;
@ -524,6 +611,20 @@ static int lxc_cmd_get_cgroup_callback(int fd, struct lxc_cmd_req *req,
return 0; return 0;
} }
static int lxc_cmd_get_cgroup_callback(int fd, struct lxc_cmd_req *req,
struct lxc_handler *handler,
struct lxc_epoll_descr *descr)
{
return lxc_cmd_get_cgroup_callback_do(fd, req, handler, descr, false);
}
static int lxc_cmd_get_limiting_cgroup_callback(int fd, struct lxc_cmd_req *req,
struct lxc_handler *handler,
struct lxc_epoll_descr *descr)
{
return ret_errno(ENOSYS);
}
/* /*
* lxc_cmd_get_config_item: Get config item the running container * lxc_cmd_get_config_item: Get config item the running container
* *
@ -743,7 +844,7 @@ static int lxc_cmd_terminal_winch_callback(int fd, struct lxc_cmd_req *req,
* @name : name of container to connect to * @name : name of container to connect to
* @ttynum : in: the tty to open or -1 for next available * @ttynum : in: the tty to open or -1 for next available
* : out: the tty allocated * : out: the tty allocated
* @fd : out: file descriptor for master side of pty * @fd : out: file descriptor for ptmx side of pty
* @lxcpath : the lxcpath in which the container is running * @lxcpath : the lxcpath in which the container is running
* *
* Returns fd holding tty allocated on success, < 0 on failure * Returns fd holding tty allocated on success, < 0 on failure
@ -770,11 +871,11 @@ int lxc_cmd_console(const char *name, int *ttynum, int *fd, const char *lxcpath)
if (ret == 0) if (ret == 0)
return log_error(-1, "tty number %d invalid, busy or all ttys busy", *ttynum); return log_error(-1, "tty number %d invalid, busy or all ttys busy", *ttynum);
if (rspdata->masterfd < 0) if (rspdata->ptmxfd < 0)
return log_error(-1, "Unable to allocate fd for tty %d", rspdata->ttynum); return log_error(-1, "Unable to allocate fd for tty %d", rspdata->ttynum);
ret = cmd.rsp.ret; /* socket fd */ ret = cmd.rsp.ret; /* socket fd */
*fd = rspdata->masterfd; *fd = rspdata->ptmxfd;
*ttynum = rspdata->ttynum; *ttynum = rspdata->ttynum;
return log_info(ret, "Alloced fd %d for tty %d via socket %d", *fd, rspdata->ttynum, ret); return log_info(ret, "Alloced fd %d for tty %d via socket %d", *fd, rspdata->ttynum, ret);
@ -784,17 +885,17 @@ static int lxc_cmd_console_callback(int fd, struct lxc_cmd_req *req,
struct lxc_handler *handler, struct lxc_handler *handler,
struct lxc_epoll_descr *descr) struct lxc_epoll_descr *descr)
{ {
int masterfd, ret; int ptmxfd, ret;
struct lxc_cmd_rsp rsp; struct lxc_cmd_rsp rsp;
int ttynum = PTR_TO_INT(req->data); int ttynum = PTR_TO_INT(req->data);
masterfd = lxc_terminal_allocate(handler->conf, fd, &ttynum); ptmxfd = lxc_terminal_allocate(handler->conf, fd, &ttynum);
if (masterfd < 0) if (ptmxfd < 0)
return LXC_CMD_REAP_CLIENT_FD; return LXC_CMD_REAP_CLIENT_FD;
memset(&rsp, 0, sizeof(rsp)); memset(&rsp, 0, sizeof(rsp));
rsp.data = INT_TO_PTR(ttynum); rsp.data = INT_TO_PTR(ttynum);
ret = lxc_abstract_unix_send_fds(fd, &masterfd, 1, &rsp, sizeof(rsp)); ret = lxc_abstract_unix_send_fds(fd, &ptmxfd, 1, &rsp, sizeof(rsp));
if (ret < 0) { if (ret < 0) {
lxc_terminal_free(handler->conf, fd); lxc_terminal_free(handler->conf, fd);
return log_error_errno(LXC_CMD_REAP_CLIENT_FD, errno, return log_error_errno(LXC_CMD_REAP_CLIENT_FD, errno,
@ -1328,31 +1429,50 @@ int lxc_cmd_get_cgroup2_fd(const char *name, const char *lxcpath)
return -1; return -1;
if (cmd.rsp.ret < 0) if (cmd.rsp.ret < 0)
return log_debug_errno(-1, errno, "Failed to receive cgroup2 fd"); return log_debug_errno(cmd.rsp.ret, -cmd.rsp.ret, "Failed to receive cgroup2 fd");
return PTR_TO_INT(cmd.rsp.data); return PTR_TO_INT(cmd.rsp.data);
} }
static int lxc_cmd_get_cgroup2_fd_callback_do(int fd, struct lxc_cmd_req *req,
struct lxc_handler *handler,
struct lxc_epoll_descr *descr,
bool limiting_cgroup)
{
struct lxc_cmd_rsp rsp = {
.ret = -EINVAL,
};
struct cgroup_ops *ops = handler->cgroup_ops;
int ret, send_fd;
if (!pure_unified_layout(ops) || !ops->unified)
return lxc_cmd_rsp_send(fd, &rsp);
send_fd = limiting_cgroup ? ops->unified->cgfd_limit
: ops->unified->cgfd_con;
rsp.ret = 0;
ret = lxc_abstract_unix_send_fds(fd, &send_fd, 1, &rsp, sizeof(rsp));
if (ret < 0)
return log_error(LXC_CMD_REAP_CLIENT_FD, "Failed to send cgroup2 fd");
return 0;
}
static int lxc_cmd_get_cgroup2_fd_callback(int fd, struct lxc_cmd_req *req, static int lxc_cmd_get_cgroup2_fd_callback(int fd, struct lxc_cmd_req *req,
struct lxc_handler *handler, struct lxc_handler *handler,
struct lxc_epoll_descr *descr) struct lxc_epoll_descr *descr)
{ {
struct lxc_cmd_rsp rsp = { return lxc_cmd_get_cgroup2_fd_callback_do(fd, req, handler, descr,
.ret = -EINVAL, false);
}; }
struct cgroup_ops *ops = handler->cgroup_ops;
int ret;
if (!pure_unified_layout(ops) || !ops->unified) static int lxc_cmd_get_limiting_cgroup2_fd_callback(int fd,
return lxc_cmd_rsp_send(fd, &rsp); struct lxc_cmd_req *req,
struct lxc_handler *handler,
rsp.ret = 0; struct lxc_epoll_descr *descr)
ret = lxc_abstract_unix_send_fds(fd, &ops->unified->cgfd_con, 1, &rsp, {
sizeof(rsp)); return ret_errno(ENOSYS);
if (ret < 0)
return log_error(LXC_CMD_REAP_CLIENT_FD, "Failed to send cgroup2 fd");
return 0;
} }
static int lxc_cmd_process(int fd, struct lxc_cmd_req *req, static int lxc_cmd_process(int fd, struct lxc_cmd_req *req,
@ -1382,10 +1502,12 @@ static int lxc_cmd_process(int fd, struct lxc_cmd_req *req,
[LXC_CMD_UNFREEZE] = lxc_cmd_unfreeze_callback, [LXC_CMD_UNFREEZE] = lxc_cmd_unfreeze_callback,
[LXC_CMD_GET_CGROUP2_FD] = lxc_cmd_get_cgroup2_fd_callback, [LXC_CMD_GET_CGROUP2_FD] = lxc_cmd_get_cgroup2_fd_callback,
[LXC_CMD_GET_INIT_PIDFD] = lxc_cmd_get_init_pidfd_callback, [LXC_CMD_GET_INIT_PIDFD] = lxc_cmd_get_init_pidfd_callback,
[LXC_CMD_GET_LIMITING_CGROUP] = lxc_cmd_get_limiting_cgroup_callback,
[LXC_CMD_GET_LIMITING_CGROUP2_FD] = lxc_cmd_get_limiting_cgroup2_fd_callback,
}; };
if (req->cmd >= LXC_CMD_MAX) if (req->cmd >= LXC_CMD_MAX)
return log_error_errno(-1, ENOENT, "Undefined command id %d", req->cmd); return log_trace_errno(-1, EINVAL, "Invalid command id %d", req->cmd);
return cb[req->cmd](fd, req, handler, descr); return cb[req->cmd](fd, req, handler, descr);
} }
@ -1450,7 +1572,7 @@ static int lxc_cmd_handler(int fd, uint32_t events, void *data,
if (errno == EACCES) { if (errno == EACCES) {
/* We don't care for the peer, just send and close. */ /* We don't care for the peer, just send and close. */
struct lxc_cmd_rsp rsp = { struct lxc_cmd_rsp rsp = {
.ret = ret, .ret = -EPERM,
}; };
lxc_cmd_rsp_send(fd, &rsp); lxc_cmd_rsp_send(fd, &rsp);
@ -1464,14 +1586,11 @@ static int lxc_cmd_handler(int fd, uint32_t events, void *data,
if (ret != sizeof(req)) { if (ret != sizeof(req)) {
WARN("Failed to receive full command request. Ignoring request for \"%s\"", lxc_cmd_str(req.cmd)); WARN("Failed to receive full command request. Ignoring request for \"%s\"", lxc_cmd_str(req.cmd));
ret = -1;
goto out_close; goto out_close;
} }
if ((req.datalen > LXC_CMD_DATA_MAX) && (req.cmd != LXC_CMD_CONSOLE_LOG)) { if ((req.datalen > LXC_CMD_DATA_MAX) && (req.cmd != LXC_CMD_CONSOLE_LOG)) {
ERROR("Received command data length %d is too large for command \"%s\"", req.datalen, lxc_cmd_str(req.cmd)); ERROR("Received command data length %d is too large for command \"%s\"", req.datalen, lxc_cmd_str(req.cmd));
errno = EFBIG;
ret = -EFBIG;
goto out_close; goto out_close;
} }
@ -1480,7 +1599,6 @@ static int lxc_cmd_handler(int fd, uint32_t events, void *data,
ret = lxc_recv_nointr(fd, reqdata, req.datalen, 0); ret = lxc_recv_nointr(fd, reqdata, req.datalen, 0);
if (ret != req.datalen) { if (ret != req.datalen) {
WARN("Failed to receive full command request. Ignoring request for \"%s\"", lxc_cmd_str(req.cmd)); WARN("Failed to receive full command request. Ignoring request for \"%s\"", lxc_cmd_str(req.cmd));
ret = LXC_MAINLOOP_ERROR;
goto out_close; goto out_close;
} }
@ -1490,12 +1608,11 @@ static int lxc_cmd_handler(int fd, uint32_t events, void *data,
ret = lxc_cmd_process(fd, &req, handler, descr); ret = lxc_cmd_process(fd, &req, handler, descr);
if (ret) { if (ret) {
/* This is not an error, but only a request to close fd. */ /* This is not an error, but only a request to close fd. */
ret = LXC_MAINLOOP_CONTINUE;
goto out_close; goto out_close;
} }
out: out:
return ret; return LXC_MAINLOOP_CONTINUE;
out_close: out_close:
lxc_cmd_fd_cleanup(fd, handler, descr, req.cmd); lxc_cmd_fd_cleanup(fd, handler, descr, req.cmd);
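The validate_string_request() helper added in the hunks above accepts a request payload only if it is exactly one C string: the final byte is NUL and no NUL appears earlier. A minimal standalone sketch of that check follows. It uses memchr() instead of strnlen(), which should be an equivalent formulation in standard C. The function name is_single_c_string is hypothetical and does not appear in the diff.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if the datalen bytes at data form exactly one C string,
 * meaning the first NUL byte in the buffer is also its last byte.
 */
static bool is_single_c_string(const char *data, size_t datalen)
{
	size_t maxlen;

	/* The helper in the diff assumes non-empty data; check it here. */
	if (!data || datalen == 0)
		return false;

	maxlen = datalen - 1;

	/* memchr() yields the first NUL within datalen bytes, or NULL. */
	return memchr(data, '\0', datalen) == data + maxlen;
}
```

A payload such as "cpu" sent with datalen 4 passes, while the same bytes with datalen 3 (no terminator) or a payload with an embedded NUL are rejected before the string reaches the cgroup lookup.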


@ -38,6 +38,8 @@ typedef enum {
LXC_CMD_UNFREEZE, LXC_CMD_UNFREEZE,
LXC_CMD_GET_CGROUP2_FD, LXC_CMD_GET_CGROUP2_FD,
LXC_CMD_GET_INIT_PIDFD, LXC_CMD_GET_INIT_PIDFD,
LXC_CMD_GET_LIMITING_CGROUP,
LXC_CMD_GET_LIMITING_CGROUP2_FD,
LXC_CMD_MAX, LXC_CMD_MAX,
} lxc_cmd_t; } lxc_cmd_t;
@ -59,7 +61,7 @@ struct lxc_cmd_rr {
}; };
struct lxc_cmd_console_rsp_data { struct lxc_cmd_console_rsp_data {
int masterfd; int ptmxfd;
int ttynum; int ttynum;
}; };
@ -129,5 +131,9 @@ extern int lxc_cmd_add_bpf_device_cgroup(const char *name, const char *lxcpath,
extern int lxc_cmd_freeze(const char *name, const char *lxcpath, int timeout); extern int lxc_cmd_freeze(const char *name, const char *lxcpath, int timeout);
extern int lxc_cmd_unfreeze(const char *name, const char *lxcpath, int timeout); extern int lxc_cmd_unfreeze(const char *name, const char *lxcpath, int timeout);
extern int lxc_cmd_get_cgroup2_fd(const char *name, const char *lxcpath); extern int lxc_cmd_get_cgroup2_fd(const char *name, const char *lxcpath);
extern char *lxc_cmd_get_limiting_cgroup_path(const char *name,
const char *lxcpath,
const char *subsystem);
extern int lxc_cmd_get_limiting_cgroup2_fd(const char *name, const char *lxcpath);
#endif /* __commands_h */ #endif /* __commands_h */


@ -62,11 +62,14 @@ int lxc_cmd_sock_get_state(const char *name, const char *lxcpath,
ret = lxc_cmd_add_state_client(name, lxcpath, states, &state_client_fd); ret = lxc_cmd_add_state_client(name, lxcpath, states, &state_client_fd);
if (ret < 0) if (ret < 0)
return -1; return ret_errno(EINVAL);
if (ret < MAX_STATE) if (ret < MAX_STATE)
return ret; return ret;
if (state_client_fd < 0)
return ret_errno(EBADF);
return lxc_cmd_sock_rcv_state(state_client_fd, timeout); return lxc_cmd_sock_rcv_state(state_client_fd, timeout);
} }


@ -57,4 +57,22 @@
#define __cgfsng_ops #define __cgfsng_ops
/* access attribute */
#define __access_r(x, y)
#define __access_w(x, y)
#define __access_rw(x, y)
#ifdef __has_attribute
#if __has_attribute(access)
#undef __access_r
#define __access_r(x, y) __attribute__((access(read_only, x, y)))
#undef __access_w
#define __access_w(x, y) __attribute__((access(write_only, x, y)))
#undef __access_rw
#define __access_rw(x, y) __attribute__((access(read_write, x, y)))
#endif
#endif
#endif /* __LXC_COMPILER_H */ #endif /* __LXC_COMPILER_H */
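The access-attribute macros added above expand to nothing unless the compiler reports support for `access`, so annotated prototypes still build with older toolchains. A minimal sketch of how such macros can be attached to prototypes follows. The macro names access_r and access_w and both functions are hypothetical illustrations, chosen to avoid clashing with the identifiers in the diff. The two numbers name the pointer argument and the argument holding its size.

```c
#include <stddef.h>

/* Fallback: expand to nothing when the attribute is unavailable. */
#define access_r(x, y)
#define access_w(x, y)

#ifdef __has_attribute
#if __has_attribute(access)
#undef access_r
#define access_r(x, y) __attribute__((access(read_only, x, y)))
#undef access_w
#define access_w(x, y) __attribute__((access(write_only, x, y)))
#endif
#endif

/* Only reads buf (argument 1); its size is given by len (argument 2). */
size_t count_zero_bytes(const unsigned char *buf, size_t len) access_r(1, 2);

/* Only writes buf (argument 1); its size is given by len (argument 2). */
void fill_bytes(unsigned char *buf, size_t len, unsigned char value)
	access_w(1, 2);

size_t count_zero_bytes(const unsigned char *buf, size_t len)
{
	size_t zeros = 0;

	for (size_t i = 0; i < len; i++)
		if (buf[i] == 0)
			zeros++;
	return zeros;
}

void fill_bytes(unsigned char *buf, size_t len, unsigned char value)
{
	for (size_t i = 0; i < len; i++)
		buf[i] = value;
}
```

On compilers that implement the attribute, this gives the compiler enough information to warn when a caller passes a size larger than the buffer it can see; elsewhere the annotations vanish and the code compiles unchanged.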

File diff suppressed because it is too large


@ -60,6 +60,9 @@ struct lxc_cgroup {
struct /* meta */ { struct /* meta */ {
char *controllers; char *controllers;
char *dir; char *dir;
char *monitor_dir;
char *container_dir;
char *namespace_dir;
bool relative; bool relative;
}; };
}; };
@ -401,7 +404,8 @@ struct lxc_conf {
}; };
extern int write_id_mapping(enum idtype idtype, pid_t pid, const char *buf, extern int write_id_mapping(enum idtype idtype, pid_t pid, const char *buf,
size_t buf_size); size_t buf_size)
__access_r(3, 4);
#ifdef HAVE_TLS #ifdef HAVE_TLS
extern thread_local struct lxc_conf *current_config; extern thread_local struct lxc_conf *current_config;
@ -436,19 +440,18 @@ extern int lxc_setup_rootfs_prepare_root(struct lxc_conf *conf,
extern int lxc_setup(struct lxc_handler *handler); extern int lxc_setup(struct lxc_handler *handler);
extern int lxc_setup_parent(struct lxc_handler *handler); extern int lxc_setup_parent(struct lxc_handler *handler);
extern int setup_resource_limits(struct lxc_list *limits, pid_t pid); extern int setup_resource_limits(struct lxc_list *limits, pid_t pid);
extern int find_unmapped_nsid(struct lxc_conf *conf, enum idtype idtype); extern int find_unmapped_nsid(const struct lxc_conf *conf, enum idtype idtype);
extern int mapped_hostid(unsigned id, const struct lxc_conf *conf, extern int mapped_hostid(unsigned id, const struct lxc_conf *conf,
enum idtype idtype); enum idtype idtype);
extern int chown_mapped_root(const char *path, const struct lxc_conf *conf); extern int userns_exec_1(const struct lxc_conf *conf, int (*fn)(void *),
extern int userns_exec_1(struct lxc_conf *conf, int (*fn)(void *), void *data, void *data, const char *fn_name);
const char *fn_name);
extern int userns_exec_full(struct lxc_conf *conf, int (*fn)(void *), extern int userns_exec_full(struct lxc_conf *conf, int (*fn)(void *),
void *data, const char *fn_name); void *data, const char *fn_name);
extern int parse_mntopts(const char *mntopts, unsigned long *mntflags, extern int parse_mntopts(const char *mntopts, unsigned long *mntflags,
char **mntdata); char **mntdata);
extern int parse_propagationopts(const char *mntopts, unsigned long *pflags); extern int parse_propagationopts(const char *mntopts, unsigned long *pflags);
extern void tmp_proc_unmount(struct lxc_conf *lxc_conf); extern void tmp_proc_unmount(struct lxc_conf *lxc_conf);
extern void remount_all_slave(void); extern void turn_into_dependent_mounts(void);
extern void suggest_default_idmap(void); extern void suggest_default_idmap(void);
extern FILE *make_anonymous_mount_file(struct lxc_list *mount, extern FILE *make_anonymous_mount_file(struct lxc_list *mount,
bool include_nesting_helpers); bool include_nesting_helpers);
@ -467,5 +470,14 @@ extern int setup_proc_filesystem(struct lxc_list *procs, pid_t pid);
extern int lxc_clear_procs(struct lxc_conf *c, const char *key); extern int lxc_clear_procs(struct lxc_conf *c, const char *key);
extern int lxc_clear_apparmor_raw(struct lxc_conf *c); extern int lxc_clear_apparmor_raw(struct lxc_conf *c);
extern int lxc_clear_namespace(struct lxc_conf *c); extern int lxc_clear_namespace(struct lxc_conf *c);
extern int userns_exec_minimal(const struct lxc_conf *conf,
int (*fn_parent)(void *), void *fn_parent_data,
int (*fn_child)(void *), void *fn_child_data);
extern int userns_exec_mapped_root(const char *path, int path_fd,
const struct lxc_conf *conf);
static inline int chown_mapped_root(const char *path, const struct lxc_conf *conf)
{
return userns_exec_mapped_root(path, -EBADF, conf);
}
#endif /* __LXC_CONF_H */ #endif /* __LXC_CONF_H */


@ -300,13 +300,17 @@ static int set_config_net_type(const char *key, const char *value,
netdev->type = LXC_NET_VETH; netdev->type = LXC_NET_VETH;
lxc_list_init(&netdev->priv.veth_attr.ipv4_routes); lxc_list_init(&netdev->priv.veth_attr.ipv4_routes);
lxc_list_init(&netdev->priv.veth_attr.ipv6_routes); lxc_list_init(&netdev->priv.veth_attr.ipv6_routes);
if (!lxc_veth_flag_to_mode(netdev->priv.veth_attr.mode))
lxc_veth_mode_to_flag(&netdev->priv.veth_attr.mode, "bridge"); lxc_veth_mode_to_flag(&netdev->priv.veth_attr.mode, "bridge");
} else if (strcmp(value, "macvlan") == 0) { } else if (strcmp(value, "macvlan") == 0) {
netdev->type = LXC_NET_MACVLAN; netdev->type = LXC_NET_MACVLAN;
if (!lxc_macvlan_flag_to_mode(netdev->priv.veth_attr.mode))
lxc_macvlan_mode_to_flag(&netdev->priv.macvlan_attr.mode, "private"); lxc_macvlan_mode_to_flag(&netdev->priv.macvlan_attr.mode, "private");
} else if (strcmp(value, "ipvlan") == 0) { } else if (strcmp(value, "ipvlan") == 0) {
netdev->type = LXC_NET_IPVLAN; netdev->type = LXC_NET_IPVLAN;
if (!lxc_ipvlan_flag_to_mode(netdev->priv.ipvlan_attr.mode))
lxc_ipvlan_mode_to_flag(&netdev->priv.ipvlan_attr.mode, "l3"); lxc_ipvlan_mode_to_flag(&netdev->priv.ipvlan_attr.mode, "l3");
if (!lxc_ipvlan_flag_to_isolation(netdev->priv.ipvlan_attr.isolation))
lxc_ipvlan_isolation_to_flag(&netdev->priv.ipvlan_attr.isolation, "bridge"); lxc_ipvlan_isolation_to_flag(&netdev->priv.ipvlan_attr.isolation, "bridge");
} else if (strcmp(value, "vlan") == 0) { } else if (strcmp(value, "vlan") == 0) {
netdev->type = LXC_NET_VLAN; netdev->type = LXC_NET_VLAN;
@ -2572,9 +2576,9 @@ static int set_config_rootfs_mount(const char *key, const char *value,
static int set_config_rootfs_options(const char *key, const char *value, static int set_config_rootfs_options(const char *key, const char *value,
struct lxc_conf *lxc_conf, void *data) struct lxc_conf *lxc_conf, void *data)
{ {
int ret;
unsigned long mflags = 0, pflags = 0; unsigned long mflags = 0, pflags = 0;
char *mdata = NULL, *opts = NULL; char *mdata = NULL, *opts = NULL;
int ret;
struct lxc_rootfs *rootfs = &lxc_conf->rootfs; struct lxc_rootfs *rootfs = &lxc_conf->rootfs;
ret = parse_mntopts(value, &mflags, &mdata); ret = parse_mntopts(value, &mflags, &mdata);


@ -9,6 +9,8 @@
#include <lxc/attach_options.h> #include <lxc/attach_options.h>
#include <lxc/lxccontainer.h> #include <lxc/lxccontainer.h>
#include "compiler.h"
struct lxc_conf; struct lxc_conf;
struct lxc_list; struct lxc_list;
@ -46,21 +48,24 @@ struct new_config_item {
extern struct lxc_config_t *lxc_get_config(const char *key); extern struct lxc_config_t *lxc_get_config(const char *key);
/* List all available config items. */ /* List all available config items. */
extern int lxc_list_config_items(char *retv, int inlen); extern int lxc_list_config_items(char *retv, int inlen)
__access_rw(1, 2);
/* Given a configuration key namespace (e.g. lxc.apparmor) list all associated /* Given a configuration key namespace (e.g. lxc.apparmor) list all associated
* subkeys for that namespace. * subkeys for that namespace.
* Must be implemented when adding a new configuration key. * Must be implemented when adding a new configuration key.
*/ */
extern int lxc_list_subkeys(struct lxc_conf *conf, const char *key, char *retv, extern int lxc_list_subkeys(struct lxc_conf *conf, const char *key, char *retv,
int inlen); int inlen)
__access_rw(3, 4);
/* List all configuration items associated with a given network. For example /* List all configuration items associated with a given network. For example
* pass "lxc.net.[i]" to retrieve all configuration items associated with * pass "lxc.net.[i]" to retrieve all configuration items associated with
* the network associated with index [i]. * the network associated with index [i].
*/ */
extern int lxc_list_net(struct lxc_conf *c, const char *key, char *retv, extern int lxc_list_net(struct lxc_conf *c, const char *key, char *retv,
int inlen); int inlen)
__access_rw(3, 4);
extern int lxc_config_read(const char *file, struct lxc_conf *conf, extern int lxc_config_read(const char *file, struct lxc_conf *conf,
bool from_include); bool from_include);
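
The `__access_rw(...)` annotations added to these declarations are defined in `compiler.h`, which is not part of this hunk. The following is a minimal sketch of how such macros could be built, assuming they wrap GCC's `access` function attribute (introduced in GCC 10) and expand to nothing on compilers that lack it. All `demo_` names are hypothetical and are not taken from the LXC sources.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: compilers without __has_attribute simply get empty macros. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

#if __has_attribute(access)
/* ptr = 1-based index of the pointer argument, size = index of its length. */
#define demo_access_r(ptr, size)  __attribute__((access(read_only, ptr, size)))
#define demo_access_w(ptr, size)  __attribute__((access(write_only, ptr, size)))
#define demo_access_rw(ptr, size) __attribute__((access(read_write, ptr, size)))
#else
#define demo_access_r(ptr, size)
#define demo_access_w(ptr, size)
#define demo_access_rw(ptr, size)
#endif

/* Hypothetical function: argument 1 is a buffer of 'len' bytes (argument 2)
 * that the function only writes to. */
static int demo_fill(char *buf, size_t len, char c) demo_access_w(1, 2);

static int demo_fill(char *buf, size_t len, char c)
{
	if (len == 0)
		return -1;

	/* Fill all but the last byte, then terminate the string. */
	memset(buf, c, len - 1);
	buf[len - 1] = '\0';
	return 0;
}
```

With the attribute in place, a compiler that supports it can diagnose calls where the size argument exceeds the buffer it knows about; on other compilers the declaration is unchanged.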


@ -506,6 +506,18 @@ int lxc_veth_mode_to_flag(int *mode, const char *value)
return ret_set_errno(-1, EINVAL); return ret_set_errno(-1, EINVAL);
} }
char *lxc_veth_flag_to_mode(int mode)
{
for (size_t i = 0; i < sizeof(veth_mode) / sizeof(veth_mode[0]); i++) {
if (veth_mode[i].mode != mode)
continue;
return veth_mode[i].name;
}
return NULL;
}
static struct lxc_macvlan_mode { static struct lxc_macvlan_mode {
char *name; char *name;
int mode; int mode;
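
The new `lxc_veth_flag_to_mode()` is the reverse of `lxc_veth_mode_to_flag()`: both walk the same static table, one matching on the name and the other on the flag. A self-contained sketch of that pairing follows. The table contents and all `demo_` names are illustrative assumptions, not the actual `veth_mode` table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mode table; names and values are illustrative only. */
static const struct demo_mode {
	const char *name;
	int mode;
} demo_modes[] = {
	{ "bridge", 1 },
	{ "router", 2 },
};

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Forward lookup: name -> flag. Returns 0 on success, -1 if unknown. */
static int demo_mode_to_flag(int *mode, const char *value)
{
	for (size_t i = 0; i < DEMO_ARRAY_SIZE(demo_modes); i++) {
		if (strcmp(demo_modes[i].name, value) != 0)
			continue;

		*mode = demo_modes[i].mode;
		return 0;
	}

	return -1;
}

/* Reverse lookup: flag -> name. Returns NULL if the flag is unknown. */
static const char *demo_flag_to_mode(int mode)
{
	for (size_t i = 0; i < DEMO_ARRAY_SIZE(demo_modes); i++) {
		if (demo_modes[i].mode == mode)
			return demo_modes[i].name;
	}

	return NULL;
}
```

Keeping both directions on one table means a new mode only has to be added in a single place.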


@ -5,6 +5,7 @@
#include <stdbool.h> #include <stdbool.h>
#include "compiler.h"
#include "conf.h" #include "conf.h"
#include "confile_utils.h" #include "confile_utils.h"
@ -40,6 +41,7 @@ extern void lxc_log_configured_netdevs(const struct lxc_conf *conf);
extern bool lxc_remove_nic_by_idx(struct lxc_conf *conf, unsigned int idx); extern bool lxc_remove_nic_by_idx(struct lxc_conf *conf, unsigned int idx);
extern void lxc_free_networks(struct lxc_list *networks); extern void lxc_free_networks(struct lxc_list *networks);
extern int lxc_veth_mode_to_flag(int *mode, const char *value); extern int lxc_veth_mode_to_flag(int *mode, const char *value);
extern char *lxc_veth_flag_to_mode(int mode);
extern int lxc_macvlan_mode_to_flag(int *mode, const char *value); extern int lxc_macvlan_mode_to_flag(int *mode, const char *value);
extern char *lxc_macvlan_flag_to_mode(int mode); extern char *lxc_macvlan_flag_to_mode(int mode);
extern int lxc_ipvlan_mode_to_flag(int *mode, const char *value); extern int lxc_ipvlan_mode_to_flag(int *mode, const char *value);
@ -49,12 +51,16 @@ extern char *lxc_ipvlan_flag_to_isolation(int mode);
extern int set_config_string_item(char **conf_item, const char *value); extern int set_config_string_item(char **conf_item, const char *value);
extern int set_config_string_item_max(char **conf_item, const char *value, extern int set_config_string_item_max(char **conf_item, const char *value,
size_t max); size_t max)
__access_r(2, 3);
extern int set_config_path_item(char **conf_item, const char *value); extern int set_config_path_item(char **conf_item, const char *value);
extern int set_config_bool_item(bool *conf_item, const char *value, extern int set_config_bool_item(bool *conf_item, const char *value,
bool empty_conf_action); bool empty_conf_action);
extern int config_ip_prefix(struct in_addr *addr); extern int config_ip_prefix(struct in_addr *addr);
extern int network_ifname(char *valuep, const char *value, size_t size); extern int network_ifname(char *valuep, const char *value, size_t size)
__access_r(2, 3);
extern void rand_complete_hwaddr(char *hwaddr); extern void rand_complete_hwaddr(char *hwaddr);
extern bool lxc_config_net_is_hwaddr(const char *line); extern bool lxc_config_net_is_hwaddr(const char *line);
extern bool new_hwaddr(char *hwaddr); extern bool new_hwaddr(char *hwaddr);


@ -303,7 +303,7 @@ static void exec_criu(struct cgroup_ops *cgroup_ops, struct lxc_conf *conf,
* the handler the restore task created. * the handler the restore task created.
*/ */
if (!strcmp(opts->action, "dump") || !strcmp(opts->action, "pre-dump")) { if (!strcmp(opts->action, "dump") || !strcmp(opts->action, "pre-dump")) {
path = lxc_cmd_get_cgroup_path(opts->c->name, opts->c->config_path, controllers[0]); path = lxc_cmd_get_limiting_cgroup_path(opts->c->name, opts->c->config_path, controllers[0]);
if (!path) { if (!path) {
ERROR("failed to get cgroup path for %s", controllers[0]); ERROR("failed to get cgroup path for %s", controllers[0]);
goto err; goto err;
@ -311,7 +311,7 @@ static void exec_criu(struct cgroup_ops *cgroup_ops, struct lxc_conf *conf,
} else { } else {
const char *p; const char *p;
p = cgroup_ops->get_cgroup(cgroup_ops, controllers[0]); p = cgroup_ops->get_limiting_cgroup(cgroup_ops, controllers[0]);
if (!p) { if (!p) {
ERROR("failed to get cgroup path for %s", controllers[0]); ERROR("failed to get cgroup path for %s", controllers[0]);
goto err; goto err;
@ -367,9 +367,9 @@ static void exec_criu(struct cgroup_ops *cgroup_ops, struct lxc_conf *conf,
goto err; goto err;
while (getmntent_r(mnts, &mntent, buf, sizeof(buf))) { while (getmntent_r(mnts, &mntent, buf, sizeof(buf))) {
char *mntdata; unsigned long flags = 0;
char *mntdata = NULL;
char arg[2 * PATH_MAX + 2]; char arg[2 * PATH_MAX + 2];
unsigned long flags;
if (parse_mntopts(mntent.mnt_opts, &flags, &mntdata) < 0) if (parse_mntopts(mntent.mnt_opts, &flags, &mntdata) < 0)
goto err; goto err;
@ -406,7 +406,7 @@ static void exec_criu(struct cgroup_ops *cgroup_ops, struct lxc_conf *conf,
DECLARE_ARG("-t"); DECLARE_ARG("-t");
DECLARE_ARG(pid); DECLARE_ARG(pid);
freezer_relative = lxc_cmd_get_cgroup_path(opts->c->name, freezer_relative = lxc_cmd_get_limiting_cgroup_path(opts->c->name,
opts->c->config_path, opts->c->config_path,
"freezer"); "freezer");
if (!freezer_relative) { if (!freezer_relative) {
@ -942,7 +942,7 @@ static void do_restore(struct lxc_container *c, int status_pipe, struct migrate_
close(fd); close(fd);
} }
handler = lxc_init_handler(c->name, c->lxc_conf, c->config_path, false); handler = lxc_init_handler(NULL, c->name, c->lxc_conf, c->config_path, false);
if (!handler) if (!handler)
goto out; goto out;
@ -1011,7 +1011,7 @@ static void do_restore(struct lxc_container *c, int status_pipe, struct migrate_
} }
if (mount(rootfs->path, rootfs->mount, NULL, MS_BIND, NULL) < 0) { if (mount(rootfs->path, rootfs->mount, NULL, MS_BIND, NULL) < 0) {
rmdir(rootfs->mount); (void)rmdir(rootfs->mount);
goto out_fini_handler; goto out_fini_handler;
} }
} }
@ -1020,7 +1020,7 @@ static void do_restore(struct lxc_container *c, int status_pipe, struct migrate_
os.action = "restore"; os.action = "restore";
os.user = opts; os.user = opts;
os.c = c; os.c = c;
os.console_fd = c->lxc_conf->console.slave; os.console_fd = c->lxc_conf->console.pts;
os.criu_version = criu_version; os.criu_version = criu_version;
os.handler = handler; os.handler = handler;
@ -1046,7 +1046,7 @@ static void do_restore(struct lxc_container *c, int status_pipe, struct migrate_
/* exec_criu() returning is an error */ /* exec_criu() returning is an error */
exec_criu(cgroup_ops, c->lxc_conf, &os); exec_criu(cgroup_ops, c->lxc_conf, &os);
umount(rootfs->mount); umount(rootfs->mount);
rmdir(rootfs->mount); (void)rmdir(rootfs->mount);
goto out_fini_handler; goto out_fini_handler;
} else { } else {
char title[2048]; char title[2048];
@ -1323,7 +1323,7 @@ static bool do_dump(struct lxc_container *c, char *mode, struct migrate_opts *op
fail: fail:
close(criuout[0]); close(criuout[0]);
close(criuout[1]); close(criuout[1]);
rmdir(opts->directory); (void)rmdir(opts->directory);
free(criu_version); free(criu_version);
return false; return false;
} }


@ -14,7 +14,7 @@
#include "config.h" #include "config.h"
#include "log.h" #include "log.h"
#include "start.h" #include "start.h"
#include "raw_syscalls.h" #include "process_utils.h"
#include "utils.h" #include "utils.h"
lxc_log_define(execute, start); lxc_log_define(execute, start);
@ -66,7 +66,7 @@ static int execute_start(struct lxc_handler *handler, void* data)
NOTICE("Exec'ing \"%s\"", my_args->argv[0]); NOTICE("Exec'ing \"%s\"", my_args->argv[0]);
if (my_args->init_fd >= 0) if (my_args->init_fd >= 0)
lxc_raw_execveat(my_args->init_fd, "", argv, environ, AT_EMPTY_PATH); execveat(my_args->init_fd, "", argv, environ, AT_EMPTY_PATH);
else else
execvp(argv[0], argv); execvp(argv[0], argv);
SYSERROR("Failed to exec %s", argv[0]); SYSERROR("Failed to exec %s", argv[0]);


@ -12,27 +12,52 @@
#include <sys/vfs.h> #include <sys/vfs.h>
#include <unistd.h> #include <unistd.h>
#include "compiler.h"
/* read and write whole files */ /* read and write whole files */
extern int lxc_write_to_file(const char *filename, const void *buf, extern int lxc_write_to_file(const char *filename, const void *buf,
size_t count, bool add_newline, mode_t mode); size_t count, bool add_newline, mode_t mode)
extern int lxc_readat(int dirfd, const char *filename, void *buf, size_t count); __access_r(2, 3);
extern int lxc_readat(int dirfd, const char *filename, void *buf, size_t count)
__access_w(3, 4);
extern int lxc_writeat(int dirfd, const char *filename, const void *buf, extern int lxc_writeat(int dirfd, const char *filename, const void *buf,
size_t count); size_t count)
__access_r(3, 4);
extern int lxc_write_openat(const char *dir, const char *filename, extern int lxc_write_openat(const char *dir, const char *filename,
const void *buf, size_t count); const void *buf, size_t count)
extern int lxc_read_from_file(const char *filename, void *buf, size_t count); __access_r(3, 4);
extern int lxc_read_from_file(const char *filename, void *buf, size_t count)
__access_w(2, 3);
/* send and receive buffers completely */ /* send and receive buffers completely */
extern ssize_t lxc_write_nointr(int fd, const void *buf, size_t count); extern ssize_t lxc_write_nointr(int fd, const void *buf, size_t count)
__access_r(2, 3);
extern ssize_t lxc_pwrite_nointr(int fd, const void *buf, size_t count, extern ssize_t lxc_pwrite_nointr(int fd, const void *buf, size_t count,
off_t offset); off_t offset)
extern ssize_t lxc_send_nointr(int sockfd, void *buf, size_t len, int flags); __access_r(2, 3);
extern ssize_t lxc_read_nointr(int fd, void *buf, size_t count);
extern ssize_t lxc_send_nointr(int sockfd, void *buf, size_t len, int flags)
__access_r(2, 3);
extern ssize_t lxc_read_nointr(int fd, void *buf, size_t count)
__access_w(2, 3);
extern ssize_t lxc_read_nointr_expect(int fd, void *buf, size_t count, extern ssize_t lxc_read_nointr_expect(int fd, void *buf, size_t count,
const void *expected_buf); const void *expected_buf)
__access_w(2, 3);
extern ssize_t lxc_read_file_expect(const char *path, void *buf, size_t count, extern ssize_t lxc_read_file_expect(const char *path, void *buf, size_t count,
const void *expected_buf); const void *expected_buf)
extern ssize_t lxc_recv_nointr(int sockfd, void *buf, size_t len, int flags); __access_w(2, 3);
extern ssize_t lxc_recv_nointr(int sockfd, void *buf, size_t len, int flags)
__access_w(2, 3);
ssize_t lxc_recvmsg_nointr_iov(int sockfd, struct iovec *iov, size_t iovlen, ssize_t lxc_recvmsg_nointr_iov(int sockfd, struct iovec *iov, size_t iovlen,
int flags); int flags);


@ -44,7 +44,7 @@
#define LXC_LOG_TIME_SIZE ((INTTYPE_TO_STRLEN(uint64_t)) * 2) #define LXC_LOG_TIME_SIZE ((INTTYPE_TO_STRLEN(uint64_t)) * 2)
int lxc_log_fd = -EBADF; int lxc_log_fd = -EBADF;
static int syslog_enable = 0; static bool wants_syslog = false;
int lxc_quiet_specified; int lxc_quiet_specified;
int lxc_log_use_global_fd; int lxc_log_use_global_fd;
static int lxc_loglevel_specified; static int lxc_loglevel_specified;
@ -128,7 +128,7 @@ static int log_append_syslog(const struct lxc_log_appender *appender,
__do_free char *msg = NULL; __do_free char *msg = NULL;
const char *log_container_name; const char *log_container_name;
if (!syslog_enable) if (!wants_syslog)
return 0; return 0;
log_container_name = lxc_log_get_container_name(); log_container_name = lxc_log_get_container_name();
@ -485,10 +485,9 @@ static int build_dir(const char *name)
*p = '\0'; *p = '\0';
ret = lxc_unpriv(mkdir(n, 0755)); ret = lxc_unpriv(mkdir(n, 0755));
*p = '/';
if (ret && errno != EEXIST) if (ret && errno != EEXIST)
return log_error_errno(-errno, errno, "Failed to create directory \"%s\"", n); return log_error_errno(-errno, errno, "Failed to create directory \"%s\"", n);
*p = '/';
} }
return 0; return 0;
@ -739,9 +738,14 @@ int lxc_log_syslog(int facility)
return 0; return 0;
} }
inline void lxc_log_enable_syslog(void) void lxc_log_syslog_enable(void)
{ {
syslog_enable = 1; wants_syslog = true;
}
void lxc_log_syslog_disable(void)
{
wants_syslog = false;
} }
/* /*


@ -3,6 +3,9 @@
#ifndef __LXC_LOG_H #ifndef __LXC_LOG_H
#define __LXC_LOG_H #define __LXC_LOG_H
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif
#include <errno.h> #include <errno.h>
#include <stdarg.h> #include <stdarg.h>
#include <stdio.h> #include <stdio.h>
@ -14,6 +17,7 @@
#include <time.h> #include <time.h>
#include "conf.h" #include "conf.h"
#include "config.h"
#ifndef O_CLOEXEC #ifndef O_CLOEXEC
#define O_CLOEXEC 02000000 #define O_CLOEXEC 02000000
@ -388,7 +392,7 @@ __lxc_unused static inline void LXC_##LEVEL(struct lxc_log_locinfo* locinfo, \
LXC_FATAL(&locinfo, format, ##__VA_ARGS__); \ LXC_FATAL(&locinfo, format, ##__VA_ARGS__); \
} while (0) } while (0)
#if HAVE_M_FORMAT #if HAVE_M_FORMAT && !ENABLE_COVERITY_BUILD
#define SYSTRACE(format, ...) \ #define SYSTRACE(format, ...) \
TRACE("%m - " format, ##__VA_ARGS__) TRACE("%m - " format, ##__VA_ARGS__)
#else #else
@ -399,7 +403,7 @@ __lxc_unused static inline void LXC_##LEVEL(struct lxc_log_locinfo* locinfo, \
} while (0) } while (0)
#endif #endif
#if HAVE_M_FORMAT #if HAVE_M_FORMAT && !ENABLE_COVERITY_BUILD
#define SYSDEBUG(format, ...) \ #define SYSDEBUG(format, ...) \
DEBUG("%m - " format, ##__VA_ARGS__) DEBUG("%m - " format, ##__VA_ARGS__)
#else #else
@ -411,7 +415,7 @@ __lxc_unused static inline void LXC_##LEVEL(struct lxc_log_locinfo* locinfo, \
#endif #endif
#if HAVE_M_FORMAT #if HAVE_M_FORMAT && !ENABLE_COVERITY_BUILD
#define SYSINFO(format, ...) \ #define SYSINFO(format, ...) \
INFO("%m - " format, ##__VA_ARGS__) INFO("%m - " format, ##__VA_ARGS__)
#else #else
@ -422,7 +426,7 @@ __lxc_unused static inline void LXC_##LEVEL(struct lxc_log_locinfo* locinfo, \
} while (0) } while (0)
#endif #endif
#if HAVE_M_FORMAT #if HAVE_M_FORMAT && !ENABLE_COVERITY_BUILD
#define SYSNOTICE(format, ...) \ #define SYSNOTICE(format, ...) \
NOTICE("%m - " format, ##__VA_ARGS__) NOTICE("%m - " format, ##__VA_ARGS__)
#else #else
@ -433,7 +437,7 @@ __lxc_unused static inline void LXC_##LEVEL(struct lxc_log_locinfo* locinfo, \
} while (0) } while (0)
#endif #endif
#if HAVE_M_FORMAT #if HAVE_M_FORMAT && !ENABLE_COVERITY_BUILD
#define SYSWARN(format, ...) \ #define SYSWARN(format, ...) \
WARN("%m - " format, ##__VA_ARGS__) WARN("%m - " format, ##__VA_ARGS__)
#else #else
@ -444,7 +448,7 @@ __lxc_unused static inline void LXC_##LEVEL(struct lxc_log_locinfo* locinfo, \
} while (0) } while (0)
#endif #endif
#if HAVE_M_FORMAT #if HAVE_M_FORMAT && !ENABLE_COVERITY_BUILD
#define SYSERROR(format, ...) \ #define SYSERROR(format, ...) \
ERROR("%m - " format, ##__VA_ARGS__) ERROR("%m - " format, ##__VA_ARGS__)
#else #else
@ -455,7 +459,7 @@ __lxc_unused static inline void LXC_##LEVEL(struct lxc_log_locinfo* locinfo, \
} while (0) } while (0)
#endif #endif
#if HAVE_M_FORMAT #if HAVE_M_FORMAT && !ENABLE_COVERITY_BUILD
#define CMD_SYSERROR(format, ...) \ #define CMD_SYSERROR(format, ...) \
fprintf(stderr, "%s: %d: %s - %m - " format "\n", __FILE__, __LINE__, \ fprintf(stderr, "%s: %d: %s - %m - " format "\n", __FILE__, __LINE__, \
__func__, ##__VA_ARGS__); __func__, ##__VA_ARGS__);
@ -468,7 +472,7 @@ __lxc_unused static inline void LXC_##LEVEL(struct lxc_log_locinfo* locinfo, \
} while (0) } while (0)
#endif #endif
#if HAVE_M_FORMAT #if HAVE_M_FORMAT && !ENABLE_COVERITY_BUILD
#define CMD_SYSINFO(format, ...) \ #define CMD_SYSINFO(format, ...) \
printf("%s: %d: %s - %m - " format "\n", __FILE__, __LINE__, __func__, \ printf("%s: %d: %s - %m - " format "\n", __FILE__, __LINE__, __func__, \
##__VA_ARGS__); ##__VA_ARGS__);
@ -559,7 +563,8 @@ __lxc_unused static inline void LXC_##LEVEL(struct lxc_log_locinfo* locinfo, \
extern int lxc_log_fd; extern int lxc_log_fd;
extern int lxc_log_syslog(int facility); extern int lxc_log_syslog(int facility);
extern void lxc_log_enable_syslog(void); extern void lxc_log_syslog_enable(void);
extern void lxc_log_syslog_disable(void);
extern int lxc_log_set_level(int *dest, int level); extern int lxc_log_set_level(int *dest, int level);
extern int lxc_log_get_level(void); extern int lxc_log_get_level(void);
extern bool lxc_log_has_valid_level(void); extern bool lxc_log_has_valid_level(void);


@ -19,7 +19,7 @@
#include "log.h" #include "log.h"
#include "lsm.h" #include "lsm.h"
#include "parse.h" #include "parse.h"
#include "raw_syscalls.h" #include "process_utils.h"
#include "utils.h" #include "utils.h"
lxc_log_define(apparmor, lsm); lxc_log_define(apparmor, lsm);
@ -121,8 +121,8 @@ static const char AA_PROFILE_BASE[] =
" # deny reads from debugfs\n" " # deny reads from debugfs\n"
" deny /sys/kernel/debug/{,**} rwklx,\n" " deny /sys/kernel/debug/{,**} rwklx,\n"
"\n" "\n"
" # allow paths to be made slave, shared, private or unbindable\n" " # allow paths to be made dependent, shared, private or unbindable\n"
" # FIXME: This currently doesn't work due to the apparmor parser treating those as allowing all mounts.\n" " # TODO: This currently doesn't work due to the apparmor parser treating those as allowing all mounts.\n"
"# mount options=(rw,make-slave) -> **,\n" "# mount options=(rw,make-slave) -> **,\n"
"# mount options=(rw,make-rslave) -> **,\n" "# mount options=(rw,make-rslave) -> **,\n"
"# mount options=(rw,make-shared) -> **,\n" "# mount options=(rw,make-shared) -> **,\n"
@ -132,6 +132,16 @@ static const char AA_PROFILE_BASE[] =
"# mount options=(rw,make-unbindable) -> **,\n" "# mount options=(rw,make-unbindable) -> **,\n"
"# mount options=(rw,make-runbindable) -> **,\n" "# mount options=(rw,make-runbindable) -> **,\n"
"\n" "\n"
"# Allow limited modification of mount propagation\n"
" mount options=(rw,make-slave) -> /,\n"
" mount options=(rw,make-rslave) -> /,\n"
" mount options=(rw,make-shared) -> /,\n"
" mount options=(rw,make-rshared) -> /,\n"
" mount options=(rw,make-private) -> /,\n"
" mount options=(rw,make-rprivate) -> /,\n"
" mount options=(rw,make-unbindable) -> /,\n"
" mount options=(rw,make-runbindable) -> /,\n"
"\n"
" # allow bind-mounts of anything except /proc, /sys and /dev\n" " # allow bind-mounts of anything except /proc, /sys and /dev\n"
" mount options=(rw,bind) /[^spd]*{,/**},\n" " mount options=(rw,bind) /[^spd]*{,/**},\n"
" mount options=(rw,bind) /d[^e]*{,/**},\n" " mount options=(rw,bind) /d[^e]*{,/**},\n"
@ -150,15 +160,18 @@ static const char AA_PROFILE_BASE[] =
" mount options=(rw,bind) /sy[^s]*{,/**},\n" " mount options=(rw,bind) /sy[^s]*{,/**},\n"
" mount options=(rw,bind) /sys?*{,/**},\n" " mount options=(rw,bind) /sys?*{,/**},\n"
"\n" "\n"
" # allow various ro-bind-*re*-mounts\n" " # Allow rbind-mounts of anything except /, /dev, /proc and /sys\n"
" mount options=(ro,remount,bind),\n" " mount options=(rw,rbind) /[^spd]*{,/**},\n"
" mount options=(ro,remount,bind,nosuid),\n" " mount options=(rw,rbind) /d[^e]*{,/**},\n"
" mount options=(ro,remount,bind,noexec),\n" " mount options=(rw,rbind) /de[^v]*{,/**},\n"
" mount options=(ro,remount,bind,nodev),\n" " mount options=(rw,rbind) /dev?*{,/**},\n"
" mount options=(ro,remount,bind,nosuid,noexec),\n" " mount options=(rw,rbind) /p[^r]*{,/**},\n"
" mount options=(ro,remount,bind,noexec,nodev),\n" " mount options=(rw,rbind) /pr[^o]*{,/**},\n"
" mount options=(ro,remount,bind,nodev,nosuid),\n" " mount options=(rw,rbind) /pro[^c]*{,/**},\n"
" mount options=(ro,remount,bind,nosuid,noexec,nodev),\n" " mount options=(rw,rbind) /proc?*{,/**},\n"
" mount options=(rw,rbind) /s[^y]*{,/**},\n"
" mount options=(rw,rbind) /sy[^s]*{,/**},\n"
" mount options=(rw,rbind) /sys?*{,/**},\n"
"\n" "\n"
" # allow moving mounts except for /proc, /sys and /dev\n" " # allow moving mounts except for /proc, /sys and /dev\n"
" mount options=(rw,move) /[^spd]*{,/**},\n" " mount options=(rw,move) /[^spd]*{,/**},\n"
@ -324,12 +337,13 @@ static const char AA_PROFILE_NESTING_BASE[] =
"\n" "\n"
" mount fstype=proc -> /usr/lib/*/lxc/**,\n" " mount fstype=proc -> /usr/lib/*/lxc/**,\n"
" mount fstype=sysfs -> /usr/lib/*/lxc/**,\n" " mount fstype=sysfs -> /usr/lib/*/lxc/**,\n"
" mount options=(rw,bind),\n"
" mount options=(rw,rbind),\n"
" mount options=(rw,make-rshared),\n"
"\n" "\n"
/* FIXME: What's the state here on apparmor's side? */ " # Allow nested LXD\n"
" # there doesn't seem to be a way to ask for:\n" " mount none -> /var/lib/lxd/shmounts/,\n"
" mount /var/lib/lxd/shmounts/ -> /var/lib/lxd/shmounts/,\n"
" mount options=bind /var/lib/lxd/shmounts/** -> /var/lib/lxd/**,\n"
"\n"
" # TODO: There doesn't seem to be a way to ask for:\n"
" # mount options=(ro,nosuid,nodev,noexec,remount,bind),\n" " # mount options=(ro,nosuid,nodev,noexec,remount,bind),\n"
" # as we always get mount to $cdir/proc/sys with those flags denied\n" " # as we always get mount to $cdir/proc/sys with those flags denied\n"
" # So allow all mounts until that is straightened out:\n" " # So allow all mounts until that is straightened out:\n"
@ -524,7 +538,7 @@ static inline char *apparmor_namespace(const char *ctname, const char *lxcpath)
return full; return full;
} }
/* FIXME: This is currently run only in the context of a constructor (via the /* TODO: This is currently run only in the context of a constructor (via the
* initial lsm_init() called due to its __attribute__((constructor)), so we * initial lsm_init() called due to its __attribute__((constructor)), so we
* do not have ERROR/... macros available, so there are some fprintf(stderr)s * do not have ERROR/... macros available, so there are some fprintf(stderr)s
* in there. * in there.
@ -546,7 +560,7 @@ static bool check_apparmor_parser_version()
lxc_pclose(parserpipe); lxc_pclose(parserpipe);
/* We stay silent for now as this most likely means the shell /* We stay silent for now as this most likely means the shell
* lxc_popen executed failed to find the apparmor_parser binary. * lxc_popen executed failed to find the apparmor_parser binary.
* See the FIXME comment above for details. * See the TODO comment above for details.
*/ */
return false; return false;
} }
@ -631,6 +645,86 @@ static bool is_privileged(struct lxc_conf *conf)
return lxc_list_empty(&conf->id_map); return lxc_list_empty(&conf->id_map);
} }
static const char* AA_ALL_DEST_PATH_LIST[] = {
" -> /[^spd]*{,/**},\n",
" -> /d[^e]*{,/**},\n",
" -> /de[^v]*{,/**},\n",
" -> /dev/.[^l]*{,/**},\n",
" -> /dev/.l[^x]*{,/**},\n",
" -> /dev/.lx[^c]*{,/**},\n",
" -> /dev/.lxc?*{,/**},\n",
" -> /dev/[^.]*{,/**},\n",
" -> /dev?*{,/**},\n",
" -> /p[^r]*{,/**},\n",
" -> /pr[^o]*{,/**},\n",
" -> /pro[^c]*{,/**},\n",
" -> /proc?*{,/**},\n",
" -> /s[^y]*{,/**},\n",
" -> /sy[^s]*{,/**},\n",
" -> /sys?*{,/**},\n",
NULL,
};
static const struct mntopt_t {
const char *opt;
size_t len;
} REMOUNT_OPTIONS[] = {
{ ",nodev", sizeof(",nodev")-1 },
{ ",nosuid", sizeof(",nosuid")-1 },
{ ",noexec", sizeof(",noexec")-1 },
};
static void append_remount_rule(char **profile, size_t *size, const char *rule)
{
size_t rule_len = strlen(rule);
for (const char **dest = AA_ALL_DEST_PATH_LIST; *dest; ++dest) {
must_append_sized(profile, size, rule, rule_len);
must_append_sized(profile, size, *dest, strlen(*dest));
}
}
static void append_all_remount_rules(char **profile, size_t *size)
{
/*
* That's 30, and we add at most:
* ",nodev,nosuid,noexec,strictatime -> /dev/.lx[^c]*{,/ **},\ n",
* which is another ~58, this should be enough:
*/
char buf[128] = " mount options=(ro,remount,bind";
const size_t buf_append_pos = strlen(buf);
const size_t opt_count = ARRAY_SIZE(REMOUNT_OPTIONS);
size_t opt_bits;
must_append_sized(profile, size,
"# allow various ro-bind-*re*mounts\n",
sizeof("# allow various ro-bind-*re*mounts\n")-1);
for (opt_bits = 0; opt_bits != 1 << opt_count; ++opt_bits) {
size_t at = buf_append_pos;
unsigned bit = 1;
size_t o;
for (o = 0; o != opt_count; ++o, bit <<= 1) {
if (opt_bits & bit) {
const struct mntopt_t *opt = &REMOUNT_OPTIONS[o];
memcpy(&buf[at], opt->opt, opt->len);
at += opt->len;
}
}
memcpy(&buf[at], ")", sizeof(")"));
append_remount_rule(profile, size, buf);
/* noatime and strictatime don't go together */
memcpy(&buf[at], ",noatime)", sizeof(",noatime)"));
append_remount_rule(profile, size, buf);
memcpy(&buf[at], ",strictatime)", sizeof(",strictatime)"));
append_remount_rule(profile, size, buf);
}
}
static char *get_apparmor_profile_content(struct lxc_conf *conf, const char *lxcpath) static char *get_apparmor_profile_content(struct lxc_conf *conf, const char *lxcpath)
{ {
char *profile, *profile_name_full; char *profile, *profile_name_full;
@ -648,6 +742,8 @@ static char *get_apparmor_profile_content(struct lxc_conf *conf, const char *lxc
must_append_sized(&profile, &size, AA_PROFILE_BASE, must_append_sized(&profile, &size, AA_PROFILE_BASE,
STRARRAYLEN(AA_PROFILE_BASE)); STRARRAYLEN(AA_PROFILE_BASE));
append_all_remount_rules(&profile, &size);
if (aa_supports_unix) if (aa_supports_unix)
must_append_sized(&profile, &size, AA_PROFILE_UNIX_SOCKETS, must_append_sized(&profile, &size, AA_PROFILE_UNIX_SOCKETS,
STRARRAYLEN(AA_PROFILE_UNIX_SOCKETS)); STRARRAYLEN(AA_PROFILE_UNIX_SOCKETS));
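
`append_all_remount_rules()` replaces the hand-written list of `ro,remount,bind` rules by enumerating every subset of the optional flags with a bit counter: bit `o` of the counter decides whether option `o` is appended. The sketch below isolates that enumeration under simplified assumptions. It prints the rules instead of appending them to a profile, it omits the `noatime`/`strictatime` variants and the destination path list, and all `demo_` names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

static const char *const demo_opts[] = { ",nodev", ",nosuid", ",noexec" };
#define DEMO_OPT_COUNT (sizeof(demo_opts) / sizeof(demo_opts[0]))

/* Build the option string for the subset selected by 'bits' into 'buf'.
 * Returns the resulting string length, or -1 if 'buf' is too small. */
static int demo_build_options(unsigned int bits, char *buf, size_t size)
{
	static const char prefix[] = "ro,remount,bind";
	size_t at = sizeof(prefix) - 1;

	if (size < sizeof(prefix))
		return -1;
	memcpy(buf, prefix, sizeof(prefix));

	for (size_t o = 0; o < DEMO_OPT_COUNT; o++) {
		size_t len = strlen(demo_opts[o]);

		/* Skip options whose bit is not set in this subset. */
		if (!(bits & (1u << o)))
			continue;
		if (at + len + 1 > size)
			return -1;

		memcpy(&buf[at], demo_opts[o], len + 1); /* copies the NUL too */
		at += len;
	}

	return (int)at;
}

/* Visit all 2^N subsets and print one rule per subset.
 * Returns the number of rules printed. */
static unsigned int demo_print_all(void)
{
	char buf[64];
	unsigned int bits, n = 0;

	for (bits = 0; bits != (1u << DEMO_OPT_COUNT); bits++) {
		if (demo_build_options(bits, buf, sizeof(buf)) < 0)
			continue;

		printf("mount options=(%s),\n", buf);
		n++;
	}

	return n;
}
```

Because the options are always appended in table order, each combination is emitted exactly once, which is what allows the generated rules to stand in for a fixed, manually maintained list.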


@ -49,7 +49,7 @@
#include "namespace.h" #include "namespace.h"
#include "network.h" #include "network.h"
#include "parse.h" #include "parse.h"
#include "raw_syscalls.h" #include "process_utils.h"
#include "start.h" #include "start.h"
#include "state.h" #include "state.h"
#include "storage.h" #include "storage.h"
@ -537,12 +537,12 @@ static bool do_lxcapi_unfreeze(struct lxc_container *c)
WRAP_API(bool, lxcapi_unfreeze) WRAP_API(bool, lxcapi_unfreeze)
static int do_lxcapi_console_getfd(struct lxc_container *c, int *ttynum, int *masterfd) static int do_lxcapi_console_getfd(struct lxc_container *c, int *ttynum, int *ptmxfd)
{ {
if (!c) if (!c)
return -1; return -1;
return lxc_terminal_getfd(c, ttynum, masterfd); return lxc_terminal_getfd(c, ttynum, ptmxfd);
} }
WRAP_API_2(int, lxcapi_console_getfd, int *, int *) WRAP_API_2(int, lxcapi_console_getfd, int *, int *)
@ -830,14 +830,12 @@ static bool wait_on_daemonized_start(struct lxc_handler *handler, int pid)
DEBUG("First child %d exited", pid); DEBUG("First child %d exited", pid);
/* Close write end of the socket pair. */ /* Close write end of the socket pair. */
close(handler->state_socket_pair[1]); close_prot_errno_disarm(handler->state_socket_pair[1]);
handler->state_socket_pair[1] = -1;
state = lxc_rcv_status(handler->state_socket_pair[0]); state = lxc_rcv_status(handler->state_socket_pair[0]);
/* Close read end of the socket pair. */ /* Close read end of the socket pair. */
close(handler->state_socket_pair[0]); close_prot_errno_disarm(handler->state_socket_pair[0]);
handler->state_socket_pair[0] = -1;
if (state < 0) { if (state < 0) {
SYSERROR("Failed to receive the container state"); SYSERROR("Failed to receive the container state");
@ -867,7 +865,6 @@ static bool do_lxcapi_start(struct lxc_container *c, int useinit, char * const a
NULL, NULL,
}; };
char **init_cmd = NULL; char **init_cmd = NULL;
int keepfds[3] = {-1, -1, -1};
/* container does exist */ /* container does exist */
if (!c) if (!c)
@ -901,7 +898,7 @@ static bool do_lxcapi_start(struct lxc_container *c, int useinit, char * const a
conf = c->lxc_conf; conf = c->lxc_conf;
/* initialize handler */ /* initialize handler */
handler = lxc_init_handler(c->name, conf, c->config_path, c->daemonize); handler = lxc_init_handler(NULL, c->name, conf, c->config_path, c->daemonize);
container_mem_unlock(c); container_mem_unlock(c);
if (!handler) if (!handler)
@ -918,7 +915,7 @@ static bool do_lxcapi_start(struct lxc_container *c, int useinit, char * const a
if (!argv) { if (!argv) {
if (useinit) { if (useinit) {
ERROR("No valid init detected"); ERROR("No valid init detected");
lxc_free_handler(handler); lxc_put_handler(handler);
return false; return false;
} }
argv = default_args; argv = default_args;
@ -936,7 +933,7 @@ static bool do_lxcapi_start(struct lxc_container *c, int useinit, char * const a
pid_first = fork(); pid_first = fork();
if (pid_first < 0) { if (pid_first < 0) {
free_init_cmd(init_cmd); free_init_cmd(init_cmd);
lxc_free_handler(handler); lxc_put_handler(handler);
return false; return false;
} }
@ -953,7 +950,7 @@ static bool do_lxcapi_start(struct lxc_container *c, int useinit, char * const a
started = wait_on_daemonized_start(handler, pid_first); started = wait_on_daemonized_start(handler, pid_first);
free_init_cmd(init_cmd); free_init_cmd(init_cmd);
lxc_free_handler(handler); lxc_put_handler(handler);
return started; return started;
} }
@ -985,7 +982,7 @@ static bool do_lxcapi_start(struct lxc_container *c, int useinit, char * const a
/* second parent */ /* second parent */
if (pid_second != 0) { if (pid_second != 0) {
free_init_cmd(init_cmd); free_init_cmd(init_cmd);
lxc_free_handler(handler); lxc_put_handler(handler);
_exit(EXIT_SUCCESS); _exit(EXIT_SUCCESS);
} }
@ -998,11 +995,7 @@ static bool do_lxcapi_start(struct lxc_container *c, int useinit, char * const a
_exit(EXIT_FAILURE); _exit(EXIT_FAILURE);
} }
keepfds[0] = handler->conf->maincmd_fd; ret = inherit_fds(handler, true);
keepfds[1] = handler->state_socket_pair[0];
keepfds[2] = handler->state_socket_pair[1];
ret = lxc_check_inherited(conf, true, keepfds,
sizeof(keepfds) / sizeof(keepfds[0]));
if (ret < 0) if (ret < 0)
_exit(EXIT_FAILURE); _exit(EXIT_FAILURE);
@ -1020,7 +1013,7 @@ static bool do_lxcapi_start(struct lxc_container *c, int useinit, char * const a
} else if (!am_single_threaded()) { } else if (!am_single_threaded()) {
ERROR("Cannot start non-daemonized container when threaded"); ERROR("Cannot start non-daemonized container when threaded");
free_init_cmd(init_cmd); free_init_cmd(init_cmd);
lxc_free_handler(handler); lxc_put_handler(handler);
return false; return false;
} }
@ -1034,7 +1027,7 @@ static bool do_lxcapi_start(struct lxc_container *c, int useinit, char * const a
w = snprintf(pidstr, sizeof(pidstr), "%d", lxc_raw_getpid()); w = snprintf(pidstr, sizeof(pidstr), "%d", lxc_raw_getpid());
if (w < 0 || (size_t)w >= sizeof(pidstr)) { if (w < 0 || (size_t)w >= sizeof(pidstr)) {
free_init_cmd(init_cmd); free_init_cmd(init_cmd);
lxc_free_handler(handler); lxc_put_handler(handler);
SYSERROR("Failed to write monitor pid to \"%s\"", c->pidfile); SYSERROR("Failed to write monitor pid to \"%s\"", c->pidfile);
@ -1047,7 +1040,7 @@ static bool do_lxcapi_start(struct lxc_container *c, int useinit, char * const a
ret = lxc_write_to_file(c->pidfile, pidstr, w, false, 0600); ret = lxc_write_to_file(c->pidfile, pidstr, w, false, 0600);
if (ret < 0) { if (ret < 0) {
free_init_cmd(init_cmd); free_init_cmd(init_cmd);
lxc_free_handler(handler); lxc_put_handler(handler);
SYSERROR("Failed to write monitor pid to \"%s\"", c->pidfile); SYSERROR("Failed to write monitor pid to \"%s\"", c->pidfile);
@ -1065,15 +1058,15 @@ static bool do_lxcapi_start(struct lxc_container *c, int useinit, char * const a
ret = unshare(CLONE_NEWNS); ret = unshare(CLONE_NEWNS);
if (ret < 0) { if (ret < 0) {
SYSERROR("Failed to unshare mount namespace"); SYSERROR("Failed to unshare mount namespace");
lxc_free_handler(handler); lxc_put_handler(handler);
ret = 1; ret = 1;
goto on_error; goto on_error;
} }
ret = mount(NULL, "/", NULL, MS_SLAVE|MS_REC, NULL); ret = mount(NULL, "/", NULL, MS_SLAVE|MS_REC, NULL);
if (ret < 0) { if (ret < 0) {
SYSERROR("Failed to make / rslave at startup"); SYSERROR("Failed to recursively turn root mount tree into dependent mount. Continuing...");
lxc_free_handler(handler); lxc_put_handler(handler);
ret = 1; ret = 1;
goto on_error; goto on_error;
} }
@ -1082,20 +1075,16 @@ static bool do_lxcapi_start(struct lxc_container *c, int useinit, char * const a
reboot: reboot:
if (conf->reboot == REBOOT_INIT) { if (conf->reboot == REBOOT_INIT) {
/* initialize handler */ /* initialize handler */
handler = lxc_init_handler(c->name, conf, c->config_path, c->daemonize); handler = lxc_init_handler(handler, c->name, conf, c->config_path, c->daemonize);
if (!handler) { if (!handler) {
ret = 1; ret = 1;
goto on_error; goto on_error;
} }
} }
keepfds[0] = handler->conf->maincmd_fd; ret = inherit_fds(handler, c->daemonize);
keepfds[1] = handler->state_socket_pair[0];
keepfds[2] = handler->state_socket_pair[1];
ret = lxc_check_inherited(conf, c->daemonize, keepfds,
sizeof(keepfds) / sizeof(keepfds[0]));
if (ret < 0) { if (ret < 0) {
lxc_free_handler(handler); lxc_put_handler(handler);
ret = 1; ret = 1;
goto on_error; goto on_error;
} }
@ -1196,7 +1185,6 @@ WRAP_API(bool, lxcapi_stop)
static int do_create_container_dir(const char *path, struct lxc_conf *conf) static int do_create_container_dir(const char *path, struct lxc_conf *conf)
{ {
__do_free char *p = NULL;
int lasterr; int lasterr;
int ret = -1; int ret = -1;
@ -1212,10 +1200,8 @@ static int do_create_container_dir(const char *path, struct lxc_conf *conf)
ret = 0; ret = 0;
} }
p = must_copy_string(path);
if (!lxc_list_empty(&conf->id_map)) { if (!lxc_list_empty(&conf->id_map)) {
ret = chown_mapped_root(p, conf); ret = chown_mapped_root(path, conf);
if (ret < 0) if (ret < 0)
ret = -1; ret = -1;
} }
@ -1359,14 +1345,8 @@ static bool create_run_template(struct lxc_container *c, char *tpath,
_exit(EXIT_FAILURE); _exit(EXIT_FAILURE);
} }
ret = detect_shared_rootfs(); if (detect_shared_rootfs() && mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL))
if (ret == 1) { SYSERROR("Failed to recursively turn root mount tree into dependent mount. Continuing...");
ret = mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL);
if (ret < 0) {
SYSERROR("Failed to make \"/\" rslave");
ERROR("Continuing...");
}
}
} }
if (strcmp(bdev->type, "dir") != 0 && strcmp(bdev->type, "btrfs") != 0) { if (strcmp(bdev->type, "dir") != 0 && strcmp(bdev->type, "btrfs") != 0) {
@ -2110,6 +2090,7 @@ static bool do_lxcapi_shutdown(struct lxc_container *c, int timeout)
if (ret < MAX_STATE) if (ret < MAX_STATE)
return false; return false;
}
if (pidfd >= 0) { if (pidfd >= 0) {
struct pollfd pidfd_poll = { struct pollfd pidfd_poll = {
@ -2131,7 +2112,7 @@ static bool do_lxcapi_shutdown(struct lxc_container *c, int timeout)
*/ */
if (timeout != 0) { if (timeout != 0) {
ret = poll(&pidfd_poll, 1, timeout); ret = poll(&pidfd_poll, 1, timeout * 1000);
if (ret < 0 || !(pidfd_poll.revents & POLLIN)) if (ret < 0 || !(pidfd_poll.revents & POLLIN))
return false; return false;
@ -2145,7 +2126,6 @@ static bool do_lxcapi_shutdown(struct lxc_container *c, int timeout)
TRACE("Sent signal %d to pid %d", haltsignal, pid); TRACE("Sent signal %d to pid %d", haltsignal, pid);
} }
}
if (timeout == 0) if (timeout == 0)
return true; return true;
@ -3685,12 +3665,8 @@ static int clone_update_rootfs(struct clone_update_data *data)
return -1; return -1;
} }
if (detect_shared_rootfs()) { if (detect_shared_rootfs() && mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL))
if (mount(NULL, "/", NULL, MS_SLAVE|MS_REC, NULL)) { SYSERROR("Failed to recursively turn root mount tree into dependent mount. Continuing...");
SYSERROR("Failed to make / rslave");
ERROR("Continuing...");
}
}
if (bdev->ops->mount(bdev) < 0) { if (bdev->ops->mount(bdev) < 0) {
storage_put(bdev); storage_put(bdev);
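
Note on the `do_lxcapi_shutdown` hunk above: the change from `timeout` to `timeout * 1000` reflects that `poll()` takes milliseconds, while the value passed in appears to be in seconds. The following is a minimal sketch of that conversion, not the LXC implementation. `seconds_to_poll_ms` and `wait_readable` are hypothetical helpers introduced only for illustration, and the overflow clamp is an assumption that the diff itself does not make.

```c
#include <limits.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: convert a timeout in seconds to poll() milliseconds. */
static int seconds_to_poll_ms(int seconds)
{
	if (seconds < 0)
		return -1; /* a negative timeout makes poll() wait indefinitely */
	if (seconds > INT_MAX / 1000)
		return INT_MAX; /* clamp instead of overflowing (assumption) */
	return seconds * 1000;
}

/*
 * Hypothetical helper: wait up to `seconds` for `fd` to become readable.
 * Returns 1 if POLLIN was reported, 0 if not (for example on timeout),
 * and -1 if poll() itself failed.
 */
static int wait_readable(int fd, int seconds)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
	int ret;

	ret = poll(&pfd, 1, seconds_to_poll_ms(seconds));
	if (ret < 0)
		return -1;

	return (ret > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

Without the conversion, a 30 second shutdown timeout would be handed to `poll()` as 30 milliseconds.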


@ -90,7 +90,7 @@ struct lxc_container {
* \private * \private
* Container configuration. * Container configuration.
* *
* \internal FIXME: do we want the whole lxc_handler? * \internal TODO: do we want the whole lxc_handler?
*/ */
struct lxc_conf *lxc_conf; struct lxc_conf *lxc_conf;
@ -563,7 +563,7 @@ struct lxc_container {
* \param c Container. * \param c Container.
* \param[in,out] ttynum Terminal number to attempt to allocate, * \param[in,out] ttynum Terminal number to attempt to allocate,
* or \c -1 to allocate the first available tty. * or \c -1 to allocate the first available tty.
* \param[out] masterfd File descriptor referring to the master side of the pty. * \param[out] ptmxfd File descriptor referring to the ptmx side of the pty.
* *
* \return tty file descriptor number on success, or \c -1 on * \return tty file descriptor number on success, or \c -1 on
* failure. * failure.
@ -575,7 +575,7 @@ struct lxc_container {
* descriptor when no longer required so that it may be allocated * descriptor when no longer required so that it may be allocated
* by another caller. * by another caller.
*/ */
int (*console_getfd)(struct lxc_container *c, int *ttynum, int *masterfd); int (*console_getfd)(struct lxc_container *c, int *ttynum, int *ptmxfd);
/*! /*!
* \brief Allocate and run a console tty. * \brief Allocate and run a console tty.


@ -57,6 +57,20 @@
#define CAP_SETGID 6 #define CAP_SETGID 6
#endif #endif
/* move_mount */
#ifndef MOVE_MOUNT_F_EMPTY_PATH
#define MOVE_MOUNT_F_EMPTY_PATH 0x00000004 /* Empty from path permitted */
#endif
/* open_tree */
#ifndef OPEN_TREE_CLONE
#define OPEN_TREE_CLONE 1 /* Clone the target tree and attach the clone */
#endif
#ifndef OPEN_TREE_CLOEXEC
#define OPEN_TREE_CLOEXEC O_CLOEXEC /* Close the file on execve() */
#endif
/* prctl */ /* prctl */
#ifndef PR_CAPBSET_READ #ifndef PR_CAPBSET_READ
#define PR_CAPBSET_READ 23 #define PR_CAPBSET_READ 23
@ -419,6 +433,9 @@ enum {
#define PTR_TO_UINT64(p) ((uint64_t)((intptr_t)(p))) #define PTR_TO_UINT64(p) ((uint64_t)((intptr_t)(p)))
#define UINT_TO_PTR(u) ((void *) ((uintptr_t) (u)))
#define PTR_TO_USHORT(p) ((unsigned short)((uintptr_t)(p)))
#define LXC_INVALID_UID ((uid_t)-1) #define LXC_INVALID_UID ((uid_t)-1)
#define LXC_INVALID_GID ((gid_t)-1) #define LXC_INVALID_GID ((gid_t)-1)
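
Note on the two conversion macros added in the hunk above: they let a small integer travel through a `void *` argument and come back out unchanged. The following is a minimal sketch of that round trip. The macro bodies are copied from the diff, while `store_vlan_id` and `round_trip` are hypothetical names that do not exist in LXC.

```c
#include <stdint.h>

/* Same definitions as the ones added in the hunk above. */
#define UINT_TO_PTR(u) ((void *)((uintptr_t)(u)))
#define PTR_TO_USHORT(p) ((unsigned short)((uintptr_t)(p)))

/* Hypothetical callback: receives its argument as an opaque pointer. */
static unsigned short store_vlan_id(void *data)
{
	/* Recover the integer that was packed into the pointer value. */
	return PTR_TO_USHORT(data);
}

/* Pack a small integer into a pointer-sized argument and unpack it again. */
static unsigned short round_trip(unsigned int vlan_id)
{
	return store_vlan_id(UINT_TO_PTR(vlan_id));
}
```

Going through `uintptr_t` avoids the compiler warning about casting between a pointer and an integer of a different size. Values wider than `unsigned short` are truncated on the way back.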


@ -59,8 +59,10 @@ int lxc_mainloop(struct lxc_epoll_descr *descr, int timeout_ms)
} }
} }
int lxc_mainloop_add_handler(struct lxc_epoll_descr *descr, int fd, int lxc_mainloop_add_handler_events(struct lxc_epoll_descr *descr, int fd,
lxc_mainloop_callback_t callback, void *data) int events,
lxc_mainloop_callback_t callback,
void *data)
{ {
__do_free struct mainloop_handler *handler = NULL; __do_free struct mainloop_handler *handler = NULL;
__do_free struct lxc_list *item = NULL; __do_free struct lxc_list *item = NULL;
@ -77,7 +79,7 @@ int lxc_mainloop_add_handler(struct lxc_epoll_descr *descr, int fd,
handler->fd = fd; handler->fd = fd;
handler->data = data; handler->data = data;
ev.events = EPOLLIN; ev.events = events;
ev.data.ptr = handler; ev.data.ptr = handler;
if (epoll_ctl(descr->epfd, EPOLL_CTL_ADD, fd, &ev) < 0) if (epoll_ctl(descr->epfd, EPOLL_CTL_ADD, fd, &ev) < 0)
@ -92,6 +94,13 @@ int lxc_mainloop_add_handler(struct lxc_epoll_descr *descr, int fd,
return 0; return 0;
} }
int lxc_mainloop_add_handler(struct lxc_epoll_descr *descr, int fd,
lxc_mainloop_callback_t callback, void *data)
{
return lxc_mainloop_add_handler_events(descr, fd, EPOLLIN, callback,
data);
}
int lxc_mainloop_del_handler(struct lxc_epoll_descr *descr, int fd) int lxc_mainloop_del_handler(struct lxc_epoll_descr *descr, int fd)
{ {
struct mainloop_handler *handler; struct mainloop_handler *handler;
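
Note on the mainloop hunks above: the old entry point hard-coded `EPOLLIN`, and the new `lxc_mainloop_add_handler_events()` takes the event mask from the caller, with the old name kept as a thin wrapper. The following is a minimal sketch of the same pattern on a bare epoll descriptor. `add_fd_events`, `add_fd` and `wait_one` are hypothetical stand-ins and not the LXC functions, which also track a callback and a handler list.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical stand-in: register fd with a caller-chosen event mask. */
static int add_fd_events(int epfd, int fd, uint32_t events)
{
	struct epoll_event ev = { .events = events };

	ev.data.fd = fd;
	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Thin wrapper that keeps the previous read-only behaviour. */
static int add_fd(int epfd, int fd)
{
	return add_fd_events(epfd, fd, EPOLLIN);
}

/*
 * Return the event bits reported for the first ready descriptor, or 0 if
 * nothing became ready within timeout_ms (or epoll_wait() failed).
 */
static uint32_t wait_one(int epfd, int timeout_ms)
{
	struct epoll_event ev;
	int n;

	n = epoll_wait(epfd, &ev, 1, timeout_ms);
	return n == 1 ? ev.events : 0;
}
```

Existing callers keep using the wrapper, while a caller that needs, say, `EPOLLOUT` or `EPOLLPRI` can pass its own mask.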


@ -22,6 +22,10 @@ typedef int (*lxc_mainloop_callback_t)(int fd, uint32_t event, void *data,
extern int lxc_mainloop(struct lxc_epoll_descr *descr, int timeout_ms); extern int lxc_mainloop(struct lxc_epoll_descr *descr, int timeout_ms);
extern int lxc_mainloop_add_handler_events(struct lxc_epoll_descr *descr,
int fd, int events,
lxc_mainloop_callback_t callback,
void *data);
extern int lxc_mainloop_add_handler(struct lxc_epoll_descr *descr, int fd, extern int lxc_mainloop_add_handler(struct lxc_epoll_descr *descr, int fd,
lxc_mainloop_callback_t callback, lxc_mainloop_callback_t callback,
void *data); void *data);


@ -44,7 +44,7 @@ define_cleanup_function(DIR *, closedir);
#define free_disarm(ptr) \ #define free_disarm(ptr) \
({ \ ({ \
free(ptr); \ free(ptr); \
move_ptr(ptr); \ ptr = NULL; \
}) })
static inline void free_disarm_function(void *ptr) static inline void free_disarm_function(void *ptr)
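
Note on the `free_disarm` hunk above: the macro now frees the allocation and then assigns `NULL` to the variable directly. The following is a minimal sketch of how such a macro behaves at a call site. The macro body follows the diff and relies on GNU statement expressions, while `struct holder` and `holder_reset` are hypothetical and exist only for this example.

```c
#include <stdlib.h>

/*
 * Free the allocation and clear the variable so it cannot be freed or used
 * again by accident. Uses a GNU statement expression, as in the diff.
 */
#define free_disarm(ptr)    \
	({                  \
		free(ptr);  \
		ptr = NULL; \
	})

/* Hypothetical owner of a heap-allocated string. */
struct holder {
	char *name;
};

static void holder_reset(struct holder *h)
{
	free_disarm(h->name);
	/* A second call is harmless because free(NULL) is a no-op. */
	free_disarm(h->name);
}
```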


@ -21,33 +21,6 @@
lxc_log_define(namespace, lxc); lxc_log_define(namespace, lxc);
/*
* Let's use the "standard stack limit" (i.e. glibc thread size default) for
* stack sizes: 8MB.
*/
#define __LXC_STACK_SIZE (8 * 1024 * 1024)
pid_t lxc_clone(int (*fn)(void *), void *arg, int flags, int *pidfd)
{
pid_t ret;
void *stack;
stack = malloc(__LXC_STACK_SIZE);
if (!stack) {
SYSERROR("Failed to allocate clone stack");
return -ENOMEM;
}
#ifdef __ia64__
ret = __clone2(fn, stack, __LXC_STACK_SIZE, flags | SIGCHLD, arg, pidfd);
#else
ret = clone(fn, stack + __LXC_STACK_SIZE, flags | SIGCHLD, arg, pidfd);
#endif
if (ret < 0)
SYSERROR("Failed to clone (%#x)", flags);
return ret;
}
/* Leave the user namespace at the first position in the array of structs so /* Leave the user namespace at the first position in the array of structs so
* that we always attach to it first when iterating over the struct and using * that we always attach to it first when iterating over the struct and using
* setns() to switch namespaces. This especially affects lxc_attach(): Suppose * setns() to switch namespaces. This especially affects lxc_attach(): Suppose


@ -7,63 +7,6 @@
#include <unistd.h> #include <unistd.h>
#include <sys/syscall.h> #include <sys/syscall.h>
#ifndef CLONE_PARENT_SETTID
#define CLONE_PARENT_SETTID 0x00100000
#endif
#ifndef CLONE_CHILD_CLEARTID
#define CLONE_CHILD_CLEARTID 0x00200000
#endif
#ifndef CLONE_CHILD_SETTID
#define CLONE_CHILD_SETTID 0x01000000
#endif
#ifndef CLONE_VFORK
#define CLONE_VFORK 0x00004000
#endif
#ifndef CLONE_THREAD
#define CLONE_THREAD 0x00010000
#endif
#ifndef CLONE_SETTLS
#define CLONE_SETTLS 0x00080000
#endif
#ifndef CLONE_VM
#define CLONE_VM 0x00000100
#endif
#ifndef CLONE_FILES
#define CLONE_FILES 0x00000400
#endif
#ifndef CLONE_FS
# define CLONE_FS 0x00000200
#endif
#ifndef CLONE_NEWNS
# define CLONE_NEWNS 0x00020000
#endif
#ifndef CLONE_NEWCGROUP
# define CLONE_NEWCGROUP 0x02000000
#endif
#ifndef CLONE_NEWUTS
# define CLONE_NEWUTS 0x04000000
#endif
#ifndef CLONE_NEWIPC
# define CLONE_NEWIPC 0x08000000
#endif
#ifndef CLONE_NEWUSER
# define CLONE_NEWUSER 0x10000000
#endif
#ifndef CLONE_NEWPID
# define CLONE_NEWPID 0x20000000
#endif
#ifndef CLONE_NEWNET
# define CLONE_NEWNET 0x40000000
#endif
enum { enum {
LXC_NS_USER, LXC_NS_USER,
LXC_NS_MNT, LXC_NS_MNT,
@ -82,39 +25,6 @@ extern const struct ns_info {
const char *env_name; const char *env_name;
} ns_info[LXC_NS_MAX]; } ns_info[LXC_NS_MAX];
#if defined(__ia64__)
int __clone2(int (*__fn) (void *__arg), void *__child_stack_base,
size_t __child_stack_size, int __flags, void *__arg, ...);
#else
int clone(int (*fn)(void *), void *child_stack,
int flags, void *arg, ...
/* pid_t *ptid, struct user_desc *tls, pid_t *ctid */ );
#endif
/**
* lxc_clone() - create a new process
*
* - allocate stack:
* This function allocates a new stack the size of page and passes it to the
* kernel.
*
* - support all CLONE_*flags:
* This function supports all CLONE_* flags. If in doubt or not sufficiently
* familiar with process creation in the kernel and interactions with libcs
* this function should be used.
*
* - pthread_atfork() handlers depending on libc:
* Whether this function runs pthread_atfork() handlers depends on the
* corresponding libc wrapper. glibc currently does not run pthread_atfork()
* handlers but does not guarantee that they are not. Other libcs might or
* might not run pthread_atfork() handlers. If you require guarantees please
* refer to the lxc_raw_clone*() functions in raw_syscalls.{c,h}.
*
* - should call lxc_raw_getpid():
* The child should use lxc_raw_getpid() to retrieve its pid.
*/
extern pid_t lxc_clone(int (*fn)(void *), void *arg, int flags, int *pidfd);
extern int lxc_namespace_2_cloneflag(const char *namespace); extern int lxc_namespace_2_cloneflag(const char *namespace);
extern int lxc_namespace_2_ns_idx(const char *namespace); extern int lxc_namespace_2_ns_idx(const char *namespace);
extern int lxc_namespace_2_std_identifiers(char *namespaces); extern int lxc_namespace_2_std_identifiers(char *namespaces);


@ -36,7 +36,7 @@
#include "memory_utils.h" #include "memory_utils.h"
#include "network.h" #include "network.h"
#include "nl.h" #include "nl.h"
#include "raw_syscalls.h" #include "process_utils.h"
#include "syscall_wrappers.h" #include "syscall_wrappers.h"
#include "utils.h" #include "utils.h"
@ -182,11 +182,6 @@ static int setup_ipv6_addr_routes(struct lxc_list *ip, int ifindex)
return 0; return 0;
} }
struct ip_proxy_args {
const char *ip;
const char *dev;
};
static int lxc_ip_neigh_proxy(__u16 nlmsg_type, int family, int ifindex, void *dest) static int lxc_ip_neigh_proxy(__u16 nlmsg_type, int family, int ifindex, void *dest)
{ {
call_cleaner(nlmsg_free) struct nlmsg *answer = NULL, *nlmsg = NULL; call_cleaner(nlmsg_free) struct nlmsg *answer = NULL, *nlmsg = NULL;
@ -324,11 +319,15 @@ static int instantiate_veth(struct lxc_handler *handler, struct lxc_netdev *netd
} }
if (!is_empty_string(netdev->link) && netdev->priv.veth_attr.mode == VETH_MODE_BRIDGE) { if (!is_empty_string(netdev->link) && netdev->priv.veth_attr.mode == VETH_MODE_BRIDGE) {
if (!lxc_nic_exists(netdev->link)) {
SYSERROR("Failed to attach \"%s\" to bridge \"%s\", bridge interface doesn't exist", veth1, netdev->link);
goto out_delete;
}
err = lxc_bridge_attach(netdev->link, veth1); err = lxc_bridge_attach(netdev->link, veth1);
if (err) { if (err) {
errno = -err; errno = -err;
SYSERROR("Failed to attach \"%s\" to bridge \"%s\"", SYSERROR("Failed to attach \"%s\" to bridge \"%s\"", veth1, netdev->link);
veth1, netdev->link);
goto out_delete; goto out_delete;
} }
INFO("Attached \"%s\" to bridge \"%s\"", veth1, netdev->link); INFO("Attached \"%s\" to bridge \"%s\"", veth1, netdev->link);
@ -483,8 +482,6 @@ static int instantiate_macvlan(struct lxc_handler *handler, struct lxc_netdev *n
} }
strlcpy(netdev->created_name, peer, IFNAMSIZ); strlcpy(netdev->created_name, peer, IFNAMSIZ);
if (is_empty_string(netdev->name))
(void)strlcpy(netdev->name, peer, IFNAMSIZ);
netdev->ifindex = if_nametoindex(peer); netdev->ifindex = if_nametoindex(peer);
if (!netdev->ifindex) { if (!netdev->ifindex) {
@ -534,7 +531,7 @@ on_error:
return -1; return -1;
} }
static int lxc_ipvlan_create(const char *master, const char *name, int mode, int isolation) static int lxc_ipvlan_create(const char *parent, const char *name, int mode, int isolation)
{ {
call_cleaner(nlmsg_free) struct nlmsg *answer = NULL, *nlmsg = NULL; call_cleaner(nlmsg_free) struct nlmsg *answer = NULL, *nlmsg = NULL;
struct nl_handler nlh; struct nl_handler nlh;
@ -543,7 +540,7 @@ static int lxc_ipvlan_create(const char *master, const char *name, int mode, int
struct ifinfomsg *ifi; struct ifinfomsg *ifi;
struct rtattr *nest, *nest2; struct rtattr *nest, *nest2;
len = strlen(master); len = strlen(parent);
if (len == 1 || len >= IFNAMSIZ) if (len == 1 || len >= IFNAMSIZ)
return ret_errno(EINVAL); return ret_errno(EINVAL);
@ -551,13 +548,13 @@ static int lxc_ipvlan_create(const char *master, const char *name, int mode, int
if (len == 1 || len >= IFNAMSIZ) if (len == 1 || len >= IFNAMSIZ)
return ret_errno(EINVAL); return ret_errno(EINVAL);
index = if_nametoindex(master); index = if_nametoindex(parent);
if (!index) if (!index)
return ret_errno(EINVAL); return ret_errno(EINVAL);
err = netlink_open(nlh_ptr, NETLINK_ROUTE); err = netlink_open(nlh_ptr, NETLINK_ROUTE);
if (err) if (err)
return ret_errno(-err); return err;
nlmsg = nlmsg_alloc(NLMSG_GOOD_SIZE); nlmsg = nlmsg_alloc(NLMSG_GOOD_SIZE);
if (!nlmsg) if (!nlmsg)
@ -582,24 +579,21 @@ static int lxc_ipvlan_create(const char *master, const char *name, int mode, int
if (nla_put_string(nlmsg, IFLA_INFO_KIND, "ipvlan")) if (nla_put_string(nlmsg, IFLA_INFO_KIND, "ipvlan"))
return ret_errno(EPROTO); return ret_errno(EPROTO);
if (mode) {
nest2 = nla_begin_nested(nlmsg, IFLA_INFO_DATA); nest2 = nla_begin_nested(nlmsg, IFLA_INFO_DATA);
if (!nest2) if (!nest2)
return ret_errno(EPROTO); return ret_errno(EPROTO);
if (nla_put_u32(nlmsg, IFLA_IPVLAN_MODE, mode)) if (nla_put_u16(nlmsg, IFLA_IPVLAN_MODE, mode))
return ret_errno(EPROTO); return ret_errno(EPROTO);
/* if_link.h does not define the isolation flag value for bridge mode so we define it as 0 /* if_link.h does not define the isolation flag value for bridge mode (unlike IPVLAN_F_PRIVATE and
* and only send mode if mode >0 as default mode is bridge anyway according to ipvlan docs. * IPVLAN_F_VEPA) so we define it as 0 and only send mode if mode >0 as default mode is bridge anyway
* according to ipvlan docs.
*/ */
if (isolation > 0 && if (isolation > 0 && nla_put_u16(nlmsg, IFLA_IPVLAN_ISOLATION, isolation))
nla_put_u16(nlmsg, IFLA_IPVLAN_ISOLATION, isolation))
return ret_errno(EPROTO); return ret_errno(EPROTO);
nla_end_nested(nlmsg, nest2); nla_end_nested(nlmsg, nest2);
}
nla_end_nested(nlmsg, nest); nla_end_nested(nlmsg, nest);
if (nla_put_u32(nlmsg, IFLA_LINK, index)) if (nla_put_u32(nlmsg, IFLA_LINK, index))
@ -637,8 +631,6 @@ static int instantiate_ipvlan(struct lxc_handler *handler, struct lxc_netdev *ne
} }
strlcpy(netdev->created_name, peer, IFNAMSIZ); strlcpy(netdev->created_name, peer, IFNAMSIZ);
if (is_empty_string(netdev->name))
(void)strlcpy(netdev->name, peer, IFNAMSIZ);
netdev->ifindex = if_nametoindex(peer); netdev->ifindex = if_nametoindex(peer);
if (!netdev->ifindex) { if (!netdev->ifindex) {
@ -712,8 +704,6 @@ static int instantiate_vlan(struct lxc_handler *handler, struct lxc_netdev *netd
} }
strlcpy(netdev->created_name, peer, IFNAMSIZ); strlcpy(netdev->created_name, peer, IFNAMSIZ);
if (is_empty_string(netdev->name))
(void)strlcpy(netdev->name, peer, IFNAMSIZ);
netdev->ifindex = if_nametoindex(peer); netdev->ifindex = if_nametoindex(peer);
if (!netdev->ifindex) { if (!netdev->ifindex) {
@ -869,7 +859,7 @@ static instantiate_cb netdev_conf[LXC_NET_MAXCONFTYPE + 1] = {
[LXC_NET_NONE] = instantiate_none, [LXC_NET_NONE] = instantiate_none,
}; };
static int instantiate_ns_veth(struct lxc_netdev *netdev) static int __instantiate_ns_common(struct lxc_netdev *netdev)
{ {
char current_ifname[IFNAMSIZ]; char current_ifname[IFNAMSIZ];
@ -911,33 +901,30 @@ static int instantiate_ns_veth(struct lxc_netdev *netdev)
return 0; return 0;
} }
static int __instantiate_common(struct lxc_netdev *netdev) static int instantiate_ns_veth(struct lxc_netdev *netdev)
{ {
netdev->ifindex = if_nametoindex(netdev->name);
if (!netdev->ifindex)
return log_error_errno(-1, errno, "Failed to retrieve ifindex for network device with name %s", netdev->name);
return 0; return __instantiate_ns_common(netdev);
} }
static int instantiate_ns_macvlan(struct lxc_netdev *netdev) static int instantiate_ns_macvlan(struct lxc_netdev *netdev)
{ {
return __instantiate_common(netdev); return __instantiate_ns_common(netdev);
} }
static int instantiate_ns_ipvlan(struct lxc_netdev *netdev) static int instantiate_ns_ipvlan(struct lxc_netdev *netdev)
{ {
return __instantiate_common(netdev); return __instantiate_ns_common(netdev);
} }
static int instantiate_ns_vlan(struct lxc_netdev *netdev) static int instantiate_ns_vlan(struct lxc_netdev *netdev)
{ {
return __instantiate_common(netdev); return __instantiate_ns_common(netdev);
} }
static int instantiate_ns_phys(struct lxc_netdev *netdev) static int instantiate_ns_phys(struct lxc_netdev *netdev)
{ {
return __instantiate_common(netdev); return __instantiate_ns_common(netdev);
} }
static int instantiate_ns_empty(struct lxc_netdev *netdev) static int instantiate_ns_empty(struct lxc_netdev *netdev)
@ -1749,7 +1736,7 @@ int lxc_veth_create(const char *name1, const char *name2, pid_t pid, unsigned in
} }
/* TODO: merge with lxc_macvlan_create */ /* TODO: merge with lxc_macvlan_create */
int lxc_vlan_create(const char *master, const char *name, unsigned short vlanid) int lxc_vlan_create(const char *parent, const char *name, unsigned short vlanid)
{ {
call_cleaner(nlmsg_free) struct nlmsg *answer = NULL, *nlmsg = NULL; call_cleaner(nlmsg_free) struct nlmsg *answer = NULL, *nlmsg = NULL;
struct nl_handler nlh; struct nl_handler nlh;
@ -1762,7 +1749,7 @@ int lxc_vlan_create(const char *master, const char *name, unsigned short vlanid)
if (err) if (err)
return err; return err;
len = strlen(master); len = strlen(parent);
if (len == 1 || len >= IFNAMSIZ) if (len == 1 || len >= IFNAMSIZ)
return ret_errno(EINVAL); return ret_errno(EINVAL);
@ -1778,7 +1765,7 @@ int lxc_vlan_create(const char *master, const char *name, unsigned short vlanid)
if (!answer) if (!answer)
return ret_errno(ENOMEM); return ret_errno(ENOMEM);
lindex = if_nametoindex(master); lindex = if_nametoindex(parent);
if (!lindex) if (!lindex)
return ret_errno(EINVAL); return ret_errno(EINVAL);
@ -1817,7 +1804,7 @@ int lxc_vlan_create(const char *master, const char *name, unsigned short vlanid)
return netlink_transaction(nlh_ptr, nlmsg, answer); return netlink_transaction(nlh_ptr, nlmsg, answer);
} }
int lxc_macvlan_create(const char *master, const char *name, int mode) int lxc_macvlan_create(const char *parent, const char *name, int mode)
{ {
call_cleaner(nlmsg_free) struct nlmsg *answer = NULL, *nlmsg = NULL; call_cleaner(nlmsg_free) struct nlmsg *answer = NULL, *nlmsg = NULL;
struct nl_handler nlh; struct nl_handler nlh;
@ -1830,7 +1817,7 @@ int lxc_macvlan_create(const char *master, const char *name, int mode)
if (err) if (err)
return err; return err;
len = strlen(master); len = strlen(parent);
if (len == 1 || len >= IFNAMSIZ) if (len == 1 || len >= IFNAMSIZ)
return ret_errno(EINVAL); return ret_errno(EINVAL);
@ -1846,7 +1833,7 @@ int lxc_macvlan_create(const char *master, const char *name, int mode)
if (!answer) if (!answer)
return ret_errno(ENOMEM); return ret_errno(ENOMEM);
index = if_nametoindex(master); index = if_nametoindex(parent);
if (!index) if (!index)
return ret_errno(EINVAL); return ret_errno(EINVAL);
@ -2847,6 +2834,9 @@ bool lxc_delete_network_unpriv(struct lxc_handler *handler)
netdev->ifindex, netdev->link); netdev->ifindex, netdev->link);
ret = netdev_deconf[netdev->type](handler, netdev); ret = netdev_deconf[netdev->type](handler, netdev);
if (ret < 0)
WARN("Failed to deconfigure interface with index %d and initial name \"%s\"",
netdev->ifindex, netdev->link);
goto clear_ifindices; goto clear_ifindices;
} }
@ -3120,9 +3110,9 @@ int lxc_network_move_created_netdev_priv(struct lxc_handler *handler)
physname = is_wlan(netdev->link); physname = is_wlan(netdev->link);
if (physname) if (physname)
ret = lxc_netdev_move_wlan(physname, netdev->link, pid, netdev->name); ret = lxc_netdev_move_wlan(physname, netdev->link, pid, NULL);
else else
ret = lxc_netdev_move_by_index(netdev->ifindex, pid, netdev->name); ret = lxc_netdev_move_by_index(netdev->ifindex, pid, NULL);
if (ret) if (ret)
return log_error_errno(-1, -ret, "Failed to move network device \"%s\" with ifindex %d to network namespace %d", return log_error_errno(-1, -ret, "Failed to move network device \"%s\" with ifindex %d to network namespace %d",
netdev->created_name, netdev->created_name,
@ -3229,6 +3219,9 @@ bool lxc_delete_network_priv(struct lxc_handler *handler)
} }
ret = netdev_deconf[netdev->type](handler, netdev); ret = netdev_deconf[netdev->type](handler, netdev);
if (ret < 0)
WARN("Failed to deconfigure interface with index %d and initial name \"%s\"",
netdev->ifindex, netdev->link);
goto clear_ifindices; goto clear_ifindices;
} }


@ -205,8 +205,8 @@ extern int lxc_netdev_set_mtu(const char *name, int mtu);
/* Create virtual network devices. */ /* Create virtual network devices. */
extern int lxc_veth_create(const char *name1, const char *name2, pid_t pid, extern int lxc_veth_create(const char *name1, const char *name2, pid_t pid,
unsigned int mtu); unsigned int mtu);
extern int lxc_macvlan_create(const char *master, const char *name, int mode); extern int lxc_macvlan_create(const char *parent, const char *name, int mode);
extern int lxc_vlan_create(const char *master, const char *name, extern int lxc_vlan_create(const char *parent, const char *name,
unsigned short vid); unsigned short vid);
/* Set ip address. */ /* Set ip address. */


@ -13,15 +13,12 @@
#include "compiler.h" #include "compiler.h"
#include "config.h" #include "config.h"
#include "log.h"
#include "macro.h" #include "macro.h"
#include "raw_syscalls.h" #include "process_utils.h"
#include "syscall_numbers.h" #include "syscall_numbers.h"
int lxc_raw_execveat(int dirfd, const char *pathname, char *const argv[], lxc_log_define(process_utils, lxc);
char *const envp[], int flags)
{
return syscall(__NR_execveat, dirfd, pathname, argv, envp, flags);
}
/* /*
* This is based on raw_clone in systemd but adapted to our needs. This uses * This is based on raw_clone in systemd but adapted to our needs. This uses
@ -31,16 +28,8 @@ int lxc_raw_execveat(int dirfd, const char *pathname, char *const argv[],
* The nice thing about this is that we get fork() behavior. That is * The nice thing about this is that we get fork() behavior. That is
* lxc_raw_clone() returns 0 in the child and the child pid in the parent. * lxc_raw_clone() returns 0 in the child and the child pid in the parent.
*/ */
__returns_twice pid_t lxc_raw_clone(unsigned long flags, int *pidfd) __returns_twice static pid_t __lxc_raw_clone(unsigned long flags, int *pidfd)
{ {
/*
* These flags don't interest at all so we don't jump through any hoops
* of retrieving them and passing them to the kernel.
*/
errno = EINVAL;
if ((flags & (CLONE_VM | CLONE_PARENT_SETTID | CLONE_CHILD_SETTID |
CLONE_CHILD_CLEARTID | CLONE_SETTLS)))
return -EINVAL;
#if defined(__s390x__) || defined(__s390__) || defined(__CRIS__) #if defined(__s390x__) || defined(__s390__) || defined(__CRIS__)
/* On s390/s390x and cris the order of the first and second arguments /* On s390/s390x and cris the order of the first and second arguments
@ -100,6 +89,31 @@ __returns_twice pid_t lxc_raw_clone(unsigned long flags, int *pidfd)
#endif #endif
} }
__returns_twice pid_t lxc_raw_clone(unsigned long flags, int *pidfd)
{
pid_t pid;
struct lxc_clone_args args = {
.flags = flags,
.pidfd = ptr_to_u64(pidfd),
};
if (flags & (CLONE_VM | CLONE_PARENT_SETTID | CLONE_CHILD_SETTID |
CLONE_CHILD_CLEARTID | CLONE_SETTLS))
return ret_errno(EINVAL);
/* On CLONE_PARENT we inherit the parent's exit signal. */
if (!(flags & CLONE_PARENT))
args.exit_signal = SIGCHLD;
pid = lxc_clone3(&args, CLONE_ARGS_SIZE_VER0);
if (pid < 0 && errno == ENOSYS) {
SYSTRACE("Falling back to legacy clone");
return __lxc_raw_clone(flags, pidfd);
}
return pid;
}
pid_t lxc_raw_clone_cb(int (*fn)(void *), void *args, unsigned long flags, pid_t lxc_raw_clone_cb(int (*fn)(void *), void *args, unsigned long flags,
int *pidfd) int *pidfd)
{ {
@ -124,3 +138,30 @@ int lxc_raw_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
{ {
return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags); return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
} }
/*
* Let's use the "standard stack limit" (i.e. glibc thread size default) for
* stack sizes: 8MB.
*/
#define __LXC_STACK_SIZE (8 * 1024 * 1024)
pid_t lxc_clone(int (*fn)(void *), void *arg, int flags, int *pidfd)
{
pid_t ret;
void *stack;
stack = malloc(__LXC_STACK_SIZE);
if (!stack) {
SYSERROR("Failed to allocate clone stack");
return -ENOMEM;
}
#ifdef __ia64__
ret = __clone2(fn, stack, __LXC_STACK_SIZE, flags | SIGCHLD, arg, pidfd);
#else
ret = clone(fn, stack + __LXC_STACK_SIZE, flags | SIGCHLD, arg, pidfd);
#endif
if (ret < 0)
SYSERROR("Failed to clone (%#x)", flags);
return ret;
}
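
Note on the new `lxc_raw_clone()` above: it tries `clone3()` first and falls back to the legacy path only when the kernel reports `ENOSYS`. The following is a minimal sketch of that try-then-fall-back shape, under stated assumptions. It falls back to plain `fork()` rather than to a hand-written legacy `clone()` call, `clone3_or_fork` and `run_child` are hypothetical names, and the fallback syscall number 435 is an assumption that holds for x86 and arm64 but may differ on other architectures.

```c
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef __NR_clone3
#define __NR_clone3 435 /* assumption for old headers; x86 and arm64 value */
#endif

/* First published layout of the clone3() argument struct (64 bytes). */
struct clone_args_v0 {
	uint64_t flags;
	uint64_t pidfd;
	uint64_t child_tid;
	uint64_t parent_tid;
	uint64_t exit_signal;
	uint64_t stack;
	uint64_t stack_size;
	uint64_t tls;
};

/* Hypothetical helper with fork()-like return values. */
static pid_t clone3_or_fork(void)
{
	struct clone_args_v0 args = { .exit_signal = SIGCHLD };
	long pid;

	pid = syscall(__NR_clone3, &args, sizeof(args));
	if (pid < 0 && errno == ENOSYS)
		return fork(); /* clone3() is not available: use the older interface */

	return (pid_t)pid;
}

/* Run a trivial child and return its exit status, or -1 on failure. */
static int run_child(int code)
{
	int status;
	pid_t pid;

	pid = clone3_or_fork();
	if (pid < 0)
		return -1;

	if (pid == 0)
		_exit(code); /* child: exit immediately without touching libc state */

	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;

	return WEXITSTATUS(status);
}
```

Setting `exit_signal` to `SIGCHLD` mirrors the diff and lets the parent reap the child with `waitpid()` as usual. Other errors such as `EINVAL` are deliberately not retried, since they point at bad arguments rather than a missing syscall.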

src/lxc/process_utils.h Normal file

@ -0,0 +1,290 @@
/* SPDX-License-Identifier: LGPL-2.1+ */
#ifndef __LXC_PROCESS_UTILS_H
#define __LXC_PROCESS_UTILS_H
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif
#include <linux/sched.h>
#include <sched.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>
#include "compiler.h"
#include "config.h"
#include "syscall_numbers.h"
#ifndef CSIGNAL
#define CSIGNAL 0x000000ff /* signal mask to be sent at exit */
#endif
#ifndef CLONE_VM
#define CLONE_VM 0x00000100 /* set if VM shared between processes */
#endif
#ifndef CLONE_FS
#define CLONE_FS 0x00000200 /* set if fs info shared between processes */
#endif
#ifndef CLONE_FILES
#define CLONE_FILES 0x00000400 /* set if open files shared between processes */
#endif
#ifndef CLONE_SIGHAND
#define CLONE_SIGHAND 0x00000800 /* set if signal handlers and blocked signals shared */
#endif
#ifndef CLONE_PIDFD
#define CLONE_PIDFD 0x00001000 /* set if a pidfd should be placed in parent */
#endif
#ifndef CLONE_PTRACE
#define CLONE_PTRACE 0x00002000 /* set if we want to let tracing continue on the child too */
#endif
#ifndef CLONE_VFORK
#define CLONE_VFORK 0x00004000 /* set if the parent wants the child to wake it up on mm_release */
#endif
#ifndef CLONE_PARENT
#define CLONE_PARENT 0x00008000 /* set if we want to have the same parent as the cloner */
#endif
#ifndef CLONE_THREAD
#define CLONE_THREAD 0x00010000 /* Same thread group? */
#endif
#ifndef CLONE_NEWNS
#define CLONE_NEWNS 0x00020000 /* New mount namespace group */
#endif
#ifndef CLONE_SYSVSEM
#define CLONE_SYSVSEM 0x00040000 /* share system V SEM_UNDO semantics */
#endif
#ifndef CLONE_SETTLS
#define CLONE_SETTLS 0x00080000 /* create a new TLS for the child */
#endif
#ifndef CLONE_PARENT_SETTID
#define CLONE_PARENT_SETTID 0x00100000 /* set the TID in the parent */
#endif
#ifndef CLONE_CHILD_CLEARTID
#define CLONE_CHILD_CLEARTID 0x00200000 /* clear the TID in the child */
#endif
#ifndef CLONE_DETACHED
#define CLONE_DETACHED 0x00400000 /* Unused, ignored */
#endif
#ifndef CLONE_UNTRACED
#define CLONE_UNTRACED 0x00800000 /* set if the tracing process can't force CLONE_PTRACE on this clone */
#endif
#ifndef CLONE_CHILD_SETTID
#define CLONE_CHILD_SETTID 0x01000000 /* set the TID in the child */
#endif
#ifndef CLONE_NEWCGROUP
#define CLONE_NEWCGROUP 0x02000000 /* New cgroup namespace */
#endif
#ifndef CLONE_NEWUTS
#define CLONE_NEWUTS 0x04000000 /* New utsname namespace */
#endif
#ifndef CLONE_NEWIPC
#define CLONE_NEWIPC 0x08000000 /* New ipc namespace */
#endif
#ifndef CLONE_NEWUSER
#define CLONE_NEWUSER 0x10000000 /* New user namespace */
#endif
#ifndef CLONE_NEWPID
#define CLONE_NEWPID 0x20000000 /* New pid namespace */
#endif
#ifndef CLONE_NEWNET
#define CLONE_NEWNET 0x40000000 /* New network namespace */
#endif
#ifndef CLONE_IO
#define CLONE_IO 0x80000000 /* Clone io context */
#endif
/* Flags for the clone3() syscall. */
#ifndef CLONE_CLEAR_SIGHAND
#define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */
#endif
#ifndef CLONE_INTO_CGROUP
#define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given the right permissions. */
#endif
/*
* cloning flags intersect with CSIGNAL so can be used with unshare and clone3
* syscalls only:
*/
#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080 /* New time namespace */
#endif
/* waitid */
#ifndef P_PIDFD
#define P_PIDFD 3
#endif
#ifndef CLONE_ARGS_SIZE_VER0
#define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
#endif
#ifndef CLONE_ARGS_SIZE_VER1
#define CLONE_ARGS_SIZE_VER1 80 /* sizeof second published struct */
#endif
#ifndef CLONE_ARGS_SIZE_VER2
#define CLONE_ARGS_SIZE_VER2 88 /* sizeof third published struct */
#endif
#ifndef ptr_to_u64
#define ptr_to_u64(ptr) ((__u64)((uintptr_t)(ptr)))
#endif
#ifndef u64_to_ptr
#define u64_to_ptr(x) ((void *)(uintptr_t)x)
#endif
struct lxc_clone_args {
__aligned_u64 flags;
__aligned_u64 pidfd;
__aligned_u64 child_tid;
__aligned_u64 parent_tid;
__aligned_u64 exit_signal;
__aligned_u64 stack;
__aligned_u64 stack_size;
__aligned_u64 tls;
__aligned_u64 set_tid;
__aligned_u64 set_tid_size;
__aligned_u64 cgroup;
};
__returns_twice static inline pid_t lxc_clone3(struct lxc_clone_args *args, size_t size)
{
return syscall(__NR_clone3, args, size);
}
#if defined(__ia64__)
int __clone2(int (*__fn)(void *__arg), void *__child_stack_base,
size_t __child_stack_size, int __flags, void *__arg, ...);
#else
int clone(int (*fn)(void *), void *child_stack, int flags, void *arg, ...
/* pid_t *ptid, struct user_desc *tls, pid_t *ctid */);
#endif
/**
* lxc_clone() - create a new process
*
* - allocate stack:
* This function allocates a new stack the size of a page and passes it to the
* kernel.
*
* - supports all CLONE_* flags:
* This function supports all CLONE_* flags. If in doubt or not sufficiently
* familiar with process creation in the kernel and interactions with libcs
* this function should be used.
*
* - pthread_atfork() handlers depending on libc:
* Whether this function runs pthread_atfork() handlers depends on the
* corresponding libc wrapper. glibc currently does not run pthread_atfork()
* handlers but does not guarantee that it never will. Other libcs might or
* might not run pthread_atfork() handlers. If you require guarantees please
* refer to the lxc_raw_clone*() functions in process_utils.{c,h}.
*
* - should call lxc_raw_getpid():
* The child should use lxc_raw_getpid() to retrieve its pid.
*/
extern pid_t lxc_clone(int (*fn)(void *), void *arg, int flags, int *pidfd);
/*
* lxc_raw_clone() - create a new process
*
* - fork() behavior:
* This function returns 0 in the child and > 0 in the parent.
*
* - copy-on-write:
* This function does not allocate a new stack and relies on copy-on-write
* semantics.
*
* - supports subset of CLONE_* flags:
* lxc_raw_clone() intentionally only supports a subset of the flags available
* to the actual system call. Please refer to the implementation to see which
* flags cannot be used. Also, please don't assume that just because a flag isn't
* explicitly checked for as being unsupported that it is supported. If in
* doubt or not sufficiently familiar with process creation in the kernel and
* interactions with libcs, lxc_clone() should be used instead.
*
* - no pthread_atfork() handlers:
* This function circumvents - as much as this is possible - any libc
* wrappers and thus does not run any pthread_atfork() handlers. Make sure
* that this is safe to do in the context you are trying to call this
* function.
*
* - must call lxc_raw_getpid():
* The child must use lxc_raw_getpid() to retrieve its pid.
*/
extern pid_t lxc_raw_clone(unsigned long flags, int *pidfd);
/*
* lxc_raw_clone_cb() - create a new process
*
* - non-fork() behavior:
* This function returns the pid of the child or -1 on error. Pass in a callback
* function via the "fn" argument that gets executed in the child process.
* The "args" argument is passed to "fn".
*
* All other comments that apply to lxc_raw_clone() apply to lxc_raw_clone_cb()
* as well.
*/
extern pid_t lxc_raw_clone_cb(int (*fn)(void *), void *args,
unsigned long flags, int *pidfd);
#ifndef HAVE_EXECVEAT
static inline int execveat(int dirfd, const char *pathname, char *const argv[],
char *const envp[], int flags)
{
return syscall(__NR_execveat, dirfd, pathname, argv, envp, flags);
}
#else
extern int execveat(int dirfd, const char *pathname, char *const argv[],
char *const envp[], int flags);
#endif
/*
* Because of older glibc's pid cache (up to 2.25) whenever clone() is called
* the child must retrieve its own pid via lxc_raw_getpid().
*/
static inline pid_t lxc_raw_getpid(void)
{
return (pid_t)syscall(SYS_getpid);
}
static inline pid_t lxc_raw_gettid(void)
{
#if __NR_gettid > 0
return syscall(__NR_gettid);
#else
return lxc_raw_getpid();
#endif
}
extern int lxc_raw_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
unsigned int flags);
#endif /* __LXC_PROCESS_UTILS_H */
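The comment on lxc_raw_getpid() above explains that a child created via clone() has to ask the kernel for its pid instead of trusting an older glibc's cached value. A minimal sketch of that idea, assuming a Linux system; `example_raw_getpid()` is a hypothetical name used only for illustration and is not part of LXC:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper mirroring the header's lxc_raw_getpid(): issue the
 * getpid system call directly so that no libc-level pid cache is consulted.
 */
static inline pid_t example_raw_getpid(void)
{
	pid_t pid = (pid_t)syscall(SYS_getpid);

	/* The getpid system call cannot fail, so the result is positive. */
	assert(pid > 0);
	return pid;
}
```

In a process that was not created through a raw clone() the result should simply match what getpid() reports.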


@ -1,94 +0,0 @@
/* SPDX-License-Identifier: LGPL-2.1+ */
#ifndef __LXC_RAW_SYSCALL_H
#define __LXC_RAW_SYSCALL_H
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif
#include <sched.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>
/* clone */
#ifndef CLONE_PIDFD
#define CLONE_PIDFD 0x00001000
#endif
/* waitid */
#ifndef P_PIDFD
#define P_PIDFD 3
#endif
/*
* lxc_raw_clone() - create a new process
*
* - fork() behavior:
* This function returns 0 in the child and > 0 in the parent.
*
* - copy-on-write:
* This function does not allocate a new stack and relies on copy-on-write
* semantics.
*
* - supports subset of CLONE_* flags:
* lxc_raw_clone() intentionally only supports a subset of the flags available
* to the actual system call. Please refer to the implementation to see which
* flags cannot be used. Also, please don't assume that just because a flag isn't
* explicitly checked for as being unsupported that it is supported. If in
* doubt or not sufficiently familiar with process creation in the kernel and
* interactions with libcs, lxc_clone() should be used instead.
*
* - no pthread_atfork() handlers:
* This function circumvents - as much as this is possible - any libc
* wrappers and thus does not run any pthread_atfork() handlers. Make sure
* that this is safe to do in the context you are trying to call this
* function.
*
* - must call lxc_raw_getpid():
* The child must use lxc_raw_getpid() to retrieve its pid.
*/
extern pid_t lxc_raw_clone(unsigned long flags, int *pidfd);
/*
* lxc_raw_clone_cb() - create a new process
*
* - non-fork() behavior:
* This function returns the pid of the child or -1 on error. Pass in a callback
* function via the "fn" argument that gets executed in the child process.
* The "args" argument is passed to "fn".
*
* All other comments that apply to lxc_raw_clone() apply to lxc_raw_clone_cb()
* as well.
*/
extern pid_t lxc_raw_clone_cb(int (*fn)(void *), void *args,
unsigned long flags, int *pidfd);
extern int lxc_raw_execveat(int dirfd, const char *pathname, char *const argv[],
char *const envp[], int flags);
/*
* Because of older glibc's pid cache (up to 2.25) whenever clone() is called
* the child must retrieve its own pid via lxc_raw_getpid().
*/
static inline pid_t lxc_raw_getpid(void)
{
return (pid_t)syscall(SYS_getpid);
}
static inline pid_t lxc_raw_gettid(void)
{
#if __NR_gettid > 0
return syscall(__NR_gettid);
#else
return lxc_raw_getpid();
#endif
}
extern int lxc_raw_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
unsigned int flags);
#endif /* __LXC_RAW_SYSCALL_H */


@ -13,7 +13,7 @@
#include "file_utils.h" #include "file_utils.h"
#include "macro.h" #include "macro.h"
#include "memory_utils.h" #include "memory_utils.h"
#include "raw_syscalls.h" #include "process_utils.h"
#include "string_utils.h" #include "string_utils.h"
#include "syscall_wrappers.h" #include "syscall_wrappers.h"
@ -143,7 +143,7 @@ static void lxc_rexec_as_memfd(char **argv, char **envp, const char *memfd_name)
if (fcntl(memfd, F_ADD_SEALS, LXC_MEMFD_REXEC_SEALS)) if (fcntl(memfd, F_ADD_SEALS, LXC_MEMFD_REXEC_SEALS))
return; return;
execfd = memfd; execfd = move_fd(memfd);
} else { } else {
char procfd[LXC_PROC_PID_FD_LEN]; char procfd[LXC_PROC_PID_FD_LEN];
@ -169,13 +169,12 @@ extern char **environ;
int lxc_rexec(const char *memfd_name) int lxc_rexec(const char *memfd_name)
{ {
__do_free_string_list char **argv = NULL;
int ret; int ret;
char **argv = NULL;
ret = is_memfd(); ret = is_memfd();
if (ret < 0 && ret == -ENOTRECOVERABLE) { if (ret < 0 && ret == -ENOTRECOVERABLE) {
fprintf(stderr, fprintf(stderr, "%s - Failed to determine whether this is a memfd\n",
"%s - Failed to determine whether this is a memfd\n",
strerror(errno)); strerror(errno));
return -1; return -1;
} else if (ret > 0) { } else if (ret > 0) {
@ -184,8 +183,7 @@ int lxc_rexec(const char *memfd_name)
ret = parse_argv(&argv); ret = parse_argv(&argv);
if (ret < 0) { if (ret < 0) {
fprintf(stderr, fprintf(stderr, "%s - Failed to parse command line parameters\n",
"%s - Failed to parse command line parameters\n",
strerror(errno)); strerror(errno));
return -1; return -1;
} }
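The hunk above replaces `execfd = memfd` with `execfd = move_fd(memfd)`. The definition of move_fd() is not part of this diff, so the following is only a sketch of the ownership-transfer idea it suggests, under the assumption that the source variable is reset to an invalid value so that a later cleanup does not close the descriptor a second time. `example_move_fd()` is a hypothetical function, not LXC's macro:

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for the move_fd() used in the diff: return the
 * descriptor to the caller and mark the source variable as empty.
 */
static inline int example_move_fd(int *fd)
{
	int moved;

	assert(fd);
	moved = *fd;
	*fd = -EBADF; /* the old variable no longer owns the descriptor */
	return moved;
}
```

After the call only the receiving variable refers to the descriptor, which is what makes it safe to combine with automatic cleanup of the source variable.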


@ -1354,6 +1354,7 @@ int seccomp_notify_handler(int fd, uint32_t events, void *data,
char *cookie = conf->seccomp.notifier.cookie; char *cookie = conf->seccomp.notifier.cookie;
uint64_t req_id; uint64_t req_id;
memset(req, 0, sizeof(*req));
ret = seccomp_notify_receive(fd, req); ret = seccomp_notify_receive(fd, req);
if (ret) { if (ret) {
SYSERROR("Failed to read seccomp notification"); SYSERROR("Failed to read seccomp notification");
@ -1478,10 +1479,8 @@ retry:
SYSERROR("Failed to send seccomp notification"); SYSERROR("Failed to send seccomp notification");
out: out:
return 0;
#else
return -ENOSYS;
#endif #endif
return LXC_MAINLOOP_CONTINUE;
} }
void seccomp_conf_init(struct lxc_conf *conf) void seccomp_conf_init(struct lxc_conf *conf)


@ -47,7 +47,7 @@
#include "monitor.h" #include "monitor.h"
#include "namespace.h" #include "namespace.h"
#include "network.h" #include "network.h"
#include "raw_syscalls.h" #include "process_utils.h"
#include "start.h" #include "start.h"
#include "storage/storage.h" #include "storage/storage.h"
#include "storage/storage_utils.h" #include "storage/storage_utils.h"
@ -212,6 +212,13 @@ int lxc_check_inherited(struct lxc_conf *conf, bool closeall,
if (conf && conf->close_all_fds) if (conf && conf->close_all_fds)
closeall = true; closeall = true;
/*
* Disable syslog at this point so that the logging functions cannot
* open a new fd, which would make the check_inherited function
* enter an infinite loop.
*/
lxc_log_syslog_disable();
restart: restart:
dir = opendir("/proc/self/fd"); dir = opendir("/proc/self/fd");
if (!dir) if (!dir)
@ -272,21 +279,24 @@ restart:
#endif #endif
if (closeall) { if (closeall) {
close(fd); if (close(fd))
closedir(dir); SYSINFO("Closed inherited fd %d", fd);
else
INFO("Closed inherited fd %d", fd); INFO("Closed inherited fd %d", fd);
closedir(dir);
goto restart; goto restart;
} }
WARN("Inherited fd %d", fd); WARN("Inherited fd %d", fd);
} }
closedir(dir);
/* Only enable syslog at this point to avoid the above logging function /*
* to open a new fd and make the check_inherited function enter an * Only enable syslog at this point to avoid the above logging
* infinite loop. * function to open a new fd and make the check_inherited function
* enter an infinite loop.
*/ */
lxc_log_enable_syslog(); lxc_log_syslog_enable();
closedir(dir); /* cannot fail */
return 0; return 0;
} }
@ -335,7 +345,7 @@ static int signal_handler(int fd, uint32_t events, void *data,
return log_error(LXC_MAINLOOP_ERROR, "Failed to read signal info from signal file descriptor %d", fd); return log_error(LXC_MAINLOOP_ERROR, "Failed to read signal info from signal file descriptor %d", fd);
if (ret != sizeof(siginfo)) if (ret != sizeof(siginfo))
return log_error(-EINVAL, "Unexpected size for struct signalfd_siginfo"); return log_error(LXC_MAINLOOP_ERROR, "Unexpected size for struct signalfd_siginfo");
/* Check whether init is running. */ /* Check whether init is running. */
info.si_pid = 0; info.si_pid = 0;
@ -605,32 +615,7 @@ out_sigfd:
return ret; return ret;
} }
void lxc_zero_handler(struct lxc_handler *handler) void lxc_put_handler(struct lxc_handler *handler)
{
memset(handler, 0, sizeof(struct lxc_handler));
handler->state = STOPPED;
handler->pinfd = -EBADF;
handler->pidfd = -EBADF;
handler->sigfd = -EBADF;
for (int i = 0; i < LXC_NS_MAX; i++)
handler->nsfd[i] = -EBADF;
handler->data_sock[0] = -EBADF;
handler->data_sock[1] = -EBADF;
handler->state_socket_pair[0] = -EBADF;
handler->state_socket_pair[1] = -EBADF;
handler->sync_sock[0] = -EBADF;
handler->sync_sock[1] = -EBADF;
}
void lxc_free_handler(struct lxc_handler *handler)
{ {
close_prot_errno_disarm(handler->pinfd); close_prot_errno_disarm(handler->pinfd);
close_prot_errno_disarm(handler->pidfd); close_prot_errno_disarm(handler->pidfd);
@ -642,22 +627,27 @@ void lxc_free_handler(struct lxc_handler *handler)
close_prot_errno_disarm(handler->state_socket_pair[0]); close_prot_errno_disarm(handler->state_socket_pair[0]);
close_prot_errno_disarm(handler->state_socket_pair[1]); close_prot_errno_disarm(handler->state_socket_pair[1]);
cgroup_exit(handler->cgroup_ops); cgroup_exit(handler->cgroup_ops);
handler->conf = NULL; if (handler->conf && handler->conf->reboot == REBOOT_NONE)
free_disarm(handler); free_disarm(handler);
else
handler->conf = NULL;
} }
struct lxc_handler *lxc_init_handler(const char *name, struct lxc_conf *conf, struct lxc_handler *lxc_init_handler(struct lxc_handler *old,
const char *name, struct lxc_conf *conf,
const char *lxcpath, bool daemonize) const char *lxcpath, bool daemonize)
{ {
int nr_keep_fds = 0;
int ret; int ret;
struct lxc_handler *handler; struct lxc_handler *handler;
handler = malloc(sizeof(*handler)); if (!old)
handler = zalloc(sizeof(*handler));
else
handler = old;
if (!handler) if (!handler)
return NULL; return NULL;
memset(handler, 0, sizeof(*handler));
/* Note that am_guest_unpriv() checks the effective uid. We /* Note that am_guest_unpriv() checks the effective uid. We
* probably don't care if we are real root only if we are running * probably don't care if we are real root only if we are running
* as root so this should be fine. * as root so this should be fine.
@ -701,6 +691,8 @@ struct lxc_handler *lxc_init_handler(const char *name, struct lxc_conf *conf,
TRACE("Created anonymous pair {%d,%d} of unix sockets", TRACE("Created anonymous pair {%d,%d} of unix sockets",
handler->state_socket_pair[0], handler->state_socket_pair[0],
handler->state_socket_pair[1]); handler->state_socket_pair[1]);
handler->keep_fds[nr_keep_fds++] = handler->state_socket_pair[0];
handler->keep_fds[nr_keep_fds++] = handler->state_socket_pair[1];
} }
if (handler->conf->reboot == REBOOT_NONE) { if (handler->conf->reboot == REBOOT_NONE) {
@ -709,6 +701,7 @@ struct lxc_handler *lxc_init_handler(const char *name, struct lxc_conf *conf,
ERROR("Failed to set up command socket"); ERROR("Failed to set up command socket");
goto on_error; goto on_error;
} }
handler->keep_fds[nr_keep_fds++] = handler->conf->maincmd_fd;
} }
TRACE("Unix domain socket %d for command server is ready", TRACE("Unix domain socket %d for command server is ready",
@ -717,7 +710,7 @@ struct lxc_handler *lxc_init_handler(const char *name, struct lxc_conf *conf,
return handler; return handler;
on_error: on_error:
lxc_free_handler(handler); lxc_put_handler(handler);
return NULL; return NULL;
} }
@ -1008,7 +1001,7 @@ void lxc_end(struct lxc_handler *handler)
if (handler->conf->ephemeral == 1 && handler->conf->reboot != REBOOT_REQ) if (handler->conf->ephemeral == 1 && handler->conf->reboot != REBOOT_REQ)
lxc_destroy_container_on_signal(handler, name); lxc_destroy_container_on_signal(handler, name);
lxc_free_handler(handler); lxc_put_handler(handler);
} }
void lxc_abort(struct lxc_handler *handler) void lxc_abort(struct lxc_handler *handler)
@ -1039,14 +1032,13 @@ static int do_start(void *data)
struct lxc_handler *handler = data; struct lxc_handler *handler = data;
__lxc_unused __do_close int data_sock0 = handler->data_sock[0], __lxc_unused __do_close int data_sock0 = handler->data_sock[0],
data_sock1 = handler->data_sock[1]; data_sock1 = handler->data_sock[1];
__do_close int status_fd = -EBADF; __do_close int devnull_fd = -EBADF, status_fd = -EBADF;
int ret; int ret;
uid_t new_uid; uid_t new_uid;
gid_t new_gid; gid_t new_gid;
struct lxc_list *iterator; struct lxc_list *iterator;
uid_t nsuid = 0; uid_t nsuid = 0;
gid_t nsgid = 0; gid_t nsgid = 0;
int devnull_fd = -1;
lxc_sync_fini_parent(handler); lxc_sync_fini_parent(handler);
@ -1257,14 +1249,14 @@ static int do_start(void *data)
* setup on its console ie. the pty allocated in lxc_terminal_setup() so * setup on its console ie. the pty allocated in lxc_terminal_setup() so
* make sure that that pty is stdin,stdout,stderr. * make sure that that pty is stdin,stdout,stderr.
*/ */
if (handler->conf->console.slave >= 0) { if (handler->conf->console.pts >= 0) {
if (handler->daemonize || !handler->conf->is_execute) if (handler->daemonize || !handler->conf->is_execute)
ret = set_stdfds(handler->conf->console.slave); ret = set_stdfds(handler->conf->console.pts);
else else
ret = lxc_terminal_set_stdfds(handler->conf->console.slave); ret = lxc_terminal_set_stdfds(handler->conf->console.pts);
if (ret < 0) { if (ret < 0) {
ERROR("Failed to redirect std{in,out,err} to pty file descriptor %d", ERROR("Failed to redirect std{in,out,err} to pty file descriptor %d",
handler->conf->console.slave); handler->conf->console.pts);
goto out_warn_father; goto out_warn_father;
} }
} }
@ -1291,7 +1283,7 @@ static int do_start(void *data)
close_prot_errno_disarm(handler->sigfd); close_prot_errno_disarm(handler->sigfd);
if (handler->conf->console.slave < 0 && handler->daemonize) { if (handler->conf->console.pts < 0 && handler->daemonize) {
if (devnull_fd < 0) { if (devnull_fd < 0) {
devnull_fd = open_devnull(); devnull_fd = open_devnull();
if (devnull_fd < 0) if (devnull_fd < 0)
@ -1366,6 +1358,11 @@ static int do_start(void *data)
if (new_gid == nsgid) if (new_gid == nsgid)
new_gid = LXC_INVALID_GID; new_gid = LXC_INVALID_GID;
/* Make sure that the process's STDIO is correctly owned by the user that we are switching to */
ret = fix_stdio_permissions(new_uid);
if (ret)
WARN("Failed to adjust stdio permissions");
/* If we are in a new user namespace we already dropped all groups when /* If we are in a new user namespace we already dropped all groups when
* we switched to root in the new user namespace further above. Only * we switched to root in the new user namespace further above. Only
* drop groups if we can, so ensure that we have necessary privilege. * drop groups if we can, so ensure that we have necessary privilege.
@ -1396,20 +1393,20 @@ static int do_start(void *data)
} }
} }
/* After this call, we are in error because this ops should not return /*
* After this call, we are in error because this ops should not return
* as it execs. * as it execs.
*/ */
handler->ops->start(handler, handler->data); handler->ops->start(handler, handler->data);
out_warn_father: out_warn_father:
/* We want the parent to know something went wrong, so we return a /*
* We want the parent to know something went wrong, so we return a
* special error code. * special error code.
*/ */
lxc_sync_wake_parent(handler, LXC_SYNC_ERROR); lxc_sync_wake_parent(handler, LXC_SYNC_ERROR);
out_error: out_error:
close_prot_errno_disarm(devnull_fd);
return -1; return -1;
} }
@ -1438,9 +1435,9 @@ static int lxc_recv_ttys_from_child(struct lxc_handler *handler)
tty = &ttys->tty[i]; tty = &ttys->tty[i];
tty->busy = -1; tty->busy = -1;
tty->master = ttyfds[0]; tty->ptmx = ttyfds[0];
tty->slave = ttyfds[1]; tty->pts = ttyfds[1];
TRACE("Received pty with master fd %d and slave fd %d from child", tty->master, tty->slave); TRACE("Received pty with ptmx fd %d and pts fd %d from child", tty->ptmx, tty->pts);
} }
if (ret < 0) if (ret < 0)
@ -1703,6 +1700,7 @@ static int lxc_spawn(struct lxc_handler *handler)
} }
if (!cgroup_ops->payload_enter(cgroup_ops, handler)) { if (!cgroup_ops->payload_enter(cgroup_ops, handler)) {
ERROR("Failed to enter cgroups");
goto out_delete_net; goto out_delete_net;
} }
@ -1778,7 +1776,12 @@ static int lxc_spawn(struct lxc_handler *handler)
if (ret < 0) if (ret < 0)
goto out_delete_net; goto out_delete_net;
if (!cgroup_ops->setup_limits_legacy(cgroup_ops, handler->conf, true)) { /*
* With isolation the limiting devices cgroup was already set up, so
* only set up devices here if we have no namespace directory.
*/
if (!handler->conf->cgroup_meta.namespace_dir &&
!cgroup_ops->setup_limits_legacy(cgroup_ops, handler->conf, true)) {
ERROR("Failed to setup legacy device cgroup controller limits"); ERROR("Failed to setup legacy device cgroup controller limits");
goto out_delete_net; goto out_delete_net;
} }
@ -1932,7 +1935,7 @@ int __lxc_start(struct lxc_handler *handler, struct lxc_operations *ops,
} }
INFO("Unshared CLONE_NEWNS"); INFO("Unshared CLONE_NEWNS");
remount_all_slave(); turn_into_dependent_mounts();
ret = lxc_setup_rootfs_prepare_root(conf, name, lxcpath); ret = lxc_setup_rootfs_prepare_root(conf, name, lxcpath);
if (ret < 0) { if (ret < 0) {
ERROR("Error setting up rootfs mount as root before spawn"); ERROR("Error setting up rootfs mount as root before spawn");
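In do_start() above, `devnull_fd` changes from a plain `int` that is closed by hand at `out_error` to a `__do_close` variable. The macro's definition is not shown in this diff; the sketch below assumes it is built on the GCC/Clang `cleanup` variable attribute, and every `example_*` name in it is hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical cleanup helper: close a valid fd while preserving errno. */
static inline void example_close_fd(int *fd)
{
	assert(fd);
	if (*fd >= 0) {
		int saved_errno = errno;

		close(*fd);
		errno = saved_errno;
		*fd = -EBADF;
	}
}

/* GCC/Clang extension: call example_close_fd() when the variable leaves scope. */
#define example_do_close __attribute__((__cleanup__(example_close_fd)))

/*
 * Open /dev/null in a scoped variable and return the descriptor number that
 * was used, or a negative errno value. The descriptor itself is already
 * closed by the cleanup handler when the caller receives the number.
 */
static int example_scoped_devnull(void)
{
	example_do_close int devnull_fd = -EBADF;
	int used;

	devnull_fd = open("/dev/null", O_RDWR);
	if (devnull_fd < 0)
		return -errno;

	used = devnull_fd;
	return used;
}
```

With this pattern every return path releases the descriptor, which is why the explicit close before `return -1` can be dropped in the hunk above.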


@ -10,6 +10,7 @@
#include <sys/un.h> #include <sys/un.h>
#include "conf.h" #include "conf.h"
#include "macro.h"
#include "namespace.h" #include "namespace.h"
#include "state.h" #include "state.h"
@ -122,6 +123,9 @@ struct lxc_handler {
int exit_status; int exit_status;
struct cgroup_ops *cgroup_ops; struct cgroup_ops *cgroup_ops;
/* Internal fds that always need to stay open. */
int keep_fds[3];
}; };
struct execute_args { struct execute_args {
@ -143,12 +147,11 @@ extern int lxc_serve_state_clients(const char *name,
struct lxc_handler *handler, struct lxc_handler *handler,
lxc_state_t state); lxc_state_t state);
extern void lxc_abort(struct lxc_handler *handler); extern void lxc_abort(struct lxc_handler *handler);
extern struct lxc_handler *lxc_init_handler(const char *name, extern struct lxc_handler *lxc_init_handler(struct lxc_handler *old,
const char *name,
struct lxc_conf *conf, struct lxc_conf *conf,
const char *lxcpath, const char *lxcpath, bool daemonize);
bool daemonize); extern void lxc_put_handler(struct lxc_handler *handler);
extern void lxc_zero_handler(struct lxc_handler *handler);
extern void lxc_free_handler(struct lxc_handler *handler);
extern int lxc_init(const char *name, struct lxc_handler *handler); extern int lxc_init(const char *name, struct lxc_handler *handler);
extern void lxc_end(struct lxc_handler *handler); extern void lxc_end(struct lxc_handler *handler);
@ -161,6 +164,11 @@ extern void lxc_end(struct lxc_handler *handler);
*/ */
extern int lxc_check_inherited(struct lxc_conf *conf, bool closeall, extern int lxc_check_inherited(struct lxc_conf *conf, bool closeall,
int *fds_to_ignore, size_t len_fds); int *fds_to_ignore, size_t len_fds);
static inline int inherit_fds(struct lxc_handler *handler, bool closeall)
{
return lxc_check_inherited(handler->conf, closeall, handler->keep_fds,
ARRAY_SIZE(handler->keep_fds));
}
extern int __lxc_start(struct lxc_handler *, struct lxc_operations *, void *, extern int __lxc_start(struct lxc_handler *, struct lxc_operations *, void *,
const char *, bool, int *); const char *, bool, int *);
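The new inherit_fds() helper above passes `handler->keep_fds` and its element count to lxc_check_inherited() as the list of descriptors to ignore. How lxc_check_inherited() consults that list is not shown in this hunk; a minimal sketch of such a lookup, with hypothetical names `EXAMPLE_ARRAY_SIZE` and `example_fd_is_kept()`, could look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical equivalent of an ARRAY_SIZE() macro for true arrays. */
#define EXAMPLE_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Return true if fd appears in the list of descriptors that must stay open. */
static bool example_fd_is_kept(int fd, const int *keep_fds, size_t len)
{
	assert(keep_fds || len == 0);

	for (size_t i = 0; i < len; i++)
		if (keep_fds[i] == fd)
			return true;

	return false;
}
```

Passing the array together with its computed size keeps the caller correct if the number of slots in the array changes later.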


@ -192,8 +192,8 @@ bool btrfs_detect(const char *path)
int btrfs_mount(struct lxc_storage *bdev) int btrfs_mount(struct lxc_storage *bdev)
{ {
unsigned long mntflags; unsigned long mntflags = 0;
char *mntdata; char *mntdata = NULL;
const char *src; const char *src;
int ret; int ret;
@ -730,7 +730,7 @@ static bool do_remove_btrfs_children(struct my_btrfs_tree *tree, u64 root_id,
return true; return true;
} }
static int btrfs_recursive_destroy(const char *path) static int btrfs_lxc_rm_rf(const char *path)
{ {
u64 root_id; u64 root_id;
int fd; int fd;
@ -893,7 +893,7 @@ bool btrfs_try_remove_subvol(const char *path)
if (!btrfs_detect(path)) if (!btrfs_detect(path))
return false; return false;
return btrfs_recursive_destroy(path) == 0; return btrfs_lxc_rm_rf(path) == 0;
} }
int btrfs_destroy(struct lxc_storage *orig) int btrfs_destroy(struct lxc_storage *orig)
@ -902,7 +902,7 @@ int btrfs_destroy(struct lxc_storage *orig)
src = lxc_storage_get_path(orig->src, "btrfs"); src = lxc_storage_get_path(orig->src, "btrfs");
return btrfs_recursive_destroy(src); return btrfs_lxc_rm_rf(src);
} }
int btrfs_create(struct lxc_storage *bdev, const char *dest, const char *n, int btrfs_create(struct lxc_storage *bdev, const char *dest, const char *n,


@ -4,16 +4,20 @@
#define _GNU_SOURCE 1 #define _GNU_SOURCE 1
#endif #endif
#include <stdint.h> #include <stdint.h>
#include <stdio.h>
#include <string.h> #include <string.h>
#include "config.h" #include "config.h"
#include "log.h" #include "log.h"
#include "macro.h"
#include "memory_utils.h"
#include "storage.h" #include "storage.h"
#include "utils.h" #include "utils.h"
lxc_log_define(dir, lxc); lxc_log_define(dir, lxc);
/* For a simple directory bind mount, we substitute the old container name and /*
* For a simple directory bind mount, we substitute the old container name and
* paths for the new. * paths for the new.
*/ */
int dir_clonepaths(struct lxc_storage *orig, struct lxc_storage *new, int dir_clonepaths(struct lxc_storage *orig, struct lxc_storage *new,
@ -25,33 +29,26 @@ int dir_clonepaths(struct lxc_storage *orig, struct lxc_storage *new,
int ret; int ret;
size_t len; size_t len;
if (snap) { if (snap)
ERROR("Directories cannot be snapshotted"); return log_error_errno(-EINVAL, EINVAL, "Directories cannot be snapshotted");
return -1;
}
if (!orig->dest || !orig->src) if (!orig->dest || !orig->src)
return -1; return ret_errno(EINVAL);
len = strlen(lxcpath) + strlen(cname) + strlen("rootfs") + 4 + 3; len = STRLITERALLEN("dir:") + strlen(lxcpath) + STRLITERALLEN("/") +
strlen(cname) + STRLITERALLEN("/rootfs") + 1;
new->src = malloc(len); new->src = malloc(len);
if (!new->src) { if (!new->src)
ERROR("Failed to allocate memory"); return ret_errno(ENOMEM);
return -1;
}
ret = snprintf(new->src, len, "dir:%s/%s/rootfs", lxcpath, cname); ret = snprintf(new->src, len, "dir:%s/%s/rootfs", lxcpath, cname);
if (ret < 0 || (size_t)ret >= len) { if (ret < 0 || (size_t)ret >= len)
ERROR("Failed to create string"); return ret_errno(EIO);
return -1;
}
src_no_prefix = lxc_storage_get_path(new->src, new->type); src_no_prefix = lxc_storage_get_path(new->src, new->type);
new->dest = strdup(src_no_prefix); new->dest = strdup(src_no_prefix);
if (!new->dest) { if (!new->dest)
ERROR("Failed to duplicate string \"%s\"", new->src); return log_error_errno(-ENOMEM, ENOMEM, "Failed to duplicate string \"%s\"", new->src);
return -1;
}
TRACE("Created new path \"%s\" for dir storage driver", new->dest); TRACE("Created new path \"%s\" for dir storage driver", new->dest);
return 0; return 0;
@ -60,42 +57,37 @@ int dir_clonepaths(struct lxc_storage *orig, struct lxc_storage *new,
int dir_create(struct lxc_storage *bdev, const char *dest, const char *n, int dir_create(struct lxc_storage *bdev, const char *dest, const char *n,
struct bdev_specs *specs, const struct lxc_conf *conf) struct bdev_specs *specs, const struct lxc_conf *conf)
{ {
__do_free char *bdev_src = NULL, *bdev_dest = NULL;
int ret; int ret;
const char *src; const char *src;
size_t len; size_t len;
/* strlen("dir:") */ len = STRLITERALLEN("dir:");
len = 4;
if (specs && specs->dir) if (specs && specs->dir)
src = specs->dir; src = specs->dir;
else else
src = dest; src = dest;
len += strlen(src) + 1; len += strlen(src) + 1;
bdev->src = malloc(len); bdev_src = malloc(len);
if (!bdev->src) { if (!bdev_src)
ERROR("Failed to allocate memory"); return ret_errno(ENOMEM);
return -1;
}
ret = snprintf(bdev->src, len, "dir:%s", src); ret = snprintf(bdev_src, len, "dir:%s", src);
if (ret < 0 || (size_t)ret >= len) { if (ret < 0 || (size_t)ret >= len)
ERROR("Failed to create string"); return ret_errno(EIO);
return -1;
}
bdev->dest = strdup(dest); bdev_dest = strdup(dest);
if (!bdev->dest) { if (!bdev_dest)
ERROR("Failed to duplicate string \"%s\"", dest); return ret_errno(ENOMEM);
return -1;
}
ret = mkdir_p(dest, 0755); ret = mkdir_p(dest, 0755);
if (ret < 0) { if (ret < 0)
ERROR("Failed to create directory \"%s\"", dest); return log_error_errno(-errno, errno, "Failed to create directory \"%s\"", dest);
return -1;
}
TRACE("Created directory \"%s\"", dest); TRACE("Created directory \"%s\"", dest);
bdev->src = move_ptr(bdev_src);
bdev->dest = move_ptr(bdev_dest);
return 0; return 0;
} }
@ -108,10 +100,8 @@ int dir_destroy(struct lxc_storage *orig)
src = lxc_storage_get_path(orig->src, orig->src); src = lxc_storage_get_path(orig->src, orig->src);
ret = lxc_rmdir_onedev(src, NULL); ret = lxc_rmdir_onedev(src, NULL);
if (ret < 0) { if (ret < 0)
ERROR("Failed to delete \"%s\"", src); return log_error_errno(ret, errno, "Failed to delete \"%s\"", src);
return -1;
}
return 0; return 0;
} }
@ -125,10 +115,8 @@ bool dir_detect(const char *path)
return true; return true;
ret = stat(path, &statbuf); ret = stat(path, &statbuf);
if (ret == -1 && errno == EPERM) { if (ret == -1 && errno == EPERM)
SYSERROR("dir_detect: failed to look at \"%s\"", path); return log_error_errno(false, errno, "dir_detect: failed to look at \"%s\"", path);
return false;
}
if (ret == 0 && S_ISDIR(statbuf.st_mode)) if (ret == 0 && S_ISDIR(statbuf.st_mode))
return true; return true;
@ -138,9 +126,9 @@ bool dir_detect(const char *path)
int dir_mount(struct lxc_storage *bdev) int dir_mount(struct lxc_storage *bdev)
{ {
int ret; __do_free char *mntdata = NULL;
unsigned long mflags = 0, mntflags = 0, pflags = 0; unsigned long mflags = 0, mntflags = 0, pflags = 0;
char *mntdata; int ret;
const char *src; const char *src;
if (strcmp(bdev->type, "dir")) if (strcmp(bdev->type, "dir"))
@ -150,47 +138,42 @@ int dir_mount(struct lxc_storage *bdev)
return -22; return -22;
ret = parse_mntopts(bdev->mntopts, &mntflags, &mntdata); ret = parse_mntopts(bdev->mntopts, &mntflags, &mntdata);
if (ret < 0) { if (ret < 0)
ERROR("Failed to parse mount options \"%s\"", bdev->mntopts); return log_error_errno(ret, errno, "Failed to parse mount options \"%s\"", bdev->mntopts);
free(mntdata);
return -EINVAL;
}
ret = parse_propagationopts(bdev->mntopts, &pflags); ret = parse_propagationopts(bdev->mntopts, &pflags);
if (ret < 0) { if (ret < 0)
ERROR("Failed to parse propagation options \"%s\"", bdev->mntopts); return log_error_errno(-EINVAL, EINVAL, "Failed to parse mount propagation options \"%s\"", bdev->mntopts);
free(mntdata);
return -EINVAL;
}
src = lxc_storage_get_path(bdev->src, bdev->type); src = lxc_storage_get_path(bdev->src, bdev->type);
ret = mount(src, bdev->dest, "bind", MS_BIND | MS_REC | mntflags | pflags, mntdata); ret = mount(src, bdev->dest, "bind", MS_BIND | MS_REC | mntflags | pflags, mntdata);
if ((0 == ret) && (mntflags & MS_RDONLY)) { if (ret < 0)
DEBUG("Remounting \"%s\" on \"%s\" readonly", return log_error_errno(-errno, errno, "Failed to mount \"%s\" on \"%s\"", src, bdev->dest);
src ? src : "(none)", bdev->dest ? bdev->dest : "(none)");
if (ret == 0 && (mntflags & MS_RDONLY)) {
mflags = add_required_remount_flags(src, bdev->dest, MS_BIND | MS_REC | mntflags | pflags | MS_REMOUNT); mflags = add_required_remount_flags(src, bdev->dest, MS_BIND | MS_REC | mntflags | pflags | MS_REMOUNT);
ret = mount(src, bdev->dest, "bind", mflags, mntdata); ret = mount(src, bdev->dest, "bind", mflags, mntdata);
if (ret < 0)
return log_error_errno(-errno, errno, "Failed to remount \"%s\" on \"%s\" read-only with options \"%s\", mount flags \"%lu\", and propagation flags \"%lu\"",
src ? src : "(none)", bdev->dest ? bdev->dest : "(none)", mntdata, mflags, pflags);
else
DEBUG("Remounted \"%s\" on \"%s\" read-only with options \"%s\", mount flags \"%lu\", and propagation flags \"%lu\"",
src ? src : "(none)", bdev->dest ? bdev->dest : "(none)", mntdata, mflags, pflags);
} }
if (ret < 0) { TRACE("Mounted \"%s\" on \"%s\" with options \"%s\", mount flags \"%lu\", and propagation flags \"%lu\"",
SYSERROR("Failed to mount \"%s\" on \"%s\"", src, bdev->dest); src ? src : "(none)", bdev->dest ? bdev->dest : "(none)", mntdata, mflags, pflags);
free(mntdata); return 0;
return -1;
}
TRACE("Mounted \"%s\" on \"%s\"", src, bdev->dest);
free(mntdata);
return ret;
} }
int dir_umount(struct lxc_storage *bdev) int dir_umount(struct lxc_storage *bdev)
{ {
if (strcmp(bdev->type, "dir")) if (strcmp(bdev->type, "dir"))
return -22; return ret_errno(EINVAL);
if (!bdev->src || !bdev->dest) if (!bdev->src || !bdev->dest)
return -22; return ret_errno(EINVAL);
return umount(bdev->dest); return umount2(bdev->dest, MNT_DETACH);
} }
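The dir_clonepaths() hunk above replaces the magic numbers in `strlen("rootfs") + 4 + 3` with a sum of literal lengths that mirrors the "dir:%s/%s/rootfs" format string. The sketch below illustrates that calculation; `EXAMPLE_STRLITERALLEN` is a hypothetical macro assumed to behave like the STRLITERALLEN used in the diff, and `example_dir_src()` is not an LXC function:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: compile-time length of a string literal without the NUL. */
#define EXAMPLE_STRLITERALLEN(x) (sizeof("" x "") - 1)

/* Build "dir:<lxcpath>/<cname>/rootfs"; the caller frees the result. */
static char *example_dir_src(const char *lxcpath, const char *cname)
{
	size_t len;
	char *src;
	int ret;

	assert(lxcpath && cname);

	/* One term per piece of the format string, plus the trailing NUL. */
	len = EXAMPLE_STRLITERALLEN("dir:") + strlen(lxcpath) +
	      EXAMPLE_STRLITERALLEN("/") + strlen(cname) +
	      EXAMPLE_STRLITERALLEN("/rootfs") + 1;

	src = malloc(len);
	if (!src)
		return NULL;

	ret = snprintf(src, len, "dir:%s/%s/rootfs", lxcpath, cname);
	if (ret < 0 || (size_t)ret >= len) {
		free(src);
		return NULL;
	}

	return src;
}
```

Writing the length as one term per format fragment makes it easy to check the buffer size against the snprintf() call by eye.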


@ -342,13 +342,12 @@ bool ovl_detect(const char *path)
int ovl_mount(struct lxc_storage *bdev) int ovl_mount(struct lxc_storage *bdev)
{ {
__do_free char *options = NULL, __do_free char *options = NULL, *options_work = NULL;
*options_work = NULL; unsigned long mntflags = 0;
char *mntdata = NULL;
char *tmp, *dup, *lower, *upper; char *tmp, *dup, *lower, *upper;
char *work, *lastslash; char *work, *lastslash;
size_t len, len2; size_t len, len2;
unsigned long mntflags;
char *mntdata;
int ret, ret2; int ret, ret2;
if (strcmp(bdev->type, "overlay") && strcmp(bdev->type, "overlayfs")) if (strcmp(bdev->type, "overlay") && strcmp(bdev->type, "overlayfs"))


@ -78,12 +78,8 @@ int lxc_rsync(struct rsync_data *data)
return -1; return -1;
} }
ret = detect_shared_rootfs(); if (detect_shared_rootfs() && mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL))
if (ret) { SYSERROR("Failed to recursively turn root mount tree into dependent mount");
ret = mount(NULL, "/", NULL, MS_SLAVE|MS_REC, NULL);
if (ret < 0)
SYSERROR("Failed to make \"/\" a slave mount");
}
ret = orig->ops->mount(orig); ret = orig->ops->mount(orig);
if (ret < 0) { if (ret < 0) {


@ -165,11 +165,8 @@ int detect_fs(struct lxc_storage *bdev, char *type, int len)
if (unshare(CLONE_NEWNS) < 0) if (unshare(CLONE_NEWNS) < 0)
_exit(EXIT_FAILURE); _exit(EXIT_FAILURE);
if (detect_shared_rootfs()) if (detect_shared_rootfs() && mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL))
if (mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL)) { SYSERROR("Failed to recursively turn root mount tree into dependent mount. Continuing...");
SYSERROR("Failed to make / rslave");
ERROR("Continuing...");
}
ret = mount_unknown_fs(srcdev, bdev->dest, bdev->mntopts); ret = mount_unknown_fs(srcdev, bdev->dest, bdev->mntopts);
if (ret < 0) { if (ret < 0) {
@ -315,9 +312,8 @@ int find_fstype_cb(char *buffer, void *data)
const char *target; const char *target;
const char *options; const char *options;
} *cbarg = data; } *cbarg = data;
unsigned long mntflags = 0;
unsigned long mntflags; char *mntdata = NULL;
char *mntdata;
char *fstype; char *fstype;
/* we don't try 'nodev' entries */ /* we don't try 'nodev' entries */


@ -159,11 +159,12 @@ bool zfs_detect(const char *path)
int zfs_mount(struct lxc_storage *bdev) int zfs_mount(struct lxc_storage *bdev)
{ {
__do_free char *mntdata = NULL;
unsigned long mntflags = 0;
int ret; int ret;
size_t oldlen, newlen, totallen; size_t oldlen, newlen, totallen;
char *mntdata, *tmp; char *tmp;
const char *src; const char *src;
unsigned long mntflags;
char cmd_output[PATH_MAX] = {0}; char cmd_output[PATH_MAX] = {0};
if (strcmp(bdev->type, "zfs")) if (strcmp(bdev->type, "zfs"))
@ -175,7 +176,6 @@ int zfs_mount(struct lxc_storage *bdev)
ret = parse_mntopts(bdev->mntopts, &mntflags, &mntdata); ret = parse_mntopts(bdev->mntopts, &mntflags, &mntdata);
if (ret < 0) { if (ret < 0) {
ERROR("Failed to parse mount options"); ERROR("Failed to parse mount options");
free(mntdata);
return -22; return -22;
} }
@ -220,7 +220,6 @@ int zfs_mount(struct lxc_storage *bdev)
tmp = realloc(mntdata, totallen); tmp = realloc(mntdata, totallen);
if (!tmp) { if (!tmp) {
ERROR("Failed to reallocate memory"); ERROR("Failed to reallocate memory");
free(mntdata);
return -1; return -1;
} }
mntdata = tmp; mntdata = tmp;
@ -228,12 +227,10 @@ int zfs_mount(struct lxc_storage *bdev)
ret = snprintf((mntdata + oldlen), newlen, ",zfsutil,mntpoint=%s", src); ret = snprintf((mntdata + oldlen), newlen, ",zfsutil,mntpoint=%s", src);
if (ret < 0 || (size_t)ret >= newlen) { if (ret < 0 || (size_t)ret >= newlen) {
ERROR("Failed to create string"); ERROR("Failed to create string");
free(mntdata);
return -1; return -1;
} }
ret = mount(src, bdev->dest, "zfs", mntflags, mntdata); ret = mount(src, bdev->dest, "zfs", mntflags, mntdata);
free(mntdata);
if (ret < 0 && errno != EBUSY) { if (ret < 0 && errno != EBUSY) {
SYSERROR("Failed to mount \"%s\" on \"%s\"", src, bdev->dest); SYSERROR("Failed to mount \"%s\" on \"%s\"", src, bdev->dest);
return -1; return -1;
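
The `zfs_mount()` hunks drop every manual `free(mntdata)` because the variable is now declared with `__do_free`. The sketch below shows how such a scope-based cleanup can work, assuming `__do_free` is built on the GCC/Clang `cleanup` variable attribute (its definition is not shown in this diff). All `sketch_` names are hypothetical, and the call counter exists only to make the behaviour observable.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int sketch_cleanup_calls; /* instrumentation for this sketch only */

/* Cleanup handler: receives a pointer to the annotated variable. */
static void sketch_free_ptr(char **p)
{
	free(*p);
	*p = NULL;
	sketch_cleanup_calls++;
}

/* Hypothetical equivalent of __do_free (compiler extension, not ISO C). */
#define sketch_do_free __attribute__((__cleanup__(sketch_free_ptr)))

/* Builds "<opts>,zfsutil,mntpoint=<src>" and returns its length, or -1. */
static int sketch_build_mntdata(const char *opts, const char *src)
{
	sketch_do_free char *mntdata = NULL;
	size_t len;
	int ret;

	if (!opts || !src)
		return -1; /* the handler still runs; free(NULL) is a no-op */

	len = strlen(opts) + strlen(",zfsutil,mntpoint=") + strlen(src) + 1;
	mntdata = malloc(len);
	if (!mntdata)
		return -1;

	ret = snprintf(mntdata, len, "%s,zfsutil,mntpoint=%s", opts, src);
	if (ret < 0 || (size_t)ret >= len)
		return -1; /* no explicit free() on the error path */

	return ret; /* mntdata is released automatically here as well */
}
```

The handler runs on every exit from the scope, which is why initialising the pointer to `NULL` at the declaration matters: an early return before allocation must not free garbage.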


@ -10,6 +10,10 @@
#include "initutils.h" #include "initutils.h"
#include "macro.h" #include "macro.h"
#ifndef HAVE_STRLCAT
#include "include/strlcat.h"
#endif
/* convert variadic argument lists to arrays (for execl type argument lists) */ /* convert variadic argument lists to arrays (for execl type argument lists) */
extern char **lxc_va_arg_list_to_argv(va_list ap, size_t skip, int do_strdup); extern char **lxc_va_arg_list_to_argv(va_list ap, size_t skip, int do_strdup);
extern const char **lxc_va_arg_list_to_argv_const(va_list ap, size_t skip); extern const char **lxc_va_arg_list_to_argv_const(va_list ap, size_t skip);
@ -103,4 +107,15 @@ static inline bool is_empty_string(const char *s)
return !s || strcmp(s, "") == 0; return !s || strcmp(s, "") == 0;
} }
static inline ssize_t safe_strlcat(char *src, const char *append, size_t len)
{
size_t new_len;
new_len = strlcat(src, append, len);
if (new_len >= len)
return ret_errno(EINVAL);
return (ssize_t)new_len;
}
#endif /* __LXC_STRING_UTILS_H */ #endif /* __LXC_STRING_UTILS_H */
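
The `safe_strlcat()` helper added above turns truncation into an error instead of a silently shortened string. The sketch below illustrates the same idea without depending on a libc `strlcat()`. `sketch_strlcat()` is a hypothetical stand-in written for this example, and `sketch_safe_strlcat()` assumes, as in the earlier sketch, that `ret_errno()` sets `errno` and returns the negated code.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for strlcat(): appends as much of `append` as fits
 * into `dst` (of total size `size`) and returns the length the complete
 * string would have had.
 */
static size_t sketch_strlcat(char *dst, const char *append, size_t size)
{
	size_t dlen = strnlen(dst, size);
	size_t alen = strlen(append);

	if (dlen == size) /* dst is not NUL-terminated within size */
		return size + alen;

	size_t room = size - dlen - 1;
	size_t n = alen < room ? alen : room;

	memcpy(dst + dlen, append, n);
	dst[dlen + n] = '\0';
	return dlen + alen;
}

/* Returns the new length, or -EINVAL if the result had to be truncated. */
static ssize_t sketch_safe_strlcat(char *dst, const char *append, size_t size)
{
	size_t new_len;

	assert(dst && append);

	new_len = sketch_strlcat(dst, append, size);
	if (new_len >= size) {
		errno = EINVAL;
		return -EINVAL;
	}

	return (ssize_t)new_len;
}
```

Note that on truncation the destination buffer has already been modified; the error return only tells the caller that the content is incomplete.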


@ -35,10 +35,12 @@
#define __NR_keyctl 280 #define __NR_keyctl 280
#elif defined __powerpc__ #elif defined __powerpc__
#define __NR_keyctl 271 #define __NR_keyctl 271
#elif defined __riscv
#define __NR_keyctl 219
#elif defined __sparc__ #elif defined __sparc__
#define __NR_keyctl 283 #define __NR_keyctl 283
#elif defined __ia64__ #elif defined __ia64__
#define __NR_keyctl 249 #define __NR_keyctl (249 + 1024)
#elif defined _MIPS_SIM #elif defined _MIPS_SIM
#if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */ #if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */
#define __NR_keyctl 4282 #define __NR_keyctl 4282
@ -68,6 +70,8 @@
#define __NR_memfd_create 350 #define __NR_memfd_create 350
#elif defined __powerpc__ #elif defined __powerpc__
#define __NR_memfd_create 360 #define __NR_memfd_create 360
#elif defined __riscv
#define __NR_memfd_create 279
#elif defined __sparc__ #elif defined __sparc__
#define __NR_memfd_create 348 #define __NR_memfd_create 348
#elif defined __blackfin__ #elif defined __blackfin__
@ -103,10 +107,12 @@
#define __NR_pivot_root 217 #define __NR_pivot_root 217
#elif defined __powerpc__ #elif defined __powerpc__
#define __NR_pivot_root 203 #define __NR_pivot_root 203
#elif defined __riscv
#define __NR_pivot_root 41
#elif defined __sparc__ #elif defined __sparc__
#define __NR_pivot_root 146 #define __NR_pivot_root 146
#elif defined __ia64__ #elif defined __ia64__
#define __NR_pivot_root 183 #define __NR_pivot_root (183 + 1024)
#elif defined _MIPS_SIM #elif defined _MIPS_SIM
#if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */ #if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */
#define __NR_pivot_root 4216 #define __NR_pivot_root 4216
@ -136,10 +142,12 @@
#define __NR_setns 339 #define __NR_setns 339
#elif defined __powerpc__ #elif defined __powerpc__
#define __NR_setns 350 #define __NR_setns 350
#elif defined __riscv
#define __NR_setns 268
#elif defined __sparc__ #elif defined __sparc__
#define __NR_setns 337 #define __NR_setns 337
#elif defined __ia64__ #elif defined __ia64__
#define __NR_setns 306 #define __NR_setns (306 + 1024)
#elif defined _MIPS_SIM #elif defined _MIPS_SIM
#if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */ #if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */
#define __NR_setns 4344 #define __NR_setns 4344
@ -169,10 +177,12 @@
#define __NR_sethostname 74 #define __NR_sethostname 74
#elif defined __powerpc__ #elif defined __powerpc__
#define __NR_sethostname 74 #define __NR_sethostname 74
#elif defined __riscv
#define __NR_sethostname 161
#elif defined __sparc__ #elif defined __sparc__
#define __NR_sethostname 88 #define __NR_sethostname 88
#elif defined __ia64__ #elif defined __ia64__
#define __NR_sethostname 59 #define __NR_sethostname (59 + 1024)
#elif defined _MIPS_SIM #elif defined _MIPS_SIM
#if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */ #if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */
#define __NR_sethostname 4074 #define __NR_sethostname 4074
@ -202,10 +212,12 @@
#define __NR_signalfd 316 #define __NR_signalfd 316
#elif defined __powerpc__ #elif defined __powerpc__
#define __NR_signalfd 305 #define __NR_signalfd 305
#elif defined __riscv
#define __NR_signalfd 74
#elif defined __sparc__ #elif defined __sparc__
#define __NR_signalfd 311 #define __NR_signalfd 311
#elif defined __ia64__ #elif defined __ia64__
#define __NR_signalfd 283 #define __NR_signalfd (283 + 1024)
#elif defined _MIPS_SIM #elif defined _MIPS_SIM
#if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */ #if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */
#define __NR_signalfd 4317 #define __NR_signalfd 4317
@ -235,10 +247,12 @@
#define __NR_signalfd4 322 #define __NR_signalfd4 322
#elif defined __powerpc__ #elif defined __powerpc__
#define __NR_signalfd4 313 #define __NR_signalfd4 313
#elif defined __riscv
#define __NR_signalfd4 74
#elif defined __sparc__ #elif defined __sparc__
#define __NR_signalfd4 317 #define __NR_signalfd4 317
#elif defined __ia64__ #elif defined __ia64__
#define __NR_signalfd4 289 #define __NR_signalfd4 (289 + 1024)
#elif defined _MIPS_SIM #elif defined _MIPS_SIM
#if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */ #if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */
#define __NR_signalfd4 4324 #define __NR_signalfd4 4324
@ -268,10 +282,12 @@
#define __NR_unshare 303 #define __NR_unshare 303
#elif defined __powerpc__ #elif defined __powerpc__
#define __NR_unshare 282 #define __NR_unshare 282
#elif defined __riscv
#define __NR_unshare 97
#elif defined __sparc__ #elif defined __sparc__
#define __NR_unshare 299 #define __NR_unshare 299
#elif defined __ia64__ #elif defined __ia64__
#define __NR_unshare 272 #define __NR_unshare (272 + 1024)
#elif defined _MIPS_SIM #elif defined _MIPS_SIM
#if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */ #if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */
#define __NR_unshare 4303 #define __NR_unshare 4303
@ -301,10 +317,12 @@
#define __NR_bpf 351 #define __NR_bpf 351
#elif defined __powerpc__ #elif defined __powerpc__
#define __NR_bpf 361 #define __NR_bpf 361
#elif defined __riscv
#define __NR_bpf 280
#elif defined __sparc__ #elif defined __sparc__
#define __NR_bpf 349 #define __NR_bpf 349
#elif defined __ia64__ #elif defined __ia64__
#define __NR_bpf 317 #define __NR_bpf (317 + 1024)
#elif defined _MIPS_SIM #elif defined _MIPS_SIM
#if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */ #if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */
#define __NR_bpf 4355 #define __NR_bpf 4355
@ -334,10 +352,12 @@
#define __NR_faccessat 300 #define __NR_faccessat 300
#elif defined __powerpc__ #elif defined __powerpc__
#define __NR_faccessat 298 #define __NR_faccessat 298
#elif defined __riscv
#define __NR_faccessat 48
#elif defined __sparc__ #elif defined __sparc__
#define __NR_faccessat 296 #define __NR_faccessat 296
#elif defined __ia64__ #elif defined __ia64__
#define __NR_faccessat 269 #define __NR_faccessat (269 + 1024)
#elif defined _MIPS_SIM #elif defined _MIPS_SIM
#if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */ #if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */
#define __NR_faccessat 4300 #define __NR_faccessat 4300
@ -367,6 +387,8 @@
#if _MIPS_SIM == _MIPS_SIM_ABI64 /* n64 */ #if _MIPS_SIM == _MIPS_SIM_ABI64 /* n64 */
#define __NR_pidfd_send_signal 5424 #define __NR_pidfd_send_signal 5424
#endif #endif
#elif defined __ia64__
#define __NR_pidfd_send_signal (424 + 1024)
#else #else
#define __NR_pidfd_send_signal 424 #define __NR_pidfd_send_signal 424
#endif #endif
@ -385,10 +407,12 @@
#define __NR_seccomp 348 #define __NR_seccomp 348
#elif defined __powerpc__ #elif defined __powerpc__
#define __NR_seccomp 358 #define __NR_seccomp 358
#elif defined __riscv
#define __NR_seccomp 277
#elif defined __sparc__ #elif defined __sparc__
#define __NR_seccomp 346 #define __NR_seccomp 346
#elif defined __ia64__ #elif defined __ia64__
#define __NR_seccomp 329 #define __NR_seccomp (329 + 1024)
#elif defined _MIPS_SIM #elif defined _MIPS_SIM
#if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */ #if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */
#define __NR_seccomp 4352 #define __NR_seccomp 4352
@ -418,10 +442,12 @@
#define __NR_gettid 236 #define __NR_gettid 236
#elif defined __powerpc__ #elif defined __powerpc__
#define __NR_gettid 207 #define __NR_gettid 207
#elif defined __riscv
#define __NR_gettid 178
#elif defined __sparc__ #elif defined __sparc__
#define __NR_gettid 143 #define __NR_gettid 143
#elif defined __ia64__ #elif defined __ia64__
#define __NR_gettid 81 #define __NR_gettid (81 + 1024)
#elif defined _MIPS_SIM #elif defined _MIPS_SIM
#if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */ #if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */
#define __NR_gettid 4222 #define __NR_gettid 4222
@ -455,10 +481,12 @@
#define __NR_execveat 354 #define __NR_execveat 354
#elif defined __powerpc__ #elif defined __powerpc__
#define __NR_execveat 362 #define __NR_execveat 362
#elif defined __riscv
#define __NR_execveat 281
#elif defined __sparc__ #elif defined __sparc__
#define __NR_execveat 350 #define __NR_execveat 350
#elif defined __ia64__ #elif defined __ia64__
#define __NR_execveat 318 #define __NR_execveat (318 + 1024)
#elif defined _MIPS_SIM #elif defined _MIPS_SIM
#if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */ #if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */
#define __NR_execveat 4356 #define __NR_execveat 4356
@ -475,4 +503,64 @@
#endif #endif
#endif #endif
#ifndef __NR_move_mount
#if defined __alpha__
#define __NR_move_mount 539
#elif defined _MIPS_SIM
#if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */
#define __NR_move_mount 4429
#endif
#if _MIPS_SIM == _MIPS_SIM_NABI32 /* n32 */
#define __NR_move_mount 6429
#endif
#if _MIPS_SIM == _MIPS_SIM_ABI64 /* n64 */
#define __NR_move_mount 5429
#endif
#elif defined __ia64__
#define __NR_move_mount (429 + 1024)
#else
#define __NR_move_mount 429
#endif
#endif
#ifndef __NR_open_tree
#if defined __alpha__
#define __NR_open_tree 538
#elif defined _MIPS_SIM
#if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */
#define __NR_open_tree 4428
#endif
#if _MIPS_SIM == _MIPS_SIM_NABI32 /* n32 */
#define __NR_open_tree 6428
#endif
#if _MIPS_SIM == _MIPS_SIM_ABI64 /* n64 */
#define __NR_open_tree 5428
#endif
#elif defined __ia64__
#define __NR_open_tree (428 + 1024)
#else
#define __NR_open_tree 428
#endif
#endif
#ifndef __NR_clone3
#if defined __alpha__
#define __NR_clone3 545
#elif defined _MIPS_SIM
#if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */
#define __NR_clone3 4435
#endif
#if _MIPS_SIM == _MIPS_SIM_NABI32 /* n32 */
#define __NR_clone3 6435
#endif
#if _MIPS_SIM == _MIPS_SIM_ABI64 /* n64 */
#define __NR_clone3 5435
#endif
#elif defined __ia64__
#define __NR_clone3 (435 + 1024)
#else
#define __NR_clone3 435
#endif
#endif
#endif /* __LXC_SYSCALL_NUMBERS_H */ #endif /* __LXC_SYSCALL_NUMBERS_H */
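
Every block in this header follows the same pattern: define `__NR_<name>` only when the system headers did not, then pick the number by architecture. The sketch below shows how such a fallback might be consumed for `gettid`. The riscv value is the one added in this diff; the x86-64 value (186) is an assumption taken from the x86-64 syscall table, because that branch lies outside the lines shown here. `sketch_gettid()` is a hypothetical name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Only provide a number when <sys/syscall.h> did not define one. */
#ifndef __NR_gettid
#if defined __x86_64__
#define __NR_gettid 186
#elif defined __riscv
#define __NR_gettid 178
#else
#error "sketch: add the gettid number for this architecture"
#endif
#endif

/* Raw syscall wrapper; works even if libc exports no gettid() symbol. */
static pid_t sketch_gettid(void)
{
	return (pid_t)syscall(__NR_gettid);
}
```

In a single-threaded process the returned thread ID equals `getpid()`, which makes a convenient sanity check for a newly added number.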


@ -137,4 +137,28 @@ static int faccessat(int __fd, const char *__file, int __type, int __flag)
} }
#endif #endif
#ifndef HAVE_MOVE_MOUNT
static inline int move_mount_lxc(int from_dfd, const char *from_pathname,
int to_dfd, const char *to_pathname,
unsigned int flags)
{
return syscall(__NR_move_mount, from_dfd, from_pathname, to_dfd,
to_pathname, flags);
}
#define move_mount move_mount_lxc
#else
extern int move_mount(int from_dfd, const char *from_pathname, int to_dfd,
const char *to_pathname, unsigned int flags);
#endif
#ifndef HAVE_OPEN_TREE
static inline int open_tree_lxc(int dfd, const char *filename, unsigned int flags)
{
return syscall(__NR_open_tree, dfd, filename, flags);
}
#define open_tree open_tree_lxc
#else
extern int open_tree(int dfd, const char *filename, unsigned int flags);
#endif
#endif /* __LXC_SYSCALL_WRAPPER_H */ #endif /* __LXC_SYSCALL_WRAPPER_H */
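
The `move_mount` and `open_tree` wrappers above use a suffixed static inline function plus a `#define` alias, so the rest of the code can call the plain name whether or not libc provides it. A minimal sketch of that arrangement for `open_tree` follows. `SKETCH_HAVE_OPEN_TREE` and `sketch_open_tree()` are hypothetical names, and the fallback number 428 applies only to the architectures covered by the generic `#else` branch of the number table.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_open_tree
#if defined __alpha__ || defined _MIPS_SIM || defined __ia64__
#error "sketch: use the architecture-specific number for this platform"
#else
#define __NR_open_tree 428
#endif
#endif

#ifndef SKETCH_HAVE_OPEN_TREE /* hypothetical configure-time macro */
/* Suffixed name avoids clashing with a libc prototype of the same name. */
static inline int sketch_open_tree(int dfd, const char *filename,
				   unsigned int flags)
{
	return (int)syscall(__NR_open_tree, dfd, filename, flags);
}
#define open_tree sketch_open_tree
#endif
```

On kernels without the syscall the call fails with `ENOSYS`, so callers of such a wrapper should be prepared to fall back to the classic `mount()` interface.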


@ -65,7 +65,7 @@ void lxc_terminal_winsz(int srcfd, int dstfd)
static void lxc_terminal_winch(struct lxc_terminal_state *ts) static void lxc_terminal_winch(struct lxc_terminal_state *ts)
{ {
lxc_terminal_winsz(ts->stdinfd, ts->masterfd); lxc_terminal_winsz(ts->stdinfd, ts->ptmxfd);
} }
int lxc_terminal_signalfd_cb(int fd, uint32_t events, void *cbdata, int lxc_terminal_signalfd_cb(int fd, uint32_t events, void *cbdata,
@ -105,7 +105,7 @@ struct lxc_terminal_state *lxc_terminal_signal_init(int srcfd, int dstfd)
memset(ts, 0, sizeof(*ts)); memset(ts, 0, sizeof(*ts));
ts->stdinfd = srcfd; ts->stdinfd = srcfd;
ts->masterfd = dstfd; ts->ptmxfd = dstfd;
ts->sigfd = -1; ts->sigfd = -1;
ret = sigemptyset(&mask); ret = sigemptyset(&mask);
@ -330,8 +330,8 @@ int lxc_terminal_io_cb(int fd, uint32_t events, void *data,
INFO("Terminal client on fd %d has exited", fd); INFO("Terminal client on fd %d has exited", fd);
lxc_mainloop_del_handler(descr, fd); lxc_mainloop_del_handler(descr, fd);
if (fd == terminal->master) { if (fd == terminal->ptmx) {
terminal->master = -EBADF; terminal->ptmx = -EBADF;
} else if (fd == terminal->peer) { } else if (fd == terminal->peer) {
lxc_terminal_signal_fini(terminal); lxc_terminal_signal_fini(terminal);
terminal->peer = -EBADF; terminal->peer = -EBADF;
@ -344,10 +344,10 @@ int lxc_terminal_io_cb(int fd, uint32_t events, void *data,
} }
if (fd == terminal->peer) if (fd == terminal->peer)
w = lxc_write_nointr(terminal->master, buf, r); w = lxc_write_nointr(terminal->ptmx, buf, r);
w_rbuf = w_log = 0; w_rbuf = w_log = 0;
if (fd == terminal->master) { if (fd == terminal->ptmx) {
/* write to peer first */ /* write to peer first */
if (terminal->peer >= 0) if (terminal->peer >= 0)
w = lxc_write_nointr(terminal->peer, buf, r); w = lxc_write_nointr(terminal->peer, buf, r);
@ -406,16 +406,16 @@ int lxc_terminal_mainloop_add(struct lxc_epoll_descr *descr,
{ {
int ret; int ret;
if (terminal->master < 0) { if (terminal->ptmx < 0) {
INFO("Terminal is not initialized"); INFO("Terminal is not initialized");
return 0; return 0;
} }
ret = lxc_mainloop_add_handler(descr, terminal->master, ret = lxc_mainloop_add_handler(descr, terminal->ptmx,
lxc_terminal_io_cb, terminal); lxc_terminal_io_cb, terminal);
if (ret < 0) { if (ret < 0) {
ERROR("Failed to add handler for terminal master fd %d to " ERROR("Failed to add handler for terminal ptmx fd %d to "
"mainloop", terminal->master); "mainloop", terminal->ptmx);
return -1; return -1;
} }
@ -483,11 +483,11 @@ static void lxc_terminal_peer_proxy_free(struct lxc_terminal *terminal)
{ {
lxc_terminal_signal_fini(terminal); lxc_terminal_signal_fini(terminal);
close(terminal->proxy.master); close(terminal->proxy.ptmx);
terminal->proxy.master = -1; terminal->proxy.ptmx = -1;
close(terminal->proxy.slave); close(terminal->proxy.pts);
terminal->proxy.slave = -1; terminal->proxy.pts = -1;
terminal->proxy.busy = -1; terminal->proxy.busy = -1;
@ -503,7 +503,7 @@ static int lxc_terminal_peer_proxy_alloc(struct lxc_terminal *terminal,
struct termios oldtermio; struct termios oldtermio;
struct lxc_terminal_state *ts; struct lxc_terminal_state *ts;
if (terminal->master < 0) { if (terminal->ptmx < 0) {
ERROR("Terminal not set up"); ERROR("Terminal not set up");
return -1; return -1;
} }
@ -519,51 +519,51 @@ static int lxc_terminal_peer_proxy_alloc(struct lxc_terminal *terminal,
} }
/* This is the proxy terminal that will be given to the client, and /* This is the proxy terminal that will be given to the client, and
* that the real terminal master will send to / recv from. * that the real terminal ptmx will send to / recv from.
*/ */
ret = openpty(&terminal->proxy.master, &terminal->proxy.slave, NULL, ret = openpty(&terminal->proxy.ptmx, &terminal->proxy.pts, NULL,
NULL, NULL); NULL, NULL);
if (ret < 0) { if (ret < 0) {
SYSERROR("Failed to open proxy terminal"); SYSERROR("Failed to open proxy terminal");
return -1; return -1;
} }
ret = ttyname_r(terminal->proxy.slave, terminal->proxy.name, ret = ttyname_r(terminal->proxy.pts, terminal->proxy.name,
sizeof(terminal->proxy.name)); sizeof(terminal->proxy.name));
if (ret < 0) { if (ret < 0) {
SYSERROR("Failed to retrieve name of proxy terminal slave"); SYSERROR("Failed to retrieve name of proxy terminal pts");
goto on_error; goto on_error;
} }
ret = fd_cloexec(terminal->proxy.master, true); ret = fd_cloexec(terminal->proxy.ptmx, true);
if (ret < 0) { if (ret < 0) {
SYSERROR("Failed to set FD_CLOEXEC flag on proxy terminal master"); SYSERROR("Failed to set FD_CLOEXEC flag on proxy terminal ptmx");
goto on_error; goto on_error;
} }
ret = fd_cloexec(terminal->proxy.slave, true); ret = fd_cloexec(terminal->proxy.pts, true);
if (ret < 0) { if (ret < 0) {
SYSERROR("Failed to set FD_CLOEXEC flag on proxy terminal slave"); SYSERROR("Failed to set FD_CLOEXEC flag on proxy terminal pts");
goto on_error; goto on_error;
} }
ret = lxc_setup_tios(terminal->proxy.slave, &oldtermio); ret = lxc_setup_tios(terminal->proxy.pts, &oldtermio);
if (ret < 0) if (ret < 0)
goto on_error; goto on_error;
ts = lxc_terminal_signal_init(terminal->proxy.master, terminal->master); ts = lxc_terminal_signal_init(terminal->proxy.ptmx, terminal->ptmx);
if (!ts) if (!ts)
goto on_error; goto on_error;
terminal->tty_state = ts; terminal->tty_state = ts;
terminal->peer = terminal->proxy.slave; terminal->peer = terminal->proxy.pts;
terminal->proxy.busy = sockfd; terminal->proxy.busy = sockfd;
ret = lxc_terminal_mainloop_add_peer(terminal); ret = lxc_terminal_mainloop_add_peer(terminal);
if (ret < 0) if (ret < 0)
goto on_error; goto on_error;
NOTICE("Opened proxy terminal with master fd %d and slave fd %d", NOTICE("Opened proxy terminal with ptmx fd %d and pts fd %d",
terminal->proxy.master, terminal->proxy.slave); terminal->proxy.ptmx, terminal->proxy.pts);
return 0; return 0;
on_error: on_error:
@ -574,7 +574,7 @@ on_error:
int lxc_terminal_allocate(struct lxc_conf *conf, int sockfd, int *ttyreq) int lxc_terminal_allocate(struct lxc_conf *conf, int sockfd, int *ttyreq)
{ {
int ttynum; int ttynum;
int masterfd = -1; int ptmxfd = -1;
struct lxc_tty_info *ttys = &conf->ttys; struct lxc_tty_info *ttys = &conf->ttys;
struct lxc_terminal *terminal = &conf->console; struct lxc_terminal *terminal = &conf->console;
@ -585,7 +585,7 @@ int lxc_terminal_allocate(struct lxc_conf *conf, int sockfd, int *ttyreq)
if (ret < 0) if (ret < 0)
goto out; goto out;
masterfd = terminal->proxy.master; ptmxfd = terminal->proxy.ptmx;
goto out; goto out;
} }
@ -614,10 +614,10 @@ int lxc_terminal_allocate(struct lxc_conf *conf, int sockfd, int *ttyreq)
out_tty: out_tty:
ttys->tty[ttynum - 1].busy = sockfd; ttys->tty[ttynum - 1].busy = sockfd;
masterfd = ttys->tty[ttynum - 1].master; ptmxfd = ttys->tty[ttynum - 1].ptmx;
out: out:
return masterfd; return ptmxfd;
} }
void lxc_terminal_free(struct lxc_conf *conf, int fd) void lxc_terminal_free(struct lxc_conf *conf, int fd)
@ -633,7 +633,7 @@ void lxc_terminal_free(struct lxc_conf *conf, int fd)
if (terminal->proxy.busy != fd) if (terminal->proxy.busy != fd)
return; return;
lxc_mainloop_del_handler(terminal->descr, terminal->proxy.slave); lxc_mainloop_del_handler(terminal->descr, terminal->proxy.pts);
lxc_terminal_peer_proxy_free(terminal); lxc_terminal_peer_proxy_free(terminal);
} }
@ -666,14 +666,14 @@ static int lxc_terminal_peer_default(struct lxc_terminal *terminal)
goto on_error_free_tios; goto on_error_free_tios;
} }
ts = lxc_terminal_signal_init(terminal->peer, terminal->master); ts = lxc_terminal_signal_init(terminal->peer, terminal->ptmx);
terminal->tty_state = ts; terminal->tty_state = ts;
if (!ts) { if (!ts) {
WARN("Failed to install signal handler"); WARN("Failed to install signal handler");
goto on_error_free_tios; goto on_error_free_tios;
} }
lxc_terminal_winsz(terminal->peer, terminal->master); lxc_terminal_winsz(terminal->peer, terminal->ptmx);
terminal->tios = malloc(sizeof(*terminal->tios)); terminal->tios = malloc(sizeof(*terminal->tios));
if (!terminal->tios) if (!terminal->tios)
@ -749,13 +749,13 @@ void lxc_terminal_delete(struct lxc_terminal *terminal)
close(terminal->peer); close(terminal->peer);
terminal->peer = -1; terminal->peer = -1;
if (terminal->master >= 0) if (terminal->ptmx >= 0)
close(terminal->master); close(terminal->ptmx);
terminal->master = -1; terminal->ptmx = -1;
if (terminal->slave >= 0) if (terminal->pts >= 0)
close(terminal->slave); close(terminal->pts);
terminal->slave = -1; terminal->pts = -1;
if (terminal->log_fd >= 0) if (terminal->log_fd >= 0)
close(terminal->log_fd); close(terminal->log_fd);
@ -764,7 +764,7 @@ void lxc_terminal_delete(struct lxc_terminal *terminal)
/** /**
* Note that this function needs to run before the mainloop starts. Since we * Note that this function needs to run before the mainloop starts. Since we
* register a handler for the terminal's masterfd when we create the mainloop * register a handler for the terminal's ptmxfd when we create the mainloop
* the terminal handler needs to see an allocated ringbuffer. * the terminal handler needs to see an allocated ringbuffer.
*/ */
static int lxc_terminal_create_ringbuf(struct lxc_terminal *terminal) static int lxc_terminal_create_ringbuf(struct lxc_terminal *terminal)
@ -832,27 +832,27 @@ int lxc_terminal_create(struct lxc_terminal *terminal)
{ {
int ret; int ret;
ret = openpty(&terminal->master, &terminal->slave, NULL, NULL, NULL); ret = openpty(&terminal->ptmx, &terminal->pts, NULL, NULL, NULL);
if (ret < 0) { if (ret < 0) {
SYSERROR("Failed to open terminal"); SYSERROR("Failed to open terminal");
return -1; return -1;
} }
ret = ttyname_r(terminal->slave, terminal->name, sizeof(terminal->name)); ret = ttyname_r(terminal->pts, terminal->name, sizeof(terminal->name));
if (ret < 0) { if (ret < 0) {
SYSERROR("Failed to retrieve name of terminal slave"); SYSERROR("Failed to retrieve name of terminal pts");
goto err; goto err;
} }
ret = fd_cloexec(terminal->master, true); ret = fd_cloexec(terminal->ptmx, true);
if (ret < 0) { if (ret < 0) {
SYSERROR("Failed to set FD_CLOEXEC flag on terminal master"); SYSERROR("Failed to set FD_CLOEXEC flag on terminal ptmx");
goto err; goto err;
} }
ret = fd_cloexec(terminal->slave, true); ret = fd_cloexec(terminal->pts, true);
if (ret < 0) { if (ret < 0) {
SYSERROR("Failed to set FD_CLOEXEC flag on terminal slave"); SYSERROR("Failed to set FD_CLOEXEC flag on terminal pts");
goto err; goto err;
} }
@ -956,21 +956,21 @@ int lxc_terminal_stdin_cb(int fd, uint32_t events, void *cbdata,
ts->saw_escape = 0; ts->saw_escape = 0;
} }
ret = lxc_write_nointr(ts->masterfd, &c, 1); ret = lxc_write_nointr(ts->ptmxfd, &c, 1);
if (ret <= 0) if (ret <= 0)
return LXC_MAINLOOP_CLOSE; return LXC_MAINLOOP_CLOSE;
return LXC_MAINLOOP_CONTINUE; return LXC_MAINLOOP_CONTINUE;
} }
int lxc_terminal_master_cb(int fd, uint32_t events, void *cbdata, int lxc_terminal_ptmx_cb(int fd, uint32_t events, void *cbdata,
struct lxc_epoll_descr *descr) struct lxc_epoll_descr *descr)
{ {
int r, w; int r, w;
char buf[LXC_TERMINAL_BUFFER_SIZE]; char buf[LXC_TERMINAL_BUFFER_SIZE];
struct lxc_terminal_state *ts = cbdata; struct lxc_terminal_state *ts = cbdata;
if (fd != ts->masterfd) if (fd != ts->ptmxfd)
return LXC_MAINLOOP_CLOSE; return LXC_MAINLOOP_CLOSE;
r = lxc_read_nointr(fd, buf, sizeof(buf)); r = lxc_read_nointr(fd, buf, sizeof(buf));
@ -984,16 +984,16 @@ int lxc_terminal_master_cb(int fd, uint32_t events, void *cbdata,
return LXC_MAINLOOP_CONTINUE; return LXC_MAINLOOP_CONTINUE;
} }
int lxc_terminal_getfd(struct lxc_container *c, int *ttynum, int *masterfd) int lxc_terminal_getfd(struct lxc_container *c, int *ttynum, int *ptmxfd)
{ {
return lxc_cmd_console(c->name, ttynum, masterfd, c->config_path); return lxc_cmd_console(c->name, ttynum, ptmxfd, c->config_path);
} }
int lxc_console(struct lxc_container *c, int ttynum, int lxc_console(struct lxc_container *c, int ttynum,
int stdinfd, int stdoutfd, int stderrfd, int stdinfd, int stdoutfd, int stderrfd,
int escape) int escape)
{ {
int masterfd, ret, ttyfd; int ptmxfd, ret, ttyfd;
struct lxc_epoll_descr descr; struct lxc_epoll_descr descr;
struct termios oldtios; struct termios oldtios;
struct lxc_terminal_state *ts; struct lxc_terminal_state *ts;
@ -1002,7 +1002,7 @@ int lxc_console(struct lxc_container *c, int ttynum,
}; };
int istty = 0; int istty = 0;
ttyfd = lxc_cmd_console(c->name, &ttynum, &masterfd, c->config_path); ttyfd = lxc_cmd_console(c->name, &ttynum, &ptmxfd, c->config_path);
if (ttyfd < 0) if (ttyfd < 0)
return -1; return -1;
@ -1010,7 +1010,7 @@ int lxc_console(struct lxc_container *c, int ttynum,
if (ret < 0) if (ret < 0)
TRACE("Process is already group leader"); TRACE("Process is already group leader");
ts = lxc_terminal_signal_init(stdinfd, masterfd); ts = lxc_terminal_signal_init(stdinfd, ptmxfd);
if (!ts) { if (!ts) {
ret = -1; ret = -1;
goto close_fds; goto close_fds;
@ -1021,8 +1021,8 @@ int lxc_console(struct lxc_container *c, int ttynum,
istty = isatty(stdinfd); istty = isatty(stdinfd);
if (istty) { if (istty) {
lxc_terminal_winsz(stdinfd, masterfd); lxc_terminal_winsz(stdinfd, ptmxfd);
lxc_terminal_winsz(ts->stdinfd, ts->masterfd); lxc_terminal_winsz(ts->stdinfd, ts->ptmxfd);
} else { } else {
INFO("File descriptor %d does not refer to a terminal", stdinfd); INFO("File descriptor %d does not refer to a terminal", stdinfd);
} }
@ -1049,10 +1049,10 @@ int lxc_console(struct lxc_container *c, int ttynum,
goto close_mainloop; goto close_mainloop;
} }
ret = lxc_mainloop_add_handler(&descr, ts->masterfd, ret = lxc_mainloop_add_handler(&descr, ts->ptmxfd,
lxc_terminal_master_cb, ts); lxc_terminal_ptmx_cb, ts);
if (ret < 0) { if (ret < 0) {
ERROR("Failed to add master handler"); ERROR("Failed to add ptmx handler");
goto close_mainloop; goto close_mainloop;
} }
@ -1093,7 +1093,7 @@ sigwinch_fini:
lxc_terminal_signal_fini(&terminal); lxc_terminal_signal_fini(&terminal);
close_fds: close_fds:
close(masterfd); close(ptmxfd);
close(ttyfd); close(ttyfd);
return ret; return ret;
@ -1133,16 +1133,16 @@ int lxc_terminal_prepare_login(int fd)
void lxc_terminal_info_init(struct lxc_terminal_info *terminal) void lxc_terminal_info_init(struct lxc_terminal_info *terminal)
{ {
terminal->name[0] = '\0'; terminal->name[0] = '\0';
terminal->master = -EBADF; terminal->ptmx = -EBADF;
terminal->slave = -EBADF; terminal->pts = -EBADF;
terminal->busy = -1; terminal->busy = -1;
} }
void lxc_terminal_init(struct lxc_terminal *terminal) void lxc_terminal_init(struct lxc_terminal *terminal)
{ {
memset(terminal, 0, sizeof(*terminal)); memset(terminal, 0, sizeof(*terminal));
terminal->slave = -EBADF; terminal->pts = -EBADF;
terminal->master = -EBADF; terminal->ptmx = -EBADF;
terminal->peer = -EBADF; terminal->peer = -EBADF;
terminal->log_fd = -EBADF; terminal->log_fd = -EBADF;
lxc_terminal_info_init(&terminal->proxy); lxc_terminal_info_init(&terminal->proxy);
@ -1167,13 +1167,13 @@ int lxc_terminal_map_ids(struct lxc_conf *c, struct lxc_terminal *terminal)
if (strcmp(terminal->name, "") == 0) if (strcmp(terminal->name, "") == 0)
return 0; return 0;
ret = chown_mapped_root(terminal->name, c); ret = userns_exec_mapped_root(terminal->name, terminal->pts, c);
if (ret < 0) { if (ret < 0) {
ERROR("Failed to chown terminal \"%s\"", terminal->name); return log_error(-1, "Failed to chown terminal %d(%s)",
return -1; terminal->pts, terminal->name);
} }
TRACE("Chowned terminal \"%s\"", terminal->name); TRACE("Chowned terminal %d(%s)", terminal->pts, terminal->name);
return 0; return 0;
} }
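
The renames in this file map the two ends of a pseudo-terminal to `ptmx` (the multiplexer side) and `pts` (the side handed to the container or client). The sketch below allocates such a pair, records the pts path, and sets `FD_CLOEXEC` on both descriptors, which mirrors the steps in `lxc_terminal_create()`. It deliberately uses `posix_openpt()` and `ptsname_r()` instead of `openpty()` and `ttyname_r()` so that it does not depend on libutil. The struct and function names are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

struct sketch_terminal {
	char name[64]; /* path name of the pts side */
	int ptmx;      /* file descriptor of the ptmx side */
	int pts;       /* file descriptor of the pts side */
};

/* Equivalent in spirit to fd_cloexec(fd, true). */
static int sketch_set_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

static int sketch_terminal_create(struct sketch_terminal *t)
{
	int ret;

	t->name[0] = '\0';
	t->pts = -EBADF;

	t->ptmx = posix_openpt(O_RDWR | O_NOCTTY);
	if (t->ptmx < 0)
		return -errno;

	if (grantpt(t->ptmx) < 0 || unlockpt(t->ptmx) < 0)
		goto on_error;

	/* ptsname_r() returns 0 on success and an error number otherwise. */
	ret = ptsname_r(t->ptmx, t->name, sizeof(t->name));
	if (ret != 0) {
		errno = ret;
		goto on_error;
	}

	t->pts = open(t->name, O_RDWR | O_NOCTTY);
	if (t->pts < 0)
		goto on_error;

	if (sketch_set_cloexec(t->ptmx) < 0 || sketch_set_cloexec(t->pts) < 0)
		goto on_error;

	return 0;

on_error:
	ret = -errno; /* save before close() can overwrite errno */
	if (t->pts >= 0)
		close(t->pts);
	close(t->ptmx);
	t->ptmx = t->pts = -EBADF;
	return ret;
}
```

Allocation can legitimately fail in environments without a mounted devpts, so callers should check the return value rather than assume a pair is always available.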


@ -15,14 +15,14 @@ struct lxc_conf;
struct lxc_epoll_descr; struct lxc_epoll_descr;
struct lxc_terminal_info { struct lxc_terminal_info {
/* the path name of the slave side */ /* the path name of the pts side */
char name[PATH_MAX]; char name[PATH_MAX];
/* the file descriptor of the master */ /* the file descriptor of the ptmx */
int master; int ptmx;
/* the file descriptor of the slave */ /* the file descriptor of the pts */
int slave; int pts;
/* whether the terminal is currently used */ /* whether the terminal is currently used */
int busy; int busy;
@ -32,7 +32,7 @@ struct lxc_terminal_state {
struct lxc_list node; struct lxc_list node;
int stdinfd; int stdinfd;
int stdoutfd; int stdoutfd;
int masterfd; int ptmxfd;
/* Escape sequence to use for exiting the terminal. A single char can /* Escape sequence to use for exiting the terminal. A single char can
* be specified. The terminal can then be exited by doing: Ctrl + * be specified. The terminal can then be exited by doing: Ctrl +
@ -57,8 +57,8 @@ struct lxc_terminal_state {
}; };
struct lxc_terminal { struct lxc_terminal {
int slave; int pts;
int master; int ptmx;
int peer; int peer;
struct lxc_terminal_info proxy; struct lxc_terminal_info proxy;
struct lxc_epoll_descr *descr; struct lxc_epoll_descr *descr;
@ -102,10 +102,10 @@ extern int lxc_terminal_allocate(struct lxc_conf *conf, int sockfd, int *ttynum
/** /**
* Create a new terminal: * Create a new terminal:
* - calls openpty() to allocate a master/slave pair * - calls openpty() to allocate a ptmx/pts pair
* - sets the FD_CLOEXEC flag on the master/slave fds * - sets the FD_CLOEXEC flag on the ptmx/pts fds
* - allocates either the current controlling terminal (default) or a user * - allocates either the current controlling terminal (default) or a user
* specified terminal as proxy for the newly created master/slave pair * specified terminal as proxy for the newly created ptmx/pts pair
* - sets up SIGWINCH handler, winsz, and new terminal settings * - sets up SIGWINCH handler, winsz, and new terminal settings
* (Handlers for SIGWINCH and I/O are not registered in a mainloop.) * (Handlers for SIGWINCH and I/O are not registered in a mainloop.)
*/ */
@ -164,7 +164,7 @@ extern int lxc_console(struct lxc_container *c, int ttynum,
* the range specified by lxc.tty.max to allocate a specific tty. * the range specified by lxc.tty.max to allocate a specific tty.
*/ */
extern int lxc_terminal_getfd(struct lxc_container *c, int *ttynum, extern int lxc_terminal_getfd(struct lxc_container *c, int *ttynum,
int *masterfd); int *ptmxfd);
/** /**
* Make fd a duplicate of the standard file descriptors. The fd is made a * Make fd a duplicate of the standard file descriptors. The fd is made a
@ -183,12 +183,12 @@ extern int lxc_terminal_stdin_cb(int fd, uint32_t events, void *cbdata,
struct lxc_epoll_descr *descr); struct lxc_epoll_descr *descr);
/** /**
* Handler for events on the master fd of the terminal. To be registered via * Handler for events on the ptmx fd of the terminal. To be registered via
* the corresponding functions declared and defined in mainloop.{c,h} or * the corresponding functions declared and defined in mainloop.{c,h} or
* lxc_terminal_mainloop_add(). * lxc_terminal_mainloop_add().
* This function exits the loop cleanly when an EPOLLHUP event is received. * This function exits the loop cleanly when an EPOLLHUP event is received.
*/ */
extern int lxc_terminal_master_cb(int fd, uint32_t events, void *cbdata, extern int lxc_terminal_ptmx_cb(int fd, uint32_t events, void *cbdata,
struct lxc_epoll_descr *descr); struct lxc_epoll_descr *descr);
/** /**
@ -202,9 +202,9 @@ extern int lxc_setup_tios(int fd, struct termios *oldtios);
* lxc_terminal_winsz: propagate winsz from one terminal to another * lxc_terminal_winsz: propagate winsz from one terminal to another
* *
* @srcfd * @srcfd
* - terminal to get size from (typically a slave pty) * - terminal to get size from (typically a pts pty)
* @dstfd * @dstfd
* - terminal to set size on (typically a master pty) * - terminal to set size on (typically a ptmx pty)
*/ */
extern void lxc_terminal_winsz(int srcfd, int dstfd); extern void lxc_terminal_winsz(int srcfd, int dstfd);
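The header comment above lists "sets the FD_CLOEXEC flag on the ptmx/pts fds" as one step of terminal creation. A minimal sketch of that step follows, assuming only the standard `fcntl()` interface. The helper names `set_cloexec` and `set_cloexec_pair` are hypothetical and are not taken from the LXC sources.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: set or clear FD_CLOEXEC on a single descriptor. */
static int set_cloexec(int fd, bool cloexec)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;

	if (cloexec)
		flags |= FD_CLOEXEC;
	else
		flags &= ~FD_CLOEXEC;

	return fcntl(fd, F_SETFD, flags) < 0 ? -1 : 0;
}

/* Hypothetical helper: mark both ends of a ptmx/pts pair close-on-exec. */
static int set_cloexec_pair(int ptmx, int pts)
{
	/* Passing an unopened descriptor is treated as a programming error. */
	assert(ptmx >= 0 && pts >= 0);

	if (set_cloexec(ptmx, true) < 0)
		return -1;

	return set_cloexec(pts, true);
}
```

Reading the flags before writing them keeps any other descriptor flags intact, which is why the sketch does not simply call `fcntl(fd, F_SETFD, FD_CLOEXEC)`.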


@ -1071,7 +1071,7 @@ static int ls_remove_lock(const char *path, const char *name,
if (check < 0 || (size_t)check >= *len_lockpath) if (check < 0 || (size_t)check >= *len_lockpath)
goto out; goto out;
ret = recursive_destroy(*lockpath); ret = lxc_rm_rf(*lockpath);
if (ret < 0) if (ret < 0)
WARN("Failed to destroy \"%s\"", *lockpath); WARN("Failed to destroy \"%s\"", *lockpath);
@ -1166,6 +1166,9 @@ static int ls_recv_str(int fd, char **buf)
if (ret != sizeof(slen)) if (ret != sizeof(slen))
return -1; return -1;
if (slen == SIZE_MAX)
return -1;
if (slen > 0) { if (slen > 0) {
*buf = malloc(sizeof(char) * (slen + 1)); *buf = malloc(sizeof(char) * (slen + 1));
if (!*buf) if (!*buf)
@ -1177,6 +1180,11 @@ static int ls_recv_str(int fd, char **buf)
return -1; return -1;
} }
if (slen == SIZE_MAX) {
free(*buf);
return -1;
}
(*buf)[slen] = '\0'; (*buf)[slen] = '\0';
} }
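The `slen == SIZE_MAX` check added above guards the `slen + 1` allocation: if the peer sends `SIZE_MAX`, the addition wraps to 0 and the later write of the terminating NUL would land outside the buffer. The sketch below shows the same guard in a self-contained length-prefixed receiver. The names `read_full` and `recv_str` are hypothetical and the framing (a raw `size_t` followed by the bytes) is an assumption modelled on the hunk, not a copy of `ls_recv_str()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Read exactly len bytes, looping over short reads. Returns 0 or -1. */
static int read_full(int fd, void *buf, size_t len)
{
	char *p = buf;

	while (len > 0) {
		ssize_t n = read(fd, p, len);

		if (n <= 0)
			return -1;
		p += n;
		len -= (size_t)n;
	}

	return 0;
}

/*
 * Hypothetical receiver: reads a size_t length, then that many bytes.
 * On success *out is a NUL-terminated heap string, or NULL for length 0.
 */
static int recv_str(int fd, char **out)
{
	size_t slen = 0;
	char *buf;

	assert(out != NULL);
	*out = NULL;

	if (read_full(fd, &slen, sizeof(slen)) < 0)
		return -1;

	/* slen + 1 would wrap around to 0 and defeat the allocation below. */
	if (slen == SIZE_MAX)
		return -1;

	if (slen == 0)
		return 0;

	buf = malloc(slen + 1);
	if (!buf)
		return -1;

	if (read_full(fd, buf, slen) < 0) {
		free(buf);
		return -1;
	}

	buf[slen] = '\0';
	*out = buf;
	return 0;
}
```

Rejecting the length before allocating means no buffer has to be freed on that path.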


@ -35,7 +35,7 @@
#include "memory_utils.h" #include "memory_utils.h"
#include "namespace.h" #include "namespace.h"
#include "parse.h" #include "parse.h"
#include "raw_syscalls.h" #include "process_utils.h"
#include "syscall_wrappers.h" #include "syscall_wrappers.h"
#include "utils.h" #include "utils.h"
@ -1336,7 +1336,7 @@ bool lxc_switch_uid_gid(uid_t uid, gid_t gid)
int ret = 0; int ret = 0;
if (gid != LXC_INVALID_GID) { if (gid != LXC_INVALID_GID) {
ret = setgid(gid); ret = setresgid(gid, gid, gid);
if (ret < 0) { if (ret < 0) {
SYSERROR("Failed to switch to gid %d", gid); SYSERROR("Failed to switch to gid %d", gid);
return false; return false;
@ -1345,7 +1345,7 @@ bool lxc_switch_uid_gid(uid_t uid, gid_t gid)
} }
if (uid != LXC_INVALID_UID) { if (uid != LXC_INVALID_UID) {
ret = setuid(uid); ret = setresuid(uid, uid, uid);
if (ret < 0) { if (ret < 0) {
SYSERROR("Failed to switch to uid %d", uid); SYSERROR("Failed to switch to uid %d", uid);
return false; return false;
@ -1747,7 +1747,7 @@ int fd_cloexec(int fd, bool cloexec)
return 0; return 0;
} }
int recursive_destroy(const char *dirname) int lxc_rm_rf(const char *dirname)
{ {
__do_closedir DIR *dir = NULL; __do_closedir DIR *dir = NULL;
int fret = 0; int fret = 0;
@ -1779,7 +1779,7 @@ int recursive_destroy(const char *dirname)
if (!S_ISDIR(mystat.st_mode)) if (!S_ISDIR(mystat.st_mode))
continue; continue;
ret = recursive_destroy(pathname); ret = lxc_rm_rf(pathname);
if (ret < 0) if (ret < 0)
fret = -1; fret = -1;
} }
@ -1860,3 +1860,47 @@ bool lxc_can_use_pidfd(int pidfd)
return log_trace(true, "Kernel supports pidfds"); return log_trace(true, "Kernel supports pidfds");
} }
int fix_stdio_permissions(uid_t uid)
{
__do_close int devnull_fd = -EBADF;
int fret = 0;
int std_fds[] = {STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO};
int ret;
struct stat st, st_null;
devnull_fd = open_devnull();
if (devnull_fd < 0)
return log_warn_errno(-1, errno, "Failed to open \"/dev/null\"");
ret = fstat(devnull_fd, &st_null);
if (ret)
return log_warn_errno(-errno, errno, "Failed to stat \"/dev/null\"");
for (int i = 0; i < ARRAY_SIZE(std_fds); i++) {
ret = fstat(std_fds[i], &st);
if (ret) {
SYSWARN("Failed to stat standard I/O file descriptor %d", std_fds[i]);
fret = -1;
continue;
}
if (st.st_rdev == st_null.st_rdev)
continue;
ret = fchown(std_fds[i], uid, st.st_gid);
if (ret) {
SYSWARN("Failed to chown standard I/O file descriptor %d to uid %d and gid %d",
std_fds[i], uid, st.st_gid);
fret = -1;
}
ret = fchmod(std_fds[i], 0700);
if (ret) {
SYSWARN("Failed to chmod standard I/O file descriptor %d", std_fds[i]);
fret = -1;
}
}
return fret;
}


@ -25,7 +25,7 @@
#include "initutils.h" #include "initutils.h"
#include "macro.h" #include "macro.h"
#include "memory_utils.h" #include "memory_utils.h"
#include "raw_syscalls.h" #include "process_utils.h"
#include "string_utils.h" #include "string_utils.h"
/* returns 1 on success, 0 if there were any failures */ /* returns 1 on success, 0 if there were any failures */
@ -235,8 +235,20 @@ extern uint64_t lxc_find_next_power2(uint64_t n);
/* Set a signal the child process will receive after the parent has died. */ /* Set a signal the child process will receive after the parent has died. */
extern int lxc_set_death_signal(int signal, pid_t parent, int parent_status_fd); extern int lxc_set_death_signal(int signal, pid_t parent, int parent_status_fd);
extern int fd_cloexec(int fd, bool cloexec); extern int fd_cloexec(int fd, bool cloexec);
extern int recursive_destroy(const char *dirname); extern int lxc_rm_rf(const char *dirname);
extern int lxc_setup_keyring(char *keyring_label); extern int lxc_setup_keyring(char *keyring_label);
extern bool lxc_can_use_pidfd(int pidfd); extern bool lxc_can_use_pidfd(int pidfd);
extern int fix_stdio_permissions(uid_t uid);
static inline bool uid_valid(uid_t uid)
{
return uid != LXC_INVALID_UID;
}
static inline bool gid_valid(gid_t gid)
{
return gid != LXC_INVALID_GID;
}
#endif /* __LXC_UTILS_H */ #endif /* __LXC_UTILS_H */
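The `uid_valid()`/`gid_valid()` helpers added above pair naturally with the move from `setuid()`/`setgid()` to `setresuid()`/`setresgid()` in `lxc_switch_uid_gid()`. The sketch below combines the two ideas. It is an illustration only: `DEMO_INVALID_UID`, `DEMO_INVALID_GID`, and `switch_ids` are hypothetical names, and the `(uid_t)-1` sentinel is an assumption about how the `LXC_INVALID_*` macros are defined.

```c
#define _GNU_SOURCE /* setresuid()/setresgid() declarations on glibc */
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed sentinel values, standing in for LXC_INVALID_UID/GID. */
#define DEMO_INVALID_UID ((uid_t)-1)
#define DEMO_INVALID_GID ((gid_t)-1)

static inline bool demo_uid_valid(uid_t uid)
{
	return uid != DEMO_INVALID_UID;
}

static inline bool demo_gid_valid(gid_t gid)
{
	return gid != DEMO_INVALID_GID;
}

/*
 * Hypothetical sketch: set the real, effective and saved IDs together.
 * The gid is switched first, because once the uid has been dropped the
 * process may no longer be permitted to change its gid.
 */
static bool switch_ids(uid_t uid, gid_t gid)
{
	if (demo_gid_valid(gid) && setresgid(gid, gid, gid) < 0)
		return false;

	if (demo_uid_valid(uid) && setresuid(uid, uid, uid) < 0)
		return false;

	return true;
}
```

Setting all three IDs at once leaves no saved ID behind that the process could later switch back to.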


@ -116,7 +116,7 @@ int lxc_id128_write_fd(int fd, lxc_id128_t id)
int lxc_id128_write(const char *p, lxc_id128_t id) int lxc_id128_write(const char *p, lxc_id128_t id)
{ {
int fd = -1; __do_close int fd = -EBADF;
fd = open(p, O_WRONLY|O_CREAT|O_CLOEXEC|O_NOCTTY|O_TRUNC, 0444); fd = open(p, O_WRONLY|O_CREAT|O_CLOEXEC|O_NOCTTY|O_TRUNC, 0444);
if (fd < 0) if (fd < 0)
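The hunk above swaps a plain `int fd = -1` for `__do_close int fd = -EBADF`, so the descriptor is released on every return path. A rough sketch of how such an annotation can be built on the GCC/Clang `cleanup` attribute follows. The macro `auto_close` and the functions `close_fd_cleanup` and `probe_open` are hypothetical stand-ins, not the definitions from LXC's `memory_utils.h`.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Cleanup handler: close the descriptor if it is valid, preserving errno. */
static void close_fd_cleanup(int *fd)
{
	if (*fd >= 0) {
		int saved_errno = errno;

		close(*fd);
		errno = saved_errno;
		*fd = -EBADF;
	}
}

/* Hypothetical stand-in for a __do_close-style annotation. */
#define auto_close __attribute__((__cleanup__(close_fd_cleanup)))

/*
 * Opens path and reports the descriptor number through fd_seen. The
 * descriptor itself is closed automatically when the function returns.
 */
static int probe_open(const char *path, int *fd_seen)
{
	auto_close int fd = -EBADF;

	assert(path != NULL && fd_seen != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	*fd_seen = fd;
	return 0; /* close_fd_cleanup(&fd) runs as fd goes out of scope */
}
```

Initializing to a negative value matters: the handler runs even on the early-return path, and it must be able to tell that nothing was opened.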


@ -30,7 +30,7 @@ lxc_test_parse_config_file_SOURCES = parse_config_file.c \
lxc_test_raw_clone_SOURCES = lxc_raw_clone.c \ lxc_test_raw_clone_SOURCES = lxc_raw_clone.c \
lxctest.h \ lxctest.h \
../lxc/namespace.c ../lxc/namespace.h \ ../lxc/namespace.c ../lxc/namespace.h \
../lxc/raw_syscalls.c ../lxc/raw_syscalls.h ../lxc/process_utils.c ../lxc/process_utils.h
../lxc/utils.c ../lxc/utils.h ../lxc/utils.c ../lxc/utils.h
lxc_test_reboot_SOURCES = reboot.c lxc_test_reboot_SOURCES = reboot.c
lxc_test_saveconfig_SOURCES = saveconfig.c lxc_test_saveconfig_SOURCES = saveconfig.c
@ -114,7 +114,8 @@ bin_SCRIPTS += lxc-test-automount \
lxc-test-createconfig \ lxc-test-createconfig \
lxc-test-exit-code \ lxc-test-exit-code \
lxc-test-no-new-privs \ lxc-test-no-new-privs \
lxc-test-rootfs lxc-test-rootfs \
lxc-test-usernsexec
if DISTRO_UBUNTU if DISTRO_UBUNTU
bin_SCRIPTS += lxc-test-lxc-attach \ bin_SCRIPTS += lxc-test-lxc-attach \
@ -163,6 +164,7 @@ EXTRA_DIST = basic.c \
lxc-test-snapdeps \ lxc-test-snapdeps \
lxc-test-symlink \ lxc-test-symlink \
lxc-test-unpriv \ lxc-test-unpriv \
lxc-test-usernsexec \
lxc-test-utils.c \ lxc-test-utils.c \
may_control.c \ may_control.c \
mount_injection.c \ mount_injection.c \


@ -37,14 +37,14 @@
} while (0) } while (0)
static void test_console_close_all(int ttyfd[MAXCONSOLES], static void test_console_close_all(int ttyfd[MAXCONSOLES],
int masterfd[MAXCONSOLES]) int ptmxfd[MAXCONSOLES])
{ {
int i; int i;
for (i = 0; i < MAXCONSOLES; i++) { for (i = 0; i < MAXCONSOLES; i++) {
if (masterfd[i] != -1) { if (ptmxfd[i] != -1) {
close(masterfd[i]); close(ptmxfd[i]);
masterfd[i] = -1; ptmxfd[i] = -1;
} }
if (ttyfd[i] != -1) { if (ttyfd[i] != -1) {
@ -59,14 +59,14 @@ static int test_console_running_container(struct lxc_container *c)
int nrconsoles, i, ret = -1; int nrconsoles, i, ret = -1;
int ttynum [MAXCONSOLES]; int ttynum [MAXCONSOLES];
int ttyfd [MAXCONSOLES]; int ttyfd [MAXCONSOLES];
int masterfd[MAXCONSOLES]; int ptmxfd[MAXCONSOLES];
for (i = 0; i < MAXCONSOLES; i++) for (i = 0; i < MAXCONSOLES; i++)
ttynum[i] = ttyfd[i] = masterfd[i] = -1; ttynum[i] = ttyfd[i] = ptmxfd[i] = -1;
ttynum[0] = 1; ttynum[0] = 1;
ret = c->console_getfd(c, &ttynum[0], &masterfd[0]); ret = c->console_getfd(c, &ttynum[0], &ptmxfd[0]);
if (ret < 0) { if (ret < 0) {
TSTERR("console allocate failed"); TSTERR("console allocate failed");
goto err1; goto err1;
@ -79,12 +79,12 @@ static int test_console_running_container(struct lxc_container *c)
} }
/* attempt to alloc same ttynum */ /* attempt to alloc same ttynum */
ret = c->console_getfd(c, &ttynum[0], &masterfd[1]); ret = c->console_getfd(c, &ttynum[0], &ptmxfd[1]);
if (ret != -1) { if (ret != -1) {
TSTERR("console allocate should fail for allocated ttynum %d", ttynum[0]); TSTERR("console allocate should fail for allocated ttynum %d", ttynum[0]);
goto err2; goto err2;
} }
close(masterfd[0]); masterfd[0] = -1; close(ptmxfd[0]); ptmxfd[0] = -1;
close(ttyfd[0]); ttyfd[0] = -1; close(ttyfd[0]); ttyfd[0] = -1;
/* ensure we can allocate all consoles, we do this a few times to /* ensure we can allocate all consoles, we do this a few times to
@ -92,7 +92,7 @@ static int test_console_running_container(struct lxc_container *c)
*/ */
for (i = 0; i < 10; i++) { for (i = 0; i < 10; i++) {
for (nrconsoles = 0; nrconsoles < MAXCONSOLES; nrconsoles++) { for (nrconsoles = 0; nrconsoles < MAXCONSOLES; nrconsoles++) {
ret = c->console_getfd(c, &ttynum[nrconsoles], &masterfd[nrconsoles]); ret = c->console_getfd(c, &ttynum[nrconsoles], &ptmxfd[nrconsoles]);
if (ret < 0) if (ret < 0)
break; break;
ttyfd[nrconsoles] = ret; ttyfd[nrconsoles] = ret;
@ -103,13 +103,13 @@ static int test_console_running_container(struct lxc_container *c)
goto err2; goto err2;
} }
test_console_close_all(ttyfd, masterfd); test_console_close_all(ttyfd, ptmxfd);
} }
ret = 0; ret = 0;
err2: err2:
test_console_close_all(ttyfd, masterfd); test_console_close_all(ttyfd, ptmxfd);
err1: err1:
return ret; return ret;


@ -135,7 +135,7 @@ int main(int argc, char *argv[])
str = c->config_file_name(c); str = c->config_file_name(c);
#define CONFIGFNAM LXCPATH "/" MYNAME "/config" #define CONFIGFNAM LXCPATH "/" MYNAME "/config"
if (!str || strcmp(str, CONFIGFNAM)) { if (str && strcmp(str, CONFIGFNAM)) {
fprintf(stderr, "%d: got wrong config file name (%s, not %s)\n", __LINE__, str, CONFIGFNAM); fprintf(stderr, "%d: got wrong config file name (%s, not %s)\n", __LINE__, str, CONFIGFNAM);
goto out; goto out;
} }


@ -36,11 +36,13 @@ cleanup() {
trap cleanup EXIT SIGHUP SIGINT SIGTERM trap cleanup EXIT SIGHUP SIGINT SIGTERM
if [ ! -d /etc/lxc ]; then
mkdir -p /etc/lxc/ mkdir -p /etc/lxc/
cat > /etc/lxc/default.conf << EOF cat > /etc/lxc/default.conf << EOF
lxc.net.0.type = veth lxc.net.0.type = veth
lxc.net.0.link = lxcbr0 lxc.net.0.link = lxcbr0
EOF EOF
fi
ARCH=i386 ARCH=i386
if type dpkg >/dev/null 2>&1; then if type dpkg >/dev/null 2>&1; then

src/tests/lxc-test-usernsexec Executable file

@ -0,0 +1,368 @@
#!/bin/bash
#
# This is a bash test case to test lxc-usernsexec.
# It basically supports using lxc-usernsexec to execute itself
# and then create files and check that their ownership is as expected.
#
# It requires that the current user has at least 1 value in /etc/subuid and /etc/subgid
TEMP_D=""
VERBOSITY=0
set -f
fail() { echo "$@" 1>&2; exit 1; }
error() { echo "$@" 1>&2; }
skip() {
error "SKIP:" "$@"
exit 0
}
debug() {
local level=${1}; shift;
[ "${level}" -gt "${VERBOSITY}" ] && return
error "${@}"
}
collect_owners() {
# collect_owners([--dir=dir], file1, file2 ...)
# set _RET to a space delimited array of
# <file1>:owner:group <file2>:owner:group ...
local out="" ret="" dir=""
if [ "${1#--dir=}" != "$1" ]; then
dir="${1#--dir=}"
shift
fi
for arg in "$@"; do
# drop the :* so that input can be same as touch_files.
out=$(stat --format "%n:%u:%g" "${dir}${arg}") || {
error "failed to stat ${arg}"
return 1;
}
ret="$ret ${out##*/}"
done
_RET="${ret# }"
}
cleanup() {
if [ -d "$TEMP_D" ]; then
rm -Rf "$TEMP_D"
fi
}
touch_files() {
# touch_files tok [tok ...]
# tok is filename:chown_id:chown_gid
# if chown_id or chown_gid is empty, then chown will do the right thing
# and only change the provided value.
local args="" tok="" fname="" uidgid=""
args=( "$@" )
for tok in "$@"; do
fname=${tok%%:*}
uidgid=${tok#$fname}
uidgid=${uidgid#:}
: > "$fname" || { error "failed to create $fname"; return 1; }
[ -z "$uidgid" ] && continue
chown $uidgid "$fname" || { error "failed to chown '$uidgid' $fname ($?)"; return 1; }
done
}
inside_cleanup() {
local f=""
rm -f "${FILES[@]}"
echo "$STATUS" >&5
echo "$STATUS" >&6
}
set_files() {
local x=""
FILES=( )
for x in "$@"; do
FILES[${#FILES[@]}]="${x%%:*}"
done
}
inside() {
# this is what gets run inside the usernsexec environment.
# basically expects arguments of <filename>:uid:gid
# it will create the file, and then chown it to the provided uid:gid
# it writes to file descriptor 5 a single line with space delimited
# exit_value uid gid [<filename>:<owner>:<group> ... ]
STATUS=127
trap inside_cleanup EXIT
local uid="" gid="" x=""
uid=$(id -u) || fail "failed execution of id -u"
gid=$(id -g) || fail "failed execution of id -g"
set_files "$@"
touch_files "$@" || fail "failed to create files"
collect_owners "${FILES[@]}" || fail "failed to collect owners"
result="$_RET"
# tell caller we are done.
echo "0" "$uid" "$gid" "$result" >&5
STATUS=0
# let the caller do things while the files are around.
read -t 30 x <&6
exit
}
runtest() {
# runtest(mydir, nsexec_args, [inside [...]])
# - use 'mydir' as a working dir.
# - execute lxc-usernsexec $nsexec_args -- <self> inside <inside args>
#
# write to stdout
# exit_value inside_exit_value inside_uid:inside_gid <results>
#
# where results are a list of space separated
# filename:uid:gid
# for each file passed in inside_args
[ $# -ge 3 ] || { error "runtest expects at least 3 args"; return 1; }
local mydir="$1" nsexec_args="$2"
shift 2
local ret inside_owners t=""
KIDPID=""
mkfifo "${mydir}/5" && exec 5<>"${mydir}/5" || return
mkfifo "${mydir}/6" && exec 6<>"${mydir}/6" || return
mkdir --mode=777 "${mydir}/work" || return
cd "${mydir}/work"
set_files "$@"
local results="" oresults="" iresults="" iuid="" igid="" n=0
error "$" $USERNSEXEC ${nsexec_args} -- "$MYPATH" inside "$*"
${USERNSEXEC} ${nsexec_args} -- "$MYPATH" inside "$@" &
KIDPID=$!
[ -d "/proc/$KIDPID" ] || {
wait $KIDPID
fail "kid $KIDPID died quickly $?"
}
# if lxc-usernsexec fails to execute MYPATH inside, then
# the read below would timeout. To avoid a long timeout,
# we do a short timeout and check the pid is alive.
while ! read -t 1 ret iuid igid inside_owners <&5; do
n=$((n+1))
if [ ! -d "/proc/$KIDPID" ]; then
wait $KIDPID
fail "kid $KIDPID is gone $?"
fi
[ $n -ge 30 ] && fail "child never wrote to pipe"
done
iresults=( $inside_owners )
collect_owners "--dir=${mydir}/work/" "${FILES[@]}" || return
oresults=( $_RET )
echo 0 >&6
wait
ret=$?
results=( )
for((i=0;i<${#iresults[@]};i++)); do
results[$i]="${oresults[$i]}:${iresults[$i]#*:}"
done
echo 0 $ret "$iuid:$igid" "${results[@]}"
}
runcheck() {
local name="$1" expected="$2" nsexec_args="$3" found=""
shift 3
mkdir "${TEMP_D}/$name" || fail "failed mkdir <TEMP_D>/$name"
local err="${TEMP_D}/$name.err"
out=$("$MYPATH" runtest "${TEMP_D}/$name" "$nsexec_args" "$@" 2>"$err") || {
error "$name: FAIL - runtest failed $?"
[ -n "$out" ] && error " $out"
sed 's,^, ,' "$err" 1>&2
ERRORS="${ERRORS} $name"
return 1
}
set -- $out
local parentrc=$1 kidrc=$2 iuidgid="$3" found=""
shift 3
found="$*"
[ "$parentrc" = "0" -a "$kidrc" = "0" ] || {
error "$name: FAIL - parentrc=$parentrc kidrc=$kidrc found=$found"
ERRORS="${ERRORS} $name"
return 1
}
[ "$expected" = "$found" ] && {
error "$name: PASS"
PASSES="${PASSES} $name"
return 0
}
echo "$name: FAIL expected '$expected' != found '$found'"
FAILS="${FAILS} $name"
return 1
}
setup_Usage() {
cat <<EOF
${0} setup_and_run [-- run-args]
setup the system by creating a user (default is '${asuser:-test-userns}')
and then run test as that user. Must be root.
If user exists, then do not create the user.
-v | --verbose - be more verbose
--create-subuid=UID:RANGE
--create-subgid=UID:RANGE if adding subuid/subgid use this START:RANGE
example (default) 3000000000:5
EOF
}
setup_and_run() {
local short_opts="hv"
local long_opts="help,user:,create-subuid:,create-subgid:,verbose"
local getopt_out=""
getopt_out=$(getopt --name "${0##*/}" \
--options "${short_opts}" --long "${long_opts}" -- "$@") &&
eval set -- "${getopt_out}" ||
{ bad_Usage; return; }
local cur="" next="" asuser="test-userns"
local create_subuid="3000000000:5" create_subgid="3000000000:5"
while [ $# -ne 0 ]; do
cur="$1"; next="$2";
case "$cur" in
-h|--help) setup_Usage ; exit 0;;
--user) asuser="$next"; shift;;
--create-subuid) create_subuid=$next; shift;;
--create-subgid) create_subgid=$next; shift;;
-v|--verbose) VERBOSITY=$((${VERBOSITY}+1));;
--) shift; break;;
esac
shift;
done
local pt_args=""
pt_args=( "$@" )
if [ "$(id -u)" != "0" ]; then
error "Sorry, setup_and_run has to be done as root, not uid=$(id -u)"
return 1
fi
local home="/home/$asuser"
if [ ! -d "$home" ]; then
debug 1 "creating user $asuser"
useradd "$asuser" --create-home "--home-dir=$home" || {
error "failed to create $asuser"
return 1
}
else
debug 1 "$asuser existed"
fi
local subuid="" subgid=""
subuid=$(awk -F: '$1 == n { print $2; exit(0); }' "n=$asuser" /etc/subuid) || {
error "failed to read /etc/subuid for $asuser"
return 1
}
if [ -n "$subuid" ]; then
debug 1 "$asuser already had subuid=$subuid"
else
debug 1 "adding $asuser:$create_subuid to /etc/subuid"
echo "$asuser:$create_subuid" >> /etc/subuid || {
error "failed to add $asuser to /etc/subuid"
}
fi
subgid=$(awk -F: '$1 == n { print $2; exit(0); }' "n=$asuser" /etc/subgid) || {
error "failed to read /etc/subgid for $asuser"
return 1
}
if [ -n "$subgid" ]; then
debug 1 "$asuser already had subgid=$subgid"
else
debug 1 "adding $asuser:$create_subgid to /etc/subgid"
echo "$asuser:$create_subgid" >> /etc/subgid || {
error "failed to add $asuser to /etc/subgid"
}
fi
debug 0 "as $asuser executing ${MYPATH} ${pt_args[*]}"
sudo -Hu "$asuser" "${MYPATH}" "${pt_args[@]}"
}
USERNSEXEC=${USERNSEXEC:-lxc-usernsexec}
MYPATH=$(readlink -f "$0") || { echo "failed to get full path to self: $0"; exit 1; }
export MYPATH
if [ "$1" = "inside" ]; then
shift
inside "$@"
exit
elif [ "$1" = "runtest" ]; then
shift
runtest "$@"
exit
elif [ "$1" = "setup_and_run" ]; then
shift
setup_and_run "$@"
exit
fi
name=$(id --user --name) || fail "failed to get username"
if [ "$name" = "root" ]; then
setup_and_run "$@"
exit
fi
subuid=$(awk -F: '$1 == n { print $2; exit(0); }' "n=$name" /etc/subuid) &&
[ -n "$subuid" ] || fail "did not find $name in /etc/subuid"
subgid=$(awk -F: '$1 == n { print $2; exit(0); }' "n=$name" /etc/subgid) &&
[ -n "$subgid" ] || fail "did not find $name in /etc/subgid"
uid=$(id --user) || fail "failed to get uid"
gid=$(id --group) || fail "failed to get gid"
mapuid="u:0:$uid:1"
mapgid="g:0:$gid:1"
ver=$(dpkg-query --show lxc-utils | awk '{print $2}')
error "uid=$uid gid=$gid name=$name subuid=$subuid subgid=$subgid ver=$ver"
error "lxc-utils=$ver kver=$(uname -r)"
error "USERNSEXEC=$USERNSEXEC"
TEMP_D=$(mktemp -d)
trap cleanup EXIT
PASSES=""; FAILS=""; ERRORS=""
runcheck nouidgid "f0:$subuid:$subgid:0:0" "" f0
runcheck myuidgid "f0:$uid:$gid:0:0" \
"-m$mapuid -m$mapgid" f0
runcheck subuidgid \
"f0:$subuid:$subgid:0:0" \
"-mu:0:$subuid:1 -mg:0:$subgid:1" f0:0:0
runcheck bothsets "f0:$uid:$gid:0:0 f1:$subuid:$subgid:1:1 f2:$uid:$subgid:0:1" \
"-m$mapuid -m$mapgid -mu:1:$subuid:1 -mg:1:$subgid:1" \
f0 f1:1:1 f2::1
runcheck mismatch "f0:$uid:$subgid:0:0 f1:$subuid:$gid:15:31" \
"-mu:0:$uid:1 -mg:0:$subgid:1 -mu:15:$subuid:1 -mg:31:$gid:1" \
f0 f1:15:31
FAILS=${FAILS# }
ERRORS=${ERRORS# }
PASSES=${PASSES# }
[ -z "${FAILS}" ] || error "FAILS: ${FAILS}"
[ -z "${ERRORS}" ] || error "ERRORS: ${ERRORS}"
[ -z "${FAILS}" -a -z "${ERRORS}" ] || exit 1
exit 0
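The `runcheck` calls above pass mappings such as `-mu:15:$subuid:1`, and the expected results (for example `f1:$subuid:$gid:15:31`) pair the owner seen on the host with the owner seen inside the namespace. The sketch below illustrates that translation. It assumes the map string has the shape `type:nsid:hostid:range`, as the test arguments suggest. The struct and function names are hypothetical and do not come from `lxc-usernsexec`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct id_map {
	char type;            /* 'u' for uid maps, 'g' for gid maps */
	unsigned long nsid;   /* first id inside the namespace */
	unsigned long hostid; /* first id on the host */
	unsigned long range;  /* number of consecutive ids covered */
};

/* Parse a "type:nsid:hostid:range" string; rejects trailing garbage. */
static bool parse_id_map(const char *spec, struct id_map *m)
{
	int consumed = 0;

	assert(spec != NULL && m != NULL);

	if (sscanf(spec, "%c:%lu:%lu:%lu%n", &m->type, &m->nsid, &m->hostid,
		   &m->range, &consumed) != 4)
		return false;

	if (spec[consumed] != '\0')
		return false;

	return (m->type == 'u' || m->type == 'g') && m->range > 0;
}

/* Translate an id inside the namespace to the matching host id. */
static bool map_to_host(const struct id_map *m, unsigned long nsid,
			unsigned long *hostid)
{
	if (nsid < m->nsid || nsid - m->nsid >= m->range)
		return false;

	*hostid = m->hostid + (nsid - m->nsid);
	return true;
}
```

With a map of `u:15:100000:1`, a file chowned to uid 15 inside the namespace would appear on the host as owned by 100000, which is the relationship the `mismatch` case checks.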


@ -39,7 +39,7 @@
#include "lxctest.h" #include "lxctest.h"
#include "namespace.h" #include "namespace.h"
#include "raw_syscalls.h" #include "process_utils.h"
#include "utils.h" #include "utils.h"
int main(int argc, char *argv[]) int main(int argc, char *argv[])


@ -415,7 +415,7 @@ static bool lxc_setup_shmount(const char *shmount_path)
static void lxc_teardown_shmount(char *shmount_path) static void lxc_teardown_shmount(char *shmount_path)
{ {
(void)umount2(shmount_path, MNT_DETACH); (void)umount2(shmount_path, MNT_DETACH);
(void)recursive_destroy(shmount_path); (void)lxc_rm_rf(shmount_path);
} }
int main(int argc, char *argv[]) int main(int argc, char *argv[])


@ -348,8 +348,7 @@ fi
# shellcheck disable=SC2039 # shellcheck disable=SC2039
# shellcheck disable=SC2068 # shellcheck disable=SC2068
umoci --log=error unpack ${umoci_args[@]} --image "${DOWNLOAD_TEMP}:latest" "${LXC_ROOTFS}.tmp" umoci --log=error unpack ${umoci_args[@]} --image "${DOWNLOAD_TEMP}:latest" "${LXC_ROOTFS}.tmp"
rmdir "${LXC_ROOTFS}" find "${LXC_ROOTFS}.tmp/rootfs" -mindepth 1 -maxdepth 1 -exec mv '{}' "${LXC_ROOTFS}/" \;
mv "${LXC_ROOTFS}.tmp/rootfs" "${LXC_ROOTFS}"
OCI_CONF_FILE=$(getconfigpath "${DOWNLOAD_TEMP}" latest) OCI_CONF_FILE=$(getconfigpath "${DOWNLOAD_TEMP}" latest)
LXC_CONF_FILE="${LXC_PATH}/config" LXC_CONF_FILE="${LXC_PATH}/config"