With this patch, if an unprivileged user has $HOME 700 or
750 and does
lxc-start -n c1
he'll see an error like:
lxc_container: Permission denied - could not access /home/serge. Please grant it 'x' access, or add an ACL for t he container root.
(This addresses bug pad.lv/1277466)
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
- refactor cgroup into two backends, the classic cgfs driver and the new
cgmanager. Instead of lxc_handler knowing about the internals of each,
have it just store an opaque pointer to a struct that is private to
each backend.
- rename a couple of cgroup functions for consistency: those that are
considered an API (ie. exported by lxc.h) begin with lxc_ and those that
are not are just cgroup_*
- made as many backend routines static as possible, only cg*_ops_init is
exported
- made a nrtasks op which is needed by the utmp code for monitoring
container shutdown, currently only implemented for the cgfs backend
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
lxc.id_map bug when writing directly to /proc/pid/[ug]id_map
There's some code in src/lxc/conf.c that sets up the UID/GID mapping. It
can use the external newuidmap/newgidmap tools, or it can write to
/proc/pid/[ug]id_map directly. The latter case is broken: lines are written
without a newline (\n) at the end. This patch fixes that. Note that
I did not check if the newuidmap/newgidmap case still works. It should,
but I wasn't able to test it.
Signed-off-by: Miquel van Smoorenburg <mikevs@xs4all.net>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
That way templates can fix group ownership alongside uid ownership.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
This introduces a new lxc.rootfs.options which lets you pass new
mountflags/mountdata when mounting the root filesystem.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
If it (or any variation thereof) is in the container configuration,
then mount /sys/fs/cgroup/cgmanager.lower (if it exists) or
/sys/fs/cgroup/cgmanager into the container so it can run a
cgproxy.
Also make sure to clear our groups when we start or attach to a
container. Else with unprivileged containers we end up with
lots of nogroups listed in /proc/1/status.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
This fixes various compile errors when building with musl libc. For
example:
In file included from start.c:66:0:
monitor.h:38:12: error: 'NAME_MAX' undeclared here (not in a function)
char name[NAME_MAX+1];
^
start.c: In function 'setup_signal_fd':
start.c:202:2: error: implicit declaration of function 'sigfillset' [-Werror=implicit-function-declaration]
if (sigfillset(&mask) ||
^
...
In file included from freezer.c:36:0:
monitor.h:39:12: error: 'NAME_MAX' undeclared here (not in a function)
char name[NAME_MAX+1];
^
...
In file included from cgroup.c:45:0:
conf.h:87:13: error: 'IFNAMSIZ' undeclared here (not in a function)
char veth1[IFNAMSIZ]; /* needed for deconf */
^
cgroup.c: In function 'find_cgroup_subsystems':
cgroup.c:230:3: error: implicit declaration of function 'strdup' [-Werror=implicit-function-declaration]
(*kernel_subsystems)[kernel_subsystems_count] = strdup(line);
^
...
In file included from conf.c:65:0:
conf.h:87:13: error: 'IFNAMSIZ' undeclared here (not in a function)
char veth1[IFNAMSIZ]; /* needed for deconf */
^
In file included from conf.c:66:0:
conf.c: In function 'run_buffer':
log.h:263:9: error: implicit declaration of function 'strsignal' [-Werror=implicit-function-declaration]
struct lxc_log_locinfo locinfo = LXC_LOG_LOCINFO_INIT; \
^
...
af_unix.c: In function 'lxc_abstract_unix_send_credential':
af_unix.c:208:9: error: variable 'cred' has initializer but incomplete type
struct ucred cred = {
^
af_unix.c:209:3: error: unknown field 'pid' specified in initializer
.pid = getpid(),
^
af_unix.c:209:3: error: excess elements in struct initializer [-Werror]
af_unix.c:209:3: error: (near initialization for 'cred') [-Werror]
af_unix.c:210:3: error: unknown field 'uid' specified in initializer
.uid = getuid(),
^
af_unix.c:210:3: error: excess elements in struct initializer [-Werror]
af_unix.c:210:3: error: (near initialization for 'cred') [-Werror]
af_unix.c:211:3: error: unknown field 'gid' specified in initializer
.gid = getgid(),
^
and more...
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
lxc_map_ids can call system(3), which on error from the
spawned process returns > 0. No path should return > 0
when it meant success. So check the lxc_map_ids() value
to be != rather than just < 0.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
The geteuid() addition is being made the first element of the lxc_list,
but the first element is just a head whose entry is ignored. Therefore
userns_exec_1() was starting its tasks without the caller's uid mapped
into the namespace.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
This patch splits out most of the cgroupfs-specific code, so that
cgroup-manager versions can be plugged in. The case I did
not handle is cgroup_enter at lxc_attach. I'm hoping that case can
be greatly simplified, but will worry about it after fleshing out the
cgroup manager handlers.
This also simplify the freezer functions.
This seems to not regress my common tests when running without
cgmanager, but I'd like to do a bit more testing before pushing.
However I was hoping to get some more eyes on this so am sending it
out now.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Instead of always returning -1 and call SYSERROR when the child returns
non-zero. Have userns_exec_1 always return the return value from the
function it's calling and let the caller do the error handling (as is
already done by its only caller).
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Instead of having one function for each possible key in lxc.conf which
doesn't really scale and requires an API update for every new key,
switch to a generic lxc_get_global_config_item() function which takes a
key name as argument.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Mark most of functions that are used within only one file as static.
After 95ee490bbd it's easy to prove they
are not in public API.
Several arrays and structs are also marked static.
This prevents them from being exported from liblxc.so
List of removed previously exported symbols:
bdevs
btrfs_ops
check_autodev
create_partial
dir_ops
dump_stacktrace
get_mapped_rootid
get_next_index
lock_mutex
loop_ops
lvm_ops
lxc_abort
lxcapi_clone
lxc_attach_drop_privs
lxc_attach_get_init_uidgi
lxc_attach_getpwshell
lxc_attach_remount_sys_pr
lxc_attach_set_environmen
lxc_attach_to_ns
lxc_clear_saved_nics
lxc_config_readline
lxc_devs
lxc_free_idmap
lxc_global_config_value
lxc_poll
lxc_proc_get_context_info
lxc_set_state
lxc_spawn
mk_devtmpfs
mount_check_fs
ongoing_create
overlayfs_destroy
overlayfs_ops
prepend_lxc_header
remove_partial
save_phys_nics
setup_pivot_root
signames
static_mutex
thread_mutex
unlock_mutex
unpriv_assign_nic
zfs_ops
Signed-off-by: Andrey Mazo <mazo@telum.ru>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Functions like open(), close(), socket(), socketpair(), pipe() and mkdir()
are generally thin wrappers around kernel-provided system calls.
It's the kernel not libc, who ensures race-free handling of file
descriptors.
Thus locking around these functions is unnecessary even on somewhat buggy libcs.
fopen(), fclose() and other stdio functions may maintain internal lists
of open file handles and thus can be prone to race-conditions.
Hopefully, most libcs utilize proper locking or other ways to ensure
thread-safety of these functions.
Bionic used to have non-thread-safe stdio [2] but that must be fixed
since android 4.3 [3, 4].
S.Çağlar Onur showed [1] that openpty() (because of nsswitch) is not thread-safe though.
So we workaround it by protecting openpty() calls with process_lock()/process_unlock().
Because of the need to guard openpty() with process_lock()/process_unlock(),
process_unlock() is still used after fork().
This commit reverts most of 025ed0f391.
[1] https://github.com/lxc/lxc/pull/106#issuecomment-31077269
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=687367
[3] f582340a6a
[4] 6b3f49a537
Signed-off-by: Andrey Mazo <mazo@telum.ru>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
If unprivileged users are using a veth nic, then ifindex is still 0
at lxc_assign_network() (because lxc_create_network() was skipped).
So check for that case before we use lxc->ifindex to decide if we
have an empty network namespace.
We probably should change the !netdev->ifindex check to a
netdev->type == LXC_NET_EMPTY check, but I've been making enough
mistakes today not to risk that.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
First patch in the set of changes required for container autostart.
This commit adds the new configuration keys and parsers that will then
be used by lxc-start and lxc-stop.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Dwight Engen <dwight.engen@oracle.com>
Currently if no lxc.network.type section is in the container
configuration, the container ends up sharing the host's network.
This is a dangerous default.
Instead, add 'lxc.network.type = none' as a valid type, and make
en empty network the default.
If none as well as another network type are specified, then the
none type will be ignored.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Just like we already had "optional", this adds two new LXC-specific
mount flags:
- create=dir (will do a mkdir_p on the path)
- create=file (will do a mkdir_p on the dirname + a fopen on the path)
This was motivated by some of the needed bind-mounts for the
unprivileged containers.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
NetworkManager at least expects all veth devices to be called veth*
otherwise it'll consider them as physical interface and try to do DHCP
on them.
This change makes lxc-user-nic use the same function that we use for LXC
itself which will give us standard vethXXXXX kind of interfaces.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Currently, all scripts, specified as "lxc.network.script.up", inherit
lxc-execute's signal mask.
This, for example, includes blocked SIGALRM signal which, in turn, makes
alarm(2), sleep(3) and setitimer(2) functions silently unusable in all programs,
invoked in turn by the "lxc.network.script.up".
To fix this, run_buffer() should restore default signal mask prior to
executing "lxc.network.script.up".
A naive implementation would temprorary unblock all signals just before
calling popen() and block them back immediately after it.
But that would result in an immediate delivery of all pending signals just
after their unblocking.
Thus, we should restore default signal mask exactly in child (after fork())
just before calling exec().
To achieve this, a home-brewed popen() alternative is needed.
The added lxc_popen() and lxc_pclose() are mostly taken from glibc with
several simplifications (as we currently need only "re" mode).
The implementation uses Linux-specific pipe2() system-call,
which is only available since Linux 2.6.27 and supported by glibc since
version 2.9 (according to pipe(2) man-page), but this shouldn't be a
problem as lxc requires a fairly recent kernel too.
lxc_popen()/lxc_pclose() are meant to be direct replacements for their
stdio counterparts, so they perform no process_lock() locking
themselves. (as fopen_cloexec() does)
All existing users of popen()/pclose() are converted to the new
lxc_popen()/lxc_pclose().
(mazo: don't clear close-on-exec flag for parent's end;
place the new functions in utils.c;
convert bdev.c to use the new functions;
coding style fixes;
comments fixes;
commit message tweaks)
Signed-off-by: Ivan Bolsunov <bolsunov@telum.ru>
Signed-off-by: Andrey Mazo <mazo@telum.ru>
Because if they are not, then we'll fail trying to map that gid into the
container.
The function doesn't change any gids, but lxc-usernsexec always does
setgid(0), so just map getgid() to 0 in the container.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
The previous patch added code to add a static route prior to adding the
gateway to the interface.
This commit simply changes the logic so that this is only done on
failure to add the gateway.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This pulls a lot of common code out of lxc_user_nic.c. It also
moves one function from conf.c that was duplicated in lxc_user_nic.c
(It removes a DEBUG statement because (a) it doesn't seem actually
useful and (b) DEBUG doesn't work in network.c).
Also replace the old test of only parsing code with a skeleton for
a full test. (Note - the test will need some work, it's just there
as do-what-i-mean code example)
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
This is necessary to have the rights to remove files owned by our subuids.
Also update lxc_rmdir_onedev to return 0 on success, -1 on failure.
Callers were not consistent in using it correctly, and this is more
in keeping with the rest of our code.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
If autodev is not specifically set to 0 or 1, attempts to determine if
systemd is being utilized and forces autodev=1 to prevent host system
conflicts and collisions.
If autodev is enabled and the host /dev is mounted with devtmpfs
or /dev/.lxc is mounted with another file system...
Each container created by a privileged user gets a /dev directory
mapped off the host /dev here:
/dev/.lxc/${name}.$( hash $lxcpath/$name )
Each container created by a non-privileged user gets a /dev/directory
mapped off the host /dev here:
/dev/.lxc/user/${name}.$( hash $lxcpath/$name )
The /dev/.lxc/user is mode 1777 to allow unpriv access.
The /dev/.lxc/{containerdev} is bind mounted into the container /dev.
Fallback on failure is to mount tmpfs into the container /dev.
A symlink is created from $lxcpath/$name/rootfs.dev back to the /dev
relative directory to provid a code consistent reference for updating
container devs.
Signed-off-by: Michael H. Warfield <mhw@WittsEnd.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This also fixes possible crashes due to passing NULL to strlen function
Changes since v1;
* Fixed a typo spotted by Serge
Signed-off-by: S.Çağlar Onur <caglar@10ur.org>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
When moving an interface from the host netns to a container's,
the ifindex might not remain the same. This happens when the
index of the host interface is already assigned to another interface
in the new netns.
For veth/vlan/macvlan, virtual interfaces are first created on the host,
and then moved in the container. Since they are created after all other
interfaces are discovered, there is no chance for its assigned ifindex
to be already present in a freshly created netns, because it's a greater
number.
However, when moving a physical interface, there is a chance that its
ifindex in the host netns is not free in the new netns. The patch
forces ifindex re-read for the LXC_NET_PHYS case to update the
lxc_netdev structure.
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Right now lxc-start always does one of two things: it creates
a new namespace or inherits it from the parent environment.
This patch adds a third option: share a namespace with another
container (actually: a process).
In some situations this is handy. For example by sharing a network
namespace it is possible to migrate services between containers
without (or with little) downtime.
This patch creates an infrastructure for inheriting any type
of namespace, but only the network namespace is supported for now.
To do so we do a quick setns into the container's netns. This
(unexpectedly) turns out cleaner than trying to rename it from
lxc_setup(), because we don't know the original nic name in
the container until we created it which we do in the parent
after the init has been cloned.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
conf.c/conf.h: have replaced bool hostid_is_mapped() with int mapped_hostid()
which returns the mapped uid for the caller's uid on the host, or -1 if
none
create_run_template: pass caller's uid into template.
lxc-ubuntu-cloud:
1. accept --mapped-uid argument
2. don't write to devices cgroup - not allowed.
3. if running in userns, use $HOME/.cache
4. chown cached files to the uid to which our caller was
mapped
5. ignore /dev when extracting rootfs in a userns
Changelog: nov 5: remove debugging INFO line.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>