This change makes it possible to create unprivileged containers as root.
They will be stored in the usual system wide location, use the usual
system wide cache but will be running using a uid/gid map.
This also updates lxc_usernsexec to use the same function as the rest of
LXC, centralizing all the userns switch in a single function.
That function now detects the presence of newuidmap and newgidmap on the
system, if they are present, they will be used for containers created as
either user or root. If they're not and the user isn't root, an error is
shown. If they're not and the user is root, LXC will directly set the
uid_map and gid_map values.
All that should allow for a consistent experience as well as supporting
distributions that don't yet ship newuidmap/newgidmap.
To make things simpler in the future, an helper function "on_path" is
also introduced and used to detect the presence of newuidmap and
newgidmap.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
(this expands on Dwight's recent patch, commit c597baa8f9)
After unshare(CLONE_NEWNS) and before doing any mounting, always
check whether rootfs is shared. Otherwise template runs or clone
scripts can bleed mount activity to the host.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Since we're no longer always returning a getenv result or some defined
string, the callers should cleanup the variable after use.
As a result, change from const char* to char*, add the needed free()
everywhere and use strdup() on strings coming from getenv.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
If it (or any variation thereof) is in the container configuration,
then mount /sys/fs/cgroup/cgmanager.lower (if it exists) or
/sys/fs/cgroup/cgmanager into the container so it can run a
cgproxy.
Also make sure to clear our groups when we start or attach to a
container. Else with unprivileged containers we end up with
lots of nogroups listed in /proc/1/status.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
The cgroupfs-specific code is moved from attach.c to cgroup.c.
lxc-cgmanager now only chgrps the container's cgroup, so that the
unprivileged user still owns the tasks file allowing him to enter
the container cgroup (for attach).
Some other changes rolled into the cgmanager update:
Make the list of subsystems not per-handler, as it will not change. As
a result, the only state we need to keep in the per-handler cgroup data
is the char *cgroup_path, so we can drop the cgm_data struct altogether.
Catch nih errors (as not doing so causes later crashes).
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
This also fixes unprivileged use of lxc-snapshot and lxc-rename.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
By setting lxc.network.hwaddr to something like fe:xx:xx:xx:xx:xx each
"x" will be replaced by a random value. If less significant bit of
first byte is "templated", it will be set to 0.
This change introduce also a common randinit() function that could be
used to initialize random generator.
Signed-off-by: gza <lxc@zitta.fr>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Instead of having one function for each possible key in lxc.conf which
doesn't really scale and requires an API update for every new key,
switch to a generic lxc_get_global_config_item() function which takes a
key name as argument.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Mark most of functions that are used within only one file as static.
After 95ee490bbd it's easy to prove they
are not in public API.
Several arrays and structs are also marked static.
This prevents them from being exported from liblxc.so
List of removed previously exported symbols:
bdevs
btrfs_ops
check_autodev
create_partial
dir_ops
dump_stacktrace
get_mapped_rootid
get_next_index
lock_mutex
loop_ops
lvm_ops
lxc_abort
lxcapi_clone
lxc_attach_drop_privs
lxc_attach_get_init_uidgi
lxc_attach_getpwshell
lxc_attach_remount_sys_pr
lxc_attach_set_environmen
lxc_attach_to_ns
lxc_clear_saved_nics
lxc_config_readline
lxc_devs
lxc_free_idmap
lxc_global_config_value
lxc_poll
lxc_proc_get_context_info
lxc_set_state
lxc_spawn
mk_devtmpfs
mount_check_fs
ongoing_create
overlayfs_destroy
overlayfs_ops
prepend_lxc_header
remove_partial
save_phys_nics
setup_pivot_root
signames
static_mutex
thread_mutex
unlock_mutex
unpriv_assign_nic
zfs_ops
Signed-off-by: Andrey Mazo <mazo@telum.ru>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
After 95ee490bbd they are not in public
API and are not used throughout the lxc codebase.
This has a bonus of removing workaround for bionic.
Signed-off-by: Andrey Mazo <mazo@telum.ru>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Functions like open(), close(), socket(), socketpair(), pipe() and mkdir()
are generally thin wrappers around kernel-provided system calls.
It's the kernel not libc, who ensures race-free handling of file
descriptors.
Thus locking around these functions is unnecessary even on somewhat buggy libcs.
fopen(), fclose() and other stdio functions may maintain internal lists
of open file handles and thus can be prone to race-conditions.
Hopefully, most libcs utilize proper locking or other ways to ensure
thread-safety of these functions.
Bionic used to have non-thread-safe stdio [2] but that must be fixed
since android 4.3 [3, 4].
S.Çağlar Onur showed [1] that openpty() (because of nsswitch) is not thread-safe though.
So we workaround it by protecting openpty() calls with process_lock()/process_unlock().
Because of the need to guard openpty() with process_lock()/process_unlock(),
process_unlock() is still used after fork().
This commit reverts most of 025ed0f391.
[1] https://github.com/lxc/lxc/pull/106#issuecomment-31077269
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=687367
[3] f582340a6a
[4] 6b3f49a537
Signed-off-by: Andrey Mazo <mazo@telum.ru>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Currently, all scripts, specified as "lxc.network.script.up", inherit
lxc-execute's signal mask.
This, for example, includes blocked SIGALRM signal which, in turn, makes
alarm(2), sleep(3) and setitimer(2) functions silently unusable in all programs,
invoked in turn by the "lxc.network.script.up".
To fix this, run_buffer() should restore default signal mask prior to
executing "lxc.network.script.up".
A naive implementation would temprorary unblock all signals just before
calling popen() and block them back immediately after it.
But that would result in an immediate delivery of all pending signals just
after their unblocking.
Thus, we should restore default signal mask exactly in child (after fork())
just before calling exec().
To achieve this, a home-brewed popen() alternative is needed.
The added lxc_popen() and lxc_pclose() are mostly taken from glibc with
several simplifications (as we currently need only "re" mode).
The implementation uses Linux-specific pipe2() system-call,
which is only available since Linux 2.6.27 and supported by glibc since
version 2.9 (according to pipe(2) man-page), but this shouldn't be a
problem as lxc requires a fairly recent kernel too.
lxc_popen()/lxc_pclose() are meant to be direct replacements for their
stdio counterparts, so they perform no process_lock() locking
themselves. (as fopen_cloexec() does)
All existing users of popen()/pclose() are converted to the new
lxc_popen()/lxc_pclose().
(mazo: don't clear close-on-exec flag for parent's end;
place the new functions in utils.c;
convert bdev.c to use the new functions;
coding style fixes;
comments fixes;
commit message tweaks)
Signed-off-by: Ivan Bolsunov <bolsunov@telum.ru>
Signed-off-by: Andrey Mazo <mazo@telum.ru>
In general proc gets mounted ahead of time, so init shouldn't
have to do it. Without this patch, you cannot
lxc-execute -n x1 -s lxc.cap.drop=sys_admin /bin/bash
(See https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1253669 for
a bug about this)
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Conflict occurs between following lines
[...]
269 if (values[i])
270 return values[i];
[...]
and
[...]
309 /* could not find value, use default */
310 values[i] = (*ptr)[1];
[...]
fix it using a specific lock dedicated to that problem as Serge suggested.
Also introduce a new autoconf parameter (--enable-mutex-debugging) to convert mutexes to error reporting type and to provide a stacktrace when locking fails.
Signed-off-by: S.Çağlar Onur <caglar@10ur.org>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
get_ips accepts an interface name as a parameter but there was no
way to get the interfaces names from the container. This patch
introduces a new get_interfaces call to the API so that users
can obtain the name of the interfaces.
Support for python bindings also introduced as a part of this version.
Signed-off-by: S.Çağlar Onur <caglar@10ur.org>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Moving these files should allow $lxcpath to be a read-only fs.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Adds a few useful string and array manipulation functions to utils.[ch]
Signed-off-by: Christian Seiler <christian@iwakd.de>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Newer glibc versions (that we can't require) allow for an additional
letter 'e' in the fopen mode that will cause the file to be opened with
the O_CLOEXEC flag, so that it will be closed if the program exec()s
away. This is important because if liblxc is used in a multithreaded
program, another thread might want to run a program. This options
prevents the leakage of file descriptors from LXC. This patch adds an
emulation for that that uses the open(2) syscall and fdopen(3). At some
later point in time, it may be dropped against fopen(..., "...e").
This commit also converts all fopen() calls in utils.c (where the
function is added) to fopen_cloexec(). Subsequently, other calls to
fopen() and open() should also be adapted.
Signed-off-by: Christian Seiler <christian@iwakd.de>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Instead of duplicating the code for parsing the global config file for
each option, write one main function, lxc_global_config_value, that
does the parsing for an arbitrary option name and just call that
function from the existing ones.
Signed-off-by: Christian Seiler <christian@iwakd.de>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
- Move attach functionality to a completely new API function for
attaching to containers. The API functions accepts the name of the
container, the lxcpath, a structure indicating options for attaching
and returns the pid of the attached process. The calling thread may
then use waitpid() or similar to wait for the attached process to
finish. lxc-attach itself is just a simple wrapper around the new
API function.
- Use CLONE_PARENT when creating the attached process from the
intermediate process. This allows the intermediate process to exit
immediately after attach and the original thread may supervise the
attached process directly.
- Since the intermediate process exits quickly, its only job is to
send the original process the pid of the attached process (as seen
from outside the pidns) and exit. This allows us to simplify the
synchronisation logic by quite a bit.
- Use O_CLOEXEC / SOCK_CLOEXEC on (hopefully) all FDs opened in the
main thread by the attach logic so that other threads of the same
program may safely fork+exec off. Also, use shutdown() on the
synchronisation socket, so that if another thread forks off without
exec'ing, the synchronisation will not fail. (Not tested whether
this solves this issue.)
- Instead of directly specifying a program to execute on the API
level, one specifies a callback function and a payload. This allows
code using the API to execute a custom function directly inside the
container without having to execute a program. Two default callbacks
are provided directly, one to execute an arbitrary program, another
to execute a shell. The lxc-attach utility will always use either
one of these default callbacks.
- More fine-grained control of the attached process on the API level
(not implemented in lxc-attach utility yet, some may not be sensible):
* Specify which file descriptors should be stdin/stdout/stderr of
the newly created process. If fds other than 0/1/2 are
specified, they will be dup'd in the attached process (and the
originals closed). This allows e.g. threaded applications to
specify pipes for communication with the attached process
without having to modify its own stdin/stdout/stderr before
running lxc-attach.
* Specify user and group id for the newly attached process.
* Specify initial working directory for the newly attached
process.
* Fine-grained control on whether to do any, all or none of the
following: move attached process into the container's init's
cgroup, drop capabilities of the process, set the processes's
personality, load the proper apparmor profile and (for partial
attaches to any but not mount-namespaces) whether to unshare the
mount namespace and remount /sys and /proc. If additional
features (SELinux policy, SMACK policy, ...) are implemented,
flags for those may also be provided.
Signed-off-by: Christian Seiler <christian@iwakd.de>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Define a sha1sum_file() function in utils.c. Use that in lxcapi_create
to write out the sha1sum of the template being used. If libgnutls is
not found, then the template sha1sum simply won't be printed into the
container config.
This patch also trivially fixes some cases where SYSERROR is used after
a fclose (masking errno) and missing consts in mkdir_p.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Add a higher level console API that opens a tty/console and runs the
mainloop as well. Rename existing API to console_getfd(). Use these in
the python binding.
Allow attaching a console peer after container bootup, including if the
container was launched with -d. This is made possible by allocation of a
"proxy" pty as the peer when the console is attached to.
Improve handling of SIGWINCH, the pty size will be correctly set at the
beginning of a session and future changes when using the lxc_console() API
will be propagated to it as well.
Refactor some common code between lxc_console.c and console.c. The variable
wait4q (renamed to saw_escape) was static, making the mainloop callback not
safe across threads. This wasn't a problem when the callback was in the
non-threaded lxc-console, but now that it is internal to console.c, we have
to take care of it. This is now contained in a per-tty state structure.
Don't attempt to open /dev/null as the console peer since /dev/null cannot
be added to the mainloop (epoll_ctl() fails with EPERM). This isn't needed
to get the console setup (and the log to work) since the case of not having
a peer at console init time has to be handled to allow for attaching to it
later.
Move signalfd libc wrapper/replacement to utils.h.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This requires implementing bdev->ops->destroy() for each of the backing
store types. Then implementing lxcapi_clone(), writing lxc_destroy.c
using the api, and removing the lxc-destroy.in script.
(this also has a few other cleanups, like marking some functions
static)
Changelog:
fold into destroy: fix zfs destroy
destroy: use correct program name in help
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This fixes the build of lxccontainer.c on systems that have __NR_setns
but not HAVE_SETNS.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
/etc/lxc/lxc.conf can contain
zfsroot = custom1
lvm_vg = vg0
(Otherwise the defaults are 'lxc' for lvm_vg, and 'lxc' for zfsroot)
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
1. commonize waitpid users to use a single helper. We frequently want
to run something in a clean namespace, or fork off a script. This
lets us keep the function doing fork:(1)exec(2)waitpid simpler.
2. start a blockdev backend implementation. This will be used for
mounting, copying, and snapshotting container filesystems.
3. implement btrfs, lvm, directory, and overlayfs backends.
4. For overlayfs, support a new lxc.rootfs format of
'bdevtype:<extra>'. This means you can now use overlayfs-based
containers without using lxc-start-ephemeral, by using
lxc.rootfs = overlayfs:/readonly-dir:writeable-dir
5. add a set of simple clone testcases
6. Write a new lxc_clone.c based on api clone.
Still to do (there's more, but off top of my head):
1. support zfs, aufs
2. have clone handle other mount entries (right now it only clones
the rootfs)
3. python, lua, and go bindings (not me :)
4. lxc-destroy: if lvm backing store, check for snapshots of it.
(what about directories which have overlayfs clones?)
Changes since v2:
Initialize random generator when picking new macaddr (reported
by caglar@10ur.org)
Fix wrong use of bitmask flags
On copy-clone of btrfs, create a subvolume
lxc_clone.c: respect the command line usage of the old script
lxc-clone(1): update documentation
Refuse to try changing backing stores expect to overlayfs, as
it is not implemented (yet) anyway.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Conflicts:
src/lxc/utils.h
This fixes a long standing issue that there could only be a single
lxc-monitor per container.
With this change, a new lxc-monitord daemon is spawned the first time
lxc-monitor is called against the container and will accept connections
from any subsequent lxc-monitor.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Commit e3642c43 added lxc_copy_file for use in 64e1ae63. The use of it
was removed in commit 1bc60a65. Removing it reduces dead code and the
footprint of liblxc.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
For the lxc-* C binaries, introduce a -P|--lxcpath command line option
to override the system default.
With this, I can
lxc-create -t ubuntu -n r1
lxc-create -t ubuntu -n r1 -P /home/ubuntu/lxcbase
lxc-start -n r1 -d
lxc-start -n r1 -d -P /home/ubuntu/lxcbase
lxc-console -n r1 -d -P /home/ubuntu/lxcbase
lxc-stop -n r1
all working with the right containers (module cgroup stuff).
To do:
* lxc monitor needs to be made to handle cgroups.
This is another very invasive one. I started doing this as
a part of this set, but that gets hairy, so I'm sending this
separately. Note that lxc-wait and lxc-monitor don't work
without this, and there may be niggles in what I said works
above - since start.c is doing lxc_monitor_send_state etc
to the shared abstract unix domain socket.
* Need to handle the cgroup conflicts.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Here is a patch to introduce a configurable system-wide
lxcpath. It seems to work with lxc-create, lxc-start,
and basic python3 lxc usage through the api.
For shell functions, a new /usr/share/lxc/lxc.functions is
introduced which sets some of the basic global variables,
including evaluating the right place for lxc_path.
I have not converted any of the other python code, as I was
not sure where we should keep the common functions (i.e.
for now just default_lxc_path()).
configure.ac: add an option for setting the global config file name.
utils: add a default_lxc_path() function
Use default_lxc_path in .c files
define get_lxc_path() and set_lxc_path() in C api
use get_lxc_path() in lua api
create sh helper for getting default path from config file
fix up scripts to use lxc.functions
Changelog:
feb6:
fix lxc_path in lxc.functions
utils.c: as Dwight pointed out, don't close a NULL fin.
utils.c: fix the parsing of lxcpath line
lxc-start: print which rcfile we are using
commands.c: As Dwight alluded to, the sockname handling was just
ridiculous. Clean that up.
use Dwight's recommendation for lxc.functions path: $datadir/lxc
make lxccontainer->get_config_path() return const char *
Per Dwight's suggestion, much nicer than returning strdup.
feb6 (v2):
lxccontainer: set c->config_path before using it.
convert legacy lxc-ls
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
ushort appears to be a glibc specific type which doesn't exist in
bionic, this commit simply replace all occurences by the equivalent
unsigned short type.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>