lxc_unstack_mountpoint() tries to clear all mountpoints from a given path.
It return the number of successful umounts on success and -errno on error.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Use the loop device helpers I wrote for LXD in LXC as well. They should be more
efficient.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
The new{g,u}idmap binaries where a source of trouble for users when they lacked
sufficient privileges. This commit adds code to check for sufficient privilege.
It checks whether new{g,u}idmap is root owned and has the setuid bit set and if
it doesn't it checks whether new{g,u}idmap is root owned and has CAP_SETUID in
its CAP_PERMITTED and CAP_EFFECTIVE set.
Closes#296.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
This commit adds lxc_switch_uid_gid() which allows to switch the uid and gid of
a process via setuid() and setgid() and lxc_setgroups() which allows to set
groups via setgroups(). The main advantage is that they nicely log the switches
they perform.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
This macro can be used to set or allocate a string buffer that can hold any
64bit representable number.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
This function safely parses an unsigned integer. On success it returns 0 and
stores the unsigned integer in @converted. On error it returns a negative
errno.
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>
lxc_append_string() appends strings without separator. This is mostly useful
for reading in whole files line-by-line.
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>
This is required by systemd to cleanly shutdown. Other init systems should not
have SIGRTMIN+3 in the blocked signals set.
Signed-off-by: Christian Brauner <cbrauner@suse.de>
In order to do this we make use of the MAP_FIXED flag of mmap(). MAP_FIXED
should be safe to use when it replaces an already existing mapping. To this
end, we establish an anonymous mapping that is one byte larger than the
underlying file. The pages handed to us are zero filled. Now we establish a
fixed-address mapping starting at the address we received from our anonymous
mapping and replace all bytes excluding the additional \0-byte with the file.
This allows us to use normal string-handling function. The idea implemented
here is similar to how shared libraries are mapped.
Signed-off-by: Christian Brauner <christian.brauner@mailbox.org>
This makes simplifying assumptions: all usable cgroups must be
mounted under /sys/fs/cgroup/controller or /sys/fs/cgroup/contr1,contr2.
Currently this will only work with cgroup namespaces, because
lxc.mount.auto = cgroup is not implemented. So cgfsng_ops_init()
returns NULL if cgroup namespaces are not enabled.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
We'll probably want to make this configurable with a
lxc.cgroupns = [1|0], but for now just always do it.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
---
Changelog 20160104: only try to unshare if /proc/self/ns/cgroup exists.
When a container starts up, lxc sets up the container's inital fstree
by doing a bunch of mounting, guided by the container configuration
file. The container config is owned by the admin or user on the host,
so we do not try to guard against bad entries. However, since the
mount target is in the container, it's possible that the container admin
could divert the mount with symbolic links. This could bypass proper
container startup (i.e. confinement of a root-owned container by the
restrictive apparmor policy, by diverting the required write to
/proc/self/attr/current), or bypass the (path-based) apparmor policy
by diverting, say, /proc to /mnt in the container.
To prevent this,
1. do not allow mounts to paths containing symbolic links
2. do not allow bind mounts from relative paths containing symbolic
links.
Details:
Define safe_mount which ensures that the container has not inserted any
symbolic links into any mount targets for mounts to be done during
container setup.
The host's mount path may contain symbolic links. As it is under the
control of the administrator, that's ok. So safe_mount begins the check
for symbolic links after the rootfs->mount, by opening that directory.
It opens each directory along the path using openat() relative to the
parent directory using O_NOFOLLOW. When the target is reached, it
mounts onto /proc/self/fd/<targetfd>.
Use safe_mount() in mount_entry(), when mounting container proc,
and when needed. In particular, safe_mount() need not be used in
any case where:
1. the mount is done in the container's namespace
2. the mount is for the container's rootfs
3. the mount is relative to a tmpfs or proc/sysfs which we have
just safe_mount()ed ourselves
Since we were using proc/net as a temporary placeholder for /proc/sys/net
during container startup, and proc/net is a symbolic link, use proc/tty
instead.
Update the lxc.container.conf manpage with details about the new
restrictions.
Finally, add a testcase to test some symbolic link possibilities.
Reported-by: Roman Fiedler
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
In various places throughout the code, we want to "nullify" the std fds,
opening them to /dev/null or zero or so. Instead, let's unify this code and do
it in such a way that Coverity (probably) won't complain.
v2: use /dev/null for stdin as well
v3: add a comment about use of C's short circuiting
v4: axe comment, check errors on dup2, s/quiet/need_null_stdfds
Reported-by: Coverity
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
To set lsm labels, a namespace-local proc mount is needed.
If a container does not have a lxc.mount.auto = proc set, then
tasks in the container do not have a correct /proc mount until
init feels like doing the mount. At startup we handlie this
by mounting a temporary /proc if needed. We weren't doing this
at attach, though, so that
lxc-start -n $container
lxc-wait -t 5 -s RUNNING -n $container
lxc-attach -n $container -- uname -a
could in a racy way fail with something like
lxc-attach: lsm/apparmor.c: apparmor_process_label_set: 183 No such file or directory - failed to change apparmor profile to lxc-container-default
Thanks to Chris Townsend for finding this bug at
https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1452451
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Doing this requires some btrfs functions from bdev to be used in
utils.c Because utils.h is imported by lxc_init.c, I had to create
a new initutils.[ch] which are used by both lxc_init.c and utils.c
We could instead put the btrfs functions into utils.c, which would
be a shorter patch, but it really doesn't belong there. So I went
the other way figuring there may be more such cases coming up of
fns in utils.c needing code from bdev.c which can't go into lxc_init.
Currently, if we detect a btrfs subvolume we just remove it. The
st_dev on that dir is different, so we cannot detect if this is
bound in from another fs easily. If we care, we should check
whether this is a mountpoint, this patch doesn't do that.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Instead of having a parent process that's called whatever the caller of the
library is called, we instead set it to "[lxc monitor] <lxcpath> <container>"
Closes#180
v2: check for null in tok for loop, only truncate environment when necessary
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
If you have 'lxc.include = /some/dir' and /some/dir is a directory, then any
'*.conf" files under /some/dir will be read.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Function of enter_to_ns() is useful but currently is static for
lxccontainer.c.
This patch split it into two parts named as switch_to_newuser()
and switch_to_newnet() into utils.c.
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
This patch adds support for checkpointing and restoring containers via CRIU.
It adds two api calls, ->checkpoint and ->restore, which are wrappers around
the CRIU CLI. CRIU has an RPC API, but reasons for preferring exec() are
discussed in [1].
To checkpoint, users specify a directory to dump the container metadata (CRIU
dump files, plus some additional information about veth pairs and which
bridges they are attached to) into this directory. On restore, this
information is read out of the directory, a CRIU command line is constructed,
and CRIU is exec()d. CRIU uses the lxc-restore-net callback (which in turn
inspects the image directory with the NIC data) to properly restore the
network.
This will only work with the current git master of CRIU; anything as of
a152c843 should work. There is a known bug where containers which have been
restored cannot be checkpointed [2].
[1]: http://lists.openvz.org/pipermail/criu/2014-July/015117.html
[2]: http://lists.openvz.org/pipermail/criu/2014-August/015876.html
v2: fixed some problems with the s/int/bool return code form api function
v3: added a testcase, fixed up the man page synopsis
v4: fix a small typo in lxc-test-checkpoint-restore
v5: remove a reference to the old CRIU_PATH, and a bad error about the same
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Originally we kept snapshots under /var/lib/lxcsnaps. If a
separate btrfs is mounted at /var/lib/lxc, then we can't
make btrfs snapshots under /var/lib/lxcsnaps.
This patch moves the default directory to /var/lib/lxc/c/snaps.
If /var/lib/lxcsnaps already exists, then we continue to use that.
add c->destroy_with_snapshots() and c->snapshot_destroy_all()
API methods. c->snashot_destroy_all() can be triggered from
lxc-snapshot using '-d ALL'. There is no command to call
c->destroy_with_snapshots(c) as of yet.
lxclock: use ".$lxcname" for container lock files
that way we can use /run/lock/lxc/$lxcpath/$lxcname/snaps as a
directory when locking snapshots without having to worry about
/run/lock//lxc/$lxcpath/$lxcname being a file.
destroy: split off a container_destroy
container_destroy() doesn't check for snapshots, so snapshot_rename can
use it. api_destroy() now does check for snapshots (previously it only
checked for fs - i.e. overlayfs/aufs - snapshots).
Add destroy to the manpage, as it was previously undocumented.
Update snapshot testcase accordingly.
[ rebased in the face of commits 840f05df and 7e36f87e. ]
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: S.Çağlar Onur <caglar@10ur.org>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Move choose_init into utils.c so we can re-use it. Make it and on_path
accept an optional rootfs argument to prepend to the paths when checking
whether the file exists.
Also add lxc.init.static to .gitignore
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
backing stores supported by qemu-nbd can be attached to a nbd block
device using qemu-nbd. This user-space process (pair) stays around for
the duration of the device attachment. Obviously we want it to go away
when the container shuts down, but not before the filesystems have been
cleanly unmounted.
The device attachment is done from the task which will become the
container monitor before the container setup+init task is spawned.
That task starts in a new pid namespace to ensure that the qemu-nbd
process will be killed if need be. It sets its parent death signal
to sighup, and, on receiving sighup, attempts to do a clean
qemu-device detach, then exits. This should ensure that the
device is detached if the qemu monitor crashes or exits.
It may be worth adding a delay before the qemu-nbd is detached, but
my brief tests haven't seen any data corruption.
Only the parts required for running a nbd-backed container are
implemented here. Create, destroy, and clone are not. The first
use of this that I imagine is for people to use downloaded nbd-backed
images (like ubuntu cloud images, or anything previously used with
qemu). I imagine people will want to create/clone/destroy out of
band using qemu-img, but if I'm wrong about that we can implement
the rest later.
Because attach_block_device() is done before the bdev is initialized,
and bdev_init needs to know the nbd index so that it can mount the
filesystem, we now need to pass the lxc_conf.
file_exists() is moved to utils.c so we can use it from bdev.c
The nbd attach/detach should lay the groundwork for trivial implementation
of qed and raw images.
changelog (may 12): fix idx check at detach
changelog (may 15): generalize qcow2 to nbd
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Dwight Engen <dwight.engen@oracle.com>
Only do the funky chroot_into_slave if / is in fact the rootfs.
Rootfs is a special blacklisted case for pivot_root.
If / is not rootfs but is shared, just mount / rslave. We're
already in our own namespace.
This appears to solve the extra /proc/$$/mount entries in
containers and the host directories in lxc-attach which have
been plagueing at least fedora and arch.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
This change makes it possible to create unprivileged containers as root.
They will be stored in the usual system wide location, use the usual
system wide cache but will be running using a uid/gid map.
This also updates lxc_usernsexec to use the same function as the rest of
LXC, centralizing all the userns switch in a single function.
That function now detects the presence of newuidmap and newgidmap on the
system, if they are present, they will be used for containers created as
either user or root. If they're not and the user isn't root, an error is
shown. If they're not and the user is root, LXC will directly set the
uid_map and gid_map values.
All that should allow for a consistent experience as well as supporting
distributions that don't yet ship newuidmap/newgidmap.
To make things simpler in the future, an helper function "on_path" is
also introduced and used to detect the presence of newuidmap and
newgidmap.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
(this expands on Dwight's recent patch, commit c597baa8f9)
After unshare(CLONE_NEWNS) and before doing any mounting, always
check whether rootfs is shared. Otherwise template runs or clone
scripts can bleed mount activity to the host.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Since we're no longer always returning a getenv result or some defined
string, the callers should cleanup the variable after use.
As a result, change from const char* to char*, add the needed free()
everywhere and use strdup() on strings coming from getenv.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
If it (or any variation thereof) is in the container configuration,
then mount /sys/fs/cgroup/cgmanager.lower (if it exists) or
/sys/fs/cgroup/cgmanager into the container so it can run a
cgproxy.
Also make sure to clear our groups when we start or attach to a
container. Else with unprivileged containers we end up with
lots of nogroups listed in /proc/1/status.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
The cgroupfs-specific code is moved from attach.c to cgroup.c.
lxc-cgmanager now only chgrps the container's cgroup, so that the
unprivileged user still owns the tasks file allowing him to enter
the container cgroup (for attach).
Some other changes rolled into the cgmanager update:
Make the list of subsystems not per-handler, as it will not change. As
a result, the only state we need to keep in the per-handler cgroup data
is the char *cgroup_path, so we can drop the cgm_data struct altogether.
Catch nih errors (as not doing so causes later crashes).
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
This also fixes unprivileged use of lxc-snapshot and lxc-rename.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
By setting lxc.network.hwaddr to something like fe:xx:xx:xx:xx:xx each
"x" will be replaced by a random value. If less significant bit of
first byte is "templated", it will be set to 0.
This change introduce also a common randinit() function that could be
used to initialize random generator.
Signed-off-by: gza <lxc@zitta.fr>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Instead of having one function for each possible key in lxc.conf which
doesn't really scale and requires an API update for every new key,
switch to a generic lxc_get_global_config_item() function which takes a
key name as argument.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Mark most of functions that are used within only one file as static.
After 95ee490bbd it's easy to prove they
are not in public API.
Several arrays and structs are also marked static.
This prevents them from being exported from liblxc.so
List of removed previously exported symbols:
bdevs
btrfs_ops
check_autodev
create_partial
dir_ops
dump_stacktrace
get_mapped_rootid
get_next_index
lock_mutex
loop_ops
lvm_ops
lxc_abort
lxcapi_clone
lxc_attach_drop_privs
lxc_attach_get_init_uidgi
lxc_attach_getpwshell
lxc_attach_remount_sys_pr
lxc_attach_set_environmen
lxc_attach_to_ns
lxc_clear_saved_nics
lxc_config_readline
lxc_devs
lxc_free_idmap
lxc_global_config_value
lxc_poll
lxc_proc_get_context_info
lxc_set_state
lxc_spawn
mk_devtmpfs
mount_check_fs
ongoing_create
overlayfs_destroy
overlayfs_ops
prepend_lxc_header
remove_partial
save_phys_nics
setup_pivot_root
signames
static_mutex
thread_mutex
unlock_mutex
unpriv_assign_nic
zfs_ops
Signed-off-by: Andrey Mazo <mazo@telum.ru>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
After 95ee490bbd they are not in public
API and are not used throughout the lxc codebase.
This has a bonus of removing workaround for bionic.
Signed-off-by: Andrey Mazo <mazo@telum.ru>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Functions like open(), close(), socket(), socketpair(), pipe() and mkdir()
are generally thin wrappers around kernel-provided system calls.
It's the kernel not libc, who ensures race-free handling of file
descriptors.
Thus locking around these functions is unnecessary even on somewhat buggy libcs.
fopen(), fclose() and other stdio functions may maintain internal lists
of open file handles and thus can be prone to race-conditions.
Hopefully, most libcs utilize proper locking or other ways to ensure
thread-safety of these functions.
Bionic used to have non-thread-safe stdio [2] but that must be fixed
since android 4.3 [3, 4].
S.Çağlar Onur showed [1] that openpty() (because of nsswitch) is not thread-safe though.
So we workaround it by protecting openpty() calls with process_lock()/process_unlock().
Because of the need to guard openpty() with process_lock()/process_unlock(),
process_unlock() is still used after fork().
This commit reverts most of 025ed0f391.
[1] https://github.com/lxc/lxc/pull/106#issuecomment-31077269
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=687367
[3] f582340a6a
[4] 6b3f49a537
Signed-off-by: Andrey Mazo <mazo@telum.ru>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Currently, all scripts, specified as "lxc.network.script.up", inherit
lxc-execute's signal mask.
This, for example, includes blocked SIGALRM signal which, in turn, makes
alarm(2), sleep(3) and setitimer(2) functions silently unusable in all programs,
invoked in turn by the "lxc.network.script.up".
To fix this, run_buffer() should restore default signal mask prior to
executing "lxc.network.script.up".
A naive implementation would temprorary unblock all signals just before
calling popen() and block them back immediately after it.
But that would result in an immediate delivery of all pending signals just
after their unblocking.
Thus, we should restore default signal mask exactly in child (after fork())
just before calling exec().
To achieve this, a home-brewed popen() alternative is needed.
The added lxc_popen() and lxc_pclose() are mostly taken from glibc with
several simplifications (as we currently need only "re" mode).
The implementation uses Linux-specific pipe2() system-call,
which is only available since Linux 2.6.27 and supported by glibc since
version 2.9 (according to pipe(2) man-page), but this shouldn't be a
problem as lxc requires a fairly recent kernel too.
lxc_popen()/lxc_pclose() are meant to be direct replacements for their
stdio counterparts, so they perform no process_lock() locking
themselves. (as fopen_cloexec() does)
All existing users of popen()/pclose() are converted to the new
lxc_popen()/lxc_pclose().
(mazo: don't clear close-on-exec flag for parent's end;
place the new functions in utils.c;
convert bdev.c to use the new functions;
coding style fixes;
comments fixes;
commit message tweaks)
Signed-off-by: Ivan Bolsunov <bolsunov@telum.ru>
Signed-off-by: Andrey Mazo <mazo@telum.ru>
In general proc gets mounted ahead of time, so init shouldn't
have to do it. Without this patch, you cannot
lxc-execute -n x1 -s lxc.cap.drop=sys_admin /bin/bash
(See https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1253669 for
a bug about this)
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Conflict occurs between following lines
[...]
269 if (values[i])
270 return values[i];
[...]
and
[...]
309 /* could not find value, use default */
310 values[i] = (*ptr)[1];
[...]
fix it using a specific lock dedicated to that problem as Serge suggested.
Also introduce a new autoconf parameter (--enable-mutex-debugging) to convert mutexes to error reporting type and to provide a stacktrace when locking fails.
Signed-off-by: S.Çağlar Onur <caglar@10ur.org>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
get_ips accepts an interface name as a parameter but there was no
way to get the interfaces names from the container. This patch
introduces a new get_interfaces call to the API so that users
can obtain the name of the interfaces.
Support for python bindings also introduced as a part of this version.
Signed-off-by: S.Çağlar Onur <caglar@10ur.org>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Moving these files should allow $lxcpath to be a read-only fs.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Adds a few useful string and array manipulation functions to utils.[ch]
Signed-off-by: Christian Seiler <christian@iwakd.de>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Newer glibc versions (that we can't require) allow for an additional
letter 'e' in the fopen mode that will cause the file to be opened with
the O_CLOEXEC flag, so that it will be closed if the program exec()s
away. This is important because if liblxc is used in a multithreaded
program, another thread might want to run a program. This options
prevents the leakage of file descriptors from LXC. This patch adds an
emulation for that that uses the open(2) syscall and fdopen(3). At some
later point in time, it may be dropped against fopen(..., "...e").
This commit also converts all fopen() calls in utils.c (where the
function is added) to fopen_cloexec(). Subsequently, other calls to
fopen() and open() should also be adapted.
Signed-off-by: Christian Seiler <christian@iwakd.de>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Instead of duplicating the code for parsing the global config file for
each option, write one main function, lxc_global_config_value, that
does the parsing for an arbitrary option name and just call that
function from the existing ones.
Signed-off-by: Christian Seiler <christian@iwakd.de>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
- Move attach functionality to a completely new API function for
attaching to containers. The API functions accepts the name of the
container, the lxcpath, a structure indicating options for attaching
and returns the pid of the attached process. The calling thread may
then use waitpid() or similar to wait for the attached process to
finish. lxc-attach itself is just a simple wrapper around the new
API function.
- Use CLONE_PARENT when creating the attached process from the
intermediate process. This allows the intermediate process to exit
immediately after attach and the original thread may supervise the
attached process directly.
- Since the intermediate process exits quickly, its only job is to
send the original process the pid of the attached process (as seen
from outside the pidns) and exit. This allows us to simplify the
synchronisation logic by quite a bit.
- Use O_CLOEXEC / SOCK_CLOEXEC on (hopefully) all FDs opened in the
main thread by the attach logic so that other threads of the same
program may safely fork+exec off. Also, use shutdown() on the
synchronisation socket, so that if another thread forks off without
exec'ing, the synchronisation will not fail. (Not tested whether
this solves this issue.)
- Instead of directly specifying a program to execute on the API
level, one specifies a callback function and a payload. This allows
code using the API to execute a custom function directly inside the
container without having to execute a program. Two default callbacks
are provided directly, one to execute an arbitrary program, another
to execute a shell. The lxc-attach utility will always use either
one of these default callbacks.
- More fine-grained control of the attached process on the API level
(not implemented in lxc-attach utility yet, some may not be sensible):
* Specify which file descriptors should be stdin/stdout/stderr of
the newly created process. If fds other than 0/1/2 are
specified, they will be dup'd in the attached process (and the
originals closed). This allows e.g. threaded applications to
specify pipes for communication with the attached process
without having to modify its own stdin/stdout/stderr before
running lxc-attach.
* Specify user and group id for the newly attached process.
* Specify initial working directory for the newly attached
process.
* Fine-grained control on whether to do any, all or none of the
following: move attached process into the container's init's
cgroup, drop capabilities of the process, set the processes's
personality, load the proper apparmor profile and (for partial
attaches to any but not mount-namespaces) whether to unshare the
mount namespace and remount /sys and /proc. If additional
features (SELinux policy, SMACK policy, ...) are implemented,
flags for those may also be provided.
Signed-off-by: Christian Seiler <christian@iwakd.de>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Define a sha1sum_file() function in utils.c. Use that in lxcapi_create
to write out the sha1sum of the template being used. If libgnutls is
not found, then the template sha1sum simply won't be printed into the
container config.
This patch also trivially fixes some cases where SYSERROR is used after
a fclose (masking errno) and missing consts in mkdir_p.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Add a higher level console API that opens a tty/console and runs the
mainloop as well. Rename existing API to console_getfd(). Use these in
the python binding.
Allow attaching a console peer after container bootup, including if the
container was launched with -d. This is made possible by allocation of a
"proxy" pty as the peer when the console is attached to.
Improve handling of SIGWINCH, the pty size will be correctly set at the
beginning of a session and future changes when using the lxc_console() API
will be propagated to it as well.
Refactor some common code between lxc_console.c and console.c. The variable
wait4q (renamed to saw_escape) was static, making the mainloop callback not
safe across threads. This wasn't a problem when the callback was in the
non-threaded lxc-console, but now that it is internal to console.c, we have
to take care of it. This is now contained in a per-tty state structure.
Don't attempt to open /dev/null as the console peer since /dev/null cannot
be added to the mainloop (epoll_ctl() fails with EPERM). This isn't needed
to get the console setup (and the log to work) since the case of not having
a peer at console init time has to be handled to allow for attaching to it
later.
Move signalfd libc wrapper/replacement to utils.h.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This requires implementing bdev->ops->destroy() for each of the backing
store types. Then implementing lxcapi_clone(), writing lxc_destroy.c
using the api, and removing the lxc-destroy.in script.
(this also has a few other cleanups, like marking some functions
static)
Changelog:
fold into destroy: fix zfs destroy
destroy: use correct program name in help
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This fixes the build of lxccontainer.c on systems that have __NR_setns
but not HAVE_SETNS.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
/etc/lxc/lxc.conf can contain
zfsroot = custom1
lvm_vg = vg0
(Otherwise the defaults are 'lxc' for lvm_vg, and 'lxc' for zfsroot)
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
1. commonize waitpid users to use a single helper. We frequently want
to run something in a clean namespace, or fork off a script. This
lets us keep the function doing fork:(1)exec(2)waitpid simpler.
2. start a blockdev backend implementation. This will be used for
mounting, copying, and snapshotting container filesystems.
3. implement btrfs, lvm, directory, and overlayfs backends.
4. For overlayfs, support a new lxc.rootfs format of
'bdevtype:<extra>'. This means you can now use overlayfs-based
containers without using lxc-start-ephemeral, by using
lxc.rootfs = overlayfs:/readonly-dir:writeable-dir
5. add a set of simple clone testcases
6. Write a new lxc_clone.c based on api clone.
Still to do (there's more, but off top of my head):
1. support zfs, aufs
2. have clone handle other mount entries (right now it only clones
the rootfs)
3. python, lua, and go bindings (not me :)
4. lxc-destroy: if lvm backing store, check for snapshots of it.
(what about directories which have overlayfs clones?)
Changes since v2:
Initialize random generator when picking new macaddr (reported
by caglar@10ur.org)
Fix wrong use of bitmask flags
On copy-clone of btrfs, create a subvolume
lxc_clone.c: respect the command line usage of the old script
lxc-clone(1): update documentation
Refuse to try changing backing stores expect to overlayfs, as
it is not implemented (yet) anyway.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Conflicts:
src/lxc/utils.h
This fixes a long standing issue that there could only be a single
lxc-monitor per container.
With this change, a new lxc-monitord daemon is spawned the first time
lxc-monitor is called against the container and will accept connections
from any subsequent lxc-monitor.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Commit e3642c43 added lxc_copy_file for use in 64e1ae63. The use of it
was removed in commit 1bc60a65. Removing it reduces dead code and the
footprint of liblxc.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
For the lxc-* C binaries, introduce a -P|--lxcpath command line option
to override the system default.
With this, I can
lxc-create -t ubuntu -n r1
lxc-create -t ubuntu -n r1 -P /home/ubuntu/lxcbase
lxc-start -n r1 -d
lxc-start -n r1 -d -P /home/ubuntu/lxcbase
lxc-console -n r1 -d -P /home/ubuntu/lxcbase
lxc-stop -n r1
all working with the right containers (module cgroup stuff).
To do:
* lxc monitor needs to be made to handle cgroups.
This is another very invasive one. I started doing this as
a part of this set, but that gets hairy, so I'm sending this
separately. Note that lxc-wait and lxc-monitor don't work
without this, and there may be niggles in what I said works
above - since start.c is doing lxc_monitor_send_state etc
to the shared abstract unix domain socket.
* Need to handle the cgroup conflicts.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Here is a patch to introduce a configurable system-wide
lxcpath. It seems to work with lxc-create, lxc-start,
and basic python3 lxc usage through the api.
For shell functions, a new /usr/share/lxc/lxc.functions is
introduced which sets some of the basic global variables,
including evaluating the right place for lxc_path.
I have not converted any of the other python code, as I was
not sure where we should keep the common functions (i.e.
for now just default_lxc_path()).
configure.ac: add an option for setting the global config file name.
utils: add a default_lxc_path() function
Use default_lxc_path in .c files
define get_lxc_path() and set_lxc_path() in C api
use get_lxc_path() in lua api
create sh helper for getting default path from config file
fix up scripts to use lxc.functions
Changelog:
feb6:
fix lxc_path in lxc.functions
utils.c: as Dwight pointed out, don't close a NULL fin.
utils.c: fix the parsing of lxcpath line
lxc-start: print which rcfile we are using
commands.c: As Dwight alluded to, the sockname handling was just
ridiculous. Clean that up.
use Dwight's recommendation for lxc.functions path: $datadir/lxc
make lxccontainer->get_config_path() return const char *
Per Dwight's suggestion, much nicer than returning strdup.
feb6 (v2):
lxccontainer: set c->config_path before using it.
convert legacy lxc-ls
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
ushort appears to be a glibc specific type which doesn't exist in
bionic, this commit simply replace all occurences by the equivalent
unsigned short type.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
All the signals (except fatal ones) are redirected to signalfd at lxc_init,
so the LXC_TTY_HANDLERs are redundant. This patch removes them.
Signed-off-by: Jian Xiao <jian@linux.vnet.ibm.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
First of all, when trying to start a container in a read-only root
lxc-start complains:
lxc-start: Read-only file system - can't make temporary mountpoint
This is in conf.c:setup_rootfs_pivot_root() function. That function
uses optional parameter "lxc.pivotdir", or creates (and later removes)
a temporary directory for pivot_root. Obviously there's no way to
create a directory in a read-only filesystem.
But lxc.pivotdir does not work either. In the function mentioned above
it is used with leading dot (eg. if I specify "lxc.pivotdir=pivot" in
the config file the pivot_root() syscall will be made to ".pivot" with
leading dot, not to "pivot"), but later on it is used without that dot,
and fails:
lxc-start: No such file or directory - failed to open /pivot/proc/mounts
lxc-start: No such file or directory - failed to read or parse mount list '/pivot/proc/mounts'
lxc-start: failed to pivot_root to '/stage/t'
(that's with "lxc.pivotdir = pivot" in the config file). After symlinking
pivot to .pivot it still fails:
lxc-start: Device or resource busy - could not unmount old rootfs
lxc-start: failed to pivot_root to '/stage/t'
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Add utility functions to parse a u16 and put a u16 on a
netlink message
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
This is not required immidiately but may be used by other init.
Signed-off-by: Michel Normand <normand@fr.ibm.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
This patch fix a problem with the commit d983b93c3a
When the lxc daemonize, it closes fd 0, 1 and 2. But these ones are coming from
inherited fd and they are already in the inherited list of fd. When lxc creates
some file descriptors, they have the number of the previous inherited file
descriptor, so they are closed when we close all the inherited file descriptors.
In order to fix that, the lxc_close_inherited_fd function has been implemented
to close an inherited fd and remove it from the list.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
This patch makes the intermediate lxc processes to close the
inherited file descriptor. The child process will inherit these fd
in any case and that will be up to it to handle them.
Signed-off-by: Michel Normand <normand@fr.ibm.com>