v2: add get_config_item
clear_config_item is not supported, as it isn't for lxc.console, bc
you can do 'lxc.console.logfile =' to clear it. Likewise save_config
is not needed because the config is now just written through the
unexpanded char*.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Originally, we only kept a struct lxc_conf representing the current
container configuration. This was insufficient because lxc.include's
were expanded, so a clone or a snapshot would contain the expanded
include file contents, rather than the original "lxc.include". If
the host's include files are updated, clones and snapshots would not
inherit those updates.
To address this, we originally added a lxc_unexp_conf, which mirrored
the lxc_conf, except that lxc.include was not expanded.
This has its own cshortcomings, however, In particular, if a lxc.include
has a lxc.cgroup setting, and you use the api to say:
c.clear_config_item("lxc.cgroup")
this is not representable in the lxc_unexp_conf. (The original problem,
which was pointed out to me by stgraber, was slightly different, but
unlike this problem it was not unsolvable).
This patch changes the unexpanded configuration to be a textual
representation of the configuration. This allows us *order* the
configuration commands, which is what was not possible using the
struct lxc_conf *lxc_unexp_conf.
The write_config() now becomes a simple fwrite. However, lxc_clone
is slightly complicated in parts, the worst of which is the need to
rewrite the network configuration if we are changing the macaddrs.
With this patch, lxc-clone and clear_config_item do the right thing.
lxc-test-saveconfig and lxc-test-clonetest both pass.
There is room for improvement - multiple calls to
c.append_config_item("lxc.network.link", "lxcbr0")
will result in multiple such lines in the configuration file. In that
particular case it is harmless. There may be cases where it is not.
Overall, this should be a huge improvement in terms of correctness.
Changelog: Aug 1: updated to current lxc git head. All lxc-test* and
python api test passed.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Pull the #defines and struct definitions for btrfs into a separate
.h file to not clutter bdev.c
Implement btrfs recursive delete support
A non-root user isn't allow to do the ioctls needed for searching (as you can
verify with 'btrfs subvolume list'). So for an unprivileged user, if the
rootfs has subvolumes under it, deletion will fail. Otherwise, it will
succeed.
Changelog: Aug 1:
. Fix wrong objid passing when determining directory paths
. In do_remove_btrfs_children, avoid dereferencing NULL dirid
. Fix memleak in error case.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Factor this out of the lxc-net.conf upstart job, so that it can be used by
init.d scripts and systemd units, too.
Part of https://launchpad.net/bugs/1312532
Signed-off-by: Martin Pitt <martin.pitt@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
We only call it (so far) after doing a fork(), so this is fine. If we
ever need such a thing from threaded context, we'll simply need to write
our own version for android.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
This gives me:
ubuntu@c-t1:~$ lxc-create -t download -n u1
lxc_container: No mapping for container root
lxc_container: Error chowning /home/ubuntu/.local/share/lxc/u1/rootfs to container root
lxc_container: You must either run as root, or define uid mappings
lxc_container: To pass uid mappings to lxc-create, you could create
lxc_container: ~/.config/lxc/default.conf:
lxc_container: lxc.include = /etc/lxc/default.conf
lxc_container: lxc.id_map = u 0 100000 65536
lxc_container: lxc.id_map = g 0 100000 65536
lxc_container: Error creating backing store type (none) for u1
lxc_container: Error creating container u1
when I create a container without having an id mapping defined.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This adds the few missing bits so that the new lxc.environment config
entry can be queried, cleared and saved as the others are.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
It's quite useful to be able to configure containers by specifying
environment variables, which init (or initscripts) can use to adjust the
container's operation.
This patch adds one new configuration parameter, `lxc.environment`, which
can be specified zero or more times to define env vars to set in the
container, like this:
lxc.environment = APP_ENV=production
lxc.environment = SYSLOG_SERVER=192.0.2.42
lxc.environment = SOMETHING_FUNNY=platypus
Default operation is unchanged; if the user doesn't specify any
lxc.environment parameters, the container environment will be what it is
today ('container=lxc').
Signed-off-by: Matt Palmer <mpalmer@hezmatt.org>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Introduce a new -F option (no-op for now) as an opposite of -d.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
We detect whether ovs-vsctl is available. If so, then we support
adding network interfaces to openvswitch bridges with it.
Note that with this patch, veths do not appear to be removed from the
openvswitch bridge. This seems a bug in openvswitch, as the veths
in fact do disappear from the system. If lxc is required to remove
the port from the bridge manually, that becomes more complicated
for unprivileged containers, as it would require a setuid-root
wrapper to be called at shutdown.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Rather than always using eth0. Otherwise unpriv containers cannot have
multiple lxc.network.type = veth's without manually setting
lxc.network.name =.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This patch adds SIGPWR support to lxc_init.
This helps to properly shutdown lxc_init based containers.
Signed-off-by: Nikolay Martynov <mar.kolya@gmail.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
These tests are failing on new kernels because the container root is
not privileged over the directories, since privilege no requires
the group being mapped into the container.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
The netdev->priv is shared for the netdev types. A bad config file
could mix configuration for different types, resulting in a bad
netdev->priv when starting or even destroying a container. So sanity
check the netdev->type before setting a netdev->priv element.
This should fix https://github.com/lxc/lxc/issues/254
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
This is based on the patch submitted by:
Yuto KAWAMURA(kawamuray) <kawamuray.dadada@gmail.com>
Updated to use lxc.version rather than @LXC_VERSION@ and to apply to
both lxc-ls and lxc-device rather than just the former.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Currently do_reboot_and_check() is decreasing timeout variable even if
it is set to -1, so running 'lxc-stop --reboot --timeout=-1 ...' will
exits immediately at end of second iteration of loop, without waiting
container reboot.
Also, there is no need to call gettimeofday if timeout is set to -1, so
these statements should be evaluated only when timeout is enabled.
Signed-off-by: Yuto KAWAMURA(kawamuray) <kawamuray.dadada@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
/etc/filesystems could be contain blank lines and comments.
Change find_fstype_cb() to ignore blank lines and comments which starts
with '#'.
Signed-off-by: Yuto KAWAMURA(kawamuray) <kawamuray.dadada@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
New kernels require that to have privilege over a file, your
userns must have the old and new groups mapped into your userns.
So if a file is owned by our uid but another groupid, then we
have to chgrp the file to our primary group before we can try
(in a new user namespace) to chgrp the file to a group id in the
namespace.
But in some cases (when cloning) the file may already be mapped
into the container. Now we cannot chgrp the file to our own
primary group - and we don't have to.
So detect that case. Only try to chgrp the file to our primary
group if the file is owned by our euid (i.e. not by the container)
and the owning group is not already mapped into the container by
default.
With this patch, I'm again able to both create and clone containers
with no errors again.
Reported-by: S.Çağlar Onur <caglar@10ur.org>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
stat.st_gid is unsigned long in bionic instead of the expected gid_t, so
just cast it to gid_t.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Change idmap_add_id() to add both ID_TYPE_UID and ID_TYPE_GID entries
to an existing lxc_conf, not just an ID_TYPE_UID entry, so as to work
lxc-destroy with unprivileged containers on recent kernel.
Signed-off-by: TAMUKI Shoichi <tamuki@linet.gr.jp>
Acked-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Change chown_mapped_root() to map in both the root uid and gid, not
just the uid, so as to work lxc-start with unprivileged containers on
recent kernel.
Signed-off-by: TAMUKI Shoichi <tamuki@linet.gr.jp>
Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Previously this was done by strncpy, but now we just read
the len bytes - not including \0 - from a pipe, so pre-fill
@value with 0s to be safe.
This fixes the python3 api_test failure.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This allows users to get/set cgroup settings when logged into a different
session than that from which they started the container.
There is no cgmanager command to do an _abs variant of cgmanager_get_value
and cgmanager_set_value. So we fork off a new task, which enters the
parent cgroup of the started container, then can get/set the value from
there. The reason not to go straight into the container's cgroup is that
if we are freezing the container, or the container is already frozen, we'll
freeze as well :) The reason to fork off a new task is that if we are
in a cgroup which is set to remove-on-empty, we may not be able to return
to our original cgroup after making the change.
This should fix https://github.com/lxc/lxc/issues/246
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
write_config doesn't check the value sig_name function returns,
this causes write_config to produce corrupted container config when
using non-predefined signal names.
Signed-off-by: Alexander Vladimirov <alexander.idkfa.vladimirov@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Note that building init.lxc.static still requires a static libutil.a
and libpthread.a, but these are available on most distro's through
glibc-static.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
When calling seccomp_rule_add(), you must pass the native syscall number
even if the context is a 32-bit context. So use resolve_name rather
than resolve_name_arch.
Enhance the check of /proc/self/status for Seccomp: so that we do not
enable seccomp policies if seccomp is not built into the kernel. This
is needed before we can enable by-default seccomp policies (which we
want to do next)
Fix wrong return value check from seccomp_arch_exist, and remove
needless abstraction in arch handling.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
seccomp_ctx is already a void*, so don't use 'scmp_filter_ctx *'
Separately track the native arch from the arch a rule is aimed at.
Clearly ignore irrelevant architectures (i.e. arm rules on x86)
Don't try to load seccomp (and don't fail) if we are already
seccomp-confined. Otherwise nested containers fail.
Make it clear that the extra seccomp ctx is only for compat calls
on 64-bit arch. (This will be extended to arm64 when libseccomp
supports it). Power may will complicate this (if ever it is supported)
and require a new rethink and rewrite.
NOTE - currently when starting a 32-bit container on 64-bit host,
rules pertaining to 32-bit syscalls (as opposed to once which have
the same syscall #) appear to be ignored. I can reproduce that without
lxc, so either there is a bug in seccomp or a fundamental
misunderstanding in how I"m merging the contexts.
Rereading the seccomp_rule_add manpage suggests that keeping the seccond
seccomp context may not be necessary, but this is not something I care
to test right now. If it's true, then the code could be simplified, and
it may solve my concerns about power.
With this patch I'm able to start nested containers (with seccomp
policies defined) including 32-bit and 32-bit-in-64-bit.
[ this patch does not yet add the default seccomp policy ]
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Commit 1fb86a7c introduced a way to drop capabilities without having to
specify them all explicitly. Unfortunately, there is no way to drop them
all, as just specifying an empty keep list, ie:
lxc.cap.keep =
clears the keep list, causing no capabilities to be dropped.
This change allows a special value "none" to be given, which will clear
all keep capabilities parsed up to this point. If the last parsed value
is none, all capabilities will be dropped.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Commit 0af683cf added clearing of capabilities to lxc-init, but only
after lxc_setup_fs() was done, likely so that the mounting done in
that routine wouldn't fail.
However, in my testing lxc_caps_reset() wasn't really effective
anyway since it did not clear the bounding set. Adding prctl
PR_CAPBSET_DROP in a loop from 0 to CAP_LAST_CAP would fix this, but I
don't think its necessary to forcefully clear all capabilities since
users can now specify lxc.cap.keep = none to drop all capabilities.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Originally we kept snapshots under /var/lib/lxcsnaps. If a
separate btrfs is mounted at /var/lib/lxc, then we can't
make btrfs snapshots under /var/lib/lxcsnaps.
This patch moves the default directory to /var/lib/lxc/c/snaps.
If /var/lib/lxcsnaps already exists, then we continue to use that.
add c->destroy_with_snapshots() and c->snapshot_destroy_all()
API methods. c->snashot_destroy_all() can be triggered from
lxc-snapshot using '-d ALL'. There is no command to call
c->destroy_with_snapshots(c) as of yet.
lxclock: use ".$lxcname" for container lock files
that way we can use /run/lock/lxc/$lxcpath/$lxcname/snaps as a
directory when locking snapshots without having to worry about
/run/lock//lxc/$lxcpath/$lxcname being a file.
destroy: split off a container_destroy
container_destroy() doesn't check for snapshots, so snapshot_rename can
use it. api_destroy() now does check for snapshots (previously it only
checked for fs - i.e. overlayfs/aufs - snapshots).
Add destroy to the manpage, as it was previously undocumented.
Update snapshot testcase accordingly.
[ rebased in the face of commits 840f05df and 7e36f87e. ]
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: S.Çağlar Onur <caglar@10ur.org>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>