The kernel (net/core/dev_ioctl.c:dev_ioctl()) is going to NULL terminate
this name after the copy-in of the ifr, so even though this is a fixed
sized array the last byte isn't usable as part of the name. All the ioctls
we're using go through this code path.
Use the ifr name in the DEBUG message, since the name may have been truncated.
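Here is a minimal sketch of the length check this implies. It assumes the caller would rather reject an over-long name than have the kernel silently truncate it. `set_ifr_name()` is a hypothetical helper, not LXC's code. It relies on only two facts: `ifr_name` is `IFNAMSIZ` bytes, and it must end in a NUL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <net/if.h>
#include <string.h>

/* Hypothetical helper: copy an interface name into ifr->ifr_name.
 * The kernel NUL-terminates ifr_name after copy-in, so only
 * IFNAMSIZ - 1 bytes are usable for the name itself.
 * Returns 0 on success, -ENAMETOOLONG if the name would be truncated. */
static int set_ifr_name(struct ifreq *ifr, const char *name)
{
	size_t len = strlen(name);

	if (len >= IFNAMSIZ)
		return -ENAMETOOLONG;
	memset(ifr->ifr_name, 0, IFNAMSIZ);
	memcpy(ifr->ifr_name, name, len);
	return 0;
}
```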
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Note this results in nics named things like 'lxcuser-0p'. We'll
likely want to pass the requested name to lxc-user-nic, but let's
do that in a separate patch.
If we're not root, we can't create new network interfaces to pass
into the container. Instead wait until the container is started,
and call lxc-user-nic to create and assign the nics.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
It needs to be done from the handler, not the container, since
the container may not have the rights.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Changelog:
Jul 22: remove hardcoded path for /bin/chown
Jul 22: use new lxc-usernsexec
Conflicts:
src/lxc/lxccontainer.c
1. lxcapi_create: don't try to unshare and mount for dir backed containers
It's unnecessary, and breaks unprivileged lxc-create (since unpriv users
cannot yet unshare(CLONE_NEWNS)).
2. api_create: chown rootfs
chown rootfs to the host uid to which container root will be mapped
3. create: run template in a mapped user ns
4. use (setuid-root) newxidmap to set id_map if we are not root
This is needed to be able to set userns mappings as an unprivileged
user, for unprivileged lxc-start.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
In a few places we checked for LONG_MIN or LONG_MAX as indication
that strtoul failed. That's not reliable. As suggested in the
manpage, switch to checking errno value.
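The errno-based pattern the man page suggests looks roughly like this. `parse_ulong()` is a hypothetical helper. The `end`-pointer checks, which reject empty input and trailing characters, go beyond what this commit describes and are an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a base-10 unsigned long.
 * errno is cleared before the call and inspected afterwards, instead of
 * comparing the result with a sentinel that may be a legitimate value. */
static int parse_ulong(const char *str, unsigned long *out)
{
	char *end = NULL;
	unsigned long val;

	errno = 0;
	val = strtoul(str, &end, 10);
	if (errno != 0)
		return -errno;          /* e.g. -ERANGE on overflow */
	if (end == str || *end != '\0')
		return -EINVAL;         /* no digits, or trailing garbage */
	*out = val;
	return 0;
}
```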
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Those are a bit less obvious than those I pushed directly to master.
All those changes were required to build LXC under clang here.
With this, gcc can be replaced by clang to build LXC so long as you're
not using the python3 binding (as python extensions can't be built under
clang at the moment).
For reference, the clang output for those is: http://paste.ubuntu.com/6292460/
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
This fix is coming from Debian bug:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=720122
The reason for the hardcoded gid= and mode= is because of the fix for
CVE-2013-2207 which removes pt_chown from glibc and so requires proper
write access to devpts.
It looks like the "tty" group is guaranteed to be gid=5 on at least all
RedHat based and Debian based systems, so this hardcoded gid shouldn't be
a big problem. If we however support any distro where that's not the
case, we'll need to implement an extra lxc.conf option and matching
template changes.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This adds quite a few more ways to mount the cgroup filesystem
automatically:
- Specify ro/rw/mixed:
- ro: everything mounted read-only
- rw: everything mounted read-write
- mixed: only container's own cgroup is rw, rest ro
(default)
- Add cgroup-full that mounts the entire cgroup tree to the
corresponding directories. ro/rw/mixed also apply here.
Signed-off-by: Christian Seiler <christian@iwakd.de>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Improve lxc.mount.auto code: allow the user to specify whether to mount
certain things read-only or read-write. Also make the code much more
easily extensible for the future.
Signed-off-by: Christian Seiler <christian@iwakd.de>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Currently, a maximum of one LSM within LXC will be initialized and
used. If in the future stacked LSMs become a reality, we can support it
without changing the configuration syntax and add support for more than
a single LSM at a time to the lsm code.
Generic LXC code should note that lsm_process_label_set() will take
effect "now" for AppArmor, and upon exec() for SELinux.
- fix Oracle template mounting of proc and sysfs, needed when using SELinux
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
pthread_mutex_lock() will only return an error if it was set to
PTHREAD_MUTEX_ERRORCHECK and we are recursively calling it (and
would otherwise have deadlocked). If that's the case then log a
message for future debugging and exit. Trying to "recover" is
nonsense at that point.
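The sketch below shows one way such a lock could be set up and used. It assumes a single process-wide mutex. The variable name `thread_mutex` is hypothetical, and the exit-on-error policy follows the description above rather than LXC's actual source.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical process-wide lock. */
static pthread_mutex_t thread_mutex;

/* Use PTHREAD_MUTEX_ERRORCHECK so that a recursive lock attempt returns
 * EDEADLK instead of deadlocking silently. */
static void init_process_lock(void)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&thread_mutex, &attr);
	pthread_mutexattr_destroy(&attr);
}

/* Any error here indicates a bug such as recursive locking: log it for
 * later debugging and exit, since recovery makes no sense at that point. */
static void process_lock(void)
{
	int ret = pthread_mutex_lock(&thread_mutex);

	if (ret != 0) {
		fprintf(stderr, "pthread_mutex_lock failed: %d\n", ret);
		exit(1);
	}
}

static void process_unlock(void)
{
	pthread_mutex_unlock(&thread_mutex);
}
```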
process_lock() was held over too long a time in lxcapi_start()
in the daemonize case. (note the non-daemonized case still needs a
check to enforce that it must NOT be called while threaded). Add
process_lock() at least across all open/close/socket() calls.
Anything done after a fork() doesn't need the locks as it is no
longer threaded - so some open/close/dups()s are not locked for
that reason. However, some common functions are called from both
threaded and non-threaded contexts. So after doing a fork(), do
a possibly-extraneous process_unlock() to make sure that, if we
were forked while the pthread mutex was held, we aren't deadlocked
waiting on a thread that no longer exists in the child.
Tested that lp:~serge-hallyn/+junk/lxc-test still works with this
patch.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Tested-by: S.Çağlar Onur <caglar@10ur.org>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Change pinning mechanism: Use $rootfs/lxc.hold instead of $rootfs.hold
(in case $rootfs is a mountpoint itself), but delete the file
immediately after creating it (but keep it open). This will keep the
root filesystem busy but does not leave any unnecessary files lying
around.
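Here is a minimal sketch of the create-unlink-keep-open trick. It assumes `rootfs` is a writable directory, and `pin_rootfs()` is a hypothetical name. The file is unlinked straight away, so nothing is left behind even if the process dies. The open descriptor still keeps the filesystem busy.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: create rootfs/lxc.hold, unlink it immediately and
 * return the still-open fd, which keeps the filesystem busy.
 * Returns the fd, or -1 on error. */
static int pin_rootfs(const char *rootfs)
{
	char path[4096];
	int n, fd;

	n = snprintf(path, sizeof(path), "%s/lxc.hold", rootfs);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	fd = open(path, O_CREAT | O_RDWR | O_CLOEXEC, 0600);
	if (fd < 0)
		return -1;
	if (unlink(path) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```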
Signed-off-by: Christian Seiler <christian@iwakd.de>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This patch adds the lxc.mount.auto configuration option that allows the
user to specify that certain standard filesystems should be
automatically pre-mounted when the container is started.
Currently, four things are implemented:
- /proc (mounted read-write)
- /sys (mounted read-only)
- /sys/fs/cgroup (special logic, see mailing list discussions)
- /proc/sysrq-trigger (see below)
/proc/sysrq-trigger may be used from within a container to trigger a
forced host reboot (echo b > /proc/sysrq-trigger) or do other things
that a container shouldn't be able to do. The logic here is to
bind-mount /dev/null over /proc/sysrq-trigger, so that that cannot
happen. This obviously only protects fully if CAP_SYS_ADMIN is not
available inside the container (otherwise that bind-mount could be
removed).
Signed-off-by: Christian Seiler <christian@iwakd.de>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
It's a legitimate use case to use read-only $lxcpath. If we can't
create the pin file, then we're not worried about marking the fs
readonly on exit.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
The lxc configuration file currently supports 'lxc.cap.drop', a list of
capabilities to be dropped (using the bounding set) from the container.
The problem with this is that over time new capabilities are added. So
an older container configuration file may, over time, become insecure.
Walter has in the past suggested replacing lxc.cap.drop with
lxc.cap.preserve, which would have the inverse sense - any capabilities
in that set would be kept, any others would be dropped.
Realistically both have the same problem - the sendmail capabilities
bug proved that running code with unexpectedly dropped privilege can be
dangerous. This patch gives the admin a choice: You can use either
lxc.cap.keep or lxc.cap.drop, not both.
Both continue to be ignored if a user namespace is in use.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
initstate/random doesn't work on bionic, but srand/rand works everywhere,
so let's use that.
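This sketch seeds once with srand() and then draws bytes with rand(). It uses them to build a random MAC address, as one example of where LXC needs randomness. The seeding formula and `random_mac()` are assumptions made for illustration. rand() should not be used where unpredictability matters.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Seed the libc PRNG once. srand()/rand() exist in every C library,
 * unlike initstate()/random(). Not cryptographically strong. */
static void seed_rand_once(void)
{
	static int seeded;

	if (!seeded) {
		srand((unsigned int)time(NULL) ^ (unsigned int)getpid());
		seeded = 1;
	}
}

/* Hypothetical helper: write a random, locally administered, unicast MAC
 * address into buf (at least 18 bytes). */
static void random_mac(char *buf, size_t len)
{
	unsigned char b[6];

	seed_rand_once();
	for (int i = 0; i < 6; i++)
		b[i] = (unsigned char)(rand() & 0xff);
	/* clear the multicast bit, set the locally administered bit */
	b[0] = (unsigned char)((b[0] & 0xfe) | 0x02);
	snprintf(buf, len, "%02x:%02x:%02x:%02x:%02x:%02x",
		 b[0], b[1], b[2], b[3], b[4], b[5]);
}
```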
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
This adds a local ifaddrs implementation to be used on Bionic or other C
libraries that don't come with a getifaddrs implementation.
This code was written by Kenneth MacKay and is under a two-clause BSD
license (copyright information in the file headers).
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Currently, if you create a container and use the mountcgroup hook,
you get the /lxc/c1/c1.real cgroup mounted to /. If you then try
to start containers inside that container, lxc can get confused.
This patch addresses that, by accepting that the cgroup as found
in /proc/self/cgroup can be partially hidden by bind mounts.
In this patch:
Add optional 'lxc.cgroup.use' to /etc/lxc/lxc.conf to specify which
mounted cgroup filesystems lxc should use. So far only the cgroup
creation respects this.
Keep separate cgroup information for each cgroup mountpoint. So if
the caller is in devices cgroup /a but cpuset cgroup /b that should
now be ok.
Change how we decide whether to ignore failure to set devices cgroup
settings. Actually look to see if our current cgroup already has the
settings. If not, add them.
Finally, the real reason for this patch: in a nested container,
/proc/self/cgroup says nothing about where under /sys/fs/cgroup you
might find yourself. Handle this by searching for our pid in tasks
files, and keep that info in the cgroup handler.
Also remove all strdupa from cgroup.c (not android-friendly).
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Using mktemp() leads to build time warnings and isn't actually
appropriate for what we want to do as it's checking for the existence of
a file and not a network interface.
Replace those calls by an equivalent mkifname() function which uses the
same template as mktemp but instead checks for existing network
interfaces.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Currently if loglevel/logfile are specified on command line in a
program using LXC api, and that program does any
container->save_config(), then the new config will be saved with the
loglevel/logfile specified on command line. This is wrong, especially
in the case of
cat > lxc.conf << EOF
lxc.logfile=a
EOF
lxc-create -t cirros -n c1 -o b
which will result in a container config with lxc.logfile=b.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Otherwise (a) there is a memory leak when using user namespaces and
clearing a config, and (b) saving a container configuration file doesn't
maintain the userns mapping. For instance, if container c1 has
lxc.id_map configuration entries, then
python3
import lxc
c=lxc.Container("c1")
c.save_config("/tmp/config1")
should show 'lxc.id_map =' entries in /tmp/config1.
Changelog for v2:
1. fix incorrect saving of group types (s/'c'/'g')
2. fix typo -> idmap->type should be idmap->idtype
Reported-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Dwight Engen <dwight.engen@oracle.com>
Tested-by: Dwight Engen <dwight.engen@oracle.com>
3.10 kernel comes with proper hierarchical enforcement of devices
cgroup. To keep that code somewhat sane, certain things are not
allowed. Switching from default-allow to default-deny or vice versa
is not allowed when there are child cgroups. (This *could* be
simplified in the kernel by checking that all child cgroups are
unpopulated, but that has not yet been done and may be rejected)
The mountcgroup hook causes lxc-start to break with 3.10 kernels, because
you cannot write 'a' to devices.deny once you have a child cgroup. With
this patch, (a) lxcpath is passed to hooks, (b) the cgroup mount hook sets
the container's devices cgroup, and (c) setup_cgroup() during lxc startup
ignores failures to write to devices subsystem if we are already in a
child of the container's new cgroup.
((a) is not really related to this bug, but is definitely needed.
The followup work of making the other hooks use the passed-in lxcpath
is still to be done)
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
The reason is that the generic code which handles reading
lxc.rootfs.mount always frees the old value if not NULL.
So without this, setting lxc.rootfs.mount = /mnt causes a
segfault.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Currently due to some safety checks for !rootfs.path, lxc-execute works
ok if you do not set lxc.rootfs at all in your lxc.conf. But if you
set lxc.rootfs = '/', then it sets up console, and when you do an
lxc-execute, the console appears hung.
However the lxc.rootfs NULL check was just incidental to not dereference
a NULL pointer. In fact we should not be setting up a console if the
container isn't running a full-fledged distro with a getty/login
running on the container's /dev/console.
Have lxc_execute() mark in lxc_conf that this is an lxc-execute and not
an lxc-start, and don't set up the console.
The issue is documented at https://sourceforge.net/p/lxc/bugs/67/ .
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Dwight Engen <dwight.engen@oracle.com>
Add a higher level console API that opens a tty/console and runs the
mainloop as well. Rename existing API to console_getfd(). Use these in
the python binding.
Allow attaching a console peer after container bootup, including if the
container was launched with -d. This is made possible by allocation of a
"proxy" pty as the peer when the console is attached to.
Improve handling of SIGWINCH, the pty size will be correctly set at the
beginning of a session and future changes when using the lxc_console() API
will be propagated to it as well.
Refactor some common code between lxc_console.c and console.c. The variable
wait4q (renamed to saw_escape) was static, making the mainloop callback not
safe across threads. This wasn't a problem when the callback was in the
non-threaded lxc-console, but now that it is internal to console.c, we have
to take care of it. This is now contained in a per-tty state structure.
Don't attempt to open /dev/null as the console peer since /dev/null cannot
be added to the mainloop (epoll_ctl() fails with EPERM). This isn't needed
to get the console setup (and the log to work) since the case of not having
a peer at console init time has to be handled to allow for attaching to it
later.
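You can check the /dev/null limitation directly. epoll_ctl() rejects files whose driver has no poll support with EPERM, while a pipe is accepted. `epoll_add_errno()` is a hypothetical test helper.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: return 0 if fd can be added to an epoll set,
 * otherwise the errno reported by epoll_ctl() (or epoll_create1()). */
static int epoll_add_errno(int fd)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
	int epfd = epoll_create1(EPOLL_CLOEXEC);
	int err = 0;

	if (epfd < 0)
		return errno;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0)
		err = errno;            /* saved before close() can clobber it */
	close(epfd);
	return err;
}
```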
Move signalfd libc wrapper/replacement to utils.h.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
otherwise we won't be allowed to set an apparmor context (on pid 1)
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Add a clone hook called from api_clone. Pass arguments to it from
lxc_clone.c.
The clone update hook is called while the container's bdev is mounted.
Information about the container is passed in through environment
variables LXC_ROOTFS_PATH, LXC_NAME, LXC_ROOTFS_MOUNT, and
LXC_CONFIG_FILE. For example:
LXC_ROOTFS_MOUNT=/usr/lib/x86_64-linux-gnu/lxc
LXC_CONFIG_FILE=/var/lib/lxc/demo3/config
LXC_ROOTFS_PATH=/var/lib/lxc/demo3/rootfs
LXC_NAME=demo3
So from the hook, updates to the container should be made under
$LXC_ROOTFS_MOUNT/ .
The hook also receives command line arguments as follows:
First argument is container name, second is always 'lxc', third
is the hook name (always clone), then come the arguments which
were passed to lxc-clone. I.e. when I did:
sudo lxc-clone demo2 demo3 -- hey there dude
the arguments passed in were "demo3 lxc clone hey there dude"
I personally would like to drop the first two arguments. The
name is available as $LXC_NAME, and the section argument ('lxc')
is meaningless. However, doing so risks invalidating existing
hooks.
Soon analogous create and destroy hooks will be added as well.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
This allows some special cgroup items such as memory.kmem.limit_in_bytes
to be successfully set, since they must be set before any task is put
into the cgroup.
The devices cgroup is set up later, giving the container a chance to mount
file systems before the device it might want to mount from becomes
unavailable.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
commit ab81cef053 meant to remove the
added break, but apparently I had not done 'git add' before commit
--amend. Remove the added break.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
also break once we have found root, no need to search the rest of the mounts
Changelog: May 6: Serge: don't add the break. (see m-l)
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
When releasing the conf, also free conf->rcfile, which is allocated with malloc.
Signed-off-by: Weng Meiling <wengmeiling.weng@huawei.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
1. commonize waitpid users to use a single helper. We frequently want
to run something in a clean namespace, or fork off a script. This
lets us keep the function doing fork/exec/waitpid simpler.
2. start a blockdev backend implementation. This will be used for
mounting, copying, and snapshotting container filesystems.
3. implement btrfs, lvm, directory, and overlayfs backends.
4. For overlayfs, support a new lxc.rootfs format of
'bdevtype:<extra>'. This means you can now use overlayfs-based
containers without using lxc-start-ephemeral, by using
lxc.rootfs = overlayfs:/readonly-dir:writeable-dir
5. add a set of simple clone testcases
6. Write a new lxc_clone.c based on api clone.
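Here is a minimal sketch of the kind of shared waitpid helper item 1 describes. It assumes callers only care whether the child exited with status 0. The name `wait_for_pid_sketch()` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: wait for pid, retrying if interrupted by a signal.
 * Returns 0 if the child exited normally with status 0, and -1 otherwise
 * (non-zero exit, killed by a signal, or waitpid() error). */
static int wait_for_pid_sketch(pid_t pid)
{
	int status;

	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;
	return 0;
}
```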
Still to do (there's more, but off top of my head):
1. support zfs, aufs
2. have clone handle other mount entries (right now it only clones
the rootfs)
3. python, lua, and go bindings (not me :)
4. lxc-destroy: if lvm backing store, check for snapshots of it.
(what about directories which have overlayfs clones?)
Changes since v2:
Initialize random generator when picking new macaddr (reported
by caglar@10ur.org)
Fix wrong use of bitmask flags
On copy-clone of btrfs, create a subvolume
lxc_clone.c: respect the command line usage of the old script
lxc-clone(1): update documentation
Refuse to try changing backing stores except to overlayfs, as
it is not implemented (yet) anyway.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Conflicts:
src/lxc/utils.h
The recent change to use strtok_r causes a build warning with this older
gcc version, so initialize saveptr to NULL to quiet the compiler and
unbreak the build. There was no warning with gcc 4.7.2 that I
originally tested with.
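For reference, the pattern looks like this. `count_tokens()` is a hypothetical example. The NULL initialisation only keeps the compiler quiet, because strtok_r() ignores `*saveptr` on the first call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>

/* Hypothetical example: count comma-separated tokens in a writable string.
 * Empty fields are skipped, as strtok_r() always does. */
static int count_tokens(char *s)
{
	char *saveptr = NULL;   /* initialised only to silence older gcc */
	int n = 0;

	for (char *tok = strtok_r(s, ",", &saveptr); tok;
	     tok = strtok_r(NULL, ",", &saveptr))
		n++;
	return n;
}
```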
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
pclose returns the exit status from wait, we need to check that to see if
the script itself failed or not. Tested a script that returned 0, 1, and
also one that did a sleep and then was killed by a signal.
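This sketch decodes the pclose() result with the <sys/wait.h> macros. `run_script_status()` is a hypothetical helper. It discards the script's output and returns -1 if the script was killed by a signal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: run cmd through popen() and report the script's
 * own result. pclose() returns a wait(2)-style status, not an exit code,
 * so decode it. Returns the exit code, or -1 on error or signal death. */
static int run_script_status(const char *cmd)
{
	char buf[256];
	FILE *f = popen(cmd, "r");
	int status;

	if (!f)
		return -1;
	while (fgets(buf, sizeof(buf), f))
		;               /* drain (and here, discard) the output */
	status = pclose(f);
	if (status == -1 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```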
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
If the filesystem mounts on the host have the MS_SHARED or MS_SLAVE
flag set, and a container without a rootfs is started, then any new
mounts created inside the container are currently propagated into
the host. In addition to mounts placed in the configuration file of
the container or performed manually after startup, the automatic
mounting of /proc by lxc-execute will propagate back into the host,
effectively crippling the entire system. This can be prevented by
setting the MS_SLAVE flag on all mounts (inside the container's own
mount namespace) during startup if a rootfs is not configured.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Otherwise containers fail to start even if they aren't trying to map
ids.
Also don't allocate buf unless we need to.
Reported-by: Alexander Vladimirov <alexander.idkfa.vladimirov@gmail.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Had this changeset hanging around for some time, maybe this would be useful
until some better solution comes up.
Signed-off-by: Alexander Vladimirov <alexander.idkfa.vladimirov@gmail.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
The kernel requires a single atomic write for setting the /proc
idmap files. We were calling write(2) more than once when multiple
ranges were configured so instead build a buffer to pass in one write(2)
call.
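This sketch builds the whole buffer first and then issues a single write(2). It assumes the /proc map file format of one `<id-in-ns> <id-on-host> <count>` line per range. The struct and function names are hypothetical. The fields use unsigned long, in line with the id type change in this commit.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical representation of one mapped range. */
struct id_range {
	unsigned long nsid;     /* first id inside the namespace */
	unsigned long hostid;   /* first id on the host */
	unsigned long range;    /* number of ids */
};

/* Format all ranges into buf, one line each. Returns the length, or -1
 * if buf is too small. */
static int format_idmap(char *buf, size_t size, const struct id_range *r, size_t n)
{
	size_t used = 0;

	for (size_t i = 0; i < n; i++) {
		int ret = snprintf(buf + used, size - used, "%lu %lu %lu\n",
				   r[i].nsid, r[i].hostid, r[i].range);
		if (ret < 0 || (size_t)ret >= size - used)
			return -1;
		used += (size_t)ret;
	}
	return (int)used;
}

/* The kernel accepts only one write(2) per map file: write it all at once. */
static int write_idmap(int fd, const char *buf, size_t len)
{
	ssize_t ret = write(fd, buf, len);

	return (ret >= 0 && (size_t)ret == len) ? 0 : -1;
}
```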
Change id types to unsigned long to handle large id mappings gracefully.
Fix max id in example comment.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
The id ordering and case of u,g is also consistent with uidmapshift,
reducing confusion.
doc: Moved example to the EXAMPLES section, and used values
corresponding to the defaults in the pending shadow-utils subuid patch.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
1. if there's no rootfs, return -2, not 0.
2. don't close pinfd unconditionally in do_start().
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: David Ward <david.ward@ll.mit.edu>
Add a monitor command to get the cgroup for a running container. This
allows container r1 started from /var/lib/lxc and container r1 started
from /home/ubuntu/lxcbase to pick unique cgroup directories (which
will be /sys/fs/cgroup/$subsys/lxc/r1 and .../r1-1), and all the lxc-*
tools to get that path over the monitor at lxcpath.
Rework the cgroup code. Before, if /sys/fs/cgroup/$subsys/lxc/r1
already existed, it would be moved to 'deadXXXXX', and a new r1 created.
Instead, if r1 exists, use r1-1, r1-2, etc.
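Here is a sketch of the r1, r1-1, r1-2 naming, using mkdir(2) and EEXIST. It assumes a plain directory stands in for the cgroup hierarchy. `create_unique_dir()` and its 1000-attempt limit are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create parent/name, or parent/name-1,
 * parent/name-2, ... if it already exists, rather than moving the old
 * directory away. The chosen path is written to out.
 * Returns 0 on success, -1 on error. */
static int create_unique_dir(const char *parent, const char *name,
			     char *out, size_t outlen)
{
	for (int i = 0; i < 1000; i++) {
		int n = (i == 0)
			? snprintf(out, outlen, "%s/%s", parent, name)
			: snprintf(out, outlen, "%s/%s-%d", parent, name, i);
		if (n < 0 || (size_t)n >= outlen)
			return -1;
		if (mkdir(out, 0755) == 0)
			return 0;
		if (errno != EEXIST)
			return -1;
	}
	return -1;
}
```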
I ended up removing both the use of cgroup.clone_children and support
for ns cgroup. Presumably we'll want to put support for ns cgroup
back in for older kernels. Instead of guessing whether or not we
have clone_children support, just always explicitly do the only thing
that feature buys us - set cpuset.{cpus,mems} for newly created cgroups.
Note that upstream kernel is working toward strict hierarchical
limit enforcements, which will be good for us.
NOTE - I am changing the lxc_answer struct size. This means that
if you upgrade to this version while containers are running,
lxc_* commands on those already-running containers will fail.
Changelog: (v3)
implement cgroup attach
fix a subtle bug arising when lxc_get_cgpath() returned
STOPPED rather than -1 (STOPPED is 0, and 0 meant success).
Rename some functions and add detailed comments above most.
Drop all my lxc_attach changes in favor of those by Christian
Seiler (which are mostly the same, but improved).
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
As Kees pointed out, write() errors can be delayed and returned as
close() errors. So don't ignore error on close when writing the
userns id mapping.
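Here is a minimal sketch that treats a close(2) failure as a write failure. `write_file_checked()` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write buf to path in one write(2) call. Errors
 * from the write may only be reported by close(), so check it as well.
 * Returns 0 on success, -1 on any failure. */
static int write_file_checked(const char *path, const char *buf, size_t len)
{
	int fd = open(path, O_WRONLY | O_CLOEXEC);
	ssize_t ret;

	if (fd < 0)
		return -1;
	ret = write(fd, buf, len);
	if (ret < 0 || (size_t)ret != len) {
		close(fd);
		return -1;
	}
	if (close(fd) < 0)
		return -1;
	return 0;
}
```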
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
For the lxc-* C binaries, introduce a -P|--lxcpath command line option
to override the system default.
With this, I can
lxc-create -t ubuntu -n r1
lxc-create -t ubuntu -n r1 -P /home/ubuntu/lxcbase
lxc-start -n r1 -d
lxc-start -n r1 -d -P /home/ubuntu/lxcbase
lxc-console -n r1 -d -P /home/ubuntu/lxcbase
lxc-stop -n r1
all working with the right containers (modulo cgroup stuff).
To do:
* lxc monitor needs to be made to handle cgroups.
This is another very invasive one. I started doing this as
a part of this set, but that gets hairy, so I'm sending this
separately. Note that lxc-wait and lxc-monitor don't work
without this, and there may be niggles in what I said works
above - since start.c is doing lxc_monitor_send_state etc
to the shared abstract unix domain socket.
* Need to handle the cgroup conflicts.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Here is a patch to introduce a configurable system-wide
lxcpath. It seems to work with lxc-create, lxc-start,
and basic python3 lxc usage through the api.
For shell functions, a new /usr/share/lxc/lxc.functions is
introduced which sets some of the basic global variables,
including evaluating the right place for lxc_path.
I have not converted any of the other python code, as I was
not sure where we should keep the common functions (i.e.
for now just default_lxc_path()).
configure.ac: add an option for setting the global config file name.
utils: add a default_lxc_path() function
Use default_lxc_path in .c files
define get_lxc_path() and set_lxc_path() in C api
use get_lxc_path() in lua api
create sh helper for getting default path from config file
fix up scripts to use lxc.functions
Changelog:
feb6:
fix lxc_path in lxc.functions
utils.c: as Dwight pointed out, don't close a NULL fin.
utils.c: fix the parsing of lxcpath line
lxc-start: print which rcfile we are using
commands.c: As Dwight alluded to, the sockname handling was just
ridiculous. Clean that up.
use Dwight's recommendation for lxc.functions path: $datadir/lxc
make lxccontainer->get_config_path() return const char *
Per Dwight's suggestion, much nicer than returning strdup.
feb6 (v2):
lxccontainer: set c->config_path before using it.
convert legacy lxc-ls
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
If 'optional' is in the mount options, then avoid failure in
mount().
Experiments suggest we could just do this by checking data in
mount_entry(), but that feels less proper than using
hasmntopt() against the mntent.
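This sketch shows the hasmntopt() check. It assumes a failed mount of an entry marked `optional` should simply be skipped. Both function names are hypothetical, and LXC's real mount_entry() does more than this.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <mntent.h>
#include <stddef.h>
#include <sys/mount.h>

/* Hypothetical helper: hasmntopt() scans the comma-separated mnt_opts
 * field for the given option. */
static int mntent_is_optional(const struct mntent *m)
{
	return hasmntopt(m, "optional") != NULL;
}

/* Hypothetical helper: try the mount; a failure only counts as an error
 * if the entry is not marked "optional". */
static int mount_entry_sketch(const struct mntent *m, unsigned long flags)
{
	if (mount(m->mnt_fsname, m->mnt_dir, m->mnt_type, flags, NULL) == 0)
		return 0;
	return mntent_is_optional(m) ? 0 : -1;
}
```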
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
In eglibc st_uid and st_gid are defined as unsigned integers, in bionic those
are defined as unsigned long (which is inconsistent with the kernel's
definition, which is uint32).
To work around this problem, simply cast those two to int.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
The 3.8 kernel now supports uid mappings, so I believe it's appropriate
to proceed with this patchset.
The container config supports new entries of the form:
lxc.id_map = U 100000 0 10000
lxc.id_map = G 100000 0 10000
meaning map 'virtual' uids (in the container) 0-9999 to uids
100000-109999 on the host, and same for gids. So long as there are
mappings specified in the container config, then CONFIG_NEWUSER will
be used when the container is cloned. This means that container
setup is no longer done with root privilege on the host, only root
privilege in the container. Therefore cgroup setup is moved from the
init task to the monitor task.
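This sketch parses such a line with sscanf(). It follows the field order given in this message: type, host id, container id, range. `parse_id_map()` and its struct are hypothetical. Rejecting a zero range is an extra assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical representation of one lxc.id_map entry. */
struct idmap_entry {
	char type;              /* 'U' for uids, 'G' for gids */
	unsigned long hostid;   /* first id on the host */
	unsigned long nsid;     /* first id in the container */
	unsigned long range;    /* number of ids */
};

/* Parse "<U|G> <hostid> <nsid> <range>", e.g. "U 100000 0 10000".
 * Returns 0 on success, -1 on malformed input or trailing junk. */
static int parse_id_map(const char *value, struct idmap_entry *e)
{
	char extra;

	if (sscanf(value, " %c %lu %lu %lu %c", &e->type, &e->hostid,
		   &e->nsid, &e->range, &extra) != 4)
		return -1;
	if (e->type != 'U' && e->type != 'G')
		return -1;
	if (e->range == 0)
		return -1;
	return 0;
}
```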
To use this patchset, you currently need to either use the raring
kernel at ppa:serge-hallyn/usern-natty, or build your own kernel
from either git://kernel.ubuntu.com/serge/quantal-userns.git.
(Alternatively you can use Eric's tree at the latest userns-always-map-*
branch at
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git
but you will likely want to at least enable tmpfs mounts in user namespaces)
You also need to chown the files in the container rootfs into the
mapped range. There is a utility at
https://code.launchpad.net/~serge-hallyn/+junk/nsexec to do this.
uidmapshift does the chowning, while the container-userns-convert
script nicely wraps that program. So I simply
sudo lxc-create -t ubuntu -n r1
sudo container-userns-convert r1 200000
will create a container which is shifted so uid 0 in the container
is uid 200000 on the host.
TODO: when doing setuid(0), need to only do that if 0 is one of the
ids we map to. Similarly, when dropping capabilities, need to only
not do that if 0 is one of the ids we map to. However, the question
of what to do for 'weird' containers in private user namespaces is
one I'm punting for later.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
This is a first step to enabling user namespaces. When starting a
container in a new user namespace, the child will not have the
rights to write to the cgroup fs. (We can give it that right, but
don't always want to have to).
At the parent, we don't want to setup_cgroups() before the child
has set itself up. But we also don't want to wait until it has
started running its init, since that is racy.
Therefore introduce a new sync point. The child will let the
parent know when it is ready to be confined, and wait for the
parent to respond that it has done so. Then the child will finish
constraining itself with LSM and seccomp and execute init.
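Here is a sketch of such a sync point over a socketpair. It assumes one int message in each direction. `SYNC_READY`, `SYNC_CONFIGURED` and the helpers are hypothetical. The comments mark where cgroup setup and LSM/seccomp confinement would go.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical message values for the two sync steps. */
enum { SYNC_READY = 1, SYNC_CONFIGURED = 2 };

static int sync_send(int fd, int msg)
{
	return write(fd, &msg, sizeof(msg)) == (ssize_t)sizeof(msg) ? 0 : -1;
}

/* Block until the peer sends a message; fail on EOF or an unexpected value. */
static int sync_wait(int fd, int expected)
{
	int msg;

	if (read(fd, &msg, sizeof(msg)) != (ssize_t)sizeof(msg))
		return -1;
	return msg == expected ? 0 : -1;
}

/* Run one parent/child handshake. Returns 0 if both sides completed it. */
static int demo_handshake(void)
{
	int sv[2], status;
	pid_t pid;

	if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, sv) < 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (pid == 0) {
		/* child: announce readiness, then wait to be confined */
		close(sv[0]);
		if (sync_send(sv[1], SYNC_READY) < 0 ||
		    sync_wait(sv[1], SYNC_CONFIGURED) < 0)
			_exit(1);
		/* ...apply LSM and seccomp, then exec init... */
		_exit(0);
	}
	/* parent: wait for the child, set it up, then release it */
	close(sv[1]);
	if (sync_wait(sv[0], SYNC_READY) == 0) {
		/* ...cgroup setup for the child's pid would happen here... */
		sync_send(sv[0], SYNC_CONFIGURED);
	}
	close(sv[0]);
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```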
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Ok... Here's the patch again. Since Serge is removing the loglevel
structure member, this patch no longer references that element.
From the original description:
1) Removes run_makedev() and the call to it from conf.c per discussion.
2) Adds an lxc.hook.autodev hook.
Note: This hook is very close (one routine level abstracted) from where
the run_makedev was called. Anyone really rrreeeaaalllyyy needing
MAKEDEV can add it in with a small shim script to do whatever they want
under whatever distro they're using, so no functionality is lost there.
3) Added a number of environment variables for all the hook scripts to
reference to assist in execution. Things like LXC_ROOTFS_MOUNT could be
very useful but others were added as well. Room for more if anyone has
an itch. All in one spot in lxc_start.c.
4) clearenv and putenv( "container=lxc" ) calls were moved to just after
the "start" hook in the container just prior to actually firing up the
container so we could use environment variables prior to that and have
them flushed before firing up init. A nice side effect is that you
can define environment variables and then call lxc-start and have them
show up in those hooks scripts.
5) I actually DID update the man page for lxc.conf! I guess I lied when
I said I wouldn't get that done.
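This sketch follows the ordering in item 4. It assumes the hook variables are exported with setenv(), and the environment is wiped only right before init is executed. The function names are hypothetical. LXC_NAME and LXC_ROOTFS_MOUNT are among the hook variables named in this log.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: export variables for the hook scripts to use. */
static int export_hook_env(const char *name, const char *rootfs_mount)
{
	if (setenv("LXC_NAME", name, 1) < 0)
		return -1;
	if (setenv("LXC_ROOTFS_MOUNT", rootfs_mount, 1) < 0)
		return -1;
	return 0;
}

/* Hypothetical helper: after the hooks have run, drop everything and
 * leave only container=lxc for init. */
static int reset_env_for_init(void)
{
	/* putenv() keeps the pointer, so the string must outlive the call */
	static char container_env[] = "container=lxc";

	if (clearenv() != 0)
		return -1;
	return putenv(container_env);
}
```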
[... and ...]
I added the rcfile to the lxc_conf structure as suggested and moved the
setenv bundle from lxc-start.c over to start.c just prior to calling
run_lxc_hooks for the pre-start hook.
Signed-off-by: Michael H. Warfield <mhw@WittsEnd.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
The options are still supported in the lxc configuration file.
However they are stored only in local variables in src/lxc/log.c,
which can be read using two new functions:
int lxc_log_get_level(void);
const char *lxc_log_get_file(void);
Changelog: jan 14:
have lxc_log_init use lxc_log_set_file(), have lxc_log_set_file() take
a const char *, and have it keep its own strdup'd copy of the filename.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
There's no good reason to call setup_mount_entries if we don't have any
lxc.mount.entry. This also avoids an issue on bionic where the tmpfile()
call in setup_mount_entries requires the presence of /tmp which isn't the
case by default.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
__S_ISTYPE doesn't exist in all C libraries, so define it if it's missing.
Additionally, replace one occurrence where it wasn't actually needed.
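Here is a sketch of the fallback definition. It matches how glibc defines the macro in <sys/stat.h>, by comparing the file-type bits of the mode against a mask. `is_dir_path()` is a hypothetical usage example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/stat.h>

/* Fallback for C libraries that lack __S_ISTYPE: true if the file type
 * bits of mode equal mask (e.g. S_IFDIR). */
#ifndef __S_ISTYPE
#define __S_ISTYPE(mode, mask) (((mode) & S_IFMT) == (mask))
#endif

/* Hypothetical usage: is path an existing directory? */
static int is_dir_path(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0 && __S_ISTYPE(st.st_mode, S_IFDIR);
}
```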
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Bionic (at least) is missing some of the usual mntent functions.
This adds code defining those that we need when they're missing from the C
library.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Some libc implementations (bionic) lack some of the syscall functions
that are present in glibc.
For those, detect at build time that they are missing and implement a minimal
syscall() wrapper that will essentially give the same result as the glibc
function.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
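A minimal sketch of such a wrapper. The message doesn't say which functions were missing; setns() and the HAVE_SETNS configure macro are assumptions chosen for illustration. syscall(2) already returns -1 and sets errno on failure, which is why a thin forwarder matches the glibc behaviour:

```c
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* HAVE_SETNS is an assumed configure result, set when libc provides setns(). */
#ifndef HAVE_SETNS
static int setns(int fd, int nstype)
{
#ifdef __NR_setns
	/* Forward straight to the kernel; errno is set by syscall() on failure. */
	return syscall(__NR_setns, fd, nstype);
#else
	/* Kernel headers don't know the syscall either. */
	(void)fd;
	(void)nstype;
	errno = ENOSYS;
	return -1;
#endif
}
#endif
```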
Some platforms don't have personality.h in their C library, this change
adds build-time detection for the header and turns off the personality-setting
code in those cases.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
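A sketch of how the personality code can be compiled out, assuming a configure-generated HAVE_SYS_PERSONALITY_H macro and a convention (also an assumption) that -1 means "no personality configured":

```c
/* HAVE_SYS_PERSONALITY_H is an assumed configure-time result. */
#ifdef HAVE_SYS_PERSONALITY_H
#include <sys/personality.h>
#endif

/* Apply the configured personality, or succeed silently without the header. */
static int setup_personality(int persona)
{
#ifdef HAVE_SYS_PERSONALITY_H
	if (persona == -1)
		return 0;               /* nothing configured */
	if (personality(persona) < 0)
		return -1;
	return 0;
#else
	(void)persona;                  /* personality setting turned off */
	return 0;
#endif
}
```

Callers stay unchanged either way; only the body depends on whether the header was found.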
In an effort to make LXC work with non-standard Linux distros, this change
allows for the user to build LXC without capability support through a new
--disable-capabilities option to configure.
This causes LXC not to link against libcap and turns all
the _cap_ functions into no-ops.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
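One way to turn the capability helpers into no-ops is a header that switches between real declarations and inline stubs. In the sketch below, the HAVE_LIBCAP macro and the helper names are assumptions used for illustration:

```c
/* HAVE_LIBCAP is assumed to be defined unless --disable-capabilities was given. */
#ifdef HAVE_LIBCAP
extern int lxc_caps_init(void);
extern int lxc_caps_up(void);
extern int lxc_caps_down(void);
#else
/* Without libcap, every capability helper becomes a successful no-op. */
static inline int lxc_caps_init(void) { return 0; }
static inline int lxc_caps_up(void) { return 0; }
static inline int lxc_caps_down(void) { return 0; }
#endif
```

Because the stubs report success, call sites need no #ifdefs of their own.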
bionic is missing an openpty() function, so ship our own and only
build it and use it on bionic.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
LO_FLAGS_AUTOCLEAR isn't defined on bionic, so add an extra #ifndef
and set it to its usual value (4) if it isn't defined.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
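A sketch of the fallback together with a typical use. prepare_loop_info() is a hypothetical helper. Recent kernel headers declare the flag as an enumerator, which #ifndef cannot see; in that case the macro simply restates the same value, which is harmless:

```c
#include <linux/loop.h>
#include <string.h>

#ifndef LO_FLAGS_AUTOCLEAR
#define LO_FLAGS_AUTOCLEAR 4
#endif

/* Hypothetical helper: request that the loop device detach on last close. */
static void prepare_loop_info(struct loop_info64 *info)
{
	memset(info, 0, sizeof(*info));
	info->lo_flags = LO_FLAGS_AUTOCLEAR;
}
```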
According to the mknod(2) man page, mknod clears each permission bit whose
corresponding bit in the process umask is set, so we need to adjust the
umask before creating device nodes.
Signed-off-by: Alexander Vladimirov <alexander.idkfa.vladimirov@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
lxc-start -c makes the named file/device the container's console, but using
this with a regular file in order to get a log of the console output does
not work very well if you also want to login on the console. This change
implements an additional option (-L) to simply log the console's output to
a file.
Both options can be used separately or together. For example to get a usable
console and log: lxc-start -n name -c /dev/tty8 -L console.log
The console state is cleaned up more thoroughly when lxc_delete_console is
called, and some of the cleanup paths in lxc_create_console were fixed.
The lxc_priv and lxc_unpriv macros were modified to make use of gcc's local
label feature so they can be expanded more than once in the same function.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
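The local-label trick can be illustrated with a hypothetical with_priv() macro, which is not LXC's actual lxc_priv. It uses GCC's statement expressions and __label__. Without __label__, expanding the macro twice in one function would define the label "out" twice and fail to compile. priv_up() and priv_down() are stand-in stubs:

```c
#include <errno.h>

/* Hypothetical stand-ins for raising and dropping privileges. */
static int priv_up(void) { return 0; }
static int priv_down(void) { return 0; }

/*
 * Evaluate `expr` with privileges raised, preserving its errno.
 * __label__ scopes `out` to this statement expression (GCC/Clang extension).
 */
#define with_priv(expr)                         \
	({                                      \
		__label__ out;                  \
		int ret_, saved_errno_ = 0;     \
		ret_ = priv_up();               \
		if (ret_)                       \
			goto out;               \
		ret_ = (expr);                  \
		if (ret_)                       \
			saved_errno_ = errno;   \
		priv_down();                    \
		if (ret_)                       \
			errno = saved_errno_;   \
	out:    ;                               \
		ret_;                           \
	})
```

The label sits on an empty statement so that the last statement, ret_;, is a plain expression and gives the macro its value.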
(I'll be out until Jan 2, but in the meantime, here is hopefully a
little New Year's gift: this seems to allow lxc-start with / being
MS_SHARED on the host.)
When / is MS_SHARED (for instance with Fedora 18 and modern Arch Linux), lxc-start
fails on pivot_root. The kernel enforces that, when doing pivot_root,
the parent of current->fs->root (as well as the new root and the putold
location) not be MS_SHARED.
To work around this, check /proc/self/mountinfo for a 'shared:' in
the '/' line. If it is there, then create a tiny MS_SLAVE tmpfs dir to
serve as parent of /, recursively bind mount / into /root under that dir,
make it rslave, and chroot into it.
Tested with ubuntu raring image after doing 'mount --make-rshared /'.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
If you start more than one lxc-start/lxc-execute with the same name at the
same time, or just do an lxc-start/lxc-execute with the name of a container
that is already running, lxc doesn't figure out that the container with this
name is already running until fairly late in the initialization process, i.e.
when __lxc_start() -> lxc_poll() -> lxc_command_mainloop_add() attempts to
create the same abstract socket name.
By this point a fair amount of initialization has been done that actually
messes up the running container. For example __lxc_start() -> lxc_spawn() ->
lxc_cgroup_create() -> lxc_one_cgroup_create() -> try_to_move_cgname() moves
the running container's cgroup to a name of deadXXXXXX.
The solution in this patch is to use the atomic existence of the abstract
socket name as the indicator that the container is already running. To do
so, I just refactored lxc_command_mainloop_add() into an lxc_command_init()
routine that attempts to bind the socket, and made sure it is called early,
before much initialization has been done.
In testing, I verified that maincmd_fd was still open at the time of lxc_fini,
so the entire lifetime of the container's run should be covered. The only
explicit close of this fd was in the reboot case of lxcapi_start(), which is
now moved to lxc_fini(), which I think is more appropriate.
Even though it is not checked any more, set maincmd_fd to -1 instead of 0 to
indicate it's not open, since 0 could be a valid fd.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
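The approach relies on bind(2) being atomic for abstract-namespace unix sockets: whichever process binds the name first wins, and any later bind of the same name fails with EADDRINUSE. A minimal sketch, where claim_abstract_socket() and the socket name are hypothetical:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Listening fd on success; -1 with errno EADDRINUSE if the name is taken. */
static int claim_abstract_socket(const char *name)
{
	struct sockaddr_un addr;
	size_t len = strlen(name);
	int fd;

	if (len + 1 > sizeof(addr.sun_path)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	/* sun_path[0] stays '\0': abstract namespace, no file on disk. */
	memcpy(addr.sun_path + 1, name, len);

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	if (bind(fd, (struct sockaddr *)&addr,
		 offsetof(struct sockaddr_un, sun_path) + 1 + len) < 0 ||
	    listen(fd, 100) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

The name is released automatically when the fd is closed or the process exits, so a crashed instance leaves no stale lock file behind.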
For example, doing "lxc-execute -n tmpct /bin/bash" will call setup_kmsg(), but
in this case rootfs->mount/dev directory doesn't even exist so the call to
symlink fails with ENOENT. Commit f62b3449 made this failure not fatal, but
we should not even try it when we know it will fail. See similar code in
setup_tty(), setup_console(), etc.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
When a physical nic is being set up, store its ifindex and original name
in struct lxc_conf. At reboot, reset the original name.
We can't just go over the original network list in lxc_conf at shutdown
because that may be tweaked in the meantime through the C api. The
saved_nics list is only set up during lxc_spawn(), and is restored and
freed after lxc_start.
Bug-Ubuntu: https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1086244
Changelog: remove non-effect change in execute.c
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Add 'lxc.logfile' and 'lxc.loglevel' config items. Values provided on
the command line override the config items.
Have lxccontainer not set a default loglevel and logfile.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
mounted-dev.conf won't be running that in the container's userspace as it
previously would have, so make sure that all the devices it would have
created (other than ones which lxc later finagles) get created.
To achieve this, we have to first mount /dev, then run MAKEDEV, then
run setup_autodev to populate the rest of /dev.
Bug-Ubuntu: https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1075717
Changelog:
v2: Use INFO rather than ERROR when makedev fails, since we won't stop the container boot.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
This makes it easier to write a binding, and presents a cleaner API. Use
strdupa in a few places to get mutable strings for tokenizing / parsing.
Also change the argv type in lxcapi_start and lxcapi_create to match
that of execv(3).
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Most of these were found with valgrind by repeatedly doing lxc_container_new
followed by lxc_container_put. Also free memory when config items are
re-parsed, as happens when lxcapi_set_config_item() is called. Refactored
path type config items to use a common underlying routine.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Valgrind showed use of the ->next field after the item had been free()d.
Introduce a lxc_list_for_each_safe() which allows traversal of a list
when the body of the loop may remove the currently iterated item.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
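A sketch of the difference, assuming a circular doubly linked list with a sentinel head. The struct layout and helpers are assumptions for illustration. The plain iterator reads it->next after the body has run, which is the use-after-free valgrind reports. The safe variant fetches the next pointer before the body, so the body may unlink and free the current item:

```c
#include <stdlib.h>

struct lxc_list {
	void *elem;
	struct lxc_list *next;
	struct lxc_list *prev;
};

static void lxc_list_init(struct lxc_list *list)
{
	list->elem = NULL;
	list->next = list->prev = list;
}

static int lxc_list_empty(const struct lxc_list *list)
{
	return list->next == list;
}

static void lxc_list_add_tail(struct lxc_list *head, struct lxc_list *item)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

static void lxc_list_del(struct lxc_list *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
}

/* Unsafe if the body frees `it`: it->next is read after the body. */
#define lxc_list_for_each(it, list) \
	for (it = (list)->next; it != (list); it = it->next)

/* Safe: `nxt` is saved before the body, so the body may free `it`. */
#define lxc_list_for_each_safe(it, list, nxt)                  \
	for (it = (list)->next, nxt = it->next; it != (list); \
	     it = nxt, nxt = nxt->next)
```

Usage: lxc_list_for_each_safe(it, &head, nxt) { lxc_list_del(it); free(it); } empties the list without touching freed memory.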
Add a container config option to mount and populate /dev in a container.
We might want to add options to specify a max size for /dev other than
the default 100k, and to specify other devices to create. And maybe
someone can think of a better name than autodev.
Changelog: Don't error out if we couldn't mknod a /dev/ttyN.
Changelog: Describe the option in lxc.conf manpage.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
The check for conf->rootfs.mount not being equal to LXCROOTFSMOUNT
compared pointers instead of using strcmp, which gives unspecified
results and triggered gcc warnings.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
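Comparing a char * against a string literal with == or != compares addresses, not contents. A path read from the config lives in its own buffer, so the check can fail even when the strings match. A minimal sketch of the corrected check, where the LXCROOTFSMOUNT value and rootfs_mount_is_default() are hypothetical (the real value comes from configure):

```c
#include <stddef.h>
#include <string.h>

#ifndef LXCROOTFSMOUNT
#define LXCROOTFSMOUNT "/usr/lib/lxc/rootfs" /* hypothetical value */
#endif

/* Compare contents, never addresses. */
static int rootfs_mount_is_default(const char *mount)
{
	return mount != NULL && strcmp(mount, LXCROOTFSMOUNT) == 0;
}
```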
Add a few missing #if's to fix compilation when configured without
AppArmor.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Then after lxcapi container->create(), free whatever lxc_conf may be
loaded and reload from the newly created configuration file.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This happens in the container's namespace, but before the rootfs is
set up and mounted. This gives us a chance to mangle the rootfs, e.g.
ecryptfs-mount it.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This turns liblxc into a public library implementing a container structure.
The container structure is meant to cover most LXC commands and can easily be
used to write bindings in other programming languages.
More information on the new functions can be found in src/lxc/lxccontainer.h
Test programs using the API can also be found in src/tests/
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Analogously to lxc.network.script.up, add the ability to register a down
script. It is called before the guest network is finally destroyed,
allowing resources that are not reset or destroyed automatically to be
cleaned up. The down script receives the same parameters as the up
script, except that the execution context is "down".
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>