The id ordering and case of u,g is also consistent with uidmapshift,
reducing confusion.
doc: Moved example to the the EXAMPLES section, and used values
corresponding to the defaults in the pending shadow-utils subuid patch.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
1. if there's no rootfs, return -2, not 0.
2. don't close pinfd unconditionally in do_start().
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: David Ward <david.ward@ll.mit.edu>
Add a monitor command to get the cgroup for a running container. This
allows container r1 started from /var/lib/lxc and container r1 started
from /home/ubuntu/lxcbase to pick unique cgroup directories (which
will be /sys/fs/cgroup/$subsys/lxc/r1 and .../r1-1), and all the lxc-*
tools to get that path over the monitor at lxcpath.
Rework the cgroup code. Before, if /sys/fs/cgroup/$subsys/lxc/r1
already existed, it would be moved to 'deadXXXXX', and a new r1 created.
Instead, if r1 exists, use r1-1, r1-2, etc.
I ended up removing both the use of cgroup.clone_children and support
for ns cgroup. Presumably we'll want to put support for ns cgroup
back in for older kernels. Instead of guessing whether or not we
have clone_children support, just always explicitly do the only thing
that feature buys us - set cpuset.{cpus,mems} for newly created cgroups.
Note that upstream kernel is working toward strict hierarchical
limit enforcements, which will be good for us.
NOTE - I am changing the lxc_answer struct size. This means that
upgrades to this version while containers are running will result
in lxc_* commands on pre-running containers will fail.
Changelog: (v3)
implement cgroup attach
fix a subtle bug arising when we lxc_get_cgpath() returned
STOPPED rather than -1 (STOPPED is 0, and 0 meant success).
Rename some functions and add detailed comments above most.
Drop all my lxc_attach changes in favor of those by Christian
Seiler (which are mostly the same, but improved).
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
As Kees pointed out, write() errors can be delayed and returned as
close() errors. So don't ignore error on close when writing the
userns id mapping.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
For the lxc-* C binaries, introduce a -P|--lxcpath command line option
to override the system default.
With this, I can
lxc-create -t ubuntu -n r1
lxc-create -t ubuntu -n r1 -P /home/ubuntu/lxcbase
lxc-start -n r1 -d
lxc-start -n r1 -d -P /home/ubuntu/lxcbase
lxc-console -n r1 -d -P /home/ubuntu/lxcbase
lxc-stop -n r1
all working with the right containers (module cgroup stuff).
To do:
* lxc monitor needs to be made to handle cgroups.
This is another very invasive one. I started doing this as
a part of this set, but that gets hairy, so I'm sending this
separately. Note that lxc-wait and lxc-monitor don't work
without this, and there may be niggles in what I said works
above - since start.c is doing lxc_monitor_send_state etc
to the shared abstract unix domain socket.
* Need to handle the cgroup conflicts.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Here is a patch to introduce a configurable system-wide
lxcpath. It seems to work with lxc-create, lxc-start,
and basic python3 lxc usage through the api.
For shell functions, a new /usr/share/lxc/lxc.functions is
introduced which sets some of the basic global variables,
including evaluating the right place for lxc_path.
I have not converted any of the other python code, as I was
not sure where we should keep the common functions (i.e.
for now just default_lxc_path()).
configure.ac: add an option for setting the global config file name.
utils: add a default_lxc_path() function
Use default_lxc_path in .c files
define get_lxc_path() and set_lxc_path() in C api
use get_lxc_path() in lua api
create sh helper for getting default path from config file
fix up scripts to use lxc.functions
Changelog:
feb6:
fix lxc_path in lxc.functions
utils.c: as Dwight pointed out, don't close a NULL fin.
utils.c: fix the parsing of lxcpath line
lxc-start: print which rcfile we are using
commands.c: As Dwight alluded to, the sockname handling was just
ridiculous. Clean that up.
use Dwight's recommendation for lxc.functions path: $datadir/lxc
make lxccontainer->get_config_path() return const char *
Per Dwight's suggestion, much nicer than returning strdup.
feb6 (v2):
lxccontainer: set c->config_path before using it.
convert legacy lxc-ls
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
If 'optional' is in the mount options, then avoid failure in
mount().
Experiments suggest we could just do this checking data at
mount_entry(), but that feels less proper than using
hasmntopt() against the mntent.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
In eglibc st_uid and st_gid are defined as unsigned integers, in bionic those
are defined as unsigned long (which is inconsistent with the kernel's
defintion that's uint_32).
To workaround this problem, simply cast those two to int.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
The 3.8 kernel now supporst uid mappings, so I believe it's appropriate
to proceed with this patchset.
The container config supports new entries of the form:
lxc.id_map = U 100000 0 10000
lxc.id_map = G 100000 0 10000
meaning map 'virtual' uids (in the container) 0-10000 to uids
100000-110000 on the host, and same for gids. So long as there are
mappings specified in the container config, then CONFIG_NEWUSER will
be used when the container is cloned. This means that container
setup is no longer done with root privilege on the host, only root
privilege in the container. Therefore cgroup setup is moved from the
init task to the monitor task.
To use this patchset, you currently need to either use the raring
kernel at ppa:serge-hallyn/usern-natty, or build your own kernel
from either git://kernel.ubuntu.com/serge/quantal-userns.git.
(Alternatively you can use Eric's tree at the latest userns-always-map-*
branch at
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git
but you will likely want to at least enable tmpfs mounts in user namespaces)
You also need to chown the files in the container rootfs into the
mapped range. There is a utility at
https://code.launchpad.net/~serge-hallyn/+junk/nsexec to do this.
uidmapshift does the chowning, while the container-userns-convert
script nicely wraps that program. So I simply
sudo lxc-create -t ubuntu -n r1
sudo container-userns-convert r1 200000
will create a container which is shifted so uid 0 in the container
is uid 200000 on the host.
TODO: when doing setuid(0), need to only do that if 0 is one of the
ids we map to. Similarly, when dropping capabilities, need to only
not do that if 0 is one of the ids we map to. However, the question
of what to do for 'weird' containers in private user namespaces is
one I'm punting for later.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
This is a first step to enabling user namespaces. When starting a
container in a new user namespace, the child will not have the
rights to write to the cgroup fs. (We can give it that right, but
don't always want to have to).
At the parent, we don't want to setup_cgroups() before the child
has set itself up. But we also don't want to wait until it has
started running it's init, since that is racy.
Therefore introduce a new sync point. The child will let the
parent know when it is ready to be confined, and wait for the
parent to respond that it has done so. Then the child will finish
constraining itself with LSM and seccomp and execute init.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Ok... Here's the patch again. Since Serge is removing the loglevel
structure member, this patch no longer references that element.
From the original description:
1) Removes run_makedev() and the call to it from conf.c per discussion.
2) Adds an lxc.hook.autodev hook.
Note: This hook is very close (one routine level abstracted) from where
the run_makedev was called. Anyone really rrreeeaaalllyyy needing
MAKEDEV can add it in with a small shim script to do whatever they want
under whatever distro they're using, so no functionality is lost there.
3) Added a number of environment variables for all the hook scripts to
reference to assist in execution. Things like LXC_ROOTFS_MOUNT could be
very useful but others were added as well. Room for more if anyone has
an itch. All in one spot in lxc_start.c.
4) clearenv and putenv( "container=lxc" ) calls were moved to just after
the "start" hook in the container just prior to actually firing up the
container so we could use environment variables prior to that and have
them flushed them before firing up init. Nice side effect is that you
can define environment variables and then call lxc-start and have them
show up in those hooks scripts.
5) I actually DID update the man page for lxc.conf! I guess I lied when
I said I wouldn't get that done.
[... and ...]
I added the rcfile to the lxc_conf structure as suggested and moved the
setenv bundle from lxc-start.c over to start.c just prior to calling
run_lxc_hooks for the pre-start hook.
Signed-off-by: Michael H. Warfield <mhw@WittsEnd.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
The options are still supported in the lxc configuration file.
However they are stored only in local variables in src/lxc/log.c,
which can be read using two new functions:
int lxc_log_get_level(void);
const char *lxc_log_get_file(void);
Changelog: jan 14:
have lxc_log_init use lxc_log_set_file(), have lxc_log_set_file() take
a const char *, and have it keep its own strdup'd copy of the filename.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
There's no good reason to call setup_mount_entries if we don't have any
lxc.mount.entry. This also avoids an issue on bionic where the tmpfile()
call in setup_mount_entries requires the presence of /tmp which isn't the
case by default.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
__S_ISTYPE doesn't exist in all C libraries, so define it if it's missing.
Additionaly, replace one occurence where it wasn't actually needed.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Bionic (at least) is missing some of the usual mntent functions.
This adds code defining those that we need when they're missing from the C
library.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Some libc implementation (bionic) is lacking some of the syscall functions
that are present in the glibc.
For those, detect at build time the they are missing and implement a minimal
syscall() wrapper that will essentially give the same result as the glibc
function.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Some platforms don't have personality.h in their C library, this change
adds buildtime detection for the header and turns off the personality setting
code in those cases.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
In the effort to make LXC work with non-standard Linux distros, this change
allows for the user to build LXC without capability support through a new
--disable-capabilities option to configure.
This effectively will cause LXC not to link against libcap and will turn all
the _cap_ functions into no-ops.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
bionic is missing an openpty() function, so ship our own and only
build it and use it on bionic.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
LO_FLAGS_AUTOCLEAR isn't defined on bionic, so add an extra ifndef
and set it to its usual value if it's not.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
According to docs, mknod clears each permission bit whose
corresponding bit in the process umask is set, so we should fix it
before creating device nodes.
Signed-off-by: Alexander Vladimirov <alexander.idkfa.vladimirov@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
lxc-start -c makes the named file/device the container's console, but using
this with a regular file in order to get a log of the console output does
not work very well if you also want to login on the console. This change
implements an additional option (-L) to simply log the console's output to
a file.
Both options can be used separately or together. For example to get a usable
console and log: lxc-start -n name -c /dev/tty8 -L console.log
The console state is cleaned up more when lxc_delete_console is called, and
some of the clean up paths in lxc_create_console were fixed.
The lxc_priv and lxc_unpriv macros were modified to make use of gcc's local
label feature so they can be expanded more than once in the same function.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
(I'll be out until Jan 2, but in the meantime, here is hopefully a
little newyears gift - this seems to allow lxc-start with / being
MS_SHARED on the host)
When / is MS_SHARED (for instance with f18 and modern arch), lxc-start
fails on pivot_root. The kernel enforces that, when doing pivot_root,
the parent of current->fs->root (as well as the new root and the putold
location) not be MS_SHARED.
To work around this, check /proc/self/mountinfo for a 'shared:' in
the '/' line. If it is there, then create a tiny MS_SLAVE tmpfs dir to
serve as parent of /, recursively bind mount / into /root under that dir,
make it rslave, and chroot into it.
Tested with ubuntu raring image after doing 'mount --make-rshared /'.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
If you start more than one lxc-start/lxc-execute with the same name at the
same time, or just do an lxc-start/lxc-execute with the name of a container
that is already running, lxc doesn't figure out that the container with this
name is already running until fairly late in the initialization process: ie
when __lxc_start() -> lxc_poll() -> lxc_command_mainloop_add() attempts to
create the same abstract socket name.
By this point a fair amount of initialization has been done that actually
messes up the running container. For example __lxc_start() -> lxc_spawn() ->
lxc_cgroup_create() -> lxc_one_cgroup_create() -> try_to_move_cgname() moves
the running container's cgroup to a name of deadXXXXXX.
The solution in this patch is to use the atomic existence of the abstract
socket name as the indicator that the container is already running. To do
so, I just refactored lxc_command_mainloop_add() into an lxc_command_init()
routine that attempts to bind the socket, and ensure this is called earlier
before much initialization has been done.
In testing, I verified that maincmd_fd was still open at the time of lxc_fini,
so the entire lifetime of the container's run should be covered. The only
explicit close of this fd was in the reboot case of lxcapi_start(), which is
now moved to lxc_fini(), which I think is more appropriate.
Even though it is not checked any more, set maincmd_fd to -1 instead of 0 to
indicate its not open since 0 could be a valid fd.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
For example doing "lxc-execute -n tmpct /bin/bash" will call setup_kmsg(), but
in this case rootfs->mount/dev directory doesn't even exist so the call to
symlink fails with ENOENT. Commit f62b3449 made this failure not fatal, but
we should not even try it when we know it will fail. See similar code in
setup_tty(), setup_console(), etc.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
When a physical nic is being set up, store its ifindex and original name
in struct lxc_conf. At reboot, reset the original name.
We can't just go over the original network list in lxc_conf at shutdown
because that may be tweaked in the meantime through the C api. The
saved_nics list is only setup during lxc_spawn(), and restored and
freed after lxc_start.
Bug-Ubuntu: https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1086244
Changelog: remove non-effect change in execute.c
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Add 'lxc.logfile' and 'lxc.loglevel' config items. Values provided on
the command line override the config items.
Have lxccontainer not set a default loglevel and logfile.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
mounted-dev.conf won't be running that in container's userspace as it
previously would have, so make sure that all the devices it would have
created (other than ones which lxc later finagles) get created.
To achieve this, we have to first mount /dev, then run MAKEDEV, then
run setup_autodev to populate the rest of /dev.
Bug-Ubuntu: https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1075717
Changelog:
v2: Use INFO rather than ERROR when makedev fails, since we won't stop the container boot.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
This makes it easier to write a binding, and presents a cleaner API. Use
strdupa in a few places to get mutable strings for tokenizing / parsing.
Also change the argv type in lxcapi_start and lxcapi_create to match
that of execv(3).
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Most of these were found with valgrind by repeatedly doing lxc_container_new
followed by lxc_container_put. Also free memory when config items are
re-parsed, as happens when lxcapi_set_config_item() is called. Refactored
path type config items to use a common underlying routine.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Valgrind showed use of ->next field after item has been free()ed.
Introduce a lxc_list_for_each_safe() which allows traversal of a list
when the body of the loop may remove the currently iterated item.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Add a container config option to mount and populate /dev in a container.
We might want to add options to specify a max size for /dev other than
the default 100k, and to specify other devices to create. And maybe
someone can think of a better name than autodev.
Changelog: Don't error out if we couldn't mknod a /dev/ttyN.
Changelog: Describe the option in lxc.conf manpage.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
The check for conf->rootfs.mount not being equal to LXCROOTFSMOUNT
wasn't done with strcmp which was leading to undefined behaviour
and triggered gcc warnings.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Add a few missing #if's to fix compilation when configured without
AppArmor.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Then after lxcapi container->create(), free whatever lxc_conf may be
loaded and reload from the newly created configuration file.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This happens in the container's namespace, but before the rootfs is
setup and mounted. This gives us a chance to mangle the rootfs - i.e.
ecryptfs-mount it.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This turns liblxc into a public library implementing a container structure.
The container structure is meant to cover most LXC commands and can easily be
used to write bindings in other programming languages.
More information on the new functions can be found in src/lxc/lxccontainer.h
Test programs using the API can also be found in src/tests/
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Analogously to lxc.network.script.up, add the ability to register a down
script. It is called before the guest network is finally destroyed,
allowing to clean up resources that are not reset/destroyed
automatically. Parameters of the down script are identical to the up
script except for the execution context "down".
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This way init log messages can be seen on the console. If containerized
syslog ever comes around, we can get rid of this.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
That means, don't try to pin a null rootfs, and don't try to mount /proc
since /var/lib/lxc/root/proc doesn't exist to be mounted onto.
The apparmor patches are not yet upstream, so this patch will not go
upstream by itself.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Using mnt means that lxc fstab entries do not work when placed under
the container's /mnt/ (i.e. /mnt/etc).
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
At the same time, allow lxc.mount.entry to specify an absolute target
path relative to /var/lib/lxc/CN/rootfs, even if rootfs is a blockdev.
Otherwise all such entries are ignored for blockdev-backed containers.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
This patch introduces support for 4 hooks. We'd like to have 6 in
all to mirror the openvz ones (thanks to Stéphane for this info):
pre-start: in the host namespace before container mounting happens
mount: after container mounting (as per config and /var/lib/lxc/container/fstab)
but before pivot_root
start: immediately before exec'ing init
stop: in container namespace and in chroot before shutdown
umount: after other unmounting has happened
post-stop: outside of the container
stop and umount are not implemented here because when the kernel kills
the container init, it kills the namespace. We can probably work around
this, i.e. by keeping the /proc/pid/ns/mnt open, and using that, though
all container tasks including init would still be dead. Is that worth
pursuing?
start also presents a bit of an issue. openvz allows a script on the
host to be specified, apparently. My patch requires the script or
program to exist in the container. I'm fine with trying to do it the
openvz way, but I wasn't sure what the best way to do that was. Openvz
(I'm told) opens the script and passes its contents to a bash in the
container. But that limits the hooks to being only scripts. By
requiring the hook to be in the container, we can allow any sort of
hook, and assume that any required libraries/dependencies exist
there.
Other than that with this patchset I can add
lxc.hook.pre-start = /var/lib/lxc/p1/pre-start
lxc.hook.mount = /var/lib/lxc/p1/mount
lxc.hook.start = /start
lxc.hook.post-stop = /var/lib/lxc/p1/post-stop
to my /var/lib/lxc/p1/config, and the hooks get executed as expected.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
This could be done as generic 'lsm_init()' and 'lsm_load()' functions,
however that would make it impossible to compile one package supporting
more than one lsm. If we explicitly add the selinux, smack, and aa
hooks in the source, then one package can be built to support multiple
kernels.
The smack support should be pretty trivial, and probably very close
to the apparmor support.
The selinux support may require more, including labeling the passed-in
fds (consoles etc) and filesystems.
If someone on the list has the inclination and experience to add selinux
support, please let me know. Otherwise, I'll do Smack and SELinux.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
If set, then the console and ttys will be bind-mounted not over /dev/console,
but /dev/<ttydir>/console and then symlinked from there to /dev/console.
Signed-off-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
lxc.cap.drop now also accepts numeric values for capabilities. This allows
the user to specify capabilities LXC doesn't know about yet or capabilities
that were not part of the kernel headers LXC was compiled against.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
When used in conjunction with a bridge, veth devices with random addresses
may change the mac address of the bridge itself if the mac address of the
interface newly added is numerically lower than the previous mac address
of the bridge. This is documented kernel behavior. To avoid changing the
host's mac address back and forth when starting and/or stopping containers,
this patch ensures that the high byte of the mac address of the veth
interface visible from the host side is set to 0xfe.
A similar logic is also implemented in libvirt.
Fixes SF bug #3411497
See also: <http://thread.gmane.org/gmane.linux.kernel.containers.lxc.general/2709>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
For veth and macvlan networks, this can look up the host address on the
bridge (link) interface and add a default route on the guest to that
address. This facilitates a typical setup where guests are bridged
together.
syntax:
lxc.ipv4.gateway = auto
lxc.ipv6.gateway = auto
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
This directive adds a default route to the guest at startup.
syntax:
lxc.network.ipv4.gateway = 10.0.0.1
lxc.network.ipv6.gateway = 2001:db8:85a3::8a2e:370:7334
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
It's OK, if /dev/ptmx points to /dev/pts/ptmx via a symlink.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Also add #ifndef for compability with glibc before 2.12.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Each network interface was brought up regardless of the configuration,
as the wrong boolean operator was being used to test the IFF_UP flag.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
We should always have the veth host's side up, otherwise if we omit
the up flag in the configurationn, letting the container to configure
its interface, the network will be never enabled as the host's side
is not up.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Dear all,
while setting up a container on x86_64 (archlinux host/guest) I had trouble
with mounting dev/pts and others from container.fstab and a ssh login does not
work (only ssh container bash -i gives you a shell)
The cause is that conf.c does not initialize mntflags.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Change the mount point in the rootfs because we mount the rootfs
in ROOTFSDIR for the pivot. We have to substitute the real mount
path to the new path located in ROOTFSDIR.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Change the code to encapsulate the different mounts point.
* mount on the host fs
* mount relatively to the rootfs
* mount absolutely to the rootfs (broken)
That will make the code cleaner to fix the latter.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Why not chdir into the root of container right when
the root filesystem is (bind-)mounted, and let all
mount entries to be relative to the container root?
Even more, to warn if lxc.mount[.entry] contains
absolute path for the destination directory (or a
variation of this, absolute and does not start with
container root mount point)?
This way, all mounts will look much more sane, and
it will be much easier to move/clone containers -
by changing only lxc.rootfs.
I do it this way locally since the beginning, by
chdir'ing to the proper directory (rootfs) before
running lxc-start (in a startup script), but this
is now broken in 0.7.3 which bind-mounts rootfs
somewhere in /usr/lib/lxc.
Signed-off-by: Michael Tokarev<mjt@tls.msk.ru>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Add support for `dirsync' mount option. MS_DIRSYNC is on of the
mount(2) mountflags so don't send it as extra mount option to avoid:
lxc-start: Invalid argument - failed to mount ...
errors.
Signed-off-by: Sergey S. Kostyliov <rathamahata@gmail.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
The capability header makes the inclusion of the loop header to
fail. Moving the inclusion of loop.h before capability.h fixes the
problem.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Change the run_script function to use popen and to redirect
the output of the script to the log file.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
This commit adds an configuration option to specify a script to be
executed after creating and configuring the network used by the
container. The following arguments are passed to the script:
* container name
* config section name (net)
Additional arguments depend on the config section employing a
script hook; the following are used by the network system:
* execution context (up)
* network type (empty/veth/macvlan/phys)
Depending on the network type, other arguments may be passed:
veth/macvlan/phys:
* (host-sided) device name
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
This patch allows to specify an image or a block device.
The image or the block device is mounted on rootfs->mount.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Let's initialize rootfs->mount to LXCROOTFSMOUNT. The value
will be overwritten by the configuration in case it is specified.
That will make the code nicer, instead of the ugly rootfs->mount checks.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Split the rootfs setup by mounting the rootfs to the mount
point. This mount point will be used as the facto place where
the rootfs is placed.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
When a container is installed with 32bits binaries while we are
running on a 64bits host, inside the container we are seen as
64bits arch. That leads to some problems for the package updates
because the scripts will download 64bits packages instead of 32bits.
This patch defines a configuration variable to set the architecture
of the container.
lxc.arch = i686 | x86 | x86_64 | amd64
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
If the physical link is not specified in the configuration
the check in if_nametoindex(netdev->link) leads to a segfault.
Check the link is specified.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Reported-by: Ferenc Wagner <wferi@niif.hu>
When the interface used in the container is a physical
interface from the host, we keep the initial name.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Reported-by: Sabdar <sabdar@wellspringsys.com>
Hello all!
This bug stalked me for a while, but only now it bit me quite
badly... (Lost about an hour of work...)
So the culprit: inside the fstab file for the `lxc.mount` option I
can use options like `ro` together with `bind`. Unfortunately the
kernel just laughs in my face and ignores any options I've put in
there... :) But not any more: I've updated `./src/lxc/conf.c`
(`mount_file_entries` function) so that when it encounters a `bind`
option it executes it twice (one without any extra options, and a
second time with the remount flag set.)
I've marginally (as in my particular case) tested it and it works.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
The mnt directory has a good chance to already exist in the new root
filesystem, so creation and removal can be avoided. This also eases
use of read only root filesystems (no configuration necessary).
Signed-off-by: Ferenc Wagner <wferi@niif.hu>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Fix the following warnings:
console.c: In function ‘console_handler’:
console.c:252: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result
console.c:254: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result
conf.c: In function ‘instanciate_veth’:
conf.c:1130: warning: ignoring return value of ‘mktemp’, declared with attribute warn_unused_result
conf.c:1135: warning: ignoring return value of ‘mktemp’, declared with attribute warn_unused_result
conf.c: In function ‘instanciate_macvlan’:
conf.c:1206: warning: ignoring return value of ‘mktemp’, declared with attribute warn_unused_result
af_unix.c: In function ‘lxc_af_unix_send_fd’:
af_unix.c:124: warning: dereferencing type-punned pointer will break strict-aliasing rules
af_unix.c: In function ‘lxc_af_unix_recv_fd’:
af_unix.c:169: warning: dereferencing type-punned pointer will break strict-aliasing rules
af_unix.c: In function ‘lxc_af_unix_send_credential’:
af_unix.c:195: warning: dereferencing type-punned pointer will break strict-aliasing rules
af_unix.c: In function ‘lxc_af_unix_rcv_credential’:
af_unix.c:237: warning: dereferencing type-punned pointer will break strict-aliasing rules
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
That breaks the reboot because when we reexec, fd 0 and fd 1 will be
closed and these one are created by lxc, not inherited.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
The removal does not account for possible leading path components that
were also created during creation of pivotdir.
Signed-off-by: Ferenc Wagner <wferi@niif.hu>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
As we defined a path where to mount the rootfs, we can use without
ambiguity because it is defined by default at compile time or by the
configuration.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
We have pivot_dir and rootfs defined in lxc_conf structure.
Let's encapsulate them in a rootfs structure.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Ferenc Wagner <wferi@niif.hu> writes:
> Daniel Lezcano <dlezcano@fr.ibm.com> writes:
>
>> Ferenc Wagner wrote:
>>
>>> Daniel Lezcano <daniel.lezcano@free.fr> writes:
>>>
>>>> Ferenc Wagner wrote:
>>>>
>>>>> While playing with lxc-start, I noticed that /tmp is infested by
>>>>> empty lxc-r* directories: [...] Ok, this name comes from lxc-rootfs
>>>>> in conf.c:setup_rootfs. After setup_rootfs_pivot_root returns, the
>>>>> original /tmp is not available anymore, so rmdir(tmpname) at the
>>>>> bottom of setup_rootfs can't achieve much. Why is this temporary
>>>>> name needed anyway? Is pivoting impossible without it?
>>>>
>>>> That was put in place with chroot, before pivot_root, so the distro's
>>>> scripts can remount their '/' without failing.
>>>>
>>>> Now we have pivot_root, I suppose we can change that to something cleaner...
>>>
>>> Like simply nuking it? Shall I send a patch?
>>
>> Sure, if we can kill it, I will be glad to take your patch :)
>
> I can't see any reason why lxc-start couldn't do without that temporary
> recursive bind mount of the original root. If neither do you, I'll
> patch it out and see if it still flies.
For my purposes the patch below works fine. I only run applications,
though, not full systems, so wider testing is definitely needed.
Thanks,
Feri.
>From 98b24c13f809f18ab8969fb4d84defe6f812b25c Mon Sep 17 00:00:00 2001
Date: Thu, 6 May 2010 14:47:39 +0200
That was put in place before lxc-start started using pivot_root, so
the distro scripts can remount / without problems.
Signed-off-by: Ferenc Wagner <wferi@niif.hu>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
First of all, when trying to start a container in a read-only root
lxc-start complains:
lxc-start: Read-only file system - can't make temporary mountpoint
This is in conf.c:setup_rootfs_pivot_root() function. That function
uses optional parameter "lxc.pivotdir", or creates (and later removes)
a temporary directory for pivot_root. Obviously there's no way to
create a directory in a read-only filesystem.
But lxc.pivotdir does not work either. In the function mentioned above
it is used with leading dot (eg. if I specify "lxc.pivotdir=pivot" in
the config file the pivot_root() syscall will be made to ".pivot" with
leading dot, not to "pivot"), but later on it is used without that dot,
and fails:
lxc-start: No such file or directory - failed to open /pivot/proc/mounts
lxc-start: No such file or directory - failed to read or parse mount list '/pivot/proc/mounts'
lxc-start: failed to pivot_root to '/stage/t'
(that's with "lxc.pivotdir = pivot" in the config file). After symlinking
pivot to .pivot it still fails:
lxc-start: Device or resource busy - could not unmount old rootfs
lxc-start: failed to pivot_root to '/stage/t'
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
There is only one such perror call, so remove it in nl.c
In this same patch, verify that all functions of nl.c and network.c
are reporting a -errno value in case of error;
value that is reported in lxc log by the callers in conf.c
Signed-off-by: Michel Normand <normand@fr.ibm.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
When the reboot is detected, reboot the container.
That needs to set all file descriptor opened by lxc-start
to be flagged with the close-on-exec flag, otherwise when
re-execing ourself, we inherit our own fd.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Open the console at the setup time, otherwise the openeded
file descriptor will be considered as an inherited fd and the
startup will fail.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Delete the network devices when an error occurs before they are moved
to the network namespace (network namespace destruction triggers the
network devices deletion). Otherwise they stay in the system.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
When the umount fails, we force the umount and make the mount point
unaccessible by using a lazy umount.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
The actual behaviour of the console is messy as:
* it relies on a heuristic (tty or not, rootfs or not, etc ...)
* the container init stole the tty and we lose the control
The following patch:
* allocates a tty
* maps this tty to the container console
* proxy the io from the console to the file specified in the configuration
lxc.console=<file>
That allows to specify a file, a fifo, a $(tty), and can be extended with an
uri like file://mypath, net://1.2.3.4:1234, etc ...
That solves the problem with the heuristic and the container does no longer stole
our current tty.
Note by default, the console output will go to a blackhole if no configuration is
specified making the container showing nothing.
In order to access the console from the tty, use
lxc-start -n foo -s lxc.console=$(tty)
I propose the make the container to daemonize by default now.
I tried the following:
in a shell:
touch /var/lib/lxc/foo/console
tail --retry -f /var/lib/lxc/foo/console
in another shell:
lxc-start -n foo -s lxc.console=/var/lib/lxc/foo/console
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
There are several cases where the system can no longer access a mount
point or a mount point configuration makes the algorithm bogus.
For example, we mount something and then we chroot, the mount information
will give an unaccessible path and the container won't be able to start
because this mount point will be unaccessible. But if it's the case, then
we can just warn and continue running the container.
Another case is the path to a mount point is not accessible because there
is another mount point on top of it hiding the mount point. So the umount
will fail and the container won't start.
Easy to reproduce:
mkdir -p /tmp/dir1/dir2
mount -t tmpfs tmpfs /tmp/dir1/dir2
mount -t tmpfs tmpfs /tmp/dir1
So can we just ignore the error when unmounting and continue to the list again
and again until it shrinks.
At the end, we just display the list of the unmounted points.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
the last patch commit 81810dd120
make lxc to not compile anymore on rhel5u1
Signed-off-by: Michel Normand <normand@fr.ibm.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Hello everyone!
I've written a patch which adds a new config keyword
'lxc.cap.drop'. This keyword allows to specify capabilities which are
dropped before executing the container binary.
Example:
lxc.cap.drop = sys_chroot
lxc.cap.drop = mknod
lxc.cap.drop = sys_module
or specify in a single line:
lxc.cap.drop = sys_chroot mknod sys_module
Reworked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Michael Holzt <lxc@my.fqdn.org>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
The getline function allocate the needed memory. Fix buffer can lead
to 'hard to find' bug. I don't test the pivot_root part but the other
parts are ok.
Signed-off-by: Clement Calmels <clement.calmels@fr.ibm.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
lxc currently does a chroot into the target rootfs. chroot is insecure and
can easily be broken, as demonstrated here:
| root@synergy:~# touch /this_is_the_realrootfs_ouch
| # touch /container/webhost/this_is_the_container
| # lxc-start -n webhost /bin/sh
| # ls this*
| this_is_the_container
| # ./breakchroot
| # ls this*
| this_is_the_realrootfs_ouch
code to break chroot taken from
http://www.bpfh.net/simes/computing/chroot-break.html
Now this can be fixed. As our container has his own mount namespace, we can
easily pivot_root into the rootfs and then unmount all old mounts. The patch
attached add a new config keyword which contains the path to a temporary
mount for the old rootfs (inside the container). This stops the chroot break
method shown before.
Example:
| root@synergy:~# grep pivotdir /var/lib/lxc/webhost/config
| lxc.pivotdir = /oldrootfs
| root@synergy:~# ls -lad /container/webhost/oldrootfs
| drwxr-xr-x 2 root root 4096 2010-01-02 03:59 /container/webhost/oldrootfs
| root@synergy:~# lxc-start -n webhost /bin/sh
| # mount -t proc proc /proc
| # cat /proc/mounts
| rootfs / rootfs rw 0 0
| /dev/root / ext3 rw,relatime,errors=remount-ro,data=writeback 0 0
| devpts /dev/console devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
| proc /proc proc rw,relatime 0 0
| # ls this*
| this_is_the_container
| # ./breakchroot
| # ls this*
| this_is_the_container
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Michael Holtz <lxc@my.fqdn.org>
conf object is on stack and is used in forked process.
Signed-off-by: Michel Normand <normand@fr.ibm.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
The future kernel 2.6.33 will incorporate the macvlan bridge
mode where all the macvlan will be able to communicate if they are
using the same physical interface. This is an interesting feature
to have containers to communicate together. If we are outside of the
container, we have to setup a macvlan on the same physical interface than
the containers and use it to communicate with them.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Some devices like veth or vlans have a bit of extra details that
are specific to them. Example veth.pair and vlan.vlanid.
Separate them from the common so we can update cleanly in the future.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
This adds ability to migrate vlan interfaces into namespaces
by specifying them in a config
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
commit 985d15b106 "fix fdleak and errors
in lxc_create_tty()" created a zero-sized malloc(), causing memory
corruption. use config->tty like all the other code does.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
the same cleanup as in instanciate_macvlan(). Just makes code
shorter and less "jumpy" (as with goto back)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Currently we allocate veth device with random name on host side,
so that things like firewall rules or accounting does not work
at all. Fix this by recognizing yet anothe keyword to specify
the host-side device name: lxc.network.pair, and use it instead
of random name if specified.
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
if, for some reason, openpty() fails, lxc_create_tty() will
leak all previous ptys and leave the config structure in a
inconsistent state (wrt the number of ptys actually opened)
Fix that by explicitly closing all previously opened ptys
in case of failure and by setting number of actually opened
ttys after actual open
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Ensure that lxc.netdev.link is specified for macvlan interfaces,
since it's required.
While at it, simplify logic in instanciate_macvlan():
remove unnecessary-complicating goto statements (we only
need to perform a cleanup in one place)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Before, a veth device pair required a link which was treated as
a bridge device. Code crashed if there was no lxc.network.link
specified. Fix that by allowing lxc.network.link to be unset
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
struct lxc_netdev is used to hold information from cnfig file
about a network device/configuration. Make the fields of this
structure to be named similarily with the config file keywords,
namely:
s/ifname/link/ - host-side link for the device (bridge or eth0)
s/newname/name/ - container-side ifname
It is insane to have completely different names in config file
and in structure/variable names :)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
<lxc/lxc.h> should only include what is needed. This patch removes
all useless headers from lxc.h and fixed other .c files.
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
The purpose of this new keyword is to save in main config file
all the lines of a provided fstab file.
This will ultimately replace the the lxc.mount keyword
when lxc scripts will use the new keyword.
Warning: I did not validated this patch
in all conditions of provided malformed input string.
Signed-off-by: Michel Normand <michel_mno@laposte.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
in today's code lxc-start to not stop if setup_cgroup is detecting an error
Signed-off-by: Michel Normand <michel_mno@laposte.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>