Commit Graph

284 Commits

Author SHA1 Message Date
Dwight Engen
ac7725e7bb make [ug]id map ordering consistent with /proc/<nr>/[ug]id_map
The id ordering and case of u,g is also consistent with uidmapshift,
reducing confusion.

doc: Moved example to the EXAMPLES section, and used values
corresponding to the defaults in the pending shadow-utils subuid patch.

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2013-03-11 16:35:14 -04:00
Serge Hallyn
0d03360a77 rootfs pin: fix two bugs
1. if there's no rootfs, return -2, not 0.
2. don't close pinfd unconditionally in do_start().

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: David Ward <david.ward@ll.mit.edu>
2013-03-11 08:42:11 -05:00
Serge Hallyn
ae5c8b8ed5 cgroup: improve support for multiple lxcpaths (v3)
Add a monitor command to get the cgroup for a running container.  This
allows container r1 started from /var/lib/lxc and container r1 started
from /home/ubuntu/lxcbase to pick unique cgroup directories (which
will be /sys/fs/cgroup/$subsys/lxc/r1 and .../r1-1), and all the lxc-*
tools to get that path over the monitor at lxcpath.

Rework the cgroup code.  Before, if /sys/fs/cgroup/$subsys/lxc/r1
already existed, it would be moved to 'deadXXXXX', and a new r1 created.
Instead, if r1 exists, use r1-1, r1-2, etc.

I ended up removing both the use of cgroup.clone_children and support
for ns cgroup.  Presumably we'll want to put support for ns cgroup
back in for older kernels.  Instead of guessing whether or not we
have clone_children support, just always explicitly do the only thing
that feature buys us - set cpuset.{cpus,mems} for newly created cgroups.

Note that upstream kernel is working toward strict hierarchical
limit enforcements, which will be good for us.

NOTE - I am changing the lxc_answer struct size.  This means that
upgrading to this version while containers are running will cause
lxc_* commands against those already-running containers to fail.

Changelog: (v3)
   implement cgroup attach
   fix a subtle bug arising when we lxc_get_cgpath() returned
     STOPPED rather than -1 (STOPPED is 0, and 0 meant success).
   Rename some functions and add detailed comments above most.
   Drop all my lxc_attach changes in favor of those by Christian
     Seiler (which are mostly the same, but improved).

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2013-03-04 14:39:30 -06:00
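Sketch for ae5c8b8ed5: the naming rule described above (use r1, and if it already exists fall back to r1-1, r1-2, ...) could look roughly like this. The function name, the 0755 mode, and the retry limit are assumptions made for illustration and are not taken from the LXC source.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not the LXC implementation: create
 * <base>/<name>, or <base>/<name>-1, <base>/<name>-2, ... when the
 * earlier candidates already exist. On success the chosen path is
 * left in out and 0 is returned; on failure -1 is returned.
 */
int make_unique_cgroup_dir(const char *base, const char *name,
                           char *out, size_t outlen)
{
    for (int suffix = 0; suffix < 1000; suffix++) {
        int n;

        if (suffix == 0)
            n = snprintf(out, outlen, "%s/%s", base, name);
        else
            n = snprintf(out, outlen, "%s/%s-%d", base, name, suffix);
        if (n < 0 || (size_t)n >= outlen)
            return -1;          /* path would be truncated */

        /* mkdir() either creates the name or fails with EEXIST. */
        if (mkdir(out, 0755) == 0)
            return 0;
        if (errno != EEXIST)
            return -1;          /* real error, give up */
    }
    return -1;                  /* too many collisions */
}
```

Relying on the result of mkdir() rather than checking for existence first avoids a window in which two starters could pick the same name.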
Serge Hallyn
e4ccd113dc userns: handle delayed write errors at fclose
As Kees pointed out, write() errors can be delayed and returned as
close() errors.  So don't ignore error on close when writing the
userns id mapping.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2013-03-04 14:27:18 -06:00
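Sketch for e4ccd113dc: a minimal illustration of why the result of fclose() matters when writing through stdio. The helper name is hypothetical; the point is only that a buffered write can appear to succeed and then fail when the buffer is flushed at close time.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write text to path and report a failure from
 * any stage. Because stdio buffers output, a failing write() may only
 * surface when fclose() flushes the buffer.
 * Returns 0 on success, -1 on error.
 */
int write_mapping_file(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    int ret = 0;

    if (!f)
        return -1;
    if (fputs(text, f) == EOF)
        ret = -1;
    /* Always close, and treat a close failure as a write failure. */
    if (fclose(f) != 0)
        ret = -1;
    return ret;
}
```

On Linux, pointing this at /dev/full is a convenient way to see the delayed error: the short string fits in the stdio buffer, and the failure is only reported by fclose().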
Stéphane Graber
67e571de63 Introduce --lxcpath cmdline option, and make default_lxc_path() return const char *
For the lxc-* C binaries, introduce a -P|--lxcpath command line option
to override the system default.

With this, I can

    lxc-create -t ubuntu -n r1
    lxc-create -t ubuntu -n r1 -P /home/ubuntu/lxcbase
    lxc-start -n r1 -d
    lxc-start -n r1 -d -P /home/ubuntu/lxcbase
    lxc-console -n r1 -d -P /home/ubuntu/lxcbase
    lxc-stop -n r1

all working with the right containers (modulo cgroup stuff).

To do:
    * lxc monitor needs to be made to handle cgroups.
      This is another very invasive one.  I started doing this as
      a part of this set, but that gets hairy, so I'm sending this
      separately.  Note that lxc-wait and lxc-monitor don't work
      without this, and there may be niggles in what I said works
      above - since start.c is doing lxc_monitor_send_state etc
      to the shared abstract unix domain socket.
    * Need to handle the cgroup conflicts.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2013-02-19 11:52:44 -05:00
Serge Hallyn
2a59a68183 Switch from use of LXCPATH to a configurable default_lxc_path
Here is a patch to introduce a configurable system-wide
lxcpath.  It seems to work with lxc-create, lxc-start,
and basic python3 lxc usage through the api.

For shell functions, a new /usr/share/lxc/lxc.functions is
introduced which sets some of the basic global variables,
including evaluating the right place for lxc_path.

I have not converted any of the other python code, as I was
not sure where we should keep the common functions (i.e.
for now just default_lxc_path()).

configure.ac: add an option for setting the global config file name.
utils: add a default_lxc_path() function
Use default_lxc_path in .c files
define get_lxc_path() and set_lxc_path() in C api
use get_lxc_path() in lua api
create sh helper for getting default path from config file
fix up scripts to use lxc.functions

Changelog:
  feb6:
	fix lxc_path in lxc.functions
	utils.c: as Dwight pointed out, don't close a NULL fin.
	utils.c: fix the parsing of lxcpath line
	lxc-start: print which rcfile we are using
	commands.c: As Dwight alluded to, the sockname handling was just
	   ridiculous.  Clean that up.
	use Dwight's recommendation for lxc.functions path: $datadir/lxc
	make lxccontainer->get_config_path() return const char *
		Per Dwight's suggestion, much nicer than returning strdup.
  feb6 (v2):
        lxccontainer: set c->config_path before using it.
	convert legacy lxc-ls

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2013-02-08 10:55:14 -05:00
Serge Hallyn
68c152ef7a setup_mount_entries: ignore mount failure if 'optional'
If 'optional' is in the mount options, then avoid failure in
mount().

Experiments suggest we could just do this checking data at
mount_entry(), but that feels less proper than using
hasmntopt() against the mntent.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2013-01-28 17:59:38 -05:00
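Sketch for 68c152ef7a: one way the 'optional' check could be expressed with hasmntopt() from <mntent.h>. Both helper names are hypothetical, and the exact matching rules of hasmntopt() depend on the C library in use.

```c
#include <mntent.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if the mount entry lists the
 * "optional" option in mnt_opts, 0 otherwise.
 */
int mount_entry_is_optional(const struct mntent *ent)
{
    return hasmntopt(ent, "optional") != NULL;
}

/*
 * Hypothetical helper: turn a failed mount() result into success when
 * the entry was marked optional; pass everything else through.
 */
int effective_mount_result(int mount_ret, const struct mntent *ent)
{
    if (mount_ret < 0 && mount_entry_is_optional(ent))
        return 0;
    return mount_ret;
}
```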
Stéphane Graber
2008796233 conf.c: Cast st_uid and st_gid to int
In eglibc st_uid and st_gid are defined as unsigned integers, in bionic those
are defined as unsigned long (which is inconsistent with the kernel's
definition that's uint_32).

To work around this problem, simply cast those two to int.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2013-01-15 12:57:40 -05:00
Serge Hallyn
f6d3e3e470 Implement userid mappings (enable user namespaces)
The 3.8 kernel now supports uid mappings, so I believe it's appropriate
to proceed with this patchset.
The container config supports new entries of the form:
 lxc.id_map = U 100000 0 10000
 lxc.id_map = G 100000 0 10000
meaning map 'virtual' uids (in the container) 0-10000 to uids
100000-110000 on the host, and same for gids.  So long as there are
mappings specified in the container config, then CLONE_NEWUSER will
be used when the container is cloned.  This means that container
setup is no longer done with root privilege on the host, only root
privilege in the container.  Therefore cgroup setup is moved from the
init task to the monitor task.

To use this patchset, you currently need to either use the raring
kernel at ppa:serge-hallyn/usern-natty, or build your own kernel
from git://kernel.ubuntu.com/serge/quantal-userns.git.
(Alternatively you can use Eric's tree at the latest userns-always-map-*
branch at
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git
but you will likely want to at least enable tmpfs mounts in user namespaces)

You also need to chown the files in the container rootfs into the
mapped range.  There is a utility at
https://code.launchpad.net/~serge-hallyn/+junk/nsexec to do this.
uidmapshift does the chowning, while the container-userns-convert
script nicely wraps that program.  So I simply

	sudo lxc-create -t ubuntu -n r1
	sudo container-userns-convert r1 200000

will create a container which is shifted so uid 0 in the container
is uid 200000 on the host.

TODO: when doing setuid(0), need to only do that if 0 is one of the
ids we map to.  Similarly, when dropping capabilities, need to only
not do that if 0 is one of the ids we map to.  However, the question
of what to do for 'weird' containers in private user namespaces is
one I'm punting for later.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2013-01-15 12:09:33 -05:00
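Sketch for f6d3e3e470: a small parser and translator for the lxc.id_map value format shown in that commit message (type, first host id, first container id, range). The struct and function names are hypothetical. The sketch assumes the last field is a count, as in /proc/<pid>/uid_map, so a range of 10000 covers container ids 0 through 9999. Note that ac7725e7bb, listed earlier in this log, later changes the field ordering and the case of the type letter.

```c
#include <stdio.h>

/* Hypothetical representation of one "U 100000 0 10000" entry. */
struct id_map {
    char type;            /* 'U' for uids, 'G' for gids */
    unsigned long host;   /* first id on the host */
    unsigned long ns;     /* first id inside the container */
    unsigned long range;  /* number of consecutive ids mapped */
};

/* Parse the config value; returns 0 on success, -1 on bad input. */
int parse_id_map(const char *value, struct id_map *m)
{
    if (sscanf(value, " %c %lu %lu %lu",
               &m->type, &m->host, &m->ns, &m->range) != 4)
        return -1;
    if (m->type != 'U' && m->type != 'G')
        return -1;
    return 0;
}

/* Translate a container id to its host id; returns -1 if unmapped. */
long long ns_to_host_id(const struct id_map *m, unsigned long id)
{
    if (id < m->ns || id - m->ns >= m->range)
        return -1;
    return (long long)m->host + (long long)(id - m->ns);
}
```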
Serge Hallyn
544a48a0bd setup cgroups from parent
This is a first step to enabling user namespaces.  When starting a
container in a new user namespace, the child will not have the
rights to write to the cgroup fs.  (We can give it that right, but
don't always want to have to).

At the parent, we don't want to setup_cgroups() before the child
has set itself up.  But we also don't want to wait until it has
started running its init, since that is racy.

Therefore introduce a new sync point.  The child will let the
parent know when it is ready to be confined, and wait for the
parent to respond that it has done so.  Then the child will finish
constraining itself with LSM and seccomp and execute init.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2013-01-15 11:57:02 -05:00
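Sketch for 544a48a0bd: the sync point described above, reduced to a two-message handshake over a socketpair. This is only an illustration of the ordering (child announces it is ready, parent confines it, child continues); the function name and the one-byte messages are assumptions, and LXC's own sync code is not reproduced here.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical two-way handshake:
 *   child  -> parent: 'R' ("ready to be confined")
 *   parent -> child : 'D' ("cgroup setup done")
 * Returns 0 if both sides completed the handshake, -1 otherwise.
 */
int run_with_sync_point(void)
{
    int sv[2];
    int status;
    char msg;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: finish own setup, then ask the parent to confine us. */
        close(sv[0]);
        msg = 'R';
        if (write(sv[1], &msg, 1) != 1)
            _exit(1);
        /* Block until the parent reports that cgroups are in place. */
        if (read(sv[1], &msg, 1) != 1 || msg != 'D')
            _exit(1);
        /* A real child would now apply LSM/seccomp and exec init. */
        _exit(0);
    }

    /* Parent: wait for the child's request before touching cgroups. */
    close(sv[1]);
    if (read(sv[0], &msg, 1) != 1 || msg != 'R') {
        close(sv[0]);
        waitpid(pid, &status, 0);
        return -1;
    }
    /* ... the parent would perform its cgroup setup here ... */
    msg = 'D';
    if (write(sv[0], &msg, 1) != 1) {
        close(sv[0]);
        waitpid(pid, &status, 0);
        return -1;
    }
    close(sv[0]);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```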
Michael H. Warfield
f7bee6c6f3 MAKEDEV call, add autodev hooks, add environment variables for hook scripts.
Ok...  Here's the patch again.  Since Serge is removing the loglevel
structure member, this patch no longer references that element.

From the original description:

1) Removes run_makedev() and the call to it from conf.c per discussion.

2) Adds an lxc.hook.autodev hook.

Note: This hook is very close (one routine level abstracted) from where
the run_makedev was called.  Anyone really rrreeeaaalllyyy needing
MAKEDEV can add it in with a small shim script to do whatever they want
under whatever distro they're using, so no functionality is lost there.

3) Added a number of environment variables for all the hook scripts to
reference to assist in execution.  Things like LXC_ROOTFS_MOUNT could be
very useful but others were added as well.  Room for more if anyone has
an itch.  All in one spot in lxc_start.c.

4) clearenv and putenv( "container=lxc" ) calls were moved to just after
the "start" hook in the container just prior to actually firing up the
container so we could use environment variables prior to that and have
them flushed before firing up init.  Nice side effect is that you
can define environment variables and then call lxc-start and have them
show up in those hook scripts.

5) I actually DID update the man page for lxc.conf!  I guess I lied when
I said I wouldn't get that done.

[... and ...]

I added the rcfile to the lxc_conf structure as suggested and moved the
setenv bundle from lxc-start.c over to start.c just prior to calling
run_lxc_hooks for the pre-start hook.

Signed-off-by: Michael H. Warfield <mhw@WittsEnd.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2013-01-14 14:04:09 -06:00
Serge Hallyn
9ea87d5ded remove logfile and loglevel from struct lxc_conf
The options are still supported in the lxc configuration file.
However they are stored only in local variables in src/lxc/log.c,
which can be read using two new functions:
	int lxc_log_get_level(void);
	const char *lxc_log_get_file(void);

Changelog: jan 14:
 have lxc_log_init use lxc_log_set_file(), have lxc_log_set_file() take
 a const char *, and have it keep its own strdup'd copy of the filename.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2013-01-14 14:03:57 -06:00
Stéphane Graber
c1dc38c2e8 Don't call setup_mount_entries if the list is empty
There's no good reason to call setup_mount_entries if we don't have any
lxc.mount.entry. This also avoids an issue on bionic where the tmpfile()
call in setup_mount_entries requires the presence of /tmp which isn't the
case by default.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2013-01-09 10:22:54 -05:00
Stéphane Graber
72f919c42a conf.c: Cleanup __S_ISTYPE
__S_ISTYPE doesn't exist in all C libraries, so define it if it's missing.
Additionally, replace one occurrence where it wasn't actually needed.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2013-01-09 10:22:54 -05:00
Stéphane Graber
edaf8b1bf1 Add local implementation of mntent.h
Bionic (at least) is missing some of the usual mntent functions.
This adds code defining those that we need when they're missing from the C
library.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2013-01-09 10:22:48 -05:00
Stéphane Graber
2d76d1d7e5 Workaround missing functions in other libc
Some libc implementations (e.g. bionic) lack some of the syscall functions
that are present in glibc.

For those, detect at build time that they are missing and implement a minimal
syscall() wrapper that will essentially give the same result as the glibc
function.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2013-01-09 10:13:45 -05:00
Stéphane Graber
6ff05e18a3 personality.h: Make the personality code optional
Some platforms don't have personality.h in their C library; this change
adds buildtime detection for the header and turns off the personality setting
code in those cases.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2013-01-09 10:13:41 -05:00
Stéphane Graber
495d2046f6 Don't hard depend on capability.h and libcap
In the effort to make LXC work with non-standard Linux distros, this change
allows for the user to build LXC without capability support through a new
--disable-capabilities option to configure.

This effectively will cause LXC not to link against libcap and will turn all
the _cap_ functions into no-ops.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2013-01-09 10:11:56 -05:00
Stéphane Graber
e827ff7e2f tty.h: Ship our own minimal openpty.h
bionic is missing an openpty() function, so ship our own and only
build it and use it on bionic.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2013-01-09 10:10:32 -05:00
Stéphane Graber
9818cae412 conf.c: Define LO_FLAGS_AUTOCLEAR if it's not
LO_FLAGS_AUTOCLEAR isn't defined on bionic, so add an extra ifndef
and set it to its usual value if it's not.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2013-01-09 10:10:22 -05:00
Alexander Vladimirov
3a32201c5a Set umask before populating /dev and restore it after.
According to docs, mknod clears each permission bit whose
corresponding bit in the process umask is set, so we should fix it
before creating device nodes.

Signed-off-by: Alexander Vladimirov <alexander.idkfa.vladimirov@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2013-01-08 12:07:34 -05:00
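Sketch for 3a32201c5a: clearing the umask around node creation and restoring it afterwards, so the node ends up with exactly the requested permission bits. The helper name is hypothetical and is not the code from the commit.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: create a node with exactly the requested
 * permission bits by clearing the umask around mknod() and restoring
 * it afterwards. Returns mknod()'s result.
 */
int mknod_exact_mode(const char *path, mode_t mode, dev_t dev)
{
    mode_t old = umask(0);      /* umask() returns the previous mask */
    int ret = mknod(path, mode, dev);

    umask(old);                 /* restore the caller's mask */
    return ret;
}
```

Creating character or block devices normally requires privilege; for an unprivileged try-out, a FIFO (S_IFIFO | 0666 with dev set to 0) exercises the same umask behaviour.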
Dwight Engen
596a818d4b separate console device from console log
lxc-start -c makes the named file/device the container's console, but using
this with a regular file in order to get a log of the console output does
not work very well if you also want to login on the console. This change
implements an additional option (-L) to simply log the console's output to
a file.

Both options can be used separately or together. For example to get a usable
console and log: lxc-start -n name -c /dev/tty8 -L console.log

The console state is cleaned up more when lxc_delete_console is called, and
some of the clean up paths in lxc_create_console were fixed.

The lxc_priv and lxc_unpriv macros were modified to make use of gcc's local
label feature so they can be expanded more than once in the same function.

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2013-01-02 18:08:12 -05:00
Natanael Copa
859a6da0fa define MS_SHARED if needed
Fixes build on uClibc.

Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2012-12-25 12:58:50 +01:00
Serge Hallyn
cc28d0b0a6 Support MS_SHARED /
(I'll be out until Jan 2, but in the meantime, here is hopefully a
little New Year's gift - this seems to allow lxc-start with / being
MS_SHARED on the host)

When / is MS_SHARED (for instance with f18 and modern arch), lxc-start
fails on pivot_root.  The kernel enforces that, when doing pivot_root,
the parent of current->fs->root (as well as the new root and the putold
location) not be MS_SHARED.

To work around this, check /proc/self/mountinfo for a 'shared:' in
the '/' line.  If it is there, then create a tiny MS_SLAVE tmpfs dir to
serve as parent of /, recursively bind mount / into /root under that dir,
make it rslave, and chroot into it.

Tested with ubuntu raring image after doing 'mount --make-rshared /'.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2012-12-20 10:51:01 +01:00
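Sketch for cc28d0b0a6: checking a single /proc/self/mountinfo line for a 'shared:' tag on the '/' mount. The function name is hypothetical and the parsing is deliberately minimal; it relies on the mountinfo field layout documented in proc(5).

```c
#include <string.h>

/*
 * Hypothetical helper: given one line from /proc/self/mountinfo,
 * return 1 if it describes the mount at "/" and carries a "shared:"
 * optional field, 0 otherwise.
 *
 * Line layout:
 *   ID PARENT MAJ:MIN ROOT MOUNTPOINT OPTIONS [optional...] - FSTYPE ...
 */
int mountinfo_root_is_shared(const char *line)
{
    const char *p = line;
    const char *sep, *shared;

    /* Skip the first four fields to reach the mount point. */
    for (int i = 0; i < 4; i++) {
        p = strchr(p, ' ');
        if (!p)
            return 0;
        p++;
    }
    /* The mount point must be exactly "/". */
    if (strncmp(p, "/ ", 2) != 0)
        return 0;

    /* Optional fields end at the " - " separator. */
    sep = strstr(p, " - ");
    if (!sep)
        return 0;
    shared = strstr(p, " shared:");
    return shared != NULL && shared < sep;
}
```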
Dwight Engen
d2e30e99b4 Fix race/corruption with multiple lxc-start, lxc-execute
If you start more than one lxc-start/lxc-execute with the same name at the
same time, or just do an lxc-start/lxc-execute with the name of a container
that is already running, lxc doesn't figure out that the container with this
name is already running until fairly late in the initialization process: ie
when __lxc_start() -> lxc_poll() -> lxc_command_mainloop_add() attempts to
create the same abstract socket name.

By this point a fair amount of initialization has been done that actually
messes up the running container. For example __lxc_start() -> lxc_spawn() ->
lxc_cgroup_create() -> lxc_one_cgroup_create() -> try_to_move_cgname() moves
the running container's cgroup to a name of deadXXXXXX.

The solution in this patch is to use the atomic existence of the abstract
socket name as the indicator that the container is already running.  To do
so, I just refactored lxc_command_mainloop_add() into an lxc_command_init()
routine that attempts to bind the socket, and ensure this is called earlier
before much initialization has been done.

In testing, I verified that maincmd_fd was still open at the time of lxc_fini,
so the entire lifetime of the container's run should be covered. The only
explicit close of this fd was in the reboot case of lxcapi_start(), which is
now moved to lxc_fini(), which I think is more appropriate.

Even though it is not checked any more, set maincmd_fd to -1 instead of 0 to
indicate it's not open since 0 could be a valid fd.

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2012-12-13 23:26:39 -05:00
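Sketch for d2e30e99b4: using a bind() on an abstract unix socket name as an atomic "already running" check. This is Linux-specific, and the helper name and the way the name is built are assumptions for illustration; the actual lxc_command_init() is not reproduced here.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper (Linux only): bind an abstract unix socket
 * named after the container. Returns the bound fd on success, or -1
 * on failure with errno set (EADDRINUSE when another socket already
 * holds the name).
 */
int claim_container_name(const char *name)
{
    struct sockaddr_un addr;
    size_t len = strlen(name);
    int fd, saved;

    if (len == 0 || len > sizeof(addr.sun_path) - 1) {
        errno = EINVAL;
        return -1;
    }

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    /* A leading NUL byte in sun_path selects the abstract namespace. */
    memcpy(addr.sun_path + 1, name, len);

    if (bind(fd, (struct sockaddr *)&addr,
             (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + len)) < 0) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;      /* keep open for as long as the name is claimed */
}
```

The name is released when the last descriptor referring to the socket is closed, which is why the fd has to stay open for the container's whole lifetime.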
Dwight Engen
222fea5a10 Don't attempt to symlink kmsg without rootfs->path
For example doing "lxc-execute -n tmpct /bin/bash" will call setup_kmsg(), but
in this case rootfs->mount/dev directory doesn't even exist so the call to
symlink fails with ENOENT. Commit f62b3449 made this failure not fatal, but
we should not even try it when we know it will fail. See similar code in
setup_tty(), setup_console(), etc.

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2012-12-11 15:29:39 -05:00
Serge Hallyn
769872f9f2 support new libseccomp api
Detect the new api by existence in seccomp.h of the scmp_filter_ctx
type in configure.ac.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2012-12-11 12:33:40 -06:00
Serge Hallyn
ff918b1832 seccomp: free conf->seccomp (filename char *)
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2012-12-11 11:08:09 -06:00
Dwight Engen
e29bf450ca Use LXCPATH and LOCALSTATEDIR instead of hardcoded /var
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2012-12-05 13:31:06 -05:00
Serge Hallyn
7b35f3d60a rename physical nics at shutdown
When a physical nic is being set up, store its ifindex and original name
in struct lxc_conf.  At reboot, reset the original name.
We can't just go over the original network list in lxc_conf at shutdown
because that may be tweaked in the meantime through the C api.  The
saved_nics list is only setup during lxc_spawn(), and restored and
freed after lxc_start.

Bug-Ubuntu: https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1086244

Changelog: remove non-effect change in execute.c

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2012-12-04 14:16:30 -06:00
Serge Hallyn
4a85ce2ad0 lxc_conf logfile and loglevel support
Add 'lxc.logfile' and 'lxc.loglevel' config items.  Values provided on
the command line override the config items.

Have lxccontainer not set a default loglevel and logfile.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2012-12-04 11:38:25 -06:00
Serge Hallyn
61435768cd check and warn of return value from fchdir
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2012-11-29 20:05:37 -06:00
Serge Hallyn
91c3830e22 Description: run MAKEDEV console when doing lxc.autodev
mounted-dev.conf won't be running that in container's userspace as it
previously would have, so make sure that all the devices it would have
created (other than ones which lxc later finagles) get created.
To achieve this, we have to first mount /dev, then run MAKEDEV, then
run setup_autodev to populate the rest of /dev.

Bug-Ubuntu: https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1075717

Changelog:
  v2: Use INFO rather than ERROR when makedev fails, since we won't stop the container boot.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2012-11-29 11:14:07 -06:00
Dwight Engen
12a50cc6ab Make config api items const
This makes it easier to write a binding, and presents a cleaner API. Use
strdupa in a few places to get mutable strings for tokenizing / parsing.
Also change the argv type in lxcapi_start and lxcapi_create to match
that of execv(3).

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2012-11-27 14:59:33 -05:00
Dwight Engen
d95db067d2 Free allocated configuration memory
Most of these were found with valgrind by repeatedly doing lxc_container_new
followed by lxc_container_put. Also free memory when config items are
re-parsed, as happens when lxcapi_set_config_item() is called. Refactored
path type config items to use a common underlying routine.

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2012-11-26 13:13:47 -05:00
Dwight Engen
9ebb03ad4a Fix use of list item memory after free
Valgrind showed use of ->next field after item has been free()ed.
Introduce a lxc_list_for_each_safe() which allows traversal of a list
when the body of the loop may remove the currently iterated item.

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2012-11-26 12:54:00 -05:00
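Sketch for 9ebb03ad4a: how a "safe" list iterator differs from a plain one. The list type and macros below are a hypothetical stand-in, not LXC's definitions; they only show the idea of saving the successor before the loop body runs.

```c
#include <stdlib.h>

/* Hypothetical circular doubly linked list, not the LXC definition. */
struct list {
    void *elem;
    struct list *next;
    struct list *prev;
};

static inline void list_init(struct list *head)
{
    head->elem = NULL;
    head->next = head->prev = head;
}

static inline void list_add_tail(struct list *head, struct list *item)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

static inline void list_del(struct list *item)
{
    item->prev->next = item->next;
    item->next->prev = item->prev;
}

/* Unsafe if the body frees it: the step expression reads it->next. */
#define list_for_each(it, head) \
    for ((it) = (head)->next; (it) != (head); (it) = (it)->next)

/* Safe: the successor is saved in nxt before the body runs. */
#define list_for_each_safe(it, head, nxt) \
    for ((it) = (head)->next, (nxt) = (it)->next; (it) != (head); \
         (it) = (nxt), (nxt) = (it)->next)

/* Remove and free every item; returns how many were freed. */
int list_free_all(struct list *head)
{
    struct list *it, *nxt;
    int n = 0;

    list_for_each_safe(it, head, nxt) {
        list_del(it);
        free(it);
        n++;
    }
    return n;
}
```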
Serge Hallyn
c6883f383e Add lxc.autodev
Add a container config option to mount and populate /dev in a container.

We might want to add options to specify a max size for /dev other than
the default 100k, and to specify other devices to create.  And maybe
someone can think of a better name than autodev.

Changelog: Don't error out if we couldn't mknod a /dev/ttyN.
Changelog: Describe the option in lxc.conf manpage.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2012-11-26 10:02:47 -06:00
Serge Hallyn
f62b344996 dont fail on failure to link kmsg
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2012-11-14 08:55:07 -06:00
Serge Hallyn
ae9242c86a switch use of #define with static char*
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2012-11-13 17:54:01 -06:00
Serge Hallyn
c95cf86f39 Revert "Fix check against LXCROOTFSMOUNT to use strcmp"
This reverts commit 5bf2c5ce9b.
2012-11-13 17:50:40 -06:00
Stéphane Graber
5bf2c5ce9b Fix check against LXCROOTFSMOUNT to use strcmp
The check for conf->rootfs.mount not being equal to LXCROOTFSMOUNT
wasn't done with strcmp which was leading to undefined behaviour
and triggered gcc warnings.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2012-11-12 14:39:43 -05:00
Serge Hallyn
17ed13a3bc Support individual hook types in clear_config_item
Without this patch, only clear_config_item("lxc.hook") works.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2012-11-12 13:18:32 -05:00
Dwight Engen
1f530df632 fix compile without apparmor (against git staging)
Add a few missing #if's to fix compilation when configured without
AppArmor.

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
2012-11-12 13:17:54 -05:00
Serge Hallyn
8eb5694baf Add lxc_conf_free()
Then after lxcapi container->create(), free whatever lxc_conf may be
loaded and reload from the newly created configuration file.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2012-11-12 13:17:30 -05:00
Serge Hallyn
89eaa05ed1 replace HOOK define with proper code.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2012-11-12 13:17:30 -05:00
Stéphane Graber
427b3a21ef Change lxc_remove_nic from returning int to void
The function wasn't returning anything and none of the callers
were checking for a return code.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2012-11-12 13:17:30 -05:00
Serge Hallyn
5ea6163a62 Add lxc.hook.pre-mount
This happens in the container's namespace, but before the rootfs is
setup and mounted.  This gives us a chance to mangle the rootfs - i.e.
ecryptfs-mount it.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2012-11-12 13:17:30 -05:00
Stéphane Graber
72d0e1cb2f Merge the liblxc API work by Serge Hallyn.
This turns liblxc into a public library implementing a container structure.
The container structure is meant to cover most LXC commands and can easily be
used to write bindings in other programming languages.

More information on the new functions can be found in src/lxc/lxccontainer.h
Test programs using the API can also be found in src/tests/

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2012-11-12 13:16:16 -05:00
Jan Kiszka
74a2b5864f Add network-down script
Analogously to lxc.network.script.up, add the ability to register a down
script. It is called before the guest network is finally destroyed,
allowing cleanup of resources that are not reset/destroyed
automatically. Parameters of the down script are identical to the up
script except for the execution context "down".

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2012-11-12 12:04:30 -05:00
Serge Hallyn
773fb9cad7 replace HOOK define with proper code.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2012-10-25 10:51:09 +02:00
Serge Hallyn
1bd051a6b0 link /dev/kmsg to /dev/console in the container
This way init log messages can be seen on the console.  If containerized
syslog ever comes around, we can get rid of this.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2012-10-25 10:35:08 +02:00
Serge Hallyn
87af3ecd48 log errno when pclose fails
When lxc is executing a script and pclose fails, log the
errno to help debug what happened.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2012-10-25 10:29:53 +02:00
Stéphane Graber
d0a36f2c8b Add missing include for apparmor.h in conf.c
This include is conditional on apparmor being selected.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
2012-10-25 10:22:50 +02:00
Stéphane Graber
9ac3ffb517 Make lxc-execute without rootfs work.
That means, don't try to pin a null rootfs, and don't try to mount /proc
since /var/lib/lxc/root/proc doesn't exist to be mounted onto.
The apparmor patches are not yet upstream, so this patch will not go
upstream by itself.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
2012-10-25 10:19:37 +02:00
Serge Hallyn
30c5d29201 use lxc_putold as pivot_dir put dir, not mnt
Using mnt means that lxc fstab entries do not work when placed under
the container's /mnt/ (i.e. /mnt/etc).

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2012-07-31 16:07:18 +02:00
Serge Hallyn
e99ee0decc don't try to pin a null rootfs.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2012-07-31 16:04:33 +02:00
Serge Hallyn
9ba8130c96 switch all sprintfs which can overrun to snprintfs
and check return values

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2012-07-31 16:04:33 +02:00
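Sketch for 9ba8130c96: the usual pattern for replacing sprintf() with snprintf() and checking the return value. The helper name is hypothetical; the check itself follows from snprintf() returning the length it would have written.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build "<dir>/<name>" in buf.
 * Returns 0 on success, -1 on an output error or if the result would
 * not fit in size bytes (including the terminating NUL).
 */
int join_path(char *buf, size_t size, const char *dir, const char *name)
{
    int ret = snprintf(buf, size, "%s/%s", dir, name);

    if (ret < 0 || (size_t)ret >= size)
        return -1;
    return 0;
}
```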
Serge Hallyn
80a881b232 templates: use relative paths when creating containers
At the same time, allow lxc.mount.entry to specify an absolute target
path relative to /var/lib/lxc/CN/rootfs, even if rootfs is a blockdev.
Otherwise all such entries are ignored for blockdev-backed containers.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2012-07-31 16:04:33 +02:00
Serge Hallyn
26ddeedd83 Introduce a first set of container hooks
This patch introduces support for 4 hooks.  We'd like to have 6 in
all to mirror the openvz ones (thanks to Stéphane for this info):

pre-start: in the host namespace before container mounting happens
mount: after container mounting (as per config and /var/lib/lxc/container/fstab)
       but before pivot_root
start: immediately before exec'ing init
stop: in container namespace and in chroot before shutdown
umount: after other unmounting has happened
post-stop: outside of the container

stop and umount are not implemented here because when the kernel kills
the container init, it kills the namespace.  We can probably work around
this, i.e. by keeping the /proc/pid/ns/mnt open, and using that, though
all container tasks including init would still be dead.  Is that worth
pursuing?

start also presents a bit of an issue.  openvz allows a script on the
host to be specified, apparently.  My patch requires the script or
program to exist in the container.  I'm fine with trying to do it the
openvz way, but I wasn't sure what the best way to do that was.  Openvz
(I'm told) opens the script and passes its contents to a bash in the
container.  But that limits the hooks to being only scripts.  By
requiring the hook to be in the container, we can allow any sort of
hook, and assume that any required libraries/dependencies exist
there.

Other than that with this patchset I can add

lxc.hook.pre-start = /var/lib/lxc/p1/pre-start
lxc.hook.mount = /var/lib/lxc/p1/mount
lxc.hook.start = /start
lxc.hook.post-stop = /var/lib/lxc/p1/post-stop

to my /var/lib/lxc/p1/config, and the hooks get executed as expected.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2012-07-31 16:04:33 +02:00
Serge Hallyn
e075f5d9b6 Introduce apparmor support
This could be done as generic 'lsm_init()' and 'lsm_load()' functions,
however that would make it impossible to compile one package supporting
more than one lsm.  If we explicitly add the selinux, smack, and aa
hooks in the source, then one package can be built to support multiple
kernels.

The smack support should be pretty trivial, and probably very close
to the apparmor support.

The selinux support may require more, including labeling the passed-in
fds (consoles etc) and filesystems.

If someone on the list has the inclination and experience to add selinux
support, please let me know.  Otherwise, I'll do Smack and SELinux.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2012-07-31 16:04:33 +02:00
Serge Hallyn
0c54752318 pin container's rootfs to prevent read-only remount
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2012-07-31 16:03:30 +02:00
Serge Hallyn
7c6ef2a2ee add lxc.devttydir config variable
If set, then the console and ttys will be bind-mounted not over /dev/console,
but /dev/<ttydir>/console and then symlinked from there to /dev/console.

Signed-off-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2012-02-26 10:44:41 +01:00
Christian Seiler
d55bc1adad Accept numeric values for capabilities to drop
lxc.cap.drop now also accepts numeric values for capabilities. This allows
the user to specify capabilities LXC doesn't know about yet or capabilities
that were not part of the kernel headers LXC was compiled against.
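
A minimal sketch of the lookup described above, not the code from this patch: try the name table first, then fall back to a plain number. The table contents, the 0..255 range check, and the name parse_cap() are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <strings.h>

/* Hypothetical name table; a real one would list every known capability. */
struct cap_entry {
	const char *name;
	int value;
};

static const struct cap_entry cap_table[] = {
	{ "chown",      0 },
	{ "sys_module", 16 },
	{ "sys_chroot", 18 },
	{ "mknod",      27 },
};

/* Return the capability number for a name or a decimal string, -1 if invalid. */
static int parse_cap(const char *s)
{
	char *end;
	long n;
	size_t i;

	assert(s != NULL);

	for (i = 0; i < sizeof(cap_table) / sizeof(cap_table[0]); i++)
		if (strcasecmp(s, cap_table[i].name) == 0)
			return cap_table[i].value;

	/* Not a known name: accept it only if it is a plain non-negative number. */
	errno = 0;
	n = strtol(s, &end, 10);
	if (errno != 0 || end == s || *end != '\0' || n < 0 || n > 255)
		return -1;
	return (int)n;
}
```

With this shape, a capability missing from the table (for example 34) can still be dropped by writing its number in the config.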

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2012-02-23 09:57:13 +01:00
Christian Seiler
5170c71633 Add CAP_SYSLOG and CAP_WAKE_ALARM to list of capabilities
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2012-02-23 09:57:13 +01:00
Daniel Lezcano
d8f8e35202 Fix network cleanup on error
Network cleanup does not correctly clean up the virtual interfaces
in case of an error.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2012-01-05 22:45:32 +01:00
Daniel Lezcano
7ad84da79b fix indentation of the previous patch
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2012-01-05 22:45:31 +01:00
Christian Seiler
49684c0b43 Set high byte of mac addresses for host veth devices to 0xfe
When used in conjunction with a bridge, veth devices with random addresses
may change the mac address of the bridge itself if the mac address of the
interface newly added is numerically lower than the previous mac address
of the bridge. This is documented kernel behavior. To avoid changing the
host's mac address back and forth when starting and/or stopping containers,
this patch ensures that the high byte of the mac address of the veth
interface visible from the host side is set to 0xfe.
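
A rough sketch of the idea, assuming the address is generated as six random bytes and then patched; the function names are made up for illustration and rand() stands in for whatever random source is really used.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Fill mac[6] with random bytes, then force the high byte to 0xfe so the
 * address sorts numerically above typical hardware addresses. */
static void make_host_veth_mac(unsigned char mac[6])
{
	int i;

	assert(mac != NULL);
	for (i = 0; i < 6; i++)
		mac[i] = (unsigned char)(rand() & 0xff);
	mac[0] = 0xfe;
}

/* Format as aa:bb:cc:dd:ee:ff; buf must hold at least 18 bytes. */
static void format_mac(const unsigned char mac[6], char buf[18])
{
	assert(mac != NULL && buf != NULL);
	snprintf(buf, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
		 mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}
```

0xfe also keeps the multicast bit (0x01) clear, so the result is still a valid unicast address.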

A similar logic is also implemented in libvirt.

Fixes SF bug #3411497
See also: <http://thread.gmane.org/gmane.linux.kernel.containers.lxc.general/2709>

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
2012-01-05 22:45:31 +01:00
Matthijs Kooijman
19a26f8214 add autodetection of the gateway address
For veth and macvlan networks, this can look up the host address on the
bridge (link) interface and add a default route on the guest to that
address. This facilitates a typical setup where guests are bridged
together.

syntax:
	lxc.network.ipv4.gateway = auto
	lxc.network.ipv6.gateway = auto

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2011-08-30 23:50:23 +02:00
Matthijs Kooijman
f8fee0e2c3 .gateway configuration
This directive adds a default route to the guest at startup.

syntax:
	lxc.network.ipv4.gateway = 10.0.0.1
	lxc.network.ipv6.gateway = 2001:db8:85a3::8a2e:370:7334

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2011-08-30 23:50:23 +02:00
Serge Hallyn
c1c75c04a6 print netdev name, not link, after moving dev into netns
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2011-06-25 15:17:47 +02:00
Sven Wegener
77890c6d6b Check for existing ptmx symlink
It's OK, if /dev/ptmx points to /dev/pts/ptmx via a symlink.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2011-05-23 23:12:24 +02:00
Sven Wegener
88d413d5b6 Add relatime and strictatime mount options
Also add #ifndef for compatibility with glibc before 2.12.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2011-05-23 23:12:24 +02:00
Michael Santos
95642a1068 conf: increase buffer size to include spaces
Signed-off-by: Michael Santos <michael.santos@gmail.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2011-03-22 15:04:52 +01:00
Daniel Lezcano
071a2b8cc9 fix mount path
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2011-03-14 21:47:15 +01:00
Daniel Lezcano
d472214b83 rename physical device to the original name
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2011-03-07 02:08:47 +01:00
Daniel Lezcano
b84f58b9fb factor out networking configuration code
Change the name of the functions and factor some of them.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2011-03-07 02:08:47 +01:00
Daniel Lezcano
7b57e8b681 fix empty network configuration
The return statement is at the wrong place.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2011-02-02 21:55:49 +01:00
David Ward
b0efbac48e Only bring up network interface if IFF_UP is set
Each network interface was brought up regardless of the configuration,
as the wrong boolean operator was being used to test the IFF_UP flag.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2011-01-17 10:18:50 +01:00
Daniel Lezcano
6e35af2e39 set veth host's side always up
We should always have the veth host's side up. Otherwise, if we omit
the up flag in the configuration and let the container configure
its interface, the network will never be enabled because the host's side
is not up.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2011-01-17 10:18:50 +01:00
Joerg Gollnick
91656ce587 Fix mntflags initialization
Dear all,
while setting up a container on x86_64 (archlinux host/guest) I had trouble 
with mounting dev/pts and others from container.fstab and a ssh login does not 
work (only ssh container bash -i gives you a shell)
The cause is that conf.c does not initialize mntflags.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2011-01-16 19:40:46 +01:00
Daniel Lezcano
013bd42848 substitute the absolute rootfs mount path
Change the mount point in the rootfs because we mount the rootfs
in ROOTFSDIR for the pivot. We have to substitute the real mount
path to the new path located in ROOTFSDIR.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2011-01-13 16:25:14 +01:00
Daniel Lezcano
911324ef25 encapsulate mount point code
Change the code to encapsulate the different mounts point.

 * mount on the host fs
 * mount relatively to the rootfs
 * mount absolutely to the rootfs (broken)

That will make the code cleaner to fix the latter.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2011-01-13 16:25:14 +01:00
Daniel Lezcano
d330fe7b86 mindless changes to conform indentation
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2011-01-13 16:15:15 +01:00
Michael Tokarev
59760f5d0f Make mount paths relative to rootfs
Why not chdir into the root of container right when
the root filesystem is (bind-)mounted, and let all
mount entries to be relative to the container root?

Even more, to warn if lxc.mount[.entry] contains
absolute path for the destination directory (or a
variation of this, absolute and does not start with
container root mount point)?

This way, all mounts will look much more sane, and
it will be much easier to move/clone containers -
by changing only lxc.rootfs.

I do it this way locally since the beginning, by
chdir'ing to the proper directory (rootfs) before
running lxc-start (in a startup script), but this
is now broken in 0.7.3 which bind-mounts rootfs
somewhere in /usr/lib/lxc.

Signed-off-by: Michael Tokarev<mjt@tls.msk.ru>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-12-17 11:43:36 +01:00
Sergey S. Kostyliov
968fbd3605 add support for dirsync mount option
Add support for the `dirsync' mount option. MS_DIRSYNC is one of the
mount(2) mountflags, so don't send it as an extra mount option, to avoid:

 	lxc-start: Invalid argument - failed to mount ...

errors.
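
A sketch of the split this relies on, under the assumption that options are looked up in a table: names found in the table become mount(2) flags, and everything else is passed through as the data string. The table is abbreviated and parse_mount_opts() is a hypothetical name, not the function in conf.c.

```c
#include <assert.h>
#include <string.h>
#include <sys/mount.h>

struct mount_opt {
	const char *name;
	unsigned long flag;
};

/* Abbreviated table of options that map to mountflags. */
static const struct mount_opt known_opts[] = {
	{ "ro",      MS_RDONLY  },
	{ "nosuid",  MS_NOSUID  },
	{ "nodev",   MS_NODEV   },
	{ "noexec",  MS_NOEXEC  },
	{ "dirsync", MS_DIRSYNC },
	{ "bind",    MS_BIND    },
};

/* Split a comma-separated option string. Known names are OR-ed into *flags;
 * the rest are copied to data, which must hold strlen(opts) + 1 bytes. */
static void parse_mount_opts(const char *opts, unsigned long *flags, char *data)
{
	const size_t n = sizeof(known_opts) / sizeof(known_opts[0]);

	assert(opts != NULL && flags != NULL && data != NULL);
	*flags = 0;
	data[0] = '\0';

	while (*opts != '\0') {
		size_t len = strcspn(opts, ",");
		size_t i;

		for (i = 0; i < n; i++) {
			const char *name = known_opts[i].name;

			if (strlen(name) == len && strncmp(opts, name, len) == 0) {
				*flags |= known_opts[i].flag;
				break;
			}
		}
		/* Unknown option: keep it for the filesystem-specific data string. */
		if (i == n && len > 0) {
			if (data[0] != '\0')
				strcat(data, ",");
			strncat(data, opts, len);
		}
		opts += len;
		if (*opts == ',')
			opts++;
	}
}
```

If "dirsync" were missing from the table it would end up in data, which is the case that leads to the "Invalid argument" error quoted above.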

Signed-off-by: Sergey S. Kostyliov <rathamahata@gmail.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-10-30 21:41:19 +02:00
Daniel Lezcano
b3ecde1ec3 Fix compilation error on fc12
The capability header makes the inclusion of the loop header
fail. Moving the inclusion of loop.h before capability.h fixes the
problem.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-10-12 15:11:45 +02:00
Daniel Lezcano
2656d23127 reduce function name
Cosmetic change by reducing the function names.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-10-12 14:57:57 +02:00
Daniel Lezcano
abbfd20baa use popen and redirect script output
Change the run_script function to use popen and to redirect
the output of the script to the log file.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-10-12 10:52:47 +02:00
Daniel Lezcano
751d9dcd39 fix Coding Style
Fix the coding style, 80 chars lines, etc ...
Fix indentation blocks if ... then ... else ... fi

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-10-12 10:52:47 +02:00
Stefan Tomanek
e3b4c4c44a add lxc.network.script.up configuration hook
This commit adds a configuration option to specify a script to be
executed after creating and configuring the network used by the
container. The following arguments are passed to the script:

	* container name
	* config section name (net)

Additional arguments depend on the config section employing a
script hook; the following are used by the network system:

	* execution context (up)
	* network type (empty/veth/macvlan/phys)

Depending on the network type, other arguments may be passed:

veth/macvlan/phys:
	* (host-sided) device name

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-10-12 10:52:47 +02:00
Daniel Lezcano
a6afdde95c allow specifying an image or a block device as rootfs
This patch allows specifying an image or a block device.

The image or the block device is mounted on rootfs->mount.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-10-03 23:09:36 +02:00
Daniel Lezcano
12297168e9 Initialize default mount point
Let's initialize rootfs->mount to LXCROOTFSMOUNT. The value
will be overwritten by the configuration in case it is specified.

That will make the code nicer, instead of the ugly rootfs->mount checks.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-10-03 23:09:35 +02:00
Daniel Lezcano
bc9bd0e31e use the rootfs mount point for the tty's
The rootfs is always located in rootfs->mount, let's use it for
the tty.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-10-03 23:09:35 +02:00
Daniel Lezcano
466978b083 use the rootfs mount point for the console
The rootfs is always located in the mount point now, let's
use it.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-10-03 23:09:35 +02:00
Daniel Lezcano
ac7787080c mount the rootfs to the mount directory first
Split the rootfs setup by mounting the rootfs to the mount
point. This mount point will be used as the de facto place where
the rootfs is located.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-10-03 23:09:35 +02:00
Daniel Lezcano
cccc74b514 configure container architecture
When a container is installed with 32bits binaries while we are
running on a 64bits host, inside the container we are seen as
64bits arch. That leads to some problems for the package updates
because the scripts will download 64bits packages instead of 32bits.

This patch defines a configuration variable to set the architecture
of the container.

lxc.arch = i686 | x86 | x86_64 | amd64
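
One way such a setting could be applied is through personality(2). The sketch below only maps the four values listed above to a persona; the mapping and the name arch_to_personality() are assumptions for illustration, not taken from the patch.

```c
#include <assert.h>
#include <string.h>
#include <sys/personality.h>

/* Map an lxc.arch value to a personality(2) persona, or -1 if unknown. */
static long arch_to_personality(const char *arch)
{
	assert(arch != NULL);

	if (strcmp(arch, "i686") == 0 || strcmp(arch, "x86") == 0)
		return PER_LINUX32;
	if (strcmp(arch, "x86_64") == 0 || strcmp(arch, "amd64") == 0)
		return PER_LINUX;
	return -1;
}
```

The caller would pass a non-negative result to personality() in the container's init process before exec, so that uname reports a 32-bit machine to the package scripts.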

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-09-13 15:36:20 +02:00
Daniel Lezcano
96bcd56ae2 Don't try to remove a physical nic on error
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-07-23 15:10:38 +02:00
Daniel Lezcano
6168e99fff fix core dump when using physical interface
If the physical link is not specified in the configuration
the check in if_nametoindex(netdev->link) leads to a segfault.

Check the link is specified.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Reported-by: Ferenc Wagner <wferi@niif.hu>
2010-07-23 15:10:38 +02:00
Daniel Lezcano
fb6d9b2f40 keep the name of the physical interface
When the interface used in the container is a physical
interface from the host, we keep the initial name.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Reported-by: Sabdar <sabdar@wellspringsys.com>
2010-07-22 15:59:44 +02:00
Ciprian Dorin, Craciun
e76b8764fa lxc to apply mount options for bind mounts
Hello all!

    This bug stalked me for a while, but only now it bit me quite
badly... (Lost about an hour of work...)

    So the culprit: inside the fstab file for the `lxc.mount` option I
can use options like `ro` together with `bind`. Unfortunately the
kernel just laughs in my face and ignores any options I've put in
there... :) But not any more: I've updated `./src/lxc/conf.c`
(`mount_file_entries` function) so that when it encounters a `bind`
option it executes it twice (one without any extra options, and a
second time with the remount flag set.)

I've marginally (as in my particular case) tested it and it works.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-06-24 09:47:14 +02:00
Ferenc Wagner
4f9293b1f0 fix comment
Signed-off-by: Ferenc Wagner <wferi@niif.hu>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-06-11 15:56:25 +02:00
Ferenc Wagner
3103609ddc change pivotdir default to mnt
The mnt directory has a good chance to already exist in the new root
filesystem, so creation and removal can be avoided.  This also eases
use of read only root filesystems (no configuration necessary).

Signed-off-by: Ferenc Wagner <wferi@niif.hu>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-06-11 15:56:25 +02:00
Ferenc Wagner
9527e566fc conditional use of new capabilities
Signed-off-by: Ferenc Wagner <wferi@niif.hu>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-06-07 11:33:56 +02:00
Daniel Lezcano
0e391e57b0 fix compilation warnings
Fix the following warnings:

console.c: In function ‘console_handler’:
console.c:252: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result
console.c:254: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result
conf.c: In function ‘instanciate_veth’:
conf.c:1130: warning: ignoring return value of ‘mktemp’, declared with attribute warn_unused_result
conf.c:1135: warning: ignoring return value of ‘mktemp’, declared with attribute warn_unused_result
conf.c: In function ‘instanciate_macvlan’:
conf.c:1206: warning: ignoring return value of ‘mktemp’, declared with attribute warn_unused_result
af_unix.c: In function ‘lxc_af_unix_send_fd’:
af_unix.c:124: warning: dereferencing type-punned pointer will break strict-aliasing rules
af_unix.c: In function ‘lxc_af_unix_recv_fd’:
af_unix.c:169: warning: dereferencing type-punned pointer will break strict-aliasing rules
af_unix.c: In function ‘lxc_af_unix_send_credential’:
af_unix.c:195: warning: dereferencing type-punned pointer will break strict-aliasing rules
af_unix.c: In function ‘lxc_af_unix_rcv_credential’:
af_unix.c:237: warning: dereferencing type-punned pointer will break strict-aliasing rules

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-06-01 12:13:32 +02:00
Daniel Lezcano
5045eedff0 disable rootfs automatic detection
Avoid a warning at compile time by temporarily disabling the code.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-05-28 17:39:11 +02:00
Daniel Lezcano
cc6f6dd7d8 fix pivot umount algorithm
Make a function and fix bad parameter to umount.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-05-28 17:39:11 +02:00
Daniel Lezcano
b3df193c50 fix whitespace
Fix whitespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-05-28 11:49:25 +02:00
Cedric Le Goater
2ac29abe45 use ptmxmode mount option
Save one call by using the ptmxmode mount option.

Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-05-27 16:16:30 +02:00
Daniel Lezcano
5332bb844a Don't close fd 0, fd 1
That breaks the reboot because when we reexec, fd 0 and fd 1 will be
closed, and these are created by lxc, not inherited.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-05-19 22:15:28 +02:00
Daniel Lezcano
0093bb8ced added locally modified files for broadcast support
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-05-18 19:13:26 +02:00
Ferenc Wagner
9232212afd fix typos in error messages
Signed-off-by: Ferenc Wagner <wferi@niif.hu>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-05-12 23:47:55 +02:00
Ferenc Wagner
a91d897a7b remove pivotdir only if it was created by us
The removal does not account for possible leading path components that
were also created during creation of pivotdir.

Signed-off-by: Ferenc Wagner <wferi@niif.hu>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-05-12 23:47:55 +02:00
Daniel Lezcano
b1789442d6 use defined rootfs mount point
As we defined a path where to mount the rootfs, we can use it without
ambiguity because it is defined by default at compile time or by the
configuration.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-05-12 23:44:28 +02:00
Daniel Lezcano
33fcb7a047 encapsulate rootfs data in a structure
We have pivot_dir and rootfs defined in lxc_conf structure.
Let's encapsulate them in a rootfs structure.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-05-12 23:44:28 +02:00
Ferenc Wagner
25368b5249 no need to use a temporary directory for pivoting
Ferenc Wagner <wferi@niif.hu> writes:

> Daniel Lezcano <dlezcano@fr.ibm.com> writes:
>
>> Ferenc Wagner wrote:
>>
>>> Daniel Lezcano <daniel.lezcano@free.fr> writes:
>>>
>>>> Ferenc Wagner wrote:
>>>>
>>>>> While playing with lxc-start, I noticed that /tmp is infested by
>>>>> empty lxc-r* directories: [...] Ok, this name comes from lxc-rootfs
>>>>> in conf.c:setup_rootfs.  After setup_rootfs_pivot_root returns, the
>>>>> original /tmp is not available anymore, so rmdir(tmpname) at the
>>>>> bottom of setup_rootfs can't achieve much.  Why is this temporary
>>>>> name needed anyway?  Is pivoting impossible without it?
>>>>
>>>> That was put in place with chroot, before pivot_root, so the distro's
>>>> scripts can remount their '/' without failing.
>>>>
>>>> Now we have pivot_root, I suppose we can change that to something cleaner...
>>>
>>> Like simply nuking it?  Shall I send a patch?
>>
>> Sure, if we can kill it, I will be glad to take your patch :)
>
> I can't see any reason why lxc-start couldn't do without that temporary
> recursive bind mount of the original root.  If neither do you, I'll
> patch it out and see if it still flies.

For my purposes the patch below works fine.  I only run applications,
though, not full systems, so wider testing is definitely needed.

Thanks,
Feri.

>From 98b24c13f809f18ab8969fb4d84defe6f812b25c Mon Sep 17 00:00:00 2001
Date: Thu, 6 May 2010 14:47:39 +0200

That was put in place before lxc-start started using pivot_root, so
the distro scripts can remount / without problems.

Signed-off-by: Ferenc Wagner <wferi@niif.hu>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-05-10 11:50:10 +02:00
Daniel LEzcano
0b7a835335 factor out pivot_root code
Clean up and factor a bit the pivot_root code.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-05-10 11:50:09 +02:00
Daniel Lezcano
1b09f2c057 fix pivot_root temporary directory
First of all, when trying to start a container in a read-only root
lxc-start complains:
  lxc-start: Read-only file system - can't make temporary mountpoint

This is in conf.c:setup_rootfs_pivot_root() function.  That function
uses optional parameter "lxc.pivotdir", or creates (and later removes)
a temporary directory for pivot_root.  Obviously there's no way to
create a directory in a read-only filesystem.

But lxc.pivotdir does not work either. In the function mentioned above
it is used with leading dot (eg. if I specify "lxc.pivotdir=pivot" in
the config file the pivot_root() syscall will be made to ".pivot" with
leading dot, not to "pivot"), but later on it is used without that dot,
and fails:

  lxc-start: No such file or directory - failed to open /pivot/proc/mounts
  lxc-start: No such file or directory - failed to read or parse mount list '/pivot/proc/mounts'
  lxc-start: failed to pivot_root to '/stage/t'

(that's with "lxc.pivotdir = pivot" in the config file).  After symlinking
pivot to .pivot it still fails:

  lxc-start: Device or resource busy - could not unmount old rootfs
  lxc-start: failed to pivot_root to '/stage/t'

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
2010-05-10 11:50:09 +02:00
Michel Normand
3cfc0f3a65 lxc: remove perror call in nl.c (V2)
There is only one such perror call, so remove it in nl.c

In this same patch, verify that all functions of nl.c and network.c
are reporting a -errno value in case of error;
value that is reported in lxc log by the callers in conf.c

Signed-off-by: Michel Normand <normand@fr.ibm.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-04-29 10:03:59 +02:00
Daniel Lezcano
91480a0f0a restart the container at reboot
When a reboot is detected, restart the container.
That requires flagging every file descriptor opened by lxc-start
with close-on-exec; otherwise, when re-executing ourselves, we
inherit our own fds.
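
For reference, flagging a descriptor this way is a two-call fcntl(2) pattern. A minimal sketch, with set_cloexec() as a hypothetical helper name:

```c
#include <assert.h>
#include <fcntl.h>

/* Set FD_CLOEXEC on fd while preserving its other descriptor flags.
 * Returns 0 on success, -1 with errno set on failure. */
static int set_cloexec(int fd)
{
	int flags;

	assert(fd >= 0);

	flags = fcntl(fd, F_GETFD);
	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}
```

Descriptors flagged like this are closed by the kernel across exec, so the re-executed process starts without them.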

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-04-08 09:44:23 +02:00
Daniel Lezcano
f78a1f32f4 fix when console is not specified
When no console is specified, do not try to setup the console.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-04-02 23:37:42 +02:00
Michel Normand
adc1e6c25d typo in error message
Wrong variable.

Signed-off-by: Michel Normand <normand@fr.ibm.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-03-22 11:08:34 +01:00
Daniel Lezcano
28a4b0e55c open the console later
Open the console at setup time; otherwise the opened
file descriptor will be considered an inherited fd and the
startup will fail.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-03-22 11:08:34 +01:00
Daniel Lezcano
7fef7a06d8 fix network devices cleanup on error
Delete the network devices when an error occurs before they are moved
to the network namespace (network namespace destruction triggers the
network devices deletion). Otherwise they stay in the system.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-02-25 10:24:13 +01:00
Daniel Lezcano
c08556c6ec use lazy umount when umount returns EBUSY
When the umount fails, we force it and make the mount point
inaccessible by using a lazy umount.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-02-24 10:57:43 +01:00
Daniel Lezcano
63376d7db3 allocate a console to be proxied
The current behaviour of the console is messy as:
 * it relies on a heuristic (tty or not, rootfs or not, etc ...)
 * the container init steals the tty and we lose control

The following patch:
 * allocates a tty
 * maps this tty to the container console
 * proxy the io from the console to the file specified in the configuration
 lxc.console=<file>

That allows specifying a file, a fifo, or a $(tty), and can be extended with a
URI like file://mypath, net://1.2.3.4:1234, etc ...
That solves the problem with the heuristic, and the container no longer steals
our current tty.

Note that by default the console output goes to a black hole if no configuration is
specified, so the container shows nothing.

In order to access the console from the tty, use

 lxc-start -n foo -s lxc.console=$(tty)

I propose to make the container daemonize by default now.

I tried the following:

 in a shell:
  touch /var/lib/lxc/foo/console
  tail --retry -f /var/lib/lxc/foo/console
 in another shell:
  lxc-start -n foo -s lxc.console=/var/lib/lxc/foo/console

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-02-24 10:57:43 +01:00
Daniel Lezcano
246541036c rename network type enum
Use a prefixed enum to avoid conflict later.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-02-24 10:57:43 +01:00
Daniel Lezcano
236087a6c8 fix empty network namespace
When there is an empty network namespace, we must not move the
network device.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-02-24 10:57:42 +01:00
Daniel Lezcano
7df119eeae unmount failure is not fatal
There are several cases where the system can no longer access a mount
point or a mount point configuration makes the algorithm bogus.

For example, if we mount something and then chroot, the mount information
will give an inaccessible path and the container won't be able to start
because this mount point is inaccessible. In that case
we can just warn and continue running the container.

Another case is the path to a mount point is not accessible because there
is another mount point on top of it hiding the mount point. So the umount
will fail and the container won't start.

Easy to reproduce:

mkdir -p /tmp/dir1/dir2
mount -t tmpfs tmpfs /tmp/dir1/dir2
mount -t tmpfs tmpfs /tmp/dir1

So we can just ignore the error when unmounting and go over the list again
and again until it stops shrinking.

At the end, we just display the list of the unmounted points.
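
A sketch of that retry loop under stated assumptions: the mount list is a plain array, and the umount call is passed in as a function pointer so the loop can be exercised without real mounts. fake_umount() simulates the dir1/dir2 example above; both names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*umount_fn)(const char *path);

/* Try to unmount every path, repeating passes for as long as at least one
 * unmount succeeds. Unmounted entries are set to NULL. Returns the number
 * of paths that could not be unmounted. */
static size_t umount_all(const char **paths, size_t n, umount_fn do_umount)
{
	size_t remaining = 0, i;
	int progress = 1;

	assert(paths != NULL && do_umount != NULL);

	for (i = 0; i < n; i++)
		if (paths[i] != NULL)
			remaining++;

	while (remaining > 0 && progress) {
		progress = 0;
		for (i = 0; i < n; i++) {
			if (paths[i] == NULL)
				continue;
			if (do_umount(paths[i]) == 0) {
				paths[i] = NULL;
				remaining--;
				progress = 1;
			}
		}
	}
	return remaining;
}

/* Simulated umount: /tmp/dir1/dir2 is hidden until /tmp/dir1 is gone. */
static int dir1_mounted = 1;

static int fake_umount(const char *path)
{
	if (strcmp(path, "/tmp/dir1") == 0) {
		dir1_mounted = 0;
		return 0;
	}
	if (strcmp(path, "/tmp/dir1/dir2") == 0)
		return dir1_mounted ? -1 : 0;
	return -1;
}
```

Whatever is still non-NULL when the loop stops is the list of mount points that can only be reported, not removed.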

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-01-22 11:29:10 +01:00
Michel Normand
b09094da2d Add some define to compile on rhel5u1
The last patch, commit 81810dd120,
broke compilation of lxc on rhel5u1.

Signed-off-by: Michel Normand <normand@fr.ibm.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-01-22 11:21:46 +01:00
Daniel Lezcano
1e11be345d fix tab vs space indentation
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-01-21 15:15:26 +01:00
Daniel Lezcano
81810dd120 drop capabilities
Hello everyone!

I've written a patch which adds a new config keyword
'lxc.cap.drop'. This keyword allows to specify capabilities which are
dropped before executing the container binary.

Example:

lxc.cap.drop = sys_chroot
lxc.cap.drop = mknod
lxc.cap.drop = sys_module

or specify in a single line:

lxc.cap.drop = sys_chroot mknod sys_module

Reworked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Michael Holzt <lxc@my.fqdn.org>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-01-21 15:06:42 +01:00
Clement Calmels
2382ecffdb use getline instead of fgets
The getline function allocates the needed memory. A fixed-size buffer can lead
to 'hard to find' bugs. I didn't test the pivot_root part, but the other
parts are ok.
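
For comparison with a fixed fgets buffer, a small getline loop looks like this. longest_line() is a hypothetical helper written for this note; it shows that a line of any length is read whole, because getline grows the buffer as needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Return the length of the longest line in f, newline included,
 * or -1 on a read error. */
static long longest_line(FILE *f)
{
	char *line = NULL;	/* getline allocates and resizes this */
	size_t cap = 0;
	ssize_t len;
	long longest = 0;

	assert(f != NULL);

	while ((len = getline(&line, &cap, f)) != -1)
		if (len > longest)
			longest = len;

	free(line);		/* one free, however many lines were read */
	return ferror(f) ? -1 : longest;
}
```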

Signed-off-by: Clement Calmels <clement.calmels@fr.ibm.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-01-18 23:08:12 +01:00
Cedric Le Goater
7a7ff0c6fb fix lxc_file_cb prototype
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-01-13 18:51:15 +01:00
Daniel Lezcano
932b94f5de Remove dead code
Remove dead code.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-01-08 15:19:19 +01:00
Michael Holtz
bf601689a9 use pivot_root instead of chroot
lxc currently does a chroot into the target rootfs. chroot is insecure and
can easily be broken, as demonstrated here:

| root@synergy:~# touch /this_is_the_realrootfs_ouch
| # touch /container/webhost/this_is_the_container
| # lxc-start -n webhost /bin/sh
| # ls this*
| this_is_the_container
| # ./breakchroot
| # ls this*
| this_is_the_realrootfs_ouch

code to break chroot taken from
http://www.bpfh.net/simes/computing/chroot-break.html

Now this can be fixed. As our container has its own mount namespace, we can
easily pivot_root into the rootfs and then unmount all old mounts. The attached
patch adds a new config keyword which contains the path to a temporary
mount for the old rootfs (inside the container). This stops the chroot break
method shown before.

Example:

| root@synergy:~# grep pivotdir /var/lib/lxc/webhost/config
| lxc.pivotdir = /oldrootfs
| root@synergy:~# ls -lad /container/webhost/oldrootfs
| drwxr-xr-x 2 root root 4096 2010-01-02 03:59 /container/webhost/oldrootfs
| root@synergy:~# lxc-start -n webhost /bin/sh
| # mount -t proc proc /proc
| # cat /proc/mounts
| rootfs / rootfs rw 0 0
| /dev/root / ext3 rw,relatime,errors=remount-ro,data=writeback 0 0
| devpts /dev/console devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
| proc /proc proc rw,relatime 0 0
| # ls this*   
| this_is_the_container
| # ./breakchroot
| # ls this*
| this_is_the_container

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Michael Holtz <lxc@my.fqdn.org>
2010-01-08 14:34:13 +01:00
Michel Normand
7b379ab3a5 lxc: avoid memory corruption on ppc and s390 V4
conf object is on stack and is used in forked process.

Signed-off-by: Michel Normand <normand@fr.ibm.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2010-01-08 14:34:13 +01:00
Daniel Lezcano
e892973e39 add macvlan vepa and bridge mode
The upcoming kernel 2.6.33 will incorporate the macvlan bridge
mode, where all macvlan devices can communicate if they are
using the same physical interface. This is an interesting feature
for letting containers communicate with each other. From outside the
containers, we have to set up a macvlan on the same physical interface as
the containers and use it to communicate with them.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2009-12-28 22:10:11 +01:00
Daniel Lezcano
1d6b1976a0 fix mount entry typo
Added missing carriage-return when adding a new entry.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2009-12-18 14:19:59 +01:00
Jamal Hadi Salim
f6cc1de1a9 Introduce per netdev priv structure
Some devices like veth or vlans have a bit of extra details that
are specific to them. Example veth.pair and vlan.vlanid.
Separate them from the common so we can update cleanly in the future.
    
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2009-12-15 10:14:27 +01:00
Jamal Hadi Salim
26c390288b Add VLAN support in config
This adds ability to migrate vlan interfaces into namespaces
by specifying them in a config
    
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2009-12-15 10:14:27 +01:00
Sven Wegener
e4e7d59db8 use correct number of ttys during setup
commit 985d15b106 "fix fdleak and errors
in lxc_create_tty()" created a zero-sized malloc(), causing memory
corruption. use config->tty like all the other code does.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2009-11-26 16:46:23 +01:00
Michael Tokarev
6ab9ab6d08 minor cleanups for instanciate_veth()
the same cleanup as in instanciate_macvlan(). Just makes code
shorter and less "jumpy" (as with goto back)

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2009-11-26 16:46:23 +01:00
Michael Tokarev
8634bc197f allow lxc.network.pair to specify host-side name for veth interface
Currently we allocate the veth device with a random name on the host side,
so that things like firewall rules or accounting do not work
at all.  Fix this by recognizing yet another keyword to specify
the host-side device name: lxc.network.pair, and use it instead
of a random name if specified.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2009-11-26 16:46:23 +01:00
Michael Tokarev
985d15b106 fix fdleak and errors in lxc_create_tty()
If, for some reason, openpty() fails, lxc_create_tty() will
leak all previous ptys and leave the config structure in an
inconsistent state (wrt the number of ptys actually opened).
Fix that by explicitly closing all previously opened ptys
in case of failure and by setting the number of actually opened
ttys after the actual open.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2009-11-24 09:47:27 +01:00
Michael Tokarev
d957ae2d51 check if lxc.netdev.link is set for macvlan
Ensure that lxc.netdev.link is specified for macvlan interfaces,
since it's required.

While at it, simplify logic in instanciate_macvlan():
remove unnecessary-complicating goto statements (we only
need to perform a cleanup in one place)

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2009-11-20 15:01:31 +01:00
Michael Tokarev
734915aca1 allow link-less veth devices
Before, a veth device pair required a link, which was treated as
a bridge device.  The code crashed if no lxc.network.link was
specified.  Fix that by allowing lxc.network.link to be unset.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2009-11-20 15:01:31 +01:00
Michael Tokarev
9d0834025e rename struct lxc_netdev fields to match reality
struct lxc_netdev is used to hold information from the config file
about a network device/configuration.  Name the fields of this
structure similarly to the config file keywords,
namely:
 s/ifname/link/ - host-side link for the device (bridge or eth0)
 s/newname/name/ - container-side ifname
It is insane to have completely different names in config file
and in structure/variable names :)

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2009-11-19 15:06:02 +01:00
Cedric Le Goater
00b3c2e284 cleanup <lxc/lxc.h>
<lxc/lxc.h> should only include what is needed. This patch removes
all useless headers from lxc.h and fixes the other .c files.

Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2009-11-17 10:56:23 +01:00
Michel Normand
e7938e9ee3 lxc: add a new lxc.mount.entry keyword
The purpose of this new keyword is to save in the main config file
all the lines of a provided fstab file.
This will ultimately replace the lxc.mount keyword
once the lxc scripts use the new keyword.

Warning: I did not validate this patch
against all forms of malformed input strings.

Signed-off-by: Michel Normand <michel_mno@laposte.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2009-11-13 11:48:29 +01:00
Michel Normand
88329c69cd stop config reading if cgroup setting failed
In the current code, lxc-start does not stop if setup_cgroup detects an error.

Signed-off-by: Michel Normand <michel_mno@laposte.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
2009-11-13 11:48:29 +01:00