RW bind mounts need to be restricted for some paths in
order to avoid MAC restriction bypasses, but read-only bind
mounts shouldn't have that problem.
Additionally, combinations of 'nosuid', 'nodev' and
'noexec' flags shouldn't be a problem either and are
required with newer systemd versions, so let's allow those
as long as they're combined with 'ro,remount,bind'.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
For generated profiles with apparmor namespaces we get
profile names with slashes in them. To match those, we need
to allow changing to lxc-**, not just lxc-*.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
openSUSE Leap 15 is using --libdir=/usr/lib64 when building for
x86_64 so we need to allow this path in the apparmor profiles.
Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1099239
Signed-off-by: Markos Chandras <mchandras@suse.de>
- remove legacy binaries
- conditionalize creation of docs and tests for the command line tools and the
shared library helper commands
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
This patch allows users to start containers in AppArmor namespaces.
Users can define their own profiles for their containers, but
lxc-start must be allowed to change to a namespace.
A container configuration file can wrap a container in an AppArmor
profile using lxc.aa_profile.
A process in an AppArmor namespace is restricted to view
or manage only the profiles belonging to this namespace, as if no
other profiles existed. A namespace can be created as follow:
sudo mkdir /sys/kernel/security/apparmor/policy/namespaces/$NAMESPACE
AppArmor can stack profiles so that the contained process is bound
by the intersection of all profiles of the stack. This is achieved
using the '//&' operator as follow:
lxc.aa_profile = $PROFILE//&:$NAMESPACE://unconfined
In this case, even the guest process appears unconfined in the
namespace, it is still confined by $PROFILE.
A guest allowed to access "/sys/kernel/security/apparmor/** rwklix,"
will be able to manage its own profile set, while still being
enclosed in the topmost profile $PROFILE:
Different guests can be assigned the same namespace or different
namespaces. In the first case, they will share their profiles.
In the second case, they will have distinct sets of profiles.
This is validated on privileged containers.
Signed-off-by: Frédéric Dalleau <frederic.dalleau@collabora.com>
The profile already contains
mount options=(rw, make-slave) -> **,
Which allows going through all mountpoints with make-slave,
so it seems to make sense to also allow the directly
recursive variant with "make-rslave".
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Properly list all of the states and the right apparmor stanza for them,
then comment them all as actually enabling this would currently let the
user bypass apparmor entirely.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Bind-mounts aren't harmful in containers, so long as they're not used to
bypass MAC policies.
This change allows bind-mounting of any path which isn't a dangerous
filesystem that's otherwise blocked by apparmor.
This also allows switching paths {r}shared or {r}private.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Prevent privileged containers from messing with the host's pci devices
directly. Refuse access under /proc/bus, and drop cap_sys_rawio. Some
containers may need to re-enable cap_sys_rawio (i.e. if they run an
X server).
It may be desirable to break some of this stuff into files which can be
separately included (or not included), but this patch isn't the right
place for that.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Unprivileged containers cannot read it anyway, but also prevent root
owned containers from doing so. Sadly upstart's mountall won't run
if we try to prevent it from being mounted at all.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This reverts commit 833bf9c2b2.
This change wasn't actually safe and is now superseded by the cgns profile.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
This isn't safe for privileged containers which do not use cgroup
namespaces, but is required for systemd containers with cgroup
namespaces. So create a new profile for it which lxc will use as
the default when it knows it can.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Some systems need to be able to bind-mount /run to /var/run
and /run/lock to /var/run/lock. (Tested with opensuse 13.1
containers migrated from openvz.)
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
This fixes invocations of certain commands when python3 is installed in
a nonstandard path (/usr/local/bin, for example).
Signed-off-by: Fox Wilson <2016fwilson@tjhsst.edu>
Newer kernels have added a new restriction: if /proc or /sys on the
host has files or non-empty directories which are over-mounted, and
there is no /proc which fully visible, then it assumes there is a
"security" reason for this. It prevents anyone in a non-initial user
namespace from creating a new proc or sysfs mount.
To work around this, this patch adds a new 'nesting.conf' which can be
lxc.include'd from a container configuration file. It adds a
non-overmounted mount of /proc and /sys under /dev/.lxc, so that the
kernel can see that we're not trying to *hide* things like /proc/uptime.
and /sys/devices/virtual/net. If the host adds this to the config file
for container w1, then container w1 will support unprivileged child
containers.
The nesting.conf file also sets the apparmor profile to the with-nesting
variant, since that is required anyway. This actually means that
supporting nesting isn't really more work than it used to be, just
different. Instead of adding
lxc.aa_profile = lxc-container-default-with-nesting
you now just need to
lxc.include = /usr/share/lxc/config/nesting.conf
(Look, fewer characters :)
Finally, in order to maintain the current apparmor protections on
proc and sys, we make /dev/.lxc/{proc,sys} non-read/writeable.
We don't need to be able to use them, we're just showing the
kernel what's what.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Because we now create the ttys from inside the container, we had to
add an apparmor rule for start-container to bind-mount /dev/pts/** -> /dev/tty*/.
However that's not sufficient if the container sets lxc.ttydir, in
which case we need to support mounting onto files in subdirs of /dev.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Lxc has always created the ptys for use by console and ttys early
on from the monitor process. This has some advantages, but also
has disadvantages, namely (1) container ptys counting against the
max ptys for the host, and (2) not having a /dev/pts/N in the
container to pass to getty. (2) was not a problem for us historically
because we bind-mounted the host's /dev/pts/N onto a /dev/ttyN in
the container. However, systemd hardocdes a check for container_ttys
that the path have 'pts/' in it. If it were only for (2) I'd have
opted for a systemd patch to check the device major number, but (1)
made it worth moving the openpty to the container namespace.
So this patch moves the tty creation into the task which becomes
the container init. It then passes the fds for the opened ptys
back to the monitor over a unix socketpair (for use by lxc-console).
The /dev/console is still created in the monitor process, so that
it can for instance be used by lxc.logfd.
So now if you have a foreground container with lxc.tty = 4, you
should end up with one host /dev/pts entry per container rather than 5.
And lxc-console now works with systemd containers.
Note that if the container init mounts its own devpts over the
one mounted by lxc, the tty /dev/pts/n will be hidden. This is ok
since it's only systemd that needs it, and systemd won't do that.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Just like we block access to mem and kmem, there's no good reason for
the container to have access to kcore.
Reported-by: Marc Schaefer
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Restrict signal and ptrace for processes running under the container
profile. Rules based on AppArmor base abstraction. Add unix rules for
processes running under the container profile.
Signed-off-by: Jamie Strandboge <jamie@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Without this, if the system uses shared subtrees by default (like systemd), you
get a large stream of
lxc-start: Permission denied - Failed to make /<mountpoint> rslave
lxc-start: Continuing...
with
apparmor="DENIED" operation="mount" info="failed flags match" error=-13
profile="/usr/bin/lxc-start" name="/" pid=17284 comm="lxc-start" flags="rw, slave"
and eventual failure plus a lot of leftover mounts in the host.
https://launchpad.net/bugs/1325468
/proc/sys/kernel/sem* and /proc/sys/kernel/msg* are ipc sysctls
which are properly namespaced. Allow writes to them from
containers.
Reported-by: Dan Kegel <dank@kegel.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Recent fixes in the apparmor kernel code is now making at least the CI
environment and quite possibly some others fail due to an invalid path
in the pivot_root stanza.
So update both lines to allow a more generic pivot_root call for
anything in LXC's work directory.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Allow writes to kernel.shm*, net.*, kernel/domainname and
kernel/hostname,
Also fix a bug in the lxc-generate-aa-rules.py script in a
path which wasn't being exercised before, which returned a
path element rather than its child.
Changelog (v2): remove trailing / from block path
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
This uses the generate-apparmor-rules.py script I sent out some time
ago to auto-generate apparmor rules based on a higher level set of
block/allow rules.
Add apparmor policy testcase to make sure that some of the paths we
expect to be denied (and allowed) write access to are in fact in
effect in the final policy.
With this policy, libvirt in a container is able to start its
default network, which previously it could not.
v2: address feedback from stgraber
put lxc-generate-aa-rules.py into EXTRA_DIST
add lxc-test-apparmor, container-base and container-rules to .gitignore
take lxc-test-apparmor out of EXTRA_DIST
make lxc-generate-aa-rules.py pep8-compliant
don't automatically generate apparmor rules
This is only bc we can't be guaranteed that python3 will be
available.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Leave the line to do it (commented out) as some users may not be
using cgmanager, and may in fact still need those mounts.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>