LXC has supported the bpf device controlller for a while now. A bpf device
program can be attached to the container's cgroup if this is a pure cgroup2
host.
The format for specifying device rules for the cgroup2 bpf device controller is
the same as for the legacy cgroup device controller; only the configuration key
prefix has to change. Specifically, device rules for the legacy cgroup device
controller are specified by via lxc.cgroup.devices.{allow,deny} whereas for the
cgroup2 bpf device controller lxc.cgroup2.devices.{allow,deny} must be used.
The following semantics apply:
1. The device rule "lxc.cgroup2.devices.deny = a" will cause LXC to instruct
the kernel to block access to all devices by default. To grant access to
devices "allow device rules" must be added via the
"lxc.cgroup2.devices.allow" key. This is referred to as a "allowlist" device
program.
2. The device rule "lxc.cgroup2.devices.allow = a" will cause LXC to instruct
the kernel to allow access to all devices by default. To deny access to
devices "deny device rules" must be added via "lxc.cgroup2.devices.deny"
key. This is referred to as a "denylist" device program.
3. Specifying a rule as explained in 1. or 2. will cause all previous rules to
be cleared, i.e. the device list will be reset.
For example the set of rules:
lxc.cgroup2.devices.deny = a
lxc.cgroup2.devices.allow = c *:* m
lxc.cgroup2.devices.allow = b *:* m
lxc.cgroup2.devices.allow = c 1:3 rwm
implements a "allowlist" device program, i.e. the kernel will block access to
all devices not specifically allowed in this list. This particular program
states that all character and block devices might be created but only /dev/null
might be read or written.
If we to switch to the set of rules to:
lxc.cgroup2.devices.allow = a
lxc.cgroup2.devices.deny = c *:* m
lxc.cgroup2.devices.deny = b *:* m
lxc.cgroup2.devices.deny = c 1:3 rwm
then LXC would instruct the kernel to implement a "denylist", i.e. the kernel
will allow access to all devices not specifically denied in this list. This
particular program states that no character devices or block devices might be
created and that /dev/null is not allow allowed to be read, written, or
created.
Consider the same program but followed by a rule as explained in 1. or 2.:
lxc.cgroup2.devices.allow = a
lxc.cgroup2.devices.deny = c *:* m
lxc.cgroup2.devices.deny = b *:* m
lxc.cgroup2.devices.deny = c 1:3 rwm
lxc.cgroup2.devices.allow = a
The last line will cause LXC to reset the device list without changing the type
of device program.
lxc.cgroup2.devices.allow = a
lxc.cgroup2.devices.deny = c *:* m
lxc.cgroup2.devices.deny = b *:* m
lxc.cgroup2.devices.deny = c 1:3 rwm
lxc.cgroup2.devices.deny = a
The last line will cause LXC to reset the device list and switch from a
"allowlist" program to a "denylist" program.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>