The seccomp notify API has a few variables: The struct sizes
are queried at runtime, and we now also have a user
configured cookie.
This means that with a SOCK_STREAM connection the proxy
needs to carefully read() the right amount of data based on
the contents of our proxy message struct to avoid ending up
in the middle of a packet.
While for now this may not be too tragic, since we currently
only ever send a single packet and then wait for the
response, we may at some point want to be able to handle
multiple processes simultaneously, hence it makes sense to
switch to a packet based connection.
So switch to using SOCK_SEQPACKET which is packet based,
(and also guarantees ordering). The `MSG_PEEK` flag can be
used with `recvmsg()` to figure out a packet's size on the
other end, and usually the size *should* not change after
that for an existing connection from a running container.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
The previous API doesn't reflect the fact that
`seccomp_notif` and `seccomp_notif_resp` are allocatd
dynamically with sizes figured out at runtime.
We now query the sizes via the seccomp(2) syscall and change
`struct seccomp_notify_proxy_msg` to contain the sizes
instead of the data, with the data following afterwards.
Additionally it did not provide a convenient way to identify
the container the message originated from, for which we now
include a cookie configured via `lxc.seccomp.notify.cookie`.
Since we currently always send exactly one request and await
the response immediately, verify the `id` in the client's
response.
Finally, the proxy message's "version" field is removed, and
we reserve 64 bits in its place.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
This restores the lxc.net.x.ipv4.gateway = auto and
lxc.net.x.ipv6.gateway = auto functionality.
When the child is created the parent and child have different views of
struct lxc_handler since - obviously - virtual memory is duplicated. So any
changes to done by the parent that the child should see need to be IPCed to it.
For any non-actual device creation stuff this does not make much sense. This
includes finding gateway addresses. Move it back prior to clone().
Fixes#3078
Signed-off-by: Thomas Parrott <thomas.parrott@canonical.com>
[christian.brauner@ubuntu.com: non-functional changes and update commit message]
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Make sure that network creation happens at the same time for containers started
by privileged and unprivileged users. The only reason we didn't do this so far
was to avoid sending network device ifindices around in the privileged case.
Link: https://github.com/lxc/lxc/issues/3066
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>