mirror of
https://git.proxmox.com/git/mirror_lxc
synced 2025-08-05 06:19:25 +00:00
lxc_clone: pass non-stack allocated stack to clone
There are two problems with this code: 1. The math is wrong. We allocate a char *foo[__LXC_STACK_SIZE]; which means it's really sizeof(char *) * __LXC_STACK_SIZE, instead of just __LXC_STACK SIZE. 2. We can't actually allocate it on our stack. When we use CLONE_VM (which we do in the shared ns case) that means that the new thread is just running one page lower on the stack, but anything that allocates a page on the stack may clobber data. This is a pretty short race window since we just do the shared ns stuff and then do a clone without CLONE_VM. However, it does point out an interesting possible privilege escalation if things aren't configured correctly: do_share_ns() sets up namespaces while it shares the address space of the task that spawned it; once it enters the pid ns of the thing it's sharing with, the thing it's sharing with can ptrace it and write stuff into the host's address space. Since the function that does the clone() is lxc_spawn(), it has a struct cgroup_ops* on the stack, which itself has function pointers called later in the function, so it's possible to allocate shellcode in the address space of the host and run it fairly easily. ASLR doesn't mitigate this since we know exactly the stack offsets; however this patch has the kernel allocate a new stack, which will help. Of course, the attacker could just check /proc/pid/maps to find the location of the stack, but they'd still have to guess where to write stuff in. The thing that does prevent this is the default configuration of apparmor. Since the apparmor profile is set in the second clone, and apparmor prevents ptracing things under a different profile, attackers confined by apparmor can't do this. However, if users are using a custom configuration with shared namespaces, care must be taken to avoid this race. Shared namespaces aren't widely used now, so perhaps this isn't a problem, but with the advent of crio-lxc for k8s, this functionality will be used more. Signed-off-by: Tycho Andersen <tycho@tycho.ws>
This commit is contained in:
parent
7aea50feb9
commit
c74e921744
@ -56,19 +56,23 @@ static int do_clone(void *arg)
|
||||
#define __LXC_STACK_SIZE 4096
|
||||
pid_t lxc_clone(int (*fn)(void *), void *arg, int flags, int *pidfd)
|
||||
{
|
||||
size_t stack_size;
|
||||
pid_t ret;
|
||||
struct clone_arg clone_arg = {
|
||||
.fn = fn,
|
||||
.arg = arg,
|
||||
};
|
||||
char *stack[__LXC_STACK_SIZE] = {0};
|
||||
stack_size = __LXC_STACK_SIZE;
|
||||
void *stack;
|
||||
|
||||
stack = malloc(__LXC_STACK_SIZE);
|
||||
if (!stack) {
|
||||
SYSERROR("Failed to allocate clone stack");
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
#ifdef __ia64__
|
||||
ret = __clone2(do_clone, stack, stack_size, flags | SIGCHLD, &clone_arg, pidfd);
|
||||
ret = __clone2(fn, stack, __LXC_STACK_SIZE, flags | SIGCHLD, &clone_arg, pidfd);
|
||||
#else
|
||||
ret = clone(do_clone, stack + stack_size, flags | SIGCHLD, &clone_arg, pidfd);
|
||||
ret = clone(fn, stack + __LXC_STACK_SIZE, flags | SIGCHLD, &clone_arg, pidfd);
|
||||
#endif
|
||||
if (ret < 0)
|
||||
SYSERROR("Failed to clone (%#x)", flags);
|
||||
|
Loading…
Reference in New Issue
Block a user