mirror of
https://git.proxmox.com/git/mirror_ubuntu-kernels.git
synced 2025-11-15 22:43:55 +00:00
When a process is duplicated, but the child shares the address space with the parent, there is potential for the threads sharing a single stack to cause conflicts for each other. In the normal non-CET case this is handled in two ways. With regular CLONE_VM a new stack is provided by userspace such that the parent and child have different stacks. For vfork, the parent is suspended until the child exits. So as long as the child doesn't return from the vfork()/CLONE_VFORK calling function and sticks to a limited set of operations, the parent and child can share the same stack.

For shadow stack, these scenarios present similar sharing problems. For the CLONE_VM case, the child and the parent must have separate shadow stacks. Instead of changing clone to take a shadow stack, have the kernel just allocate one and switch to it.

Use stack_size passed from the clone3() syscall for the thread shadow stack size. A compat-mode thread shadow stack size is further reduced to 1/4. This allows more threads to run in a 32-bit address space. clone() does not pass stack_size, which was added in clone3(). In that case, use the RLIMIT_STACK size and cap it to 4 GB.

For shadow stack enabled vfork(), the parent and child can share the same shadow stack, like they can share a normal stack. Since the parent is suspended until the child terminates, the child will not interfere with the parent while executing as long as it doesn't return from the vfork() and overwrite up the shadow stack. The child can safely overwrite down the shadow stack, as the parent can just overwrite this later. So CET does not add any additional limitations for vfork().

Free the shadow stack on thread exit by doing it in mm_release(). Skip this when exiting a vfork() child since the stack is shared in the parent.

During this operation, the shadow stack pointer of the new thread needs to be updated to point to the newly allocated shadow stack.
Since the ability to do this is confined to the FPU subsystem, change fpu_clone() to take the new shadow stack pointer, and update it internally inside the FPU subsystem. This part was suggested by Thomas Gleixner.

Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/all/20230613001108.3040476-30-rick.p.edgecombe%40intel.com
70 lines
2.1 KiB
C
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_X86_FPU_SCHED_H
#define _ASM_X86_FPU_SCHED_H

#include <linux/sched.h>

#include <asm/cpufeature.h>
#include <asm/fpu/types.h>

#include <asm/trace/fpu.h>

extern void save_fpregs_to_fpstate(struct fpu *fpu);
extern void fpu__drop(struct fpu *fpu);
extern int  fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
		      unsigned long shstk_addr);
extern void fpu_flush_thread(void);

/*
 * FPU state switching for scheduling.
 *
 * This is a two-stage process:
 *
 *  - switch_fpu_prepare() saves the old state.
 *    This is done within the context of the old process.
 *
 *  - switch_fpu_finish() sets TIF_NEED_FPU_LOAD; the floating point state
 *    will get loaded on return to userspace, or when the kernel needs it.
 *
 * If TIF_NEED_FPU_LOAD is cleared then the CPU's FPU registers
 * are saved in the current thread's FPU register state.
 *
 * If TIF_NEED_FPU_LOAD is set then CPU's FPU registers may not
 * hold current()'s FPU registers. It is required to load the
 * registers before returning to userland or using the content
 * otherwise.
 *
 * The FPU context is only stored/restored for a user task and
 * PF_KTHREAD is used to distinguish between kernel and user threads.
 */
static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
{
	if (cpu_feature_enabled(X86_FEATURE_FPU) &&
	    !(current->flags & (PF_KTHREAD | PF_USER_WORKER))) {
		save_fpregs_to_fpstate(old_fpu);
		/*
		 * The save operation preserved register state, so the
		 * fpu_fpregs_owner_ctx is still @old_fpu. Store the
		 * current CPU number in @old_fpu, so the next return
		 * to user space can avoid the FPU register restore
		 * when it returns on the same CPU and still owns the
		 * context.
		 */
		old_fpu->last_cpu = cpu;

		trace_x86_fpu_regs_deactivated(old_fpu);
	}
}

/*
 * Delay loading of the complete FPU state until the return to userland.
 * PKRU is handled separately.
 */
static inline void switch_fpu_finish(void)
{
	if (cpu_feature_enabled(X86_FEATURE_FPU))
		set_thread_flag(TIF_NEED_FPU_LOAD);
}

#endif /* _ASM_X86_FPU_SCHED_H */