Commit Graph

282 Commits

Author SHA1 Message Date
Chang S. Bae
471f0aa7fa x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly
When an extended state component is not present in fpstate, but in init
state, the function copies from init_fpstate via copy_feature().

But, dynamic states are not present in init_fpstate because of all-zeros
init states. Then retrieving them from init_fpstate will explode like this:

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 ...
 RIP: 0010:memcpy_erms+0x6/0x10
  ? __copy_xstate_to_uabi_buf+0x381/0x870
  fpu_copy_guest_fpstate_to_uabi+0x28/0x80
  kvm_arch_vcpu_ioctl+0x14c/0x1460 [kvm]
  ? __this_cpu_preempt_check+0x13/0x20
  ? vmx_vcpu_put+0x2e/0x260 [kvm_intel]
  kvm_vcpu_ioctl+0xea/0x6b0 [kvm]
  ? kvm_vcpu_ioctl+0xea/0x6b0 [kvm]
  ? __fget_light+0xd4/0x130
  __x64_sys_ioctl+0xe3/0x910
  ? debug_smp_processor_id+0x17/0x20
  ? fpregs_assert_state_consistent+0x27/0x50
  do_syscall_64+0x3f/0x90
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Adjust the 'mask' to zero out the userspace buffer for the features that
are not available both from fpstate and from init_fpstate.

The dynamic features depend on the compacted XSAVE format. Ensure it is
enabled before reading XCOMP_BV in init_fpstate.

Fixes: 2308ee57d9 ("x86/fpu/amx: Enable the AMX feature in 64-bit mode")
Reported-by: Yuan Yao <yuan.yao@intel.com>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Yuan Yao <yuan.yao@intel.com>
Link: https://lore.kernel.org/lkml/BYAPR11MB3717EDEF2351C958F2C86EED95259@BYAPR11MB3717.namprd11.prod.outlook.com/
Link: https://lkml.kernel.org/r/20221021185844.13472-1-chang.seok.bae@intel.com
2022-10-21 15:22:09 -07:00
Chang S. Bae
a401f45e38 x86/fpu: Exclude dynamic states from init_fpstate
== Background ==

The XSTATE init code initializes all enabled and supported components.
Then, the init states are saved in the init_fpstate buffer that is
statically allocated in about one page.

The AMX TILE_DATA state is large (8KB) but its init state is zero. And the
feature comes only with the compacted format with these established
dependencies: AMX->XFD->XSAVES. So this state is excludable from
init_fpstate.

== Problem ==

But the buffer is formatted to include that large state. Then, this can be
the cause of a noisy splat like the below.

This came from XRSTORS for the task with init_fpstate in its XSAVE buffer.
It is reproducible on AMX systems when the running kernel is built with
CONFIG_DEBUG_PAGEALLOC=y and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y:

 Bad FPU state detected at restore_fpregs_from_fpstate+0x57/0xd0, reinitializing FPU registers.
 ...
 RIP: 0010:restore_fpregs_from_fpstate+0x57/0xd0
  ? restore_fpregs_from_fpstate+0x45/0xd0
  switch_fpu_return+0x4e/0xe0
  exit_to_user_mode_prepare+0x17b/0x1b0
  syscall_exit_to_user_mode+0x29/0x40
  do_syscall_64+0x67/0x80
  ? do_syscall_64+0x67/0x80
  ? exc_page_fault+0x86/0x180
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

== Solution ==

Adjust init_fpstate to exclude dynamic states. XRSTORS from init_fpstate
still initializes those states when their bits are set in the
requested-feature bitmap.

Fixes: 2308ee57d9 ("x86/fpu/amx: Enable the AMX feature in 64-bit mode")
Reported-by: Lin X Wang <lin.x.wang@intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Lin X Wang <lin.x.wang@intel.com>
Link: https://lore.kernel.org/r/20220824191223.1248-4-chang.seok.bae@intel.com
2022-10-17 15:44:25 +02:00
Chang S. Bae
d3e021adac x86/fpu: Fix the init_fpstate size check with the actual size
The init_fpstate buffer is statically allocated. Thus, the sanity test was
established to check whether the pre-allocated buffer is enough for the
calculated size or not.

The currently measured size is not strictly relevant. Fix to validate the
calculated init_fpstate size with the pre-allocated area.

Also, replace the sanity check function with open code for clarity. The
abstraction itself and the function naming do not tend to represent simply
what it does.

Fixes: 2ae996e0c1 ("x86/fpu: Calculate the default sizes independently")
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220824191223.1248-3-chang.seok.bae@intel.com
2022-10-17 15:44:25 +02:00
Chang S. Bae
c32d7cab57 x86/fpu: Configure init_fpstate attributes orderly
The init_fpstate setup code is spread out and out of order. The init image
is recorded before its scoped features and the buffer size are determined.

Determine the scope of init_fpstate components and its size before
recording the init state. Also move the relevant code together.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: neelnatu@google.com
Link: https://lore.kernel.org/r/20220824191223.1248-2-chang.seok.bae@intel.com
2022-10-17 15:44:25 +02:00
Linus Torvalds
e36ae2290f - Add support for XSAVEC - the Compacted XSTATE saving variant - and
thus allow for guests to use this compacted XSTATE variant when the
 hypervisor exports that support
 
 - A variable shadowing cleanup
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmKLsPQACgkQEsHwGGHe
 VUoA7hAAoAP6qWntADHcDcA8QMjX9fvOi3uFjiJyGeiYCRH2rmwAAg8Y0DdI/1UE
 Wq+7tzTPdyDPulqaEe9PV7f3HRY72cGA/2jdkMxkGG5mGZfVganb0OWgFXecdo6r
 CIWf9vMOPwULIT4XvcnaWF6fv+1ZbFZOks9NpxZQZTYA3WQhozgfQOWlkoFFSdC/
 pIwWFCUOv/pBPWVSeizE/Y6Yfuaix3KiElwk9NMDTPCRhyBd6VmpkpcBer+n3JUA
 HoppbGLYonZEw1PkMmTlQJuFHKJzqwThGGoVY3FDtlAMD4+vmGt1vXNbLlfvtqup
 zYHAIG/hqql7Ai9bgXSC2ccYG9v1op+gIFzKTBhI7FkVwEc6R6JtV7uGF7GAr6SL
 KPnweo9GCoRmnc6Ju0+IuT0JIMXjO3iQIC0J3uLX8gCbsXVM29qdqhkYcLC75vOc
 sXjAUrdolkDIRXzwkJURTxWT/yeKaN9n8r1s7BCmZ7Pg6zZS3/K1nHQkFTWCjSfA
 oEy7GmEeI2uFgQX9qpF7NRlNj+D3AxV6W5IURCTI7GsP32e20jhOdU4AyrqsTy2N
 8PgUVP9baioUpjY6BKsMc3JiR0ihb0OM3wX9fThu8lu5uHE9Oar+S4OOlFtxPXth
 kG7pIS0MqB4N6aKWDFxvLvlUVgAxSqSmnWL4rQSP+Ralu9CY4k0=
 =eDaz
 -----END PGP SIGNATURE-----

Merge tag 'x86_fpu_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fpu updates from Borislav Petkov:

 - Add support for XSAVEC - the Compacted XSTATE saving variant - and
   thus allow for guests to use this compacted XSTATE variant when the
   hypervisor exports that support

 - A variable shadowing cleanup

* tag 'x86_fpu_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Cleanup variable shadowing
  x86/fpu/xsave: Support XSAVEC in the kernel
2022-05-23 18:49:16 -07:00
Thomas Gleixner
f5c0b4f304 x86/prctl: Remove pointless task argument
The functions invoked via do_arch_prctl_common() can only operate on
the current task and none of these function uses the task argument.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/87lev7vtxj.ffs@tglx
2022-05-13 12:56:28 +02:00
Thomas Gleixner
b91c0922bf x86/fpu: Cleanup variable shadowing
Addresses: warning: Local variable 'mask' shadows outer variable

Remove extra variable declaration and switch the bit mask assignment to use
BIT_ULL() while at it.

Fixes: 522e92743b ("x86/fpu: Deduplicate copy_uabi_from_user/kernel_to_xstate()")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/202204262032.jFYKit5j-lkp@intel.com
2022-05-02 09:28:31 +02:00
Thomas Gleixner
8ad7e8f696 x86/fpu/xsave: Support XSAVEC in the kernel
XSAVEC is the user space counterpart of XSAVES which cannot save supervisor
state. In virtualization scenarios the hypervisor does not expose XSAVES
but XSAVEC to the guest, though the kernel does not make use of it.

That's unfortunate because XSAVEC uses the compacted format of saving the
XSTATE. This is more efficient in terms of storage space vs. XSAVE[OPT] as
it does not create holes for XSTATE components which are not supported or
enabled by the kernel but are available in hardware. There is room for
further optimizations when XSAVEC/S and XGETBV1 are supported.

In order to support XSAVEC:

 - Define the XSAVEC ASM macro as it's not yet supported by the required
   minimal toolchain.

 - Create a software defined X86_FEATURE_XCOMPACTED to select the compacted
   XSTATE buffer format for both XSAVEC and XSAVES.

 - Make XSAVEC an option in the 'XSAVE' ASM alternatives

Requested-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220404104820.598704095@linutronix.de
2022-04-25 15:05:37 +02:00
Thomas Gleixner
d6d6d50f1e x86/fpu/xstate: Consolidate size calculations
Use the offset calculation to do the size calculation which avoids yet
another series of CPUID instructions for each invocation.

  [ Fix the FP/SSE only case which missed to take the xstate
    header into account, as
    Reported-by: kernel test robot <oliver.sang@intel.com> ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/87o81pgbp2.ffs@tglx
2022-03-30 12:04:09 +02:00
Thomas Gleixner
781c64bfcb x86/fpu/xstate: Handle supervisor states in XSTATE permissions
The size calculation in __xstate_request_perm() fails to take supervisor
states into account because the permission bitmap is only relevant for user
states.

Up to 5.17 this does not matter because there are no supervisor states
supported, but the (re-)enabling of ENQCMD makes them available.

Fixes: 7c1ef59145 ("x86/cpufeatures: Re-enable ENQCMD")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220324134623.681768598@linutronix.de
2022-03-30 11:35:22 +02:00
Thomas Gleixner
7aa5128b5f x86/fpu/xsave: Handle compacted offsets correctly with supervisor states
So far the cached fixed compacted offsets worked, but with (re-)enabling
of ENQCMD this does no longer work with KVM fpstate.

KVM does not have supervisor features enabled for the guest FPU, which
means that KVM has then a different XSAVE area layout than the host FPU
state. This in turn breaks the copy from/to UABI functions when invoked for
a guest state.

Remove the pre-calculated compacted offsets and calculate the offset
of each component at runtime based on the XCOMP_BV field in the XSAVE
header.

The runtime overhead is not interesting because these copy from/to UABI
functions are not used in critical fast paths. KVM uses them to save and
restore FPU state during migration. The host uses them for ptrace and for
the slow path of 32bit signal handling.

Fixes: 7c1ef59145 ("x86/cpufeatures: Re-enable ENQCMD")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220324134623.627636809@linutronix.de
2022-03-30 11:20:36 +02:00
Thomas Gleixner
6afbb58cc2 x86/fpu: Cache xfeature flags from CPUID
In preparation for runtime calculation of XSAVE offsets cache the feature
flags for each XSTATE component during feature enumeration via CPUID(0xD).

EDX has two relevant bits:
    0	Supervisor component
    1	Feature storage must be 64 byte aligned

These bits are currently only evaluated during init, but the alignment bit
must be cached to make runtime calculation of XSAVE offsets efficient.

Cache the full EDX content and use it for the existing alignment and
supervisor checks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220324134623.573656209@linutronix.de
2022-03-30 11:05:32 +02:00
Thomas Gleixner
35a77d4503 x86/fpu/xsave: Initialize offset/size cache early
Reading XSTATE feature information from CPUID over and over does not make
sense. The information has to be cached anyway, so it can be done early.

Prepare for runtime calculation of XSTATE offsets and allow
consolidation of the size calculation functions in a later step.

Rename the function while at it as it does not setup any features.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220324134623.519411939@linutronix.de
2022-03-30 10:55:44 +02:00
Thomas Gleixner
d47f71f6de x86/fpu: Remove unused supervisor only offsets
No users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220324134623.465066249@linutronix.de
2022-03-30 10:44:51 +02:00
Yang Zhong
063452fd94 x86/fpu/xstate: Fix the ARCH_REQ_XCOMP_PERM implementation
ARCH_REQ_XCOMP_PERM is supposed to add the requested feature to the
permission bitmap of thread_group_leader()->fpu. But the code overwrites
the bitmap with the requested feature bit only rather than adding it.

Fix the code to add the requested feature bit to the master bitmask.

Fixes: db8268df09 ("x86/arch_prctl: Add controls for dynamic XSTATE components")
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <bonzini@gnu.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220129173647.27981-2-chang.seok.bae@intel.com
2022-03-23 21:28:34 +01:00
Leonardo Bras
ad856280dd x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
During host/guest switch (like in kvm_arch_vcpu_ioctl_run()), the kernel
swaps the fpu between host/guest contexts, by using fpu_swap_kvm_fpstate().

When xsave feature is available, the fpu swap is done by:
- xsave(s) instruction, with guest's fpstate->xfeatures as mask, is used
  to store the current state of the fpu registers to a buffer.
- xrstor(s) instruction, with (fpu_kernel_cfg.max_features &
  XFEATURE_MASK_FPSTATE) as mask, is used to put the buffer into fpu regs.

For xsave(s) the mask is used to limit what parts of the fpu regs will
be copied to the buffer. Likewise on xrstor(s), the mask is used to
limit what parts of the fpu regs will be changed.

The mask for xsave(s), the guest's fpstate->xfeatures, is defined on
kvm_arch_vcpu_create(), which (in summary) sets it to all features
supported by the cpu which are enabled on kernel config.

This means that xsave(s) will save to guest buffer all the fpu regs
contents the cpu has enabled when the guest is paused, even if they
are not used.

This would not be an issue, if xrstor(s) would also do that.

xrstor(s)'s mask for host/guest swap is basically every valid feature
contained in kernel config, except XFEATURE_MASK_PKRU.
Accordingto kernel src, it is instead switched in switch_to() and
flush_thread().

Then, the following happens with a host supporting PKRU starts a
guest that does not support it:
1 - Host has XFEATURE_MASK_PKRU set. 1st switch to guest,
2 - xsave(s) fpu regs to host fpustate (buffer has XFEATURE_MASK_PKRU)
3 - xrstor(s) guest fpustate to fpu regs (fpu regs have XFEATURE_MASK_PKRU)
4 - guest runs, then switch back to host,
5 - xsave(s) fpu regs to guest fpstate (buffer now have XFEATURE_MASK_PKRU)
6 - xrstor(s) host fpstate to fpu regs.
7 - kvm_vcpu_ioctl_x86_get_xsave() copy guest fpstate to userspace (with
    XFEATURE_MASK_PKRU, which should not be supported by guest vcpu)

On 5, even though the guest does not support PKRU, it does have the flag
set on guest fpstate, which is transferred to userspace via vcpu ioctl
KVM_GET_XSAVE.

This becomes a problem when the user decides on migrating the above guest
to another machine that does not support PKRU: the new host restores
guest's fpu regs to as they were before (xrstor(s)), but since the new
host don't support PKRU, a general-protection exception ocurs in xrstor(s)
and that crashes the guest.

This can be solved by making the guest's fpstate->user_xfeatures hold
a copy of guest_supported_xcr0. This way, on 7 the only flags copied to
userspace will be the ones compatible to guest requirements, and thus
there will be no issue during migration.

As a bonus, it will also fail if userspace tries to set fpu features
(with the KVM_SET_XSAVE ioctl) that are not compatible to the guest
configuration.  Such features will never be returned by KVM_GET_XSAVE
or KVM_GET_XSAVE2.

Also, since kvm_vcpu_after_set_cpuid() now sets fpstate->user_xfeatures,
there is not need to set it in kvm_check_cpuid(). So, change
fpstate_realloc() so it does not touch fpstate->user_xfeatures if a
non-NULL guest_fpu is passed, which is the case when kvm_check_cpuid()
calls it.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Message-Id: <20220217053028.96432-2-leobras@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-17 10:05:57 -05:00
Yang Zhong
c862dcd199 x86/fpu: Fix inline prefix warnings
Fix sparse warnings in xstate and remove inline prefix.

Fixes: 980fe2fddc ("x86/fpu: Extend fpu_xstate_prctl() with guest permissions")
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Message-Id: <20220113180825.322333-1-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:48:38 -05:00
Thomas Gleixner
c60427dd50 x86/fpu: Add uabi_size to guest_fpu
Userspace needs to inquire KVM about the buffer size to work
with the new KVM_SET_XSAVE and KVM_GET_XSAVE2. Add the size info
to guest_fpu for KVM to access.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-18-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:44:40 -05:00
Thomas Gleixner
c270ce393d x86/fpu: Add guest support to xfd_enable_feature()
Guest support for dynamically enabled FPU features requires a few
modifications to the enablement function which is currently invoked from
the #NM handler:

  1) Use guest permissions and sizes for the update

  2) Update fpu_guest state accordingly

  3) Take into account that the enabling can be triggered either from a
     running guest via XSETBV and MSR_IA32_XFD write emulation or from
     a guest restore. In the latter case the guests fpstate is not the
     current tasks active fpstate.

Split the function and implement the guest mechanics throughout the
callchain.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-7-yang.zhong@intel.com>
[Add 32-bit stub for __xfd_enable_feature. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:43:11 -05:00
Thomas Gleixner
980fe2fddc x86/fpu: Extend fpu_xstate_prctl() with guest permissions
KVM requires a clear separation of host user space and guest permissions
for dynamic XSTATE components.

Add a guest permissions member to struct fpu and a separate set of prctl()
arguments: ARCH_GET_XCOMP_GUEST_PERM and ARCH_REQ_XCOMP_GUEST_PERM.

The semantics are equivalent to the host user space permission control
except for the following constraints:

  1) Permissions have to be requested before the first vCPU is created

  2) Permissions are frozen when the first vCPU is created to ensure
     consistency. Any attempt to expand permissions via the prctl() after
     that point is rejected.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-2-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 13:33:03 -05:00
Chang S. Bae
2308ee57d9 x86/fpu/amx: Enable the AMX feature in 64-bit mode
Add the AMX state components in XFEATURE_MASK_USER_SUPPORTED and the
TILE_DATA component to the dynamic states and update the permission check
table accordingly.

This is only effective on 64 bit kernels as for 32bit kernels
XFEATURE_MASK_TILE is defined as 0.

TILE_DATA is caller-saved state and the only dynamic state. Add build time
sanity check to ensure the assumption that every dynamic feature is caller-
saved.

Make AMX state depend on XFD as it is dynamic feature.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-24-chang.seok.bae@intel.com
2021-10-26 10:53:03 +02:00
Chang S. Bae
db3e7321b4 x86/fpu: Add XFD handling for dynamic states
To handle the dynamic sizing of buffers on first use the XFD MSR has to be
armed. Store the delta between the maximum available and the default
feature bits in init_fpstate where it can be retrieved for task creation.

If the delta is non zero then dynamic features are enabled. This needs also
to enable the static key which guards the XFD updates. This is delayed to
an initcall because the FPU setup runs before jump labels are initialized.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-23-chang.seok.bae@intel.com
2021-10-26 10:53:03 +02:00
Chang S. Bae
2ae996e0c1 x86/fpu: Calculate the default sizes independently
When dynamically enabled states are supported the maximum and default sizes
for the kernel buffers and user space interfaces are not longer identical.

Put the necessary calculations in place which only take the default enabled
features into account.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-22-chang.seok.bae@intel.com
2021-10-26 10:53:02 +02:00
Chang S. Bae
eec2113eab x86/fpu/amx: Define AMX state components and have it used for boot-time checks
The XSTATE initialization uses check_xstate_against_struct() to sanity
check the size of XSTATE-enabled features. AMX is a XSAVE-enabled feature,
and its size is not hard-coded but discoverable at run-time via CPUID.

The AMX state is composed of state components 17 and 18, which are all user
state components. The first component is the XTILECFG state of a 64-byte
tile-related control register. The state component 18, called XTILEDATA,
contains the actual tile data, and the state size varies on
implementations. The architectural maximum, as defined in the CPUID(0x1d,
1): EAX[15:0], is a byte less than 64KB. The first implementation supports
8KB.

Check the XTILEDATA state size dynamically. The feature introduces the new
tile register, TMM. Define one register struct only and read the number of
registers from CPUID. Cross-check the overall size with CPUID again.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-21-chang.seok.bae@intel.com
2021-10-26 10:53:02 +02:00
Chang S. Bae
70c3f1671b x86/fpu/xstate: Prepare XSAVE feature table for gaps in state component numbers
The kernel checks at boot time which features are available by walking a
XSAVE feature table which contains the CPUID feature bit numbers which need
to be checked whether a feature is available on a CPU or not. So far the
feature numbers have been linear, but AMX will create a gap which the
current code cannot handle.

Make the table entries explicitly indexed and adjust the loop code
accordingly to prepare for that.

No functional change.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-20-chang.seok.bae@intel.com
2021-10-26 10:53:02 +02:00
Chang S. Bae
500afbf645 x86/fpu/xstate: Add fpstate_realloc()/free()
The fpstate embedded in struct fpu is the default state for storing the FPU
registers. It's sized so that the default supported features can be stored.
For dynamically enabled features the register buffer is too small.

The #NM handler detects first use of a feature which is disabled in the
XFD MSR. After handling permission checks it recalculates the size for
kernel space and user space state and invokes fpstate_realloc() which
tries to reallocate fpstate and install it.

Provide the allocator function which checks whether the current buffer size
is sufficient and if not allocates one. If allocation is successful the new
fpstate is initialized with the new features and sizes and the now enabled
features is removed from the task's XFD mask.

realloc_fpstate() uses vzalloc(). If use of this mechanism grows to
re-allocate buffers larger than 64KB, a more sophisticated allocation
scheme that includes purpose-built reclaim capability might be justified.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-19-chang.seok.bae@intel.com
2021-10-26 10:53:02 +02:00
Chang S. Bae
783e87b404 x86/fpu/xstate: Add XFD #NM handler
If the XFD MSR has feature bits set then #NM will be raised when user space
attempts to use an instruction related to one of these features.

When the task has no permissions to use that feature, raise SIGILL, which
is the same behavior as #UD.

If the task has permissions, calculate the new buffer size for the extended
feature set and allocate a larger fpstate. In the unlikely case that
vzalloc() fails, SIGSEGV is raised.

The allocation function will be added in the next step. Provide a stub
which fails for now.

  [ tglx: Updated serialization ]

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-18-chang.seok.bae@intel.com
2021-10-26 10:53:02 +02:00
Chang S. Bae
672365477a x86/fpu: Update XFD state where required
The IA32_XFD_MSR allows to arm #NM traps for XSTATE components which are
enabled in XCR0. The register has to be restored before the tasks XSTATE is
restored. The life time rules are the same as for FPU state.

XFD is updated on return to userspace only when the FPU state of the task
is not up to date in the registers. It's updated before the XRSTORS so
that eventually enabled dynamic features are restored as well and not
brought into init state.

Also in signal handling for restoring FPU state from user space the
correctness of the XFD state has to be ensured.

Add it to CPU initialization and resume as well.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-17-chang.seok.bae@intel.com
2021-10-26 10:53:02 +02:00
Thomas Gleixner
5529acf47e x86/fpu: Add sanity checks for XFD
Add debug functionality to ensure that the XFD MSR is up to date for XSAVE*
and XRSTOR* operations.

 [ tglx: Improve comment. ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-16-chang.seok.bae@intel.com
2021-10-26 10:52:35 +02:00
Chang S. Bae
53599b4d54 x86/fpu/signal: Prepare for variable sigframe length
The software reserved portion of the fxsave frame in the signal frame
is copied from structures which have been set up at boot time. With
dynamically enabled features the content of these structures is no
longer correct because the xfeatures and size can be different per task.

Calculate the software reserved portion at runtime and fill in the
xfeatures and size values from the tasks active fpstate.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-10-chang.seok.bae@intel.com
2021-10-26 10:18:09 +02:00
Chang S. Bae
db8268df09 x86/arch_prctl: Add controls for dynamic XSTATE components
Dynamically enabled XSTATE features are by default disabled for all
processes. A process has to request permission to use such a feature.

To support this implement a architecture specific prctl() with the options:

   - ARCH_GET_XCOMP_SUPP

     Copies the supported feature bitmap into the user space provided
     u64 storage. The pointer is handed in via arg2

   - ARCH_GET_XCOMP_PERM

     Copies the process wide permitted feature bitmap into the user space
     provided u64 storage. The pointer is handed in via arg2

   - ARCH_REQ_XCOMP_PERM

     Request permission for a feature set. A feature set can be mapped to a
     facility, e.g. AMX, and can require one or more XSTATE components to
     be enabled.

     The feature argument is the number of the highest XSTATE component
     which is required for a facility to work.

     The request argument is not a user supplied bitmap because that makes
     filtering harder (think seccomp) and even impossible because to
     support 32bit tasks the argument would have to be a pointer.

The permission mechanism works this way:

   Task asks for permission for a facility and kernel checks whether that's
   supported. If supported it does:

     1) Check whether permission has already been granted

     2) Compute the size of the required kernel and user space buffer
        (sigframe) size.

     3) Validate that no task has a sigaltstack installed
        which is smaller than the resulting sigframe size

     4) Add the requested feature bit(s) to the permission bitmap of
        current->group_leader->fpu and store the sizes in the group
        leaders fpu struct as well.

If that is successful then the feature is still not enabled for any of the
tasks. The first usage of a related instruction will result in a #NM
trap. The trap handler validates the permission bit of the tasks group
leader and if permitted it installs a larger kernel buffer and transfers
the permission and size info to the new fpstate container which makes all
the FPU functions which require per task information aware of the extended
feature set.

  [ tglx: Adopted to new base code, added missing serialization,
          massaged namings, comments and changelog ]

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-7-chang.seok.bae@intel.com
2021-10-26 10:18:09 +02:00
Chang S. Bae
84e4dccc8f x86/fpu/xstate: Provide xstate_calculate_size()
Split out the size calculation from the paranoia check so it can be used
for recalculating buffer sizes when dynamically enabled features are
supported.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
[ tglx: Adopted to changed base code ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-4-chang.seok.bae@intel.com
2021-10-26 10:18:09 +02:00
Thomas Gleixner
eda32f4f93 x86/fpu: Rework restore_regs_from_fpstate()
xfeatures_mask_fpstate() is no longer valid when dynamically enabled
features come into play.

Rework restore_regs_from_fpstate() so it takes a constant mask which will
then be applied against the maximum feature set so that the restore
operation brings all features which are not in the xsave buffer xfeature
bitmap into init state.

This ensures that if the previous task used a dynamically enabled feature
that the task which restores has all unused components properly initialized.

Cleanup the last user of xfeatures_mask_fpstate() as well and remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211014230739.461348278@linutronix.de
2021-10-22 11:09:15 +02:00
Thomas Gleixner
daddee2473 x86/fpu: Mop up xfeatures_mask_uabi()
Use the new fpu_user_cfg to retrieve the information instead of
xfeatures_mask_uabi() which will be no longer correct when dynamically
enabled features become available.

Using fpu_user_cfg is appropriate when setting XCOMP_BV in the
init_fpstate since it has space allocated for "max_features". But,
normal fpstates might only have space for default xfeatures. Since
XRSTOR* derives the format of the XSAVE buffer from XCOMP_BV, this can
lead to XRSTOR reading out of bounds.

So when copying actively used fpstate, simply read the XCOMP_BV features
bits directly out of the fpstate instead.

This correction courtesy of Dave Hansen <dave.hansen@linux.intel.com>

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211014230739.408879849@linutronix.de
2021-10-22 11:04:46 +02:00
Thomas Gleixner
1c253ff228 x86/fpu: Move xstate feature masks to fpu_*_cfg
Move the feature mask storage to the kernel and user config
structs. Default and maximum feature set are the same for now.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211014230739.352041752@linutronix.de
2021-10-21 20:36:58 +02:00
Thomas Gleixner
2bd264bce2 x86/fpu: Move xstate size to fpu_*_cfg
Use the new kernel and user space config storage to store and retrieve the
XSTATE buffer sizes. The default and the maximum size are the same for now,
but will change when support for dynamically enabled features is added.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211014230739.296830097@linutronix.de
2021-10-21 19:38:55 +02:00
Thomas Gleixner
cd9ae76174 x86/fpu/xstate: Cleanup size calculations
The size calculations are partially unreadable gunk. Clean them up.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211014230739.241223689@linutronix.de
2021-10-21 19:37:53 +02:00
Thomas Gleixner
49e4eb4125 x86/fpu/xstate: Use fpstate for copy_uabi_to_xstate()
Prepare for dynamically enabled states per task. The function needs to
retrieve the features and sizes which are valid in a fpstate
context. Retrieve them from fpstate.

Move the function declarations to the core header as they are not
required anywhere else.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211013145323.233529986@linutronix.de
2021-10-21 14:24:14 +02:00
Thomas Gleixner
3ac8d75778 x86/fpu: Use fpstate in __copy_xstate_to_uabi_buf()
With dynamically enabled features the copy function must know the features
and the size which is valid for the task. Retrieve them from fpstate.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211013145323.181495492@linutronix.de
2021-10-21 14:18:23 +02:00
Thomas Gleixner
248452ce21 x86/fpu: Add size and mask information to fpstate
Add state size and feature mask information to the fpstate container. This
will be used for runtime checks with the upcoming support for dynamically
enabled features and dynamically sized buffers. That avoids conditionals
all over the place as the required information is accessible for both
default and extended buffers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211013145322.921388806@linutronix.de
2021-10-21 13:51:42 +02:00
Thomas Gleixner
c20942ce51 x86/fpu/core: Convert to fpstate
Convert the rest of the core code to the new register storage mechanism in
preparation for dynamically sized buffers.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211013145322.659456185@linutronix.de
2021-10-20 23:54:26 +02:00
Thomas Gleixner
087df48c29 x86/fpu: Replace KVMs xstate component clearing
In order to prepare for the support of dynamically enabled FPU features,
move the clearing of xstate components to the FPU core code.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: kvm@vger.kernel.org
Link: https://lkml.kernel.org/r/20211013145322.399567049@linutronix.de
2021-10-20 22:26:41 +02:00
Thomas Gleixner
f83ac56acd x86/fpu: Convert fpstate_init() to struct fpstate
Convert fpstate_init() and related code to the new register storage
mechanism in preparation for dynamically sized buffers.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211013145322.292157401@linutronix.de
2021-10-20 22:26:34 +02:00
Thomas Gleixner
b56d2795b2 x86/fpu: Replace the includes of fpu/internal.h
Now that the file is empty, fixup all references with the proper includes
and delete the former kitchen sink.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011540.001197214@linutronix.de
2021-10-20 15:27:29 +02:00
Thomas Gleixner
d9d005f32a x86/fpu: Move mxcsr related code to core
No need to expose that to code which only needs the XCR0 accessors.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011539.740012411@linutronix.de
2021-10-20 15:27:28 +02:00
Thomas Gleixner
34002571cb x86/fpu: Move legacy ASM wrappers to core
Nothing outside the core code requires them.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011539.572439164@linutronix.de
2021-10-20 15:27:28 +02:00
Thomas Gleixner
b579d0c375 x86/fpu: Make os_xrstor_booting() private
It's only required in the xstate init code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011539.455836597@linutronix.de
2021-10-20 15:27:28 +02:00
Thomas Gleixner
9603445549 x86/fpu: Mark fpu__init_prepare_fx_sw_frame() as __init
No need to keep it around.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011539.296435736@linutronix.de
2021-10-20 15:27:27 +02:00
Thomas Gleixner
ca834defd3 x86/fpu: Rework copy_xstate_to_uabi_buf()
Prepare for replacing the KVM copy xstate to user function by extending
copy_xstate_to_uabi_buf() with a pkru argument which allows the caller to
hand in the pkru value, which is required for KVM because the guest PKRU is
not accessible via current. Fixup all callsites accordingly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011539.191902137@linutronix.de
2021-10-20 15:27:27 +02:00
Thomas Gleixner
ea4d6938d4 x86/fpu: Replace KVMs home brewed FPU copy from user
Copying a user space buffer to the memory buffer is already available in
the FPU core. The copy mechanism in KVM lacks sanity checks and needs to
use cpuid() to lookup the offset of each component, while the FPU core has
this information cached.

Make the FPU core variant accessible for KVM and replace the home brewed
mechanism.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: kvm@vger.kernel.org
Link: https://lkml.kernel.org/r/20211015011539.134065207@linutronix.de
2021-10-20 15:27:27 +02:00
Thomas Gleixner
a0ff0611c2 x86/fpu: Move KVMs FPU swapping to FPU core
Swapping the host/guest FPU is directly fiddling with FPU internals which
requires 5 exports. The upcoming support of dynamically enabled states
would even need more.

Implement a swap function in the FPU core code and export that instead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Link: https://lkml.kernel.org/r/20211015011539.076072399@linutronix.de
2021-10-20 15:27:27 +02:00
Thomas Gleixner
63cf05a19a x86/fpu/xstate: Mark all init only functions __init
No point to keep them around after boot.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011539.017919252@linutronix.de
2021-10-20 15:27:27 +02:00
Thomas Gleixner
ffd3e504c9 x86/fpu/xstate: Provide and use for_each_xfeature()
These loops evaluating xfeature bits are really hard to read. Create an
iterator and use for_each_set_bit_from() inside which already does the right
thing.

No functional changes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011538.958107505@linutronix.de
2021-10-20 15:27:26 +02:00
Thomas Gleixner
126fe04018 x86/fpu: Cleanup xstate xcomp_bv initialization
No point in having this duplicated all over the place with needlessly
different defines.

Provide a proper initialization function which initializes user buffers
properly and make KVM use it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011538.897664678@linutronix.de
2021-10-20 15:27:26 +02:00
Thomas Gleixner
dc2f39fd1b x86/fpu: Cleanup the on_boot_cpu clutter
Defensive programming is useful, but this on_boot_cpu debug is really
silly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011538.665080855@linutronix.de
2021-10-20 15:27:26 +02:00
Thomas Gleixner
f5daf836f2 x86/fpu: Restrict xsaves()/xrstors() to independent states
These interfaces are really only valid for features which are independently
managed and not part of the task context state for various reasons.

Tighten the checks and adjust the misleading comments.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011538.608492174@linutronix.de
2021-10-20 15:27:26 +02:00
Thomas Gleixner
93c2cdc975 x86/fpu/xstate: Clear xstate header in copy_xstate_to_uabi_buf() again
The change which made copy_xstate_to_uabi_buf() usable for
[x]fpregs_get() removed the zeroing of the header which means the
header, which is copied to user space later, contains except for the
xfeatures member, random stack content.

Add the memset() back to zero it before usage.

Fixes: eb6f51723f ("x86/fpu: Make copy_xstate_to_kernel() usable for [x]fpregs_get()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/875yy3wb8h.ffs@nanos.tec.linutronix.de
2021-06-24 17:19:51 +02:00
Thomas Gleixner
30a304a138 x86/fpu: Mask PKRU from kernel XRSTOR[S] operations
As the PKRU state is managed separately restoring it from the xstate
buffer would be counterproductive as it might either restore a stale
value or reinit the PKRU state to 0.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121456.606745195@linutronix.de
2021-06-23 19:47:35 +02:00
Dave Hansen
e84ba47e31 x86/fpu: Hook up PKRU into ptrace()
One nice thing about having PKRU be XSAVE-managed is that it gets naturally
exposed into the XSAVE-using ABIs.  Now that XSAVE will not be used to
manage PKRU, these ABIs need to be manually enabled to deal with PKRU.

ptrace() uses copy_uabi_xstate_to_kernel() to collect the tracee's
XSTATE. As PKRU is not in the task's XSTATE buffer, use task->thread.pkru
for filling in up the ptrace buffer.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121456.508770763@linutronix.de
2021-06-23 19:44:24 +02:00
Thomas Gleixner
65e9521021 x86/fpu: Rename xfeatures_mask_user() to xfeatures_mask_uabi()
Rename it so it's clear that this is about user ABI features which can
differ from the feature set which the kernel saves and restores because the
kernel handles e.g. PKRU differently. But the user ABI (ptrace, signal
frame) expects it to be there.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121456.211585137@linutronix.de
2021-06-23 19:29:52 +02:00
Thomas Gleixner
8a1dc55a3f x86/cpu: Sanitize X86_FEATURE_OSPKE
X86_FEATURE_OSPKE is enabled first on the boot CPU and the feature flag is
set. Secondary CPUs have to enable CR4.PKE as well and set their per CPU
feature flag. That's ineffective because all call sites have checks for
boot_cpu_data.

Make it smarter and force the feature flag when PKU is enabled on the boot
cpu which allows then to use cpu_feature_enabled(X86_FEATURE_OSPKE) all
over the place. That either compiles the code out when PKEY support is
disabled in Kconfig or uses a static_cpu_has() for the feature check which
makes a significant difference in hotpaths, e.g. context switch.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121455.305113644@linutronix.de
2021-06-23 18:59:44 +02:00
Thomas Gleixner
a75c52896b x86/fpu/xstate: Sanitize handling of independent features
The copy functions for the independent features are horribly named and the
supervisor and independent part is just overengineered.

The point is that the supplied mask has either to be a subset of the
independent features or a subset of the task->fpu.xstate managed features.

Rewrite it so it checks for invalid overlaps of these areas in the caller
supplied feature mask. Rename it so it follows the new naming convention
for these operations. Mop up the function documentation.

This allows to use that function for other purposes as well.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lkml.kernel.org/r/20210623121455.004880675@linutronix.de
2021-06-23 18:46:20 +02:00
Andy Lutomirski
01707b6653 x86/fpu: Rename "dynamic" XSTATEs to "independent"
The salient feature of "dynamic" XSTATEs is that they are not part of the
main task XSTATE buffer.  The fact that they are dynamically allocated is
irrelevant and will become quite confusing when user math XSTATEs start
being dynamically allocated.  Rename them to "independent" because they
are independent of the main XSTATE code.

This is just a search-and-replace with some whitespace updates to keep
things aligned.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/1eecb0e4f3e07828ebe5d737ec77dc3b708fad2d.1623388344.git.luto@kernel.org
Link: https://lkml.kernel.org/r/20210623121454.911450390@linutronix.de
2021-06-23 18:42:11 +02:00
Thomas Gleixner
522e92743b x86/fpu: Deduplicate copy_uabi_from_user/kernel_to_xstate()
copy_uabi_from_user_to_xstate() and copy_uabi_from_kernel_to_xstate() are
almost identical except for the copy function.

Unify them.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20210623121454.414215896@linutronix.de
2021-06-23 18:26:00 +02:00
Thomas Gleixner
1cc34413ff x86/fpu: Rename xstate copy functions which are related to UABI
Rename them to reflect that these functions deal with user space format
XSAVE buffers.

      copy_kernel_to_xstate() -> copy_uabi_from_kernel_to_xstate()
      copy_user_to_xstate()   -> copy_sigframe_from_user_to_xstate()

Again a clear statement that these functions deal with user space ABI.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121454.318485015@linutronix.de
2021-06-23 18:23:14 +02:00
Thomas Gleixner
b16313f71c x86/fpu: Rename copy_xregs_to_kernel() and copy_kernel_to_xregs()
The function names for xsave[s]/xrstor[s] operations are horribly named and
a permanent source of confusion.

Rename:
	copy_xregs_to_kernel() to os_xsave()
	copy_kernel_to_xregs() to os_xrstor()

These are truly low level wrappers around the actual instructions
XSAVE[OPT]/XRSTOR and XSAVES/XRSTORS with the twist that the selection
based on the available CPU features happens with an alternative to avoid
conditionals all over the place and to provide the best performance for hot
paths.

The os_ prefix tells that this is the OS selected mechanism.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121453.830239347@linutronix.de
2021-06-23 17:57:57 +02:00
Thomas Gleixner
1f3171252d x86/fpu: Get rid of copy_supervisor_to_kernel()
If the fast path of restoring the FPU state on sigreturn fails or is not
taken and the current task's FPU is active then the FPU has to be
deactivated for the slow path to allow a safe update of the tasks FPU
memory state.

With supervisor states enabled, this requires to save the supervisor state
in the memory state first. Supervisor states require XSAVES so saving only
the supervisor state requires to reshuffle the memory buffer because XSAVES
uses the compacted format and therefore stores the supervisor states at the
beginning of the memory state. That's just an overengineered optimization.

Get rid of it and save the full state for this case.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121453.734561971@linutronix.de
2021-06-23 17:53:31 +02:00
Thomas Gleixner
9fe8a6f5ee x86/fpu: Cleanup arch_set_user_pkey_access()
The function does a sanity check with a WARN_ON_ONCE() but happily proceeds
when the pkey argument is out of range.

Clean it up.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121453.635764326@linutronix.de
2021-06-23 17:52:41 +02:00
Thomas Gleixner
02b93c0b00 x86/fpu: Get rid of using_compacted_format()
This function is pointlessly global and a complete misnomer because it's
usage is related to both supervisor state checks and compacted format
checks. Remove it and just make the conditions check the XSAVES feature.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121453.425493349@linutronix.de
2021-06-23 17:49:47 +02:00
Thomas Gleixner
afac9e8943 x86/fpu: Remove fpstate_sanitize_xstate()
No more users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121453.124819167@linutronix.de
2021-06-23 17:49:47 +02:00
Thomas Gleixner
eb6f51723f x86/fpu: Make copy_xstate_to_kernel() usable for [x]fpregs_get()
When xsave with init state optimization is used then a component's state
in the task's xsave buffer can be stale when the corresponding feature bit
is not set.

fpregs_get() and xfpregs_get() invoke fpstate_sanitize_xstate() to update
the task's xsave buffer before retrieving the FX or FP state. That's just
duplicated code as copy_xstate_to_kernel() already handles this correctly.

Add a copy mode argument to the function which allows to restrict the state
copy to the FP and SSE features.

Also rename the function to copy_xstate_to_uabi_buf() so the name reflects
what it is doing.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121452.805327286@linutronix.de
2021-06-23 17:49:47 +02:00
Dave Hansen
3a3351126e x86/fpu: Simplify PTRACE_GETREGS code
ptrace() has interfaces that let a ptracer inspect a ptracee's register state.
This includes XSAVE state.  The ptrace() ABI includes a hardware-format XSAVE
buffer for both the SETREGS and GETREGS interfaces.

In the old days, the kernel buffer and the ptrace() ABI buffer were the
same boring non-compacted format.  But, since the advent of supervisor
states and the compacted format, the kernel buffer has diverged from the
format presented in the ABI.

This leads to two paths in the kernel:
1. Effectively a verbatim copy_to_user() which just copies the kernel buffer
   out to userspace.  This is used when the kernel buffer is kept in the
   non-compacted form which means that it shares a format with the ptrace
   ABI.
2. A one-state-at-a-time path: copy_xstate_to_kernel().  This is theoretically
   slower since it does a bunch of piecemeal copies.

Remove the verbatim copy case.  Speed probably does not matter in this path,
and the vast majority of new hardware will use the one-state-at-a-time path
anyway.  This ensures greater testing for the "slow" path.

This also makes enabling PKRU in this interface easier since a single path
can be patched instead of two.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121452.408457100@linutronix.de
2021-06-23 17:49:46 +02:00
Thomas Gleixner
947f4947cf x86/fpu: Reject invalid MXCSR values in copy_kernel_to_xstate()
Instead of masking out reserved bits, check them and reject the provided
state as invalid if not zero.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121452.308388343@linutronix.de
2021-06-23 17:49:46 +02:00
Thomas Gleixner
43be46e896 x86/fpu: Sanitize xstateregs_set()
xstateregs_set() operates on a stopped task and tries to copy the provided
buffer into the task's fpu.state.xsave buffer.

Any error while copying or invalid state detected after copying results in
wiping the target task's FPU state completely including supervisor states.

That's just wrong. The caller supplied invalid data or has a problem with
unmapped memory, so there is absolutely no justification to corrupt the
target state.

Fix this with the following modifications:

 1) If data has to be copied from userspace, allocate a buffer and copy from
    user first.

 2) Use copy_kernel_to_xstate() unconditionally so that header checking
    works correctly.

 3) Return on error without corrupting the target state.

This prevents corrupting states and lets the caller deal with the problem
it caused in the first place.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121452.214903673@linutronix.de
2021-06-23 17:49:46 +02:00
Thomas Gleixner
4098b3eef3 x86/fpu: Remove unused get_xsave_field_ptr()
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121451.915614415@linutronix.de
2021-06-23 17:49:46 +02:00
Thomas Gleixner
ce38f038ed x86/fpu: Get rid of fpu__get_supported_xfeatures_mask()
This function is really not doing what the comment advertises:

 "Find supported xfeatures based on cpu features and command-line input.
  This must be called after fpu__init_parse_early_param() is called and
  xfeatures_mask is enumerated."

fpu__init_parse_early_param() does not exist anymore and the function just
returns a constant.

Remove it and fix the caller and get rid of further references to
fpu__init_parse_early_param().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121451.816404717@linutronix.de
2021-06-23 17:49:46 +02:00
Thomas Gleixner
4e8e4313cf x86/fpu: Make xfeatures_mask_all __ro_after_init
Nothing has to modify this after init.

But of course there is code which unconditionally masks
xfeatures_mask_all on CPU hotplug. This goes unnoticed during boot
hotplug because at that point the variable is still RW mapped.

This is broken in several ways:

  1) Masking this in post init CPU hotplug means that any
     modification of this state goes unnoticed until actual hotplug
     happens.

  2) If that ever happens then these bogus feature bits are already
     populated all over the place and the system is in inconsistent state
     vs. the compacted XSTATE offsets. If at all then this has to panic the
     machine because the inconsistency cannot be undone anymore.

Make this a one-time paranoia check in xstate init code and disable
xsave when this happens.

Reported-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121451.712803952@linutronix.de
2021-06-23 17:49:45 +02:00
Thomas Gleixner
ce578f1634 x86/fpu: Mark various FPU state variables __ro_after_init
Nothing modifies these after booting.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20210623121451.611751529@linutronix.de
2021-06-23 17:49:45 +02:00
Thomas Gleixner
9625895011 x86/fpu: Fix copy_xstate_to_kernel() gap handling
The gap handling in copy_xstate_to_kernel() is wrong when XSAVES is in
use.

Using init_fpstate for copying the init state of features which are
not set in the xstate header is only correct for the legacy area, but
not for the extended features area because when XSAVES is in use then
init_fpstate is in compacted form which means the xstate offsets which
are used to copy from init_fpstate are not valid.

Fortunately, this is not a real problem today because all extended
features in use have an all-zeros init state, but it is wrong
nevertheless and with a potentially dynamically sized init_fpstate this
would result in an access outside of the init_fpstate.

Fix this by keeping track of the last copied state in the target buffer and
explicitly zero it when there is a feature or alignment gap.

Use the compacted offset when accessing the extended feature space in
init_fpstate.

As this is not a functional issue on older kernels this is intentionally
not tagged for stable.

Fixes: b8be15d588 ("x86/fpu/xstate: Re-enable XSAVES")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121451.294282032@linutronix.de
2021-06-23 17:49:45 +02:00
Borislav Petkov
c4cf5f6198 Merge x86/urgent into x86/fpu
Pick up dependent changes which either went mainline (x86/urgent is
based on -rc7 and that contains them) as urgent fixes and the current
x86/urgent branch which contains two more urgent fixes, so that the
bigger FPU rework can base off ontop.

Signed-off-by: Borislav Petkov <bp@suse.de>
2021-06-23 17:43:38 +02:00
Thomas Gleixner
f9dfb5e390 x86/fpu: Make init_fpstate correct with optimized XSAVE
The XSAVE init code initializes all enabled and supported components with
XRSTOR(S) to init state. Then it XSAVEs the state of the components back
into init_fpstate which is used in several places to fill in the init state
of components.

This works correctly with XSAVE, but not with XSAVEOPT and XSAVES because
those use the init optimization and skip writing state of components which
are in init state. So init_fpstate.xsave still contains all zeroes after
this operation.

There are two ways to solve that:

   1) Use XSAVE unconditionally, but that requires to reshuffle the buffer when
      XSAVES is enabled because XSAVES uses compacted format.

   2) Save the components which are known to have a non-zero init state by other
      means.

Looking deeper, #2 is the right thing to do because all components the
kernel supports have all-zeroes init state except the legacy features (FP,
SSE). Those cannot be hard coded because the states are not identical on all
CPUs, but they can be saved with FXSAVE which avoids all conditionals.

Use FXSAVE to save the legacy FP/SSE components in init_fpstate along with
a BUILD_BUG_ON() which reminds developers to validate that a newly added
component has all zeroes init state. As a bonus remove the now unused
copy_xregs_to_kernel_booting() crutch.

The XSAVE and reshuffle method can still be implemented in the unlikely
case that components are added which have a non-zero init state and no
other means to save them. For now, FXSAVE is just simple and good enough.

  [ bp: Fix a typo or two in the text. ]

Fixes: 6bad06b768 ("x86, xsave: Use xsaveopt in context-switch path when supported")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210618143444.587311343@linutronix.de
2021-06-22 11:06:21 +02:00
Andy Lutomirski
f72a249b0b x86/fpu: Add address range checks to copy_user_to_xstate()
copy_user_to_xstate() uses __copy_from_user(), which provides a negligible
speedup.  Fortunately, both call sites are at least almost correct.

__fpu__restore_sig() checks access_ok() with xstate_sigframe_size()
length and ptrace regset access uses fpu_user_xstate_size. These should
be valid upper bounds on the length, so, at worst, this would cause
spurious failures and not accesses to kernel memory.

Nonetheless, this is far more fragile than necessary and none of these
callers are in a hotpath.

Use copy_from_user() instead.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rik van Riel <riel@surriel.com>
Link: https://lkml.kernel.org/r/20210608144346.140254130@linutronix.de
2021-06-09 14:46:20 +02:00
Thomas Gleixner
9bfecd0583 x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()
While digesting the XSAVE-related horrors which got introduced with
the supervisor/user split, the recent addition of ENQCMD-related
functionality got on the radar and turned out to be similarly broken.

update_pasid(), which is only required when X86_FEATURE_ENQCMD is
available, is invoked from two places:

 1) From switch_to() for the incoming task

 2) Via a SMP function call from the IOMMU/SMV code

#1 is half-ways correct as it hacks around the brokenness of get_xsave_addr()
   by enforcing the state to be 'present', but all the conditionals in that
   code are completely pointless for that.

   Also the invocation is just useless overhead because at that point
   it's guaranteed that TIF_NEED_FPU_LOAD is set on the incoming task
   and all of this can be handled at return to user space.

#2 is broken beyond repair. The comment in the code claims that it is safe
   to invoke this in an IPI, but that's just wishful thinking.

   FPU state of a running task is protected by fregs_lock() which is
   nothing else than a local_bh_disable(). As BH-disabled regions run
   usually with interrupts enabled the IPI can hit a code section which
   modifies FPU state and there is absolutely no guarantee that any of the
   assumptions which are made for the IPI case is true.

   Also the IPI is sent to all CPUs in mm_cpumask(mm), but the IPI is
   invoked with a NULL pointer argument, so it can hit a completely
   unrelated task and unconditionally force an update for nothing.
   Worse, it can hit a kernel thread which operates on a user space
   address space and set a random PASID for it.

The offending commit does not cleanly revert, but it's sufficient to
force disable X86_FEATURE_ENQCMD and to remove the broken update_pasid()
code to make this dysfunctional all over the place. Anything more
complex would require more surgery and none of the related functions
outside of the x86 core code are blatantly wrong, so removing those
would be overkill.

As nothing enables the PASID bit in the IA32_XSS MSR yet, which is
required to make this actually work, this cannot result in a regression
except for related out of tree train-wrecks, but they are broken already
today.

Fixes: 20f0afd1fb ("x86/mmu: Allocate/free a PASID")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/87mtsd6gr9.ffs@nanos.tec.linutronix.de
2021-06-03 16:33:09 +02:00
Ingo Molnar
d9f6e12fb0 x86: Fix various typos in comments
Fix ~144 single-word typos in arch/x86/ code comments.

Doing this in a single commit should reduce the churn.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-kernel@vger.kernel.org
2021-03-18 15:31:53 +01:00
Yejune Deng
0a74d61c7d x86/fpu/xstate: Use sizeof() instead of a constant
Use sizeof() instead of a constant in fpstate_sanitize_xstate().
Remove use of the address of the 0th array element of ->st_space and
->xmm_space which is equivalent to the array address itself:

No code changed:

  # arch/x86/kernel/fpu/xstate.o:

   text    data     bss     dec     hex filename
   9694     899       4   10597    2965 xstate.o.before
   9694     899       4   10597    2965 xstate.o.after

md5:
   5a43fc70bad8e2a1784f67f01b71aabb  xstate.o.before.asm
   5a43fc70bad8e2a1784f67f01b71aabb  xstate.o.after.asm

 [ bp: Massage commit message. ]

Signed-off-by: Yejune Deng <yejune.deng@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210122071925.41285-1-yejune.deng@gmail.com
2021-01-29 12:33:17 +01:00
Fenghua Yu
20f0afd1fb x86/mmu: Allocate/free a PASID
A PASID is allocated for an "mm" the first time any thread binds to an
SVA-capable device and is freed from the "mm" when the SVA is unbound
by the last thread. It's possible for the "mm" to have different PASID
values in different binding/unbinding SVA cycles.

The mm's PASID (non-zero for valid PASID or 0 for invalid PASID) is
propagated to a per-thread PASID MSR for all threads within the mm
through IPI, context switch, or inherited. This is done to ensure that a
running thread has the right PASID in the MSR matching the mm's PASID.

 [ bp: s/SVM/SVA/g; massage. ]

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/1600187413-163670-10-git-send-email-fenghua.yu@intel.com
2020-09-17 20:22:15 +02:00
Yu-cheng Yu
b454feb9ab x86/fpu/xstate: Add supervisor PASID state for ENQCMD
The ENQCMD instruction reads a PASID from the IA32_PASID MSR. The
MSR is stored in the task's supervisor XSAVE* PASID state and is
context-switched by XSAVES/XRSTORS.

 [ bp: Add (in-)definite articles and massage. ]

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/1600187413-163670-6-git-send-email-fenghua.yu@intel.com
2020-09-17 20:22:10 +02:00
Linus Torvalds
50f6c7dbd9 Misc fixes and small updates all around the place:
- Fix mitigation state sysfs output
  - Fix an FPU xstate/sxave code assumption bug triggered by Architectural LBR support
  - Fix Lightning Mountain SoC TSC frequency enumeration bug
  - Fix kexec debug output
  - Fix kexec memory range assumption bug
  - Fix a boundary condition in the crash kernel code
 
  - Optimize porgatory.ro generation a bit
  - Enable ACRN guests to use X2APIC mode
  - Reduce a __text_poke() IRQs-off critical section for the benefit of PREEMPT_RT
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl83ybgRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iJnQ/+OAkE5hiQ+F1ikQ4rKyjaT6FjvynReNUA
 ysQjcCypGB4x+slR8o3k5yrzYJ9WbDfOz7a0uekZtNHvJ80+3yheV5Yvf+Uz3EYM
 Jj/OubCNMNnvS5cJMNXs196SGd/ELLWBbCjwUWPsiWJ0ZMTgKmpZz1LgB1QZjhyw
 fbAc1WgTLVO+emE5FwBrmFzvgBxn5EtiFoLhegFtACHadNcJLiKpXpiK3NKkEirO
 owF1/Qg6mn6MowKDBDkWgmwi0HVYbraqu0hXRrCq9o105CVwgwUdORTwjK3rnUNs
 et10Zz2UmSpjXJOhKZdZLFCtYOmrADmS4pnoXF6W6cLLFvkq4b2ducnlFBtNKqMh
 ljPkIT04sF99gIKijEYWsru+MgS4qO1VNHtJxkr/ZCUjqahsa1nN9F0lP0QOXjwf
 hbK4h1NrML3UiCGAe2hjIh9zY2c8s2Q90PyCvZkKNKquSQ1E011hzcEE2RIoBBYB
 mc1d6lgfCFWVkbgRA5sx1CVtgnAvHk2wu9w/8N9XTGjPgiQJRr3I8cNUZw59gaMH
 43auWyvpVAA4vdfbKJrPVrTLhTTnQYv0A966l7/i0d8MkGN4u09sAiB3ZevZMEK9
 45b7IXWluCi0ikBAmCvQ+qEzhg7pApCziVKuaZ/4j+qPLTDAutGwz7YuaXyOKrUX
 Aj/uCev6D6c=
 =fvpv
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
 "Misc fixes and small updates all around the place:

   - Fix mitigation state sysfs output

   - Fix an FPU xstate/sxave code assumption bug triggered by
     Architectural LBR support

   - Fix Lightning Mountain SoC TSC frequency enumeration bug

   - Fix kexec debug output

   - Fix kexec memory range assumption bug

   - Fix a boundary condition in the crash kernel code

   - Optimize porgatory.ro generation a bit

   - Enable ACRN guests to use X2APIC mode

   - Reduce a __text_poke() IRQs-off critical section for the benefit of
     PREEMPT_RT"

* tag 'x86-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/alternatives: Acquire pte lock with interrupts enabled
  x86/bugs/multihit: Fix mitigation reporting when VMX is not in use
  x86/fpu/xstate: Fix an xstate size check warning with architectural LBRs
  x86/purgatory: Don't generate debug info for purgatory.ro
  x86/tsr: Fix tsc frequency enumeration bug on Lightning Mountain SoC
  kexec_file: Correctly output debugging information for the PT_LOAD ELF header
  kexec: Improve & fix crash_exclude_mem_range() to handle overlapping ranges
  x86/crash: Correct the address boundary of function parameters
  x86/acrn: Remove redundant chars from ACRN signature
  x86/acrn: Allow ACRN guest to use X2APIC mode
2020-08-15 10:38:03 -07:00
Linus Torvalds
19b39c38ab Merge branch 'work.regset' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull ptrace regset updates from Al Viro:
 "Internal regset API changes:

   - regularize copy_regset_{to,from}_user() callers

   - switch to saner calling conventions for ->get()

   - kill user_regset_copyout()

  The ->put() side of things will have to wait for the next cycle,
  unfortunately.

  The balance is about -1KLoC and replacements for ->get() instances are
  a lot saner"

* 'work.regset' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits)
  regset: kill user_regset_copyout{,_zero}()
  regset(): kill ->get_size()
  regset: kill ->get()
  csky: switch to ->regset_get()
  xtensa: switch to ->regset_get()
  parisc: switch to ->regset_get()
  nds32: switch to ->regset_get()
  nios2: switch to ->regset_get()
  hexagon: switch to ->regset_get()
  h8300: switch to ->regset_get()
  openrisc: switch to ->regset_get()
  riscv: switch to ->regset_get()
  c6x: switch to ->regset_get()
  ia64: switch to ->regset_get()
  arc: switch to ->regset_get()
  arm: switch to ->regset_get()
  sh: convert to ->regset_get()
  arm64: switch to ->regset_get()
  mips: switch to ->regset_get()
  sparc: switch to ->regset_get()
  ...
2020-08-07 09:29:25 -07:00
Kan Liang
76d10256a9 x86/fpu/xstate: Fix an xstate size check warning with architectural LBRs
An xstate size check warning is triggered on machines which support
Architectural LBRs.

    XSAVE consistency problem, dumping leaves
    WARNING: CPU: 0 PID: 0 at arch/x86/kernel/fpu/xstate.c:649 fpu__init_system_xstate+0x4d4/0xd0e
    Modules linked in:
    CPU: 0 PID: 0 Comm: swapper Not tainted intel-arch_lbr+
    RIP: 0010:fpu__init_system_xstate+0x4d4/0xd0e

The xstate size check routine, init_xstate_size(), compares the size
retrieved from the hardware with the size of task->fpu, which is
calculated by the software.

The size from the hardware is the total size of the enabled xstates in
XCR0 | IA32_XSS. Architectural LBR state is a dynamic supervisor
feature, which sets the corresponding bit in the IA32_XSS at boot time.
The size from the hardware includes the size of the Architectural LBR
state.

However, a dynamic supervisor feature doesn't allocate a buffer in the
task->fpu. The size of task->fpu doesn't include the size of the
Architectural LBR state. The mismatch will trigger the warning.

Three options as below were considered to fix the issue:

- Correct the size from the hardware by subtracting the size of the
  dynamic supervisor features.
  The purpose of the check is to compare the size CPU told with the size
  of the XSAVE buffer, which is calculated by the software. If the
  software mucks with the number from hardware, it removes the value of
  the check.
  This option is not a good option.

- Prevent the hardware from counting the size of the dynamic supervisor
  feature by temporarily removing the corresponding bits in IA32_XSS.
  Two extra MSR writes are required to flip the IA32_XSS. The option is
  not pretty, but it is workable. The check is only called once at early
  boot time. The synchronization or context-switching doesn't need to be
  worried.
  This option is implemented here.

- Remove the check entirely, because the check hasn't found any real
  problems. The option may be an alternative as option 2.
  This option is not implemented here.

Add a new function, get_xsaves_size_no_dynamic(), which retrieves the
total size without the dynamic supervisor features from the hardware.
The size will be used to compare with the size of task->fpu.

Fixes: f0dccc9da4 ("x86/fpu/xstate: Support dynamic supervisor feature for LBR")
Reported-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lore.kernel.org/r/1595253051-75374-1-git-send-email-kan.liang@linux.intel.com
2020-08-07 01:32:00 +02:00
Ingo Molnar
e89d4ca1df Linux 5.8-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl8d8h4eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGd0sH/2iktYhMwPxzzpnb
 eI3OuTX/mRn4vUFOfpx9dmGVleMfKkpbvnn3IY7wA62Qfv7J7lkFRa1Bd1DlqXfW
 yyGTGDSKG5chiRCOU3s9ni92M4xIzFlrojyt/dIK2lUGMzUPI9FGlZRGQLKqqwLh
 2syOXRWbcQ7e52IHtDSy3YBNveKRsP4NyqV+GxGiex18SMB/M3Pw9EMH614eDPsE
 QAGQi5uGv4hPJtFHgXgUyBPLFHIyFAiVxhFRIj7u2DSEKY79+wO1CGWFiFvdTY4B
 CbqKXLffY3iQdFsLJkj9Dl8cnOQnoY44V0EBzhhORxeOp71StUVaRwQMFa5tp48G
 171s5Hs=
 =BQIl
 -----END PGP SIGNATURE-----

Merge tag 'v5.8-rc7' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-07-28 13:18:01 +02:00
Al Viro
0557d64d98 x86: switch to ->regset_get()
All instances of ->get() in arch/x86 switched; that might or might
not be worth splitting up.  Notes:

	* for xstateregs_get() the amount we want to store is determined at
the boot time; see init_xstate_size() and update_regset_xstate_info() for
details.  task->thread.fpu.state.xsave ends with a flexible array member and
the amount of data in it depends upon the FPU features supported/enabled.

	* fpregs_get() writes slightly less than full ->thread.fpu.state.fsave
(the last word is not copied); we pass the full size of state.fsave and let
membuf_write() trim to the amount declared by regset - __regset_get() will
make sure that the space in buffer is no more than that.

	* copy_xstate_to_user() and its helpers are gone now.

	* fpregs_soft_get() was getting user_regset_copyout() arguments
wrong.  Since "x86: x86 user_regset math_emu" back in 2008...  I really
doubt that it's worth splitting out for -stable, though - you need
a 486SX box for that to trigger...

[Kevin's braino fix for copy_xstate_to_kernel() essentially duplicated here]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-07-27 14:31:07 -04:00
Kevin Buettner
5714ee50bb copy_xstate_to_kernel: Fix typo which caused GDB regression
This fixes a regression encountered while running the
gdb.base/corefile.exp test in GDB's test suite.

In my testing, the typo prevented the sw_reserved field of struct
fxregs_state from being output to the kernel XSAVES area.  Thus the
correct mask corresponding to XCR0 was not present in the core file for
GDB to interrogate, resulting in the following behavior:

   [kev@f32-1 gdb]$ ./gdb -q testsuite/outputs/gdb.base/corefile/corefile testsuite/outputs/gdb.base/corefile/corefile.core
   Reading symbols from testsuite/outputs/gdb.base/corefile/corefile...
   [New LWP 232880]

   warning: Unexpected size of section `.reg-xstate/232880' in core file.

With the typo fixed, the test works again as expected.

Signed-off-by: Kevin Buettner <kevinb@redhat.com>
Fixes: 9e46365459 ("copy_xstate_to_kernel(): don't leave parts of destination uninitialized")
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Airlie <airlied@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-19 17:09:10 -07:00
Kan Liang
ce711ea3ca perf/x86/intel/lbr: Support XSAVES/XRSTORS for LBR context switch
In the LBR call stack mode, LBR information is used to reconstruct a
call stack. To get the complete call stack, perf has to save/restore
all LBR registers during a context switch. Due to a large number of the
LBR registers, this process causes a high CPU overhead. To reduce the
CPU overhead during a context switch, use the XSAVES/XRSTORS
instructions.

Every XSAVE area must follow a canonical format: the legacy region, an
XSAVE header and the extended region. Although the LBR information is
only kept in the extended region, a space for the legacy region and
XSAVE header is still required. Add a new dedicated structure for LBR
XSAVES support.

Before enabling XSAVES support, the size of the LBR state has to be
sanity checked, because:
- the size of the software structure is calculated from the max number
of the LBR depth, which is enumerated by the CPUID leaf for Arch LBR.
The size of the LBR state is enumerated by the CPUID leaf for XSAVE
support of Arch LBR. If the values from the two CPUID leaves are not
consistent, it may trigger a buffer overflow. For example, a hypervisor
may unconsciously set inconsistent values for the two emulated CPUID.
- unlike other state components, the size of an LBR state depends on the
max number of LBRs, which may vary from generation to generation.

Expose the function xfeature_size() for the sanity check.
The LBR XSAVES support will be disabled if the size of the LBR state
enumerated by CPUID doesn't match with the size of the software
structure.

The XSAVE instruction requires 64-byte alignment for state buffers. A
new macro is added to reflect the alignment requirement. A 64-byte
aligned kmem_cache is created for architecture LBR.

Currently, the structure for each state component is maintained in
fpu/types.h. The structure for the new LBR state component should be
maintained in the same place. Move structure lbr_entry to fpu/types.h as
well for broader sharing.

Add dedicated lbr_save/lbr_restore functions for LBR XSAVES support,
which invokes the corresponding xstate helpers to XSAVES/XRSTORS LBR
information at the context switch when the call stack mode is enabled.
Since the XSAVES/XRSTORS instructions will be eventually invoked, the
dedicated functions is named with '_xsaves'/'_xrstors' postfix.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/1593780569-62993-23-git-send-email-kan.liang@linux.intel.com
2020-07-08 11:38:56 +02:00
Kan Liang
50f408d96d x86/fpu/xstate: Add helpers for LBR dynamic supervisor feature
The perf subsystem will only need to save/restore the LBR state.
However, the existing helpers save all supported supervisor states to a
kernel buffer, which will be unnecessary. Two helpers are introduced to
only save/restore requested dynamic supervisor states. The supervisor
features in XFEATURE_MASK_SUPERVISOR_SUPPORTED and
XFEATURE_MASK_SUPERVISOR_UNSUPPORTED mask cannot be saved/restored using
these helpers.

The helpers will be used in the following patch.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/1593780569-62993-22-git-send-email-kan.liang@linux.intel.com
2020-07-08 11:38:56 +02:00
Kan Liang
f0dccc9da4 x86/fpu/xstate: Support dynamic supervisor feature for LBR
Last Branch Records (LBR) registers are used to log taken branches and
other control flows. In perf with call stack mode, LBR information is
used to reconstruct a call stack. To get the complete call stack, perf
has to save/restore all LBR registers during a context switch. Due to
the large number of the LBR registers, e.g., the current platform has
96 LBR registers, this process causes a high CPU overhead. To reduce
the CPU overhead during a context switch, an LBR state component that
contains all the LBR related registers is introduced in hardware. All
LBR registers can be saved/restored together using one XSAVES/XRSTORS
instruction.

However, the kernel should not save/restore the LBR state component at
each context switch, like other state components, because of the
following unique features of LBR:
- The LBR state component only contains valuable information when LBR
  is enabled in the perf subsystem, but for most of the time, LBR is
  disabled.
- The size of the LBR state component is huge. For the current
  platform, it's 808 bytes.
If the kernel saves/restores the LBR state at each context switch, for
most of the time, it is just a waste of space and cycles.

To efficiently support the LBR state component, it is desired to have:
- only context-switch the LBR when the LBR feature is enabled in perf.
- only allocate an LBR-specific XSAVE buffer on demand.
  (Besides the LBR state, a legacy region and an XSAVE header have to be
   included in the buffer as well. There is a total of (808+576) byte
   overhead for the LBR-specific XSAVE buffer. The overhead only happens
   when the perf is actively using LBRs. There is still a space-saving,
   on average, when it replaces the constant 808 bytes of overhead for
   every task, all the time on the systems that support architectural
   LBR.)
- be able to use XSAVES/XRSTORS for accessing LBR at run time.
  However, the IA32_XSS should not be adjusted at run time.
  (The XCR0 | IA32_XSS are used to determine the requested-feature
  bitmap (RFBM) of XSAVES.)

A solution, called dynamic supervisor feature, is introduced to address
this issue, which
- does not allocate a buffer in each task->fpu;
- does not save/restore a state component at each context switch;
- sets the bit corresponding to the dynamic supervisor feature in
  IA32_XSS at boot time, and avoids setting it at run time.
- dynamically allocates a specific buffer for a state component
  on demand, e.g. only allocates LBR-specific XSAVE buffer when LBR is
  enabled in perf. (Note: The buffer has to include the LBR state
  component, a legacy region and a XSAVE header space.)
  (Implemented in a later patch)
- saves/restores a state component on demand, e.g. manually invokes
  the XSAVES/XRSTORS instruction to save/restore the LBR state
  to/from the buffer when perf is active and a call stack is required.
  (Implemented in a later patch)

A new mask XFEATURE_MASK_DYNAMIC and a helper xfeatures_mask_dynamic()
are introduced to indicate the dynamic supervisor feature. For the
systems which support the Architecture LBR, LBR is the only dynamic
supervisor feature for now. For the previous systems, there is no
dynamic supervisor feature available.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/1593780569-62993-21-git-send-email-kan.liang@linux.intel.com
2020-07-08 11:38:56 +02:00
Linus Torvalds
0a319ef75d Most of the changes here related to 'XSAVES supervisor state' support,
which is a feature that allows kernel-only data to be automatically
 saved/restored by the FPU context switching code.
 
 CPU features that can be supported this way are Intel PT, 'PASID' and
 CET features.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7VMZgRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jmAQ/7BJpyAHUjFJdChtkvUmLcBgI2qnxP7rc8
 Eh/tSo4PKh484Uqb4WY6XAHIAPBzEt3rHJG3fdaavzlUl98YJCdD9tstfwMPcCQ4
 L4c2Ru+h+mPQCMOZUctOphPjDzGWPzR4IhceH6gqhoS4vg9EqgN4o158x4jW6KFN
 Jlocp9CMfIaGSmaMlRrIUZ4Dj3mgboqqHsuCaibtaKAMK6LqZQDViTEal4mNbESX
 KQPOFpKrhoq6Jtzzer7fLPY2qb6kkLrL03X5IUGFP5UxigSejnfrI9SZpAuPP9S0
 kdN04Jo0T2aBIAikBTVhDWdLMJk19qeu7YXBrFEVbyhZHl1HdDqOhMdWPOp1GH9W
 CtGUalbIvz/5FbXuUImiiNh/bw2FxYjHsrDguW96IvMVFteucrFg9QyL+taYb1cV
 WqWdpIC0VoMuQxQI5FBWu4Bb/cLNV9VCxWAZjZQ806kwmyDxldsw5mucMGmH3+bO
 LD6bwRShSMRzI9bzcJSG+Z3y7Fe8b5IGNjCjzgPb88ezffBEFHzIEKdCL6QTNlRF
 6UgSGbRs41SqXwNw5tdQQNwPpDO73p+KVRGoEzyMJvojLKRGTcOHHUDriGZ30MNX
 3oHvLf5+dNrLC/frbOqUmQ7doBQOplR5VxlZVwwqkdpPw13Jf5zn4ewzriTOmKCq
 mEHMQmbkyi4=
 =M+BC
 -----END PGP SIGNATURE-----

Merge tag 'x86-fpu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 FPU updates from Ingo Molnar:
 "Most of the changes here related to 'XSAVES supervisor state' support,
  which is a feature that allows kernel-only data to be automatically
  saved/restored by the FPU context switching code.

  CPU features that can be supported this way are Intel PT, 'PASID' and
  CET features"

* tag 'x86-fpu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu/xstate: Restore supervisor states for signal return
  x86/fpu/xstate: Preserve supervisor states for the slow path in __fpu__restore_sig()
  x86/fpu: Introduce copy_supervisor_to_kernel()
  x86/fpu/xstate: Update copy_kernel_to_xregs_err() for supervisor states
  x86/fpu/xstate: Update sanitize_restored_xstate() for supervisor xstates
  x86/fpu/xstate: Define new functions for clearing fpregs and xstates
  x86/fpu/xstate: Introduce XSAVES supervisor states
  x86/fpu/xstate: Separate user and supervisor xfeatures mask
  x86/fpu/xstate: Define new macros for supervisor and user xstates
  x86/fpu/xstate: Rename validate_xstate_header() to validate_user_xstate_header()
2020-06-01 14:09:26 -07:00
Al Viro
9e46365459 copy_xstate_to_kernel(): don't leave parts of destination uninitialized
copy the corresponding pieces of init_fpstate into the gaps instead.

Cc: stable@kernel.org
Tested-by: Alexander Potapenko <glider@google.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-27 17:06:31 -04:00
Yu-cheng Yu
eeedf15336 x86/fpu: Introduce copy_supervisor_to_kernel()
The XSAVES instruction takes a mask and saves only the features specified
in that mask.  The kernel normally specifies that all features be saved.

XSAVES also unconditionally uses the "compacted format" which means that
all specified features are saved next to each other in memory.  If a
feature is removed from the mask, all the features after it will "move
up" into earlier locations in the buffer.

Introduce copy_supervisor_to_kernel(), which saves only supervisor states
and then moves those states into the standard location where they are
normally found.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200512145444.15483-9-yu-cheng.yu@intel.com
2020-05-16 11:24:14 +02:00
Yu-cheng Yu
71581eefd7 x86/fpu/xstate: Introduce XSAVES supervisor states
Enable XSAVES supervisor states by setting MSR_IA32_XSS bits according
to CPUID enumeration results. Also revise comments at various places.

Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200512145444.15483-5-yu-cheng.yu@intel.com
2020-05-13 12:16:47 +02:00