Commit Graph

11591 Commits

Author SHA1 Message Date
Jakub Kicinski
80901bff81 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
net/batman-adv/hard-interface.c
  commit 690bb6fb64 ("batman-adv: Request iflink once in batadv-on-batadv check")
  commit 6ee3c393ee ("batman-adv: Demote batadv-on-batadv skip error message")
https://lore.kernel.org/all/20220302163049.101957-1-sw@simonwunderlich.de/

net/smc/af_smc.c
  commit 4d08b7b57e ("net/smc: Fix cleanup when register ULP fails")
  commit 462791bbfa ("net/smc: add sysctl interface for SMC")
https://lore.kernel.org/all/20220302112209.355def40@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-03 11:55:12 -08:00
Suma Hegde
91f410aa67 platform/x86: Add AMD system management interface
Recent Fam19h EPYC server line of processors from AMD support system
management functionality via HSMP (Host System Management Port) interface.

The Host System Management Port (HSMP) is an interface to provide
OS-level software with access to system management functions via a
set of mailbox registers.

More details on the interface can be found in chapter
"7 Host System Management Port (HSMP)" of the following PPR
https://www.amd.com/system/files/TechDocs/55898_B1_pub_0.50.zip

This patch adds new amd_hsmp module under the drivers/platforms/x86/
which creates miscdevice with an IOCTL interface to the user space.
/dev/hsmp is for running the hsmp mailbox commands.

Signed-off-by: Suma Hegde <suma.hegde@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com>
Reviewed-by: Carlos Bilbao <carlos.bilbao@amd.com>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Nathan Fontenot <nathan.fontenot@amd.com>
Link: https://lore.kernel.org/r/20220222050501.18789-1-nchatrad@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-03-02 11:42:36 +01:00
Sean Christopherson
527d5cd7ee KVM: x86/mmu: Zap only obsolete roots if a root shadow page is zapped
Zap only obsolete roots when responding to zapping a single root shadow
page.  Because KVM keeps root_count elevated when stuffing a previous
root into its PGD cache, shadowing a 64-bit guest means that zapping any
root causes all vCPUs to reload all roots, even if their current root is
not affected by the zap.

For many kernels, zapping a single root is a frequent operation, e.g. in
Linux it happens whenever an mm is dropped, e.g. process exits, etc...

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220225182248.3812651-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-01 08:58:25 -05:00
Paolo Bonzini
0c1c92f15f KVM: x86/mmu: do not pass vcpu to root freeing functions
These functions only operate on a given MMU, of which there is more
than one in a vCPU (we care about two, because the third does not have
any roots and is only used to walk guest page tables).  They do need a
struct kvm in order to lock the mmu_lock, but they do not needed anything
else in the struct kvm_vcpu.  So, pass the vcpu->kvm directly to them.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 08:20:17 -05:00
Paolo Bonzini
b9e5603c2a KVM: x86: use struct kvm_mmu_root_info for mmu->root
The root_hpa and root_pgd fields form essentially a struct kvm_mmu_root_info.
Use the struct to have more consistency between mmu->root and
mmu->prev_roots.

The patch is entirely search and replace except for cached_root_available,
which does not need a temporary struct kvm_mmu_root_info anymore.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 08:20:16 -05:00
David Dunn
ba7bb663f5 KVM: x86: Provide per VM capability for disabling PMU virtualization
Add a new capability, KVM_CAP_PMU_CAPABILITY, that takes a bitmask of
settings/features to allow userspace to configure PMU virtualization on
a per-VM basis.  For now, support a single flag, KVM_PMU_CAP_DISABLE,
to allow disabling PMU virtualization for a VM even when KVM is configured
with enable_pmu=true a module level.

To keep KVM simple, disallow changing VM's PMU configuration after vCPUs
have been created.

Signed-off-by: David Dunn <daviddunn@google.com>
Message-Id: <20220223225743.2703915-2-daviddunn@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 08:20:14 -05:00
Sean Christopherson
925088781e KVM: x86: Fix pointer mistmatch warning when patching RET0 static calls
Cast kvm_x86_ops.func to 'void *' when updating KVM static calls that are
conditionally patched to __static_call_return0().  clang complains about
using mismatching pointers in the ternary operator, which breaks the
build when compiling with CONFIG_KVM_WERROR=y.

  >> arch/x86/include/asm/kvm-x86-ops.h:82:1: warning: pointer type mismatch
  ('bool (*)(struct kvm_vcpu *)' and 'void *') [-Wpointer-type-mismatch]

Fixes: 5be2226f41 ("KVM: x86: allow defining return-0 static calls")
Reported-by: Like Xu <like.xu.linux@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Dunn <daviddunn@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Message-Id: <20220223162355.3174907-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 08:20:14 -05:00
Arnd Bergmann
dd865f090f
Merge branch 'set_fs-4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic into asm-generic
Christoph Hellwig and a few others spent a huge effort on removing
set_fs() from most of the important architectures, but about half the
other architectures were never completed even though most of them don't
actually use set_fs() at all.

I did a patch for microblaze at some point, which turned out to be fairly
generic, and now ported it to most other architectures, using new generic
implementations of access_ok() and __{get,put}_kernel_nocheck().

Three architectures (sparc64, ia64, and sh) needed some extra work,
which I also completed.

* 'set_fs-4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  uaccess: remove CONFIG_SET_FS
  ia64: remove CONFIG_SET_FS support
  sh: remove CONFIG_SET_FS support
  sparc64: remove CONFIG_SET_FS support
  lib/test_lockup: fix kernel pointer check for separate address spaces
  uaccess: generalize access_ok()
  uaccess: fix type mismatch warnings from access_ok()
  arm64: simplify access_ok()
  m68k: fix access_ok for coldfire
  MIPS: use simpler access_ok()
  MIPS: Handle address errors for accesses above CPU max virtual user address
  uaccess: add generic __{get,put}_kernel_nofault
  nios2: drop access_ok() check from __put_user()
  x86: use more conventional access_ok() definition
  x86: remove __range_not_ok()
  sparc64: add __{get,put}_kernel_nofault()
  nds32: fix access_ok() checks in get/put_user
  uaccess: fix nios2 and microblaze get_user_8()
  uaccess: fix integer overflow on access_ok()
2022-02-25 11:16:58 +01:00
Arnd Bergmann
12700c17fc uaccess: generalize access_ok()
There are many different ways that access_ok() is defined across
architectures, but in the end, they all just compare against the
user_addr_max() value or they accept anything.

Provide one definition that works for most architectures, checking
against TASK_SIZE_MAX for user processes or skipping the check inside
of uaccess_kernel() sections.

For architectures without CONFIG_SET_FS(), this should be the fastest
check, as it comes down to a single comparison of a pointer against a
compile-time constant, while the architecture specific versions tend to
do something more complex for historic reasons or get something wrong.

Type checking for __user annotations is handled inconsistently across
architectures, but this is easily simplified as well by using an inline
function that takes a 'const void __user *' argument. A handful of
callers need an extra __user annotation for this.

Some architectures had trick to use 33-bit or 65-bit arithmetic on the
addresses to calculate the overflow, however this simpler version uses
fewer registers, which means it can produce better object code in the
end despite needing a second (statically predicted) branch.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mark Rutland <mark.rutland@arm.com> [arm64, asm-generic]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Stafford Horne <shorne@gmail.com>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-25 09:36:05 +01:00
Arnd Bergmann
34737e2698 uaccess: add generic __{get,put}_kernel_nofault
Nine architectures are still missing __{get,put}_kernel_nofault:
alpha, ia64, microblaze, nds32, nios2, openrisc, sh, sparc32, xtensa.

Add a generic version that lets everything use the normal
copy_{from,to}_kernel_nofault() code based on these, removing the last
use of get_fs()/set_fs() from architecture-independent code.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-25 09:36:05 +01:00
Arnd Bergmann
1830a1d6a5 x86: use more conventional access_ok() definition
The way that access_ok() is defined on x86 is slightly different from
most other architectures, and a bit more complex.

The generic version tends to result in the best output on all
architectures, as it results in single comparison against a constant
limit for calls with a known size.

There are a few callers of __range_not_ok(), all of which use TASK_SIZE
as the limit rather than TASK_SIZE_MAX, but I could not see any reason
for picking this. Changing these to call __access_ok() instead uses the
default limit, but keeps the behavior otherwise.

x86 is the only architecture with a WARN_ON_IN_IRQ() checking
access_ok(), but it's probably best to leave that in place.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-25 09:36:05 +01:00
Arnd Bergmann
36903abedf x86: remove __range_not_ok()
The __range_not_ok() helper is an x86 (and sparc64) specific interface
that does roughly the same thing as __access_ok(), but with different
calling conventions.

Change this to use the normal interface in order for consistency as we
clean up all access_ok() implementations.

This changes the limit from TASK_SIZE to TASK_SIZE_MAX, which Al points
out is the right thing do do here anyway.

The callers have to use __access_ok() instead of the normal access_ok()
though, because on x86 that contains a WARN_ON_IN_IRQ() check that cannot
be used inside of NMI context while tracing.

The check in copy_code() is not needed any more, because this one is
already done by copy_from_user_nmi().

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Link: https://lore.kernel.org/lkml/YgsUKcXGR7r4nINj@zeniv-ca.linux.org.uk/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-25 09:36:05 +01:00
Linus Torvalds
1f840c0ef4 x86 host:
* Expose KVM_CAP_ENABLE_CAP since it is supported
 
 * Disable KVM_HC_CLOCK_PAIRING in TSC catchup mode
 
 * Ensure async page fault token is nonzero
 
 * Fix lockdep false negative
 
 * Fix FPU migration regression from the AMX changes
 
 x86 guest:
 
 * Don't use PV TLB/IPI/yield on uniprocessor guests
 
 PPC:
 * reserve capability id (topic branch for ppc/kvm)
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmIXyQAUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPKJQf/T9NeXOFIPIIlH4ZKM7155qlwX8dx
 NR2YV+RNYd27MDkaEm9w4ucXacGpPuBPPx9v7UiLlAqAN+NP7nF3rQKC0SpQMC6H
 EKFtm+8al8EzyDYP36fqnwDne/xWHlOeGXRRJMKPGhXBSoXoY5cK35IXmNZjfteQ
 hK7siBs2saJ2VFqMCbJ9Pqdu1NDO6OEt8HWz2Dnx6EUd90O0pHWZy5JvWOYfyLjL
 Y2pP0dZQxuB/PmqkpVj2gV9jK2Zhj33eerzDV4tVXPV7le8fgGeTaJ8ft+SUIizS
 YCcPR89+u5c9yzlwY2i7mvloayKnuqkECiGtRG6VHNlrPZTPijems8tH1w==
 =lWjy
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "x86 host:

   - Expose KVM_CAP_ENABLE_CAP since it is supported

   - Disable KVM_HC_CLOCK_PAIRING in TSC catchup mode

   - Ensure async page fault token is nonzero

   - Fix lockdep false negative

   - Fix FPU migration regression from the AMX changes

  x86 guest:

   - Don't use PV TLB/IPI/yield on uniprocessor guests

  PPC:

   - reserve capability id (topic branch for ppc/kvm)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: nSVM: disallow userspace setting of MSR_AMD64_TSC_RATIO to non default value when tsc scaling disabled
  KVM: x86/mmu: make apf token non-zero to fix bug
  KVM: PPC: reserve capability 210 for KVM_CAP_PPC_AIL_MODE_3
  x86/kvm: Don't use pv tlb/ipi/sched_yield if on 1 vCPU
  x86/kvm: Fix compilation warning in non-x86_64 builds
  x86/kvm/fpu: Remove kvm_vcpu_arch.guest_supported_xcr0
  x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
  kvm: x86: Disable KVM_HC_CLOCK_PAIRING if tsc is in always catchup mode
  KVM: Fix lockdep false negative during host resume
  KVM: x86: Add KVM_CAP_ENABLE_CAP to x86
2022-02-24 14:05:49 -08:00
Brijesh Singh
1e8c5971c2 x86/mm/cpa: Generalize __set_memory_enc_pgtable()
The kernel provides infrastructure to set or clear the encryption mask
from the pages for AMD SEV, but TDX requires few tweaks.

- TDX and SEV have different requirements to the cache and TLB
  flushing.

- TDX has own routine to notify VMM about page encryption status change.

Modify __set_memory_enc_pgtable() and make it flexible enough to cover
both AMD SEV and Intel TDX. The AMD-specific behavior is isolated in the
callbacks under x86_platform.guest. TDX will provide own version of said
callbacks.

  [ bp: Beat into submission. ]

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/r/20220223043528.2093214-1-brijesh.singh@amd.com
2022-02-23 19:14:29 +01:00
Kirill A. Shutemov
b577f542f9 x86/coco: Add API to handle encryption mask
AMD SME/SEV uses a bit in the page table entries to indicate that the
page is encrypted and not accessible to the VMM.

TDX uses a similar approach, but the polarity of the mask is opposite to
AMD: if the bit is set the page is accessible to VMM.

Provide vendor-neutral API to deal with the mask: cc_mkenc() and
cc_mkdec() modify given address to make it encrypted/decrypted. It can
be applied to phys_addr_t, pgprotval_t or page table entry value.

pgprot_encrypted() and pgprot_decrypted() reimplemented using new
helpers.

The implementation will be extended to cover TDX.

pgprot_decrypted() is used by drivers (i915, virtio_gpu, vfio).
cc_mkdec() called by pgprot_decrypted(). Export cc_mkdec().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20220222185740.26228-5-kirill.shutemov@linux.intel.com
2022-02-23 19:14:29 +01:00
Kirill A. Shutemov
655a0fa34b x86/coco: Explicitly declare type of confidential computing platform
The kernel derives the confidential computing platform
type it is running as from sme_me_mask on AMD or by using
hv_is_isolation_supported() on HyperV isolation VMs. This detection
process will be more complicated as more platforms get added.

Declare a confidential computing vendor variable explicitly and set it
via cc_set_vendor() on the respective platform.

  [ bp: Massage commit message, fixup HyperV check. ]

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20220222185740.26228-4-kirill.shutemov@linux.intel.com
2022-02-23 19:14:16 +01:00
Christoph Hellwig
4509d950a6 x86/pat: Remove the unused set_pages_array_wt() function
Commit

  623dffb2a2 ("x86/mm/pat: Add set_memory_wt() for Write-Through type")

added it but there were no users.

  [ bp: Add a commit message. ]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220223072852.616143-1-hch@lst.de
2022-02-23 13:34:08 +01:00
Ingo Molnar
669f45f19c sched/headers: Add initial new headers as identity mappings
This allows code sharing between fast-headers tree and the vanilla
scheduler tree.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2022-02-23 10:58:28 +01:00
Ingo Molnar
6255b48aeb Linux 5.17-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmISrYgeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGg20IAKDZr7rfSHBopjQV
 Cocw744tom0XuxpvSZpp2GGOOXF+tkswcNNaRIrbGOl1mkyxA7eBZCTMpDeDS9aQ
 wB0D0Gxx8QBAJp4KgB1W7TB+hIGes/rs8Ve+6iO4ulLLdCVWX/q2boI0aZ7QX9O9
 qNi8OsoZQtk6falRvciZFHwV5Av1p2Sy1AW57udQ7DvJ4H98AfKf1u8/z208WWW8
 1ixC+qJxQcUcM9vI+7P9Tt7NbFSKv8SvAmqjFY7P+DxQAsVw6KXoqVXykDzeOv0t
 fUNOE/t0oFZafwtn8h7KBQnwS9lH03+3KkslVZs+iMFyUj/Bar+NVVyKoDhWXtVg
 /PuMhEg=
 =eU1o
 -----END PGP SIGNATURE-----

Merge tag 'v5.17-rc5' into sched/core, to resolve conflicts

New conflicts in sched/core due to the following upstream fixes:

  44585f7bc0 ("psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n")
  a06247c680 ("psi: Fix uaf issue when psi trigger is destroyed while being polled")

Conflicts:
	include/linux/psi_types.h
	kernel/sched/psi.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-02-21 11:53:51 +01:00
Peter Zijlstra
1e19da8522 x86/speculation: Add eIBRS + Retpoline options
Thanks to the chaps at VUsec it is now clear that eIBRS is not
sufficient, therefore allow enabling of retpolines along with eIBRS.

Add spectre_v2=eibrs, spectre_v2=eibrs,lfence and
spectre_v2=eibrs,retpoline options to explicitly pick your preferred
means of mitigation.

Since there's new mitigations there's also user visible changes in
/sys/devices/system/cpu/vulnerabilities/spectre_v2 to reflect these
new mitigations.

  [ bp: Massage commit message, trim error messages,
    do more precise eIBRS mode checking. ]

Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Patrick Colp <patrick.colp@oracle.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2022-02-21 10:21:35 +01:00
Peter Zijlstra (Intel)
d45476d983 x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
The RETPOLINE_AMD name is unfortunate since it isn't necessarily
AMD only, in fact Hygon also uses it. Furthermore it will likely be
sufficient for some Intel processors. Therefore rename the thing to
RETPOLINE_LFENCE to better describe what it is.

Add the spectre_v2=retpoline,lfence option as an alias to
spectre_v2=retpoline,amd to preserve existing setups. However, the output
of /sys/devices/system/cpu/vulnerabilities/spectre_v2 will be changed.

  [ bp: Fix typos, massage. ]

Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2022-02-21 10:21:28 +01:00
Mark Rutland
8a69fe0be1 sched/preempt: Refactor sched_dynamic_update()
Currently sched_dynamic_update needs to open-code the enabled/disabled
function names for each preemption model it supports, when in practice
this is a boolean enabled/disabled state for each function.

Make this clearer and avoid repetition by defining the enabled/disabled
states at the function definition, and using helper macros to perform the
static_call_update(). Where x86 currently overrides the enabled
function, it is made to provide both the enabled and disabled states for
consistency, with defaults provided by the core code otherwise.

In subsequent patches this will allow us to support PREEMPT_DYNAMIC
without static calls.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220214165216.2231574-3-mark.rutland@arm.com
2022-02-19 11:11:07 +01:00
Sean Christopherson
1bbc60d0c7 KVM: x86/mmu: Remove MMU auditing
Remove mmu_audit.c and all its collateral, the auditing code has suffered
severe bitrot, ironically partly due to shadow paging being more stable
and thus not benefiting as much from auditing, but mostly due to TDP
supplanting shadow paging for non-nested guests and shadowing of nested
TDP not heavily stressing the logic that is being audited.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-18 13:46:23 -05:00
Paolo Bonzini
5be2226f41 KVM: x86: allow defining return-0 static calls
A few vendor callbacks are only used by VMX, but they return an integer
or bool value.  Introduce KVM_X86_OP_OPTIONAL_RET0 for them: if a func is
NULL in struct kvm_x86_ops, it will be changed to __static_call_return0
when updating static calls.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-18 12:44:22 -05:00
Paolo Bonzini
abb6d479e2 KVM: x86: make several APIC virtualization callbacks optional
All their invocations are conditional on vcpu->arch.apicv_active,
meaning that they need not be implemented by vendor code: even
though at the moment both vendors implement APIC virtualization,
all of them can be optional.  In fact SVM does not need many of
them, and their implementation can be deleted now.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-18 12:41:23 -05:00
Paolo Bonzini
dd2319c618 KVM: x86: warn on incorrectly NULL members of kvm_x86_ops
Use the newly corrected KVM_X86_OP annotations to warn about possible
NULL pointer dereferences as soon as the vendor module is loaded.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-18 12:31:43 -05:00
Paolo Bonzini
e4fc23bad8 KVM: x86: remove KVM_X86_OP_NULL and mark optional kvm_x86_ops
The original use of KVM_X86_OP_NULL, which was to mark calls
that do not follow a specific naming convention, is not in use
anymore.  Instead, let's mark calls that are optional because
they are always invoked within conditionals or with static_call_cond.
Those that are _not_, i.e. those that are defined with KVM_X86_OP,
must be defined by both vendor modules or some kind of NULL pointer
dereference is bound to happen at runtime.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-18 12:31:27 -05:00
Paolo Bonzini
8a2897853c KVM: x86: return 1 unconditionally for availability of KVM_CAP_VAPIC
The two ioctls used to implement userspace-accelerated TPR,
KVM_TPR_ACCESS_REPORTING and KVM_SET_VAPIC_ADDR, are available
even if hardware-accelerated TPR can be used.  So there is
no reason not to report KVM_CAP_VAPIC.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-18 12:30:56 -05:00
Jakub Kicinski
6b5567b1b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 11:44:20 -08:00
Leonardo Bras
988896bb61 x86/kvm/fpu: Remove kvm_vcpu_arch.guest_supported_xcr0
kvm_vcpu_arch currently contains the guest supported features in both
guest_supported_xcr0 and guest_fpu.fpstate->user_xfeatures field.

Currently both fields are set to the same value in
kvm_vcpu_after_set_cpuid() and are not changed anywhere else after that.

Since it's not good to keep duplicated data, remove guest_supported_xcr0.

To keep the code more readable, introduce kvm_guest_supported_xcr()
and kvm_guest_supported_xfd() to replace the previous usages of
guest_supported_xcr0.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Message-Id: <20220217053028.96432-3-leobras@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-17 10:06:49 -05:00
Gustavo A. R. Silva
5224f79096 treewide: Replace zero-length arrays with flexible-array members
There is a regular need in the kernel to provide a way to declare
having a dynamically sized set of trailing elements in a structure.
Kernel code should always use “flexible array members”[1] for these
cases. The older style of one-element or zero-length arrays should
no longer be used[2].

This code was transformed with the help of Coccinelle:
(next-20220214$ spatch --jobs $(getconf _NPROCESSORS_ONLN) --sp-file script.cocci --include-headers --dir . > output.patch)

@@
identifier S, member, array;
type T1, T2;
@@

struct S {
  ...
  T1 member;
  T2 array[
- 0
  ];
};

UAPI and wireless changes were intentionally excluded from this patch
and will be sent out separately.

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays

Link: https://github.com/KSPP/linux/issues/78
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2022-02-17 07:00:39 -06:00
Masahiro Yamada
4a3233c1a6
shmbuf.h: add asm/shmbuf.h to UAPI compile-test coverage
asm/shmbuf.h is currently excluded from the UAPI compile-test because of
the errors like follows:

    HDRTEST usr/include/asm/shmbuf.h
  In file included from ./usr/include/asm/shmbuf.h:6,
                   from <command-line>:
  ./usr/include/asm-generic/shmbuf.h:26:33: error: field ‘shm_perm’ has incomplete type
     26 |         struct ipc64_perm       shm_perm;       /* operation perms */
        |                                 ^~~~~~~~
  ./usr/include/asm-generic/shmbuf.h:27:9: error: unknown type name ‘size_t’
     27 |         size_t                  shm_segsz;      /* size of segment (bytes) */
        |         ^~~~~~
  ./usr/include/asm-generic/shmbuf.h:40:9: error: unknown type name ‘__kernel_pid_t’
     40 |         __kernel_pid_t          shm_cpid;       /* pid of creator */
        |         ^~~~~~~~~~~~~~
  ./usr/include/asm-generic/shmbuf.h:41:9: error: unknown type name ‘__kernel_pid_t’
     41 |         __kernel_pid_t          shm_lpid;       /* pid of last operator */
        |         ^~~~~~~~~~~~~~

The errors can be fixed by replacing size_t with __kernel_size_t and by
including proper headers.

Then, remove the no-header-test entry from user/include/Makefile.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-17 09:09:37 +01:00
Masahiro Yamada
72113d0a7d
signal.h: add linux/signal.h and asm/signal.h to UAPI compile-test coverage
linux/signal.h and asm/signal.h are currently excluded from the UAPI
compile-test because of the errors like follows:

    HDRTEST usr/include/asm/signal.h
  In file included from <command-line>:
  ./usr/include/asm/signal.h:103:9: error: unknown type name ‘size_t’
    103 |         size_t ss_size;
        |         ^~~~~~

The errors can be fixed by replacing size_t with __kernel_size_t.

Then, remove the no-header-test entries from user/include/Makefile.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-17 09:09:36 +01:00
Linus Torvalds
c5d9ae265b ARM:
* Read HW interrupt pending state from the HW
 
 x86:
 
 * Don't truncate the performance event mask on AMD
 
 * Fix Xen runstate updates to be atomic when preempting vCPU
 
 * Fix for AMD AVIC interrupt injection race
 
 * Several other AMD fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmIL4G4UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNkQQf/Z75dnmdRl8sHHnGjwH2IhWHwAg+h
 5O+mJphYt4cvVMexP5dj69b7mHtKMeg/0TxPvPfwCLlhzKkW1gQFwwBAq/YuBCKw
 cnMuVPeCSWo6znpS+jYUF4FAJgPKkzfFR9UwYAR5UexSWyOwU8rLcvSxj8vJjO/l
 sIke+f767Ks2KgcTMIudObg+vDcgnQXI8n8ztI7hF1WJKYHdTKFkYN7BYRxQ9BW6
 4fq51218DhRMv6S7so5dhYC473f+D0t8b5S/Mygur/x6mzsdQJKeOmi8aWGoDa/B
 Bmse+X0lHoOkdXaxqpBgQCYeyrXohNcXx7cpGRVFnS45Jf7MLG4OfVHWNQ==
 =kD2l
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Read HW interrupt pending state from the HW

  x86:

   - Don't truncate the performance event mask on AMD

   - Fix Xen runstate updates to be atomic when preempting vCPU

   - Fix for AMD AVIC interrupt injection race

   - Several other AMD fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW
  KVM: x86/pmu: Don't truncate the PerfEvtSeln MSR when creating a perf event
  KVM: SVM: fix race between interrupt delivery and AVIC inhibition
  KVM: SVM: set IRR in svm_deliver_interrupt
  KVM: SVM: extract avic_ring_doorbell
  selftests: kvm: Remove absent target file
  KVM: arm64: vgic: Read HW interrupt pending state from the HW
  KVM: x86/xen: Fix runstate updates to be atomic when preempting vCPU
  KVM: x86: SVM: move avic definitions from AMD's spec to svm.h
  KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it
  KVM: x86: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control them
  KVM: x86: nSVM: expose clean bit support to the guest
  KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM
  KVM: x86: nSVM: mark vmcb01 as dirty when restoring SMM saved state
  KVM: x86: nSVM: fix potential NULL derefernce on nested migration
  KVM: x86: SVM: don't passthrough SMAP/SMEP/PKE bits in !NPT && !gCR0.PG case
  Revert "svm: Add warning message for AVIC IPI invalid target"
2022-02-15 11:07:59 -08:00
Alexander Shishkin
161a9a3370 perf/x86/intel/pt: Add a capability and config bit for disabling TNTs
As of Intel SDM (https://www.intel.com/sdm) version 076, there is a new
Intel PT feature called TNT-Disable which is enabled config bit 55.

TNT-Disable disables Taken-Not-Taken packets to reduce the tracing
overhead, but with the result that exact control flow information is
lost.

Add a capability and config bit for TNT-Disable.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20220126104815.2807416-3-adrian.hunter@intel.com
2022-02-15 17:47:11 +01:00
Alexander Shishkin
28c24ded64 perf/x86/intel/pt: Add a capability and config bit for event tracing
As of Intel SDM (https://www.intel.com/sdm) version 076, there is a new
Intel PT feature called Event Trace which is enabled config bit 31.

Event Trace exposes details about asynchronous events such as interrupts
and VM-Entry/Exit.

Add a capability and config bit for Event Trace.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20220126104815.2807416-2-adrian.hunter@intel.com
2022-02-15 17:47:11 +01:00
Fenghua Yu
7c1ef59145 x86/cpufeatures: Re-enable ENQCMD
The ENQCMD feature can only be used if CONFIG_INTEL_IOMMU_SVM is set.
Add X86_FEATURE_ENQCMD to the disabled features mask as appropriate so
that cpu_feature_enabled() can be used to check the feature.

  [ bp: Massage commit message. ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220207230254.3342514-10-fenghua.yu@intel.com
2022-02-15 11:31:43 +01:00
Linus Torvalds
42964a18f8 - Fix a case where objtool would mistakenly warn about instructions being unreachable
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmII/BUACgkQEsHwGGHe
 VUr5VBAAhK4/Hqe5wgbbKE2CwUfvCd7KBbKQb0FbBguO9Sg9NkQhD0ROE5qQCsRE
 AMBJKuC4AahoVIsPuJ2nr73iaM5UY3xx6Dth6qPnA7VEwpWM/h/A2t0D9bgm5PdR
 1FK5QnmIP+o/astYTvP8YpIlhRqHK97fZM0KYZX8SfLG2sbnYiBANj7YiwafNLQs
 7FR/J+LH2PRAU1sBrdrhvd/u4jvSTjcbutPEETGuTTMoPiJj4TGa0XyZNQR4/br+
 WTnbzLFJpRabBMVE2ELzbzSXnEeUZpSKe79G9LW5AFaGa9UpyooXVUn4PcPNXTia
 Q8zkMKusNFUlQc0pbcUxaS/g+UnC7koFn/XvPxp5k9tL7x4exq9hhOn/F+K9Ctuw
 jUVrg63+/VFYtMwbZpYd81k5I/rx+o7t8rkmUrk0Wz/gpE9CDgbjfhGyAFqmXAFU
 mGAGcFHbBaG6J5/XGqOGZR42yajlg9lwxpaj+taCtfbf46/48E6mTtLG2qQRDAqW
 QlGVS8H9t+vmwfO8oAt2tWwLTyZqt+6VmNTbwerKSEblEC+yB/lLO5AcWsnNwHVl
 9ZwSRTPw8ejkj/AdyoTSedMJHzcsAXT8PtIr2rSjuX1b8SubN8GmbsApyQmzV3G0
 n5DLTcKvpCPjtWjtpF44yjP1rfdDLFBIh+RYHF7iNPFo0uhQFOg=
 =+cTD
 -----END PGP SIGNATURE-----

Merge tag 'objtool_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fix from Borislav Petkov:
 "Fix a case where objtool would mistakenly warn about instructions
  being unreachable"

* tag 'objtool_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bug: Merge annotate_reachable() into _BUG_FLAGS() asm
2022-02-13 09:43:34 -08:00
Borislav Petkov
b008893b08 x86/ptrace: Always inline v8086_mode() for instrumentation
Instrumentation glue like KASAN causes the following warning:

  vmlinux.o: warning: objtool: mce_gather_info()+0x5f: call to v8086_mode.constprop.0() leaves .noinstr.text section

due to gcc creating a function call for that oneliner. Force-inline it
and even save some vmlinux bytes (.config is close to an allmodconfig):

     text    data     bss     dec     hex filename
  209431677       208257651       34411048        452100376       1af28118	vmlinux.before
  209431519       208257615       34411048        452100182       1af28056	vmlinux.after

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/20220204083015.17317-3-bp@alien8.de
2022-02-12 22:07:13 +01:00
Borislav Petkov
f5c54f77b0 cpumask: Add a x86-specific cpumask_clear_cpu() helper
Add a x86-specific cpumask_clear_cpu() helper which will be used in
places where the explicit KASAN-instrumentation in the *_bit() helpers
is unwanted.

Also, always inline two more cpumask generic helpers.

allyesconfig:

     text    data     bss     dec     hex filename
  190553143       159425889       32076404        382055436       16c5b40c vmlinux.before
  190551812       159424945       32076404        382053161       16c5ab29 vmlinux.after

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/20220204083015.17317-2-bp@alien8.de
2022-02-12 18:20:05 +01:00
Linus Torvalds
4a387c98b3 xen: branch for v5.17-rc4
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYgd+vgAKCRCAXGG7T9hj
 vrMKAQCGIOlp3iLisC9ZbzZF2SkeEPW602QF0LC3hexPKgtD/wD/dPeU33MtzkIC
 d53GcdcDUBv4ByYKz6/tGPiZhzQSEwI=
 =nm20
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.17a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - Two small cleanups

 - Another fix for addressing the EFI framebuffer above 4GB when running
   as Xen dom0

 - A patch to let Xen guests use reserved bits in MSI- and IO-APIC-
   registers for extended APIC-IDs the same way KVM guests are doing it
   already

* tag 'for-linus-5.17a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/pci: Make use of the helper macro LIST_HEAD()
  xen/x2apic: Fix inconsistent indenting
  xen/x86: detect support for extended destination ID
  xen/x86: obtain full video frame buffer address for Dom0 also under EFI
2022-02-12 09:08:57 -08:00
Ard Biesheuvel
297565aa22 lib/xor: make xor prototypes more friendly to compiler vectorization
Modern compilers are perfectly capable of extracting parallelism from
the XOR routines, provided that the prototypes reflect the nature of the
input accurately, in particular, the fact that the input vectors are
expected not to overlap. This is not documented explicitly, but is
implied by the interchangeability of the various C routines, some of
which use temporary variables while others don't: this means that these
routines only behave identically for non-overlapping inputs.

So let's decorate these input vectors with the __restrict modifier,
which informs the compiler that there is no overlap. While at it, make
the input-only vectors pointer-to-const as well.

Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/563
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-02-11 20:39:39 +11:00
Jakub Kicinski
5b91c5cc0e Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-10 17:29:56 -08:00
David Matlack
cb00a70bd4 KVM: x86/mmu: Split huge pages mapped by the TDP MMU during KVM_CLEAR_DIRTY_LOG
When using KVM_DIRTY_LOG_INITIALLY_SET, huge pages are not
write-protected when dirty logging is enabled on the memslot. Instead
they are write-protected once userspace invokes KVM_CLEAR_DIRTY_LOG for
the first time and only for the specific sub-region being cleared.

Enhance KVM_CLEAR_DIRTY_LOG to also try to split huge pages prior to
write-protecting to avoid causing write-protection faults on vCPU
threads. This also allows userspace to smear the cost of huge page
splitting across multiple ioctls, rather than splitting the entire
memslot as is the case when initially-all-set is not used.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-17-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:43 -05:00
David Matlack
a3fe5dbda0 KVM: x86/mmu: Split huge pages mapped by the TDP MMU when dirty logging is enabled
When dirty logging is enabled without initially-all-set, try to split
all huge pages in the memslot down to 4KB pages so that vCPUs do not
have to take expensive write-protection faults to split huge pages.

Eager page splitting is best-effort only. This commit only adds the
support for the TDP MMU, and even there splitting may fail due to out
of memory conditions. Failures to split a huge page is fine from a
correctness standpoint because KVM will always follow up splitting by
write-protecting any remaining huge pages.

Eager page splitting moves the cost of splitting huge pages off of the
vCPU threads and onto the thread enabling dirty logging on the memslot.
This is useful because:

 1. Splitting on the vCPU thread interrupts vCPUs execution and is
    disruptive to customers whereas splitting on VM ioctl threads can
    run in parallel with vCPU execution.

 2. Splitting all huge pages at once is more efficient because it does
    not require performing VM-exit handling or walking the page table for
    every 4KiB page in the memslot, and greatly reduces the amount of
    contention on the mmu_lock.

For example, when running dirty_log_perf_test with 96 virtual CPUs, 1GiB
per vCPU, and 1GiB HugeTLB memory, the time it takes vCPUs to write to
all of their memory after dirty logging is enabled decreased by 95% from
2.94s to 0.14s.

Eager Page Splitting is over 100x more efficient than the current
implementation of splitting on fault under the read lock. For example,
taking the same workload as above, Eager Page Splitting reduced the CPU
required to split all huge pages from ~270 CPU-seconds ((2.94s - 0.14s)
* 96 vCPU threads) to only 1.55 CPU-seconds.

Eager page splitting does increase the amount of time it takes to enable
dirty logging since it has split all huge pages. For example, the time
it took to enable dirty logging in the 96GiB region of the
aforementioned test increased from 0.001s to 1.55s.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-16-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:42 -05:00
Sean Christopherson
03d004cd07 KVM: x86: Use more verbose names for mem encrypt kvm_x86_ops hooks
Use slightly more verbose names for the so called "memory encrypt",
a.k.a. "mem enc", kvm_x86_ops hooks to bridge the gap between the current
super short kvm_x86_ops names and SVM's more verbose, but non-conforming
names.  This is a step toward using kvm-x86-ops.h with KVM_X86_CVM_OP()
to fill svm_x86_ops.

Opportunistically rename mem_enc_op() to mem_enc_ioctl() to better
reflect its true nature, as it really is a full fledged ioctl() of its
own.  Ideally, the hook would be named confidential_vm_ioctl() or so, as
the ioctl() is a gateway to more than just memory encryption, and because
its underlying purpose to support Confidential VMs, which can be provided
without memory encryption, e.g. if the TCB of the guest includes the host
kernel but not host userspace, or by isolation in hardware without
encrypting memory.  But, diverging from KVM_MEMORY_ENCRYPT_OP even
further is undeseriable, and short of creating alises for all related
ioctl()s, which introduces a different flavor of divergence, KVM is stuck
with the nomenclature.

Defer renaming SVM's functions to a future commit as there are additional
changes needed to make SVM fully conforming and to match reality (looking
at you, svm_vm_copy_asid_from()).

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-20-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:29 -05:00
Sean Christopherson
872e0c5308 KVM: x86: Move get_cs_db_l_bits() helper to SVM
Move kvm_get_cs_db_l_bits() to SVM and rename it appropriately so that
its svm_x86_ops entry can be filled via kvm-x86-ops, and to eliminate a
superfluous export from KVM x86.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-16-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:21 -05:00
Sean Christopherson
7ad02ef0da KVM: x86: Use static_call() for copy/move encryption context ioctls()
Define and use static_call()s for .vm_{copy,move}_enc_context_from(),
mostly so that the op is defined in kvm-x86-ops.h.  This will allow using
KVM_X86_OP in vendor code to wire up the implementation.  Any performance
gains eeked out by using static_call() is a happy bonus and not the
primary motiviation.

Opportunistically refactor the code to reduce indentation and keep line
lengths reasonable, and to be consistent when wrapping versus running
a bit over the 80 char soft limit.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-12-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:20 -05:00
Sean Christopherson
a0941a64a9 KVM: x86: Use static_call() for .vcpu_deliver_sipi_vector()
Define and use a static_call() for kvm_x86_ops.vcpu_deliver_sipi_vector(),
mostly so that the op is defined in kvm-x86-ops.h.  This will allow using
KVM_X86_OP in vendor code to wire up the implementation.  Any performance
gains eeked out by using static_call() is a happy bonus and not the
primary motiviation.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:18 -05:00
Sean Christopherson
e27bc0440e KVM: x86: Rename kvm_x86_ops pointers to align w/ preferred vendor names
Rename a variety of kvm_x86_op function pointers so that preferred name
for vendor implementations follows the pattern <vendor>_<function>, e.g.
rename .run() to .vcpu_run() to match {svm,vmx}_vcpu_run().  This will
allow vendor implementations to be wired up via the KVM_X86_OP macro.

In many cases, VMX and SVM "disagree" on the preferred name, though in
reality it's VMX and x86 that disagree as SVM blindly prepended _svm to
the kvm_x86_ops name.  Justification for using the VMX nomenclature:

  - set_{irq,nmi} => inject_{irq,nmi} because the helper is injecting an
    event that has already been "set" in e.g. the vIRR.  SVM's relevant
    VMCB field is even named event_inj, and KVM's stat is irq_injections.

  - prepare_guest_switch => prepare_switch_to_guest because the former is
    ambiguous, e.g. it could mean switching between multiple guests,
    switching from the guest to host, etc...

  - update_pi_irte => pi_update_irte to allow for matching match the rest
    of VMX's posted interrupt naming scheme, which is vmx_pi_<blah>().

  - start_assignment => pi_start_assignment to again follow VMX's posted
    interrupt naming scheme, and to provide context for what bit of code
    might care about an otherwise undescribed "assignment".

The "tlb_flush" => "flush_tlb" creates an inconsistency with respect to
Hyper-V's "tlb_remote_flush" hooks, but Hyper-V really is the one that's
wrong.  x86, VMX, and SVM all use flush_tlb, and even common KVM is on a
variant of the bandwagon with "kvm_flush_remote_tlbs", e.g. a more
appropriate name for the Hyper-V hooks would be flush_remote_tlbs.  Leave
that change for another time as the Hyper-V hooks always start as NULL,
i.e. the name doesn't matter for using kvm-x86-ops.h, and changing all
names requires an astounding amount of churn.

VMX and SVM function names are intentionally left as is to minimize the
diff.  Both VMX and SVM will need to rename even more functions in order
to fully utilize KVM_X86_OPS, i.e. an additional patch for each is
inevitable.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:17 -05:00
Jinrong Liang
62711e5a74 KVM: x86: Remove unused "vcpu" of kvm_scale_tsc()
The "struct kvm_vcpu *vcpu" parameter of kvm_scale_tsc() is not used,
so remove it. No functional change intended.

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Message-Id: <20220125095909.38122-18-cloudliang@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:14 -05:00
Roger Pau Monne
e07e98da92 xen/x86: detect support for extended destination ID
Xen allows the usage of some previously reserved bits in the IO-APIC
RTE and the MSI address fields in order to store high bits for the
target APIC ID. Such feature is already implemented by QEMU/KVM and
HyperV, so in order to enable it just add the handler that checks for
it's presence.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20220120152527.7524-3-roger.pau@citrix.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2022-02-10 11:10:17 +01:00
Jakub Kicinski
1127170d45 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2022-02-09

We've added 126 non-merge commits during the last 16 day(s) which contain
a total of 201 files changed, 4049 insertions(+), 2215 deletions(-).

The main changes are:

1) Add custom BPF allocator for JITs that pack multiple programs into a huge
   page to reduce iTLB pressure, from Song Liu.

2) Add __user tagging support in vmlinux BTF and utilize it from BPF
   verifier when generating loads, from Yonghong Song.

3) Add per-socket fast path check guarding from cgroup/BPF overhead when
   used by only some sockets, from Pavel Begunkov.

4) Continued libbpf deprecation work of APIs/features and removal of their
   usage from samples, selftests, libbpf & bpftool, from Andrii Nakryiko
   and various others.

5) Improve BPF instruction set documentation by adding byte swap
   instructions and cleaning up load/store section, from Christoph Hellwig.

6) Switch BPF preload infra to light skeleton and remove libbpf dependency
   from it, from Alexei Starovoitov.

7) Fix architecture-agnostic macros in libbpf for accessing syscall
   arguments from BPF progs for non-x86 architectures,
   from Ilya Leoshkevich.

8) Rework port members in struct bpf_sk_lookup and struct bpf_sock to be
   of 16-bit field with anonymous zero padding, from Jakub Sitnicki.

9) Add new bpf_copy_from_user_task() helper to read memory from a different
   task than current. Add ability to create sleepable BPF iterator progs,
   from Kenny Yu.

10) Implement XSK batching for ice's zero-copy driver used by AF_XDP and
    utilize TX batching API from XSK buffer pool, from Maciej Fijalkowski.

11) Generate temporary netns names for BPF selftests to avoid naming
    collisions, from Hangbin Liu.

12) Implement bpf_core_types_are_compat() with limited recursion for
    in-kernel usage, from Matteo Croce.

13) Simplify pahole version detection and finally enable CONFIG_DEBUG_INFO_DWARF5
    to be selected with CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor.

14) Misc minor fixes to libbpf and selftests from various folks.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (126 commits)
  selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup
  bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide
  libbpf: Fix compilation warning due to mismatched printf format
  selftests/bpf: Test BPF_KPROBE_SYSCALL macro
  libbpf: Add BPF_KPROBE_SYSCALL macro
  libbpf: Fix accessing the first syscall argument on s390
  libbpf: Fix accessing the first syscall argument on arm64
  libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL
  selftests/bpf: Skip test_bpf_syscall_macro's syscall_arg1 on arm64 and s390
  libbpf: Fix accessing syscall arguments on riscv
  libbpf: Fix riscv register names
  libbpf: Fix accessing syscall arguments on powerpc
  selftests/bpf: Use PT_REGS_SYSCALL_REGS in bpf_syscall_macro
  libbpf: Add PT_REGS_SYSCALL_REGS macro
  selftests/bpf: Fix an endianness issue in bpf_syscall_macro test
  bpf: Fix bpf_prog_pack build HPAGE_PMD_SIZE
  bpf: Fix leftover header->pages in sparc and powerpc code.
  libbpf: Fix signedness bug in btf_dump_array_data()
  selftests/bpf: Do not export subtest as standalone test
  bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures
  ...
====================

Link: https://lore.kernel.org/r/20220209210050.8425-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-09 18:40:56 -08:00
Maxim Levitsky
3915035282 KVM: x86: SVM: move avic definitions from AMD's spec to svm.h
asm/svm.h is the correct place for all values that are defined in
the SVM spec, and that includes AVIC.

Also add some values from the spec that were not defined before
and will be soon useful.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220207155447.840194-10-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-08 13:30:50 -05:00
Jim Mattson
fa31a4d669 x86/cpufeatures: Put the AMX macros in the word 18 block
These macros are for bits in CPUID.(EAX=7,ECX=0):EDX, not for bits in
CPUID(EAX=7,ECX=1):EAX. Put them with their brethren.

  [ bp: Sort word 18 bits properly, as caught by Like Xu
    <like.xu.linux@gmail.com> ]

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20220203194308.2469117-1-jmattson@google.com
2022-02-08 10:23:35 +01:00
Song Liu
0e06b40371 x86/alternative: Introduce text_poke_copy
This will be used by BPF jit compiler to dump JITed binary to a RX huge
page, and thus allow multiple BPF programs sharing the a huge (2MB) page.

Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20220204185742.271030-6-song@kernel.org
2022-02-07 18:13:01 -08:00
Linus Torvalds
90c9e950c0 xen: branch for v5.17-rc3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYf5Y3AAKCRCAXGG7T9hj
 vmfRAP9dBr4LfnfLkY+If70xAVdcImOjK7NzTYCaWAFF1evmJgEAueEWUrV7hJQq
 HYiLXPWFsr5eqnzlcWwLPaBxFH+uIAY=
 =Jxjf
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.17a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - documentation fixes related to Xen

 - enable x2apic mode when available when running as hardware
   virtualized guest under Xen

 - cleanup and fix a corner case of vcpu enumeration when running a
   paravirtualized Xen guest

* tag 'for-linus-5.17a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/Xen: streamline (and fix) PV CPU enumeration
  xen: update missing ioctl magic numers documentation
  Improve docs for IOCTL_GNTDEV_MAP_GRANT_REF
  xen: xenbus_dev.h: delete incorrect file name
  xen/x2apic: enable x2apic mode when supported for HVM
2022-02-05 10:40:17 -08:00
Paolo Bonzini
7e6a6b400d KVM/arm64 fixes for 5.17, take #2
- A couple of fixes when handling an exception while a SError has been
   delivered
 
 - Workaround for Cortex-A510's single-step[ erratum
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmH9LlcPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDLTcP/3Ry8CzvPubZquMyNdRUFvEg2EcfTa6vtIGW
 Fw7ap2hwPUaXUgJKDihMFIWj3Wf/wPmXw4t2Sr8R/yq8v9kWe+IG1isnT0yQhY3W
 kLXEqc8Mu4Rf8+jvlFHsp5mLENHIswpWAv/EY49ChgZkNmtkKpnPm1qnD89d8bNv
 tUwooDWidQ/7nXdM3z6zygSROJS24+OGTYTWzOQ1KgV3FGaXbqYiCleoPOpRR/Tc
 DQQWF/tVl8bZCqgkGKZCv3aXT0ZUPrQggARJGai78vP0l2sE/Kyaydgq5I7npZja
 2L2U4kDNoPYIVa8A1jvV3Ef3AqNFs6B7+jXWfYIgAcXjCYzDK3cZcxavf/Inq9F1
 3udVGJGSzH1KkGaihW3BVhsqGORRHKCdksJzWRgqf6vGyJhJw0u0D2u1rTWcT+jw
 Nm4KxShp0CX59HSLnVF5sR0Mct3jNNZ7UCCgH7q10wuBqYRfJT32hCo2ZrT7g9oD
 IQ+pa2dVYa3SaKZ4O6T/lSlbLOuuxtvmcEIfxYpPD6m10S5RrxOdsW3MCtiYM5HQ
 24oo2mk6NIu/va0XxhcW+NBMcYtLQD9JUGbkUkpcRy2mgilTi9b4YPp+muYM7plQ
 /S1gj2kGY8vjMg0H+wysjMJyl2huEwSRsZ/UfxCAgW+MYhHLDxhxAnDWc8EcwGgE
 tUzomowB
 =Mbx/
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 5.17, take #2

- A couple of fixes when handling an exception while a SError has been
  delivered

- Workaround for Cortex-A510's single-step[ erratum
2022-02-05 00:58:25 -05:00
Ricardo Neri
7b8f40b3de x86/cpu: Add definitions for the Intel Hardware Feedback Interface
Add the CPUID feature bit and the model-specific registers needed to
identify and configure the Intel Hardware Feedback Interface.

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-03 19:50:49 +01:00
Nick Desaulniers
bfb1a7c91f x86/bug: Merge annotate_reachable() into _BUG_FLAGS() asm
In __WARN_FLAGS(), we had two asm statements (abbreviated):

  asm volatile("ud2");
  asm volatile(".pushsection .discard.reachable");

These pair of statements are used to trigger an exception, but then help
objtool understand that for warnings, control flow will be restored
immediately afterwards.

The problem is that volatile is not a compiler barrier. GCC explicitly
documents this:

> Note that the compiler can move even volatile asm instructions
> relative to other code, including across jump instructions.

Also, no clobbers are specified to prevent instructions from subsequent
statements from being scheduled by compiler before the second asm
statement. This can lead to instructions from subsequent statements
being emitted by the compiler before the second asm statement.

Providing a scheduling model such as via -march= options enables the
compiler to better schedule instructions with known latencies to hide
latencies from data hazards compared to inline asm statements in which
latencies are not estimated.

If an instruction gets scheduled by the compiler between the two asm
statements, then objtool will think that it is not reachable, producing
a warning.

To prevent instructions from being scheduled in between the two asm
statements, merge them.

Also remove an unnecessary unreachable() asm annotation from BUG() in
favor of __builtin_unreachable(). objtool is able to track that the ud2
from BUG() terminates control flow within the function.

Link: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile
Link: https://github.com/ClangBuiltLinux/linux/issues/1483
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220202205557.2260694-1-ndesaulniers@google.com
2022-02-02 14:41:04 -08:00
Kan Liang
ee28855a54 perf/x86/intel: Increase max number of the fixed counters
The new PEBS format 5 implies that the number of the fixed counters can
be up to 16. The current INTEL_PMC_MAX_FIXED is still 4. If the current
kernel runs on a future platform which has more than 4 fixed counters,
a warning will be triggered. The number of the fixed counters will be
clipped to 4. Users have to upgrade the kernel to access the new fixed
counters.

Add a new default constraint for PerfMon v5 and up, which can support
up to 16 fixed counters. The pseudo-encoding is applied for the fixed
counters 4 and later. The user can have generic support for the new
fixed counters on the future platfroms without updating the kernel.

Increase the INTEL_PMC_MAX_FIXED to 16.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/1643750603-100733-3-git-send-email-kan.liang@linux.intel.com
2022-02-02 13:11:44 +01:00
Wei Wang
0144ba0c5b KVM: x86: use the KVM side max supported fixed counter
KVM vPMU doesn't support to emulate all the fixed counters that the
host PMU driver has supported, e.g. the fixed counter 3 used by
Topdown metrics hasn't been supported by KVM so far.

Rename MAX_FIXED_COUNTERS to KVM_PMC_MAX_FIXED to have a more
straightforward naming convention as INTEL_PMC_MAX_FIXED used by the
host PMU driver, and fix vPMU to use the KVM side KVM_PMC_MAX_FIXED
for the virtual fixed counter emulation, instead of the host side
INTEL_PMC_MAX_FIXED.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1643750603-100733-2-git-send-email-kan.liang@linux.intel.com
2022-02-02 13:11:44 +01:00
Kan Liang
2145e77fec perf/x86/intel: Enable PEBS format 5
The new PEBS Record Format 5 is similar to the PEBS Record Format 4. The
only difference is the layout of the Counter Reset fields of the PEBS
Config Buffer in the DS area. For the PEBS format 4, the Counter Reset
fields allocation is for 8 general-purpose counters followed by 4
fixed-function counters. For the PEBS format 5, the Counter Reset fields
allocation is for 32 general-purpose counters followed by 16
fixed-function counters.

Extend the MAX_PEBS_EVENTS to 32. Add MAX_PEBS_EVENTS_FMT4 for the
previous platform. Except for the DS auto-reload code, other places
already assume 32 counters. Only check the PEBS_FMT in the DS
auto-reload code.

Extend the MAX_FIXED_PEBS_EVENTS to 16, which only impacts the size of
struct debug_store and some local temporary variables. The size of
struct debug_store increases 288B, which is small and should be
acceptable.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1643750603-100733-1-git-send-email-kan.liang@linux.intel.com
2022-02-02 13:11:43 +01:00
Adrian Hunter
1fb85d06ad x86: Share definition of __is_canonical_address()
Reduce code duplication by moving canonical address code to a common header
file.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220131072453.2839535-3-adrian.hunter@intel.com
2022-02-02 13:11:42 +01:00
Tony Luck
ab28e94419 topology/sysfs: Add PPIN in sysfs under cpu topology
PPIN is the Protected Processor Identification Number.
This is used to identify the socket as a Field Replaceable Unit (FRU).

Existing code only displays this when reporting errors. But this makes
it inconvenient for large clusters to use it for its intended purpose
of inventory control.

Add ppin to /sys/devices/system/cpu/cpu*/topology to make what
is already available using RDMSR more easily accessible. Make
the file read only for root in case there are still people
concerned about making a unique system "serial number" available.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220131230111.2004669-6-tony.luck@intel.com
2022-02-01 16:36:42 +01:00
Tony Luck
822ccfade5 x86/cpu: Read/save PPIN MSR during initialization
Currently, the PPIN (Protected Processor Inventory Number) MSR is read
by every CPU that processes a machine check, CMCI, or just polls machine
check banks from a periodic timer. This is not a "fast" MSR, so this
adds to overhead of processing errors.

Add a new "ppin" field to the cpuinfo_x86 structure. Read and save the
PPIN during initialization. Use this copy in mce_setup() instead of
reading the MSR.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220131230111.2004669-4-tony.luck@intel.com
2022-02-01 16:29:26 +01:00
Sean Christopherson
57dfd7b53d KVM: x86: Move delivery of non-APICv interrupt into vendor code
Handle non-APICv interrupt delivery in vendor code, even though it means
VMX and SVM will temporarily have duplicate code.  SVM's AVIC has a race
condition that requires KVM to fall back to legacy interrupt injection
_after_ the interrupt has been logged in the vIRR, i.e. to fix the race,
SVM will need to open code the full flow anyways[*].  Refactor the code
so that the SVM bug without introducing other issues, e.g. SVM would
return "success" and thus invoke trace_kvm_apicv_accept_irq() even when
delivery through the AVIC failed, and to opportunistically prepare for
using KVM_X86_OP to fill each vendor's kvm_x86_ops struct, which will
rely on the vendor function matching the kvm_x86_op pointer name.

No functional change intended.

[*] https://lore.kernel.org/all/20211213104634.199141-4-mlevitsk@redhat.com

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-01 06:03:41 -05:00
Al Viro
0c9dceb9bb asm/user.h: killed unused macros
Some of them used to be used by libbfd for a.out coredump handling.
Seeing that
	* libbfd has their copies anyway
	* we don't export them into userland headers
	* we don't support a.out coredumps anymore
let's bury the definitions.  They never had in-kernel
users anyway...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-01-30 21:17:00 -05:00
Linus Torvalds
3cd7cd8a62 Two larger x86 series:
* Redo incorrect fix for SEV/SMAP erratum
 
 * Windows 11 Hyper-V workaround
 
 Other x86 changes:
 
 * Various x86 cleanups
 
 * Re-enable access_tracking_perf_test
 
 * Fix for #GP handling on SVM
 
 * Fix for CPUID leaf 0Dh in KVM_GET_SUPPORTED_CPUID
 
 * Fix for ICEBP in interrupt shadow
 
 * Avoid false-positive RCU splat
 
 * Enable Enlightened MSR-Bitmap support for real
 
 ARM:
 
 * Correctly update the shadow register on exception injection when
 running in nVHE mode
 
 * Correctly use the mm_ops indirection when performing cache invalidation
 from the page-table walker
 
 * Restrict the vgic-v3 workaround for SEIS to the two known broken
 implementations
 
 Generic code changes:
 
 * Dead code cleanup
 
 There will be another pull request for ARM fixes next week, but
 those patches need a bit more soak time.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmHz5eIUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNv4wgAopj0Zlutrrtw3KT4/XnmSdMPgN0j
 jQNzysSLTO5wGQCEogycjYXkGUDFu1Gdi+K91QAyjeKja20pIhPLeS2CBDRJyOc5
 73K7sxqz51JnQiVFzkTuA+qzn+lXaJ9LUXtdg8BnQMSKyt2AJOqE8uT10kcYOD5q
 mW4V3QUA0QpVKN0cYHv/G/zvBwQGGSLZetFbuAzwH2EDTpIi1aio5ZN1r0AoH18L
 2x5kYPpqmnoBvo2cB4b7SNmxv3ZPQ5K+wta0uwZ4pO+UuYiRd84RPr5lErywJC3w
 nci0eC0DoXrC6h+35UItqM8RqAGv6LADbDnr1RGojmfogSD0OtbX8y3hjw==
 =iKnI
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Two larger x86 series:

   - Redo incorrect fix for SEV/SMAP erratum

   - Windows 11 Hyper-V workaround

  Other x86 changes:

   - Various x86 cleanups

   - Re-enable access_tracking_perf_test

   - Fix for #GP handling on SVM

   - Fix for CPUID leaf 0Dh in KVM_GET_SUPPORTED_CPUID

   - Fix for ICEBP in interrupt shadow

   - Avoid false-positive RCU splat

   - Enable Enlightened MSR-Bitmap support for real

  ARM:

   - Correctly update the shadow register on exception injection when
     running in nVHE mode

   - Correctly use the mm_ops indirection when performing cache
     invalidation from the page-table walker

   - Restrict the vgic-v3 workaround for SEIS to the two known broken
     implementations

  Generic code changes:

   - Dead code cleanup"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits)
  KVM: eventfd: Fix false positive RCU usage warning
  KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use
  KVM: nVMX: Implement evmcs_field_offset() suitable for handle_vmread()
  KVM: nVMX: Rename vmcs_to_field_offset{,_table}
  KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
  KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS
  selftests: kvm: check dynamic bits against KVM_X86_XCOMP_GUEST_SUPP
  KVM: x86: add system attribute to retrieve full set of supported xsave states
  KVM: x86: Add a helper to retrieve userspace address from kvm_device_attr
  selftests: kvm: move vm_xsave_req_perm call to amx_test
  KVM: x86: Sync the states size with the XCR0/IA32_XSS at, any time
  KVM: x86: Update vCPU's runtime CPUID on write to MSR_IA32_XSS
  KVM: x86: Keep MSR_IA32_XSS unchanged for INIT
  KVM: x86: Free kvm_cpuid_entry2 array on post-KVM_RUN KVM_SET_CPUID{,2}
  KVM: nVMX: WARN on any attempt to allocate shadow VMCS for vmcs02
  KVM: selftests: Don't skip L2's VMCALL in SMM test for SVM guest
  KVM: x86: Check .flags in kvm_cpuid_check_equal() too
  KVM: x86: Forcibly leave nested virt when SMM state is toggled
  KVM: SVM: drop unnecessary code in svm_hv_vmcb_dirty_nested_enlightenments()
  KVM: SVM: hyper-v: Enable Enlightened MSR-Bitmap support for real
  ...
2022-01-28 19:00:26 +02:00
Paolo Bonzini
dd6e631220 KVM: x86: add system attribute to retrieve full set of supported xsave states
Because KVM_GET_SUPPORTED_CPUID is meant to be passed (by simple-minded
VMMs) to KVM_SET_CPUID2, it cannot include any dynamic xsave states that
have not been enabled.  Probing those, for example so that they can be
passed to ARCH_REQ_XCOMP_GUEST_PERM, requires a new ioctl or arch_prctl.
The latter is in fact worse, even though that is what the rest of the
API uses, because it would require supported_xcr0 to be moved from the
KVM module to the kernel just for this use.  In addition, the value
would be nonsensical (or an error would have to be returned) until
the KVM module is loaded in.

Therefore, to limit the growth of system ioctls, add a /dev/kvm
variant of KVM_{GET,HAS}_DEVICE_ATTR, and implement it in x86
with just one group (0) and attribute (KVM_X86_XCOMP_GUEST_SUPP).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-28 07:33:32 -05:00
Roger Pau Monne
c8980fcb21 xen/x2apic: enable x2apic mode when supported for HVM
There's no point in disabling x2APIC mode when running as a Xen HVM
guest, just enable it when available.

Remove some unneeded wrapping around the detection functions, and
simply provide a xen_x2apic_available helper that's a wrapper around
x2apic_supported.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20220121090146.13697-1-roger.pau@citrix.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2022-01-28 13:26:13 +01:00
Sean Christopherson
f7e570780e KVM: x86: Forcibly leave nested virt when SMM state is toggled
Forcibly leave nested virtualization operation if userspace toggles SMM
state via KVM_SET_VCPU_EVENTS or KVM_SYNC_X86_EVENTS.  If userspace
forces the vCPU out of SMM while it's post-VMXON and then injects an SMI,
vmx_enter_smm() will overwrite vmx->nested.smm.vmxon and end up with both
vmxon=false and smm.vmxon=false, but all other nVMX state allocated.

Don't attempt to gracefully handle the transition as (a) most transitions
are nonsencial, e.g. forcing SMM while L2 is running, (b) there isn't
sufficient information to handle all transitions, e.g. SVM wants access
to the SMRAM save state, and (c) KVM_SET_VCPU_EVENTS must precede
KVM_SET_NESTED_STATE during state restore as the latter disallows putting
the vCPU into L2 if SMM is active, and disallows tagging the vCPU as
being post-VMXON in SMM if SMM is not active.

Abuse of KVM_SET_VCPU_EVENTS manifests as a WARN and memory leak in nVMX
due to failure to free vmcs01's shadow VMCS, but the bug goes far beyond
just a memory leak, e.g. toggling SMM on while L2 is active puts the vCPU
in an architecturally impossible state.

  WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
  WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
  Modules linked in:
  CPU: 1 PID: 3606 Comm: syz-executor725 Not tainted 5.17.0-rc1-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
  RIP: 0010:free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
  Code: <0f> 0b eb b3 e8 8f 4d 9f 00 e9 f7 fe ff ff 48 89 df e8 92 4d 9f 00
  Call Trace:
   <TASK>
   kvm_arch_vcpu_destroy+0x72/0x2f0 arch/x86/kvm/x86.c:11123
   kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline]
   kvm_destroy_vcpus+0x11f/0x290 arch/x86/kvm/../../../virt/kvm/kvm_main.c:460
   kvm_free_vcpus arch/x86/kvm/x86.c:11564 [inline]
   kvm_arch_destroy_vm+0x2e8/0x470 arch/x86/kvm/x86.c:11676
   kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1217 [inline]
   kvm_put_kvm+0x4fa/0xb00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1250
   kvm_vm_release+0x3f/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1273
   __fput+0x286/0x9f0 fs/file_table.c:311
   task_work_run+0xdd/0x1a0 kernel/task_work.c:164
   exit_task_work include/linux/task_work.h:32 [inline]
   do_exit+0xb29/0x2a30 kernel/exit.c:806
   do_group_exit+0xd2/0x2f0 kernel/exit.c:935
   get_signal+0x4b0/0x28c0 kernel/signal.c:2862
   arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868
   handle_signal_work kernel/entry/common.c:148 [inline]
   exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
   exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207
   __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
   syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
   do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
   entry_SYSCALL_64_after_hwframe+0x44/0xae
   </TASK>

Cc: stable@vger.kernel.org
Reported-by: syzbot+8112db3ab20e70d50c31@syzkaller.appspotmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220125220358.2091737-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-26 12:15:03 -05:00
Sean Christopherson
4d31d9eff2 KVM: x86: Pass emulation type to can_emulate_instruction()
Pass the emulation type to kvm_x86_ops.can_emulate_insutrction() so that
a future commit can harden KVM's SEV support to WARN on emulation
scenarios that should never happen.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Message-Id: <20220120010719.711476-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-26 12:15:00 -05:00
Quanfa Fu
d081a343dd KVM/X86: Make kvm_vcpu_reload_apic_access_page() static
Make kvm_vcpu_reload_apic_access_page() static
as it is no longer invoked directly by vmx
and it is also no longer exported.

No functional change intended.

Signed-off-by: Quanfa Fu <quanfafu@gmail.com>
Message-Id: <20211219091446.174584-1-quanfafu@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-25 09:40:20 -05:00
Jan Beulich
2e1f8e55f9 x86/paravirt: Use %rip-relative addressing in hook calls
While using a plain (constant) address works, its use needlessly invokes
a SIB addressing mode, making every call site one byte larger than
necessary:

  ff 14 25 98 89 42 82    call   *0xffffffff82428998

Instead of using an "i" constraint with address-of operator and a 'c'
operand modifier, simply use an ordinary "m" constraint, which the
64-bit compiler will translate to %rip-relative addressing:

  ff 15 62 fb d2 00       call   *0xd2fb62(%rip)	# ffffffff82428998 <pv_ops+0x18>

This way the compiler is also told the truth about operand usage - the
memory location gets actually read, after all.

32-bit code generation is unaffected by the change.

  [ bp: Remove "we", add examples. ]

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/b8192e8a-13ef-6ac6-6364-8ba58992cd1d@suse.com
2022-01-24 20:21:19 +01:00
Adrian Hunter
16273fa4f3 x86/insn: Add AVX512-FP16 instructions to the x86 instruction decoder
The x86 instruction decoder is used for both kernel instructions and
user space instructions (e.g. uprobes, perf tools Intel PT), so it is
good to update it with new instructions.

Add AVX512-FP16 instructions to x86 instruction decoder.

Note the EVEX map field is extended by 1 bit, and most instructions are in
map 5 and map 6.

Reference:
Intel AVX512-FP16 Architecture Specification
June 2021
Revision 1.0
Document Number: 347407-001US

Example using perf tools' x86 instruction decoder test:

  $ perf test -v "x86 instruction decoder" |& grep vfcmaddcph | head -2
  Decoded ok: 62 f6 6f 48 56 cb           vfcmaddcph %zmm3,%zmm2,%zmm1
  Decoded ok: 62 f6 6f 48 56 8c c8 78 56 34 12    vfcmaddcph 0x12345678(%eax,%ecx,8),%zmm2,%zmm1

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20211202095029.2165714-7-adrian.hunter@intel.com
2022-01-23 20:38:01 +01:00
Linus Torvalds
3689f9f8b0 bitmap patches for 5.17-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQHJBAABCgAzFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmHi+xgVHHl1cnkubm9y
 b3ZAZ21haWwuY29tAAoJELFEgP06H77IxdoMAMf3E+L51Ys/4iAiyJQNVoT3aIBC
 A8ZVOB9he1OA3o3wBNIRKmICHk+ovnfCWcXTr9fG/Ade2wJz88NAsGPQ1Phywb+s
 iGlpySllFN72RT9ZqtJhLEzgoHHOL0CzTW07TN9GJy4gQA2h2G9CTP+OmsQdnVqE
 m9Fn3PSlJ5lhzePlKfnln8rGZFgrriJakfEFPC79n/7an4+2Hvkb5rWigo7KQc4Z
 9YNqYUcHWZFUgq80adxEb9LlbMXdD+Z/8fCjOrAatuwVkD4RDt6iKD0mFGjHXGL7
 MZ9KRS8AfZXawmetk3jjtsV+/QkeS+Deuu7k0FoO0Th2QV7BGSDhsLXAS5By/MOC
 nfSyHhnXHzCsBMyVNrJHmNhEZoN29+tRwI84JX9lWcf/OLANcCofnP6f2UIX7tZY
 CAZAgVELp+0YQXdybrfzTQ8BT3TinjS/aZtCrYijRendI1GwUXcyl69vdOKqAHuk
 5jy8k/xHyp+ZWu6v+PyAAAEGowY++qhL0fmszA==
 =RKW4
 -----END PGP SIGNATURE-----

Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linux

Pull bitmap updates from Yury Norov:

 - introduce for_each_set_bitrange()

 - use find_first_*_bit() instead of find_next_*_bit() where possible

 - unify for_each_bit() macros

* tag 'bitmap-5.17-rc1' of git://github.com/norov/linux:
  vsprintf: rework bitmap_list_string
  lib: bitmap: add performance test for bitmap_print_to_pagebuf
  bitmap: unify find_bit operations
  mm/percpu: micro-optimize pcpu_is_populated()
  Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
  find: micro-optimize for_each_{set,clear}_bit()
  include/linux: move for_each_bit() macros from bitops.h to find.h
  cpumask: replace cpumask_next_* with cpumask_first_* where appropriate
  tools: sync tools/bitmap with mother linux
  all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
  cpumask: use find_first_and_bit()
  lib: add find_first_and_bit()
  arch: remove GENERIC_FIND_FIRST_BIT entirely
  include: move find.h from asm_generic to linux
  bitops: move find_bit_*_le functions from le.h to find.h
  bitops: protect find_first_{,zero}_bit properly
2022-01-23 06:20:44 +02:00
Linus Torvalds
636b5284d8 Generic:
- selftest compilation fix for non-x86
 
 - KVM: avoid warning on s390 in mark_page_dirty
 
 x86:
 - fix page write-protection bug and improve comments
 
 - use binary search to lookup the PMU event filter, add test
 
 - enable_pmu module parameter support for Intel CPUs
 
 - switch blocked_vcpu_on_cpu_lock to raw spinlock
 
 - cleanups of blocked vCPU logic
 
 - partially allow KVM_SET_CPUID{,2} after KVM_RUN (5.16 regression)
 
 - various small fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmHpmT0UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOstggAi1VSpT43oGslQjXNDZacHEARoYQs
 b0XpoW7HXicGSGRMWspCmiAPdJyYTsioEACttAmXUMs7brAgHb9n/vzdlcLh1ymL
 rQw2YFQlfqqB1Ki1iRhNkWlH9xOECsu28WLng6ylrx51GuT/pzWRt+V3EGUFTxIT
 ldW9HgZg2oFJIaLjg2hQVR/8EbBf0QdsAD3KV3tyvhBlXPkyeLOMcGe9onfjZ/NE
 JQeW7FtKtP4SsIFt1KrJpDPjtiwFt3bRM0gfgGw7//clvtKIqt1LYXZiq4C3b7f5
 tfYiC8lO2vnOoYcfeYEmvybbSsoS/CgSliZB32qkwoVvRMIl82YmxtDD+Q==
 =/Mak
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more kvm updates from Paolo Bonzini:
 "Generic:

   - selftest compilation fix for non-x86

   - KVM: avoid warning on s390 in mark_page_dirty

 x86:

   - fix page write-protection bug and improve comments

   - use binary search to lookup the PMU event filter, add test

   - enable_pmu module parameter support for Intel CPUs

   - switch blocked_vcpu_on_cpu_lock to raw spinlock

   - cleanups of blocked vCPU logic

   - partially allow KVM_SET_CPUID{,2} after KVM_RUN (5.16 regression)

   - various small fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (46 commits)
  docs: kvm: fix WARNINGs from api.rst
  selftests: kvm/x86: Fix the warning in lib/x86_64/processor.c
  selftests: kvm/x86: Fix the warning in pmu_event_filter_test.c
  kvm: selftests: Do not indent with spaces
  kvm: selftests: sync uapi/linux/kvm.h with Linux header
  selftests: kvm: add amx_test to .gitignore
  KVM: SVM: Nullify vcpu_(un)blocking() hooks if AVIC is disabled
  KVM: SVM: Move svm_hardware_setup() and its helpers below svm_x86_ops
  KVM: SVM: Drop AVIC's intermediate avic_set_running() helper
  KVM: VMX: Don't do full kick when handling posted interrupt wakeup
  KVM: VMX: Fold fallback path into triggering posted IRQ helper
  KVM: VMX: Pass desired vector instead of bool for triggering posted IRQ
  KVM: VMX: Don't do full kick when triggering posted interrupt "fails"
  KVM: SVM: Skip AVIC and IRTE updates when loading blocking vCPU
  KVM: SVM: Use kvm_vcpu_is_blocking() in AVIC load to handle preemption
  KVM: SVM: Remove unnecessary APICv/AVIC update in vCPU unblocking path
  KVM: SVM: Don't bother checking for "running" AVIC when kicking for IPIs
  KVM: SVM: Signal AVIC doorbell iff vCPU is in guest mode
  KVM: x86: Remove defunct pre_block/post_block kvm_x86_ops hooks
  KVM: x86: Unexport LAPIC's switch_to_{hv,sw}_timer() helpers
  ...
2022-01-22 09:40:01 +02:00
Sean Christopherson
c3e8abf0f3 KVM: x86: Remove defunct pre_block/post_block kvm_x86_ops hooks
Drop kvm_x86_ops' pre/post_block() now that all implementations are nops.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20211208015236.1616697-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19 12:14:40 -05:00
Paolo Bonzini
4f5a884fc2 Merge branch 'kvm-pi-raw-spinlock' into HEAD
Bring in fix for VT-d posted interrupts before further changing the code in 5.17.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19 12:14:02 -05:00
Sean Christopherson
fc4fad79fc KVM: VMX: Reject KVM_RUN if emulation is required with pending exception
Reject KVM_RUN if emulation is required (because VMX is running without
unrestricted guest) and an exception is pending, as KVM doesn't support
emulating exceptions except when emulating real mode via vm86.  The vCPU
is hosed either way, but letting KVM_RUN proceed triggers a WARN due to
the impossible condition.  Alternatively, the WARN could be removed, but
then userspace and/or KVM bugs would result in the vCPU silently running
in a bad state, which isn't very friendly to users.

Originally, the bug was hit by syzkaller with a nested guest as that
doesn't require kvm_intel.unrestricted_guest=0.  That particular flavor
is likely fixed by commit cd0e615c49 ("KVM: nVMX: Synthesize
TRIPLE_FAULT for L2 if emulation is required"), but it's trivial to
trigger the WARN with a non-nested guest, and userspace can likely force
bad state via ioctls() for a nested guest as well.

Checking for the impossible condition needs to be deferred until KVM_RUN
because KVM can't force specific ordering between ioctls.  E.g. clearing
exception.pending in KVM_SET_SREGS doesn't prevent userspace from setting
it in KVM_SET_VCPU_EVENTS, and disallowing KVM_SET_VCPU_EVENTS with
emulation_required would prevent userspace from queuing an exception and
then stuffing sregs.  Note, if KVM were to try and detect/prevent the
condition prior to KVM_RUN, handle_invalid_guest_state() and/or
handle_emulation_failure() would need to be modified to clear the pending
exception prior to exiting to userspace.

 ------------[ cut here ]------------
 WARNING: CPU: 6 PID: 137812 at arch/x86/kvm/vmx/vmx.c:1623 vmx_queue_exception+0x14f/0x160 [kvm_intel]
 CPU: 6 PID: 137812 Comm: vmx_invalid_nes Not tainted 5.15.2-7cc36c3e14ae-pop #279
 Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014
 RIP: 0010:vmx_queue_exception+0x14f/0x160 [kvm_intel]
 Code: <0f> 0b e9 fd fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
 RSP: 0018:ffffa45c83577d38 EFLAGS: 00010202
 RAX: 0000000000000003 RBX: 0000000080000006 RCX: 0000000000000006
 RDX: 0000000000000000 RSI: 0000000000010002 RDI: ffff9916af734000
 RBP: ffff9916af734000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000006
 R13: 0000000000000000 R14: ffff9916af734038 R15: 0000000000000000
 FS:  00007f1e1a47c740(0000) GS:ffff99188fb80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f1e1a6a8008 CR3: 000000026f83b005 CR4: 00000000001726e0
 Call Trace:
  kvm_arch_vcpu_ioctl_run+0x13a2/0x1f20 [kvm]
  kvm_vcpu_ioctl+0x279/0x690 [kvm]
  __x64_sys_ioctl+0x83/0xb0
  do_syscall_64+0x3b/0xc0
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Reported-by: syzbot+82112403ace4cbd780d8@syzkaller.appspotmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211228232437.1875318-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19 12:12:25 -05:00
Linus Torvalds
79e06c4c49 RISCV:
- Use common KVM implementation of MMU memory caches
 
 - SBI v0.2 support for Guest
 
 - Initial KVM selftests support
 
 - Fix to avoid spurious virtual interrupts after clearing hideleg CSR
 
 - Update email address for Anup and Atish
 
 ARM:
 - Simplification of the 'vcpu first run' by integrating it into
   KVM's 'pid change' flow
 
 - Refactoring of the FP and SVE state tracking, also leading to
   a simpler state and less shared data between EL1 and EL2 in
   the nVHE case
 
 - Tidy up the header file usage for the nvhe hyp object
 
 - New HYP unsharing mechanism, finally allowing pages to be
   unmapped from the Stage-1 EL2 page-tables
 
 - Various pKVM cleanups around refcounting and sharing
 
 - A couple of vgic fixes for bugs that would trigger once
   the vcpu xarray rework is merged, but not sooner
 
 - Add minimal support for ARMv8.7's PMU extension
 
 - Rework kvm_pgtable initialisation ahead of the NV work
 
 - New selftest for IRQ injection
 
 - Teach selftests about the lack of default IPA space and
   page sizes
 
 - Expand sysreg selftest to deal with Pointer Authentication
 
 - The usual bunch of cleanups and doc update
 
 s390:
 - fix sigp sense/start/stop/inconsistency
 
 - cleanups
 
 x86:
 - Clean up some function prototypes more
 
 - improved gfn_to_pfn_cache with proper invalidation, used by Xen emulation
 
 - add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery
 
 - completely remove potential TOC/TOU races in nested SVM consistency checks
 
 - update some PMCs on emulated instructions
 
 - Intel AMX support (joint work between Thomas and Intel)
 
 - large MMU cleanups
 
 - module parameter to disable PMU virtualization
 
 - cleanup register cache
 
 - first part of halt handling cleanups
 
 - Hyper-V enlightened MSR bitmap support for nested hypervisors
 
 Generic:
 - clean up Makefiles
 
 - introduce CONFIG_HAVE_KVM_DIRTY_RING
 
 - optimize memslot lookup using a tree
 
 - optimize vCPU array usage by converting to xarray
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmHhxvsUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPZkAf+Nz92UL/5nNGcdHtE4m7AToMmitE9
 bYkesf9BMQvAe5wjkABLuoHGi6ay4jabo4fiGzbdkiK7lO5YgfsWiMB3/MT5fl4E
 jRPzaVQabp3YZLM8UYCBmfUVuRj524S967SfSRe0AvYjDEH8y7klPf4+7sCsFT0/
 Px9Vf2KGuOlf0eM78yKg4rGaF0jS22eLgXm6FfNMY8/e29ZAo/jyUmqBY+Z2xxZG
 aWhceDtSheW1jwLHLj3nOlQJvHTn8LVGXBE/R8Gda3ZjrBV2rKaDi4Fh+HD+dz86
 2zVXwzQ7uck2CMW73GMoXMTWoKSHMyvlBOs1BdvBm4UsnGcXR+q8IFCeuQ==
 =s73m
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "RISCV:

   - Use common KVM implementation of MMU memory caches

   - SBI v0.2 support for Guest

   - Initial KVM selftests support

   - Fix to avoid spurious virtual interrupts after clearing hideleg CSR

   - Update email address for Anup and Atish

  ARM:

   - Simplification of the 'vcpu first run' by integrating it into KVM's
     'pid change' flow

   - Refactoring of the FP and SVE state tracking, also leading to a
     simpler state and less shared data between EL1 and EL2 in the nVHE
     case

   - Tidy up the header file usage for the nvhe hyp object

   - New HYP unsharing mechanism, finally allowing pages to be unmapped
     from the Stage-1 EL2 page-tables

   - Various pKVM cleanups around refcounting and sharing

   - A couple of vgic fixes for bugs that would trigger once the vcpu
     xarray rework is merged, but not sooner

   - Add minimal support for ARMv8.7's PMU extension

   - Rework kvm_pgtable initialisation ahead of the NV work

   - New selftest for IRQ injection

   - Teach selftests about the lack of default IPA space and page sizes

   - Expand sysreg selftest to deal with Pointer Authentication

   - The usual bunch of cleanups and doc update

  s390:

   - fix sigp sense/start/stop/inconsistency

   - cleanups

  x86:

   - Clean up some function prototypes more

   - improved gfn_to_pfn_cache with proper invalidation, used by Xen
     emulation

   - add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery

   - completely remove potential TOC/TOU races in nested SVM consistency
     checks

   - update some PMCs on emulated instructions

   - Intel AMX support (joint work between Thomas and Intel)

   - large MMU cleanups

   - module parameter to disable PMU virtualization

   - cleanup register cache

   - first part of halt handling cleanups

   - Hyper-V enlightened MSR bitmap support for nested hypervisors

  Generic:

   - clean up Makefiles

   - introduce CONFIG_HAVE_KVM_DIRTY_RING

   - optimize memslot lookup using a tree

   - optimize vCPU array usage by converting to xarray"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (268 commits)
  x86/fpu: Fix inline prefix warnings
  selftest: kvm: Add amx selftest
  selftest: kvm: Move struct kvm_x86_state to header
  selftest: kvm: Reorder vcpu_load_state steps for AMX
  kvm: x86: Disable interception for IA32_XFD on demand
  x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state()
  kvm: selftests: Add support for KVM_CAP_XSAVE2
  kvm: x86: Add support for getting/setting expanded xstate buffer
  x86/fpu: Add uabi_size to guest_fpu
  kvm: x86: Add CPUID support for Intel AMX
  kvm: x86: Add XCR0 support for Intel AMX
  kvm: x86: Disable RDMSR interception of IA32_XFD_ERR
  kvm: x86: Emulate IA32_XFD_ERR for guest
  kvm: x86: Intercept #NM for saving IA32_XFD_ERR
  x86/fpu: Prepare xfd_err in struct fpu_guest
  kvm: x86: Add emulation for IA32_XFD
  x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation
  kvm: x86: Enable dynamic xfeatures at KVM_SET_CPUID2
  x86/fpu: Provide fpu_enable_guest_xfd_features() for KVM
  x86/fpu: Add guest support to xfd_enable_feature()
  ...
2022-01-16 16:15:14 +02:00
Linus Torvalds
cb3f09f9af hyperv-next for 5.17
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmHhw7oTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXrjSB/979LV4Dn1PMcFYsSdlFEMeHcjzJdw/
 kFnLPXMaPJyfg6QPuf83jxzw9uxw8fcePMdVq/FFBtmVV9fJMAv62B8jaGS1p58c
 WnAg+7zsTN+xEoJn+tskSSon8BNMWVrl41zP3K4Ged+5j8UEBk62GB8Orz1qkpwL
 fTh3/+xAvczJeD4zZb1dAm4WnmcQJ4vhg45p07jX6owvnwQAikMFl45aSW54I5o8
 vAxGzFgdsZ2NtExnRNKh3b3DozA8JUE89KckBSZnDtq4rH8Fyy6Wij56Hc6v6Cml
 SUohiNbHX7hsNwit/lxL8wuF97IiA0pQSABobEg3rxfTghTUep51LlaN
 =/m4A
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-next-signed-20220114' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv updates from Wei Liu:

 - More patches for Hyper-V isolation VM support (Tianyu Lan)

 - Bug fixes and clean-up patches from various people

* tag 'hyperv-next-signed-20220114' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  scsi: storvsc: Fix storvsc_queuecommand() memory leak
  x86/hyperv: Properly deal with empty cpumasks in hyperv_flush_tlb_multi()
  Drivers: hv: vmbus: Initialize request offers message for Isolation VM
  scsi: storvsc: Fix unsigned comparison to zero
  swiotlb: Add CONFIG_HAS_IOMEM check around swiotlb_mem_remap()
  x86/hyperv: Fix definition of hv_ghcb_pg variable
  Drivers: hv: Fix definition of hypercall input & output arg variables
  net: netvsc: Add Isolation VM support for netvsc driver
  scsi: storvsc: Add Isolation VM support for storvsc driver
  hyper-v: Enable swiotlb bounce buffer for Isolation VM
  x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
  swiotlb: Add swiotlb bounce buffer remap function for HV IVM
2022-01-16 15:53:00 +02:00
Linus Torvalds
d0a231f01e pci-v5.17-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmHgpugUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vz59g//eWRLb0j2Vgv84ZH4x1iv6MaBboQr
 2wScnfoN+MIoh+tuM4kRak15X4nB8rJhNZZCzesMUN6PeZvrkoPo4sz/xdzIrA/N
 qY3h8NZ3nC4yCvs/tGem0zZUcSCJsxUAD0eegyMSa142xGIOQTHBSJRflR9osKSo
 bnQlKTkugV8t4kD7NlQ5M3HzN3R+mjsII5JNzCqv2XlzAZG3D8DhPyIpZnRNAOmW
 KiHOVXvQOocfUlvSs5kBlhgR1HgJkGnruCrJ1iDCWQH1Zk0iuVgoZWgVda6Cs3Xv
 gcTJLB7VoSdNZKnct9aMNYPKziHkYc7clilPeDsJs5TlSv3kKERzLj6c/5ZAxFWN
 +RsH+zYHDXJSsL/w0twPnaF5WCuVYUyrs3UiSjUvShKl1T9k9J+Jo8zwUUZx8Xb0
 qXX8jRGMHolBGwPXm2fHEb4bwTUI8emPj29qK4L96KsQ3zKXWB8eGSosxUP52Tti
 RR2WZjkvwlREZCJp6jSEJYkhzoEaVAm8CjKpKUuneX9WcUOsMBSs9k7EXbUy7JeM
 hq5Keuqa8PZo/IK2DYYAchNnBJUDMsWJeduBW12qSmx3J+9victP2qOFu+9skP0a
 85xlO6Cx8beiQh+XnY7jyROvIFuxTnGKHgkq/89Ham/whEzdJ+GRIiYB218kLLCW
 ILdas3C2iiGz99I=
 =Vgg4
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:
   - Use pci_find_vsec_capability() instead of open-coding it (Andy
     Shevchenko)
   - Convert pci_dev_present() stub from macro to static inline to avoid
     'unused variable' errors (Hans de Goede)
   - Convert sysfs slot attributes from default_attrs to default_groups
     (Greg Kroah-Hartman)
   - Use DWORD accesses for LTR, L1 SS to avoid BayHub OZ711LV2 erratum
     (Rajat Jain)
   - Remove unnecessary initialization of static variables (Longji Guo)

  Resource management:
   - Always write Intel I210 ROM BAR on update to work around device
     defect (Bjorn Helgaas)

  PCIe native device hotplug:
   - Fix pciehp lockdep errors on Thunderbolt undock (Hans de Goede)
   - Fix infinite loop in pciehp IRQ handler on power fault (Lukas
     Wunner)

  Power management:
   - Convert amd64-agp, sis-agp, via-agp from legacy PCI power
     management to generic power management (Vaibhav Gupta)

  IOMMU:
   - Add function 1 DMA alias quirk for Marvell 88SE9125 SATA controller
     so it can work with an IOMMU (Yifeng Li)

  Error handling:
   - Add PCI_ERROR_RESPONSE and related definitions for signaling and
     checking for transaction errors on PCI (Naveen Naidu)
   - Fabricate PCI_ERROR_RESPONSE data (~0) in config read wrappers,
     instead of in host controller drivers, when transactions fail on
     PCI (Naveen Naidu)
   - Use PCI_POSSIBLE_ERROR() to check for possible failure of config
     reads (Naveen Naidu)

  Peer-to-peer DMA:
   - Add Logan Gunthorpe as P2PDMA maintainer (Bjorn Helgaas)

  ASPM:
   - Calculate link L0s and L1 exit latencies when needed instead of
     caching them (Saheed O. Bolarinwa)
   - Calculate device L0s and L1 acceptable exit latencies when needed
     instead of caching them (Saheed O. Bolarinwa)
   - Remove struct aspm_latency since it's no longer needed (Saheed O.
     Bolarinwa)

  APM X-Gene PCIe controller driver:
   - Fix IB window setup, which was broken by the fact that IB resources
     are now sorted in address order instead of DT dma-ranges order (Rob
     Herring)

  Apple PCIe controller driver:
   - Enable clock gating to save power (Hector Martin)
   - Fix REFCLK1 enable/poll logic (Hector Martin)

  Broadcom STB PCIe controller driver:
   - Declare bitmap correctly for use by bitmap interfaces (Christophe
     JAILLET)
   - Clean up computation of legacy and non-legacy MSI bitmasks (Florian
     Fainelli)
   - Update suspend/resume/remove error handling to warn about errors
     and not fail the operation (Jim Quinlan)
   - Correct the "pcie" and "msi" interrupt descriptions in DT binding
     (Jim Quinlan)
   - Add DT bindings for endpoint voltage regulators (Jim Quinlan)
   - Split brcm_pcie_setup() into two functions (Jim Quinlan)
   - Add mechanism for turning on voltage regulators for connected
     devices (Jim Quinlan)
   - Turn voltage regulators for connected devices on/off when bus is
     added or removed (Jim Quinlan)
   - When suspending, don't turn off voltage regulators for wakeup
     devices (Jim Quinlan)

  Freescale i.MX6 PCIe controller driver:
   - Add i.MX8MM support (Richard Zhu)

  Freescale Layerscape PCIe controller driver:
   - Use DWC common ops instead of layerscape-specific link-up functions
     (Hou Zhiqiang)

  Intel VMD host bridge driver:
   - Honor platform ACPI _OSC feature negotiation for Root Ports below
     VMD (Kai-Heng Feng)
   - Add support for Raptor Lake SKUs (Karthik L Gopalakrishnan)
   - Reset everything below VMD before enumerating to work around
     failure to enumerate NVMe devices when guest OS reboots (Nirmal
     Patel)

  Bridge emulation (used by Marvell Aardvark and MVEBU):
   - Make emulated ROM BAR read-only by default (Pali Rohár)
   - Make some emulated legacy PCI bits read-only for PCIe devices (Pali
     Rohár)
   - Update reserved bits in emulated PCIe Capability (Pali Rohár)
   - Allow drivers to emulate different PCIe Capability versions (Pali
     Rohár)
   - Set emulated Capabilities List bit for all PCIe devices, since they
     must have at least a PCIe Capability (Pali Rohár)

  Marvell Aardvark PCIe controller driver:
   - Add bridge emulation definitions for PCIe DEVCAP2, DEVCTL2,
     DEVSTA2, LNKCAP2, LNKCTL2, LNKSTA2, SLTCAP2, SLTCTL2, SLTSTA2 (Pali
     Rohár)
   - Add aardvark support for DEVCAP2, DEVCTL2, LNKCAP2 and LNKCTL2
     registers (Pali Rohár)
   - Clear all MSIs at setup to avoid spurious interrupts (Pali Rohár)
   - Disable bus mastering when unbinding host controller driver (Pali
     Rohár)
   - Mask all interrupts when unbinding host controller driver (Pali
     Rohár)
   - Fix memory leak in host controller unbind (Pali Rohár)
   - Assert PERST# when unbinding host controller driver (Pali Rohár)
   - Disable link training when unbinding host controller driver (Pali
     Rohár)
   - Disable common PHY when unbinding host controller driver (Pali
     Rohár)
   - Fix resource type checking to check only IORESOURCE_MEM, not
     IORESOURCE_MEM_64, which is a flavor of IORESOURCE_MEM (Pali Rohár)

  Marvell MVEBU PCIe controller driver:
   - Implement pci_remap_iospace() for ARM so mvebu can use
     devm_pci_remap_iospace() instead of the previous ARM-specific
     pci_ioremap_io() interface (Pali Rohár)
   - Use the standard pci_host_probe() instead of the device-specific
     mvebu_pci_host_probe() (Pali Rohár)
   - Replace all uses of ARM-specific pci_ioremap_io() with the ARM
     implementation of the standard pci_remap_iospace() interface and
     remove pci_ioremap_io() (Pali Rohár)
   - Skip initializing invalid Root Ports (Pali Rohár)
   - Check for errors from pci_bridge_emul_init() (Pali Rohár)
   - Ignore any bridges at non-zero function numbers (Pali Rohár)
   - Return ~0 data for invalid config read size (Pali Rohár)
   - Disallow mapping interrupts on emulated bridges (Pali Rohár)
   - Clear Root Port Memory & I/O Space Enable and Bus Master Enable at
     initialization (Pali Rohár)
   - Make type bits in Root Port I/O Base register read-only (Pali
     Rohár)
   - Disable Root Port windows when base/limit set to invalid values
     (Pali Rohár)
   - Set controller to Root Complex mode (Pali Rohár)
   - Set Root Port Class Code to PCI Bridge (Pali Rohár)
   - Update emulated Root Port secondary bus numbers to better reflect
     the actual topology (Pali Rohár)
   - Add PCI_BRIDGE_CTL_BUS_RESET support to emulated Root Ports so
     pci_reset_secondary_bus() can reset connected devices (Pali Rohár)
   - Add PCI_EXP_DEVCTL Error Reporting Enable support to emulated Root
     Ports (Pali Rohár)
   - Add PCI_EXP_RTSTA PME Status bit support to emulated Root Ports
     (Pali Rohár)
   - Add DEVCAP2, DEVCTL2 and LNKCTL2 support to emulated Root Ports on
     Armada XP and newer devices (Pali Rohár)
   - Export mvebu-mbus.c symbols to allow pci-mvebu.c to be a module
     (Pali Rohár)
   - Add support for compiling as a module (Pali Rohár)

  MediaTek PCIe controller driver:
   - Assert PERST# for 100ms to allow power and clock to stabilize
     (qizhong cheng)

  MediaTek PCIe Gen3 controller driver:
   - Disable Mediatek DVFSRC voltage request since lack of DVFSRC to
     respond to the request causes failure to exit L1 PM Substate
     (Jianjun Wang)

  MediaTek MT7621 PCIe controller driver:
   - Declare mt7621_pci_ops static (Sergio Paracuellos)
   - Give pcibios_root_bridge_prepare() access to host bridge windows
     (Sergio Paracuellos)
   - Move MIPS I/O coherency unit setup from driver to
     pcibios_root_bridge_prepare() (Sergio Paracuellos)
   - Add missing MODULE_LICENSE() (Sergio Paracuellos)
   - Allow COMPILE_TEST for all arches (Sergio Paracuellos)

  Microsoft Hyper-V host bridge driver:
   - Add hv-internal interfaces to encapsulate arch IRQ dependencies
     (Sunil Muthuswamy)
   - Add arm64 Hyper-V vPCI support (Sunil Muthuswamy)

  Qualcomm PCIe controller driver:
   - Undo PM setup in qcom_pcie_probe() error handling path (Christophe
     JAILLET)
   - Use __be16 type to store return value from cpu_to_be16()
     (Manivannan Sadhasivam)
   - Constify static dw_pcie_ep_ops (Rikard Falkeborn)

  Renesas R-Car PCIe controller driver:
   - Fix aarch32 abort handler so it doesn't check the wrong bus clock
     before accessing the host controller (Marek Vasut)

  TI Keystone PCIe controller driver:
   - Add register offset for ti,syscon-pcie-id and ti,syscon-pcie-mode
     DT properties (Kishon Vijay Abraham I)

  MicroSemi Switchtec management driver:
   - Add Gen4 automotive device IDs (Kelvin Cao)
   - Declare state_names[] as static so it's not allocated and
     initialized for every call (Kelvin Cao)

  Host controller driver cleanups:
   - Use of_device_get_match_data(), not of_match_device(), when we only
     need the device data in altera, artpec6, cadence, designware-plat,
     dra7xx, keystone, kirin (Fan Fei)
   - Drop pointless of_device_get_match_data() cast in j721e (Bjorn
     Helgaas)
   - Drop redundant struct device * from j721e since struct cdns_pcie
     already has one (Bjorn Helgaas)
   - Rename driver structs to *_pcie in intel-gw, iproc, ls-gen4,
     mediatek-gen3, microchip, mt7621, rcar-gen2, tegra194, uniphier,
     xgene, xilinx, xilinx-cpm for consistency across drivers (Fan Fei)
   - Fix invalid address space conversions in hisi, spear13xx (Bjorn
     Helgaas)

  Miscellaneous:
   - Sort Intel Device IDs by value (Andy Shevchenko)
   - Change Capability offsets to hex to match spec (Baruch Siach)
   - Correct misspellings (Krzysztof Wilczyński)
   - Terminate statement with semicolon in pci_endpoint_test.c (Ming
     Wang)"

* tag 'pci-v5.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (151 commits)
  PCI: mt7621: Allow COMPILE_TEST for all arches
  PCI: mt7621: Add missing MODULE_LICENSE()
  PCI: mt7621: Move MIPS setup to pcibios_root_bridge_prepare()
  PCI: Let pcibios_root_bridge_prepare() access bridge->windows
  PCI: mt7621: Declare mt7621_pci_ops static
  PCI: brcmstb: Do not turn off WOL regulators on suspend
  PCI: brcmstb: Add control of subdevice voltage regulators
  PCI: brcmstb: Add mechanism to turn on subdev regulators
  PCI: brcmstb: Split brcm_pcie_setup() into two funcs
  dt-bindings: PCI: Add bindings for Brcmstb EP voltage regulators
  dt-bindings: PCI: Correct brcmstb interrupts, interrupt-map.
  PCI: brcmstb: Fix function return value handling
  PCI: brcmstb: Do not use __GENMASK
  PCI: brcmstb: Declare 'used' as bitmap, not unsigned long
  PCI: hv: Add arm64 Hyper-V vPCI support
  PCI: hv: Make the code arch neutral by adding arch specific interfaces
  PCI: pciehp: Use down_read/write_nested(reset_lock) to fix lockdep errors
  x86/PCI: Remove initialization of static variables to false
  PCI: Use DWORD accesses for LTR, L1 SS to avoid erratum
  misc: pci_endpoint_test: Terminate statement with semicolon
  ...
2022-01-16 08:08:11 +02:00
Linus Torvalds
f56caedaf9 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "146 patches.

  Subsystems affected by this patch series: kthread, ia64, scripts,
  ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak,
  dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap,
  memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb,
  userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp,
  ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and
  damon)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits)
  mm/damon: hide kernel pointer from tracepoint event
  mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
  mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
  mm/damon/dbgfs: remove an unnecessary variable
  mm/damon: move the implementation of damon_insert_region to damon.h
  mm/damon: add access checking for hugetlb pages
  Docs/admin-guide/mm/damon/usage: update for schemes statistics
  mm/damon/dbgfs: support all DAMOS stats
  Docs/admin-guide/mm/damon/reclaim: document statistics parameters
  mm/damon/reclaim: provide reclamation statistics
  mm/damon/schemes: account how many times quota limit has exceeded
  mm/damon/schemes: account scheme actions that successfully applied
  mm/damon: remove a mistakenly added comment for a future feature
  Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts
  Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning
  Docs/admin-guide/mm/damon/usage: remove redundant information
  Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks
  mm/damon: convert macro functions to static inline functions
  mm/damon: modify damon_rand() macro to static inline function
  mm/damon: move damon_rand() definition into damon.h
  ...
2022-01-15 20:37:06 +02:00
Yury Norov
47d8c15615 include: move find.h from asm_generic to linux
find_bit API and bitmap API are closely related, but inclusion paths
are different - include/asm-generic and include/linux, correspondingly.
In the past it made a lot of troubles due to circular dependencies
and/or undefined symbols. Fix this by moving find.h under include/linux.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
2022-01-15 08:47:31 -08:00
Pasha Tatashin
d283d422c6 x86: mm: add x86_64 support for page table check
Add page table check hooks into routines that modify user page tables.

Link: https://lkml.kernel.org/r/20211221154650.1047963-5-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Xu <weixugc@google.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:28 +02:00
Arnd Bergmann
36090def7b mm: move tlb_flush_pending inline helpers to mm_inline.h
linux/mm_types.h should only define structure definitions, to make it
cheap to include elsewhere.  The atomic_t helper function definitions
are particularly large, so it's better to move the helpers using those
into the existing linux/mm_inline.h and only include that where needed.

As a follow-up, we may want to go through all the indirect includes in
mm_types.h and reduce them as much as possible.

Link: https://lkml.kernel.org/r/20211207125710.2503446-2-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Colin Cross <ccross@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:27 +02:00
Yang Zhong
c862dcd199 x86/fpu: Fix inline prefix warnings
Fix sparse warnings in xstate and remove inline prefix.

Fixes: 980fe2fddc ("x86/fpu: Extend fpu_xstate_prctl() with guest permissions")
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Message-Id: <20220113180825.322333-1-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:48:38 -05:00
Kevin Tian
b5274b1b7b kvm: x86: Disable interception for IA32_XFD on demand
Always intercepting IA32_XFD causes non-negligible overhead when this
register is updated frequently in the guest.

Disable r/w emulation after intercepting the first WRMSR(IA32_XFD)
with a non-zero value.

Disable WRMSR emulation implies that IA32_XFD becomes out-of-sync
with the software states in fpstate and the per-cpu xfd cache. This
leads to two additional changes accordingly:

  - Call fpu_sync_guest_vmexit_xfd_state() after vm-exit to bring
    software states back in-sync with the MSR, before handle_exit_irqoff()
    is called.

  - Always trap #NM once write interception is disabled for IA32_XFD.
    The #NM exception is rare if the guest doesn't use dynamic
    features. Otherwise, there is at most one exception per guest
    task given a dynamic feature.

p.s. We have confirmed that SDM is being revised to say that
when setting IA32_XFD[18] the AMX register state is not guaranteed
to be preserved. This clarification avoids adding mess for a creative
guest which sets IA32_XFD[18]=1 before saving active AMX state to
its own storage.

Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-22-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:44:43 -05:00
Thomas Gleixner
5429cead01 x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state()
KVM can disable the write emulation for the XFD MSR when the vCPU's fpstate
is already correctly sized to reduce the overhead.

When write emulation is disabled the XFD MSR state after a VMEXIT is
unknown and therefore not in sync with the software states in fpstate and
the per CPU XFD cache.

Provide fpu_sync_guest_vmexit_xfd_state() which has to be invoked after a
VMEXIT before enabling interrupts when write emulation is disabled for the
XFD MSR.

It could be invoked unconditionally even when write emulation is enabled
for the price of a pointless MSR read.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-21-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:44:42 -05:00
Guang Zeng
be50b2065d kvm: x86: Add support for getting/setting expanded xstate buffer
With KVM_CAP_XSAVE, userspace uses a hardcoded 4KB buffer to get/set
xstate data from/to KVM. This doesn't work when dynamic xfeatures
(e.g. AMX) are exposed to the guest as they require a larger buffer
size.

Introduce a new capability (KVM_CAP_XSAVE2). Userspace VMM gets the
required xstate buffer size via KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2).
KVM_SET_XSAVE is extended to work with both legacy and new capabilities
by doing properly-sized memdup_user() based on the guest fpu container.
KVM_GET_XSAVE is kept for backward-compatible reason. Instead,
KVM_GET_XSAVE2 is introduced under KVM_CAP_XSAVE2 as the preferred
interface for getting xstate buffer (4KB or larger size) from KVM
(Link: https://lkml.org/lkml/2021/12/15/510)

Also, update the api doc with the new KVM_GET_XSAVE2 ioctl.

Signed-off-by: Guang Zeng <guang.zeng@intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-19-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:44:41 -05:00
Thomas Gleixner
c60427dd50 x86/fpu: Add uabi_size to guest_fpu
Userspace needs to inquire KVM about the buffer size to work
with the new KVM_SET_XSAVE and KVM_GET_XSAVE2. Add the size info
to guest_fpu for KVM to access.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-18-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:44:40 -05:00
Jing Liu
690a757d61 kvm: x86: Add CPUID support for Intel AMX
Extend CPUID emulation to support XFD, AMX_TILE, AMX_INT8 and
AMX_BF16. Adding those bits into kvm_cpu_caps finally activates all
previous logics in this series.

Hide XFD on 32bit host kernels. Otherwise it leads to a weird situation
where KVM tells userspace to migrate MSR_IA32_XFD and then rejects
attempts to read/write the MSR.

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-17-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:44:40 -05:00
Jing Liu
1df4fd834e x86/fpu: Prepare xfd_err in struct fpu_guest
When XFD causes an instruction to generate #NM, IA32_XFD_ERR
contains information about which disabled state components are
being accessed. The #NM handler is expected to check this
information and then enable the state components by clearing
IA32_XFD for the faulting task (if having permission).

If the XFD_ERR value generated in guest is consumed/clobbered
by the host before the guest itself doing so, it may lead to
non-XFD-related #NM treated as XFD #NM in host (due to non-zero
value in XFD_ERR), or XFD-related #NM treated as non-XFD #NM in
guest (XFD_ERR cleared by the host #NM handler).

Introduce a new field in fpu_guest to save the guest xfd_err value.
KVM is expected to save guest xfd_err before interrupt is enabled
and restore it right before entering the guest (with interrupt
disabled).

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-12-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:44:08 -05:00
Kevin Tian
8eb9a48ac1 x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation
Guest XFD can be updated either in the emulation path or in the
restore path.

Provide a wrapper to update guest_fpu::fpstate::xfd. If the guest
fpstate is currently in-use, also update the per-cpu xfd cache and
the actual MSR.

Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-10-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:43:22 -05:00
Sean Christopherson
0781d60f65 x86/fpu: Provide fpu_enable_guest_xfd_features() for KVM
Provide a wrapper for expanding the guest fpstate buffer according
to requested xfeatures. KVM wants to call this wrapper to manage
any dynamic xstate used by the guest.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220105123532.12586-8-yang.zhong@intel.com>
[Remove unnecessary 32-bit check. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:43:21 -05:00
Linus Torvalds
feb7a43de5 Rework of the MSI interrupt infrastructure:
Treewide cleanup and consolidation of MSI interrupt handling in
   preparation for further changes in this area which are necessary to:
 
   - address existing shortcomings in the VFIO area
 
   - support the upcoming Interrupt Message Store functionality which
     decouples the message store from the PCI config/MMIO space
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmHf+SETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYobzGD/wNEFl5qQo5mNZ9thP6JSJFOItm7zMc
 2QgzCYOqNwAv4jL6Dqo+EHtbShYqDyWzKdKccgqNjmdIqgW8q7/fubN1OPzRsClV
 CZG997AsXDGXYlQcE3tXZjkeCWnWEE2AGLnygSkFV1K/r9ALAtFfTBJAWB+UD+Zc
 1P8Kxo0q0Jg+DQAMAA5bWfSSjo/Pmpr/1AFjY7+GA8BBeJJgWOyW7H1S+GYEWVOE
 RaQP81Sbd6x1JkopxkNqSJ/lbNJfnPJxi2higB56Y0OYn5CuSarYbZUM7oQ2V61t
 jN7pcEEvTpjLd6SJ93ry8WOcJVMTbccCklVfD0AfEwwGUGw2VM6fSyNrZfnrosUN
 tGBEO8eflBJzGTAwSkz1EhiGKna4o1NBDWpr0sH2iUiZC5G6V2hUDbM+0PQJhDa8
 bICwguZElcUUPOprwjS0HXhymnxghTmNHyoEP1yxGoKLTrwIqkH/9KGustWkcBmM
 hNtOCwQNqxcOHg/r3MN0KxttTASgoXgNnmFliAWA7XwseRpLWc95XPQFa5sptRhc
 EzwumEz17EW1iI5/NyZQcY+jcZ9BdgCqgZ9ECjZkyN4U+9G6iACUkxVaHUUs77jl
 a0ISSEHEvJisFOsOMYyFfeWkpIKGIKP/bpLOJEJ6kAdrUWFvlRGF3qlav3JldXQl
 ypFjPapDeB5guw==
 =vKzd
 -----END PGP SIGNATURE-----

Merge tag 'irq-msi-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull MSI irq updates from Thomas Gleixner:
 "Rework of the MSI interrupt infrastructure.

  This is a treewide cleanup and consolidation of MSI interrupt handling
  in preparation for further changes in this area which are necessary
  to:

   - address existing shortcomings in the VFIO area

   - support the upcoming Interrupt Message Store functionality which
     decouples the message store from the PCI config/MMIO space"

* tag 'irq-msi-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (94 commits)
  genirq/msi: Populate sysfs entry only once
  PCI/MSI: Unbreak pci_irq_get_affinity()
  genirq/msi: Convert storage to xarray
  genirq/msi: Simplify sysfs handling
  genirq/msi: Add abuse prevention comment to msi header
  genirq/msi: Mop up old interfaces
  genirq/msi: Convert to new functions
  genirq/msi: Make interrupt allocation less convoluted
  platform-msi: Simplify platform device MSI code
  platform-msi: Let core code handle MSI descriptors
  bus: fsl-mc-msi: Simplify MSI descriptor handling
  soc: ti: ti_sci_inta_msi: Remove ti_sci_inta_msi_domain_free_irqs()
  soc: ti: ti_sci_inta_msi: Rework MSI descriptor allocation
  NTB/msi: Convert to msi_on_each_desc()
  PCI: hv: Rework MSI handling
  powerpc/mpic_u3msi: Use msi_for_each-desc()
  powerpc/fsl_msi: Use msi_for_each_desc()
  powerpc/pasemi/msi: Convert to msi_on_each_dec()
  powerpc/cell/axon_msi: Convert to msi_on_each_desc()
  powerpc/4xx/hsta: Rework MSI handling
  ...
2022-01-13 09:05:29 -08:00
Linus Torvalds
64ad946152 - Get rid of all the .fixup sections because this generates
misleading/wrong stacktraces and confuse RELIABLE_STACKTRACE and
 LIVEPATCH as the backtrace misses the function which is being fixed up.
 
 - Add Straight Light Speculation mitigation support which uses a new
 compiler switch -mharden-sls= which sticks an INT3 after a RET or an
 indirect branch in order to block speculation after them. Reportedly,
 CPUs do speculate behind such insns.
 
 - The usual set of cleanups and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIyBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHfKA0ACgkQEsHwGGHe
 VUqLJg/2I2X2xXr5filJVaK+sQgmvDzk67DKnbxRBW2xcPF+B5sSW5yhe3G5UPW7
 SJVdhQ3gHcTiliGGlBf/VE7KXbqxFN0vO4/VFHZm78r43g7OrXTxz6WXXQRJ1n67
 U3YwRH3b6cqXZNFMs+X4bJt6qsGJM1kdTTZ2as4aERnaFr5AOAfQvfKbyhxLe/XA
 3SakfYISVKCBQ2RkTfpMpwmqlsatGFhTC5IrvuDQ83dDsM7O+Dx1J6Gu3fwjKmie
 iVzPOjCh+xTpZQp/SIZmt7MzoduZvpSym4YVyHvEnMiexQT4AmyaRthWqrhnEXY/
 qOvj8/XIqxmix8EaooGqRIK0Y2ZegxkPckNFzaeC3lsWohwMIGIhNXwHNEeuhNyH
 yvNGAW9Cq6NeDRgz5MRUXcimYw4P4oQKYLObS1WqFZhNMqm4sNtoEAYpai/lPYfs
 zUDckgXF2AoPOsSqy3hFAVaGovAgzfDaJVzkt0Lk4kzzjX2WQiNLhmiior460w+K
 0l2Iej58IajSp3MkWmFH368Jo8YfUVmkjbbpsmjsBppA08e1xamJB7RmswI/Ezj6
 s5re6UioCD+UYdjWx41kgbvYdvIkkZ2RLrktoZd/hqHrOLWEIiwEbyFO2nRFJIAh
 YjvPkB1p7iNuAeYcP1x9Ft9GNYVIsUlJ+hK86wtFCqy+abV+zQ==
 =R52z
 -----END PGP SIGNATURE-----

Merge tag 'x86_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 core updates from Borislav Petkov:

 - Get rid of all the .fixup sections because this generates
   misleading/wrong stacktraces and confuse RELIABLE_STACKTRACE and
   LIVEPATCH as the backtrace misses the function which is being fixed
   up.

 - Add Straight Line Speculation mitigation support which uses a new
   compiler switch -mharden-sls= which sticks an INT3 after a RET or an
   indirect branch in order to block speculation after them. Reportedly,
   CPUs do speculate behind such insns.

 - The usual set of cleanups and improvements

* tag 'x86_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
  x86/entry_32: Fix segment exceptions
  objtool: Remove .fixup handling
  x86: Remove .fixup section
  x86/word-at-a-time: Remove .fixup usage
  x86/usercopy: Remove .fixup usage
  x86/usercopy_32: Simplify __copy_user_intel_nocache()
  x86/sgx: Remove .fixup usage
  x86/checksum_32: Remove .fixup usage
  x86/vmx: Remove .fixup usage
  x86/kvm: Remove .fixup usage
  x86/segment: Remove .fixup usage
  x86/fpu: Remove .fixup usage
  x86/xen: Remove .fixup usage
  x86/uaccess: Remove .fixup usage
  x86/futex: Remove .fixup usage
  x86/msr: Remove .fixup usage
  x86/extable: Extend extable functionality
  x86/entry_32: Remove .fixup usage
  x86/entry_64: Remove .fixup usage
  x86/copy_mc_64: Remove .fixup usage
  ...
2022-01-12 16:31:19 -08:00
Linus Torvalds
8e5b0adeea Peter Zijlstra says:
"Cleanup of the perf/kvm interaction."
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHdvbkACgkQEsHwGGHe
 VUrX7w/9FwKUm0WlGcQIAOSdWk85N2qAVH3brYcQHNpTCVe68TOqTCrxCDrGgyUq
 2XnCOim99MUlnsVU6QRZqF4yJ8S1tGrc0COJ/qR4SGntucu0oYuDe2aMVq+mWUD7
 /IThA0oMRfhki9WwAyUuyCrXzk4blZdlrXyYIRMJGl9xeGNy3cvUtU8f68Kiy22E
 OcmQ/o9Etsr38dueAMU1KYEmgSTvG47rS8nfyRUu3QpJHbyLmRXH32PQrm3tduxS
 Bw3gMAH5vqq1UDZJ8ZvsPsO0vFX7dtnKEwEKz4qdtRWk9gi8oLGHIwIXC+VtNqpf
 mCmX33Jw8uFz9h3JhE84J0j/CgsWHoU6MOs0MOch4Tb69/BfCjQnw1enImhejG8q
 YEIDjJf/vgRNaw9PYshiTHT+EJTe9inT3S4eK/ynLRDUEslAqyWZZm7bUE/XrEDi
 yRyGIxry/hNZVvRkXT9QBw32fpgnIH2NAMPLEjJSGCRxT89Tfqz0aRDfacCuHTTh
 P8pAeiDuy/6RkDlQckOZJWOFFh2IHsykX2l3IJcHqVRqt4ob9b+SZB5qoH/Mv9qb
 MSAqdFUupYZFC+6XuPAeX5/Mo+wSkP+pYYSbWNxjUa0yNiYecOjE7/8T2SB2y6Mx
 lk2L0ypsZUYSmpHSfvOdPmf6ucj19/5B4+VCX6PQfcNJTnvvhTE=
 =tU5G
 -----END PGP SIGNATURE-----

Merge tag 'perf_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Borislav Petkov:
 "Cleanup of the perf/kvm interaction."

* tag 'perf_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Drop guest callback (un)register stubs
  KVM: arm64: Drop perf.c and fold its tiny bits of code into arm.c
  KVM: arm64: Hide kvm_arm_pmu_available behind CONFIG_HW_PERF_EVENTS=y
  KVM: arm64: Convert to the generic perf callbacks
  KVM: x86: Move Intel Processor Trace interrupt handler to vmx.c
  KVM: Move x86's perf guest info callbacks to generic KVM
  KVM: x86: More precisely identify NMI from guest when handling PMI
  KVM: x86: Drop current_vcpu for kvm_running_vcpu + kvm_arch_vcpu variable
  perf/core: Use static_call to optimize perf_guest_info_callbacks
  perf: Force architectures to opt-in to guest callbacks
  perf: Add wrappers for invoking guest callbacks
  perf/core: Rework guest callbacks to prepare for static_call support
  perf: Drop dead and useless guest "support" from arm, csky, nds32 and riscv
  perf: Stop pretending that perf can handle multiple guest callbacks
  KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest
  KVM: x86: Register perf callbacks after calling vendor's hardware_setup()
  perf: Protect perf_guest_cbs with RCU
2022-01-12 16:26:58 -08:00
Peter Zijlstra
9cdbeec409 x86/entry_32: Fix segment exceptions
The LKP robot reported that commit in Fixes: caused a failure. Turns out
the ldt_gdt_32 selftest turns into an infinite loop trying to clear the
segment.

As discovered by Sean, what happens is that PARANOID_EXIT_TO_KERNEL_MODE
in the handle_exception_return path overwrites the entry stack data with
the task stack data, restoring the "bad" segment value.

Instead of having the exception retry the instruction, have it emulate
the full instruction. Replace EX_TYPE_POP_ZERO with EX_TYPE_POP_REG
which will do the equivalent of: POP %reg; MOV $imm, %reg.

In order to encode the segment registers, add them as registers 8-11 for
32-bit.

By setting regs->[defg]s the (nested) RESTORE_REGS will pop this value
at the end of the exception handler and by increasing regs->sp, it will
have skipped the stack slot.

This was debugged by Sean Christopherson <seanjc@google.com>.

 [ bp: Add EX_REG_GS too. ]

Fixes: aa93e2ad74 ("x86/entry_32: Remove .fixup usage")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/Yd1l0gInc4zRcnt/@hirez.programming.kicks-ass.net
2022-01-12 16:38:25 +01:00
Sunil Muthuswamy
831c1ae725 PCI: hv: Make the code arch neutral by adding arch specific interfaces
Encapsulate arch dependencies in Hyper-V vPCI through a set of
arch-dependent interfaces. Adding these arch specific interfaces will
allow for an implementation for other architectures, such as arm64.

There are no functional changes expected from this patch.

Link: https://lore.kernel.org/r/1641411156-31705-2-git-send-email-sunilmut@linux.microsoft.com
Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
2022-01-12 08:21:54 -06:00
Linus Torvalds
f12fc75ef7 EFI updates for v5.17
- support taking the measurement of the initrd when loaded via the
   LoadFile2 protocol
 - kobject API cleanup from Greg
 - some header file whitespace fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmHcnSAACgkQw08iOZLZ
 jyQgagv/b41O2jok20vO9vXwWqxsjru9aOsFKMeXiITudObWaXvRmvbEeUhZIRc3
 FefCemUEGUkQz1Alf23t8daJRezL3kE+Lt1R525o384INxHPiieZ2Vu+Kp1zdu6C
 4cAuJu/iLbNbn1glPOAkMRRXjfVrlIaS1pC431jYk0WncEKpoE467ljP0k8jQuTq
 X+S4W8dyxOObsGuRJJpmX9zsFKJ+R8dh0lc/KoyNcP5LSXIK8xrXwBitM0CF3YKD
 sA8dYCrWiiA8KmY851fFQknyPBpfN+30m5DZ52uGWuuqAZ4CwN3ODSfEInhyhMqf
 PhY/mXRTIPe5jAmKlHdIAe11ACB/fDtRxyMla0u1yjQgjY7CbjTlEBLNUtU+N2vs
 zbemDACEL2S9NfUuF407B9gztx4j7LmaSui3qtBaGO4fn9cbsmnwM2M2j8bCLAt0
 WOQSir/1gemyhFAKe4yDPjMjwpC+gMX8nYY2kmvm354Oseqt9l91VrnNDEUAROAE
 zBUbds2U
 =BY66
 -----END PGP SIGNATURE-----

Merge tag 'efi-next-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:

 - support taking the measurement of the initrd when loaded via the
   LoadFile2 protocol

 - kobject API cleanup from Greg

 - some header file whitespace fixes

* tag 'efi-next-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: use default_groups in kobj_type
  efi/libstub: measure loaded initrd info into the TPM
  efi/libstub: consolidate initrd handling across architectures
  efi/libstub: x86/mixed: increase supported argument count
  efi/libstub: add prototype of efi_tcg2_protocol::hash_log_extend_event()
  include/linux/efi.h: Remove unneeded whitespaces before tabs
2022-01-11 15:36:30 -08:00
Linus Torvalds
1be5bdf8cd KCSAN updates for v5.17
This series provides KCSAN fixes and also the ability to take memory
 barriers into account for weakly-ordered systems.  This last can increase
 the probability of detecting certain types of data races.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmHbuRwTHHBhdWxtY2tA
 a2VybmVsLm9yZwAKCRCevxLzctn7jKDPEACWuzYnd/u/02AHyRd3PIF3Px9uFKlK
 TFwaXX95oYSFCXcrmO42YtDUlZm4QcbwNb85KMCu1DvckRtIsNw0rkBU7BGyqv3Z
 ZoJEfMNpmC0x9+IFBOeseBHySPVT0x7GmYus05MSh0OLfkbCfyImmxRzgoKJGL+A
 ADF9EQb4z2feWjmVEoN8uRaarCAD4f77rSXiX6oTCNDuKrHarqMld/TmoXFrJbu2
 QtfwHeyvraKBnZdUoYfVbGVenyKb1vMv4bUlvrOcuJEgIi/J/th4FupR3XCGYulI
 aWJTl2TQTGnMoE8VnFHgI27I841w3k5PVL+Y1hr/S4uN1/rIoQQuBzCtlnFeCksa
 BiBXsHIchN8N0Dwh8zD8NMd2uxV4t3fmpxXTDAwaOm7vs5hA8AJ0XNu6Sz94Lyjv
 wk2CxX41WWUNJVo3gh6SrS4mL6lC8+VvHF1hbIap++jrevj58gj1jAR1fdx4ANlT
 e2qA00EeoMngEogDNZH42/Fxs3H9zxrBta2ZbkPkwzIqTHH+4pIQDCy2xO3T3oxc
 twdGPYpjYdkf79EGsG4I4R/VA/IfcS09VIWTce8xSDeSnqkgFhcG37r1orJe8hTB
 tH+ODkNOsz5HaEoa8OoAL4ko2l0fL99p2AtMPpuQfHjRj7aorF+dJIrqCizASxwx
 37PjQgOmHeDHgQ==
 =Q5fg
 -----END PGP SIGNATURE-----

Merge tag 'kcsan.2022.01.09a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull KCSAN updates from Paul McKenney:
 "This provides KCSAN fixes and also the ability to take memory barriers
  into account for weakly-ordered systems. This last can increase the
  probability of detecting certain types of data races"

* tag 'kcsan.2022.01.09a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (29 commits)
  kcsan: Only test clear_bit_unlock_is_negative_byte if arch defines it
  kcsan: Avoid nested contexts reading inconsistent reorder_access
  kcsan: Turn barrier instrumentation into macros
  kcsan: Make barrier tests compatible with lockdep
  kcsan: Support WEAK_MEMORY with Clang where no objtool support exists
  compiler_attributes.h: Add __disable_sanitizer_instrumentation
  objtool, kcsan: Remove memory barrier instrumentation from noinstr
  objtool, kcsan: Add memory barrier instrumentation to whitelist
  sched, kcsan: Enable memory barrier instrumentation
  mm, kcsan: Enable barrier instrumentation
  x86/qspinlock, kcsan: Instrument barrier of pv_queued_spin_unlock()
  x86/barriers, kcsan: Use generic instrumentation for non-smp barriers
  asm-generic/bitops, kcsan: Add instrumentation for barriers
  locking/atomics, kcsan: Add instrumentation for barriers
  locking/barriers, kcsan: Support generic instrumentation
  locking/barriers, kcsan: Add instrumentation for barriers
  kcsan: selftest: Add test case to check memory barrier instrumentation
  kcsan: Ignore GCC 11+ warnings about TSan runtime support
  kcsan: test: Add test cases for memory barrier instrumentation
  kcsan: test: Match reordered or normal accesses
  ...
2022-01-11 09:51:26 -08:00
Linus Torvalds
b35b6d4d71 Power management updates for 5.17-rc1
- Add new P-state driver for AMD processors (Huang Rui).
 
  - Fix initialization of min and max frequency QoS requests in the
    cpufreq core (Rafael Wysocki).
 
  - Fix EPP handling on Alder Lake in intel_pstate (Srinivas Pandruvada).
 
  - Make intel_pstate update cpuinfo.max_freq when notified of HWP
    capabilities changes and drop a redundant function call from that
    driver (Rafael Wysocki).
 
  - Improve IRQ support in the Qcom cpufreq driver (Ard Biesheuvel,
    Stephen Boyd, Vladimir Zapolskiy).
 
  - Fix double devm_remap() in the Mediatek cpufreq driver (Hector Yuan).
 
  - Introduce thermal pressure helpers for cpufreq CPU cooling (Lukasz
    Luba).
 
  - Make cpufreq use default_groups in kobj_type (Greg Kroah-Hartman).
 
  - Make cpuidle use default_groups in kobj_type (Greg Kroah-Hartman).
 
  - Fix two comments in cpuidle code (Jason Wang, Yang Li).
 
  - Allow model-specific normal EPB value to be used in the intel_epb
    sysfs attribute handling code (Srinivas Pandruvada).
 
  - Simplify locking in pm_runtime_put_suppliers() (Rafael Wysocki).
 
  - Add safety net to supplier device release in the runtime PM core
    code (Rafael Wysocki).
 
  - Capture device status before disabling runtime PM for it (Rafael
    Wysocki).
 
  - Add new macros for declaring PM operations to allow drivers to
    avoid guarding them with CONFIG_PM #ifdefs or __maybe_unused and
    update some drivers to use these macros (Paul Cercueil).
 
  - Allow ACPI hardware signature to be honoured during restore from
    hibernation (David Woodhouse).
 
  - Update outdated operating performance points (OPP) documentation
    (Tang Yizhou).
 
  - Reduce log severity for informative message regarding frequency
    transition failures in devfreq (Tzung-Bi Shih).
 
  - Add DRAM frequency controller devfreq driver for Allwinner sunXi
    SoCs (Samuel Holland).
 
  - Add missing COMMON_CLK dependency to sun8i devfreq driver (Arnd
    Bergmann).
 
  - Add support for new layout of Psys PowerLimit Register on SPR to
    the Intel RAPL power capping driver (Zhang Rui).
 
  - Fix typo in a comment in idle_inject.c (Jason Wang).
 
  - Remove unused function definition from the DTPM (Dynamit Thermal
    Power Management) power capping framework (Daniel Lezcano).
 
  - Reduce DTPM trace verbosity (Daniel Lezcano).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmHcgkgSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxs34P/3kFhRk7qrwEekx6F11im6caLKT9+Qap
 PuGVqfTbK7TupVQDVGFBEjTjgKY7Ph7Fcr4bqn6wvNOp96cjXyOSk/c1fcpS3Bpr
 b1PYsFsb9diNKE462sGGYClyCT3X5qQqtpxzOl3g4I1PWKTC1mKFm4Jm2m6S6cFq
 DKhsgYKFzQSZNb1wJM4JjHS9c3BRygqp4nfEAmifu5b9tLZf7stWnFHhbGq63M9m
 OwHOrEEnzhf4pOXGZTvIXeczgE6IcuDdlGkIg7XMHnmKSNvj1HqhEgi2lfSRb98z
 5eI4S6JymCJGVK+gr8iVCq1iJ+LKqV3YPXRqvI35/+NqIKYxMt2ZivQQf5s3aQLe
 26gUulD3O6Pz5tMlwcDElD4/tcClfg35PCD/VzpRR8TAo8vLBb63kZ5v6+HM34ZJ
 6QbLTNZJTnGmEqxMccUxP+HhZz8ssqpLAC+R2sE5yXbNpIZq8CbPiGb65RGiX3SG
 CmRKqH/xQVNKBYP0ChjmUyhKcBxOnx1Xu8AhsN7gRAy0aht7j7OdjTnJuGiX6gu3
 Q5WxvVvkekyfhuFQ5TST9y/fzvMJWzeaA6GhVIr6RoBmshNQGTb0H4HXARxS3Ah5
 qjd7ao7BFLa898FCHaHIpmFWp0wF5iljwCJQVP3I2qUpPvDJxEtsxc4CF/AZzyNR
 VudoFqLoIV5C
 =1egI
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "The most signigicant change here is the addition of a new cpufreq
  'P-state' driver for AMD processors as a better replacement for the
  venerable acpi-cpufreq driver.

  There are also other cpufreq updates (in the core, intel_pstate, ARM
  drivers), PM core updates (mostly related to adding new macros for
  declaring PM operations which should make the lives of driver
  developers somewhat easier), and a bunch of assorted fixes and
  cleanups.

  Summary:

   - Add new P-state driver for AMD processors (Huang Rui).

   - Fix initialization of min and max frequency QoS requests in the
     cpufreq core (Rafael Wysocki).

   - Fix EPP handling on Alder Lake in intel_pstate (Srinivas
     Pandruvada).

   - Make intel_pstate update cpuinfo.max_freq when notified of HWP
     capabilities changes and drop a redundant function call from that
     driver (Rafael Wysocki).

   - Improve IRQ support in the Qcom cpufreq driver (Ard Biesheuvel,
     Stephen Boyd, Vladimir Zapolskiy).

   - Fix double devm_remap() in the Mediatek cpufreq driver (Hector
     Yuan).

   - Introduce thermal pressure helpers for cpufreq CPU cooling (Lukasz
     Luba).

   - Make cpufreq use default_groups in kobj_type (Greg Kroah-Hartman).

   - Make cpuidle use default_groups in kobj_type (Greg Kroah-Hartman).

   - Fix two comments in cpuidle code (Jason Wang, Yang Li).

   - Allow model-specific normal EPB value to be used in the intel_epb
     sysfs attribute handling code (Srinivas Pandruvada).

   - Simplify locking in pm_runtime_put_suppliers() (Rafael Wysocki).

   - Add safety net to supplier device release in the runtime PM core
     code (Rafael Wysocki).

   - Capture device status before disabling runtime PM for it (Rafael
     Wysocki).

   - Add new macros for declaring PM operations to allow drivers to
     avoid guarding them with CONFIG_PM #ifdefs or __maybe_unused and
     update some drivers to use these macros (Paul Cercueil).

   - Allow ACPI hardware signature to be honoured during restore from
     hibernation (David Woodhouse).

   - Update outdated operating performance points (OPP) documentation
     (Tang Yizhou).

   - Reduce log severity for informative message regarding frequency
     transition failures in devfreq (Tzung-Bi Shih).

   - Add DRAM frequency controller devfreq driver for Allwinner sunXi
     SoCs (Samuel Holland).

   - Add missing COMMON_CLK dependency to sun8i devfreq driver (Arnd
     Bergmann).

   - Add support for new layout of Psys PowerLimit Register on SPR to
     the Intel RAPL power capping driver (Zhang Rui).

   - Fix typo in a comment in idle_inject.c (Jason Wang).

   - Remove unused function definition from the DTPM (Dynamit Thermal
     Power Management) power capping framework (Daniel Lezcano).

   - Reduce DTPM trace verbosity (Daniel Lezcano)"

* tag 'pm-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (53 commits)
  x86, sched: Fix undefined reference to init_freq_invariance_cppc() build error
  cpufreq: amd-pstate: Fix Kconfig dependencies for AMD P-State
  cpufreq: amd-pstate: Fix struct amd_cpudata kernel-doc comment
  cpuidle: use default_groups in kobj_type
  x86: intel_epb: Allow model specific normal EPB value
  MAINTAINERS: Add AMD P-State driver maintainer entry
  Documentation: amd-pstate: Add AMD P-State driver introduction
  cpufreq: amd-pstate: Add AMD P-State performance attributes
  cpufreq: amd-pstate: Add AMD P-State frequencies attributes
  cpufreq: amd-pstate: Add boost mode support for AMD P-State
  cpufreq: amd-pstate: Add trace for AMD P-State module
  cpufreq: amd-pstate: Introduce the support for the processors with shared memory solution
  cpufreq: amd-pstate: Add fast switch function for AMD P-State
  cpufreq: amd-pstate: Introduce a new AMD P-State driver to support future processors
  ACPI: CPPC: Add CPPC enable register function
  ACPI: CPPC: Check present CPUs for determining _CPC is valid
  ACPI: CPPC: Implement support for SystemIO registers
  x86/msr: Add AMD CPPC MSR definitions
  x86/cpufeatures: Add AMD Collaborative Processor Performance Control feature flag
  cpufreq: use default_groups in kobj_type
  ...
2022-01-10 20:34:00 -08:00
Linus Torvalds
7e740ae635 - First part of a series to move the AMD address translation code from
arch/x86/ to amd64_edac as that is its only user anyway
 
 - Some MCE error injection improvements to the AMD side
 
 - Reorganization of the #MC handler code and the facilities it calls to
 make it noinstr-safe
 
 - Add support for new AMD MCA bank types and non-uniform banks layout
 
 - The usual set of cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcGZ4ACgkQEsHwGGHe
 VUr6Zw//WBvNvfV/akQGsvVo94G0DaF+buYB+Tl1p0goMd7QfKA5iHxjB1alEJC2
 dTchIr7pjiiE3nr4svuWLLQZamx8kMwQNqipioBHXg3YThj0wD4PbUOC9TlIceBR
 3yxVbvwlD7Y7sb2PII6IMlagzTiIeW0/ps29DHFr5vqDBvEanNdAHoV/h2vQi+76
 Ma96psIxzTMSk11yGB6l9k66EASCdDGBU7sODjup7wuQmuRaQ/1oJAWY0wIJvJez
 frjpaz/YKmlTwTf9bxoJbky2FkeBsD4yXXUGwjDgMq0EyUUaeSbvaQkm8gSHX9Yr
 VDDv1WvT6QIw6x7Wc4skS8lWmZghNBbAHOoNS31BPJ2IDmFWkF5Q2bNEuHrtU4EC
 0mkNeyN6x48L/F8j/1aE/tm+SjiGexZX4zhi6MNWReTV140I1zqQq/r7CCu5+MEa
 PAB1YH/96k2dMPT6mbFrRIFJmkDuBuZOAkuwYWEjO/XjPl2SGBGj1jKolWW3qjRR
 Po7vBJnDt7wgigWFh6+R4rJv+fh87XfB7B2wEOt4Yn37jUkK6dNRIy0zFmDaC1J2
 bHgsJbWC+Sgs1G57gnYABJYzLj7RRdDyCu1/UUVyBBP7/WfZJw0kjABE7p3AaYTd
 15JV1L0c/Ypuv05LJf40LkyF2F5w2fnP5QM2Rr8U4xW/GumEyWs=
 =8Hu7
 -----END PGP SIGNATURE-----

Merge tag 'ras_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS updates from Borislav Petkov:
 "A relatively big amount of movements in RAS-land this time around:

   - First part of a series to move the AMD address translation code
     from arch/x86/ to amd64_edac as that is its only user anyway

   - Some MCE error injection improvements to the AMD side

   - Reorganization of the #MC handler code and the facilities it calls
     to make it noinstr-safe

   - Add support for new AMD MCA bank types and non-uniform banks layout

   - The usual set of cleanups and fixes"

* tag 'ras_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/mce: Reduce number of machine checks taken during recovery
  x86/mce/inject: Avoid out-of-bounds write when setting flags
  x86/MCE/AMD, EDAC/mce_amd: Support non-uniform MCA bank type enumeration
  x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types
  x86/mce: Check regs before accessing it
  x86/mce: Mark mce_start() noinstr
  x86/mce: Mark mce_timed_out() noinstr
  x86/mce: Move the tainting outside of the noinstr region
  x86/mce: Mark mce_read_aux() noinstr
  x86/mce: Mark mce_end() noinstr
  x86/mce: Mark mce_panic() noinstr
  x86/mce: Prevent severity computation from being instrumented
  x86/mce: Allow instrumentation during task work queueing
  x86/mce: Remove noinstr annotation from mce_setup()
  x86/mce: Use mce_rdmsrl() in severity checking code
  x86/mce: Remove function-local cpus variables
  x86/mce: Do not use memset to clear the banks bitmaps
  x86/mce/inject: Set the valid bit in MCA_STATUS before error injection
  x86/mce/inject: Check if a bank is populated before injecting
  x86/mce: Get rid of cpu_missing
  ...
2022-01-10 11:43:09 -08:00
Linus Torvalds
308319e990 - The mandatory set of random minor cleanups all over tip
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcEx8ACgkQEsHwGGHe
 VUr7sw/9HcBuobgSgF8RJTsuaITMYZEIEGCQX9V8i8QLWAmWfw4gWsun3x/QrF0+
 /CmuGpXVckTQgYo8Y/knhPfvVRDWLIsF9p18gV7sPxpQvDZ0ZgPQVSsfXx3cl84y
 4D11zdYHDPDnFRY6AX4GCh5Ozz4K2WPwv41iHgi+vGoFjfSMeq1+ilAEXCBY8pdZ
 S/Txd7oAm5tCVyH0VUXmNbQKTNjDbFXxr/FnWpMtVgXFiDlzbOK91H998ZJP13WL
 1owCxLKW8BgvoRXtUBt8jJqHDqzjwBuxrGBCYwiFpVl0PBkbwLLYxElkef6rtpWf
 gHAMtUeqgTOprNpvtK3ynaf9g8cgXd5QyBL5nszaD1WCzpVstg+yZExgKFWtTI6F
 U0UJkjmGmVGsD9XkV7nbkcO/IGEgY9cOZ3tHqmyEaHSVahgoZRS8SDQKJA5Ivziu
 mUN60dXEdX2xtCwGR3thacblTWHZrSr4k7Cb93thCb+Het67/DtOSkVp9U4Eutqu
 O2La5Lw1pwsFaTy7WvBUvcuI+gc/zFrbs92r1gT5cGvSJ8+0Pf1Pr4nuhPL9dkKa
 rFQ1ubRHpQQ5u4hVDiEVm+PD7I9hSSPEUWiqZamUEkYGF7XYec60FOTgFERhgfez
 HExAhjHlH6lo4lSxwuWMXRcJMI7uehEm0fUduX8USvYUQn66zGM=
 =u68B
 -----END PGP SIGNATURE-----

Merge tag 'x86_cleanups_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Borislav Petkov:
 "The mandatory set of random minor cleanups all over tip"

* tag 'x86_cleanups_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/events/amd/iommu: Remove redundant assignment to variable shift
  x86/boot/string: Add missing function prototypes
  x86/fpu: Remove duplicate copy_fpstate_to_sigframe() prototype
  x86/uaccess: Move variable into switch case statement
2022-01-10 10:02:27 -08:00
Linus Torvalds
2e97a0c02b - Add support for decoding instructions which do MMIO accesses in order
to use it in SEV and TDX guests
 
 - An include fix and reorg to allow for removing set_fs in UML later
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcEl4ACgkQEsHwGGHe
 VUqFTQ/+K9Kb6X0+r7wBSRTeAIWaYewmgOdf+7rpFVyFqQtNecKbuSAWGgFnEHc8
 8HUB/krNa+odtx7mAy73wNALUaPmR0KUg6O+YKrvT6LHt8DLlGl5u0g/hihzFdAB
 PW7auuxqt9TvK1i8PkYAI+W7t93o4mw4LzgDCVvoLPQUutRZEV1gHRht8Tn8SjaN
 3EmEiazpFDrXNGWl/3rnS0qIyvtiZu7KNtibE6ljbUgse9cgxOt733mykH6eO9RJ
 hXOfewKML72UxmgWig01pElgLaXeYI5rpSoG7usm4FwwYh+tmBIA8S/EoeE24gn0
 e82lxwRCcHjqUDRp2//gz16sYhs//K6bcViT/4FtnL33e2CjK2/J4MwHPn9zgimO
 VvxSdAes7UFiA/gDIomFt3gJij+hfy4TGKg5d3326Nm9rsQLpxg49WkozYJZ8m/f
 75VVlC4BAj9SnYLQYhSm9buF7pIXmfwN3yWkYJsebl18C6/4FXLLomiqOgWpo3mG
 D0e+CXhLZsEaU5NTiVuaPySzjtpRUzmfWf3S9GifJZex0rX+et7+mqIuC92aHbtD
 Dc+nNFX/D77Fq8Uoe8bIEt8QsnjdACov1TI/S8h2rSjt5R/Lyg73qh0CpN0jtQ+S
 9dUooJWwE4RXnuVMpFq/Xea/BYj1lQ72kMeyFiCNc0/hnzYhZNM=
 =NBcE
 -----END PGP SIGNATURE-----

Merge tag 'x86_misc_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 updates from Borislav Petkov:
 "The pile which we cannot find the proper topic for so we stick it in
  x86/misc:

   - Add support for decoding instructions which do MMIO accesses in
     order to use it in SEV and TDX guests

   - An include fix and reorg to allow for removing set_fs in UML later"

* tag 'x86_misc_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mtrr: Remove the mtrr_bp_init() stub
  x86/sev-es: Use insn_decode_mmio() for MMIO implementation
  x86/insn-eval: Introduce insn_decode_mmio()
  x86/insn-eval: Introduce insn_get_modrm_reg_ptr()
  x86/insn-eval: Handle insn_get_opcode() failure
2022-01-10 10:00:03 -08:00
Linus Torvalds
4a692ae360 - Flush *all* mappings from the TLB after switching to the trampoline
pagetable to prevent any stale entries' presence
 
 - Flush global mappings from the TLB, in addition to the CR3-write,
 after switching off of the trampoline_pgd during boot to clear the
 identity mappings
 
 - Prevent instrumentation issues resulting from the above changes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcEGIACgkQEsHwGGHe
 VUp14RAAgo6BbW9J82Pyl55egIhcQDdGsa16Gdm9S/AFIIW/NhwYo9ydrgtzr/70
 3XKpJYX7nH7PUKYRmoca/m3NnzUU+wnjSGS1XMyB3bJvn2/8S1qeuwBty2VP2dYM
 iS2eGRLjVjbMWwQUSK7tPJa5wi11zUqLIyCe3t0YiWso6TK7xKaVJTQ3/19Xc+/a
 zVQ5VpmzglUTxA6xGCvTDn5IUViUb8QmIuw7Ty6QtQEoI6T3qQvPkdJNXOxDcHNy
 9gDGf4O+5YlPCxYsNEkWDDa02zSZ2aWFSq76b98VyMiOK0xts+ktnAwq6oes+as9
 ZLIipOu5aIkj8te7he0FelyvPhZAVzrFvvmMf1U+EV3PqbyVkabhk5SBeP5v8CZy
 bM4eYNuJ2FLvFpUCC9zQ/MNVQ6ZtxN15rrrsTqk46KLPBHmHp/Aj9W/DP4zpCcNg
 Wwh4xbnGNIN8jZBiBJG6R6q7oM/lZt/loEicxm2QFZHtAIYMsiUmE99HnIREjUHd
 +0mwo2rHniie9zh6GoybX8OcbZCLYGdfe3iPvlO9fQpyDTn8IUIlnruDlUiTBMDM
 fX4J2dynh7xXRH1WW+MwxDv4n400+C08SG9zTD0qPCbGhYwNscMlZhA2JN6mlPep
 spuRPOzzwUUxqjXkDloeDDJNUQ8r032OB2LMhWSbLApJrJM9/QA=
 =cM+z
 -----END PGP SIGNATURE-----

Merge tag 'x86_mm_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 mm updates from Borislav Petkov:

 - Flush *all* mappings from the TLB after switching to the trampoline
   pagetable to prevent any stale entries' presence

 - Flush global mappings from the TLB, in addition to the CR3-write,
   after switching off of the trampoline_pgd during boot to clear the
   identity mappings

 - Prevent instrumentation issues resulting from the above changes

* tag 'x86_mm_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Prevent early boot triple-faults with instrumentation
  x86/mm: Include spinlock_t definition in pgtable.
  x86/mm: Flush global TLB when switching to trampoline page-table
  x86/mm/64: Flush global TLB on boot and AP bringup
  x86/realmode: Add comment for Global bit usage in trampoline_pgd
  x86/mm: Add missing <asm/cpufeatures.h> dependency to <asm/page_64.h>
2022-01-10 09:51:38 -08:00
Linus Torvalds
bfed6efb8e - Add support for handling hw errors in SGX pages: poisoning, recovering
from poison memory and error injection into SGX pages
 
 - A bunch of changes to the SGX selftests to simplify and allow of SGX
 features testing without the need of a whole SGX software stack
 
 - Add a sysfs attribute which is supposed to show the amount of SGX
 memory in a NUMA node, similar to what /proc/meminfo is to normal
 memory
 
 - The usual bunch of fixes and cleanups too
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcDQMACgkQEsHwGGHe
 VUq42xAAjWM0AFpIxgUBpbE0swV3ZMulnndl3/vA5XN+9Yn7Q52+AFyPRE0s7Zam
 Ap+cInh2Il7d/sv54rZ4x/j7+TH4i7s8fWPVU/XiPALQuOuw0/B1wJJ+jmMiPFiU
 3jr7DkUPyWjWTHduMY/tk+xMOpkx1XsxJheYnKvsKVW+fjJ0vPuftAZtfu2z2VOh
 3JLcp5cAXPxW0UK9gdoF5bCBQhBu0NRguTbhHhbByAixQO2GyVSKLSRovUdj0a+y
 QRrQ6hgcvpTOsVHJoWJ7yIX4SBzQTe9Bg6dT9DghOxE4Sc2GH89hu7wRztGawBJO
 nLyzWgiW9ttjQutDpBvZANNVcFAPAdtDWczrzZpREbrGKkzT+kOBnIIL1LWITWOy
 2YWTO3ytW0KNIK85GzMjSVOKRMgaHJeBaGuYZ7Z0kb3GuUPJ9zRlaRxNapKQFuzA
 0PGoA4IDT+2Afy7VYBBNUA2d/WverFQuXKusSxK6b5zJ173o5/DXL2q0d3gn/j8Z
 hhxJUJyVOsfRXSG4NKrj4se4FiA0n/RL4oyUZR9iJ8kWzzZTd0eZTAn468bpGIp5
 yiOlPOLgsmu0xzVmAtG1+4d2+S2x+Ec5YE0sP1V/JLNciYk3Ebp7UyfnS3tn33Xc
 cpdWjELvD1LJVpMEURnbjRrwU6OiiAekYJCP/9lmK9zfOGpwRHc=
 =vFTM
 -----END PGP SIGNATURE-----

Merge tag 'x86_sgx_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SGX updates from Borislav Petkov:

 - Add support for handling hw errors in SGX pages: poisoning,
   recovering from poison memory and error injection into SGX pages

 - A bunch of changes to the SGX selftests to simplify and allow of SGX
   features testing without the need of a whole SGX software stack

 - Add a sysfs attribute which is supposed to show the amount of SGX
   memory in a NUMA node, similar to what /proc/meminfo is to normal
   memory

 - The usual bunch of fixes and cleanups too

* tag 'x86_sgx_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/sgx: Fix NULL pointer dereference on non-SGX systems
  selftests/sgx: Fix corrupted cpuid macro invocation
  x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node
  x86/sgx: Fix minor documentation issues
  selftests/sgx: Add test for multiple TCS entry
  selftests/sgx: Enable multiple thread support
  selftests/sgx: Add page permission and exception test
  selftests/sgx: Rename test properties in preparation for more enclave tests
  selftests/sgx: Provide per-op parameter structs for the test enclave
  selftests/sgx: Add a new kselftest: Unclobbered_vdso_oversubscribed
  selftests/sgx: Move setup_test_encl() to each TEST_F()
  selftests/sgx: Encpsulate the test enclave creation
  selftests/sgx: Dump segments and /proc/self/maps only on failure
  selftests/sgx: Create a heap for the test enclave
  selftests/sgx: Make data measurement for an enclave segment optional
  selftests/sgx: Assign source for each segment
  selftests/sgx: Fix a benign linker warning
  x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
  x86/sgx: Add hook to error injection address validation
  x86/sgx: Hook arch_memory_failure() into mainline code
  ...
2022-01-10 09:44:09 -08:00
Linus Torvalds
01d5e7872c - Share the SEV string unrolling logic with TDX as TDX guests need it too
- Cleanups and generalzation of code shared by SEV and TDX
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcCf4ACgkQEsHwGGHe
 VUr5GBAAiG5FuPmNBk+9WE/nQLfk/O9J7DJTY3CXuMxDVbCN3x9qZs2PHWwukZ18
 XX6NV+OlD6ZhkHTx28WsmoY6b6rePAXBYH0r1kSCwqJ6qujbi81Mmg8jSJK/xsb3
 JjXbqt92TSbTcRPZ27Xqz81mjrifR7iTF7wU3o3tUdCfDmnfiaS4DmOJ/bwd1Vvl
 wVsjn+zBhuzbQHZtGJuJm4geYAgQNLwh3lPK6ad+V6PmQKXS6QRpbDsxqD5+ZqnT
 nfNND8+lEov3DVzvSHgAN6VsA63kfx7gU9PFKxNOLjgz6TN0fmAq/tu50HLh2X9V
 tcpgb4SywSifha3sDKSqPzDILIY7+L1S2YBcVt2QYpIYzahmEd6aKkMQ8AZINcXQ
 kV6Xrm8FOOWA3+uhqi4XBlIZ5OJ8O47UZszjSN/j1COqWnxiDVF8P+QfmEEJPs8C
 BIkgwhvQM5dLR/rRArBieEIe/mgYPKjUeQCfiUhxBOMkZamDvWBQtv4wr/2CdImH
 b0Tu2sPkyBDLsv8sxK5LSmzxWpvJJGopu8Aqvu2SOrcUw9M4rppRq/nAfpr2YZw4
 AJi4CKEQsLxsiaQQ2duWXQYnWInvAySDJgReDahdAZj2TWSgHNhCiGdEmdzMfTUY
 ToRtczTjHd16TxRDgw3JCi2XG8hneJdAK/Kbp6P1AONEZSNWyT8=
 =9EUd
 -----END PGP SIGNATURE-----

Merge tag 'x86_sev_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SEV updates from Borislav Petkov:
 "The accumulated pile of x86/sev generalizations and cleanups:

   - Share the SEV string unrolling logic with TDX as TDX guests need it
     too

   - Cleanups and generalzation of code shared by SEV and TDX"

* tag 'x86_sev_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Move common memory encryption code to mem_encrypt.c
  x86/sev: Rename mem_encrypt.c to mem_encrypt_amd.c
  x86/sev: Use CC_ATTR attribute to generalize string I/O unroll
  x86/sev: Remove do_early_exception() forward declarations
  x86/head64: Carve out the guest encryption postprocessing into a helper
  x86/sev: Get rid of excessive use of defines
  x86/sev: Shorten GHCB terminate macro names
2022-01-10 09:33:40 -08:00
Linus Torvalds
e59451fd3b - Define the INTERRUPT_RETURN macro only when CONFIG_XEN_PV is enabled
as it is its only user
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcBagACgkQEsHwGGHe
 VUoCJhAAvkF8KHowvKB7xpxO5b8ifngPvzjGIehKTRIV9p/AetDW0PgmNChXVT69
 73dkvAuEfJId3pAvN8spgfBYYZff20fVp+gTlCSr4MkzH1oKoBp3EWvAwI/eQImP
 iuVUnWmMcWQQ/sFJ5dwzQF0mGWZoLzxkQgBX6aNlvJ0Dy10ulowGSwaBVAUFr3YR
 19wdoYBlXvgT4i2kcrFsVWMFOre9ZtBL6DW+xvdc/1yqlurF281q5W94iyneUk5y
 ZgE6lZklnxdzxoOnJe2hwi7A34PCZ8AZkd2i0skuSpug8yOiVfmON7Xn/p8J5adO
 93Rn+DqmiBz26i+TQ7yt2dAYakn7BdC1r+l5/tx5Kw3qBwk6aExOWS7uPFOL/SxV
 ru5t3daEx9HIkCnwVvyX+kmiZ+9sude1YUTixgSPtNYx+xHCj03P6df+v7Wd3Rlm
 q5miWWfcqAWHbyWDd1/gaVL282/Sc03dwTd+DjWO8xJ4Fglccx5hhYYRnvhBRG2C
 pgJnNTI///pO9P7S0u6By6QjvacylU2zAF9EW66y4E/1EFciWZ0+ezgsHsrahOFN
 ftQMZfluVOH6TMU5hOXvXgp/Z1t0naC+SRWyxSKFck8t+jFQWORd9WZJ9I8CN4Qh
 gumJrOkKfsF3e5/14CvvlC99TP0XQNUHf3zPjFNNnnbChTVHeRw=
 =YEKc
 -----END PGP SIGNATURE-----

Merge tag 'x86_paravirt_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 paravirtualization fix from Borislav Petkov:
 "Define the INTERRUPT_RETURN macro only when CONFIG_XEN_PV is enabled
  as it is its only user"

* tag 'x86_paravirt_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/paravirt: Fix build PARAVIRT_XXL=y without XEN_PV
2022-01-10 09:09:36 -08:00
Rafael J. Wysocki
5561f25beb Merge branch 'pm-cpufreq'
Merge cpufreq updates for 5.17-rc1:

 - Add new P-state driver for AMD processors (Huang Rui).

 - Fix initialization of min and max frequency QoS requests in the
   cpufreq core (Rafael Wysocki).

 - Fix EPP handling on Alder Lake in intel_pstate (Srinivas Pandruvada).

 - Make intel_pstate update cpuinfo.max_freq when notified of HWP
   capabilities changes and drop a redundant function call from that
   driver (Rafael Wysocki).

 - Improve IRQ support in the Qcom cpufreq driver (Ard Biesheuvel,
   Stephen Boyd, Vladimir Zapolskiy).

 - Fix double devm_remap() in the Mediatek cpufreq driver (Hector Yuan).

 - Introduce thermal pressure helpers for cpufreq CPU cooling (Lukasz
   Luba).

 - Make cpufreq use default_groups in kobj_type (Greg Kroah-Hartman).

* pm-cpufreq: (32 commits)
  x86, sched: Fix undefined reference to init_freq_invariance_cppc() build error
  cpufreq: amd-pstate: Fix Kconfig dependencies for AMD P-State
  cpufreq: amd-pstate: Fix struct amd_cpudata kernel-doc comment
  MAINTAINERS: Add AMD P-State driver maintainer entry
  Documentation: amd-pstate: Add AMD P-State driver introduction
  cpufreq: amd-pstate: Add AMD P-State performance attributes
  cpufreq: amd-pstate: Add AMD P-State frequencies attributes
  cpufreq: amd-pstate: Add boost mode support for AMD P-State
  cpufreq: amd-pstate: Add trace for AMD P-State module
  cpufreq: amd-pstate: Introduce the support for the processors with shared memory solution
  cpufreq: amd-pstate: Add fast switch function for AMD P-State
  cpufreq: amd-pstate: Introduce a new AMD P-State driver to support future processors
  ACPI: CPPC: Add CPPC enable register function
  ACPI: CPPC: Check present CPUs for determining _CPC is valid
  ACPI: CPPC: Implement support for SystemIO registers
  x86/msr: Add AMD CPPC MSR definitions
  x86/cpufeatures: Add AMD Collaborative Processor Performance Control feature flag
  cpufreq: use default_groups in kobj_type
  cpufreq: mediatek-hw: Fix double devm_remap in hotplug case
  cpufreq: intel_pstate: Update cpuinfo.max_freq on HWP_CAP changes
  ...
2022-01-10 17:54:45 +01:00
Thomas Gleixner
36487e6228 x86/fpu: Prepare guest FPU for dynamically enabled FPU features
To support dynamically enabled FPU features for guests prepare the guest
pseudo FPU container to keep track of the currently enabled xfeatures and
the guest permissions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-3-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 13:33:03 -05:00
Thomas Gleixner
980fe2fddc x86/fpu: Extend fpu_xstate_prctl() with guest permissions
KVM requires a clear separation of host user space and guest permissions
for dynamic XSTATE components.

Add a guest permissions member to struct fpu and a separate set of prctl()
arguments: ARCH_GET_XCOMP_GUEST_PERM and ARCH_REQ_XCOMP_GUEST_PERM.

The semantics are equivalent to the host user space permission control
except for the following constraints:

  1) Permissions have to be requested before the first vCPU is created

  2) Permissions are frozen when the first vCPU is created to ensure
     consistency. Any attempt to expand permissions via the prctl() after
     that point is rejected.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-2-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 13:33:03 -05:00
Michael Roth
405329fc9a KVM: SVM: include CR3 in initial VMSA state for SEV-ES guests
Normally guests will set up CR3 themselves, but some guests, such as
kselftests, and potentially CONFIG_PVH guests, rely on being booted
with paging enabled and CR3 initialized to a pre-allocated page table.

Currently CR3 updates via KVM_SET_SREGS* are not loaded into the guest
VMCB until just prior to entering the guest. For SEV-ES/SEV-SNP, this
is too late, since it will have switched over to using the VMSA page
prior to that point, with the VMSA CR3 copied from the VMCB initial
CR3 value: 0.

Address this by sync'ing the CR3 value into the VMCB save area
immediately when KVM_SET_SREGS* is issued so it will find it's way into
the initial VMSA.

Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-Id: <20211216171358.61140-10-michael.roth@amd.com>
[Remove vmx_post_set_cr3; add a remark about kvm_set_cr3 not calling the
 new hook. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:46 -05:00
David Woodhouse
14243b3871 KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery
This adds basic support for delivering 2 level event channels to a guest.

Initially, it only supports delivery via the IRQ routing table, triggered
by an eventfd. In order to do so, it has a kvm_xen_set_evtchn_fast()
function which will use the pre-mapped shared_info page if it already
exists and is still valid, while the slow path through the irqfd_inject
workqueue will remap the shared_info page if necessary.

It sets the bits in the shared_info page but not the vcpu_info; that is
deferred to __kvm_xen_has_interrupt() which raises the vector to the
appropriate vCPU.

Add a 'verbose' mode to xen_shinfo_test while adding test cases for this.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211210163625.2886-5-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:45 -05:00
David Woodhouse
1cfc9c4b9d KVM: x86/xen: Maintain valid mapping of Xen shared_info page
Use the newly reinstated gfn_to_pfn_cache to maintain a kernel mapping
of the Xen shared_info page so that it can be accessed in atomic context.

Note that we do not participate in dirty tracking for the shared info
page and we do not explicitly mark it dirty every single tim we deliver
an event channel interrupts. We wouldn't want to do that even if we *did*
have a valid vCPU context with which to do so.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211210163625.2886-4-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:45 -05:00
Like Xu
40ccb96d54 KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}()
Depending on whether intr should be triggered or not, KVM registers
two different event overflow callbacks in the perf_event context.

The code skeleton of these two functions is very similar, so
the pmc->intr can be stored into pmc from pmc_reprogram_counter()
which provides smaller instructions footprint against the
u-architecture branch predictor.

The __kvm_perf_overflow() can be called in non-nmi contexts
and a flag is needed to distinguish the caller context and thus
avoid a check on kvm_is_in_guest(), otherwise we might get
warnings from suspicious RCU or check_preemption_disabled().

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20211130074221.93635-5-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:42 -05:00
Huang Rui
6c4ab1b86d x86, sched: Fix undefined reference to init_freq_invariance_cppc() build error
The init_freq_invariance_cppc function is implemented in smpboot and depends on
CONFIG_SMP.

  MODPOST vmlinux.symvers
  MODINFO modules.builtin.modinfo
  GEN     modules.builtin
  LD      .tmp_vmlinux.kallsyms1
ld: drivers/acpi/cppc_acpi.o: in function `acpi_cppc_processor_probe':
/home/ray/brahma3/linux/drivers/acpi/cppc_acpi.c:819: undefined reference to `init_freq_invariance_cppc'
make: *** [Makefile:1161: vmlinux] Error 1

See https://lore.kernel.org/lkml/484af487-7511-647e-5c5b-33d4429acdec@infradead.org/.

Fixes: 41ea667227 ("x86, sched: Calculate frequency invariance for AMD systems")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Huang Rui <ray.huang@amd.com>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-01-06 18:52:07 +01:00
Huang Rui
89aa94b4a2 x86/msr: Add AMD CPPC MSR definitions
AMD CPPC (Collaborative Processor Performance Control) function uses MSR
registers to manage the performance hints. So add the MSR register macro
here.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-30 18:51:17 +01:00
Huang Rui
d341db8f48 x86/cpufeatures: Add AMD Collaborative Processor Performance Control feature flag
Add Collaborative Processor Performance Control feature flag for AMD
processors.

This feature flag will be used on the following AMD P-State driver. The
AMD P-State driver has two approaches to implement the frequency control
behavior. That depends on the CPU hardware implementation. One is "Full
MSR Support" and another is "Shared Memory Support". The feature flag
indicates the current processors with "Full MSR Support".

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-30 16:58:03 +01:00
Michael Kelley
e1878402ab x86/hyperv: Fix definition of hv_ghcb_pg variable
The percpu variable hv_ghcb_pg is incorrectly defined.  The __percpu
qualifier should be associated with the union hv_ghcb * (i.e.,
a pointer), not with the target of the pointer. This distinction
makes no difference to gcc and the generated code, but sparse
correctly complains.  Fix the definition in the interest of
general correctness in addition to making sparse happy.

No functional change.

Fixes: 0cc4f6d9f0 ("x86/hyperv: Initialize GHCB page in Isolation VM")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1640662315-22260-2-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-12-28 14:18:47 +00:00
Linus Torvalds
a8ad9a2434 Another EFI fix for v5.16
- Prevent missing prototype warning from breaking the build under
   CONFIG_WERROR=y
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmHJrdYACgkQw08iOZLZ
 jyRS/wwAvrrv6xidZNqm/BAr4bhY9saQ4BZTracZuHJwD7300H9mjlMDMnVbLhOU
 oVv9hEHDqBqBKgZxIYc0ok4SMZEpieFhLuTdXzVf5dMg+BL+qHR0EQp8kLp8mssL
 xbPnMidecL2TfjPqLn34MVPe46uQo2yioxET4hDvWA5xsWNKKd0AeKVzUzdV2AVE
 elnHsdsBWpCexaIJkQEFJWf/Y3xUlzcqp7zcqgJq4hq3fZlEbD8SJ7TUM7ReAQJ6
 5afzYXBm086sn602hZVbAtOy94vLBBQswnbKntmFpcT+z/GLNjycx+A4Z2h+6jjx
 ImiIkNZA2N5ij12riXdx3n6OG0LMuILlRhhqyckgrgHyc1oYCwaUq+LOuVQENZF+
 YNGfyntQ+Y0F9dfG28atnFiL5ZFZcCSJxVZ3Dl2bql2DHptbQNHOBdXKOcDWTkpc
 JKTs3j3rigZ1I01b1SgWzH418do8s8y4iv4nmp3EdMUO8v8rEFMx3yHIHKSRjMqj
 1vAPZtWQ
 =coEy
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent-for-v5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fix from Ard Biesheuvel:
 "Another EFI fix for v5.16:

   - Prevent missing prototype warning from breaking the build under
     CONFIG_WERROR=y"

* tag 'efi-urgent-for-v5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: Move efifb_setup_from_dmi() prototype from arch headers
2021-12-27 08:58:35 -08:00
Linus Torvalds
e8ffcd3ab0 - Prevent potential undefined behavior due to shifting pkey constants into the sign bit
- Move the EFI memory reservation code *after* the efi= cmdline parsing has happened
 
 - Revert two commits which turned out to be the wrong direction to chase
 when accommodating early memblock reservations consolidation and command line parameters
 parsing
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHIYbAACgkQEsHwGGHe
 VUphMhAAptI7YGJFqfH6bUuVh7HX38FOx5pCODpZUXJWIsvgC0MRnxqUF0+Srzbt
 ZBMhh7WUiOpIe/ZeUpRouX5Al4nhiDdG4QeddJLc1DaSaAgOJWD9NH0V0K1aWf4k
 poAUoX7mp5M1mfc/boOQGBdtX+l+3vCYU0RzyQ+IhfDNM6GzJpAC1txWFaL/lWE+
 3IVb4OF37mW7Hlk+HdizcTgm1qaHRobwP7fxbqRwdHZy+YRjz0eOd2dR625jff6b
 y3Z7WVkwx+YtNVD7YnXvmJKyh43rbbMmx0CU6WiYlhGkz5dooe9Er2LuouS1kzRr
 4XWvZrr28qLc9eCdv6MaZGv62KyP0QBm4Fg7E/nC/0Jo8NA3GQXulF2OHofSCX/h
 BZa3mJa+heoXpUb7zDTUMa8UCdOYGJpvHRGYcU3JiaIwv+dnTRBdAzyVcqCMWt5P
 wPOl4A08vVZ+w3aNrZh3Zfg03pG4/9miveIa7EsuCZF/oN+5KAVQdcU74zneH4Qx
 nUfF7b80PyiSnCblQ3DL+efKVG4/Bud6crxo1tOtw6cDQxiAYDzupuZBlHV9791O
 0TwPuLJSAqv4aMiobi5/j32pSLaohYJij9stbfB8zeD49rq5uvT+jDoxcKiZurJP
 jamo6LHffF+o8+F2YwFOSIPayZOViXxWvwPd+E6uL7EodvZgiS0=
 =j12C
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.16_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Prevent potential undefined behavior due to shifting pkey constants
   into the sign bit

 - Move the EFI memory reservation code *after* the efi= cmdline parsing
   has happened

 - Revert two commits which turned out to be the wrong direction to
   chase when accommodating early memblock reservations consolidation
   and command line parameters parsing

* tag 'x86_urgent_for_v5.16_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pkey: Fix undefined behaviour with PKRU_WD_BIT
  x86/boot: Move EFI range reservation after cmdline parsing
  Revert "x86/boot: Pull up cmdline preparation and early param parsing"
  Revert "x86/boot: Mark prepare_command_line() __init"
2021-12-26 10:28:55 -08:00
Christoph Hellwig
4d5cff69fb x86/mtrr: Remove the mtrr_bp_init() stub
Add an IS_ENABLED() check in setup_arch() and call pat_disable()
directly if MTRRs are not supported. This allows to remove the
<asm/memtype.h> include in <asm/mtrr.h>, which pull in lowlevel x86
headers that should not be included for UML builds and will cause build
warnings with a following patch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211215165612.554426-2-hch@lst.de
2021-12-22 19:50:26 +01:00
Yazen Ghannam
91f75eb481 x86/MCE/AMD, EDAC/mce_amd: Support non-uniform MCA bank type enumeration
AMD systems currently lay out MCA bank types such that the type of bank
number "i" is either the same across all CPUs or is Reserved/Read-as-Zero.

For example:

  Bank # | CPUx | CPUy
    0      LS     LS
    1      RAZ    UMC
    2      CS     CS
    3      SMU    RAZ

Future AMD systems will lay out MCA bank types such that the type of
bank number "i" may be different across CPUs.

For example:

  Bank # | CPUx | CPUy
    0      LS     LS
    1      RAZ    UMC
    2      CS     NBIO
    3      SMU    RAZ

Change the structures that cache MCA bank types to be per-CPU and update
smca_get_bank_type() to handle this change.

Move some SMCA-specific structures to amd.c from mce.h, since they no
longer need to be global.

Break out the "count" for bank types from struct smca_hwid, since this
should provide a per-CPU count rather than a system-wide count.

Apply the "const" qualifier to the struct smca_hwid_mcatypes array. The
values in this array should not change at runtime.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211216162905.4132657-3-yazen.ghannam@amd.com
2021-12-22 17:22:09 +01:00
Yazen Ghannam
5176a93ab2 x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types
Add HWID and McaType values for new SMCA bank types, and add their error
descriptions to edac_mce_amd.

The "PHY" bank types all have the same error descriptions, and the NBIF
and SHUB bank types have the same error descriptions. So reuse the same
arrays where appropriate.

  [ bp: Remove useless comments over hwid types. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211216162905.4132657-2-yazen.ghannam@amd.com
2021-12-22 17:19:18 +01:00
Paolo Bonzini
855fb0384a Merge remote-tracking branch 'kvm/master' into HEAD
Pick commit fdba608f15 ("KVM: VMX: Wake vCPU when delivering posted
IRQ even if vCPU == this vCPU").  In addition to fixing a bug, it
also aligns the non-nested and nested usage of triggering posted
interrupts, allowing for additional cleanups.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-21 12:51:09 -05:00
Marc Orr
c5063551bf KVM: x86: Always set kvm_run->if_flag
The kvm_run struct's if_flag is a part of the userspace/kernel API. The
SEV-ES patches failed to set this flag because it's no longer needed by
QEMU (according to the comment in the source code). However, other
hypervisors may make use of this flag. Therefore, set the flag for
guests with encrypted registers (i.e., with guest_state_protected set).

Fixes: f1c6366e30 ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
Signed-off-by: Marc Orr <marcorr@google.com>
Message-Id: <20211209155257.128747-1-marcorr@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
2021-12-20 08:06:53 -05:00
Andrew Cooper
57690554ab x86/pkey: Fix undefined behaviour with PKRU_WD_BIT
Both __pkru_allows_write() and arch_set_user_pkey_access() shift
PKRU_WD_BIT (a signed constant) by up to 30 bits, hitting the
sign bit.

Use unsigned constants instead.

Clearly pkey 15 has not been used in combination with UBSAN yet.

Noticed by code inspection only.  I can't actually provoke the
compiler into generating incorrect logic as far as this shift is
concerned.

[
  dhansen: add stable@ tag, plus minor changelog massaging,

           For anyone doing backports, these #defines were in
	   arch/x86/include/asm/pgtable.h before 784a46618f.
]

Fixes: 33a709b25a ("mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20211216000856.4480-1-andrew.cooper3@citrix.com
2021-12-19 22:44:34 +01:00
Arnd Bergmann
91f7d2dbf9 x86/xen: Use correct #ifdef guard for xen_initdom_restore_msi()
The #ifdef check around the definition doesn't match the one around the
declaration, leading to a link failure when CONFIG_XEN_DOM0 is enabled
but CONFIG_XEN_PV_DOM0 is not:

x86_64-linux-ld: arch/x86/kernel/apic/msi.o: in function `arch_restore_msi_irqs':
msi.c:(.text+0x29a): undefined reference to `xen_initdom_restore_msi'

Change the declaration to use the same check that was already present
around the function definition.

Fixes: ae72f31567 ("PCI/MSI: Make arch_restore_msi_irqs() less horrible.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20211215140209.451379-1-arnd@kernel.org
2021-12-15 16:13:23 +01:00
Thomas Gleixner
09eb3ad55f Merge branch 'irq/urgent' into irq/msi
to pick up the PCI/MSI-x fixes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2021-12-14 13:30:34 +01:00
Javier Martinez Canillas
4bc5e64e6c efi: Move efifb_setup_from_dmi() prototype from arch headers
Commit 8633ef82f1 ("drivers/firmware: consolidate EFI framebuffer setup
for all arches") made the Generic System Framebuffers (sysfb) driver able
to be built on non-x86 architectures.

But it left the efifb_setup_from_dmi() function prototype declaration in
the architecture specific headers. This could lead to the following
compiler warning as reported by the kernel test robot:

   drivers/firmware/efi/sysfb_efi.c:70:6: warning: no previous prototype for function 'efifb_setup_from_dmi' [-Wmissing-prototypes]
   void efifb_setup_from_dmi(struct screen_info *si, const char *opt)
        ^
   drivers/firmware/efi/sysfb_efi.c:70:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void efifb_setup_from_dmi(struct screen_info *si, const char *opt)

Fixes: 8633ef82f1 ("drivers/firmware: consolidate EFI framebuffer setup for all arches")
Reported-by: kernel test robot <lkp@intel.com>
Cc: <stable@vger.kernel.org> # 5.15.x
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20211126001333.555514-1-javierm@redhat.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2021-12-13 15:07:16 +01:00
Peter Zijlstra
b776078025 x86/word-at-a-time: Remove .fixup usage
Rewrite load_unaligned_zeropad() to not require .fixup text.

This is easiest done using asm-goto-output, where we can stick a C
label in the exception table entry. The fallback version isn't nearly
so nice but should work.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20211110101326.141775772@infradead.org
2021-12-11 09:09:50 +01:00
Peter Zijlstra
d5d797dcbd x86/usercopy: Remove .fixup usage
Typically usercopy does whole word copies followed by a number of byte
copies to finish the tail. This means that on exception it needs to
compute the remaining length as: words*sizeof(long) + bytes.

Create a new extable handler to do just this.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20211110101326.081701085@infradead.org
2021-12-11 09:09:50 +01:00
Peter Zijlstra
5ce8e39f55 x86/sgx: Remove .fixup usage
Create EX_TYPE_FAULT_SGX which does as EX_TYPE_FAULT does, except adds
this extra bit that SGX really fancies having.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20211110101325.961246679@infradead.org
2021-12-11 09:09:49 +01:00
Peter Zijlstra
c9a34c3f4e x86/kvm: Remove .fixup usage
KVM instruction emulation has a gnarly hack where the .fixup does a
return, however there's already a ret right after the 10b label, so
mark that as 11 and have the exception clear %esi to remove the
.fixup.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20211110101325.722157053@infradead.org
2021-12-11 09:09:48 +01:00
Peter Zijlstra
5fc77b916c x86/segment: Remove .fixup usage
Create and use EX_TYPE_ZERO_REG to clear the register and retry the
segment load on exception.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20211110101325.663529463@infradead.org
2021-12-11 09:09:48 +01:00
Peter Zijlstra
e2b48e4328 x86/xen: Remove .fixup usage
Employ the fancy new EX_TYPE_IMM_REG to store -EFAULT in the return
register and use this to remove some Xen .fixup usage.

All callers of these functions only test for 0 return, so the actual
return value change from -1 to -EFAULT is immaterial.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20211110101325.545019822@infradead.org
2021-12-11 09:09:48 +01:00
Peter Zijlstra
99641e094d x86/uaccess: Remove .fixup usage
For the !CC_AS_ASM_GOTO_OUTPUT (aka. the legacy codepath), remove the
.fixup usage by employing both EX_TYPE_EFAULT_REG and EX_FLAG_CLEAR.
Like was already done for X86_32's version of __get_user_asm_u64() use
the "a" register for output, specifically so we can use CLEAR_AX.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20211110101325.485154848@infradead.org
2021-12-11 09:09:47 +01:00
Peter Zijlstra
4c132d1d84 x86/futex: Remove .fixup usage
Use the new EX_TYPE_IMM_REG to store -EFAULT into the designated 'ret'
register, this removes the need for anonymous .fixup code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20211110101325.426016322@infradead.org
2021-12-11 09:09:47 +01:00
Peter Zijlstra
d52a7344bd x86/msr: Remove .fixup usage
Rework the MSR accessors to remove .fixup usage. Add two new extable
types (to the 4 already existing msr ones) using the new register
infrastructure to record which register should get the error value.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20211110101325.364084212@infradead.org
2021-12-11 09:09:47 +01:00
Peter Zijlstra
4b5305decc x86/extable: Extend extable functionality
In order to remove further .fixup usage, extend the extable
infrastructure to take additional information from the extable entry
sites.

Specifically add _ASM_EXTABLE_TYPE_REG() and EX_TYPE_IMM_REG that
extend the existing _ASM_EXTABLE_TYPE() by taking an additional
register argument and encoding that and an s16 immediate into the
existing s32 type field. This limits the actual types to the first
byte, 255 seem plenty.

Also add a few flags into the type word, specifically CLEAR_AX and
CLEAR_DX which clear the return and extended return register.

Notes:
 - due to the % in our register names it's hard to make it more
   generally usable as arm64 did.
 - the s16 is far larger than used in these patches, future extentions
   can easily shrink this to get more bits.
 - without the bitfield fix this will not compile, because: 0xFF > -1
   and we can't even extract the TYPE field.

[nathanchance: Build fix for clang-lto builds:
 https://lkml.kernel.org/r/20211210234953.3420108-1-nathan@kernel.org
]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20211110101325.303890153@infradead.org
2021-12-11 09:09:46 +01:00
Peter Zijlstra
aa93e2ad74 x86/entry_32: Remove .fixup usage
Where possible, push the .fixup into code, at the tail of functions.

This is hard for macros since they're used in multiple functions,
therefore introduce a new extable handler to pop zeros.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20211110101325.245184699@infradead.org
2021-12-11 09:09:46 +01:00
Peter Zijlstra
c6dbd3e5e6 x86/mmx_32: Remove X86_USE_3DNOW
This code puts an exception table entry on the PREFETCH instruction to
overwrite it with a JMP.d8 when it triggers an exception. Except of
course, our code is no longer writable, also SMP.

Instead of fixing this broken mess, simply take it out.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/YZKQzUmeNuwyvZpk@hirez.programming.kicks-ass.net
2021-12-11 09:09:45 +01:00
Shaokun Zhang
20735d24ad x86/fpu: Remove duplicate copy_fpstate_to_sigframe() prototype
The function prototype of copy_fpstate_to_sigframe() is declared twice in

  0ae67cc34f ("x86/fpu: Remove internal.h dependency from fpu/signal.h").

Remove one of them.

 [ bp: Massage ]

Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211209015550.51916-1-zhangshaokun@hisilicon.com
2021-12-10 19:13:06 +01:00
Kees Cook
61646ca83d x86/uaccess: Move variable into switch case statement
When building with automatic stack variable initialization, GCC 12
complains about variables defined outside of switch case statements.
Move the variable into the case that uses it, which silences the warning:

./arch/x86/include/asm/uaccess.h:317:23: warning: statement will never be executed [-Wswitch-unreachable]
  317 |         unsigned char x_u8__; \
      |                       ^~~~~~

Fixes: 865c50e1d2 ("x86/uaccess: utilize CONFIG_CC_HAS_ASM_GOTO_OUTPUT")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211209043456.1377875-1-keescook@chromium.org
2021-12-10 19:13:00 +01:00
Vitaly Kuznetsov
1ebfaa11eb KVM: x86: Wait for IPIs to be delivered when handling Hyper-V TLB flush hypercall
Prior to commit 0baedd7927 ("KVM: x86: make Hyper-V PV TLB flush use
tlb_flush_guest()"), kvm_hv_flush_tlb() was using 'KVM_REQ_TLB_FLUSH |
KVM_REQUEST_NO_WAKEUP' when making a request to flush TLBs on other vCPUs
and KVM_REQ_TLB_FLUSH is/was defined as:

 (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)

so KVM_REQUEST_WAIT was lost. Hyper-V TLFS, however, requires that
"This call guarantees that by the time control returns back to the
caller, the observable effects of all flushes on the specified virtual
processors have occurred." and without KVM_REQUEST_WAIT there's a small
chance that the vCPU making the TLB flush will resume running before
all IPIs get delivered to other vCPUs and a stale mapping can get read
there.

Fix the issue by adding KVM_REQUEST_WAIT flag to KVM_REQ_TLB_FLUSH_GUEST:
kvm_hv_flush_tlb() is the sole caller which uses it for
kvm_make_all_cpus_request()/kvm_make_vcpus_request_mask() where
KVM_REQUEST_WAIT makes a difference.

Cc: stable@kernel.org
Fixes: 0baedd7927 ("KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest()")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211209102937.584397-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-10 07:12:41 -05:00
Marco Elver
d93414e375 x86/qspinlock, kcsan: Instrument barrier of pv_queued_spin_unlock()
If CONFIG_PARAVIRT_SPINLOCKS=y, queued_spin_unlock() is implemented
using pv_queued_spin_unlock() which is entirely inline asm based. As
such, we do not receive any KCSAN barrier instrumentation via regular
atomic operations.

Add the missing KCSAN barrier instrumentation for the
CONFIG_PARAVIRT_SPINLOCKS case.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-09 16:42:28 -08:00
Marco Elver
cd8730c3ab x86/barriers, kcsan: Use generic instrumentation for non-smp barriers
Prefix all barriers with __, now that asm-generic/barriers.h supports
defining the final instrumented version of these barriers. The change is
limited to barriers used by x86-64.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-09 16:42:28 -08:00
Sebastian Andrzej Siewior
35fa745286 x86/mm: Include spinlock_t definition in pgtable.
This header file provides forward declartion for pgd_lock but does not
include the header defining its type. This works since the definition of
spinlock_t is usually included somehow via printk.

By trying to avoid recursive includes on PREEMPT_RT I avoided the loop
in printk and as a consequnce kernel/intel.c failed to compile due to
missing type definition.

Include the needed definition for spinlock_t.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20211102165224.wpz4zyhsvwccx5p3@linutronix.de
2021-12-09 10:58:48 -08:00
Peter Zijlstra
e463a09af2 x86: Add straight-line-speculation mitigation
Make use of an upcoming GCC feature to mitigate
straight-line-speculation for x86:

  https://gcc.gnu.org/g:53a643f8568067d7700a9f2facc8ba39974973d3
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952
  https://bugs.llvm.org/show_bug.cgi?id=52323

It's built tested on x86_64-allyesconfig using GCC-12 and GCC-11.

Maintenance overhead of this should be fairly low due to objtool
validation.

Size overhead of all these additional int3 instructions comes to:

     text	   data	    bss	    dec	    hex	filename
  22267751	6933356	2011368	31212475	1dc43bb	defconfig-build/vmlinux
  22804126	6933356	1470696	31208178	1dc32f2	defconfig-build/vmlinux.sls

Or roughly 2.4% additional text.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211204134908.140103474@infradead.org
2021-12-09 13:32:25 +01:00
Thomas Gleixner
ae72f31567 PCI/MSI: Make arch_restore_msi_irqs() less horrible.
Make arch_restore_msi_irqs() return a boolean which indicates whether the
core code should restore the MSI message or not. Get rid of the indirection
in x86.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>	# PCI
Link: https://lore.kernel.org/r/20211206210224.485668098@linutronix.de
2021-12-09 11:52:21 +01:00
Peter Zijlstra
b17c2baa30 x86: Prepare inline-asm for straight-line-speculation
Replace all ret/retq instructions with ASM_RET in preparation of
making it more than a single instruction.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211204134907.964635458@infradead.org
2021-12-08 19:23:12 +01:00
Kuppuswamy Sathyanarayanan
8260b9820f x86/sev: Use CC_ATTR attribute to generalize string I/O unroll
INS/OUTS are not supported in TDX guests and cause #UD. Kernel has to
avoid them when running in TDX guest. To support existing usage, string
I/O operations are unrolled using IN/OUT instructions.

AMD SEV platform implements this support by adding unroll
logic in ins#bwl()/outs#bwl() macros with SEV-specific checks.
Since TDX VM guests will also need similar support, use
CC_ATTR_GUEST_UNROLL_STRING_IO and generic cc_platform_has() API to
implement it.

String I/O helpers were the last users of sev_key_active() interface and
sev_enable_key static key. Remove them.

 [ bp: Move comment too and do not delete it. ]

Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20211206135505.75045-2-kirill.shutemov@linux.intel.com
2021-12-08 16:49:42 +01:00
Paolo Bonzini
93b350f884 Merge branch 'kvm-on-hv-msrbm-fix' into HEAD
Merge bugfix for enlightened MSR Bitmap, before adding support
to KVM for exposing the feature to nested guests.
2021-12-08 05:30:48 -05:00
Hou Wenlong
906fa90416 KVM: x86: Add an emulation type to handle completion of user exits
The next patch would use kvm_emulate_instruction() with
EMULTYPE_SKIP in complete_userspace_io callback to fix a
problem in msr access emulation. However, EMULTYPE_SKIP
only updates RIP, more things like updating interruptibility
state and injecting single-step #DBs would be done in the
callback. Since the emulator also does those things after
x86_emulate_insn(), add a new emulation type to pair with
EMULTYPE_SKIP to do those things for completion of user exits
within the emulator.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <8f8c8e268b65f31d55c2881a4b30670946ecfa0d.1635842679.git.houwenlong93@linux.alibaba.com>
2021-12-08 04:25:15 -05:00
Lai Jiangshan
2df4a5eb6c KVM: X86: Remove mmu parameter from load_pdptrs()
It uses vcpu->arch.walk_mmu always; nested EPT does not have PDPTRs,
and nested NPT treats them like all other non-leaf page table levels
instead of caching them.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211124122055.64424-11-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:25:14 -05:00
Lai Jiangshan
bb3b394d35 KVM: X86: Rename gpte_is_8_bytes to has_4_byte_gpte and invert the direction
This bit is very close to mean "role.quadrant is not in use", except that
it is false also when the MMU is mapping guest physical addresses
directly.  In that case, role.quadrant is indeed not in use, but there
are no guest PTEs at all.

Changing the name and direction of the bit removes the special case,
since a guest with paging disabled, or not considering guest paging
structures as is the case for two-dimensional paging, does not have
to deal with 4-byte guest PTEs.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211124122055.64424-10-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:25:13 -05:00
Lai Jiangshan
c59a0f57fa KVM: X86: Remove mmu->translate_gpa
Reduce an indirect function call (retpoline) and some intialization
code.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211124122055.64424-4-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:25:11 -05:00
Lai Jiangshan
1f5a21ee84 KVM: X86: Add parameter struct kvm_mmu *mmu into mmu->gva_to_gpa()
The mmu->gva_to_gpa() has no "struct kvm_mmu *mmu", so an extra
FNAME(gva_to_gpa_nested) is needed.

Add the parameter can simplify the code.  And it makes it explicit that
the walk is upon vcpu->arch.walk_mmu for gva and vcpu->arch.mmu for L2
gpa in translate_nested_gpa() via the new parameter.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211124122055.64424-3-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:25:10 -05:00
Lai Jiangshan
42f34c20a1 KVM: X86: Remove unused declaration of __kvm_mmu_free_some_pages()
The body of __kvm_mmu_free_some_pages() has been removed.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211118110814.2568-13-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:25:09 -05:00
Sean Christopherson
005467e06b KVM: Drop obsolete kvm_arch_vcpu_block_finish()
Drop kvm_arch_vcpu_block_finish() now that all arch implementations are
nops.

No functional change intended.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211009021236.4122790-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:50 -05:00
Sean Christopherson
1460179dcd KVM: x86: Tweak halt emulation helper names to free up kvm_vcpu_halt()
Rename a variety of HLT-related helpers to free up the function name
"kvm_vcpu_halt" for future use in generic KVM code, e.g. to differentiate
between "block" and "halt".

No functional change intended.

Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211009021236.4122790-13-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:50 -05:00
Ben Gardon
8283e36abf KVM: x86/mmu: Propagate memslot const qualifier
In preparation for implementing in-place hugepage promotion, various
functions will need to be called from zap_collapsible_spte_range, which
has the const qualifier on its memslot argument. Propagate the const
qualifier to the various functions which will be needed. This just serves
to simplify the following patch.

No functional change intended.

Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20211115234603.2908381-11-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:43 -05:00
Ben Gardon
9d395a0a7a KVM: x86/mmu: Remove need for a vcpu from kvm_slot_page_track_is_active
kvm_slot_page_track_is_active only uses its vCPU argument to get a
pointer to the assoicated struct kvm, so just pass in the struct KVM to
remove the need for a vCPU pointer.

No functional change intended.

Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20211115234603.2908381-6-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:42 -05:00
Maciej S. Szmigiero
f5756029ee KVM: x86: Use nr_memslot_pages to avoid traversing the memslots array
There is no point in recalculating from scratch the total number of pages
in all memslots each time a memslot is created or deleted.  Use KVM's
cached nr_memslot_pages to compute the default max number of MMU pages.

Note that even with nr_memslot_pages capped at ULONG_MAX we can't safely
multiply it by KVM_PERMILLE_MMU_PAGES (20) since this operation can
possibly overflow an unsigned long variable.

Write this "* 20 / 1000" operation as "/ 50" instead to avoid such
overflow.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
[sean: use common KVM field and rework changelog accordingly]
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <d14c5a24535269606675437d5602b7dac4ad8c0e.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08 04:24:29 -05:00
Paolo Bonzini
dc1ce45575 KVM: MMU: update comment on the number of page role combinations
Fix the number of bits in the role, and simplify the explanation of
why several bits or combinations of bits are redundant.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:13 -05:00
Joerg Roedel
71d5049b05 x86/mm: Flush global TLB when switching to trampoline page-table
Move the switching code into a function so that it can be re-used and
add a global TLB flush. This makes sure that usage of memory which is
not mapped in the trampoline page-table is reliably caught.

Also move the clearing of CR4.PCIDE before the CR3 switch because the
cr4_clear_bits() function will access data not mapped into the
trampoline page-table.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211202153226.22946-4-joro@8bytes.org
2021-12-06 09:54:10 +01:00
Joerg Roedel
f154f29085 x86/mm/64: Flush global TLB on boot and AP bringup
The AP bringup code uses the trampoline_pgd page-table which
establishes global mappings in the user range of the address space.
Flush the global TLB entries after the indentity mappings are removed so
no stale entries remain in the TLB.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211202153226.22946-3-joro@8bytes.org
2021-12-06 09:38:48 +01:00
Linus Torvalds
f5d54a42d3 - Fix a couple of SWAPGS fencing issues in the x86 entry code
- Use the proper operand types in __{get,put}_user() to prevent
 truncation in SEV-ES string io
 
 - Make sure the kernel mappings are present in trampoline_pgd in order
 to prevent any potential accesses to unmapped memory after switching to
 it
 
 - Fix a trivial list corruption in objtool's pv_ops validation
 
 - Disable the clocksource watchdog for TSC on platforms which claim
 that the TSC is constant, doesn't stop in sleep states, CPU has TSC
 adjust and the number of sockets of the platform are max 2, to prevent
 erroneous markings of the TSC as unstable.
 
 - Make sure TSC adjust is always checked not only when going idle
 
 - Prevent a stack leak by initializing struct _fpx_sw_bytes properly in
 the FPU code
 
 - Fix INTEL_FAM6_RAPTORLAKE define naming to adhere to the convention
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmGsnWcACgkQEsHwGGHe
 VUoR+g/9FcOP0/XLH+LKHumYc9JHXsp5BvYGihyypMFgU0fXQBORtGqdls8jZtiJ
 kEdbW6iL0MRlyN8aHJCqr7dJqs7KJlpWes6hky7BY+U+7uewtjL5y3eSyZnA34T3
 M/Raecx27Hh0L0kHQlHXTUN73v1cgDvq3dCXWsP7Jqgjf5cEmCcV/tPEateqhq/f
 8TkLVIm55rJlbJ0LBO/cT0V3Q8QH9JPKm7nviOZuKCh9gcttFEPaM9MkaJyKUhoy
 O13jlenDoVkVWRXIQec1EZp2pTLxVAm/3Y0plge1yEVsejzh07gsQnMpoNeF+yFC
 8mDgSv8ZAED/vbsnB+BcgoRVj6ajG0+ilpLzcfPwUquiqS9pZrBSTddlvYDPjRMC
 MEXO548xiYgxmipu3r62H89nqmLEYQPk914rJu6bDnDeJ1gaabh8RXbNtQcRqqj3
 RETgVOp78iWn+aT33RLLD1EyodZb2IkMy087a3+TZICIXG81aDj9VgHvrVRnWnfY
 yKuldyrEKzi60yMQkV6h1oc8KSWQhspUSLtOVS9zrulCinYphFOfYFrzFmcKUWIq
 GdVb9eaP2oNBGfPybXP+TBLGZ4Zv9iXZmaEUk7ZGCjgv3ZmGMWJ18Hs/ufs2cwWK
 RNNUo3sz/y3OsreHowkWIk1eSxI16MabB7G/PDMnBSHlioVT390=
 =d6nS
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Fix a couple of SWAPGS fencing issues in the x86 entry code

 - Use the proper operand types in __{get,put}_user() to prevent
   truncation in SEV-ES string io

 - Make sure the kernel mappings are present in trampoline_pgd in order
   to prevent any potential accesses to unmapped memory after switching
   to it

 - Fix a trivial list corruption in objtool's pv_ops validation

 - Disable the clocksource watchdog for TSC on platforms which claim
   that the TSC is constant, doesn't stop in sleep states, CPU has TSC
   adjust and the number of sockets of the platform are max 2, to
   prevent erroneous markings of the TSC as unstable.

 - Make sure TSC adjust is always checked not only when going idle

 - Prevent a stack leak by initializing struct _fpx_sw_bytes properly in
   the FPU code

 - Fix INTEL_FAM6_RAPTORLAKE define naming to adhere to the convention

* tag 'x86_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/xen: Add xenpv_restore_regs_and_return_to_usermode()
  x86/entry: Use the correct fence macro after swapgs in kernel CR3
  x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry()
  x86/sev: Fix SEV-ES INS/OUTS instructions for word, dword, and qword
  x86/64/mm: Map all kernel memory into trampoline_pgd
  objtool: Fix pv_ops noinstr validation
  x86/tsc: Disable clocksource watchdog for TSC on qualified platorms
  x86/tsc: Add a timer to make sure TSC_adjust is always checked
  x86/fpu/signal: Initialize sw_bytes in save_xstate_epilog()
  x86/cpu: Drop spurious underscore from RAPTOR_LAKE #define
2021-12-05 08:43:35 -08:00
Linus Torvalds
90bf8d98b4 * Static analysis fix
* New SEV-ES protocol for communicating invalid VMGEXIT requests
 * Ensure APICv is considered inactive if there is no APIC
 * Fix reserved bits for AMD PerfEvtSeln register
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmGscuAUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOnzAgAg/0tifAOC8F31xu+gbbFa2hJsfH1
 sr2QXqVI6XUWeAHw5GEEcSFHGcFc0BrmfPEL4T3tEhkTGcijL2uTK8dwgq3ue4yg
 5LzaZzh03AWi4x84rV3XNVHHatzF69tgbUG49rSlK2T6BkGWzh4gI5LZV7XqNqLh
 acW+92YcCGo/O9RAUYYakofX4bp0rsQaZornQiD/R5X6AlrtMUyhAYHH5Wnv69n+
 MHf4K9MzrtixXbTvkOXflN5yz6TkIGvCpCK+gmppeuJ3JyZshjn+y95XcDDZtB6o
 +4Ypap3SmIkjTg4VwS8lRwIFkkY31hxhW2ohRnorfP2Q5Ckcz2/kVcIEew==
 =uJMy
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more kvm fixes from Paolo Bonzini:

 - Static analysis fix

 - New SEV-ES protocol for communicating invalid VMGEXIT requests

 - Ensure APICv is considered inactive if there is no APIC

 - Fix reserved bits for AMD PerfEvtSeln register

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure
  KVM: SEV: Fall back to vmalloc for SEV-ES scratch area if necessary
  KVM: SEV: Return appropriate error codes if SEV-ES scratch setup fails
  KVM: x86/mmu: Retry page fault if root is invalidated by memslot update
  KVM: VMX: Set failure code in prepare_vmcs02()
  KVM: ensure APICv is considered inactive if there is no APIC
  KVM: x86/pmu: Fix reserved bits for AMD PerfEvtSeln register
2021-12-05 08:25:33 -08:00
Tom Lendacky
ad5b353240 KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure
Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT
exit code or exit parameters fails.

The VMGEXIT instruction can be issued from userspace, even though
userspace (likely) can't update the GHCB. To prevent userspace from being
able to kill the guest, return an error through the GHCB when validation
fails rather than terminating the guest. For cases where the GHCB can't be
updated (e.g. the GHCB can't be mapped, etc.), just return back to the
guest.

The new error codes are documented in the lasest update to the GHCB
specification.

Fixes: 291bd20d5d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <b57280b5562893e2616257ac9c2d4525a9aeeb42.1638471124.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-05 03:02:04 -05:00
Ingo Molnar
e1cd82a339 x86/mm: Add missing <asm/cpufeatures.h> dependency to <asm/page_64.h>
In the following commit:

  025768a966 x86/cpu: Use alternative to generate the TASK_SIZE_MAX constant

... we added the new task_size_max() inline, which uses X86_FEATURE_LA57,
but doesn't include <asm/cpufeatures.h> which defines the constant.

Due to the way alternatives macros work currently this doesn't get reported as an
immediate build error, only as a link error, if a .c file happens to include
<asm/page.h> first:

   > ld: kernel/fork.o:(.altinstructions+0x98): undefined reference to `X86_FEATURE_LA57'

In the current upstream kernel no .c file includes <asm/page.h> before including
some other header that includes <asm/cpufeatures.h>, which is why this dependency
bug went unnoticed.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-12-03 09:30:45 -08:00
Paolo Bonzini
ef8b4b7203 KVM: ensure APICv is considered inactive if there is no APIC
kvm_vcpu_apicv_active() returns false if a virtual machine has no in-kernel
local APIC, however kvm_apicv_activated might still be true if there are
no reasons to disable APICv; in fact it is quite likely that there is none
because APICv is inhibited by specific configurations of the local APIC
and those configurations cannot be programmed.  This triggers a WARN:

   WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));

To avoid this, introduce another cause for APICv inhibition, namely the
absence of an in-kernel local APIC.  This cause is enabled by default,
and is dropped by either KVM_CREATE_IRQCHIP or the enabling of
KVM_CAP_IRQCHIP_SPLIT.

Reported-by: Ignat Korchagin <ignat@cloudflare.com>
Fixes: ee49a89329 ("KVM: x86: Move SVM's APICv sanity check to common x86", 2021-10-22)
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Ignat Korchagin <ignat@cloudflare.com>
Message-Id: <20211130123746.293379-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-02 04:12:11 -05:00
Kirill A. Shutemov
70a81f99e4 x86/insn-eval: Introduce insn_decode_mmio()
In preparation for sharing MMIO instruction decode between SEV-ES and
TDX, factor out the common decode into a new insn_decode_mmio() helper.

For regular virtual machine, MMIO is handled by the VMM and KVM
emulates instructions that caused MMIO. But, this model doesn't work
for a secure VMs (like SEV or TDX) as VMM doesn't have access to the
guest memory and register state. So, for TDX or SEV VMM needs
assistance in handling MMIO. It induces exception in the guest. Guest
has to decode the instruction and handle it on its own.

The code is based on the current SEV MMIO implementation.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20211130184933.31005-4-kirill.shutemov@linux.intel.com
2021-11-30 14:53:19 -08:00
Kirill A. Shutemov
d5ec1877df x86/insn-eval: Introduce insn_get_modrm_reg_ptr()
The helper returns a pointer to the register indicated by
ModRM byte.

It's going to replace vc_insn_get_reg() in the SEV MMIO
implementation. TDX MMIO implementation will also use it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20211130184933.31005-3-kirill.shutemov@linux.intel.com
2021-11-30 14:53:04 -08:00
Tony Luck
7d697f0d57 x86/cpu: Drop spurious underscore from RAPTOR_LAKE #define
Convention for all the other "lake" CPUs is all one word.

So s/RAPTOR_LAKE/RAPTORLAKE/

Fixes: fbdb5e8f29 ("x86/cpu: Add Raptor Lake to Intel family")
Reported-by: Rui Zhang <rui.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20211119170832.1034220-1-tony.luck@intel.com
2021-11-30 14:05:48 -08:00
Kirill A. Shutemov
6da5175dbe x86/paravirt: Fix build PARAVIRT_XXL=y without XEN_PV
Kernel fails to compile with PARAVIRT_XXL=y if XEN_PV is not enabled:

	ld.lld: error: undefined symbol: xen_iret

It happens because INTERRUPT_RETURN defined to use xen_iret if
CONFIG_PARAVIRT_XXL enabled regardless of CONFIG_XEN_PV.

The issue is not visible in the current kernel because CONFIG_XEN_PV is
the only user of CONFIG_PARAVIRT_XXL and there's no way to enable them
separately.

Rework code to define INTERRUPT_RETURN based on CONFIG_XEN_PV, not
CONFIG_PARAVIRT_XXL.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lkml.kernel.org/r/20211130185533.32658-1-kirill.shutemov@linux.intel.com
2021-11-30 13:50:26 -08:00
Linus Torvalds
0757ca01d9 IOMMU Fixes for Linux v5.16-rc2:
Including:
 
   - Intel VT-d fixes:
     - Remove unused PASID_DISABLED
     - Fix RCU locking
     - Fix for the unmap_pages call-back
 
   - Rockchip RK3568 address mask fix
 
   - AMD IOMMUv2 log message clarification
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmGjhDgACgkQK/BELZcB
 GuMOiQ/+MPfnGSpKtEdID/p9d7yo97M/WKRhx5aT27hfW4gyyLtAO22vUTfZ/obK
 Mmwl/mthMm55BuzOes8/ka7zcrkaluJitFGWVLN6dzXZRTZc2nMdYQb1Y25IZVEP
 AwTDdfi8btPRYRrZ1vpXZtDzF5purGe0a5P0psUox7roSq+uWZcXOy3WZDbPNA9b
 pAwPTacnxbeSrElOF6rUyv5eXilvMDMG/1lF4/gFR89xAYcDIPpWNLuRWNYWxu2M
 qTbGBGXnSs/iIzRmBs5rhqbR5VQAb8tXQjjMBb5wWx9i6gaw217wHZPQrpS6ej3Z
 E8vpD/z/kxsMLBpiNM34an/3krgzm6alhA6dalZsvGdAvQHWj4LGma72lCBYsuM1
 +Lre4MvMJ1kvONH9aWoMJWZEITnL2pS3Afu72vhg+7ank4xI/5Ej3bO7mYfN2MV+
 XeEgIUv6HJioJ8ITwYyApJskDk0CLGvfmyHADwe+rj4A+YYzOgEv6kGQlZWzaNWa
 ISDibSMa4od4rw63CLS+rq4vK13MhfyerLZl9IMpvqcUKuWrb3SCiGjTLg5iWkL0
 eqoIIvMuUavmTNG6HMOwfUMNN963osLJPXjwMYkzWbbpusLqb38KvLCed3xGS8tb
 KwhFggAF9Qt++bjo9V8zk5FdAovW0n4m5Nu6WDFcMwPCxccy3xc=
 =+B76
 -----END PGP SIGNATURE-----

Merge tag 'iommu-fixes-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

 - Intel VT-d fixes:
     - Remove unused PASID_DISABLED
     - Fix RCU locking
     - Fix for the unmap_pages call-back

 - Rockchip RK3568 address mask fix

 - AMD IOMMUv2 log message clarification

* tag 'iommu-fixes-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Fix unmap_pages support
  iommu/vt-d: Fix an unbalanced rcu_read_lock/rcu_read_unlock()
  iommu/rockchip: Fix PAGE_DESC_HI_MASKs for RK3568
  iommu/amd: Clarify AMD IOMMUv2 initialization messages
  iommu/vt-d: Remove unused PASID_DISABLED
2021-11-28 07:17:38 -08:00
Joerg Roedel
21e96a2035 iommu/vt-d: Remove unused PASID_DISABLED
The macro is unused after commit 00ecd54013 so it can be removed.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 00ecd54013 ("iommu/vt-d: Clean up unused PASID updating functions")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211123105507.7654-2-joro@8bytes.org
2021-11-26 22:54:20 +01:00
Linus Torvalds
6b54698aec xen: branch for v5.16-rc3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYaD8mwAKCRCAXGG7T9hj
 vspAAPwLA5SUorji33PTetwmcpLcoRJ3Q4HAPz+bOPdm9iL/PgD/V8MtxFrFebBs
 AJoa+GmBarUNn7XCqKnCcA64iXhrpQw=
 =6GY3
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.16c-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - Kconfig fix to make it possible to control building of the privcmd
   driver

 - three fixes for issues identified by the kernel test robot

 - a five-patch series to simplify timeout handling for Xen PV driver
   initialization

 - two patches to fix error paths in xenstore/xenbus driver
   initialization

* tag 'for-linus-5.16c-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: make HYPERVISOR_set_debugreg() always_inline
  xen: make HYPERVISOR_get_debugreg() always_inline
  xen: detect uninitialized xenbus in xenbus_init
  xen: flag xen_snd_front to be not essential for system boot
  xen: flag pvcalls-front to be not essential for system boot
  xen: flag hvc_xen to be not essential for system boot
  xen: flag xen_drm_front to be not essential for system boot
  xen: add "not_essential" flag to struct xenbus_driver
  xen/pvh: add missing prototype to header
  xen: don't continue xenstore initialization in case of errors
  xen/privcmd: make option visible in Kconfig
2021-11-26 09:54:13 -08:00
Juergen Gross
00db58cf21 xen: make HYPERVISOR_set_debugreg() always_inline
HYPERVISOR_set_debugreg() is being called from noinstr code, so it
should be attributed "always_inline".

Fixes: 7361fac046 ("x86/xen: Make set_debugreg() noinstr")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20211125092056.24758-3-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-11-25 09:25:39 -06:00
Juergen Gross
b1c45ad53e xen: make HYPERVISOR_get_debugreg() always_inline
HYPERVISOR_get_debugreg() is being called from noinstr code, so it
should be attributed "always_inline".

Fixes: f4afb713e5 ("x86/xen: Make get_debugreg() noinstr")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20211125092056.24758-2-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-11-25 09:25:07 -06:00
Ard Biesheuvel
44f155b4b0 efi/libstub: x86/mixed: increase supported argument count
Increase the number of arguments supported by mixed mode calls, so that
we will be able to call into the TCG2 protocol to measure the initrd
and extend the associated PCR. This involves the TCG2 protocol's
hash_log_extend_event() method, which takes five arguments, three of
which are u64 and need to be split, producing a total of 8 outgoing

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/r/20211119114745.1560453-3-ilias.apalodimas@linaro.org
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2021-11-21 17:08:10 +01:00
Ard Biesheuvel
4da87c5170 efi/libstub: add prototype of efi_tcg2_protocol::hash_log_extend_event()
Define the right prototype for efi_tcg2_protocol::hash_log_extend_event()
and add the required structs so we can start using it to measure the initrd
into the TPM if it was loaded by the EFI stub itself.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/r/20211119114745.1560453-2-ilias.apalodimas@linaro.org
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2021-11-21 17:08:10 +01:00
Juergen Gross
2a0991929a xen/pvh: add missing prototype to header
The prototype of mem_map_via_hcall() is missing in its header, so add
it.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: a43fb7da53 ("xen/pvh: Move Xen code for getting mem map via hcall out of common file")
Signed-off-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20211119153913.21678-1-jgross@suse.com
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-11-19 16:36:56 -06:00
Linus Torvalds
c46e8ece96 Selftest changes:
* Cleanups for the perf test infrastructure and mapping hugepages
 
 * Avoid contention on mmap_sem when the guests start to run
 
 * Add event channel upcall support to xen_shinfo_test
 
 x86 changes:
 
 * Fixes for Xen emulation
 
 * Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache
 
 * Fixes for migration of 32-bit nested guests on 64-bit hypervisor
 
 * Compilation fixes
 
 * More SEV cleanups
 
 Generic:
 
 * Cap the return value of KVM_CAP_NR_VCPUS to both KVM_CAP_MAX_VCPUS
 and num_online_cpus().  Most architectures were only using one of the two.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmGV/PAUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMrogf/eAyilGRQL7lLETn3DTVlgLVv82+z
 giX11HlUhUmATHIDluj/wVQUjVcY6AO4SnvFaudX7B+mibndkw4L19IubP/koQZu
 xnKSJTn+mVANdzz3UdsHl0ujbPdQJaFCIPW6iewbn2GRRZMwA5F3vMK/H09XRApL
 I7kq8CPA6sC0I3TPzPN3ROxigexzYunZmGQ4qQe0GUdtxHrJOYQN++ddmWbQoEIC
 gdFTyF7CUQ+lmJe0b/Y88yhISFAJCEBuKFlg9tOTuxSfwvPX6lUu+pi+utEx9M+O
 ckTSQli/apZ4RVcSzxMIwX/BciYqhqOz5uMG+w4DRlJixtGSHtjiEVxGxw==
 =Iij4
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Selftest changes:

   - Cleanups for the perf test infrastructure and mapping hugepages

   - Avoid contention on mmap_sem when the guests start to run

   - Add event channel upcall support to xen_shinfo_test

  x86 changes:

   - Fixes for Xen emulation

   - Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache

   - Fixes for migration of 32-bit nested guests on 64-bit hypervisor

   - Compilation fixes

   - More SEV cleanups

  Generic:

   - Cap the return value of KVM_CAP_NR_VCPUS to both KVM_CAP_MAX_VCPUS
     and num_online_cpus(). Most architectures were only using one of
     the two"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
  KVM: x86: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  KVM: s390: Cap KVM_CAP_NR_VCPUS by num_online_cpus()
  KVM: RISC-V: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  KVM: PPC: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  KVM: MIPS: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  KVM: arm64: Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus()
  KVM: x86: Assume a 64-bit hypercall for guests with protected state
  selftests: KVM: Add /x86_64/sev_migrate_tests to .gitignore
  riscv: kvm: fix non-kernel-doc comment block
  KVM: SEV: Fix typo in and tweak name of cmd_allowed_from_miror()
  KVM: SEV: Drop a redundant setting of sev->asid during initialization
  KVM: SEV: WARN if SEV-ES is marked active but SEV is not
  KVM: SEV: Set sev_info.active after initial checks in sev_guest_init()
  KVM: SEV: Disallow COPY_ENC_CONTEXT_FROM if target has created vCPUs
  KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache
  KVM: nVMX: Use a gfn_to_hva_cache for vmptrld
  KVM: nVMX: Use kvm_read_guest_offset_cached() for nested VMCS check
  KVM: x86/xen: Use sizeof_field() instead of open-coding it
  KVM: nVMX: Use kvm_{read,write}_guest_cached() for shadow_vmcs12
  KVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFO
  ...
2021-11-18 12:05:22 -08:00
Paolo Bonzini
817506df9d Merge branch 'kvm-5.16-fixes' into kvm-master
* Fixes for Xen emulation

* Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache

* Fixes for migration of 32-bit nested guests on 64-bit hypervisor

* Compilation fixes

* More SEV cleanups
2021-11-18 02:11:57 -05:00
Maxim Levitsky
b8453cdcf2 KVM: x86/mmu: include EFER.LMA in extended mmu role
Incorporate EFER.LMA into kvm_mmu_extended_role, as it used to compute the
guest root level and is not reflected in kvm_mmu_page_role.level when TDP
is in use.  When simply running the guest, it is impossible for EFER.LMA
and kvm_mmu.root_level to get out of sync, as the guest cannot transition
from PAE paging to 64-bit paging without toggling CR0.PG, i.e. without
first bouncing through a different MMU context.  And stuffing guest state
via KVM_SET_SREGS{,2} also ensures a full MMU context reset.

However, if KVM_SET_SREGS{,2} is followed by KVM_SET_NESTED_STATE, e.g. to
set guest state when migrating the VM while L2 is active, the vCPU state
will reflect L2, not L1.  If L1 is using TDP for L2, then root_mmu will
have been configured using L2's state, despite not being used for L2.  If
L2.EFER.LMA != L1.EFER.LMA, and L2 is using PAE paging, then root_mmu will
be configured for guest PAE paging, but will match the mmu_role for 64-bit
paging and cause KVM to not reconfigure root_mmu on the next nested VM-Exit.

Alternatively, the root_mmu's role could be invalidated after a successful
KVM_SET_NESTED_STATE that yields vcpu->arch.mmu != vcpu->arch.root_mmu,
i.e. that switches the active mmu to guest_mmu, but doing so is unnecessarily
tricky, and not even needed if L1 and L2 do have the same role (e.g., they
are both 64-bit guests and run with the same CR4).

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20211115131837.195527-3-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18 02:03:42 -05:00
Sean Christopherson
33271a9e2b KVM: x86: Move Intel Processor Trace interrupt handler to vmx.c
Now that all state needed for VMX's PT interrupt handler is exposed to
vmx.c (specifically the currently running vCPU), move the handler into
vmx.c where it belongs.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211111020738.2512932-14-seanjc@google.com
2021-11-17 14:49:10 +01:00
Sean Christopherson
e1bfc24577 KVM: Move x86's perf guest info callbacks to generic KVM
Move x86's perf guest callbacks into common KVM, as they are semantically
identical to arm64's callbacks (the only other such KVM callbacks).
arm64 will convert to the common versions in a future patch.

Implement the necessary arm64 arch hooks now to avoid having to provide
stubs or a temporary #define (from x86) to avoid arm64 compilation errors
when CONFIG_GUEST_PERF_EVENTS=y.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211111020738.2512932-13-seanjc@google.com
2021-11-17 14:49:10 +01:00
Sean Christopherson
73cd107b96 KVM: x86: Drop current_vcpu for kvm_running_vcpu + kvm_arch_vcpu variable
Use the generic kvm_running_vcpu plus a new 'handling_intr_from_guest'
variable in kvm_arch_vcpu instead of the semi-redundant current_vcpu.
kvm_before/after_interrupt() must be called while the vCPU is loaded,
(which protects against preemption), thus kvm_running_vcpu is guaranteed
to be non-NULL when handling_intr_from_guest is non-zero.

Switching to kvm_get_running_vcpu() will allows moving KVM's perf
callbacks to generic code, and the new flag will be used in a future
patch to more precisely identify the "NMI from guest" case.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20211111020738.2512932-11-seanjc@google.com
2021-11-17 14:49:09 +01:00
Like Xu
b9f5621c95 perf/core: Rework guest callbacks to prepare for static_call support
To prepare for using static_calls to optimize perf's guest callbacks,
replace ->is_in_guest and ->is_user_mode with a new multiplexed hook
->state, tweak ->handle_intel_pt_intr to play nice with being called when
there is no active guest, and drop "guest" from ->get_guest_ip.

Return '0' from ->state and ->handle_intel_pt_intr to indicate "not in
guest" so that DEFINE_STATIC_CALL_RET0 can be used to define the static
calls, i.e. no callback == !guest.

[sean: extracted from static_call patch, fixed get_ip() bug, wrote changelog]
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20211111020738.2512932-7-seanjc@google.com
2021-11-17 14:49:07 +01:00
Sean Christopherson
f4b027c5c8 KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest
Override the Processor Trace (PT) interrupt handler for guest mode if and
only if PT is configured for host+guest mode, i.e. is being used
independently by both host and guest.  If PT is configured for system
mode, the host fully controls PT and must handle all events.

Fixes: 8479e04e7d ("KVM: x86: Inject PMI for KVM guest")
Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Reported-by: Artem Kashkanov <artem.kashkanov@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211111020738.2512932-4-seanjc@google.com
2021-11-17 14:49:06 +01:00
Borislav Petkov
dbc4c70e3c x86/sev: Get rid of excessive use of defines
Remove all the defines of masks and bit positions for the GHCB MSR
protocol and use comments instead which correspond directly to the spec
so that following those can be a lot easier and straightforward with the
spec opened in parallel to the code.

Aligh vertically while at it.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211110220731.2396491-6-brijesh.singh@amd.com
2021-11-15 20:53:40 +01:00
Brijesh Singh
18c3933c19 x86/sev: Shorten GHCB terminate macro names
Shorten macro names for improved readability.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Link: https://lkml.kernel.org/r/20211110220731.2396491-5-brijesh.singh@amd.com
2021-11-15 20:31:16 +01:00
Tony Luck
03b122da74 x86/sgx: Hook arch_memory_failure() into mainline code
Add a call inside memory_failure() to call the arch specific code
to check if the address is an SGX EPC page and handle it.

Note the SGX EPC pages do not have a "struct page" entry, so the hook
goes in at the same point as the device mapping hook.

Pull the call to acquire the mutex earlier so the SGX errors are also
protected.

Make set_mce_nospec() skip SGX pages when trying to adjust
the 1:1 map.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-6-tony.luck@intel.com
2021-11-15 11:13:16 -08:00
Yazen Ghannam
b3218ae477 x86/amd_nb, EDAC/amd64: Move DF Indirect Read to AMD64 EDAC
df_indirect_read() is used only for address translation. Move it to EDAC
along with the translation code.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211028175728.121452-3-yazen.ghannam@amd.com
2021-11-15 12:44:47 +01:00
Yazen Ghannam
0b746e8c1e x86/MCE/AMD, EDAC/amd64: Move address translation to AMD64 EDAC
The address translation code used for current AMD systems is
non-architectural. So move it to EDAC.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211028175728.121452-2-yazen.ghannam@amd.com
2021-11-15 12:36:32 +01:00
Linus Torvalds
218cc8b860 A single fix for static calls to make the trampoline patching more robust
by placing explicit signature bytes after the call trampoline to prevent
 patching random other jumps like the CFI jump table entries.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmGRDKsTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoZeNEADEFTbUJKd8812O9vkY9we1GDAtH7bY
 z6sYkh0/rPvYjdPfHuwqW8tUAl+CO2ne2X8FRPKgEdRLg44BY4HaMHmujdbGh3fh
 zpqynUBPoOIgtWxAPGdF+JxjrKlzjFd+WwjG3qBXOF3pjKgCc5knyjTucsl6ced3
 wF293rSYrIJ6uRv2TTNbM5hWJdC0arWbdMFnwQTxeZR54WLpu7Wfm+CCK41w0fAU
 nrfSsv73WEwpmAZNh04wsZsf7h6yCO7dCrIJD/3mpJtrUVBZXuZAKDzUzJPvHJal
 T8LcKwxZQAgPv0ubmOCrolj98Qp6PAPSdDJbzNsCJUYEbBqaB2inJ0PeHcZPspy9
 YyW00EHXD2UKm/GNF/DIlhoiNxOSh8Wn4b6H5ZRML50bS7jsMp8YVbticWEjItL6
 N4/61c45/uPILBS+Lysj0aqyj4TvagiuffJFWjw3YAQ+Gp/pzlJwRNjrw7/4DxAx
 KdpM881IKCR8UowBz3gIiA9FrJv2dGMqq31Rs1fjuauxkIX0gV3c64tAIRWrVscT
 k6GKGvHSis5cT97K3yhmNH0BUND+Skeku8G/SnTkefvcB85aU/7HBkLLJpw0w84F
 F6PTCaCJOEHrl3ADkilsi3z0sKWrph6aAzDEgp6Q6cmo9ulFAGw0bjuJb59xsvVK
 flIvTLUY3n76FA==
 =dgiF
 -----END PGP SIGNATURE-----

Merge tag 'locking-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 static call update from Thomas Gleixner:
 "A single fix for static calls to make the trampoline patching more
  robust by placing explicit signature bytes after the call trampoline
  to prevent patching random other jumps like the CFI jump table
  entries"

* tag 'locking-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  static_call,x86: Robustify trampoline patching
2021-11-14 10:30:17 -08:00
Linus Torvalds
fc661f2dcb - Avoid touching ~100 config files in order to be able to select
the preemption model
 
 - clear cluster CPU masks too, on the CPU unplug path
 
 - prevent use-after-free in cfs
 
 - Prevent a race condition when updating CPU cache domains
 
 - Factor out common shared part of smp_prepare_cpus() into a common
 helper which can be called by both baremetal and Xen, in order to fix a
 booting of Xen PV guests
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmGQ8HcACgkQEsHwGGHe
 VUouoA//WAZ/dZu7IiM06JhZWswa2yNsdU8qQHys81lEqstaBqiWuZdg1qJTVIir
 2d0aN0keiPcsLyAsp1UJ2g/K/7D5vSJWDzsHKfEAToiAm8Tntai2LlSocWWfeSQm
 10grDHWpEHbj0hTHTA6HYOr2WbY4/LnR4cdL0WobIzivIrRTx49d0XUOUfWLP5KX
 60uM6dSjwpJrQUnvzk+bhGiHVmutFrEJy+UU/0o+nxkdhwraNiSbLi0007BGRCof
 6dokRRvLLR09dl1LMG51gVjQch4j/lCx6EWWUhYOFeV3I3gibSCNkmu7dpmMCBTR
 QWO01cR9gyFN4xQ2is4I36M5L0/8T+sbGvvXIXNDT/XWr0/p+g6p2mx0cd2XiYIr
 ZthGRcxxV/KGmxfPaygKS9tpQseMEIrdd6VjAnGfZ3OS6CtUvYt8d0B2Soj8FALQ
 N9fMXDIEP3uUZim8UvCT6HBKlj9LR5uI5n+dAQ6uzsenO9WqeGeldc/N26/+osdN
 vo4lNYTqiXJPhJvunYW5t4j5JnUa3grDHioAPWaQRJlWtEZBGKs9SXTcweg/KURb
 mNfe1RfSlGJt28RD3E18gXeSS7xWdKgpcVX1rmW/9tUjX04NNDWjq4sAzOj7c+Ir
 4sr78XgCY0pUxFaFYxvQWFUy7wcm0zAczo1RGUhcDTf1edDEvjo=
 =s2MX
 -----END PGP SIGNATURE-----

Merge tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:

 - Avoid touching ~100 config files in order to be able to select the
   preemption model

 - clear cluster CPU masks too, on the CPU unplug path

 - prevent use-after-free in cfs

 - Prevent a race condition when updating CPU cache domains

 - Factor out common shared part of smp_prepare_cpus() into a common
   helper which can be called by both baremetal and Xen, in order to fix
   a booting of Xen PV guests

* tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  preempt: Restore preemption model selection configs
  arch_topology: Fix missing clear cluster_cpumask in remove_cpu_topology()
  sched/fair: Prevent dead task groups from regaining cfs_rq's
  sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
  x86/smp: Factor out parts of native_smp_prepare_cpus()
2021-11-14 09:39:03 -08:00
Linus Torvalds
1654e95ee3 - Add the model number of a new, Raptor Lake CPU, to intel-family.h
- Do not log spurious corrected MCEs on SKL too, due to an erratum
 
 - Clarify the path of paravirt ops patches upstream
 
 - Add an optimization to avoid writing out AMX components to sigframes
 when former are in init state
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmGQ3CgACgkQEsHwGGHe
 VUoLAA/+NXRvcBHYkLaByT9f4OI6B79HzyguIBSfipYiw8ir0H7uEdV5FUCCUgCz
 egBRVFpOsXWt1teeuu6ViO+WBHncUxG/ryZ0ka35lri/3kuVYnugZExWDs4MrGR5
 vehRXehOxYNRaYc3oLYjubSbxqF1nWz3WWfGfhiBKk0jT/S1T9tX6lsRXlKsJCgj
 M4x5aqBWP8HTbFQfqjdHwagNitmSKzgjZvMcC4UWcql33ZCycbjvRdrAzBtw7WRI
 UBvgxWVmeMoagu5fqEOoph1oSoFxWuFrweFUjnxJmT6uZrTsfF7BVgXkxdG6eYUy
 2Xogcd4bPDBiRgbs0vPEog1tyyrKHOQ6p1pvksySKMPq6ULcSZ6hBpEZRpgr6Y9u
 0jB3P6weQgCckx5Hd+iwvX1a+GvEuHSEqAE+j160wFyrsBS5Cir3P1WqthWaPd5I
 3nH3h955PokUHPUioUhdf+8cfuP6h6K0nz1gdYI8GR8+fJHhEceT+pLLeyIxj/VM
 yr+bq+V7D6Cg62w3z3s9Dzg2XKpxStu1R9L1N/K8MtIGf6Uc7paL6xR27XxhmBp5
 Y6bGZw0mxxFhp6AEsFWo3rwLL9Dl5DmFcfgUHHpPK5VP0pVWp48Uapx2Hi2/JzAo
 c1o4UkPQa/EZJBPTklmGkS1JNp/2TsEL4Fw7sew+j7DWtsJpCfk=
 =Ge2T
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Add the model number of a new, Raptor Lake CPU, to intel-family.h

 - Do not log spurious corrected MCEs on SKL too, due to an erratum

 - Clarify the path of paravirt ops patches upstream

 - Add an optimization to avoid writing out AMX components to sigframes
   when former are in init state

* tag 'x86_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add Raptor Lake to Intel family
  x86/mce: Add errata workaround for Skylake SKX37
  MAINTAINERS: Add some information to PARAVIRT_OPS entry
  x86/fpu: Optimize out sigframe xfeatures when in init state
2021-11-14 09:29:03 -08:00
Tony Luck
fbdb5e8f29 x86/cpu: Add Raptor Lake to Intel family
Add model ID for Raptor Lake.

[ dhansen: These get added as soon as possible so that folks doing
  development can leverage them. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20211112182835.924977-1-tony.luck@intel.com
2021-11-12 11:46:06 -08:00
Paolo Bonzini
f5396f2d82 Merge branch 'kvm-5.16-fixes' into kvm-master
* Fix misuse of gfn-to-pfn cache when recording guest steal time / preempted status

* Fix selftests on APICv machines

* Fix sparse warnings

* Fix detection of KVM features in CPUID

* Cleanups for bogus writes to MSR_KVM_PV_EOI_EN

* Fixes and cleanups for MSR bitmap handling

* Cleanups for INVPCID

* Make x86 KVM_SOFT_MAX_VCPUS consistent with other architectures
2021-11-11 11:03:05 -05:00
Paolo Bonzini
1f05833193 Merge branch 'kvm-sev-move-context' into kvm-master
Add support for AMD SEV and SEV-ES intra-host migration support.  Intra
host migration provides a low-cost mechanism for userspace VMM upgrades.

In the common case for intra host migration, we can rely on the normal
ioctls for passing data from one VMM to the next. SEV, SEV-ES, and other
confidential compute environments make most of this information opaque, and
render KVM ioctls such as "KVM_GET_REGS" irrelevant.  As a result, we need
the ability to pass this opaque metadata from one VMM to the next. The
easiest way to do this is to leave this data in the kernel, and transfer
ownership of the metadata from one KVM VM (or vCPU) to the next.  In-kernel
hand off makes it possible to move any data that would be
unsafe/impossible for the kernel to hand directly to userspace, and
cannot be reproduced using data that can be handed to userspace.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 11:02:58 -05:00
Vitaly Kuznetsov
da1bfd52b9 KVM: x86: Drop arbitrary KVM_SOFT_MAX_VCPUS
KVM_CAP_NR_VCPUS is used to get the "recommended" maximum number of
VCPUs and arm64/mips/riscv report num_online_cpus(). Powerpc reports
either num_online_cpus() or num_present_cpus(), s390 has multiple
constants depending on hardware features. On x86, KVM reports an
arbitrary value of '710' which is supposed to be the maximum tested
value but it's possible to test all KVM_MAX_VCPUS even when there are
less physical CPUs available.

Drop the arbitrary '710' value and return num_online_cpus() on x86 as
well. The recommendation will match other architectures and will mean
'no CPU overcommit'.

For reference, QEMU only queries KVM_CAP_NR_VCPUS to print a warning
when the requested vCPU number exceeds it. The static limit of '710'
is quite weird as smaller systems with just a few physical CPUs should
certainly "recommend" less.

Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211111134733.86601-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 10:56:24 -05:00
Paul Durrant
760849b147 KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES
Currently when kvm_update_cpuid_runtime() runs, it assumes that the
KVM_CPUID_FEATURES leaf is located at 0x40000001. This is not true,
however, if Hyper-V support is enabled. In this case the KVM leaves will
be offset.

This patch introdues as new 'kvm_cpuid_base' field into struct
kvm_vcpu_arch to track the location of the KVM leaves and function
kvm_update_kvm_cpuid_base() (called from kvm_set_cpuid()) to locate the
leaves using the 'KVMKVMKVM\0\0\0' signature (which is now given a
definition in kvm_para.h). Adjustment of KVM_CPUID_FEATURES will hence now
target the correct leaf.

NOTE: A new for_each_possible_hypervisor_cpuid_base() macro is intoduced
      into processor.h to avoid having duplicate code for the iteration
      over possible hypervisor base leaves.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Message-Id: <20211105095101.5384-3-pdurrant@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 10:56:21 -05:00
Maxim Levitsky
cae72dcc3b KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active
KVM_GUESTDBG_BLOCKIRQ relies on interrupts being injected using
standard kvm's inject_pending_event, and not via APICv/AVIC.

Since this is a debug feature, just inhibit APICv/AVIC while
KVM_GUESTDBG_BLOCKIRQ is in use on at least one vCPU.

Fixes: 61e5f69ef0 ("KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ")

Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211108090245.166408-1-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 10:56:20 -05:00
David Woodhouse
7e2175ebd6 KVM: x86: Fix recording of guest steal time / preempted status
In commit b043138246 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is
not missed") we switched to using a gfn_to_pfn_cache for accessing the
guest steal time structure in order to allow for an atomic xchg of the
preempted field. This has a couple of problems.

Firstly, kvm_map_gfn() doesn't work at all for IOMEM pages when the
atomic flag is set, which it is in kvm_steal_time_set_preempted(). So a
guest vCPU using an IOMEM page for its steal time would never have its
preempted field set.

Secondly, the gfn_to_pfn_cache is not invalidated in all cases where it
should have been. There are two stages to the GFN->PFN conversion;
first the GFN is converted to a userspace HVA, and then that HVA is
looked up in the process page tables to find the underlying host PFN.
Correct invalidation of the latter would require being hooked up to the
MMU notifiers, but that doesn't happen---so it just keeps mapping and
unmapping the *wrong* PFN after the userspace page tables change.

In the !IOMEM case at least the stale page *is* pinned all the time it's
cached, so it won't be freed and reused by anyone else while still
receiving the steal time updates. The map/unmap dance only takes care
of the KVM administrivia such as marking the page dirty.

Until the gfn_to_pfn cache handles the remapping automatically by
integrating with the MMU notifiers, we might as well not get a
kernel mapping of it, and use the perfectly serviceable userspace HVA
that we already have.  We just need to implement the atomic xchg on
the userspace address with appropriate exception handling, which is
fairly trivial.

Cc: stable@vger.kernel.org
Fixes: b043138246 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <3645b9b889dac6438394194bb5586a46b68d581f.camel@infradead.org>
[I didn't entirely agree with David's assessment of the
 usefulness of the gfn_to_pfn cache, and integrated the outcome
 of the discussion in the above commit message. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 10:56:19 -05:00
Peter Gonda
b56639318b KVM: SEV: Add support for SEV intra host migration
For SEV to work with intra host migration, contents of the SEV info struct
such as the ASID (used to index the encryption key in the AMD SP) and
the list of memory regions need to be transferred to the target VM.
This change adds a commands for a target VMM to get a source SEV VM's sev
info.

Signed-off-by: Peter Gonda <pgonda@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <20211021174303.385706-3-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 10:35:27 -05:00
Paolo Bonzini
b9ecb9a997 Merge branch 'kvm-guest-sev-migration' into kvm-master
Add guest api and guest kernel support for SEV live migration.

Introduces a new hypercall to notify the host of changes to the page
encryption status.  If the page is encrypted then it must be migrated
through the SEV firmware or a helper VM sharing the key.  If page is
not encrypted then it can be migrated normally by userspace.  This new
hypercall is invoked using paravirt_ops.

Conflicts: sev_active() replaced by cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT).
2021-11-11 07:40:26 -05:00
Ashish Kalra
f4495615d7 x86/kvm: Add guest support for detecting and enabling SEV Live Migration feature.
The guest support for detecting and enabling SEV Live migration
feature uses the following logic :

 - kvm_init_plaform() checks if its booted under the EFI

   - If not EFI,

     i) if kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL), issue a wrmsrl()
         to enable the SEV live migration support

   - If EFI,

     i) If kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL), read
        the UEFI variable which indicates OVMF support for live migration

     ii) the variable indicates live migration is supported, issue a wrmsrl() to
          enable the SEV live migration support

The EFI live migration check is done using a late_initcall() callback.

Also, ensure that _bss_decrypted section is marked as decrypted in the
hypervisor's guest page encryption status tracking.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Steve Rutherford <srutherford@google.com>
Message-Id: <b4453e4c87103ebef12217d2505ea99a1c3e0f0f.1629726117.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 07:37:36 -05:00
Brijesh Singh
064ce6c550 mm: x86: Invoke hypercall when page encryption status is changed
Invoke a hypercall when a memory region is changed from encrypted ->
decrypted and vice versa. Hypervisor needs to know the page encryption
status during the guest migration.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Message-Id: <0a237d5bb08793916c7790a3e653a2cbe7485761.1629726117.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 07:37:24 -05:00
Brijesh Singh
08c2336df7 x86/kvm: Add AMD SEV specific Hypercall3
KVM hypercall framework relies on alternative framework to patch the
VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
apply_alternative() is called then it defaults to VMCALL. The approach
works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor
will be able to decode the instruction and do the right things. But
when SEV is active, guest memory is encrypted with guest key and
hypervisor will not be able to decode the instruction bytes.

To highlight the need to provide this interface, capturing the
flow of apply_alternatives() :
setup_arch() call init_hypervisor_platform() which detects
the hypervisor platform the kernel is running under and then the
hypervisor specific initialization code can make early hypercalls.
For example, KVM specific initialization in case of SEV will try
to mark the "__bss_decrypted" section's encryption state via early
page encryption status hypercalls.

Now, apply_alternatives() is called much later when setup_arch()
calls check_bugs(), so we do need some kind of an early,
pre-alternatives hypercall interface. Other cases of pre-alternatives
hypercalls include marking per-cpu GHCB pages as decrypted on SEV-ES
and per-cpu apf_reason, steal_time and kvm_apic_eoi as decrypted for
SEV generally.

Add SEV specific hypercall3, it unconditionally uses VMMCALL. The hypercall
will be used by the SEV guest to notify encrypted pages to the hypervisor.

This kvm_sev_hypercall3() function is abstracted and used as follows :
All these early hypercalls are made through early_set_memory_XX() interfaces,
which in turn invoke pv_ops (paravirt_ops).

This early_set_memory_XX() -> pv_ops.mmu.notify_page_enc_status_changed()
is a generic interface and can easily have SEV, TDX and any other
future platform specific abstractions added to it.

Currently, pv_ops.mmu.notify_page_enc_status_changed() callback is setup to
invoke kvm_sev_hypercall3() in case of SEV.

Similarly, in case of TDX, pv_ops.mmu.notify_page_enc_status_changed()
can be setup to a TDX specific callback.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <6fd25c749205dd0b1eb492c60d41b124760cc6ae.1629726117.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 07:37:10 -05:00
Boris Ostrovsky
ce2612b670 x86/smp: Factor out parts of native_smp_prepare_cpus()
Commit 66558b730f ("sched: Add cluster scheduler level for x86")
introduced cpu_l2c_shared_map mask which is expected to be initialized
by smp_op.smp_prepare_cpus(). That commit only updated
native_smp_prepare_cpus() version but not xen_pv_smp_prepare_cpus().
As result Xen PV guests crash in set_cpu_sibling_map().

While the new mask can be allocated in xen_pv_smp_prepare_cpus() one can
see that both versions of smp_prepare_cpus ops share a number of common
operations that can be factored out. So do that instead.

Fixes: 66558b730f ("sched: Add cluster scheduler level for x86")
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lkml.kernel.org/r/1635896196-18961-1-git-send-email-boris.ostrovsky@oracle.com
2021-11-11 13:09:32 +01:00
Peter Zijlstra
2105a92748 static_call,x86: Robustify trampoline patching
Add a few signature bytes after the static call trampoline and verify
those bytes match before patching the trampoline. This avoids patching
random other JMPs (such as CFI jump-table entries) instead.

These bytes decode as:

   d:   53                      push   %rbx
   e:   43 54                   rex.XB push %r12

And happen to spell "SCT".

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211030074758.GT174703@worktop.programming.kicks-ass.net
2021-11-11 13:09:31 +01:00
Linus Torvalds
e8f023caee asm-generic: asm/syscall.h cleanup
This is a single cleanup from Peter Collingbourne, removing
 some dead code.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmGKm+kACgkQmmx57+YA
 GNkksg/7BUxJrWFrQmLA3fhzh4wG3KswdrKTGQMf0jRVI1n77vmdfig3hEJmMekH
 0SZoIYmPztOSj34+6p4NxuqY/Sk62oYLr8Awo8ZLDhIBWNUJE8UWC07Qb/3DpGbp
 JfD7/8mXu10htgM85aGlhaVLnHvvqUBR8PlkJUGNxuY5Gy2L+eCkpwCAYlZpSOqr
 vs0BJWFY1LzO1POcjIq4/IM0PvcU2ncLB5XoxDjjIIWnWKyHzWY21ZvoeaNumvou
 xqQ/Hj8Sc+ufS0yNlSgIC+bJP0bp1bSw/dALKr8oYxLt7X9LELVY3WXyRH+It4nS
 b6HPYmga26NVq/u7RrylBA+2fRCDB8E6z73gHt4SeHrRDEvFjhNzIyV+aXPae/dY
 XI/pjiwpG6k6FpSnF69YZJ/Y+GmUA90V/Jq8aLFZhGz8SgpjRl+2foEUSDhRVXCA
 jGB1Y388m0e6jPlVJROB7ORzXMd8K5iciyUGqtAI87QCOtPozn10ruh5RdziguKm
 kDW5IKy9E4l1ch8WRprVbgV5Ew+QWKS1JIbyjDaX3jN0lUPCqgwkwvRxxtgdFnVA
 Lq5BiUdraSWFUr84rLU3gCRU0+VoEdyZYI+bQGGNlQ4ovmLYU1nLmCU/azMvSEgz
 ZsoM5YdffShxUtfwMg+W67RpYOjSnV55ZzwN0+d1C2oqz9xECXc=
 =8EFP
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic cleanup from Arnd Bergmann:
 "This is a single cleanup from Peter Collingbourne, removing some dead
  code"

* tag 'asm-generic-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  arch: remove unused function syscall_set_arguments()
2021-11-10 11:22:03 -08:00
Linus Torvalds
bf98ecbbae xen: branch for v5.16-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYYp8HgAKCRCAXGG7T9hj
 vmuVAP4whjbyIi4IxYEOnE6On0aD0AgUMiFa7QXrDZi6NXUQIwEAnggLFe+rEG5C
 Fwi/cEXSHrRgveqrgD4GYEr6l0GTxwM=
 =/fMa
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.16b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:

 - a series to speed up the boot of Xen PV guests

 - some cleanups in Xen related code

 - replacement of license texts with the appropriate SPDX headers and
   fixing of wrong SPDX headers in Xen header files

 - a small series making paravirtualized interrupt masking much simpler
   and at the same time removing complaints of objtool

 - a fix for Xen ballooning hogging workqueues for too long

 - enablement of the Xen pciback driver for Arm

 - some further small fixes/enhancements

* tag 'for-linus-5.16b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (22 commits)
  xen/balloon: fix unused-variable warning
  xen/balloon: rename alloc/free_xenballooned_pages
  xen/balloon: add late_initcall_sync() for initial ballooning done
  x86/xen: remove 32-bit awareness from startup_xen
  xen: remove highmem remnants
  xen: allow pv-only hypercalls only with CONFIG_XEN_PV
  x86/xen: remove 32-bit pv leftovers
  xen-pciback: allow compiling on other archs than x86
  x86/xen: switch initial pvops IRQ functions to dummy ones
  x86/xen: remove xen_have_vcpu_info_placement flag
  x86/pvh: add prototype for xen_pvh_init()
  xen: Fix implicit type conversion
  xen: fix wrong SPDX headers of Xen related headers
  xen/pvcalls-back: Remove redundant 'flush_workqueue()' calls
  x86/xen: Remove redundant irq_enter/exit() invocations
  xen-pciback: Fix return in pm_ctrl_init()
  xen/x86: restrict PV Dom0 identity mapping
  xen/x86: there's no highmem anymore in PV mode
  xen/x86: adjust handling of the L3 user vsyscall special page table
  xen/x86: adjust xen_set_fixmap()
  ...
2021-11-10 11:14:21 -08:00
Linus Torvalds
7e113d01f5 IOMMU Updates for Linux v5.16:
Including:
 
   - Intel IOMMU Updates fro Lu Baolu:
     - Dump DMAR translation structure when DMA fault occurs
     - An optimization in the page table manipulation code
     - Use second level for GPA->HPA translation
     - Various cleanups
 
   - Arm SMMU Updates from Will
     - Minor optimisations to SMMUv3 command creation and submission
     - Numerous new compatible string for Qualcomm SMMUv2 implementations
 
   - Fixes for the SWIOTLB based implemenation of dma-iommu code for
     untrusted devices
 
   - Add support for r8a779a0 to the Renesas IOMMU driver and DT matching
     code for r8a77980
 
   - A couple of cleanups and fixes for the Apple DART IOMMU driver
 
   - Make use of generic report_iommu_fault() interface in the AMD IOMMU
     driver
 
   - Various smaller fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmGD6NQACgkQK/BELZcB
 GuOSfg/9FKXl5ym86BP3tAS1fREKH7p59JRGZrrIR89NyHAcEUjtNG3YLPao+YxU
 3CDgLkru+vlDpYY54QoyqcY5FgIHT3Cna/Cdk4zekRmSO/14gHp47jtZRheOUzLF
 rvwfaplcbbtT8akpsVFzvw8YpQLGSDiDQSl7xL2+40Z9hiYX/gS9Af+PH98tAXsa
 yZKZj6gU+JXM58VihO3M7umyE06tovyBaYgcsBZtbf66bGc0ySu+fe75UVWbueRt
 Z8jwqa7TUfVXiYC8h+LqtGET6gtzNSsxAU3VllRe7Brf6K8i/yaRs/TO2Hp83d7/
 q/fcK3vNQ5v3aDNci/DjBB8SEySzCmRz/9ocCOCx8ByuRp+5lwVRPPq3WcUMtsZY
 QpYo9Fk7luFz2Gj5LObKAVBvOoeBZ5Km3oPs4HVmQ6epxn/rVckJDnJnVSLJuATq
 tSZC2heRfFlg1dT6WFaynCTP2RI1LlNEdKhHirV6L368rSjmF0ZdQxdTpHULsHr1
 yMjqL21OfcSkLW91rvfb3g68EsIwDbCPGTOlQWZLmAtwOWtHSCLPgwwEG7WefZbH
 yaslpmlUTOurUnFmpxlfLicy5sqsBL2ASzGJkEKrgunw82Ke96zzkRzi+9j9HeS6
 g0AyIWMi1cUAjONVUZtV4yjImXh63HIPiKx730a9teodusoxm+Q=
 =waUR
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:

 - Intel IOMMU Updates fro Lu Baolu:
     - Dump DMAR translation structure when DMA fault occurs
     - An optimization in the page table manipulation code
     - Use second level for GPA->HPA translation
     - Various cleanups

 - Arm SMMU Updates from Will
     - Minor optimisations to SMMUv3 command creation and submission
     - Numerous new compatible string for Qualcomm SMMUv2 implementations

 - Fixes for the SWIOTLB based implemenation of dma-iommu code for
   untrusted devices

 - Add support for r8a779a0 to the Renesas IOMMU driver and DT matching
   code for r8a77980

 - A couple of cleanups and fixes for the Apple DART IOMMU driver

 - Make use of generic report_iommu_fault() interface in the AMD IOMMU
   driver

 - Various smaller fixes and cleanups

* tag 'iommu-updates-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (35 commits)
  iommu/dma: Fix incorrect error return on iommu deferred attach
  iommu/dart: Initialize DART_STREAMS_ENABLE
  iommu/dma: Use kvcalloc() instead of kvzalloc()
  iommu/tegra-smmu: Use devm_bitmap_zalloc when applicable
  iommu/dart: Use kmemdup instead of kzalloc and memcpy
  iommu/vt-d: Avoid duplicate removing in __domain_mapping()
  iommu/vt-d: Convert the return type of first_pte_in_page to bool
  iommu/vt-d: Clean up unused PASID updating functions
  iommu/vt-d: Delete dev_has_feat callback
  iommu/vt-d: Use second level for GPA->HPA translation
  iommu/vt-d: Check FL and SL capability sanity in scalable mode
  iommu/vt-d: Remove duplicate identity domain flag
  iommu/vt-d: Dump DMAR translation structure when DMA fault occurs
  iommu/vt-d: Do not falsely log intel_iommu is unsupported kernel option
  iommu/arm-smmu-qcom: Request direct mapping for modem device
  iommu: arm-smmu-qcom: Add compatible for QCM2290
  dt-bindings: arm-smmu: Add compatible for QCM2290 SoC
  iommu/arm-smmu-qcom: Add SM6350 SMMU compatible
  dt-bindings: arm-smmu: Add compatible for SM6350 SoC
  iommu/arm-smmu-v3: Properly handle the return value of arm_smmu_cmdq_build_cmd()
  ...
2021-11-04 11:11:24 -07:00
Linus Torvalds
95faf6ba65 Driver core changes for 5.16-rc1
Here is the big set of driver core changes for 5.16-rc1.
 
 All of these have been in linux-next for a while now with no reported
 problems.
 
 Included in here are:
 	- big update and cleanup of the sysfs abi documentation files
 	  and scripts from Mauro.  We are almost at the place where we
 	  can properly check that the running kernel's sysfs abi is
 	  documented fully.
 	- firmware loader updates
 	- dyndbg updates
 	- kernfs cleanups and fixes from Christoph
 	- device property updates
 	- component fix
 	- other minor driver core cleanups and fixes
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYYPbjQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ync9gCfXKMUI1GAnCfJWAwTdTcd18q5akoAoMw32/AH
 0yh5TjAWFyFd7xz5d7qs
 =itsC
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the big set of driver core changes for 5.16-rc1.

  All of these have been in linux-next for a while now with no reported
  problems.

  Included in here are:

   - big update and cleanup of the sysfs abi documentation files and
     scripts from Mauro. We are almost at the place where we can
     properly check that the running kernel's sysfs abi is documented
     fully.

   - firmware loader updates

   - dyndbg updates

   - kernfs cleanups and fixes from Christoph

   - device property updates

   - component fix

   - other minor driver core cleanups and fixes"

* tag 'driver-core-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (122 commits)
  device property: Drop redundant NULL checks
  x86/build: Tuck away built-in firmware under FW_LOADER
  vmlinux.lds.h: wrap built-in firmware support under FW_LOADER
  firmware_loader: move struct builtin_fw to the only place used
  x86/microcode: Use the firmware_loader built-in API
  firmware_loader: remove old DECLARE_BUILTIN_FIRMWARE()
  firmware_loader: formalize built-in firmware API
  component: do not leave master devres group open after bind
  dyndbg: refine verbosity 1-4 summary-detail
  gpiolib: acpi: Replace custom code with device_match_acpi_handle()
  i2c: acpi: Replace custom function with device_match_acpi_handle()
  driver core: Provide device_match_acpi_handle() helper
  dyndbg: fix spurious vNpr_info change
  dyndbg: no vpr-info on empty queries
  dyndbg: vpr-info on remove-module complete, not starting
  device property: Add missed header in fwnode.h
  Documentation: dyndbg: Improve cli param examples
  dyndbg: Remove support for ddebug_query param
  dyndbg: make dyndbg a known cli param
  dyndbg: show module in vpr-info in dd-exec-queries
  ...
2021-11-04 08:32:38 -07:00
Dave Hansen
30d02551ba x86/fpu: Optimize out sigframe xfeatures when in init state
tl;dr: AMX state is ~8k.  Signal frames can have space for this
~8k and each signal entry writes out all 8k even if it is zeros.
Skip writing zeros for AMX to speed up signal delivery by about
4% overall when AMX is in its init state.

This is a user-visible change to the sigframe ABI.

== Hardware XSAVE Background ==

XSAVE state components may be tracked by the processor as being
in their initial configuration.  Software can detect which
features are in this configuration by looking at the XSTATE_BV
field in an XSAVE buffer or with the XGETBV(1) instruction.

Both the XSAVE and XSAVEOPT instructions enumerate features s
being in the initial configuration via the XSTATE_BV field in the
XSAVE header,  However, XSAVEOPT declines to actually write
features in their initial configuration to the buffer.  XSAVE
writes the feature unconditionally, regardless of whether it is
in the initial configuration or not.

Basically, XSAVE users never need to inspect XSTATE_BV to
determine if the feature has been written to the buffer.
XSAVEOPT users *do* need to inspect XSTATE_BV.  They might also
need to clear out the buffer if they want to make an isolated
change to the state, like modifying one register.

== Software Signal / XSAVE Background ==

Signal frames have historically been written with XSAVE itself.
Each state is written in its entirety, regardless of being in its
initial configuration.

In other words, the signal frame ABI uses the XSAVE behavior, not
the XSAVEOPT behavior.

== Problem ==

This means that any application which has acquired permission to
use AMX via ARCH_REQ_XCOMP_PERM will write 8k of state to the
signal frame.  This 8k write will occur even when AMX was in its
initial configuration and software *knows* this because of
XSTATE_BV.

This problem also exists to a lesser degree with AVX-512 and its
2k of state.  However, AVX-512 use does not require
ARCH_REQ_XCOMP_PERM and is more likely to have existing users
which would be impacted by any change in behavior.

== Solution ==

Stop writing out AMX xfeatures which are in their initial state
to the signal frame.  This effectively makes the signal frame
XSAVE buffer look as if it were written with a combination of
XSAVEOPT and XSAVE behavior.  Userspace which handles XSAVEOPT-
style buffers should be able to handle this naturally.

For now, include only the AMX xfeatures: XTILE and XTILEDATA in
this new behavior.  These require new ABI to use anyway, which
makes their users very unlikely to be broken.  This XSAVEOPT-like
behavior should be expected for all future dynamic xfeatures.  It
may also be extended to legacy features like AVX-512 in the
future.

Only attempt this optimization on systems with dynamic features.
Disable dynamic feature support (XFD) if XGETBV1 is unavailable
by adding a CPUID dependency.

This has been measured to reduce the *overall* cycle cost of
signal delivery by about 4%.

Fixes: 2308ee57d9 ("x86/fpu/amx: Enable the AMX feature in 64-bit mode")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: "Chang S. Bae" <chang.seok.bae@intel.com>
Link: https://lore.kernel.org/r/20211102224750.FA412E26@davehans-spike.ostc.intel.com
2021-11-03 22:42:35 +01:00
Linus Torvalds
56d3375448 drm for 5.16-rc1
core:
 - improve dma_fence, lease and resv documentation
 - shmem-helpers: allocate WC pages on x86, use vmf_insert_pin
 - sched fixes/improvements
 - allow empty drm leases
 - add dma resv iterator
 - add more DP 2.0 headers
 - DP MST helper improvements for DP2.0
 
 dma-buf:
 - avoid warnings, remove fence trace macros
 
 bridge:
 - new helper to get rid of panels
 - probe improvements for it66121
 - enable DSI EOTP for anx7625
 
 fbdev:
 - efifb: release runtime PM on destroy
 
 ttm:
 - kerneldoc switch
 - helper to clear all DMA mappings
 - pool shrinker optimizaton
 - remove ttm_tt_destroy_common
 - update ttm_move_memcpy for async use
 
 panel:
 - add new panel-edp driver
 
 amdgpu:
  - Initial DP 2.0 support
  - Initial USB4 DP tunnelling support
  - Aldebaran MCE support
  - Modifier support for DCC image stores for GFX 10.3
  - Display rework for better FP code handling
  - Yellow Carp/Cyan Skillfish updates
  - Cyan Skillfish display support
  - convert vega/navi to IP discovery asic enumeration
  - validate IP discovery table
  - RAS improvements
  - Lots of fixes
 
  i915:
  - DG1 PCI IDs + LMEM discovery/placement
  - DG1 GuC submission by default
  - ADL-S PCI IDs updated + enabled by default
  - ADL-P (XE_LPD) fixed and updates
  - DG2 display fixes
  - PXP protected object support for Gen12 integrated
  - expose multi-LRC submission interface for GuC
  - export logical engine instance to user
  - Disable engine bonding on Gen12+
  - PSR cleanup
  - PSR2 selective fetch by default
  - DP 2.0 prep work
  - VESA vendor block + MSO use of it
  - FBC refactor
  - try again to fix fast-narrow vs slow-wide eDP training
  - use THP when IOMMU enabled
  - LMEM backup/restore for suspend/resume
  - locking simplification
  - GuC major reworking
  - async flip VT-D workaround changes
  - DP link training improvements
  - misc display refactorings
 
 bochs:
 - new PCI ID
 
 rcar-du:
 - Non-contiguious buffer import support for rcar-du
 - r8a779a0 support prep
 
 omapdrm:
 - COMPILE_TEST fixes
 
 sti:
 - COMPILE_TEST fixes
 
 msm:
 - fence ordering improvements
 - eDP support in DP sub-driver
 - dpu irq handling cleanup
 - CRC support for making igt happy
 - NO_CONNECTOR bridge support
 - dsi: 14nm phy support for msm8953
 - mdp5: msm8x53, sdm450, sdm632 support
 
 stm:
 - layer alpha + zpo support
 
 v3d:
 - fix Vulkan CTS failure
 - support multiple sync objects
 
 gud:
 - add R8/RGB332/RGB888 pixel formats
 
 vc4:
 - convert to new bridge helpers
 
 vgem:
 - use shmem helpers
 
 virtio:
 - support mapping exported vram
 
 zte:
 - remove obsolete driver
 
 rockchip:
 - use bridge attach no connector for LVDS/RGB
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmGByPYACgkQDHTzWXnE
 hr6fxA//cXUvTHlEtF7UJDBRAYv+9lXH39NbGYU4aLJuBNlZztCuUi5JOSyDFDH1
 N9VI5biVseev2PEnCzJUubWxTqbUO7FBQTw0TyvZ4Eqn+UZMuFeo0dvdKZRAkvjV
 VHSUc0fm0+WSYanKUK7XK0fwG8aE6JVyYngzgKPSjifhszTdiiRsbU21iTinFhkS
 rgh3HEVELp+LqfoG4qzAYqFUjYqUjvCjd/hX/UkzCII8ZXKr38/4127e95443WOk
 +jes0gWGJe9TvSDrqo9TMx4qukcOniINFUvnzoD2RhOS+Jzr/i5rBh51Xy92g3NO
 Q7hy6byZdk/ZO/MXCDQ2giUOkBiqn5fQjlRGQp4iAZYw9pb3HU+/xrTq0BWVWd8o
 /vmzZYEKKU/sCGpxVDMZxsHV3mXIuVBvuZq6bjmSGcybgOBCiDx5F/Rum4nY2yHp
 lr3cuc0HP3m3f4b/HVvACO4tGd1nDDpVcon7CuhBB7HB7t6Zl9u18qc/qFw0tCTh
 3sgAhno6XFXtPFcSX2KAeeg0mhKDKKrsOnq5y3bDRr05Z0jLocJk95aXEKs6em4j
 gbyHwNaX3CHtiCnFn2/5169+n1K7zqHBtVSGmQlmFDv55rcdx7L3Spk7tCahQeSQ
 ur24r+sEggm8d5Wjl+MYq6wW3oP31s04JFaeV6oCkaSp1wS+alg=
 =jdhH
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2021-11-03' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie:
 "Summary below. i915 starts to add support for DG2 GPUs, enables DG1
  and ADL-S support by default, lots of work to enable DisplayPort 2.0
  across drivers. Lots of documentation updates and fixes across the
  board.

  core:
   - improve dma_fence, lease and resv documentation
   - shmem-helpers: allocate WC pages on x86, use vmf_insert_pin
   - sched fixes/improvements
   - allow empty drm leases
   - add dma resv iterator
   - add more DP 2.0 headers
   - DP MST helper improvements for DP2.0

  dma-buf:
   - avoid warnings, remove fence trace macros

  bridge:
   - new helper to get rid of panels
   - probe improvements for it66121
   - enable DSI EOTP for anx7625

  fbdev:
   - efifb: release runtime PM on destroy

  ttm:
   - kerneldoc switch
   - helper to clear all DMA mappings
   - pool shrinker optimizaton
   - remove ttm_tt_destroy_common
   - update ttm_move_memcpy for async use

  panel:
   - add new panel-edp driver

  amdgpu:
   - Initial DP 2.0 support
   - Initial USB4 DP tunnelling support
   - Aldebaran MCE support
   - Modifier support for DCC image stores for GFX 10.3
   - Display rework for better FP code handling
   - Yellow Carp/Cyan Skillfish updates
   - Cyan Skillfish display support
   - convert vega/navi to IP discovery asic enumeration
   - validate IP discovery table
   - RAS improvements
   - Lots of fixes

  i915:
   - DG1 PCI IDs + LMEM discovery/placement
   - DG1 GuC submission by default
   - ADL-S PCI IDs updated + enabled by default
   - ADL-P (XE_LPD) fixed and updates
   - DG2 display fixes
   - PXP protected object support for Gen12 integrated
   - expose multi-LRC submission interface for GuC
   - export logical engine instance to user
   - Disable engine bonding on Gen12+
   - PSR cleanup
   - PSR2 selective fetch by default
   - DP 2.0 prep work
   - VESA vendor block + MSO use of it
   - FBC refactor
   - try again to fix fast-narrow vs slow-wide eDP training
   - use THP when IOMMU enabled
   - LMEM backup/restore for suspend/resume
   - locking simplification
   - GuC major reworking
   - async flip VT-D workaround changes
   - DP link training improvements
   - misc display refactorings

  bochs:
   - new PCI ID

  rcar-du:
   - Non-contiguious buffer import support for rcar-du
   - r8a779a0 support prep

  omapdrm:
   - COMPILE_TEST fixes

  sti:
   - COMPILE_TEST fixes

  msm:
   - fence ordering improvements
   - eDP support in DP sub-driver
   - dpu irq handling cleanup
   - CRC support for making igt happy
   - NO_CONNECTOR bridge support
   - dsi: 14nm phy support for msm8953
   - mdp5: msm8x53, sdm450, sdm632 support

  stm:
   - layer alpha + zpo support

  v3d:
   - fix Vulkan CTS failure
   - support multiple sync objects

  gud:
   - add R8/RGB332/RGB888 pixel formats

  vc4:
   - convert to new bridge helpers

  vgem:
   - use shmem helpers

  virtio:
   - support mapping exported vram

  zte:
   - remove obsolete driver

  rockchip:
   - use bridge attach no connector for LVDS/RGB"

* tag 'drm-next-2021-11-03' of git://anongit.freedesktop.org/drm/drm: (1259 commits)
  drm/amdgpu/gmc6: fix DMA mask from 44 to 40 bits
  drm/amd/display: MST support for DPIA
  drm/amdgpu: Fix even more out of bound writes from debugfs
  drm/amdgpu/discovery: add SDMA IP instance info for soc15 parts
  drm/amdgpu/discovery: add UVD/VCN IP instance info for soc15 parts
  drm/amdgpu/UAPI: rearrange header to better align related items
  drm/amd/display: Enable dpia in dmub only for DCN31 B0
  drm/amd/display: Fix USB4 hot plug crash issue
  drm/amd/display: Fix deadlock when falling back to v2 from v3
  drm/amd/display: Fallback to clocks which meet requested voltage on DCN31
  drm/amd/display: move FPU associated DCN301 code to DML folder
  drm/amd/display: fix link training regression for 1 or 2 lane
  drm/amd/display: add two lane settings training options
  drm/amd/display: decouple hw_lane_settings from dpcd_lane_settings
  drm/amd/display: implement decide lane settings
  drm/amd/display: adopt DP2.0 LT SCR revision 8
  drm/amd/display: FEC configuration for dpia links in MST mode
  drm/amd/display: FEC configuration for dpia links
  drm/amd/display: Add workaround flag for EDID read on certain docks
  drm/amd/display: Set phy_mux_sel bit in dmub scratch register
  ...
2021-11-02 16:47:49 -07:00
Linus Torvalds
d7e0a795bf ARM:
* More progress on the protected VM front, now with the full
   fixed feature set as well as the limitation of some hypercalls
   after initialisation.
 
 * Cleanup of the RAZ/WI sysreg handling, which was pointlessly
   complicated
 
 * Fixes for the vgic placement in the IPA space, together with a
   bunch of selftests
 
 * More memcg accounting of the memory allocated on behalf of a guest
 
 * Timer and vgic selftests
 
 * Workarounds for the Apple M1 broken vgic implementation
 
 * KConfig cleanups
 
 * New kvmarm.mode=none option, for those who really dislike us
 
 RISC-V:
 * New KVM port.
 
 x86:
 * New API to control TSC offset from userspace
 
 * TSC scaling for nested hypervisors on SVM
 
 * Switch masterclock protection from raw_spin_lock to seqcount
 
 * Clean up function prototypes in the page fault code and avoid
 repeated memslot lookups
 
 * Convey the exit reason to userspace on emulation failure
 
 * Configure time between NX page recovery iterations
 
 * Expose Predictive Store Forwarding Disable CPUID leaf
 
 * Allocate page tracking data structures lazily (if the i915
 KVM-GT functionality is not compiled in)
 
 * Cleanups, fixes and optimizations for the shadow MMU code
 
 s390:
 * SIGP Fixes
 
 * initial preparations for lazy destroy of secure VMs
 
 * storage key improvements/fixes
 
 * Log the guest CPNC
 
 Starting from this release, KVM-PPC patches will come from
 Michael Ellerman's PPC tree.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmGBOiEUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNowwf/axlx3g9sgCwQHr12/6UF/7hL/RwP
 9z+pGiUzjl2YQE+RjSvLqyd6zXh+h4dOdOKbZDLSkSTbcral/8U70ojKnQsXM0XM
 1LoymxBTJqkgQBLm9LjYreEbzrPV4irk4ygEmuk3CPOHZu8xX1ei6c5LdandtM/n
 XVUkXsQY+STkmnGv4P3GcPoDththCr0tBTWrFWtxa0w9hYOxx0ay1AZFlgM4FFX0
 QFuRc8VBLoDJpIUjbkhsIRIbrlHc/YDGjuYnAU7lV/CIME8vf2BW6uBwIZJdYcDj
 0ejozLjodEnuKXQGnc8sXFioLX2gbMyQJEvwCgRvUu/EU7ncFm1lfs7THQ==
 =UxKM
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:

   - More progress on the protected VM front, now with the full fixed
     feature set as well as the limitation of some hypercalls after
     initialisation.

   - Cleanup of the RAZ/WI sysreg handling, which was pointlessly
     complicated

   - Fixes for the vgic placement in the IPA space, together with a
     bunch of selftests

   - More memcg accounting of the memory allocated on behalf of a guest

   - Timer and vgic selftests

   - Workarounds for the Apple M1 broken vgic implementation

   - KConfig cleanups

   - New kvmarm.mode=none option, for those who really dislike us

  RISC-V:

   - New KVM port.

  x86:

   - New API to control TSC offset from userspace

   - TSC scaling for nested hypervisors on SVM

   - Switch masterclock protection from raw_spin_lock to seqcount

   - Clean up function prototypes in the page fault code and avoid
     repeated memslot lookups

   - Convey the exit reason to userspace on emulation failure

   - Configure time between NX page recovery iterations

   - Expose Predictive Store Forwarding Disable CPUID leaf

   - Allocate page tracking data structures lazily (if the i915 KVM-GT
     functionality is not compiled in)

   - Cleanups, fixes and optimizations for the shadow MMU code

  s390:

   - SIGP Fixes

   - initial preparations for lazy destroy of secure VMs

   - storage key improvements/fixes

   - Log the guest CPNC

  Starting from this release, KVM-PPC patches will come from Michael
  Ellerman's PPC tree"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
  RISC-V: KVM: fix boolreturn.cocci warnings
  RISC-V: KVM: remove unneeded semicolon
  RISC-V: KVM: Fix GPA passed to __kvm_riscv_hfence_gvma_xyz() functions
  RISC-V: KVM: Factor-out FP virtualization into separate sources
  KVM: s390: add debug statement for diag 318 CPNC data
  KVM: s390: pv: properly handle page flags for protected guests
  KVM: s390: Fix handle_sske page fault handling
  KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol
  KVM: x86: On emulation failure, convey the exit reason, etc. to userspace
  KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info
  KVM: x86: Clarify the kvm_run.emulation_failure structure layout
  KVM: s390: Add a routine for setting userspace CPU state
  KVM: s390: Simplify SIGP Set Arch handling
  KVM: s390: pv: avoid stalls when making pages secure
  KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm
  KVM: s390: pv: avoid double free of sida page
  KVM: s390: pv: add macros for UVC CC values
  s390/mm: optimize reset_guest_reference_bit()
  s390/mm: optimize set_guest_storage_key()
  s390/mm: no need for pte_alloc_map_lock() if we know the pmd is present
  ...
2021-11-02 11:24:14 -07:00
Linus Torvalds
44261f8e28 hyperv-next for 5.16
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmGBMQUTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXmE5B/9MK3Ju+tc6C8eyR3Ic4XBYHJ3voEKO
 M+R90gggBriDOgkz4B8vF+k0aD8wevXAUtmCSXonDzCh5H7GoyfrVZmJEVkwlioH
 ZMSMlFHcjGhCPIXhLbNtfo/NsAYEtT/lRM2lLGCSbdGuKabylXKujVdhuSIcRPdj
 Rj5innUgcAywOoxG6WzFt3JBzM33UQErCGfUF2b7Rvp9E+Zii4vIMxkMzUpnkEHH
 F8WMEdL0DqH5ThOs0MslNgy03pUC9wk1d5DNd9ytYHqiSQtcQZhFHw/P6dxzUFlW
 OptWv31PXUIsiJf4Zi9hmfjgUl+KZHeacZ2hXtidAo86VPcIjVs25OQW
 =40fn
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-next-signed-20211102' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv updates from Wei Liu:

 - Initial patch set for Hyper-V isolation VM support (Tianyu Lan)

 - Fix a warning on preemption (Vitaly Kuznetsov)

 - A bunch of misc cleanup patches

* tag 'hyperv-next-signed-20211102' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/hyperv: Protect set_hv_tscchange_cb() against getting preempted
  Drivers: hv : vmbus: Adding NULL pointer check
  x86/hyperv: Remove duplicate include
  x86/hyperv: Remove duplicated include in hv_init
  Drivers: hv: vmbus: Remove unused code to check for subchannels
  Drivers: hv: vmbus: Initialize VMbus ring buffer for Isolation VM
  Drivers: hv: vmbus: Add SNP support for VMbus channel initiate message
  x86/hyperv: Add ghcb hvcall support for SNP VM
  x86/hyperv: Add Write/Read MSR registers via ghcb page
  Drivers: hv: vmbus: Mark vmbus ring buffer visible to host in Isolation VM
  x86/hyperv: Add new hvcall guest address host visibility support
  x86/hyperv: Initialize shared memory boundary in the Isolation VM.
  x86/hyperv: Initialize GHCB page in Isolation VM
2021-11-02 10:56:49 -07:00
Linus Torvalds
cc0356d6a0 - Do not #GP on userspace use of CLI/STI but pretend it was a NOP to
keep old userspace from breaking. Adjust the corresponding iopl selftest
 to that.
 
 - Improve stack overflow warnings to say which stack got overflowed and
 raise the exception stack sizes to 2 pages since overflowing the single
 page of exception stack is very easy to do nowadays with all the tracing
 machinery enabled. With that, rip out the custom mapping of AMD SEV's
 too.
 
 - A bunch of changes in preparation for FGKASLR like supporting more
 than 64K section headers in the relocs tool, correct ORC lookup table
 size to cover the whole kernel .text and other adjustments.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmF/uugACgkQEsHwGGHe
 VUroKw//e8BJ3Aun8bg00FHxfiMGbPYcozjLGDkaoMtMDZ8WlfCUrvtqYICEr8eB
 UU0eRyygAPI167dre1O9JvAcbilkNTKntaU6qbu/ZVyUwS3+Jkjwsotbqn3xKtkd
 QDDTDNiCU+beCJ2ZbspbrPgEh13+H0MwMHUfRxZB9Scpmo6aGSEaU3g295f6GX57
 VFGJ/LNov5MV1dTD7Pp/h6/Nb+R6WmflKcBzJmQxYuKyKX+g1xsSv0VSga+t+uf3
 M9pUkizqTiUxzC2eLgtcEZTqqBHu810E8M76FmhKBUMilsFJT5YAJTiqyahwHXds
 HYarOFRgcnFuJPd29vn8UHjqeeoi6ru8GtcZYzccEc7U3ku/gXPaDJ9ffmvhs7vU
 pJX5Um3GiiFm0w/ZZOKDqh78wRAsCKLN+jIoyszuhkkNchZSj/jKfOgdd3EmcZst
 6L6rxBA4oRHwNOgM7uVMp+jFeRe1/prR280OWWH0D4QmmuqybThOdO23Iuh/Deth
 W3qPUH3UQtfSWxGy2yODzJ1ciuGAr/AzJZ9zjg04e3Vl0DkEpyWtLKJiG3ClXZag
 Nj+3xc4xYH2Aw+M0HRaONk5XVKLpqVjuAfgU5iLQa0YSUbtrR+wCWvY8KgQNbAqK
 xZmzYzQ89stwVCuGKx10gPsL3jSJ3VCylMfqdHD2Ajmld1yApr0=
 =DOZU
 -----END PGP SIGNATURE-----

Merge tag 'x86_core_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 core updates from Borislav Petkov:

 - Do not #GP on userspace use of CLI/STI but pretend it was a NOP to
   keep old userspace from breaking. Adjust the corresponding iopl
   selftest to that.

 - Improve stack overflow warnings to say which stack got overflowed and
   raise the exception stack sizes to 2 pages since overflowing the
   single page of exception stack is very easy to do nowadays with all
   the tracing machinery enabled. With that, rip out the custom mapping
   of AMD SEV's too.

 - A bunch of changes in preparation for FGKASLR like supporting more
   than 64K section headers in the relocs tool, correct ORC lookup table
   size to cover the whole kernel .text and other adjustments.

* tag 'x86_core_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  selftests/x86/iopl: Adjust to the faked iopl CLI/STI usage
  vmlinux.lds.h: Have ORC lookup cover entire _etext - _stext
  x86/boot/compressed: Avoid duplicate malloc() implementations
  x86/boot: Allow a "silent" kaslr random byte fetch
  x86/tools/relocs: Support >64K section headers
  x86/sev: Make the #VC exception stacks part of the default stacks storage
  x86: Increase exception stack sizes
  x86/mm/64: Improve stack overflow warnings
  x86/iopl: Fake iopl(3) CLI/STI usage
2021-11-02 07:56:47 -07:00
Juergen Gross
ee1f9d1914 xen: allow pv-only hypercalls only with CONFIG_XEN_PV
Put the definitions of the hypercalls usable only by pv guests inside
CONFIG_XEN_PV sections.

On Arm two dummy functions related to pv hypercalls can be removed.

While at it remove the no longer supported tmem hypercall definition.

Signed-off-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20211028081221.2475-3-jgross@suse.com
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-11-02 08:11:01 -05:00
Juergen Gross
d99bb72a30 x86/xen: remove 32-bit pv leftovers
There are some remaining 32-bit pv-guest support leftovers in the Xen
hypercall interface. Remove them.

Signed-off-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20211028081221.2475-2-jgross@suse.com
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-11-02 08:03:43 -05:00
Oleksandr Andrushchenko
a67efff288 xen-pciback: allow compiling on other archs than x86
Xen-pciback driver was designed to be built for x86 only. But it
can also be used by other architectures, e.g. Arm.

Currently PCI backend implements multiple functionalities at a time,
such as:
1. It is used as a database for assignable PCI devices, e.g. xl
   pci-assignable-{add|remove|list} manipulates that list. So, whenever
   the toolstack needs to know which PCI devices can be passed through
   it reads that from the relevant sysfs entries of the pciback.
2. It is used to hold the unbound PCI devices list, e.g. when passing
   through a PCI device it needs to be unbound from the relevant device
   driver and bound to pciback (strictly speaking it is not required
   that the device is bound to pciback, but pciback is again used as a
   database of the passed through PCI devices, so we can re-bind the
   devices back to their original drivers when guest domain shuts down)
3. Device reset for the devices being passed through
4. Para-virtualised use-cases support

The para-virtualised part of the driver is not always needed as some
architectures, e.g. Arm or x86 PVH Dom0, are not using backend-frontend
model for PCI device passthrough.

For such use-cases make the very first step in splitting the
xen-pciback driver into two parts: Xen PCI stub and PCI PV backend
drivers.

For that add new configuration options CONFIG_XEN_PCI_STUB and
CONFIG_XEN_PCIDEV_STUB, so the driver can be limited in its
functionality, e.g. no support for para-virtualised scenario.
x86 platform will continue using CONFIG_XEN_PCIDEV_BACKEND for the
fully featured backend driver.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Anastasiia Lukianenko <anastasiia_lukianenko@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20211028143620.144936-1-andr2000@gmail.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-11-02 08:03:43 -05:00
Juergen Gross
e453f872b7 x86/xen: switch initial pvops IRQ functions to dummy ones
The initial pvops functions handling irq flags will only ever be called
before interrupts are being enabled.

So switch them to be dummy functions:
- xen_save_fl() can always return 0
- xen_irq_disable() is a nop
- xen_irq_enable() can BUG()

Add some generic paravirt functions for that purpose.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20211028072748.29862-3-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-11-02 08:03:43 -05:00
Juergen Gross
767216796c x86/pvh: add prototype for xen_pvh_init()
xen_pvh_init() is lacking a prototype in a header, add it.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20211006061950.9227-1-jgross@suse.com
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-11-02 07:45:44 -05:00
Linus Torvalds
79ef0c0014 Tracing updates for 5.16:
- kprobes: Restructured stack unwinder to show properly on x86 when a stack
   dump happens from a kretprobe callback.
 
 - Fix to bootconfig parsing
 
 - Have tracefs allow owner and group permissions by default (only denying
   others). There's been pressure to allow non root to tracefs in a
   controlled fashion, and using groups is probably the safest.
 
 - Bootconfig memory managament updates.
 
 - Bootconfig clean up to have the tools directory be less dependent on
   changes in the kernel tree.
 
 - Allow perf to be traced by function tracer.
 
 - Rewrite of function graph tracer to be a callback from the function tracer
   instead of having its own trampoline (this change will happen on an arch
   by arch basis, and currently only x86_64 implements it).
 
 - Allow multiple direct trampolines (bpf hooks to functions) be batched
   together in one synchronization.
 
 - Allow histogram triggers to add variables that can perform calculations
   against the event's fields.
 
 - Use the linker to determine architecture callbacks from the ftrace
   trampoline to allow for proper parameter prototypes and prevent warnings
   from the compiler.
 
 - Extend histogram triggers to key off of variables.
 
 - Have trace recursion use bit magic to determine preempt context over if
   branches.
 
 - Have trace recursion disable preemption as all use cases do anyway.
 
 - Added testing for verification of tracing utilities.
 
 - Various small clean ups and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYYBdxhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qp1sAQD2oYFwaG3sx872gj/myBcHIBSKdiki
 Hry5csd8zYDBpgD+Poylopt5JIbeDuoYw/BedgEXmscZ8Qr7VzjAXdnv/Q4=
 =Loz8
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - kprobes: Restructured stack unwinder to show properly on x86 when a
   stack dump happens from a kretprobe callback.

 - Fix to bootconfig parsing

 - Have tracefs allow owner and group permissions by default (only
   denying others). There's been pressure to allow non root to tracefs
   in a controlled fashion, and using groups is probably the safest.

 - Bootconfig memory managament updates.

 - Bootconfig clean up to have the tools directory be less dependent on
   changes in the kernel tree.

 - Allow perf to be traced by function tracer.

 - Rewrite of function graph tracer to be a callback from the function
   tracer instead of having its own trampoline (this change will happen
   on an arch by arch basis, and currently only x86_64 implements it).

 - Allow multiple direct trampolines (bpf hooks to functions) be batched
   together in one synchronization.

 - Allow histogram triggers to add variables that can perform
   calculations against the event's fields.

 - Use the linker to determine architecture callbacks from the ftrace
   trampoline to allow for proper parameter prototypes and prevent
   warnings from the compiler.

 - Extend histogram triggers to key off of variables.

 - Have trace recursion use bit magic to determine preempt context over
   if branches.

 - Have trace recursion disable preemption as all use cases do anyway.

 - Added testing for verification of tracing utilities.

 - Various small clean ups and fixes.

* tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (101 commits)
  tracing/histogram: Fix semicolon.cocci warnings
  tracing/histogram: Fix documentation inline emphasis warning
  tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker together
  tracing: Show size of requested perf buffer
  bootconfig: Initialize ret in xbc_parse_tree()
  ftrace: do CPU checking after preemption disabled
  ftrace: disable preemption when recursion locked
  tracing/histogram: Document expression arithmetic and constants
  tracing/histogram: Optimize division by a power of 2
  tracing/histogram: Covert expr to const if both operands are constants
  tracing/histogram: Simplify handling of .sym-offset in expressions
  tracing: Fix operator precedence for hist triggers expression
  tracing: Add division and multiplication support for hist triggers
  tracing: Add support for creating hist trigger variables from literal
  selftests/ftrace: Stop tracing while reading the trace file by default
  MAINTAINERS: Update KPROBES and TRACING entries
  test_kprobes: Move it from kernel/ to lib/
  docs, kprobes: Remove invalid URL and add new reference
  samples/kretprobes: Fix return value if register_kretprobe() failed
  lib/bootconfig: Fix the xbc_get_info kerneldoc
  ...
2021-11-01 20:05:19 -07:00
Linus Torvalds
01463374c5 cpu-to-thread_info update for v5.16-rc1
Cross-architecture update to move task_struct::cpu back into thread_info
 on arm64, x86, s390, powerpc, and riscv. All Acked by arch maintainers.
 
 Quoting Ard Biesheuvel:
 
 "Move task_struct::cpu back into thread_info
 
  Keeping CPU in task_struct is problematic for architectures that define
  raw_smp_processor_id() in terms of this field, as it requires
  linux/sched.h to be included, which causes a lot of pain in terms of
  circular dependencies (aka 'header soup')
 
  This series moves it back into thread_info (where it came from) for all
  architectures that enable THREAD_INFO_IN_TASK, addressing the header
  soup issue as well as some pointless differences in the implementations
  of task_cpu() and set_task_cpu()."
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmGAEPYWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJq4wEACItgLuyzPgB2eSLVMc3sHPIWcn
 EUWbAWsuzJH79wmJtn2AKxW/C5OLBNGeoNjkXQvFN3ULkQDPrfCpB4x/tB6CjIQI
 WRDf8kO7oaAD85ZrbSwyFl/MFfrD67f6H1HZoB9FKWAzuv/Bp2xQ0Kf06Dv4HEZp
 CzprzZuWtjHB+qgyy+EpGOge3zbFmCuYPE2QpMYLWgs1rcVW9OYvoCI6AYtNefrC
 6Kl6CbmBb1k6lFxkhM7wvRcIJthBl6Bajpc3Z2uL1aLb27dVpQZs3YpY859Knb6U
 ZpOQCRJOMui3HOxyF3bDUI37y0XVLm6xaNM6C/7i0XS1GiFlSxkGVamg+Mp7anpI
 +hdK5kqtSagaBC9CaJvRHnWIex1npQAfiyDNdyiEbrsUJ1dp6/zZcQSe4/m/XRbi
 vywQPGxU9f1ASshzHsGU2TJf7Ps7qHulUsS5fKwmHU2ZjQnbYCoPN10JGO9gKjOX
 yioN5xsKnbPY9j0ys3l9XBqaMJ8KAr1XspplTGIMZIVbjNMlqrfgbg8Qn8T8WGM7
 oUqudMIxczilj0/iEGfGRxBeFaYAfhGQCDnxNlNX9g7Xe/gHTJgNYlHVxL55jHNu
 AoPE3Gd0X8K9fbov0BCB6a21XwGJ6Wj+FSrnvuyWrRuy8JWiDFJaVKUBEcalKr7a
 MhoUNQPu5M83OdC42A==
 =PzvV
 -----END PGP SIGNATURE-----

Merge tag 'cpu-to-thread_info-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull thread_info update to move 'cpu' back from task_struct from Kees Cook:
 "Cross-architecture update to move task_struct::cpu back into
  thread_info on arm64, x86, s390, powerpc, and riscv. All Acked by arch
  maintainers.

  Quoting Ard Biesheuvel:

     'Move task_struct::cpu back into thread_info

      Keeping CPU in task_struct is problematic for architectures that
      define raw_smp_processor_id() in terms of this field, as it
      requires linux/sched.h to be included, which causes a lot of pain
      in terms of circular dependencies (aka 'header soup')

      This series moves it back into thread_info (where it came from)
      for all architectures that enable THREAD_INFO_IN_TASK, addressing
      the header soup issue as well as some pointless differences in the
      implementations of task_cpu() and set_task_cpu()'"

* tag 'cpu-to-thread_info-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  riscv: rely on core code to keep thread_info::cpu updated
  powerpc: smp: remove hack to obtain offset of task_struct::cpu
  sched: move CPU field back into thread_info if THREAD_INFO_IN_TASK=y
  powerpc: add CPU field to struct thread_info
  s390: add CPU field to struct thread_info
  x86: add CPU field to struct thread_info
  arm64: add CPU field to struct thread_info
2021-11-01 17:00:05 -07:00
Linus Torvalds
879dbe9ffe Add a SGX_IOC_VEPC_REMOVE ioctl to the /dev/sgx_vepc virt interface with
which EPC pages can be put back into their uninitialized state without
 having to reopen /dev/sgx_vepc, which could not be possible anymore
 after startup due to security policies.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmF/x7AACgkQEsHwGGHe
 VUqHXA//YWrukmJ5PQZwWqkXGo6h42JWhIdNfSC2c1SVdz1cioGUCCswALTX4g8l
 MYYf3eN12GJ296jPh7m9bz8JvlYjdavSm3Y1yzHIjuQ3q6qywHIuYTbsrMD7waUD
 PkcY1TTYgNJ2+f0AgsC4GZhlcpf9g5DqiftW6wvExx5tLUNsVu3Y3gZy/+fajP4f
 s/TMjcdr2QmPsjun00KfoIY4/z0u8LkyRMSwyoxSV6wYdL6rRtfYFWsbEUS+W6Nw
 /VJ0IKl+aBQ1ztsDc4M5h1uy9II2M/Row5k6JjyrdG4X8D6ACSG7cho6qcMjXgcP
 Gac7Im5IyjPEorxqXAgJiMoAl9lU9a2JMVZqPtihYsQW/ygMTdpzP9sBpcZPMevc
 gxQD4gyixwzUa3cyVDzTPBdk/DEuGc2nwn2k9nPvmNxKMonX1oLEiP7hu265mvet
 56DtwKJF9ddtpepO2zFCg1qX+eZnTuhuZNCPsm/pmdGgzI8cyLznho33OgUSZEQY
 c1UisT7HXNRVC/1Q8VBDTU/D9LtIk+2+Q5lQkcNeftI5PYKTXIVddkOkqJ4GhGWJ
 9EasA4UtnhvsLzJ76gxxuUf677ns+1TCo65e7Hu1+X0eTmBJK3boe3aMHvJeHEWH
 Asd+SMkYWfxAlW/arAYhR2JgT9wgEH3pSx4eXnpGwpeValxBPRs=
 =1UYy
 -----END PGP SIGNATURE-----

Merge tag 'x86_sgx_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SGX updates from Borislav Petkov:
 "Add a SGX_IOC_VEPC_REMOVE ioctl to the /dev/sgx_vepc virt interface
  with which EPC pages can be put back into their uninitialized state
  without having to reopen /dev/sgx_vepc, which could not be possible
  anymore after startup due to security policies"

* tag 'x86_sgx_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sgx/virt: implement SGX_IOC_VEPC_REMOVE ioctl
  x86/sgx/virt: extract sgx_vepc_remove_page
2021-11-01 15:54:07 -07:00
Linus Torvalds
20273d2588 - Export sev_es_ghcb_hv_call() so that HyperV Isolation VMs can use it too
- Non-urgent fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmF/xXMACgkQEsHwGGHe
 VUpFohAAn1FcRfgUh4a7SZQudhWaYPye0Yaf9c9acJIDYfls4Qg3ZLvSNGS0QChW
 pcjNQzr42UymxZKq1t6JGaUlD0vkfW0p+w5wueeIxMltWG0oZXgUPhqWrFTLwBtR
 g5Gio3Jum1CULCMokS6W4MjJSkTtX5NyYPg+m5Siowy10cbBdYA4wJaKnwGslPT7
 4pCDQP5159cjmG9WthKppxUdFy/vql0NJhjxmUkha39eVJ7yLoWvJoubQqqGnqXF
 XHwFolZGBxm4Ed4XoUjtz4HgI0VD1JOImUBPqnaE/uyrU7bqqywe5/PpZP051xtF
 anpWBm8KbZFsh220bSRJdFQxQBiXaIA41tfBiqVQhrgPy6TKgq7glhD4/ZjvUAdu
 DDg2HYEnK3dBAOCa7zIj/+uTijD1nvvuhQblGB2PnvnD2RWWgl+0vZ9Wqspo0EyW
 ry5V7hGCMC3mgFexTtvwd1hvMJVYrKfyn2XcP9B+zdgpUJ9DprB+g1O1J6NkGe1r
 SKS6itMokVRd+I+16iFQh0PuywqldbNv9dby6bd+dtvxAcVER2vUA0C7wmjqX4Mx
 bpftPrNhdNmgQAYlN/tRIfh2t2cFTJnWegVBBErdEfafiqKL9lU8gQlMVgwY10o+
 a1ALQ5cUI9Y0xS4cJtfVBVIekqIwEbmniS66iMlMiEJx+Ar6T8g=
 =Gql9
 -----END PGP SIGNATURE-----

Merge tag 'x86_sev_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SEV updates from Borislav Petkov:

 - Export sev_es_ghcb_hv_call() so that HyperV Isolation VMs can use it
   too

 - Non-urgent fixes and cleanups

* tag 'x86_sev_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV
  x86/sev: Allow #VC exceptions on the VC2 stack
  x86/sev: Fix stack type check in vc_switch_off_ist()
  x86/sme: Use #define USE_EARLY_PGTABLE_L5 in mem_encrypt_identity.c
  x86/sev: Carve out HV call's return value verification
2021-11-01 15:52:26 -07:00
Linus Torvalds
e0f4c59dc4 - Start checking a CPUID bit on AMD Zen3 which states that the CPU
clears the segment base when a null selector is written. Do the explicit
 detection on older CPUs, zen2 and hygon specifically, which have the
 functionality but do not advertize the CPUID bit. Factor in the presence
 of a hypervisor underneath the kernel and avoid doing the explicit check
 there which the HV might've decided to not advertize for migration
 safety reasons, a.o.
 
 - Add support for a new X86 CPU vendor: VORTEX. Needed for whitelisting
 those CPUs in the hardware vulnerabilities detection
 
 - Force the compiler to use rIP-relative addressing in the fallback path of
 static_cpu_has(), in order to avoid unnecessary register pressure
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmF/wRgACgkQEsHwGGHe
 VUoGQBAAk9V9//FMoENuGFGul/IK8+VBibTfztYgaPvm7vjMDYaYuRBCQiZg5Y8U
 D14pwkg7CuRa6iwZmrk/X/y6FVjo5BJA//ROk/n/9JNvV5QUp3/o00uLiziv80K3
 H6Wm3PUyGgkpBuJg+/K8SLE9UQ6uSh4nsykS+70Dcd45DtkC/vH8pkDs5Q1fVQwb
 7AuOuWTCWKUYOMFYWFI3a9D8tZYhg99ABREbXBaJGiGdIlZKNVe/7W8qQw5s6cVA
 cD5Q2ILY2RCGP55ZQiWoFy3XNP3/ygvZ7Zm1ARYUvUMR2Y5X2XJWN/B6oMbc0oEu
 OZsDDA/ILYcah9eBV/zk4ON/1djksp1iWNXNxjct0cNBPAKxi6T/HhHuIHBtzvW+
 zDyBWUMLlv1m2i1oW4J4NuNJJi9Gaz+7PesmI7C0OQPgywR8UqqfMD+TzlEHWya1
 YqYqI0f3aiyC/sLjUp3GSA7a9sWSd3BZfyAlLBJZCxyXAxX92tXX5BRPh/KYbnJn
 c/NaYA6X4m4Rdvr0gKKtCklaC6w4GLzVak6wIvftzHlUYsWX21BhnTkQrciKbqc+
 AKWed41AO+4pDHROePxc409x3UZolti+1RandikrztIVAolVJ6W/OkHWxXfy28Fg
 iSrtl4M3omv8fCHDaJ26STrXqxH8pIK8noVolwQoXKyAFVyvXTk=
 =rlVy
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpu updates from Borislav Petkov:

 - Start checking a CPUID bit on AMD Zen3 which states that the CPU
   clears the segment base when a null selector is written. Do the
   explicit detection on older CPUs, zen2 and hygon specifically, which
   have the functionality but do not advertize the CPUID bit. Factor in
   the presence of a hypervisor underneath the kernel and avoid doing
   the explicit check there which the HV might've decided to not
   advertize for migration safety reasons, or similar.

 - Add support for a new X86 CPU vendor: VORTEX. Needed for whitelisting
   those CPUs in the hardware vulnerabilities detection

 - Force the compiler to use rIP-relative addressing in the fallback
   path of static_cpu_has(), in order to avoid unnecessary register
   pressure

* tag 'x86_cpu_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Fix migration safety with X86_BUG_NULL_SEL
  x86/CPU: Add support for Vortex CPUs
  x86/umip: Downgrade warning messages to debug loglevel
  x86/asm: Avoid adding register pressure for the init case in static_cpu_has()
  x86/asm: Add _ASM_RIP() macro for x86-64 (%rip) suffix
2021-11-01 15:33:54 -07:00
Linus Torvalds
18398bb825 The usual round of random minor fixes and cleanups all over the place.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmF/ucwACgkQEsHwGGHe
 VUpLMQ//d4xim4zD4hQVleYkGWqA2nB050QtutIto1nvsiZdjrUSMjJGZnos2nLd
 9tY3NZtgrFfAyjUkal098L+2zed+U6UemIV6kT1F3TnWg4dYByxYABNutOsQGUgw
 o4sTwjG7ELC273yPt/WY9TMwfMCiX7t80QkjoSeWbkApdfB0aZoxB0CvdLBKwCl/
 bxdfX1uvqW7sc6fatcI634hC1HDw8GJThym4/lrMHq2Pr8n/U6pEWoBFsdlprnLk
 pqb3IGX3kNnpjTmCpZxvd4ZQV8xUlMcJkdEFjKDf7BLtWjwZxPIdGcfnxrpf2EJQ
 yVZklcabaBNz/zNkoQeyD6Ix1ZCFSxcHRhg0BJpvvhzQ91My2pGZgLuzUYz3Fk7G
 GjWZje8WZcL3ViL9oGbOYMLSw76wov95+8WMiyKqPaNuzZbS3py5C/ThgqpCdg5b
 WyQe0GhUvthzLsVz9Gu7OFrbZl6VBz8q7/bxuo+vpFhgC1EiOj2yPSZNUJBRKdcd
 cFSfybcjk3Qyf7YXmZ/NcD9TQARQO1ediRY6ZNeZr7JYPzyebY+wTfHqDvdX65S5
 i/zgeAX4XAuX4pl28nJvDe8x1P7t5T8L6Qno9Lnd1xMG7jWift9RSEOo29rUp0sw
 gA9xV/BsmApvyM8pgD/lAqxAFzGkYfSy8bB6uav8HccHprVfJE0=
 =4BVs
 -----END PGP SIGNATURE-----

Merge tag 'x86_cleanups_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Borislav Petkov:
 "The usual round of random minor fixes and cleanups all over the place"

* tag 'x86_cleanups_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/Makefile: Remove unneeded whitespaces before tabs
  x86/of: Kill unused early_init_dt_scan_chosen_arch()
  x86: Fix misspelled Kconfig symbols
  x86/Kconfig: Remove references to obsolete Kconfig symbols
  x86/smp: Remove unnecessary assignment to local var freq_scale
2021-11-01 15:25:08 -07:00
Linus Torvalds
6e5772c8d9 Add an interface called cc_platform_has() which is supposed to be used
by confidential computing solutions to query different aspects of the
 system. The intent behind it is to unify testing of such aspects instead
 of having each confidential computing solution add its own set of tests
 to code paths in the kernel, leading to an unwieldy mess.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmF/uLUACgkQEsHwGGHe
 VUqGbQ/+LOmz8hmL5vtbXw/lVonCSBRKI2KVefnN2VtQ3rjtCq8HlNoq/hAdi15O
 WntABFV8u4daNAcssp+H/p+c8Mt/NzQa60TRooC5ZIynSOCj4oZQxTWjcnR4Qxrf
 oABy4sp09zNW31qExtTVTwPC/Ejzv4hA0Vqt9TLQOSxp7oYVYKeDJNp79VJK64Yz
 Ky7epgg8Pauk0tAT76ATR4kyy9PLGe4/Ry0bOtAptO4NShL1RyRgI0ywUmptJHSw
 FV/MnoexdAs4V8+4zPwyOkf8YMDnhbJcvFcr7Yd9AEz2q9Z1wKCgi1M3aZIoW8lV
 YMXECMGe9DfxmEJbnP5zbnL6eF32x+tbq+fK8Ye4V2fBucpWd27zkcTXjoP+Y+zH
 NLg+9QykR9QCH75YCOXcAg1Q5hSmc4DaWuJymKjT+W7MKs89ywjq+ybIBpLBHbQe
 uN9FM/CEKXx8nQwpNQc7mdUE5sZeCQ875028RaLbLx3/b6uwT6rBlNJfxl/uxmcZ
 iF1kG7Cx4uO+7G1a9EWgxtWiJQ8GiZO7PMCqEdwIymLIrlNksAk7nX2SXTuH5jIZ
 YDuBj/Xz2UUVWYFm88fV5c4ogiFlm9Jeo140Zua/BPdDJd2VOP013rYxzFE/rVSF
 SM2riJxCxkva8Fb+8TNiH42AMhPMSpUt1Nmd1H2rcEABRiT83Ow=
 =Na0U
 -----END PGP SIGNATURE-----

Merge tag 'x86_cc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull generic confidential computing updates from Borislav Petkov:
 "Add an interface called cc_platform_has() which is supposed to be used
  by confidential computing solutions to query different aspects of the
  system.

  The intent behind it is to unify testing of such aspects instead of
  having each confidential computing solution add its own set of tests
  to code paths in the kernel, leading to an unwieldy mess"

* tag 'x86_cc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  treewide: Replace the use of mem_encrypt_active() with cc_platform_has()
  x86/sev: Replace occurrences of sev_es_active() with cc_platform_has()
  x86/sev: Replace occurrences of sev_active() with cc_platform_has()
  x86/sme: Replace occurrences of sme_active() with cc_platform_has()
  powerpc/pseries/svm: Add a powerpc version of cc_platform_has()
  x86/sev: Add an x86 version of cc_platform_has()
  arch/cc: Introduce a function to check for confidential computing features
  x86/ioremap: Selectively build arch override encryption functions
2021-11-01 15:16:52 -07:00
Linus Torvalds
158405e888 - Get rid of a bunch of function pointers used in MCA land in favor
of normal functions. This is in preparation of making the MCA code
 noinstr-aware
 
 - When the kernel copies data from user addresses and it encounters a
 machine check, a SIGBUS is sent to that process. Change this action to
 either an -EFAULT which is returned to the user or a short write, making
 the recovery action a lot more user-friendly
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmF/s8sACgkQEsHwGGHe
 VUqnaQ/8DIHkIOF6vy2w56snJwCj0XQYNLO+Clf6sHJ7ukWpWDoAi6HzvjqrBmaa
 bQEdOLeO92wGtVutCQ5ndzq2SJ6UFcZtOulpHyzCpwNinhY2QMsPG6pkSzeaAy/e
 aR4gpTY6pyCJyWl5DXXr7FMzBZVaWYdtZ2szPKmW1d1mLeDIdv5d3hInDbZ48XJF
 o+fZx0uuK0CIuDjDujRNvkPbHXLbBSqSLCTRf66o+sCY5ZXHlAipabxa3UmhHKvd
 dBxMrlObAaDBmDjqpc/YpS4IfWZb7+rHQfVmiq5O85ExXx6cyF6vlM7GI/5VBxSA
 2dVcZX/3TsSqGbFdVygbcF6e/Yl1xhP5AE+pBb5jpzbzEaf4oiM8MDhoMAai3lEL
 7CFsXL2oyAzho7QQsUSkv/hffHHrph2/aUZbGJlz6SdeRF9aoIjZANpcwm44TZrk
 c11Fh1MLTDxx8uhCGrYFXqR8QgeTi4B+8d/CEXNJnkLXZMfSUtoL1iIzhBpsGkv3
 r0JOIG2o5dGX2lLhQOiHZ+us33O1e8mvOli9P1jLoDttoKvNqSqLUuwpBCz4sc0E
 ugfarf7v/R07NN+7SIT+O83ZG8dXxIRPzHm/g7wjZYgyOfEBgFSMBKVWXRotPo/f
 aY88sDVyvF5sbYnUcA6zZANBCKAVfilqdMgCyaoGegoNGzDOCYE=
 =bIZq
 -----END PGP SIGNATURE-----

Merge tag 'ras_core_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS updates from Borislav Petkov:

 - Get rid of a bunch of function pointers used in MCA land in favor of
   normal functions. This is in preparation of making the MCA code
   noinstr-aware

 - When the kernel copies data from user addresses and it encounters a
   machine check, a SIGBUS is sent to that process. Change this action
   to either an -EFAULT which is returned to the user or a short write,
   making the recovery action a lot more user-friendly

* tag 'ras_core_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Sort mca_config members to get rid of unnecessary padding
  x86/mce: Get rid of the ->quirk_no_way_out() indirect call
  x86/mce: Get rid of msr_ops
  x86/mce: Get rid of machine_check_vector
  x86/mce: Get rid of the mce_severity function pointer
  x86/mce: Drop copyin special case for #MC
  x86/mce: Change to not send SIGBUS error during copy from user
2021-11-01 15:12:04 -07:00
Linus Torvalds
8cb1ae19bf x86/fpu updates:
- Cleanup of extable fixup handling to be more robust, which in turn
    allows to make the FPU exception fixups more robust as well.
 
  - Change the return code for signal frame related failures from explicit
    error codes to a boolean fail/success as that's all what the calling
    code evaluates.
 
  - A large refactoring of the FPU code to prepare for adding AMX support:
 
    - Distangle the public header maze and remove especially the misnomed
      kitchen sink internal.h which is despite it's name included all over
      the place.
 
    - Add a proper abstraction for the register buffer storage (struct
      fpstate) which allows to dynamically size the buffer at runtime by
      flipping the pointer to the buffer container from the default
      container which is embedded in task_struct::tread::fpu to a
      dynamically allocated container with a larger register buffer.
 
    - Convert the code over to the new fpstate mechanism.
 
    - Consolidate the KVM FPU handling by moving the FPU related code into
      the FPU core which removes the number of exports and avoids adding
      even more export when AMX has to be supported in KVM. This also
      removes duplicated code which was of course unnecessary different and
      incomplete in the KVM copy.
 
    - Simplify the KVM FPU buffer handling by utilizing the new fpstate
      container and just switching the buffer pointer from the user space
      buffer to the KVM guest buffer when entering vcpu_run() and flipping
      it back when leaving the function. This cuts the memory requirements
      of a vCPU for FPU buffers in half and avoids pointless memory copy
      operations.
 
      This also solves the so far unresolved problem of adding AMX support
      because the current FPU buffer handling of KVM inflicted a circular
      dependency between adding AMX support to the core and to KVM.  With
      the new scheme of switching fpstate AMX support can be added to the
      core code without affecting KVM.
 
    - Replace various variables with proper data structures so the extra
      information required for adding dynamically enabled FPU features (AMX)
      can be added in one place
 
  - Add AMX (Advanved Matrix eXtensions) support (finally):
 
     AMX is a large XSTATE component which is going to be available with
     Saphire Rapids XEON CPUs. The feature comes with an extra MSR (MSR_XFD)
     which allows to trap the (first) use of an AMX related instruction,
     which has two benefits:
 
     1) It allows the kernel to control access to the feature
 
     2) It allows the kernel to dynamically allocate the large register
        state buffer instead of burdening every task with the the extra 8K
        or larger state storage.
 
     It would have been great to gain this kind of control already with
     AVX512.
 
     The support comes with the following infrastructure components:
 
     1) arch_prctl() to
        - read the supported features (equivalent to XGETBV(0))
        - read the permitted features for a task
        - request permission for a dynamically enabled feature
 
        Permission is granted per process, inherited on fork() and cleared
        on exec(). The permission policy of the kernel is restricted to
        sigaltstack size validation, but the syscall obviously allows
        further restrictions via seccomp etc.
 
     2) A stronger sigaltstack size validation for sys_sigaltstack(2) which
        takes granted permissions and the potentially resulting larger
        signal frame into account. This mechanism can also be used to
        enforce factual sigaltstack validation independent of dynamic
        features to help with finding potential victims of the 2K
        sigaltstack size constant which is broken since AVX512 support was
        added.
 
     3) Exception handling for #NM traps to catch first use of a extended
        feature via a new cause MSR. If the exception was caused by the use
        of such a feature, the handler checks permission for that
        feature. If permission has not been granted, the handler sends a
        SIGILL like the #UD handler would do if the feature would have been
        disabled in XCR0. If permission has been granted, then a new fpstate
        which fits the larger buffer requirement is allocated.
 
        In the unlikely case that this allocation fails, the handler sends
        SIGSEGV to the task. That's not elegant, but unavoidable as the
        other discussed options of preallocation or full per task
        permissions come with their own set of horrors for kernel and/or
        userspace. So this is the lesser of the evils and SIGSEGV caused by
        unexpected memory allocation failures is not a fundamentally new
        concept either.
 
        When allocation succeeds, the fpstate properties are filled in to
        reflect the extended feature set and the resulting sizes, the
        fpu::fpstate pointer is updated accordingly and the trap is disarmed
        for this task permanently.
 
     4) Enumeration and size calculations
 
     5) Trap switching via MSR_XFD
 
        The XFD (eXtended Feature Disable) MSR is context switched with the
        same life time rules as the FPU register state itself. The mechanism
        is keyed off with a static key which is default disabled so !AMX
        equipped CPUs have zero overhead. On AMX enabled CPUs the overhead
        is limited by comparing the tasks XFD value with a per CPU shadow
        variable to avoid redundant MSR writes. In case of switching from a
        AMX using task to a non AMX using task or vice versa, the extra MSR
        write is obviously inevitable.
 
        All other places which need to be aware of the variable feature sets
        and resulting variable sizes are not affected at all because they
        retrieve the information (feature set, sizes) unconditonally from
        the fpstate properties.
 
     6) Enable the new AMX states
 
   Note, this is relatively new code despite the fact that AMX support is in
   the works for more than a year now.
 
   The big refactoring of the FPU code, which allowed to do a proper
   integration has been started exactly 3 weeks ago. Refactoring of the
   existing FPU code and of the original AMX patches took a week and has
   been subject to extensive review and testing. The only fallout which has
   not been caught in review and testing right away was restricted to AMX
   enabled systems, which is completely irrelevant for anyone outside Intel
   and their early access program. There might be dragons lurking as usual,
   but so far the fine grained refactoring has held up and eventual yet
   undetected fallout is bisectable and should be easily addressable before
   the 5.16 release. Famous last words...
 
   Many thanks to Chang Bae and Dave Hansen for working hard on this and
   also to the various test teams at Intel who reserved extra capacity to
   follow the rapid development of this closely which provides the
   confidence level required to offer this rather large update for inclusion
   into 5.16-rc1.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF/NkITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYodDkEADH4+/nN/QoSUHIuuha5Zptj3g2b16a
 /3TxT9fhwPen/kzMGsUk70s3iWJMA+I5dCfkSZexJ2hfhcRe9cBzZIa1HCawKwf3
 YCISTsO/M+LpeORuZ+TpfFLJKnxNr1SEOl+EYffGhq0AkCjifb9Cnr0JZuoMUzGU
 jpfJZ2bj28ri5lG812DtzSMBM9E3SAwgJv+GNjmZbxZKb9mAfhbAMdBUXHirX7Ej
 jmx6koQjYOKwYIW8w1BrdC270lUKQUyJTbQgdRkN9Mh/HnKyFixQ18JqGlgaV2cT
 EtYePUfTEdaHdAhUINLIlEug1MfOslHU+HyGsdywnoChNB4GHPQuePC5Tz60VeFN
 RbQ9aKcBUu8r95rjlnKtAtBijNMA4bjGwllVxNwJ/ZoA9RPv1SbDZ07RX3qTaLVY
 YhVQl8+shD33/W24jUTJv1kMMexpHXIlv0gyfMryzpwI7uzzmGHRPAokJdbYKctC
 dyMPfdE90rxTiMUdL/1IQGhnh3awjbyfArzUhHyQ++HyUyzCFh0slsO0CD18vUy8
 FofhCugGBhjuKw3XwLNQ+KsWURz5qHctSzBc3qMOSyqFHbAJCVRANkhsFvWJo2qL
 75+Z7OTRebtsyOUZIdq26r4roSxHrps3dupWTtN70HWx2NhQG1nLEw986QYiQu1T
 hcKvDmehQLrUvg==
 =x3WL
 -----END PGP SIGNATURE-----

Merge tag 'x86-fpu-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fpu updates from Thomas Gleixner:

 - Cleanup of extable fixup handling to be more robust, which in turn
   allows to make the FPU exception fixups more robust as well.

 - Change the return code for signal frame related failures from
   explicit error codes to a boolean fail/success as that's all what the
   calling code evaluates.

 - A large refactoring of the FPU code to prepare for adding AMX
   support:

      - Distangle the public header maze and remove especially the
        misnomed kitchen sink internal.h which is despite it's name
        included all over the place.

      - Add a proper abstraction for the register buffer storage (struct
        fpstate) which allows to dynamically size the buffer at runtime
        by flipping the pointer to the buffer container from the default
        container which is embedded in task_struct::tread::fpu to a
        dynamically allocated container with a larger register buffer.

      - Convert the code over to the new fpstate mechanism.

      - Consolidate the KVM FPU handling by moving the FPU related code
        into the FPU core which removes the number of exports and avoids
        adding even more export when AMX has to be supported in KVM.
        This also removes duplicated code which was of course
        unnecessary different and incomplete in the KVM copy.

      - Simplify the KVM FPU buffer handling by utilizing the new
        fpstate container and just switching the buffer pointer from the
        user space buffer to the KVM guest buffer when entering
        vcpu_run() and flipping it back when leaving the function. This
        cuts the memory requirements of a vCPU for FPU buffers in half
        and avoids pointless memory copy operations.

        This also solves the so far unresolved problem of adding AMX
        support because the current FPU buffer handling of KVM inflicted
        a circular dependency between adding AMX support to the core and
        to KVM. With the new scheme of switching fpstate AMX support can
        be added to the core code without affecting KVM.

      - Replace various variables with proper data structures so the
        extra information required for adding dynamically enabled FPU
        features (AMX) can be added in one place

 - Add AMX (Advanced Matrix eXtensions) support (finally):

   AMX is a large XSTATE component which is going to be available with
   Saphire Rapids XEON CPUs. The feature comes with an extra MSR
   (MSR_XFD) which allows to trap the (first) use of an AMX related
   instruction, which has two benefits:

    1) It allows the kernel to control access to the feature

    2) It allows the kernel to dynamically allocate the large register
       state buffer instead of burdening every task with the the extra
       8K or larger state storage.

   It would have been great to gain this kind of control already with
   AVX512.

   The support comes with the following infrastructure components:

    1) arch_prctl() to
        - read the supported features (equivalent to XGETBV(0))
        - read the permitted features for a task
        - request permission for a dynamically enabled feature

       Permission is granted per process, inherited on fork() and
       cleared on exec(). The permission policy of the kernel is
       restricted to sigaltstack size validation, but the syscall
       obviously allows further restrictions via seccomp etc.

    2) A stronger sigaltstack size validation for sys_sigaltstack(2)
       which takes granted permissions and the potentially resulting
       larger signal frame into account. This mechanism can also be used
       to enforce factual sigaltstack validation independent of dynamic
       features to help with finding potential victims of the 2K
       sigaltstack size constant which is broken since AVX512 support
       was added.

    3) Exception handling for #NM traps to catch first use of a extended
       feature via a new cause MSR. If the exception was caused by the
       use of such a feature, the handler checks permission for that
       feature. If permission has not been granted, the handler sends a
       SIGILL like the #UD handler would do if the feature would have
       been disabled in XCR0. If permission has been granted, then a new
       fpstate which fits the larger buffer requirement is allocated.

       In the unlikely case that this allocation fails, the handler
       sends SIGSEGV to the task. That's not elegant, but unavoidable as
       the other discussed options of preallocation or full per task
       permissions come with their own set of horrors for kernel and/or
       userspace. So this is the lesser of the evils and SIGSEGV caused
       by unexpected memory allocation failures is not a fundamentally
       new concept either.

       When allocation succeeds, the fpstate properties are filled in to
       reflect the extended feature set and the resulting sizes, the
       fpu::fpstate pointer is updated accordingly and the trap is
       disarmed for this task permanently.

    4) Enumeration and size calculations

    5) Trap switching via MSR_XFD

       The XFD (eXtended Feature Disable) MSR is context switched with
       the same life time rules as the FPU register state itself. The
       mechanism is keyed off with a static key which is default
       disabled so !AMX equipped CPUs have zero overhead. On AMX enabled
       CPUs the overhead is limited by comparing the tasks XFD value
       with a per CPU shadow variable to avoid redundant MSR writes. In
       case of switching from a AMX using task to a non AMX using task
       or vice versa, the extra MSR write is obviously inevitable.

       All other places which need to be aware of the variable feature
       sets and resulting variable sizes are not affected at all because
       they retrieve the information (feature set, sizes) unconditonally
       from the fpstate properties.

    6) Enable the new AMX states

   Note, this is relatively new code despite the fact that AMX support
   is in the works for more than a year now.

   The big refactoring of the FPU code, which allowed to do a proper
   integration has been started exactly 3 weeks ago. Refactoring of the
   existing FPU code and of the original AMX patches took a week and has
   been subject to extensive review and testing. The only fallout which
   has not been caught in review and testing right away was restricted
   to AMX enabled systems, which is completely irrelevant for anyone
   outside Intel and their early access program. There might be dragons
   lurking as usual, but so far the fine grained refactoring has held up
   and eventual yet undetected fallout is bisectable and should be
   easily addressable before the 5.16 release. Famous last words...

   Many thanks to Chang Bae and Dave Hansen for working hard on this and
   also to the various test teams at Intel who reserved extra capacity
   to follow the rapid development of this closely which provides the
   confidence level required to offer this rather large update for
   inclusion into 5.16-rc1

* tag 'x86-fpu-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (110 commits)
  Documentation/x86: Add documentation for using dynamic XSTATE features
  x86/fpu: Include vmalloc.h for vzalloc()
  selftests/x86/amx: Add context switch test
  selftests/x86/amx: Add test cases for AMX state management
  x86/fpu/amx: Enable the AMX feature in 64-bit mode
  x86/fpu: Add XFD handling for dynamic states
  x86/fpu: Calculate the default sizes independently
  x86/fpu/amx: Define AMX state components and have it used for boot-time checks
  x86/fpu/xstate: Prepare XSAVE feature table for gaps in state component numbers
  x86/fpu/xstate: Add fpstate_realloc()/free()
  x86/fpu/xstate: Add XFD #NM handler
  x86/fpu: Update XFD state where required
  x86/fpu: Add sanity checks for XFD
  x86/fpu: Add XFD state to fpstate
  x86/msr-index: Add MSRs for XFD
  x86/cpufeatures: Add eXtended Feature Disabling (XFD) feature bit
  x86/fpu: Reset permission and fpstate on exec()
  x86/fpu: Prepare fpu_clone() for dynamically enabled features
  x86/fpu/signal: Prepare for variable sigframe length
  x86/signal: Use fpu::__state_user_size for sigalt stack validation
  ...
2021-11-01 14:03:56 -07:00
Linus Torvalds
9a7e0a90a4 Scheduler updates:
- Revert the printk format based wchan() symbol resolution as it can leak
    the raw value in case that the symbol is not resolvable.
 
  - Make wchan() more robust and work with all kind of unwinders by
    enforcing that the task stays blocked while unwinding is in progress.
 
  - Prevent sched_fork() from accessing an invalid sched_task_group
 
  - Improve asymmetric packing logic
 
  - Extend scheduler statistics to RT and DL scheduling classes and add
    statistics for bandwith burst to the SCHED_FAIR class.
 
  - Properly account SCHED_IDLE entities
 
  - Prevent a potential deadlock when initial priority is assigned to a
    newly created kthread. A recent change to plug a race between cpuset and
    __sched_setscheduler() introduced a new lock dependency which is now
    triggered. Break the lock dependency chain by moving the priority
    assignment to the thread function.
 
  - Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.
 
  - Improve idle balancing in general and especially for NOHZ enabled
    systems.
 
  - Provide proper interfaces for live patching so it does not have to
    fiddle with scheduler internals.
 
  - Add cluster aware scheduling support.
 
  - A small set of tweaks for RT (irqwork, wait_task_inactive(), various
    scheduler options and delaying mmdrop)
 
  - The usual small tweaks and improvements all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF/OUkTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoR/5D/9ikdGNpKg9osNqJ3GjAmxsK6kVkB29
 iFe2k8pIpWDToWQf/wQRGih4Yj3Cl49QSnZcPIibh2/12EB1qrrW6iSPJkInz8Ec
 /1LS5/Vewn2OyoxyXZjdvGC5gTXEodSbIazASvX7nvdMeI4gsAsL5etzrMJirT/t
 aymqvr7zovvywrwMTQJrGjUMo9l4ewE8tafMNNhRu1BHU1U4ojM9yvThyRAAcmp7
 3Xy49A+Yq3IgrvYI4u8FMK5Zh08KaxSFjiLhePGm/bF+wSfYmWop2TP1jY05W2Uo
 ti8hfbJMUoFRYuMxAiEldkItnc0wV4M9PtWZZ/x+B71bs65Y4Zjt9cW+rxJv2+m1
 vzV31EsQwGnOti072dzWN4c/cZqngVXAjaNtErvDwJUr+Tw1ayv9KUvuodMQqZY6
 mu68bFUO2kV9EMe1CBOv51Uy1RGHyLj3rlNqrkw+Xp5ISE9Ad2vhUEiRp5bQx5Ci
 V/XFhGZkGUluh0vccrdFlNYZwhj8cZEzkOPCnPSeZ+bq8SyZE6xuHH/lTP1CJCOy
 s800rW1huM+kgV+zRN8adDkGXibAk9N3RtVGnQXmuEy8gB9LZmQg+JeM2wsc9B+6
 i0gdqZnsjNAfoK+BBAG4holxptSL8/eOJsFH8ZNIoxQ+iqooyPx9tFX7yXnRTBQj
 d2qWG7UvoseT+g==
 =fgtS
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Thomas Gleixner:

 - Revert the printk format based wchan() symbol resolution as it can
   leak the raw value in case that the symbol is not resolvable.

 - Make wchan() more robust and work with all kind of unwinders by
   enforcing that the task stays blocked while unwinding is in progress.

 - Prevent sched_fork() from accessing an invalid sched_task_group

 - Improve asymmetric packing logic

 - Extend scheduler statistics to RT and DL scheduling classes and add
   statistics for bandwith burst to the SCHED_FAIR class.

 - Properly account SCHED_IDLE entities

 - Prevent a potential deadlock when initial priority is assigned to a
   newly created kthread. A recent change to plug a race between cpuset
   and __sched_setscheduler() introduced a new lock dependency which is
   now triggered. Break the lock dependency chain by moving the priority
   assignment to the thread function.

 - Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.

 - Improve idle balancing in general and especially for NOHZ enabled
   systems.

 - Provide proper interfaces for live patching so it does not have to
   fiddle with scheduler internals.

 - Add cluster aware scheduling support.

 - A small set of tweaks for RT (irqwork, wait_task_inactive(), various
   scheduler options and delaying mmdrop)

 - The usual small tweaks and improvements all over the place

* tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
  sched/fair: Cleanup newidle_balance
  sched/fair: Remove sysctl_sched_migration_cost condition
  sched/fair: Wait before decaying max_newidle_lb_cost
  sched/fair: Skip update_blocked_averages if we are defering load balance
  sched/fair: Account update_blocked_averages in newidle_balance cost
  x86: Fix __get_wchan() for !STACKTRACE
  sched,x86: Fix L2 cache mask
  sched/core: Remove rq_relock()
  sched: Improve wake_up_all_idle_cpus() take #2
  irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT
  irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT
  irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support.
  sched/rt: Annotate the RT balancing logic irqwork as IRQ_WORK_HARD_IRQ
  sched: Add cluster scheduler level for x86
  sched: Add cluster scheduler level in core and related Kconfig for ARM64
  topology: Represent clusters of CPUs within a die
  sched: Disable -Wunused-but-set-variable
  sched: Add wrapper for get_wchan() to keep task blocked
  x86: Fix get_wchan() to support the ORC unwinder
  proc: Use task_is_running() for wchan in /proc/$pid/stat
  ...
2021-11-01 13:48:52 -07:00
Linus Torvalds
43aa0a195f objtool updates:
- Improve retpoline code patching by separating it from alternatives which
    reduces memory footprint and allows to do better optimizations in the
    actual runtime patching.
 
  - Add proper retpoline support for x86/BPF
 
  - Address noinstr warnings in x86/kvm, lockdep and paravirtualization code
 
  - Add support to handle pv_opsindirect calls in the noinstr analysis
 
  - Classify symbols upfront and cache the result to avoid redundant
    str*cmp() invocations.
 
  - Add a CFI hash to reduce memory consumption which also reduces runtime
    on a allyesconfig by ~50%
 
  - Adjust XEN code to make objtool handling more robust and as a side
    effect to prevent text fragmentation due to placement of the hypercall
    page.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF/GFgTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoc1JD/0Sz6seP2OUMxbMT3gCcFo9sMvYTdsM
 7WuGFbBbnCIo7g8JH7k0zRRBigptMp2eUtQXKkgaaIbWN4JbuVKf8KxN5/qXxLi4
 fJ12QnNTGH9N2jtzl5wKmpjaKJnnJMD9D10XwoR+T6gn6NHd+AgLEs7GxxuQUlgo
 eC9oEXhNHC8uNhiZc38EwfwmItI1bRgaLrnZWIL4rYGSMxfCK1/cEOpWrFfX9wmj
 /diB6oqMyPXZXMCtgpX7TniUr5XOTCcUkeO9mQv5bmyq/YM/8hrTbcVSJlsVYLvP
 EsBnUSHAcfLFiHXwa1RNiIGdbiPjbN+UYeXGAvqF58f3e5dTIHtN/UmWo7OH93If
 9rLMVNcMpsfPx7QRk2IxEPumLCkyfwjzfKrVDM6P6TKEIUzD1og4IK9gTlfykVsh
 56G5XiCOC/X2x8IMxKTLGuBiAVLFHXK/rSwoqhvNEWBFKDbP13QWs0LurBcW09Sa
 /kQI9pIBT1xFA/R+OY5Xy1cqNVVK1Gxmk8/bllCijA9pCFSCFM4hLZE5CevdrBCV
 h5SdqEK5hIlzFyypXfsCik/4p/+rfvlGfUKtFsPctxx29SPe+T0orx+l61jiWQok
 rZOflwMawK5lDuASHrvNHGJcWaTwoo3VcXMQDnQY0Wulc43J5IFBaPxkZzgyd+S1
 4lktHxatrCMUgw==
 =pfZi
 -----END PGP SIGNATURE-----

Merge tag 'objtool-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool updates from Thomas Gleixner:

 - Improve retpoline code patching by separating it from alternatives
   which reduces memory footprint and allows to do better optimizations
   in the actual runtime patching.

 - Add proper retpoline support for x86/BPF

 - Address noinstr warnings in x86/kvm, lockdep and paravirtualization
   code

 - Add support to handle pv_opsindirect calls in the noinstr analysis

 - Classify symbols upfront and cache the result to avoid redundant
   str*cmp() invocations.

 - Add a CFI hash to reduce memory consumption which also reduces
   runtime on a allyesconfig by ~50%

 - Adjust XEN code to make objtool handling more robust and as a side
   effect to prevent text fragmentation due to placement of the
   hypercall page.

* tag 'objtool-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
  bpf,x86: Respect X86_FEATURE_RETPOLINE*
  bpf,x86: Simplify computing label offsets
  x86,bugs: Unconditionally allow spectre_v2=retpoline,amd
  x86/alternative: Add debug prints to apply_retpolines()
  x86/alternative: Try inline spectre_v2=retpoline,amd
  x86/alternative: Handle Jcc __x86_indirect_thunk_\reg
  x86/alternative: Implement .retpoline_sites support
  x86/retpoline: Create a retpoline thunk array
  x86/retpoline: Move the retpoline thunk declarations to nospec-branch.h
  x86/asm: Fixup odd GEN-for-each-reg.h usage
  x86/asm: Fix register order
  x86/retpoline: Remove unused replacement symbols
  objtool,x86: Replace alternatives with .retpoline_sites
  objtool: Shrink struct instruction
  objtool: Explicitly avoid self modifying code in .altinstr_replacement
  objtool: Classify symbols
  objtool: Support pv_opsindirect calls for noinstr
  x86/xen: Rework the xen_{cpu,irq,mmu}_opsarrays
  x86/xen: Mark xen_force_evtchn_callback() noinstr
  x86/xen: Make irq_disable() noinstr
  ...
2021-11-01 13:24:43 -07:00
Linus Torvalds
5a47ebe98e Updates for the interrupt subsystem:
Core changes:
 
   - Prevent a potential deadlock when initial priority is assigned to a
     newly created interrupt thread. A recent change to plug a race between
     cpuset and __sched_setscheduler() introduced a new lock dependency
     which is now triggered. Break the lock dependency chain by moving the
     priority assignment to the thread function.
 
   - A couple of small updates to make the irq core RT safe.
 
   - Confine the irq_cpu_online/offline() API to the only left unfixable
     user Cavium Octeon so that it does not grow new usage.
 
   - A small documentation update
 
  Driver changes:
 
   - A large cross architecture rework to move irq_enter/exit() into the
     architecture code to make addressing the NOHZ_FULL/RCU issues simpler.
 
   - The obligatory new irq chip driver for Microchip EIC
 
   - Modularize a few irq chip drivers
 
   - Expand usage of devm_*() helpers throughout the driver code
 
   - The usual small fixes and improvements all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF+8BUTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWs2EACeNbL93aIFokd2/RllRSr4VvMjKNyW
 PpA0RYDOz1Jh4ldK+7b/EYapKgAkR3yyOtz+jyjRE7jsQK0pQeLtYNLd3cTzsD7K
 LCvl8rq6cbRqyFoSC15UKKNbQ/f+o/3LeGPoipr5NQZRMepxk2J/yBCNRXHvIbe6
 oLMQJUgw7KKtvCrCUX9OSei4F09T1qsNrIYb7QafP5+v0zndAT7uKNivWrKGFrsh
 Uk9epoH3hIkvQERkpmzwJEJaq6oyqhoYQy7ZRGayEPwIdCyivJGZrVX0mZk1LX58
 uc8u5grIslX9MqZEQWBweR5y7nISB494NGKmoCInu66U/+3DSOg3AGH2Rfw8PNFZ
 lMKdXzYoDgv2y6LeiLtTUKV4K1NBRXo0BhwSGbPw0o6C03/x003kG824Y+/naU75
 6q05BZSia1PagPV3e0UAm0A2Rnjj/5uso2fEk0eGBSGM27jf9SQcSE8DVrEiLRd1
 2N5uAXbMdfu4xACsEI1Uxu1KNOSQnUhBCy0X6Ppj1a083kLG7jg/126ebb05R8G4
 MF79PFt+xUPSzmuKc/xwCdANtW+zzoyjYl5w6mwELBJ9veNbPShokGBTN/qzjXKZ
 vdr3/pXx95lRAzFnGOnETesm3IyObruU4K8NbMKd2b+eYa0w1WuZCKnutGLfsqxg
 byhCEw459e3P2g==
 =r6ln
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "Updates for the interrupt subsystem:

  Core changes:

   - Prevent a potential deadlock when initial priority is assigned to a
     newly created interrupt thread. A recent change to plug a race
     between cpuset and __sched_setscheduler() introduced a new lock
     dependency which is now triggered. Break the lock dependency chain
     by moving the priority assignment to the thread function.

   - A couple of small updates to make the irq core RT safe.

   - Confine the irq_cpu_online/offline() API to the only left unfixable
     user Cavium Octeon so that it does not grow new usage.

   - A small documentation update

  Driver changes:

   - A large cross architecture rework to move irq_enter/exit() into the
     architecture code to make addressing the NOHZ_FULL/RCU issues
     simpler.

   - The obligatory new irq chip driver for Microchip EIC

   - Modularize a few irq chip drivers

   - Expand usage of devm_*() helpers throughout the driver code

   - The usual small fixes and improvements all over the place"

* tag 'irq-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  h8300: Fix linux/irqchip.h include mess
  dt-bindings: irqchip: renesas-irqc: Document r8a774e1 bindings
  MIPS: irq: Avoid an unused-variable error
  genirq: Hide irq_cpu_{on,off}line() behind a deprecated option
  irqchip/mips-gic: Get rid of the reliance on irq_cpu_online()
  MIPS: loongson64: Drop call to irq_cpu_offline()
  irq: remove handle_domain_{irq,nmi}()
  irq: remove CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY
  irq: riscv: perform irqentry in entry code
  irq: openrisc: perform irqentry in entry code
  irq: csky: perform irqentry in entry code
  irq: arm64: perform irqentry in entry code
  irq: arm: perform irqentry in entry code
  irq: add a (temporary) CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY
  irq: nds32: avoid CONFIG_HANDLE_DOMAIN_IRQ
  irq: arc: avoid CONFIG_HANDLE_DOMAIN_IRQ
  irq: add generic_handle_arch_irq()
  irq: unexport handle_irq_desc()
  irq: simplify handle_domain_{irq,nmi}()
  irq: mips: simplify do_domain_IRQ()
  ...
2021-11-01 13:09:10 -07:00
Joerg Roedel
52d96919d6 Merge branches 'apple/dart', 'arm/mediatek', 'arm/renesas', 'arm/smmu', 'arm/tegra', 'iommu/fixes', 'x86/amd', 'x86/vt-d' and 'core' into next 2021-10-31 22:26:53 +01:00
Linus Torvalds
ca5e83eddc * Fixes for s390 interrupt delivery
* Fixes for Xen emulator bugs showing up as debug kernel WARNs
 * Fix another issue with SEV/ES string I/O VMGEXITs
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmF6uGIUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNRagf/Srvk9lNcRh4cEzsczErKMyr3xOqA
 jgsTSqgl1ExJI9sBLMpVYBOFGILMaMSrhLPIltKPy0Bj/E+hw8WOQwPa44QjWlSD
 MAUxO1Nryt9Luc2L8uSd1c//g4fr4V1BhOaumk1lM14Q8EDfQBcDIMI2ZKueMU1+
 2Q+n8/AsG63jQIINwKNidof0dzRtbfcE30Wq/8QHttIPo5wt6l0YClOlOikqNY8N
 5+WSQFmuutHIXftq5Jb/Ldn/+HVukWZyZOEVwLnBpM9uBvIubNgcEakqvxsaVtAn
 FHdvnA+Bk99/Xuhl+wRLQo8ofzQIQ13RQv3HPArJAJv34oAJZx2rNObVlA==
 =6ofB
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:

 - Fixes for s390 interrupt delivery

 - Fixes for Xen emulator bugs showing up as debug kernel WARNs

 - Fix another issue with SEV/ES string I/O VMGEXITs

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Take srcu lock in post_kvm_run_save()
  KVM: SEV-ES: fix another issue with string I/O VMGEXITs
  KVM: x86/xen: Fix kvm_xen_has_interrupt() sleeping in kvm_vcpu_block()
  KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlock
  KVM: s390: preserve deliverable_mask in __airqs_kick_single_vcpu
  KVM: s390: clear kicked_mask before sleeping again
2021-10-31 11:19:02 -07:00
Paolo Bonzini
4e33868433 KVM/arm64 updates for Linux 5.16
- More progress on the protected VM front, now with the full
   fixed feature set as well as the limitation of some hypercalls
   after initialisation.
 
 - Cleanup of the RAZ/WI sysreg handling, which was pointlessly
   complicated
 
 - Fixes for the vgic placement in the IPA space, together with a
   bunch of selftests
 
 - More memcg accounting of the memory allocated on behalf of a guest
 
 - Timer and vgic selftests
 
 - Workarounds for the Apple M1 broken vgic implementation
 
 - KConfig cleanups
 
 - New kvmarm.mode=none option, for those who really dislike us
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmF7u5YPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpD6w8QAIKDLJCTqkxv5Vh4ZSmtXxg4gTZMBlg8oSQ8
 sVL639aqBvFe3A6Vmz6IwBm+NT7Sm1zxkuH9qHzVR1gmXq0oLYNrIuyrzRW8PvqO
 hIkSRRoVsf03755TmkxwR7/2jAFxb6FhEVAy6VWdQyI44orihIPvMp8aTIq+jvU+
 XoNGb/rPf9HpSUtvuaHYvZhSZBhoi5dRnkr33R1+VR69n7Axs8lm905xcl6Pt0a0
 QqYZWQvFu/BXPyNflG7LUsegRF/iiV2vNTbNNowkzlV5suqxBpJAp6ApDL/gWrHv
 ya/6cMqicSjBIkWnawhXY98w6/5xfzK4IV/zc00FNWOlUdVP89Thqrgc8EkigS9R
 BGcxFFqj41snr+ensSBBIkNtV+dBX52H3rUE0F9seiTXm8QWI86JobdeNadT8tUP
 TXdOeCUcA+cp4Ngln18lsbOEaBkPA5H1po1nUFPHbKnVOxnqXScB7E/xF6rAbryV
 m+Z+oidU7MyS/Ev/Da0ww/XFx7cs2ez9EgeQvjcdFAvUMqS6kcXEExvgGYlm+KRQ
 GBMKPLCNHKdflMANoSpol7MZUmPJ45XoWKW1rntj2r9X+oJW2Z2hEx32xrWDJdqK
 ixnbjog5kNZb0CjLGsUC90lo2hpRJecaLhAjgTLYaNC1QxGPrt92eat6gnwuMTBc
 mpADqi7w
 =qBAO
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 5.16

- More progress on the protected VM front, now with the full
  fixed feature set as well as the limitation of some hypercalls
  after initialisation.

- Cleanup of the RAZ/WI sysreg handling, which was pointlessly
  complicated

- Fixes for the vgic placement in the IPA space, together with a
  bunch of selftests

- More memcg accounting of the memory allocated on behalf of a guest

- Timer and vgic selftests

- Workarounds for the Apple M1 broken vgic implementation

- KConfig cleanups

- New kvmarm.mode=none option, for those who really dislike us
2021-10-31 02:28:48 -04:00
Borislav Petkov
2258a6fc33 irqchip updates for Linux 5.16
- A large cross-arch rework to move irq_enter()/irq_exit() into
   the arch code, and removing it from the generic irq code.
   Thanks to Mark Rutland for the huge effort!
 
 - A few irqchip drivers are made modular (broadcom, meson), because
   that's apparently a thing...
 
 - A new driver for the Microchip External Interrupt Controller
 
 - The irq_cpu_offline()/irq_cpu_online() API is now deprecated and
   can only be selected on the Cavium Octeon platform. Once this
   platform is removed, the API will be removed at the same time.
 
 - A sprinkle of devm_* helper, as people seem to love that.
 
 - The usual spattering of small fixes and minor improvements.
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmF7rnYPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDudEP/i3WmAcXQYKJpRz075M8S6PZ8BXeTKUe7WMK
 rrslOkxDqyQ2SVqMLII1xkyOWafC7BnRjexm/ASwrBsc6GyQha7B2YsKy1m/NEwy
 ZcnXCCIg71LpDrUyxbscFxB6s5OvUN0yv+a+WnEAmOXpD1x3S8x5tHmRUfsRGksR
 zOhKaYPLqgCiw3VHRuhEKFUA+CMjXxHhw3lJv6gPh6TRjdXQuJouau2dBzr7tQEd
 h9Jq2OatWXiwPr00hQDDILbdH4+fQYKJqsaaLNX0Pxexg2slRWHwrgA2o/w0tTVW
 99HOc9hN04QoLkDfyQis40L1YC7VOIr5OAqzUehdYELT8UsrZS288Rr6099n4M/Y
 x8Nzcg4eA+jVUz1VMEBA9qR45fKjEMcTAXyNAAYLsov/obSgGH/PSOYaunG2xvYq
 iiJBM/g506PTw2MRROqrH5oKiER3tTD65f5NM0mJONr3xEm9XT74m0JIodgVZ4QX
 0LMJytgetg0b+yZcFY25GhJ+2mGoYwB2eiZBVjE3FyLSs0epcuzogaKRi5axK4sN
 rvlAtgNZiOg7tzRqiPIQKSzO3dCyJjR86t5fd1cRBl/WPmywvA2Lkcgd09V2oyJe
 FEp1QllpgYw0a5+aIS+bdOUK63FLnLdEMas7WgSAAxA4/jjgP1p+SbytOD81psL0
 4r02YN2A
 =/NLR
 -----END PGP SIGNATURE-----

Merge tag 'irqchip-5.16' into irq/core

Merge irqchip updates for Linux 5.16 from Marc Zyngier:

- A large cross-arch rework to move irq_enter()/irq_exit() into
  the arch code, and removing it from the generic irq code.
  Thanks to Mark Rutland for the huge effort!

- A few irqchip drivers are made modular (broadcom, meson), because
  that's apparently a thing...

- A new driver for the Microchip External Interrupt Controller

- The irq_cpu_offline()/irq_cpu_online() API is now deprecated and
  can only be selected on the Cavium Octeon platform. Once this
  platform is removed, the API will be removed at the same time.

- A sprinkle of devm_* helper, as people seem to love that.

- The usual spattering of small fixes and minor improvements.

* tag 'irqchip-5.16': (912 commits)
  h8300: Fix linux/irqchip.h include mess
  dt-bindings: irqchip: renesas-irqc: Document r8a774e1 bindings
  MIPS: irq: Avoid an unused-variable error
  genirq: Hide irq_cpu_{on,off}line() behind a deprecated option
  irqchip/mips-gic: Get rid of the reliance on irq_cpu_online()
  MIPS: loongson64: Drop call to irq_cpu_offline()
  irq: remove handle_domain_{irq,nmi}()
  irq: remove CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY
  irq: riscv: perform irqentry in entry code
  irq: openrisc: perform irqentry in entry code
  irq: csky: perform irqentry in entry code
  irq: arm64: perform irqentry in entry code
  irq: arm: perform irqentry in entry code
  irq: add a (temporary) CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY
  irq: nds32: avoid CONFIG_HANDLE_DOMAIN_IRQ
  irq: arc: avoid CONFIG_HANDLE_DOMAIN_IRQ
  irq: add generic_handle_arch_irq()
  irq: unexport handle_irq_desc()
  irq: simplify handle_domain_{irq,nmi}()
  irq: mips: simplify do_domain_IRQ()
  ...

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211029083332.3680101-1-maz@kernel.org
2021-10-29 11:58:35 +02:00
Peter Zijlstra
87c87ecd00 bpf,x86: Respect X86_FEATURE_RETPOLINE*
Current BPF codegen doesn't respect X86_FEATURE_RETPOLINE* flags and
unconditionally emits a thunk call, this is sub-optimal and doesn't
match the regular, compiler generated, code.

Update the i386 JIT to emit code equal to what the compiler emits for
the regular kernel text (IOW. a plain THUNK call).

Update the x86_64 JIT to emit code similar to the result of compiler
and kernel rewrites as according to X86_FEATURE_RETPOLINE* flags.
Inlining RETPOLINE_AMD (lfence; jmp *%reg) and !RETPOLINE (jmp *%reg),
while doing a THUNK call for RETPOLINE.

This removes the hard-coded retpoline thunks and shrinks the generated
code. Leaving a single retpoline thunk definition in the kernel.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.614772675@infradead.org
2021-10-28 23:25:29 +02:00
Peter Zijlstra
7508500900 x86/alternative: Implement .retpoline_sites support
Rewrite retpoline thunk call sites to be indirect calls for
spectre_v2=off. This ensures spectre_v2=off is as near to a
RETPOLINE=n build as possible.

This is the replacement for objtool writing alternative entries to
ensure the same and achieves feature-parity with the previous
approach.

One noteworthy feature is that it relies on the thunks to be in
machine order to compute the register index.

Specifically, this does not yet address the Jcc __x86_indirect_thunk_*
calls generated by clang, a future patch will add this.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.232495794@infradead.org
2021-10-28 23:25:27 +02:00
Peter Zijlstra
1a6f74429c x86/retpoline: Create a retpoline thunk array
Stick all the retpolines in a single symbol and have the individual
thunks as inner labels, this should guarantee thunk order and layout.

Previously there were 16 (or rather 15 without rsp) separate symbols and
a toolchain might reasonably expect it could displace them however it
liked, with disregard for their relative position.

However, now they're part of a larger symbol. Any change to their
relative position would disrupt this larger _array symbol and thus not
be sound.

This is the same reasoning used for data symbols. On their own there
is no guarantee about their relative position wrt to one aonther, but
we're still able to do arrays because an array as a whole is a single
larger symbol.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.169659320@infradead.org
2021-10-28 23:25:27 +02:00
Peter Zijlstra
6fda8a3886 x86/retpoline: Move the retpoline thunk declarations to nospec-branch.h
Because it makes no sense to split the retpoline gunk over multiple
headers.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.106290934@infradead.org
2021-10-28 23:25:27 +02:00
Peter Zijlstra
b6d3d9944b x86/asm: Fixup odd GEN-for-each-reg.h usage
Currently GEN-for-each-reg.h usage leaves GEN defined, relying on any
subsequent usage to start with #undef, which is rude.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.041792350@infradead.org
2021-10-28 23:25:26 +02:00
Peter Zijlstra
a92ede2d58 x86/asm: Fix register order
Ensure the register order is correct; this allows for easy translation
between register number and trampoline and vice-versa.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120309.978573921@infradead.org
2021-10-28 23:25:26 +02:00
Peter Zijlstra
4fe79e710d x86/retpoline: Remove unused replacement symbols
Now that objtool no longer creates alternatives, these replacement
symbols are no longer needed, remove them.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120309.915051744@infradead.org
2021-10-28 23:25:26 +02:00
Tianyu Lan
faff44069f x86/hyperv: Add Write/Read MSR registers via ghcb page
Hyperv provides GHCB protocol to write Synthetic Interrupt
Controller MSR registers in Isolation VM with AMD SEV SNP
and these registers are emulated by hypervisor directly.
Hyperv requires to write SINTx MSR registers twice. First
writes MSR via GHCB page to communicate with hypervisor
and then writes wrmsr instruction to talk with paravisor
which runs in VMPL0. Guest OS ID MSR also needs to be set
via GHCB page.

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Link: https://lore.kernel.org/r/20211025122116.264793-7-ltykernel@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-10-28 11:22:38 +00:00
Tianyu Lan
810a521265 x86/hyperv: Add new hvcall guest address host visibility support
Add new hvcall guest address host visibility support to mark
memory visible to host. Call it inside set_memory_decrypted
/encrypted(). Add HYPERVISOR feature check in the
hv_is_isolation_supported() to optimize in non-virtualization
environment.

Acked-by: Dave Hansen <dave.hansen@intel.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Link: https://lore.kernel.org/r/20211025122116.264793-4-ltykernel@gmail.com
[ wei: fix conflicts with tip ]
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-10-28 11:21:33 +00:00
Tianyu Lan
0cc4f6d9f0 x86/hyperv: Initialize GHCB page in Isolation VM
Hyperv exposes GHCB page via SEV ES GHCB MSR for SNP guest
to communicate with hypervisor. Map GHCB page for all
cpus to read/write MSR register and submit hvcall request
via ghcb page.

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Link: https://lore.kernel.org/r/20211025122116.264793-2-ltykernel@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-10-28 10:47:52 +00:00
Wei Liu
e82f2069b5 Merge remote-tracking branch 'tip/x86/cc' into hyperv-next 2021-10-28 10:46:03 +00:00
Dave Airlie
970eae1560 Linux 5.15-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmF298ceHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGIJYH/1rsEFQQ6caeQdy1
 z9eFIe48DNM4l7bFk+qEj2UAbzPdahVJ299Mg5fW0n2CDemOc9/n0b9TxQ37YObi
 mOzu0xwJVupIxkyFMPQSSc2q8aLm67NSpJy08DsmaNses5hSvu8x15RPHLQTybjt
 SwtKns+jpCq79P1GWbrB5e5UkLb0VNoxNp4L1U4pMrYGcEkJUXbaxNY2V/JcXdM7
 Vtn+qN0T/J6V6QVftv0t8Ecj3bjEnmL3kZHaTaNg3dGeKRpCGyHc5lcBQ0cNFG6t
 vjZ9VbuhBzGI3TN2tHH5hpA1UXo7HPBBCwQqxF1jeGLGHULikYwZ3TAPWqL3QZqC
 9cxr9SY=
 =p75d
 -----END PGP SIGNATURE-----

BackMerge tag 'v5.15-rc7' into drm-next

The msm next tree is based on rc3, so let's just backmerge rc7 before pulling it in.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2021-10-28 14:59:38 +10:00
Chang S. Bae
2308ee57d9 x86/fpu/amx: Enable the AMX feature in 64-bit mode
Add the AMX state components in XFEATURE_MASK_USER_SUPPORTED and the
TILE_DATA component to the dynamic states and update the permission check
table accordingly.

This is only effective on 64 bit kernels as for 32bit kernels
XFEATURE_MASK_TILE is defined as 0.

TILE_DATA is caller-saved state and the only dynamic state. Add build time
sanity check to ensure the assumption that every dynamic feature is caller-
saved.

Make AMX state depend on XFD as it is dynamic feature.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-24-chang.seok.bae@intel.com
2021-10-26 10:53:03 +02:00
Chang S. Bae
eec2113eab x86/fpu/amx: Define AMX state components and have it used for boot-time checks
The XSTATE initialization uses check_xstate_against_struct() to sanity
check the size of XSTATE-enabled features. AMX is a XSAVE-enabled feature,
and its size is not hard-coded but discoverable at run-time via CPUID.

The AMX state is composed of state components 17 and 18, which are all user
state components. The first component is the XTILECFG state of a 64-byte
tile-related control register. The state component 18, called XTILEDATA,
contains the actual tile data, and the state size varies on
implementations. The architectural maximum, as defined in the CPUID(0x1d,
1): EAX[15:0], is a byte less than 64KB. The first implementation supports
8KB.

Check the XTILEDATA state size dynamically. The feature introduces the new
tile register, TMM. Define one register struct only and read the number of
registers from CPUID. Cross-check the overall size with CPUID again.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-21-chang.seok.bae@intel.com
2021-10-26 10:53:02 +02:00
Chang S. Bae
500afbf645 x86/fpu/xstate: Add fpstate_realloc()/free()
The fpstate embedded in struct fpu is the default state for storing the FPU
registers. It's sized so that the default supported features can be stored.
For dynamically enabled features the register buffer is too small.

The #NM handler detects first use of a feature which is disabled in the
XFD MSR. After handling permission checks it recalculates the size for
kernel space and user space state and invokes fpstate_realloc() which
tries to reallocate fpstate and install it.

Provide the allocator function which checks whether the current buffer size
is sufficient and if not allocates one. If allocation is successful the new
fpstate is initialized with the new features and sizes and the now enabled
features is removed from the task's XFD mask.

realloc_fpstate() uses vzalloc(). If use of this mechanism grows to
re-allocate buffers larger than 64KB, a more sophisticated allocation
scheme that includes purpose-built reclaim capability might be justified.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-19-chang.seok.bae@intel.com
2021-10-26 10:53:02 +02:00
Chang S. Bae
783e87b404 x86/fpu/xstate: Add XFD #NM handler
If the XFD MSR has feature bits set then #NM will be raised when user space
attempts to use an instruction related to one of these features.

When the task has no permissions to use that feature, raise SIGILL, which
is the same behavior as #UD.

If the task has permissions, calculate the new buffer size for the extended
feature set and allocate a larger fpstate. In the unlikely case that
vzalloc() fails, SIGSEGV is raised.

The allocation function will be added in the next step. Provide a stub
which fails for now.

  [ tglx: Updated serialization ]

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-18-chang.seok.bae@intel.com
2021-10-26 10:53:02 +02:00
Chang S. Bae
8bf26758ca x86/fpu: Add XFD state to fpstate
Add storage for XFD register state to struct fpstate. This will be used to
store the XFD MSR state. This will be used for switching the XFD MSR when
FPU content is restored.

Add a per-CPU variable to cache the current MSR value so the MSR has only
to be written when the values are different.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-15-chang.seok.bae@intel.com
2021-10-26 10:18:09 +02:00
Chang S. Bae
dae1bd5838 x86/msr-index: Add MSRs for XFD
XFD introduces two MSRs:

    - IA32_XFD to enable/disable a feature controlled by XFD

    - IA32_XFD_ERR to expose to the #NM trap handler which feature
      was tried to be used for the first time.

Both use the same xstate-component bitmap format, used by XCR0.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-14-chang.seok.bae@intel.com
2021-10-26 10:18:09 +02:00
Chang S. Bae
c351101678 x86/cpufeatures: Add eXtended Feature Disabling (XFD) feature bit
Intel's eXtended Feature Disable (XFD) feature is an extension of the XSAVE
architecture. XFD allows the kernel to enable a feature state in XCR0 and
to receive a #NM trap when a task uses instructions accessing that state.

This is going to be used to postpone the allocation of a larger XSTATE
buffer for a task to the point where it is actually using a related
instruction after the permission to use that facility has been granted.

XFD is not used by the kernel, but only applied to userspace. This is a
matter of policy as the kernel knows how a fpstate is reallocated and the
XFD state.

The compacted XSAVE format is adjustable for dynamic features. Make XFD
depend on XSAVES.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-13-chang.seok.bae@intel.com
2021-10-26 10:18:09 +02:00
Thomas Gleixner
9e798e9aa1 x86/fpu: Prepare fpu_clone() for dynamically enabled features
The default portion of the parent's FPU state is saved in a child task.
With dynamic features enabled, the non-default portion is not saved in a
child's fpstate because these register states are defined to be
caller-saved. The new task's fpstate is therefore the default buffer.

Fork inherits the permission of the parent.

Also, do not use memcpy() when TIF_NEED_FPU_LOAD is set because it is
invalid when the parent has dynamic features.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-11-chang.seok.bae@intel.com
2021-10-26 10:18:09 +02:00
Thomas Gleixner
23686ef25d x86/fpu: Add basic helpers for dynamically enabled features
To allow building up the infrastructure required to support dynamically
enabled FPU features, add:

 - XFEATURES_MASK_DYNAMIC

   This constant will hold xfeatures which can be dynamically enabled.

 - fpu_state_size_dynamic()

   A static branch for 64-bit and a simple 'return false' for 32-bit.

   This helper allows to add dynamic-feature-specific changes to common
   code which is shared between 32-bit and 64-bit without #ifdeffery.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-8-chang.seok.bae@intel.com
2021-10-26 10:18:09 +02:00
Chang S. Bae
db8268df09 x86/arch_prctl: Add controls for dynamic XSTATE components
Dynamically enabled XSTATE features are by default disabled for all
processes. A process has to request permission to use such a feature.

To support this implement a architecture specific prctl() with the options:

   - ARCH_GET_XCOMP_SUPP

     Copies the supported feature bitmap into the user space provided
     u64 storage. The pointer is handed in via arg2

   - ARCH_GET_XCOMP_PERM

     Copies the process wide permitted feature bitmap into the user space
     provided u64 storage. The pointer is handed in via arg2

   - ARCH_REQ_XCOMP_PERM

     Request permission for a feature set. A feature set can be mapped to a
     facility, e.g. AMX, and can require one or more XSTATE components to
     be enabled.

     The feature argument is the number of the highest XSTATE component
     which is required for a facility to work.

     The request argument is not a user supplied bitmap because that makes
     filtering harder (think seccomp) and even impossible because to
     support 32bit tasks the argument would have to be a pointer.

The permission mechanism works this way:

   Task asks for permission for a facility and kernel checks whether that's
   supported. If supported it does:

     1) Check whether permission has already been granted

     2) Compute the size of the required kernel and user space buffer
        (sigframe) size.

     3) Validate that no task has a sigaltstack installed
        which is smaller than the resulting sigframe size

     4) Add the requested feature bit(s) to the permission bitmap of
        current->group_leader->fpu and store the sizes in the group
        leaders fpu struct as well.

If that is successful then the feature is still not enabled for any of the
tasks. The first usage of a related instruction will result in a #NM
trap. The trap handler validates the permission bit of the tasks group
leader and if permitted it installs a larger kernel buffer and transfers
the permission and size info to the new fpstate container which makes all
the FPU functions which require per task information aware of the extended
feature set.

  [ tglx: Adopted to new base code, added missing serialization,
          massaged namings, comments and changelog ]

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-7-chang.seok.bae@intel.com
2021-10-26 10:18:09 +02:00
Thomas Gleixner
c33f0a81a2 x86/fpu: Add fpu_state_config::legacy_features
The upcoming prctl() which is required to request the permission for a
dynamically enabled feature will also provide an option to retrieve the
supported features. If the CPU does not support XSAVE, the supported
features would be 0 even when the CPU supports FP and SSE.

Provide separate storage for the legacy feature set to avoid that and fill
in the bits in the legacy init function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-6-chang.seok.bae@intel.com
2021-10-26 10:18:09 +02:00
Thomas Gleixner
6f6a7c09c4 x86/fpu: Add members to struct fpu to cache permission information
Dynamically enabled features can be requested by any thread of a running
process at any time. The request does neither enable the feature nor
allocate larger buffers. It just stores the permission to use the feature
by adding the features to the permission bitmap and by calculating the
required sizes for kernel and user space.

The reallocation of the kernel buffer happens when the feature is used
for the first time which is caught by an exception. The permission
bitmap is then checked and if the feature is permitted, then it becomes
fully enabled. If not, the task dies similarly to a task which uses an
undefined instruction.

The size information is precomputed to allow proper sigaltstack size checks
once the feature is permitted, but not yet in use because otherwise this
would open race windows where too small stacks could be installed causing
a later fail on signal delivery.

Initialize them to the default feature set and sizes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-5-chang.seok.bae@intel.com
2021-10-26 10:18:09 +02:00
Tianyu Lan
007faec014 x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV
Hyper-V needs to issue the GHCB HV call in order to read/write MSRs in
Isolation VMs. For that, expose sev_es_ghcb_hv_call().

The Hyper-V Isolation VMs are unenlightened guests and run a paravisor
at VMPL0 for communicating. GHCB pages are being allocated and set up
by that paravisor. Linux gets the GHCB page's physical address via
MSR_AMD64_SEV_ES_GHCB from the paravisor and should not change it.

Add a @set_ghcb_msr parameter to sev_es_ghcb_hv_call() to control
whether the function should set the GHCB's address prior to the call or
not and export that function for use by HyperV.

  [ bp: - Massage commit message
        - add a struct ghcb forward declaration to fix randconfig builds. ]

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20211025122116.264793-6-ltykernel@gmail.com
2021-10-25 18:11:42 +02:00
David Woodhouse
8228c77d8b KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlock
On the preemption path when updating a Xen guest's runstate times, this
lock is taken inside the scheduler rq->lock, which is a raw spinlock.
This was shown in a lockdep warning:

[   89.138354] =============================
[   89.138356] [ BUG: Invalid wait context ]
[   89.138358] 5.15.0-rc5+ #834 Tainted: G S        I E
[   89.138360] -----------------------------
[   89.138361] xen_shinfo_test/2575 is trying to lock:
[   89.138363] ffffa34a0364efd8 (&kvm->arch.pvclock_gtod_sync_lock){....}-{3:3}, at: get_kvmclock_ns+0x1f/0x130 [kvm]
[   89.138442] other info that might help us debug this:
[   89.138444] context-{5:5}
[   89.138445] 4 locks held by xen_shinfo_test/2575:
[   89.138447]  #0: ffff972bdc3b8108 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x77/0x6f0 [kvm]
[   89.138483]  #1: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_ioctl_run+0xdc/0x8b0 [kvm]
[   89.138526]  #2: ffff97331fdbac98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xff/0xbd0
[   89.138534]  #3: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_put+0x26/0x170 [kvm]
...
[   89.138695]  get_kvmclock_ns+0x1f/0x130 [kvm]
[   89.138734]  kvm_xen_update_runstate+0x14/0x90 [kvm]
[   89.138783]  kvm_xen_update_runstate_guest+0x15/0xd0 [kvm]
[   89.138830]  kvm_arch_vcpu_put+0xe6/0x170 [kvm]
[   89.138870]  kvm_sched_out+0x2f/0x40 [kvm]
[   89.138900]  __schedule+0x5de/0xbd0

Cc: stable@vger.kernel.org
Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com
Fixes: 30b5c851af ("KVM: x86/xen: Add support for vCPU runstate information")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <1b02a06421c17993df337493a68ba923f3bd5c0f.camel@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-25 08:14:38 -04:00
David Edmondson
e615e35589 KVM: x86: On emulation failure, convey the exit reason, etc. to userspace
Should instruction emulation fail, include the VM exit reason, etc. in
the emulation_failure data passed to userspace, in order that the VMM
can report it as a debugging aid when describing the failure.

Suggested-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210920103737.2696756-4-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-25 06:48:24 -04:00
David Edmondson
0a62a0319a KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info
Extend the get_exit_info static call to provide the reason for the VM
exit. Modify relevant trace points to use this rather than extracting
the reason in the caller.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210920103737.2696756-3-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-25 06:48:24 -04:00
Thomas Gleixner
582b01b6ab x86/fpu: Remove old KVM FPU interface
No more users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211022185313.074853631@linutronix.de
2021-10-23 17:05:19 +02:00
Thomas Gleixner
d69c1382e1 x86/kvm: Convert FPU handling to a single swap buffer
For the upcoming AMX support it's necessary to do a proper integration with
KVM. Currently KVM allocates two FPU structs which are used for saving the user
state of the vCPU thread and restoring the guest state when entering
vcpu_run() and doing the reverse operation before leaving vcpu_run().

With the new fpstate mechanism this can be reduced to one extra buffer by
swapping the fpstate pointer in current:🧵:fpu. This makes the
upcoming support for AMX and XFD simpler because then fpstate information
(features, sizes, xfd) are always consistent and it does not require any
nasty workarounds.

Convert the KVM FPU code over to this new scheme.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211022185313.019454292@linutronix.de
2021-10-23 16:13:29 +02:00
Thomas Gleixner
69f6ed1d14 x86/fpu: Provide infrastructure for KVM FPU cleanup
For the upcoming AMX support it's necessary to do a proper integration with
KVM. Currently KVM allocates two FPU structs which are used for saving the user
state of the vCPU thread and restoring the guest state when entering
vcpu_run() and doing the reverse operation before leaving vcpu_run().

With the new fpstate mechanism this can be reduced to one extra buffer by
swapping the fpstate pointer in current:🧵:fpu. This makes the
upcoming support for AMX and XFD simpler because then fpstate information
(features, sizes, xfd) are always consistent and it does not require any
nasty workarounds.

Provide:

  - An allocator which initializes the state properly

  - A replacement for the existing FPU swap mechanim

Aside of the reduced memory footprint, this also makes state switching
more efficient when TIF_FPU_NEED_LOAD is set. It does not require a
memcpy as the state is already correct in the to be swapped out fpstate.

The existing interfaces will be removed once KVM is converted over.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211022185312.954684740@linutronix.de
2021-10-23 14:50:19 +02:00
Thomas Gleixner
75c52dad5e x86/fpu: Prepare for sanitizing KVM FPU code
For the upcoming AMX support it's necessary to do a proper integration with
KVM. To avoid more nasty hackery in KVM which violate encapsulation extend
struct fpu and fpstate so the fpstate switching can be consolidated and
simplified.

Currently KVM allocates two FPU structs which are used for saving the user
state of the vCPU thread and restoring the guest state when entering
vcpu_run() and doing the reverse operation before leaving vcpu_run().

With the new fpstate mechanism this can be reduced to one extra buffer by
swapping the fpstate pointer in current:🧵:fpu. This makes the
upcoming support for AMX and XFD simpler because then fpstate information
(features, sizes, xfd) are always consistent and it does not require any
nasty workarounds.

Add fpu::__task_fpstate to save the regular fpstate pointer while the task
is inside vcpu_run(). Add some state fields to fpstate to indicate the
nature of the state.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211022185312.896403942@linutronix.de
2021-10-23 13:14:50 +02:00
Linus Torvalds
cd82c4a73b * Cache coherency fix for SEV live migration
* Fix for instruction emulation with PKU
 * fixes for rare delaying of interrupt delivery
 * fix for SEV-ES buffer overflow
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmFy2tsUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMrKggAq6JWuFGwJY8hq9hd/8SMvJUsmtmh
 ua7zKj8xi8w52yZNigCSllj3cOtpQ4pTpy9nhUBcXbGEWDNbZ9Tm6flYmvc6Hrt3
 iffXBtqri3ioSvQr908f+ceOAsX8ishA1ewbMKLmathGN6+GXa3KtqVAZ2t7z3Yp
 VX/I/xpViYGwhMPi5T1Yoj0SfVAEhO0ROodcGJXo2ddX/FVZTibqE/nONkXbgMP0
 gibf39N7JIti3oz+puLkFUnBKcdi/jy9yUjz01Rn315QrrFEsOsPhQGLR6Q24lgg
 7aarqbsoJQK6eJwNU/SxwpiZuj5lRsQVD0evkNd/JxDkGCa1T5cXUVILdg==
 =+1Ow
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more x86 kvm fixes from Paolo Bonzini:

 - Cache coherency fix for SEV live migration

 - Fix for instruction emulation with PKU

 - fixes for rare delaying of interrupt delivery

 - fix for SEV-ES buffer overflow

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SEV-ES: go over the sev_pio_data buffer in multiple passes if needed
  KVM: SEV-ES: keep INS functions together
  KVM: x86: remove unnecessary arguments from complete_emulator_pio_in
  KVM: x86: split the two parts of emulator_pio_in
  KVM: SEV-ES: clean up kvm_sev_es_ins/outs
  KVM: x86: leave vcpu->arch.pio.count alone in emulator_pio_in_out
  KVM: SEV-ES: rename guest_ins_data to sev_pio_data
  KVM: SEV: Flush cache on non-coherent systems before RECEIVE_UPDATE_DATA
  KVM: MMU: Reset mmu->pkru_mask to avoid stale data
  KVM: nVMX: promptly process interrupts delivered while in guest mode
  KVM: x86: check for interrupts before deciding whether to exit the fast path
2021-10-22 09:02:15 -10:00
Masami Hiramatsu
811b93ffaa x86/unwind: Compile kretprobe fixup code only if CONFIG_KRETPROBES=y
Compile kretprobe related stacktrace entry recovery code and
unwind_state::kr_cur field only when CONFIG_KRETPROBES=y.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-22 12:16:53 -04:00
Paolo Bonzini
ae095b16fc x86/sgx/virt: implement SGX_IOC_VEPC_REMOVE ioctl
For bare-metal SGX on real hardware, the hardware provides guarantees
SGX state at reboot.  For instance, all pages start out uninitialized.
The vepc driver provides a similar guarantee today for freshly-opened
vepc instances, but guests such as Windows expect all pages to be in
uninitialized state on startup, including after every guest reboot.

Some userspace implementations of virtual SGX would rather avoid having
to close and reopen the /dev/sgx_vepc file descriptor and re-mmap the
virtual EPC.  For example, they could sandbox themselves after the guest
starts and forbid further calls to open(), in order to mitigate exploits
from untrusted guests.

Therefore, add a ioctl that does this with EREMOVE.  Userspace can
invoke the ioctl to bring its vEPC pages back to uninitialized state.
There is a possibility that some pages fail to be removed if they are
SECS pages, and the child and SECS pages could be in separate vEPC
regions.  Therefore, the ioctl returns the number of EREMOVE failures,
telling userspace to try the ioctl again after it's done with all
vEPC regions.  A more verbose description of the correct usage and
the possible error conditions is documented in sgx.rst.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20211021201155.1523989-3-pbonzini@redhat.com
2021-10-22 08:32:12 -07:00
Sean Christopherson
187c8833de KVM: x86: Use rw_semaphore for APICv lock to allow vCPU parallelism
Use a rw_semaphore instead of a mutex to coordinate APICv updates so that
vCPUs responding to requests can take the lock for read and run in
parallel.  Using a mutex forces serialization of vCPUs even though
kvm_vcpu_update_apicv() only touches data local to that vCPU or is
protected by a different lock, e.g. SVM's ir_list_lock.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211022004927.1448382-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-22 11:20:16 -04:00
Paolo Bonzini
95e16b4792 KVM: SEV-ES: go over the sev_pio_data buffer in multiple passes if needed
The PIO scratch buffer is larger than a single page, and therefore
it is not possible to copy it in a single step to vcpu->arch/pio_data.
Bound each call to emulator_pio_in/out to a single page; keep
track of how many I/O operations are left in vcpu->arch.sev_pio_count,
so that the operation can be restarted in the complete_userspace_io
callback.

For OUT, this means that the previous kvm_sev_es_outs implementation
becomes an iterator of the loop, and we can consume the sev_pio_data
buffer before leaving to userspace.

For IN, instead, consuming the buffer and decreasing sev_pio_count
is always done in the complete_userspace_io callback, because that
is when the memcpy is done into sev_pio_data.

Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reported-by: Felix Wilhelm <fwilhelm@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-22 10:09:13 -04:00
Paolo Bonzini
b5998402e3 KVM: SEV-ES: rename guest_ins_data to sev_pio_data
We will be using this field for OUTS emulation as well, in case the
data that is pushed via OUTS spans more than one page.  In that case,
there will be a need to save the data pointer across exits to userspace.

So, change the name to something that refers to any kind of PIO.
Also spell out what it is used for, namely SEV-ES.

No functional change intended.

Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-22 10:01:26 -04:00
Borislav Petkov
9d48960414 x86/microcode: Use the firmware_loader built-in API
The microcode loader has been looping through __start_builtin_fw down to
__end_builtin_fw to look for possibly built-in firmware for microcode
updates.

Now that the firmware loader code has exported an API for looping
through the kernel's built-in firmware section, use it and drop the x86
implementation in favor.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211021155843.1969401-4-mcgrof@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-22 14:13:50 +02:00
Sean Christopherson
9dadfc4a61 KVM: x86: Add vendor name to kvm_x86_ops, use it for error messages
Paul pointed out the error messages when KVM fails to load are unhelpful
in understanding exactly what went wrong if userspace probes the "wrong"
module.

Add a mandatory kvm_x86_ops field to track vendor module names, kvm_intel
and kvm_amd, and use the name for relevant error message when KVM fails
to load so that the user knows which module failed to load.

Opportunistically tweak the "disabled by bios" error message to clarify
that _support_ was disabled, not that the module itself was magically
disabled by BIOS.

Suggested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211018183929.897461-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-22 05:19:28 -04:00
Wanpeng Li
540c7abe61 KVM: vPMU: Fill get_msr MSR_CORE_PERF_GLOBAL_OVF_CTRL w/ 0
SDM section 18.2.3 mentioned that:

  "IA32_PERF_GLOBAL_OVF_CTL MSR allows software to clear overflow indicator(s) of
   any general-purpose or fixed-function counters via a single WRMSR."

It is R/W mentioned by SDM, we read this msr on bare-metal during perf testing,
the value is always 0 for ICX/SKX boxes on hands. Let's fill get_msr
MSR_CORE_PERF_GLOBAL_OVF_CTRL w/ 0 as hardware behavior and drop
global_ovf_ctrl variable.

Tested-by: Like Xu <likexu@tencent.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1634631160-67276-2-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-22 05:19:28 -04:00
David Stevens
1e76a3ce0d KVM: cleanup allocation of rmaps and page tracking data
Unify the flags for rmaps and page tracking data, using a
single flag in struct kvm_arch and a single loop to go
over all the address spaces and memslots.  This avoids
code duplication between alloc_all_memslots_rmaps and
kvm_page_track_enable_mmu_write_tracking.

Signed-off-by: David Stevens <stevensd@chromium.org>
[This patch is the delta between David's v2 and v3, with conflicts
 fixed and my own commit message. - Paolo]
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-22 05:19:25 -04:00
Thomas Gleixner
d72c87018d x86/fpu/xstate: Move remaining xfeature helpers to core
Now that everything is mopped up, move all the helpers and prototypes into
the core header. They are not required by the outside.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211014230739.514095101@linutronix.de
2021-10-22 11:10:48 +02:00
Thomas Gleixner
eda32f4f93 x86/fpu: Rework restore_regs_from_fpstate()
xfeatures_mask_fpstate() is no longer valid when dynamically enabled
features come into play.

Rework restore_regs_from_fpstate() so it takes a constant mask which will
then be applied against the maximum feature set so that the restore
operation brings all features which are not in the xsave buffer xfeature
bitmap into init state.

This ensures that if the previous task used a dynamically enabled feature
that the task which restores has all unused components properly initialized.

Cleanup the last user of xfeatures_mask_fpstate() as well and remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211014230739.461348278@linutronix.de
2021-10-22 11:09:15 +02:00
Thomas Gleixner
daddee2473 x86/fpu: Mop up xfeatures_mask_uabi()
Use the new fpu_user_cfg to retrieve the information instead of
xfeatures_mask_uabi() which will be no longer correct when dynamically
enabled features become available.

Using fpu_user_cfg is appropriate when setting XCOMP_BV in the
init_fpstate since it has space allocated for "max_features". But,
normal fpstates might only have space for default xfeatures. Since
XRSTOR* derives the format of the XSAVE buffer from XCOMP_BV, this can
lead to XRSTOR reading out of bounds.

So when copying actively used fpstate, simply read the XCOMP_BV features
bits directly out of the fpstate instead.

This correction courtesy of Dave Hansen <dave.hansen@linux.intel.com>

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211014230739.408879849@linutronix.de
2021-10-22 11:04:46 +02:00
Thomas Gleixner
1c253ff228 x86/fpu: Move xstate feature masks to fpu_*_cfg
Move the feature mask storage to the kernel and user config
structs. Default and maximum feature set are the same for now.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211014230739.352041752@linutronix.de
2021-10-21 20:36:58 +02:00
Thomas Gleixner
578971f4e2 x86/fpu: Provide struct fpu_config
Provide a struct to store information about the maximum supported and the
default feature set and buffer sizes for both user and kernel space.

This allows quick retrieval of this information for the upcoming support
for dynamically enabled features.

 [ bp: Add vertical spacing between the struct members. ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211014230739.126107370@linutronix.de
2021-10-21 19:17:58 +02:00
Marcos Del Sol Vives
639475d434 x86/CPU: Add support for Vortex CPUs
DM&P devices were not being properly identified, which resulted in
unneeded Spectre/Meltdown mitigations being applied.

The manufacturer states that these devices execute always in-order and
don't support either speculative execution or branch prediction, so
they are not vulnerable to this class of attack. [1]

This is something I've personally tested by a simple timing analysis
on my Vortex86MX CPU, and can confirm it is true.

Add identification for some devices that lack the CPUID product name
call, so they appear properly on /proc/cpuinfo.

¹https://www.ssv-embedded.de/doks/infos/DMP_Ann_180108_Meltdown.pdf

 [ bp: Massage commit message. ]

Signed-off-by: Marcos Del Sol Vives <marcos@orca.pet>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211017094408.1512158-1-marcos@orca.pet
2021-10-21 15:49:07 +02:00
Thomas Gleixner
49e4eb4125 x86/fpu/xstate: Use fpstate for copy_uabi_to_xstate()
Prepare for dynamically enabled states per task. The function needs to
retrieve the features and sizes which are valid in a fpstate
context. Retrieve them from fpstate.

Move the function declarations to the core header as they are not
required anywhere else.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211013145323.233529986@linutronix.de
2021-10-21 14:24:14 +02:00
Thomas Gleixner
248452ce21 x86/fpu: Add size and mask information to fpstate
Add state size and feature mask information to the fpstate container. This
will be used for runtime checks with the upcoming support for dynamically
enabled features and dynamically sized buffers. That avoids conditionals
all over the place as the required information is accessible for both
default and extended buffers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211013145322.921388806@linutronix.de
2021-10-21 13:51:42 +02:00
Thomas Gleixner
2dd8eedc80 x86/process: Move arch_thread_struct_whitelist() out of line
In preparation for dynamically enabled FPU features move the function
out of line as the goal is to expose less and not more information.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211013145322.869001791@linutronix.de
2021-10-21 09:33:41 +02:00
Steven Rostedt (VMware)
0c0593b45c x86/ftrace: Make function graph use ftrace directly
We don't need special hook for graph tracer entry point,
but instead we can use graph_ops::func function to install
the return_hooker.

This moves the graph tracing setup _before_ the direct
trampoline prepares the stack, so the return_hooker will
be called when the direct trampoline is finished.

This simplifies the code, because we don't need to take into
account the direct trampoline setup when preparing the graph
tracer hooker and we can allow function graph tracer on entries
registered with direct trampoline.

Link: https://lkml.kernel.org/r/20211008091336.33616-4-jolsa@kernel.org

[fixed compile error reported by kernel test robot <lkp@intel.com>]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-20 23:44:43 -04:00
Thomas Gleixner
2f27b50342 x86/fpu: Remove fpu::state
All users converted. Remove it along with the sanity checks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211013145322.765063318@linutronix.de
2021-10-20 23:58:29 +02:00
Thomas Gleixner
c20942ce51 x86/fpu/core: Convert to fpstate
Convert the rest of the core code to the new register storage mechanism in
preparation for dynamically sized buffers.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211013145322.659456185@linutronix.de
2021-10-20 23:54:26 +02:00
Thomas Gleixner
cceb496420 x86/fpu: Convert tracing to fpstate
Convert FPU tracing code to the new register storage mechanism in
preparation for dynamically sized buffers.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211013145322.503327333@linutronix.de
2021-10-20 22:35:04 +02:00
Thomas Gleixner
087df48c29 x86/fpu: Replace KVMs xstate component clearing
In order to prepare for the support of dynamically enabled FPU features,
move the clearing of xstate components to the FPU core code.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: kvm@vger.kernel.org
Link: https://lkml.kernel.org/r/20211013145322.399567049@linutronix.de
2021-10-20 22:26:41 +02:00
Thomas Gleixner
18b3fa1ad1 x86/fpu: Convert restore_fpregs_from_fpstate() to struct fpstate
Convert restore_fpregs_from_fpstate() and related code to the new
register storage mechanism in preparation for dynamically sized buffers.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211013145322.347395546@linutronix.de
2021-10-20 22:26:38 +02:00
Thomas Gleixner
87d0e5be0f x86/fpu: Provide struct fpstate
New xfeatures will not longer be automatically stored in the regular XSAVE
buffer in thread_struct::fpu.

The kernel will provide the default sized buffer for storing the regular
features up to AVX512 in thread_struct::fpu and if a task requests to use
one of the new features then the register storage has to be extended.

The state will be accessed via a pointer in thread_struct::fpu which
defaults to the builtin storage and can be switched when extended storage
is required.

To avoid conditionals all over the code, create a new container for the
register storage which will gain other information, e.g. size, feature
masks etc., later. For now it just contains the register storage, which
gives it exactly the same layout as the exiting fpu::state.

Stick fpu::state and the new fpu::__fpstate into an anonymous union and
initialize the pointer. Add build time checks to validate that both are
at the same place and have the same size.

This allows step by step conversion of all users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211013145322.234458659@linutronix.de
2021-10-20 22:26:24 +02:00
Thomas Gleixner
bf5d004707 x86/fpu: Replace KVMs home brewed FPU copy to user
Similar to the copy from user function the FPU core has this already
implemented with all bells and whistles.

Get rid of the duplicated code and use the core functionality.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: kvm@vger.kernel.org
Link: https://lkml.kernel.org/r/20211015011539.244101845@linutronix.de
2021-10-20 22:17:17 +02:00
Thomas Gleixner
079ec41b22 x86/fpu: Provide a proper function for ex_handler_fprestore()
To make upcoming changes for support of dynamically enabled features
simpler, provide a proper function for the exception handler which removes
exposure of FPU internals.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011540.053515012@linutronix.de
2021-10-20 15:27:29 +02:00
Thomas Gleixner
b56d2795b2 x86/fpu: Replace the includes of fpu/internal.h
Now that the file is empty, fixup all references with the proper includes
and delete the former kitchen sink.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011540.001197214@linutronix.de
2021-10-20 15:27:29 +02:00
Thomas Gleixner
6415bb8092 x86/fpu: Mop up the internal.h leftovers
Move the global interfaces to api.h and the rest into the core.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011539.948837194@linutronix.de
2021-10-20 15:27:29 +02:00
Thomas Gleixner
0ae67cc34f x86/fpu: Remove internal.h dependency from fpu/signal.h
In order to remove internal.h make signal.h independent of it.

Include asm/fpu/xstate.h to fix a missing update_regset_xstate_info()
prototype, which is
Reported-by: kernel test robot <lkp@intel.com>

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011539.844565975@linutronix.de
2021-10-20 15:27:29 +02:00
Thomas Gleixner
90489f1dee x86/fpu: Move fpstate functions to api.h
Move function declarations which need to be globally available to api.h
where they belong.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011539.792363754@linutronix.de
2021-10-20 15:27:28 +02:00
Thomas Gleixner
d9d005f32a x86/fpu: Move mxcsr related code to core
No need to expose that to code which only needs the XCR0 accessors.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011539.740012411@linutronix.de
2021-10-20 15:27:28 +02:00
Thomas Gleixner
9848fb9683 x86/fpu: Move fpregs_restore_userregs() to core
Only used internally in the FPU core code.

While at it, convert to the percpu accessors which verify preemption is
disabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011539.686806639@linutronix.de
2021-10-20 15:27:28 +02:00
Thomas Gleixner
cdcb6fa14e x86/fpu: Make WARN_ON_FPU() private
No point in being in global headers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011539.628516182@linutronix.de
2021-10-20 15:27:28 +02:00
Thomas Gleixner
34002571cb x86/fpu: Move legacy ASM wrappers to core
Nothing outside the core code requires them.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011539.572439164@linutronix.de
2021-10-20 15:27:28 +02:00
Thomas Gleixner
df95b0f1aa x86/fpu: Move os_xsave() and os_xrstor() to core
Nothing outside the core code needs these.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011539.513368075@linutronix.de
2021-10-20 15:27:28 +02:00
Thomas Gleixner
b579d0c375 x86/fpu: Make os_xrstor_booting() private
It's only required in the xstate init code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011539.455836597@linutronix.de
2021-10-20 15:27:28 +02:00
Thomas Gleixner
d06241f52c x86/fpu: Clean up CPU feature tests
Further disintegration of internal.h:

Move the CPU feature tests to a core header and remove the unused one.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011539.401510559@linutronix.de
2021-10-20 15:27:27 +02:00
Thomas Gleixner
63e81807c1 x86/fpu: Move context switch and exit to user inlines into sched.h
internal.h is a kitchen sink which needs to get out of the way to prepare
for the upcoming changes.

Move the context switch and exit to user inlines into a separate header,
which is all that code needs.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011539.349132461@linutronix.de
2021-10-20 15:27:27 +02:00
Thomas Gleixner
9603445549 x86/fpu: Mark fpu__init_prepare_fx_sw_frame() as __init
No need to keep it around.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011539.296435736@linutronix.de
2021-10-20 15:27:27 +02:00
Thomas Gleixner
ea4d6938d4 x86/fpu: Replace KVMs home brewed FPU copy from user
Copying a user space buffer to the memory buffer is already available in
the FPU core. The copy mechanism in KVM lacks sanity checks and needs to
use cpuid() to lookup the offset of each component, while the FPU core has
this information cached.

Make the FPU core variant accessible for KVM and replace the home brewed
mechanism.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: kvm@vger.kernel.org
Link: https://lkml.kernel.org/r/20211015011539.134065207@linutronix.de
2021-10-20 15:27:27 +02:00
Thomas Gleixner
a0ff0611c2 x86/fpu: Move KVMs FPU swapping to FPU core
Swapping the host/guest FPU is directly fiddling with FPU internals which
requires 5 exports. The upcoming support of dynamically enabled states
would even need more.

Implement a swap function in the FPU core code and export that instead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Link: https://lkml.kernel.org/r/20211015011539.076072399@linutronix.de
2021-10-20 15:27:27 +02:00
Thomas Gleixner
126fe04018 x86/fpu: Cleanup xstate xcomp_bv initialization
No point in having this duplicated all over the place with needlessly
different defines.

Provide a proper initialization function which initializes user buffers
properly and make KVM use it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011538.897664678@linutronix.de
2021-10-20 15:27:26 +02:00
Thomas Gleixner
b50854eca0 x86/pkru: Remove useless include
PKRU code does not need anything from FPU headers. Include cpufeature.h
instead and fixup the resulting fallout in perf.

This is a preparation for FPU changes in order to prevent recursive include
hell.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011538.551522694@linutronix.de
2021-10-20 15:27:25 +02:00
Thomas Gleixner
9568bfb4f0 x86/fpu: Remove pointless argument from switch_fpu_finish()
Unused since the FPU switching rework.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011538.433135710@linutronix.de
2021-10-20 15:27:25 +02:00
Oliver Upton
828ca89628 KVM: x86: Expose TSC offset controls to userspace
To date, VMM-directed TSC synchronization and migration has been a bit
messy. KVM has some baked-in heuristics around TSC writes to infer if
the VMM is attempting to synchronize. This is problematic, as it depends
on host userspace writing to the guest's TSC within 1 second of the last
write.

A much cleaner approach to configuring the guest's views of the TSC is to
simply migrate the TSC offset for every vCPU. Offsets are idempotent,
and thus not subject to change depending on when the VMM actually
reads/writes values from/to KVM. The VMM can then read the TSC once with
KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
the guest is paused.

Cc: David Matlack <dmatlack@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210916181538.968978-8-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18 14:43:45 -04:00
Paolo Bonzini
869b44211a kvm: x86: protect masterclock with a seqcount
Protect the reference point for kvmclock with a seqcount, so that
kvmclock updates for all vCPUs can proceed in parallel.  Xen runstate
updates will also run in parallel and not bounce the kvmclock cacheline.

Of the variables that were protected by pvclock_gtod_sync_lock,
nr_vcpus_matched_tsc is different because it is updated outside
pvclock_update_vm_gtod_copy and read inside it.  Therefore, we
need to keep it protected by a spinlock.  In fact it must now
be a raw spinlock, because pvclock_update_vm_gtod_copy, being the
write-side of a seqcount, is non-preemptible.  Since we already
have tsc_write_lock which is a raw spinlock, we can just use
tsc_write_lock as the lock that protects the write-side of the
seqcount.

Co-developed-by: Oliver Upton <oupton@google.com>
Message-Id: <20210916181538.968978-6-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18 14:43:44 -04:00
Oliver Upton
c68dc1b577 KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
Handling the migration of TSCs correctly is difficult, in part because
Linux does not provide userspace with the ability to retrieve a (TSC,
realtime) clock pair for a single instant in time. In lieu of a more
convenient facility, KVM can report similar information in the kvm_clock
structure.

Provide userspace with a host TSC & realtime pair iff the realtime clock
is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
realtime value, advance the KVM clock by the amount of elapsed time. Do
not step the KVM clock backwards, though, as it is a monotonic
oscillator.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210916181538.968978-5-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18 14:43:44 -04:00
Fenghua Yu
00ecd54013 iommu/vt-d: Clean up unused PASID updating functions
update_pasid() and its call chain are currently unused in the tree because
Thomas disabled the ENQCMD feature. The feature will be re-enabled shortly
using a different approach and update_pasid() and its call chain will not
be used in the new approach.

Remove the useless functions.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210920192349.2602141-1-fenghua.yu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211014053839.727419-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-10-18 12:31:48 +02:00
Ingo Molnar
082f20b21d Merge branch 'x86/urgent' into x86/fpu, to resolve a conflict
Resolve the conflict between these commits:

   x86/fpu:      1193f408cd ("x86/fpu/signal: Change return type of __fpu_restore_sig() to boolean")

   x86/urgent:   d298b03506 ("x86/fpu: Restore the masking out of reserved MXCSR bits")
                 b2381acd3f ("x86/fpu: Mask out the invalid MXCSR bits properly")

 Conflicts:
        arch/x86/kernel/fpu/signal.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-10-16 15:17:46 +02:00
Tim Chen
66558b730f sched: Add cluster scheduler level for x86
There are x86 CPU architectures (e.g. Jacobsville) where L2 cahce is
shared among a cluster of cores instead of being exclusive to one
single core.

To prevent oversubscription of L2 cache, load should be balanced
between such L2 clusters, especially for tasks with no shared data.
On benchmark such as SPECrate mcf test, this change provides a boost
to performance especially on medium load system on Jacobsville.  on a
Jacobsville that has 24 Atom cores, arranged into 6 clusters of 4
cores each, the benchmark number is as follow:

 Improvement over baseline kernel for mcf_r
 copies		run time	base rate
 1		-0.1%		-0.2%
 6		25.1%		25.1%
 12		18.8%		19.0%
 24		0.3%		0.3%

So this looks pretty good. In terms of the system's task distribution,
some pretty bad clumping can be seen for the vanilla kernel without
the L2 cluster domain for the 6 and 12 copies case. With the extra
domain for cluster, the load does get evened out between the clusters.

Note this patch isn't an universal win as spreading isn't necessarily
a win, particually for those workload who can benefit from packing.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210924085104.44806-4-21cnbao@gmail.com
2021-10-15 11:25:16 +02:00
Kees Cook
42a20f86dc sched: Add wrapper for get_wchan() to keep task blocked
Having a stable wchan means the process must be blocked and for it to
stay that way while performing stack unwinding.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm]
Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Link: https://lkml.kernel.org/r/20211008111626.332092234@infradead.org
2021-10-15 11:25:14 +02:00
Linus Torvalds
c22ccc4a3e - A FPU fix to properly handle invalid MXCSR values: 32-bit masks them
out due to histerical reasons and 64-bit kernels reject them
 
 - A fix to clear X86_FEATURE_SMAP when support for is not config-enabled
 
 - Three fixes correcting misspelled Kconfig symbols used in code
 
 - Two resctrl object cleanup fixes
 
 - Yet another attempt at fixing the neverending saga of botched x86
 timers, this time because some incredibly smart hardware decides to turn
 off the HPET timer in a low power state - who cares if the OS is relying
 on it...
 
 - Check the full return value range of an SEV VMGEXIT call to determine
 whether it returned an error
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmFiqboACgkQEsHwGGHe
 VUpIFA//cLWfa1vvamCcLjW0lruQVzHrZesO4Cbti3Fyp2at/Dtwt9w/uZPu9NAa
 +sreJBdrkZfo9lmKW6/E1MmLT/YlLg8YHsylKn9d+XSdcy0qWXLYdVVm7bb4teJf
 XxRQfYNQrwfpjFNnt+7NUcaqte2zUo7K16CctJF5+E6SGUn+hlu6zK15tf6MMAM1
 TFHsQWEuRW5Mgc7eD734cNGDOJgzvb4IACn5BNfKR1+jD1ANfutytXjGqcveJ/sg
 lBoWMCU47vo5/uoW516oBK6PfQ/+s1OvYAx2G4DMQSC7WpEWpxnJUoszj9umu+jE
 VndS8jQ4WGXcVmfkkwUHbVxcJzsPEzZ/5m+nER9hrGOykKWTajzi2MirBHju5EKv
 xfYLqEJHNG9YulxKy2wIW0VRmXDE3wFZfaPAmQbLXud1KfzlC/EpEaloZSJSgqyG
 L4uOKk8CBumYJzgVCfTFAqqr1HhmeylYSxHmOUEzTm0sEJX2HuodGcl+sPI/LDPW
 MkjVYXq2sOUEVLmk5xyJIkbAUcK2X/Fzt3rKS4CVsjfzWRW67o1oopMy6ZrQ0o/h
 Dt/fHub/+Pke5sbB2+RiRsvq3aDftRkvaZK05pTiqlE9gFlKaCVwxDQqvmTnY0oa
 PkPzauXRC4qjNsdDMGHaiclm/fk/nlLM9vxXGJ+oTXP6snC4OhQ=
 =kKOw
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - A FPU fix to properly handle invalid MXCSR values: 32-bit masks them
   out due to historical reasons and 64-bit kernels reject them

 - A fix to clear X86_FEATURE_SMAP when support for is not
   config-enabled

 - Three fixes correcting misspelled Kconfig symbols used in code

 - Two resctrl object cleanup fixes

 - Yet another attempt at fixing the neverending saga of botched x86
   timers, this time because some incredibly smart hardware decides to
   turn off the HPET timer in a low power state - who cares if the OS is
   relying on it...

 - Check the full return value range of an SEV VMGEXIT call to determine
   whether it returned an error

* tag 'x86_urgent_for_v5.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Restore the masking out of reserved MXCSR bits
  x86/Kconfig: Correct reference to MWINCHIP3D
  x86/platform/olpc: Correct ifdef symbol to intended CONFIG_OLPC_XO15_SCI
  x86/entry: Clear X86_FEATURE_SMAP when CONFIG_X86_SMAP=n
  x86/entry: Correct reference to intended CONFIG_64_BIT
  x86/resctrl: Fix kfree() of the wrong type in domain_add_cpu()
  x86/resctrl: Free the ctrlval arrays when domain_setup_mon_state() fails
  x86/hpet: Use another crystalball to evaluate HPET usability
  x86/sev: Return an error on a returned non-zero SW_EXITINFO1[31:0]
2021-10-10 10:00:51 -07:00
Linus Torvalds
3946b46cab xen: branch for v5.15-rc5
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYWBSIwAKCRCAXGG7T9hj
 vrXxAP9na1EqRJ+SpWyvxHY1jMaIrbg1bgnOc+GsnWxU5liW5AEA4h1HjHtVtrzL
 3vweIS6u2fanrWlYML/daQ3r6EuLPQc=
 =iXsP
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.15b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - fix two minor issues in the Xen privcmd driver plus a cleanup patch
   for that driver

 - fix multiple issues related to running as PVH guest and some related
   earlyprintk fixes for other Xen guest types

 - fix an issue introduced in 5.15 the Xen balloon driver

* tag 'for-linus-5.15b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/balloon: fix cancelled balloon action
  xen/x86: adjust data placement
  x86/PVH: adjust function/data placement
  xen/x86: hook up xen_banner() also for PVH
  xen/x86: generalize preferred console model from PV to PVH Dom0
  xen/x86: make "earlyprintk=xen" work for HVM/PVH DomU
  xen/x86: allow "earlyprintk=xen" to work for PV Dom0
  xen/x86: make "earlyprintk=xen" work better for PVH Dom0
  xen/x86: allow PVH Dom0 without XEN_PV=y
  xen/x86: prevent PVH type from getting clobbered
  xen/privcmd: drop "pages" parameter from xen_remap_pfn()
  xen/privcmd: fix error handling in mmap-resource processing
  xen/privcmd: replace kcalloc() by kvcalloc() when allocating empty pages
2021-10-08 12:55:23 -07:00
Peter Zijlstra
b08cadbd3b Merge branch 'objtool/urgent'
Fixup conflicts.

# Conflicts:
#	tools/objtool/check.c
2021-10-07 00:40:17 +02:00
Mukul Joshi
f38ce910d8 x86/MCE/AMD: Export smca_get_bank_type symbol
Export smca_get_bank_type for use in the AMD GPU
driver to determine MCA bank while handling correctable
and uncorrectable errors in GPU UMC.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-06 15:53:24 -04:00
Borislav Petkov
541ac97186 x86/sev: Make the #VC exception stacks part of the default stacks storage
The size of the exception stacks was increased by the commit in Fixes,
resulting in stack sizes greater than a page in size. The #VC exception
handling was only mapping the first (bottom) page, resulting in an
SEV-ES guest failing to boot.

Make the #VC exception stacks part of the default exception stacks
storage and allocate them with a CONFIG_AMD_MEM_ENCRYPT=y .config. Map
them only when a SEV-ES guest has been detected.

Rip out the custom VC stacks mapping and storage code.

 [ bp: Steal and adapt Tom's commit message. ]

Fixes: 7fae4c24a2 ("x86: Increase exception stack sizes")
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Brijesh Singh <brijesh.singh@amd.com>
Link: https://lkml.kernel.org/r/YVt1IMjIs7pIZTRR@zn.tnic
2021-10-06 21:48:27 +02:00
Lukas Bulwahn
2c861f2b85 x86/entry: Correct reference to intended CONFIG_64_BIT
Commit in Fixes adds a condition with IS_ENABLED(CONFIG_64_BIT),
but the intended config item is called CONFIG_64BIT, as defined in
arch/x86/Kconfig.

Fortunately, scripts/checkkconfigsymbols.py warns:

64_BIT
Referencing files: arch/x86/include/asm/entry-common.h

Correct the reference to the intended config symbol.

Fixes: 662a022189 ("x86/entry: Fix AC assertion")
Suggested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20210803113531.30720-2-lukas.bulwahn@gmail.com
2021-10-06 18:46:02 +02:00
Lukas Bulwahn
6bf8a55d83 x86: Fix misspelled Kconfig symbols
Fix misspelled Kconfig symbols as detected by
scripts/checkkconfigsymbols.py.

 [ bp: Combine into a single patch. ]

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210803113531.30720-7-lukas.bulwahn@gmail.com
2021-10-05 21:48:30 +02:00
Jan Beulich
cae7d81a37 xen/x86: allow PVH Dom0 without XEN_PV=y
Decouple XEN_DOM0 from XEN_PV, converting some existing uses of XEN_DOM0
to a new XEN_PV_DOM0. (I'm not convinced all are really / should really
be PV-specific, but for starters I've tried to be conservative.)

For PVH Dom0 the hypervisor populates MADT with only x2APIC entries, so
without x2APIC support enabled in the kernel things aren't going to work
very well. (As opposed, DomU-s would only ever see LAPIC entries in MADT
as of now.) Note that this then requires PVH Dom0 to be 64-bit, as
X86_X2APIC depends on X86_64.

In the course of this xen_running_on_version_or_later() needs to be
available more broadly. Move it from a PV-specific to a generic file,
considering that what it does isn't really PV-specific at all anyway.

Note that xen/interface/version.h cannot be included on its own; in
enlighten.c, which uses SCHEDOP_* anyway, include xen/interface/sched.h
first to resolve the apparently sole missing type (xen_ulong_t).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>

Link: https://lore.kernel.org/r/983bb72f-53df-b6af-14bd-5e088bd06a08@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2021-10-05 08:35:56 +02:00
Borislav Petkov
c7419a6e1a Merge branch x86/cc into x86/core
Pick up dependent cc_platform_has() changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
2021-10-04 17:37:22 +02:00
Tom Lendacky
e9d1d2bb75 treewide: Replace the use of mem_encrypt_active() with cc_platform_has()
Replace uses of mem_encrypt_active() with calls to cc_platform_has() with
the CC_ATTR_MEM_ENCRYPT attribute.

Remove the implementation of mem_encrypt_active() across all arches.

For s390, since the default implementation of the cc_platform_has()
matches the s390 implementation of mem_encrypt_active(), cc_platform_has()
does not need to be implemented in s390 (the config option
ARCH_HAS_CC_PLATFORM is not set).

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210928191009.32551-9-bp@alien8.de
2021-10-04 11:47:24 +02:00
Tom Lendacky
6283f2effb x86/sev: Replace occurrences of sev_es_active() with cc_platform_has()
Replace uses of sev_es_active() with the more generic cc_platform_has()
using CC_ATTR_GUEST_STATE_ENCRYPT. If future support is added for other
memory encyrption techonologies, the use of CC_ATTR_GUEST_STATE_ENCRYPT
can be updated, as required.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210928191009.32551-8-bp@alien8.de
2021-10-04 11:47:09 +02:00
Tom Lendacky
4d96f91091 x86/sev: Replace occurrences of sev_active() with cc_platform_has()
Replace uses of sev_active() with the more generic cc_platform_has()
using CC_ATTR_GUEST_MEM_ENCRYPT. If future support is added for other
memory encryption technologies, the use of CC_ATTR_GUEST_MEM_ENCRYPT
can be updated, as required.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210928191009.32551-7-bp@alien8.de
2021-10-04 11:46:58 +02:00
Tom Lendacky
32cb4d02fb x86/sme: Replace occurrences of sme_active() with cc_platform_has()
Replace uses of sme_active() with the more generic cc_platform_has()
using CC_ATTR_HOST_MEM_ENCRYPT. If future support is added for other
memory encryption technologies, the use of CC_ATTR_HOST_MEM_ENCRYPT
can be updated, as required.

This also replaces two usages of sev_active() that are really geared
towards detecting if SME is active.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210928191009.32551-6-bp@alien8.de
2021-10-04 11:46:46 +02:00
Tom Lendacky
aa5a461171 x86/sev: Add an x86 version of cc_platform_has()
Introduce an x86 version of the cc_platform_has() function. This will be
used to replace vendor specific calls like sme_active(), sev_active(),
etc.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210928191009.32551-4-bp@alien8.de
2021-10-04 11:46:20 +02:00
Tom Lendacky
402fe0cb71 x86/ioremap: Selectively build arch override encryption functions
In preparation for other uses of the cc_platform_has() function
besides AMD's memory encryption support, selectively build the
AMD memory encryption architecture override functions only when
CONFIG_AMD_MEM_ENCRYPT=y. These functions are:

- early_memremap_pgprot_adjust()
- arch_memremap_can_ram_remap()

Additionally, routines that are only invoked by these architecture
override functions can also be conditionally built. These functions are:

- memremap_should_map_decrypted()
- memremap_is_efi_data()
- memremap_is_setup_data()
- early_memremap_is_setup_data()

And finally, phys_mem_access_encrypted() is conditionally built as well,
but requires a static inline version of it when CONFIG_AMD_MEM_ENCRYPT is
not set.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210928191009.32551-2-bp@alien8.de
2021-10-04 11:45:49 +02:00
Linus Torvalds
b2626f1e32 Small x86 fixes.
-----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmFXQUoUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMglgf/egh3zb9/+BUQWe0xWfhcINNzpsVk
 PJtiBmJc3nQLbZbTSLp63rouy1lNgR0s2DiMwP7G1u39OwW8W3LHMrBUSqF1F01+
 gntb4GGiRTiTPJI64K4z6ytORd3tuRarHq8TUIa2zvki9ZW5Obgkm1i1RsNMOo+s
 AOA7whhpS8e/a5fBbtbS9bTZb30PKTZmbW4oMjvO9Sw4Eb76IauqPSEtRPSuCAc7
 r7z62RTlm10Qk0JR3tW1iXMxTJHZk+tYPJ8pclUAWVX5bZqWa/9k8R0Z5i/miFiZ
 glW/y3R4+aUwIQV2v7V3Jx9MOKDhZxniMtnqZG/Hp9NVDtWIz37V/U37vw==
 =zQQ1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more kvm fixes from Paolo Bonzini:
 "Small x86 fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: selftests: Ensure all migrations are performed when test is affined
  KVM: x86: Swap order of CPUID entry "index" vs. "significant flag" checks
  ptp: Fix ptp_kvm_getcrosststamp issue for x86 ptp_kvm
  x86/kvmclock: Move this_cpu_pvti into kvmclock.h
  selftests: KVM: Don't clobber XMM register when read
  KVM: VMX: Fix a TSX_CTRL_CPUID_CLEAR field mask issue
2021-10-01 11:08:07 -07:00
David Stevens
deae4a10f1 KVM: x86: only allocate gfn_track when necessary
Avoid allocating the gfn_track arrays if nothing needs them. If there
are no external to KVM users of the API (i.e. no GVT-g), then page
tracking is only needed for shadow page tables. This means that when tdp
is enabled and there are no external users, then the gfn_track arrays
can be lazily allocated when the shadow MMU is actually used. This avoid
allocations equal to .05% of guest memory when nested virtualization is
not used, if the kernel is compiled without GVT-g.

Signed-off-by: David Stevens <stevensd@chromium.org>
Message-Id: <20210922045859.2011227-3-stevensd@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 03:44:58 -04:00