Commit Graph

3144 Commits

Author SHA1 Message Date
Linus Torvalds
63eb28bb14 ARM:
- Host driver for GICv5, the next generation interrupt controller for
   arm64, including support for interrupt routing, MSIs, interrupt
   translation and wired interrupts.
 
 - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
   GICv5 hardware, leveraging the legacy VGIC interface.
 
 - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
   userspace to disable support for SGIs w/o an active state on hardware
   that previously advertised it unconditionally.
 
 - Map supporting endpoints with cacheable memory attributes on systems
   with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
   maintenance on the address range.
 
 - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
   hypervisor to inject external aborts into an L2 VM and take traps of
   masked external aborts to the hypervisor.
 
 - Convert more system register sanitization to the config-driven
   implementation.
 
 - Fixes to the visibility of EL2 registers, namely making VGICv3 system
   registers accessible through the VGIC device instead of the ONE_REG
   vCPU ioctls.
 
 - Various cleanups and minor fixes.
 
 LoongArch:
 
 - Add stat information for in-kernel irqchip
 
 - Add tracepoints for CPUCFG and CSR emulation exits
 
 - Enhance in-kernel irqchip emulation
 
 - Various cleanups.
 
 RISC-V:
 
 - Enable ring-based dirty memory tracking
 
 - Improve perf kvm stat to report interrupt events
 
 - Delegate illegal instruction trap to VS-mode
 
 - MMU improvements related to upcoming nested virtualization
 
 s390x
 
 - Fixes
 
 x86:
 
 - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC,
   PIC, and PIT emulation at compile time.
 
 - Share device posted IRQ code between SVM and VMX and
   harden it against bugs and runtime errors.
 
 - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1)
   instead of O(n).
 
 - For MMIO stale data mitigation, track whether or not a vCPU has access to
   (host) MMIO based on whether the page tables have MMIO pfns mapped; using
   VFIO is prone to false negatives
 
 - Rework the MSR interception code so that the SVM and VMX APIs are more or
   less identical.
 
 - Recalculate all MSR intercepts from scratch on MSR filter changes,
   instead of maintaining shadow bitmaps.
 
 - Advertise support for LKGS (Load Kernel GS base), a new instruction
   that's loosely related to FRED, but is supported and enumerated
   independently.
 
 - Fix a user-triggerable WARN that syzkaller found by setting the vCPU
   in INIT_RECEIVED state (aka wait-for-SIPI), and then putting the vCPU
   into VMX Root Mode (post-VMXON).  Trying to detect every possible path
   leading to architecturally forbidden states is hard and even risks
   breaking userspace (if it goes from valid to valid state but passes
   through invalid states), so just wait until KVM_RUN to detect that
   the vCPU state isn't allowed.
 
 - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of
   APERF/MPERF reads, so that a "properly" configured VM can access
   APERF/MPERF.  This has many caveats (APERF/MPERF cannot be zeroed
   on vCPU creation or saved/restored on suspend and resume, or preserved
   over thread migration let alone VM migration) but can be useful whenever
   you're interested in letting Linux guests see the effective physical CPU
   frequency in /proc/cpuinfo.
 
 - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been
   created, as there's no known use case for changing the default
   frequency for other VM types and it goes counter to the very reason
   why the ioctl was added to the vm file descriptor.  And also, there
   would be no way to make it work for confidential VMs with a "secure"
   TSC, so kill two birds with one stone.
 
 - Dynamically allocation the shadow MMU's hashed page list, and defer
   allocating the hashed list until it's actually needed (the TDP MMU
   doesn't use the list).
 
 - Extract many of KVM's helpers for accessing architectural local APIC
   state to common x86 so that they can be shared by guest-side code for
   Secure AVIC.
 
 - Various cleanups and fixes.
 
 x86 (Intel):
 
 - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest.
   Failure to honor FREEZE_IN_SMM can leak host state into guests.
 
 - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to prevent
   L1 from running L2 with features that KVM doesn't support, e.g. BTF.
 
 x86 (AMD):
 
 - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the
   nested SVM MSRPM offsets tracker can't handle an MSR (which is pretty
   much a static condition and therefore should never happen, but still).
 
 - Fix a variety of flaws and bugs in the AVIC device posted IRQ code.
 
 - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware
   supports) instead of rejecting vCPU creation.
 
 - Extend enable_ipiv module param support to SVM, by simply leaving
   IsRunning clear in the vCPU's physical ID table entry.
 
 - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by
   erratum #1235, to allow (safely) enabling AVIC on such CPUs.
 
 - Request GA Log interrupts if and only if the target vCPU is blocking,
   i.e. only if KVM needs a notification in order to wake the vCPU.
 
 - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the
   vCPU's CPUID model.
 
 - Accept any SNP policy that is accepted by the firmware with respect to
   SMT and single-socket restrictions.  An incompatible policy doesn't put
   the kernel at risk in any way, so there's no reason for KVM to care.
 
 - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and
   use WBNOINVD instead of WBINVD when possible for SEV cache maintenance.
 
 - When reclaiming memory from an SEV guest, only do cache flushes on CPUs
   that have ever run a vCPU for the guest, i.e. don't flush the caches for
   CPUs that can't possibly have cache lines with dirty, encrypted data.
 
 Generic:
 
 - Rework irqbypass to track/match producers and consumers via an xarray
   instead of a linked list.  Using a linked list leads to O(n^2) insertion
   times, which is hugely problematic for use cases that create large
   numbers of VMs.  Such use cases typically don't actually use irqbypass,
   but eliminating the pointless registration is a future problem to
   solve as it likely requires new uAPI.
 
 - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *",
   to avoid making a simple concept unnecessarily difficult to understand.
 
 - Decouple device posted IRQs from VFIO device assignment, as binding a VM
   to a VFIO group is not a requirement for enabling device posted IRQs.
 
 - Clean up and document/comment the irqfd assignment code.
 
 - Disallow binding multiple irqfds to an eventfd with a priority waiter,
   i.e.  ensure an eventfd is bound to at most one irqfd through the entire
   host, and add a selftest to verify eventfd:irqfd bindings are globally
   unique.
 
 - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues
   related to private <=> shared memory conversions.
 
 - Drop guest_memfd's .getattr() implementation as the VFS layer will call
   generic_fillattr() if inode_operations.getattr is NULL.
 
 - Fix issues with dirty ring harvesting where KVM doesn't bound the
   processing of entries in any way, which allows userspace to keep KVM
   in a tight loop indefinitely.
 
 - Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking,
   now that KVM no longer uses assigned_device_count as a heuristic for
   either irqbypass usage or MDS mitigation.
 
 Selftests:
 
 - Fix a comment typo.
 
 - Verify KVM is loaded when getting any KVM module param so that attempting
   to run a selftest without kvm.ko loaded results in a SKIP message about
   KVM not being loaded/enabled (versus some random parameter not existing).
 
 - Skip tests that hit EACCES when attempting to access a file, and rpint
   a "Root required?" help message.  In most cases, the test just needs to
   be run with elevated permissions.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmiKXMgUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMhMQf/QDhC/CP1aGXph2whuyeD2NMqPKiU
 9KdnDNST+ftPwjg9QxZ9mTaa8zeVz/wly6XlxD9OQHy+opM1wcys3k0GZAFFEEQm
 YrThgURdzEZ3nwJZgb+m0t4wjJQtpiFIBwAf7qq6z1VrqQBEmHXJ/8QxGuqO+BNC
 j5q/X+q6KZwehKI6lgFBrrOKWFaxqhnRAYfW6rGBxRXxzTJuna37fvDpodQnNceN
 zOiq+avfriUMArTXTqOteJNKU0229HjiPSnjILLnFQ+B3akBlwNG0jk7TMaAKR6q
 IZWG1EIS9q1BAkGXaw6DE1y6d/YwtXCR5qgAIkiGwaPt5yj9Oj6kRN2Ytw==
 =j2At
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Host driver for GICv5, the next generation interrupt controller for
     arm64, including support for interrupt routing, MSIs, interrupt
     translation and wired interrupts

   - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
     GICv5 hardware, leveraging the legacy VGIC interface

   - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
     userspace to disable support for SGIs w/o an active state on
     hardware that previously advertised it unconditionally

   - Map supporting endpoints with cacheable memory attributes on
     systems with FEAT_S2FWB and DIC where KVM no longer needs to
     perform cache maintenance on the address range

   - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the
     guest hypervisor to inject external aborts into an L2 VM and take
     traps of masked external aborts to the hypervisor

   - Convert more system register sanitization to the config-driven
     implementation

   - Fixes to the visibility of EL2 registers, namely making VGICv3
     system registers accessible through the VGIC device instead of the
     ONE_REG vCPU ioctls

   - Various cleanups and minor fixes

  LoongArch:

   - Add stat information for in-kernel irqchip

   - Add tracepoints for CPUCFG and CSR emulation exits

   - Enhance in-kernel irqchip emulation

   - Various cleanups

  RISC-V:

   - Enable ring-based dirty memory tracking

   - Improve perf kvm stat to report interrupt events

   - Delegate illegal instruction trap to VS-mode

   - MMU improvements related to upcoming nested virtualization

  s390x

   - Fixes

  x86:

   - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O
     APIC, PIC, and PIT emulation at compile time

   - Share device posted IRQ code between SVM and VMX and harden it
     against bugs and runtime errors

   - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups
     O(1) instead of O(n)

   - For MMIO stale data mitigation, track whether or not a vCPU has
     access to (host) MMIO based on whether the page tables have MMIO
     pfns mapped; using VFIO is prone to false negatives

   - Rework the MSR interception code so that the SVM and VMX APIs are
     more or less identical

   - Recalculate all MSR intercepts from scratch on MSR filter changes,
     instead of maintaining shadow bitmaps

   - Advertise support for LKGS (Load Kernel GS base), a new instruction
     that's loosely related to FRED, but is supported and enumerated
     independently

   - Fix a user-triggerable WARN that syzkaller found by setting the
     vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting
     the vCPU into VMX Root Mode (post-VMXON). Trying to detect every
     possible path leading to architecturally forbidden states is hard
     and even risks breaking userspace (if it goes from valid to valid
     state but passes through invalid states), so just wait until
     KVM_RUN to detect that the vCPU state isn't allowed

   - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling
     interception of APERF/MPERF reads, so that a "properly" configured
     VM can access APERF/MPERF. This has many caveats (APERF/MPERF
     cannot be zeroed on vCPU creation or saved/restored on suspend and
     resume, or preserved over thread migration let alone VM migration)
     but can be useful whenever you're interested in letting Linux
     guests see the effective physical CPU frequency in /proc/cpuinfo

   - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been
     created, as there's no known use case for changing the default
     frequency for other VM types and it goes counter to the very reason
     why the ioctl was added to the vm file descriptor. And also, there
     would be no way to make it work for confidential VMs with a
     "secure" TSC, so kill two birds with one stone

   - Dynamically allocation the shadow MMU's hashed page list, and defer
     allocating the hashed list until it's actually needed (the TDP MMU
     doesn't use the list)

   - Extract many of KVM's helpers for accessing architectural local
     APIC state to common x86 so that they can be shared by guest-side
     code for Secure AVIC

   - Various cleanups and fixes

  x86 (Intel):

   - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest.
     Failure to honor FREEZE_IN_SMM can leak host state into guests

   - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to
     prevent L1 from running L2 with features that KVM doesn't support,
     e.g. BTF

  x86 (AMD):

   - WARN and reject loading kvm-amd.ko instead of panicking the kernel
     if the nested SVM MSRPM offsets tracker can't handle an MSR (which
     is pretty much a static condition and therefore should never
     happen, but still)

   - Fix a variety of flaws and bugs in the AVIC device posted IRQ code

   - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware
     supports) instead of rejecting vCPU creation

   - Extend enable_ipiv module param support to SVM, by simply leaving
     IsRunning clear in the vCPU's physical ID table entry

   - Disable IPI virtualization, via enable_ipiv, if the CPU is affected
     by erratum #1235, to allow (safely) enabling AVIC on such CPUs

   - Request GA Log interrupts if and only if the target vCPU is
     blocking, i.e. only if KVM needs a notification in order to wake
     the vCPU

   - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to
     the vCPU's CPUID model

   - Accept any SNP policy that is accepted by the firmware with respect
     to SMT and single-socket restrictions. An incompatible policy
     doesn't put the kernel at risk in any way, so there's no reason for
     KVM to care

   - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and
     use WBNOINVD instead of WBINVD when possible for SEV cache
     maintenance

   - When reclaiming memory from an SEV guest, only do cache flushes on
     CPUs that have ever run a vCPU for the guest, i.e. don't flush the
     caches for CPUs that can't possibly have cache lines with dirty,
     encrypted data

  Generic:

   - Rework irqbypass to track/match producers and consumers via an
     xarray instead of a linked list. Using a linked list leads to
     O(n^2) insertion times, which is hugely problematic for use cases
     that create large numbers of VMs. Such use cases typically don't
     actually use irqbypass, but eliminating the pointless registration
     is a future problem to solve as it likely requires new uAPI

   - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a
     "void *", to avoid making a simple concept unnecessarily difficult
     to understand

   - Decouple device posted IRQs from VFIO device assignment, as binding
     a VM to a VFIO group is not a requirement for enabling device
     posted IRQs

   - Clean up and document/comment the irqfd assignment code

   - Disallow binding multiple irqfds to an eventfd with a priority
     waiter, i.e. ensure an eventfd is bound to at most one irqfd
     through the entire host, and add a selftest to verify eventfd:irqfd
     bindings are globally unique

   - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues
     related to private <=> shared memory conversions

   - Drop guest_memfd's .getattr() implementation as the VFS layer will
     call generic_fillattr() if inode_operations.getattr is NULL

   - Fix issues with dirty ring harvesting where KVM doesn't bound the
     processing of entries in any way, which allows userspace to keep
     KVM in a tight loop indefinitely

   - Kill off kvm_arch_{start,end}_assignment() and x86's associated
     tracking, now that KVM no longer uses assigned_device_count as a
     heuristic for either irqbypass usage or MDS mitigation

  Selftests:

   - Fix a comment typo

   - Verify KVM is loaded when getting any KVM module param so that
     attempting to run a selftest without kvm.ko loaded results in a
     SKIP message about KVM not being loaded/enabled (versus some random
     parameter not existing)

   - Skip tests that hit EACCES when attempting to access a file, and
     print a "Root required?" help message. In most cases, the test just
     needs to be run with elevated permissions"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits)
  Documentation: KVM: Use unordered list for pre-init VGIC registers
  RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map()
  RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs
  RISC-V: perf/kvm: Add reporting of interrupt events
  RISC-V: KVM: Enable ring-based dirty memory tracking
  RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap
  RISC-V: KVM: Delegate illegal instruction fault to VS mode
  RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs
  RISC-V: KVM: Factor-out g-stage page table management
  RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence
  RISC-V: KVM: Introduce struct kvm_gstage_mapping
  RISC-V: KVM: Factor-out MMU related declarations into separate headers
  RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect()
  RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range()
  RISC-V: KVM: Don't flush TLB when PTE is unchanged
  RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH
  RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize()
  RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init()
  RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value
  KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list
  ...
2025-07-30 17:14:01 -07:00
Linus Torvalds
6fb44438a5 arm64 updates for 6.17:
Perf and PMU updates:
 
  - Add support for new (v3) Hisilicon SLLC and DDRC PMUs
 
  - Add support for Arm-NI PMU integrations that share interrupts between
    clock domains within a given instance
 
  - Allow SPE to be configured with a lower sample period than the
    minimum recommendation advertised by PMSIDR_EL1.Interval
 
  - Add suppport for Arm's "Branch Record Buffer Extension" (BRBE)
 
  - Adjust the perf watchdog period according to cpu frequency changes
 
  - Minor driver fixes and cleanups
 
 Hardware features:
 
  - Support for MTE store-only checking (FEAT_MTE_STORE_ONLY)
 
  - Support for reporting the non-address bits during a synchronous MTE
    tag check fault (FEAT_MTE_TAGGED_FAR)
 
  - Optimise the TLBI when folding/unfolding contiguous PTEs on hardware
    with FEAT_BBM (break-before-make) level 2 and no TLB conflict aborts
 
 Software features:
 
  - Enable HAVE_LIVEPATCH after implementing arch_stack_walk_reliable()
    and using the text-poke API for late module relocations
 
  - Force VMAP_STACK always on and change arm64_efi_rt_init() to use
    arch_alloc_vmap_stack() in order to avoid KASAN false positives
 
 ACPI:
 
  - Improve SPCR handling and messaging on systems lacking an SPCR table
 
 Debug:
 
  - Simplify the debug exception entry path
 
  - Drop redundant DBG_MDSCR_* macros
 
 Kselftests:
 
  - Cleanups and improvements for SME, SVE and FPSIMD tests
 
 Miscellaneous:
 
  - Optimise loop to reduce redundant operations in contpte_ptep_get()
 
  - Remove ISB when resetting POR_EL0 during signal handling
 
  - Mark the kernel as tainted on SEA and SError panic
 
  - Remove redundant gcs_free() call
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmiDkgoACgkQa9axLQDI
 XvFucQ//bYugRP5/Sdlrq5eDKWBGi1HufYzwfDEBLc4S75Eu8mGL/tuThfu9yFn+
 qCowtt4U84HdWsZDTSVo6lym6v2vJUpGOMgXzepvJaFBRnqGv9X9NxH6RQO1LTnu
 Pm7rO+7I9tNpfuc7Zu9pHDggsJEw+WzVfmEF6WPSFlT9mUNv6NbSx4rbLQKU86Dm
 ouTqXaePEQZ5oiRXVasxyT0otGtiACD20WpgOtNjYGzsfUVwCf/C83V/2DLwwbhr
 9cW9lCtFxA/yFdQcA9ThRzWZ9Eo5LAHqjGIq00+zOjuzgDbBtcTT79gpChkhovIR
 FBIsWHd9j9i3nYxzf4V4eRKQnyqS3NQWv7g7uKFwNgARif1Zk0VJ77QIlAYk5xLI
 ENTRjLKz5WNGGnhdkeCvDlVyxX+OktgcVTp3vqRxAKCRahMMUqBrwxiM8RzVF37e
 yzkEQayL8F7uZqy9H7Sjn48UpHZux6frJ1bBQw1oEvR9QmAoAdqavPMSAYIOT3Zr
 ze4WIljq/cFr3kBPIFP5pK1e0qYMHXZpSKIm8MAv6y/7KmQuVbMjZthpuPbLSIw0
 Q7C0KalB8lToPIbO7qMni/he0dCN4K2+E1YHFTR+pzfcoLuW4rjSg7i8tqMLKMJ8
 H+SeGLyPtM5A6bdAPTTpqefcgUUe7064ENUqrGUpDEynGXA7boE=
 =5h1C
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "A quick summary: perf support for Branch Record Buffer Extensions
  (BRBE), typical PMU hardware updates, small additions to MTE for
  store-only tag checking and exposing non-address bits to signal
  handlers, HAVE_LIVEPATCH enabled on arm64, VMAP_STACK forced on.

  There is also a TLBI optimisation on hardware that does not require
  break-before-make when changing the user PTEs between contiguous and
  non-contiguous.

  More details:

  Perf and PMU updates:

   - Add support for new (v3) Hisilicon SLLC and DDRC PMUs

   - Add support for Arm-NI PMU integrations that share interrupts
     between clock domains within a given instance

   - Allow SPE to be configured with a lower sample period than the
     minimum recommendation advertised by PMSIDR_EL1.Interval

   - Add suppport for Arm's "Branch Record Buffer Extension" (BRBE)

   - Adjust the perf watchdog period according to cpu frequency changes

   - Minor driver fixes and cleanups

  Hardware features:

   - Support for MTE store-only checking (FEAT_MTE_STORE_ONLY)

   - Support for reporting the non-address bits during a synchronous MTE
     tag check fault (FEAT_MTE_TAGGED_FAR)

   - Optimise the TLBI when folding/unfolding contiguous PTEs on
     hardware with FEAT_BBM (break-before-make) level 2 and no TLB
     conflict aborts

  Software features:

   - Enable HAVE_LIVEPATCH after implementing arch_stack_walk_reliable()
     and using the text-poke API for late module relocations

   - Force VMAP_STACK always on and change arm64_efi_rt_init() to use
     arch_alloc_vmap_stack() in order to avoid KASAN false positives

  ACPI:

   - Improve SPCR handling and messaging on systems lacking an SPCR
     table

  Debug:

   - Simplify the debug exception entry path

   - Drop redundant DBG_MDSCR_* macros

  Kselftests:

   - Cleanups and improvements for SME, SVE and FPSIMD tests

  Miscellaneous:

   - Optimise loop to reduce redundant operations in contpte_ptep_get()

   - Remove ISB when resetting POR_EL0 during signal handling

   - Mark the kernel as tainted on SEA and SError panic

   - Remove redundant gcs_free() call"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
  arm64/gcs: task_gcs_el0_enable() should use passed task
  arm64: Kconfig: Keep selects somewhat alphabetically ordered
  arm64: signal: Remove ISB when resetting POR_EL0
  kselftest/arm64: Handle attempts to disable SM on SME only systems
  kselftest/arm64: Fix SVE write data generation for SME only systems
  kselftest/arm64: Test SME on SME only systems in fp-ptrace
  kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace
  kselftest/arm64: Allow sve-ptrace to run on SME only systems
  arm64/mm: Drop redundant addr increment in set_huge_pte_at()
  kselftest/arm4: Provide local defines for AT_HWCAP3
  arm64: Mark kernel as tainted on SAE and SError panic
  arm64/gcs: Don't call gcs_free() when releasing task_struct
  drivers/perf: hisi: Support PMUs with no interrupt
  drivers/perf: hisi: Relax the event number check of v2 PMUs
  drivers/perf: hisi: Add support for HiSilicon SLLC v3 PMU driver
  drivers/perf: hisi: Use ACPI driver_data to retrieve SLLC PMU information
  drivers/perf: hisi: Add support for HiSilicon DDRC v3 PMU driver
  drivers/perf: hisi: Simplify the probe process for each DDRC version
  perf/arm-ni: Support sharing IRQs within an NI instance
  perf/arm-ni: Consolidate CPU affinity handling
  ...
2025-07-29 20:21:54 -07:00
Paolo Bonzini
314b40b3b6 KVM/arm64 changes for 6.17, round #1
- Host driver for GICv5, the next generation interrupt controller for
    arm64, including support for interrupt routing, MSIs, interrupt
    translation and wired interrupts.
 
  - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
    GICv5 hardware, leveraging the legacy VGIC interface.
 
  - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
    userspace to disable support for SGIs w/o an active state on hardware
    that previously advertised it unconditionally.
 
  - Map supporting endpoints with cacheable memory attributes on systems
    with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
    maintenance on the address range.
 
  - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
    hypervisor to inject external aborts into an L2 VM and take traps of
    masked external aborts to the hypervisor.
 
  - Convert more system register sanitization to the config-driven
    implementation.
 
  - Fixes to the visibility of EL2 registers, namely making VGICv3 system
    registers accessible through the VGIC device instead of the ONE_REG
    vCPU ioctls.
 
  - Various cleanups and minor fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iI0EABYIADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCaIezbRccb2xpdmVyLnVw
 dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFr/eAQDY5NIG5cR6ZcAWnPQLmGWpz2ou
 pq4Jhn9E/mGR3n5L1AEAsJpfLLpOsmnLBdwfbjmW59gGsa8k3i5tjWEOJ6yzAwk=
 =r+sp
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 changes for 6.17, round #1

 - Host driver for GICv5, the next generation interrupt controller for
   arm64, including support for interrupt routing, MSIs, interrupt
   translation and wired interrupts.

 - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
   GICv5 hardware, leveraging the legacy VGIC interface.

 - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
   userspace to disable support for SGIs w/o an active state on hardware
   that previously advertised it unconditionally.

 - Map supporting endpoints with cacheable memory attributes on systems
   with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
   maintenance on the address range.

 - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
   hypervisor to inject external aborts into an L2 VM and take traps of
   masked external aborts to the hypervisor.

 - Convert more system register sanitization to the config-driven
   implementation.

 - Fixes to the visibility of EL2 registers, namely making VGICv3 system
   registers accessible through the VGIC device instead of the ONE_REG
   vCPU ioctls.

 - Various cleanups and minor fixes.
2025-07-29 12:27:40 -04:00
Paolo Bonzini
f02b1bcc73 Merge tag 'kvm-x86-irqs-6.17' of https://github.com/kvm-x86/linux into HEAD
KVM IRQ changes for 6.17

 - Rework irqbypass to track/match producers and consumers via an xarray
   instead of a linked list.  Using a linked list leads to O(n^2) insertion
   times, which is hugely problematic for use cases that create large numbers
   of VMs.  Such use cases typically don't actually use irqbypass, but
   eliminating the pointless registration is a future problem to solve as it
   likely requires new uAPI.

 - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *",
   to avoid making a simple concept unnecessarily difficult to understand.

 - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC,
   and PIT emulation at compile time.

 - Drop x86's irq_comm.c, and move a pile of IRQ related code into irq.c.

 - Fix a variety of flaws and bugs in the AVIC device posted IRQ code.

 - Inhibited AVIC if a vCPU's ID is too big (relative to what hardware
   supports) instead of rejecting vCPU creation.

 - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning
   clear in the vCPU's physical ID table entry.

 - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by
   erratum #1235, to allow (safely) enabling AVIC on such CPUs.

 - Dedup x86's device posted IRQ code, as the vast majority of functionality
   can be shared verbatime between SVM and VMX.

 - Harden the device posted IRQ code against bugs and runtime errors.

 - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1)
   instead of O(n).

 - Generate GA Log interrupts if and only if the target vCPU is blocking, i.e.
   only if KVM needs a notification in order to wake the vCPU.

 - Decouple device posted IRQs from VFIO device assignment, as binding a VM to
   a VFIO group is not a requirement for enabling device posted IRQs.

 - Clean up and document/comment the irqfd assignment code.

 - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e.
   ensure an eventfd is bound to at most one irqfd through the entire host,
   and add a selftest to verify eventfd:irqfd bindings are globally unique.
2025-07-29 08:35:46 -04:00
Linus Torvalds
8e736a2eea hardening updates for v6.17-rc1
- Introduce and start using TRAILING_OVERLAP() helper for fixing
   embedded flex array instances (Gustavo A. R. Silva)
 
 - mux: Convert mux_control_ops to a flex array member in mux_chip
   (Thorsten Blum)
 
 - string: Group str_has_prefix() and strstarts() (Andy Shevchenko)
 
 - Remove KCOV instrumentation from __init and __head (Ritesh Harjani,
   Kees Cook)
 
 - Refactor and rename stackleak feature to support Clang
 
 - Add KUnit test for seq_buf API
 
 - Fix KUnit fortify test under LTO
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaIfUkgAKCRA2KwveOeQk
 uypLAP92r6f47sWcOw/5B9aVffX6Bypsb7dqBJQpCNxI5U1xcAEAiCrZ98UJyOeQ
 JQgnXd4N67K4EsS2JDc+FutRn3Yi+A8=
 =+5Bq
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:

 - Introduce and start using TRAILING_OVERLAP() helper for fixing
   embedded flex array instances (Gustavo A. R. Silva)

 - mux: Convert mux_control_ops to a flex array member in mux_chip
   (Thorsten Blum)

 - string: Group str_has_prefix() and strstarts() (Andy Shevchenko)

 - Remove KCOV instrumentation from __init and __head (Ritesh Harjani,
   Kees Cook)

 - Refactor and rename stackleak feature to support Clang

 - Add KUnit test for seq_buf API

 - Fix KUnit fortify test under LTO

* tag 'hardening-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (22 commits)
  sched/task_stack: Add missing const qualifier to end_of_stack()
  kstack_erase: Support Clang stack depth tracking
  kstack_erase: Add -mgeneral-regs-only to silence Clang warnings
  init.h: Disable sanitizer coverage for __init and __head
  kstack_erase: Disable kstack_erase for all of arm compressed boot code
  x86: Handle KCOV __init vs inline mismatches
  arm64: Handle KCOV __init vs inline mismatches
  s390: Handle KCOV __init vs inline mismatches
  arm: Handle KCOV __init vs inline mismatches
  mips: Handle KCOV __init vs inline mismatch
  powerpc/mm/book3s64: Move kfence and debug_pagealloc related calls to __init section
  configs/hardening: Enable CONFIG_INIT_ON_FREE_DEFAULT_ON
  configs/hardening: Enable CONFIG_KSTACK_ERASE
  stackleak: Split KSTACK_ERASE_CFLAGS from GCC_PLUGINS_CFLAGS
  stackleak: Rename stackleak_track_stack to __sanitizer_cov_stack_depth
  stackleak: Rename STACKLEAK to KSTACK_ERASE
  seq_buf: Introduce KUnit tests
  string: Group str_has_prefix() and strstarts()
  kunit/fortify: Add back "volatile" for sizeof() constants
  acpi: nfit: intel: avoid multiple -Wflex-array-member-not-at-end warnings
  ...
2025-07-28 17:16:12 -07:00
Oliver Upton
0d46e324c0 Merge branch 'kvm-arm64/vgic-v4-ctl' into kvmarm/next
* kvm-arm64/vgic-v4-ctl:
  : Userspace control of nASSGIcap, courtesy of Raghavendra Rao Ananta
  :
  : Allow userspace to decide if support for SGIs without an active state is
  : advertised to the guest, allowing VMs from GICv3-only hardware to be
  : migrated to to GICv4.1 capable machines.
  Documentation: KVM: arm64: Describe VGICv3 registers writable pre-init
  KVM: arm64: selftests: Add test for nASSGIcap attribute
  KVM: arm64: vgic-v3: Allow userspace to write GICD_TYPER2.nASSGIcap
  KVM: arm64: vgic-v3: Allow access to GICD_IIDR prior to initialization
  KVM: arm64: vgic-v3: Consolidate MAINT_IRQ handling
  KVM: arm64: Disambiguate support for vSGIs v. vLPIs

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-28 08:11:38 -07:00
Oliver Upton
a7f49a9bf4 Merge branch 'kvm-arm64/el2-reg-visibility' into kvmarm/next
* kvm-arm64/el2-reg-visibility:
  : Fixes to EL2 register visibility, courtesy of Marc Zyngier
  :
  :  - Expose EL2 VGICv3 registers via the VGIC attributes accessor, not the
  :    KVM_{GET,SET}_ONE_REG ioctls
  :
  :  - Condition visibility of FGT registers on the presence of FEAT_FGT in
  :    the VM
  KVM: arm64: selftest: vgic-v3: Add basic GICv3 sysreg userspace access test
  KVM: arm64: Enforce the sorting of the GICv3 system register table
  KVM: arm64: Clarify the check for reset callback in check_sysreg_table()
  KVM: arm64: vgic-v3: Fix ordering of ICH_HCR_EL2
  KVM: arm64: Document registers exposed via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS
  KVM: arm64: selftests: get-reg-list: Add base EL2 registers
  KVM: arm64: selftests: get-reg-list: Simplify feature dependency
  KVM: arm64: Advertise FGT2 registers to userspace
  KVM: arm64: Condition FGT registers on feature availability
  KVM: arm64: Expose GICv3 EL2 registers via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS
  KVM: arm64: Let GICv3 save/restore honor visibility attribute
  KVM: arm64: Define helper for ICH_VTR_EL2
  KVM: arm64: Define constant value for ICC_SRE_EL2
  KVM: arm64: Don't advertise ICH_*_EL2 registers through GET_ONE_REG
  KVM: arm64: Make RVBAR_EL2 accesses UNDEF

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-28 08:06:38 -07:00
Oliver Upton
d9b9fa2c32 Merge branch 'kvm-arm64/config-masks' into kvmarm/next
* kvm-arm64/config-masks:
  : More config-driven mask computation, courtesy of Marc Zyngier
  :
  : Converts more system registers to the config-driven computation of RESx
  : masks based on the advertised feature set
  KVM: arm64: Tighten the definition of FEAT_PMUv3p9
  KVM: arm64: Convert MDCR_EL2 to config-driven sanitisation
  KVM: arm64: Convert SCTLR_EL1 to config-driven sanitisation
  KVM: arm64: Convert TCR2_EL2 to config-driven sanitisation
  arm64: sysreg: Add THE/ASID2 controls to TCR2_ELx

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-28 08:03:08 -07:00
Oliver Upton
0a2c9d808a Merge branch 'kvm-arm64/misc' into kvmarm/next
* kvm-arm64/misc:
  : Miscellaneous fixes/cleanups for KVM/arm64
  :
  :  - Fixes for computing POE output permissions
  :
  :  - Return ENXIO for invalid VGIC device attribute
  :
  :  - String helper conversions
  arm64: kvm: trace_handle_exit: use string choices helper
  arm64: kvm: sys_regs: use string choices helper
  KVM: arm64: Follow specification when implementing WXN
  KVM: arm64: Remove the wi->{e0,}poe vs wr->{p,u}ov confusion
  KVM: arm64: vgic-its: Return -ENXIO to invalid KVM_DEV_ARM_VGIC_GRP_CTRL attrs

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26 08:54:47 -07:00
Oliver Upton
1f315e99bd Merge branch 'kvm-arm64/gcie-legacy' into kvmarm/next
* kvm-arm64/gcie-legacy:
  : Support for GICv3 emulation on GICv5, courtesy of Sascha Bischoff
  :
  : FEAT_GCIE_LEGACY adds the necessary hardware for GICv5 systems to
  : support the legacy GICv3 for VMs, including a backwards-compatible VGIC
  : implementation that we all know and love.
  :
  : As a starting point for GICv5 enablement in KVM, enable + use the
  : GICv3-compatible feature when running VMs on GICv5 hardware.
  KVM: arm64: gic-v5: Probe for GICv5
  KVM: arm64: gic-v5: Support GICv3 compat
  arm64/sysreg: Add ICH_VCTLR_EL2
  irqchip/gic-v5: Populate struct gic_kvm_info
  irqchip/gic-v5: Skip deactivate for forwarded PPI interrupts

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26 08:50:06 -07:00
Oliver Upton
3318e42b81 Merge branch 'kvm-arm64/doublefault2' into kvmarm/next
* kvm-arm64/doublefault2: (33 commits)
  : NV Support for FEAT_RAS + DoubleFault2
  :
  : Delegate the vSError context to the guest hypervisor when in a nested
  : state, including registers related to ESR propagation. Additionally,
  : catch up KVM's external abort infrastructure to the architecture,
  : implementing the effects of FEAT_DoubleFault2.
  :
  : This has some impact on non-nested guests, as SErrors deemed unmasked at
  : the time they're made pending are now immediately injected with an
  : emulated exception entry rather than using the VSE bit.
  KVM: arm64: Make RAS registers UNDEF when RAS isn't advertised
  KVM: arm64: Filter out HCR_EL2 bits when running in hypervisor context
  KVM: arm64: Check for SYSREGS_ON_CPU before accessing the CPU state
  KVM: arm64: Commit exceptions from KVM_SET_VCPU_EVENTS immediately
  KVM: arm64: selftests: Test ESR propagation for vSError injection
  KVM: arm64: Populate ESR_ELx.EC for emulated SError injection
  KVM: arm64: selftests: Catch up set_id_regs with the kernel
  KVM: arm64: selftests: Add SCTLR2_EL1 to get-reg-list
  KVM: arm64: selftests: Test SEAs are taken to SError vector when EASE=1
  KVM: arm64: selftests: Add basic SError injection test
  KVM: arm64: Don't retire MMIO instruction w/ pending (emulated) SError
  KVM: arm64: Advertise support for FEAT_DoubleFault2
  KVM: arm64: Advertise support for FEAT_SCTLR2
  KVM: arm64: nv: Enable vSErrors when HCRX_EL2.TMEA is set
  KVM: arm64: nv: Honor SError routing effects of SCTLR2_ELx.NMEA
  KVM: arm64: nv: Take "masked" aborts to EL2 when HCRX_EL2.TMEA is set
  KVM: arm64: Route SEAs to the SError vector when EASE is set
  KVM: arm64: nv: Ensure Address size faults affect correct ESR
  KVM: arm64: Factor out helper for selecting exception target EL
  KVM: arm64: Describe SCTLR2_ELx RESx masks
  ...

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26 08:47:22 -07:00
Raghavendra Rao Ananta
c652887a92 KVM: arm64: vgic-v3: Allow userspace to write GICD_TYPER2.nASSGIcap
KVM unconditionally advertises GICD_TYPER2.nASSGIcap (which internally
implies vSGIs) on GICv4.1 systems. Allow userspace to change whether a
VM supports the feature. Only allow changes prior to VGIC initialization
as at that point vPEs need to be allocated for the VM.

For convenience, bundle support for vLPIs and vSGIs behind this feature,
allowing userspace to control vPE allocation for VMs in environments
that may be constrained on vPE IDs.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250724062805.2658919-5-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26 08:45:52 -07:00
Oliver Upton
f26e6af757 KVM: arm64: vgic-v3: Allow access to GICD_IIDR prior to initialization
KVM allows userspace to write GICD_IIDR for backwards-compatibility with
older kernels, where new implementation revisions have new features.
Unfortunately this is allowed to happen at runtime, and ripping features
out from underneath a running guest is a terrible idea.

While we can't do anything about the ABI, prepare for more ID-like
registers by allowing access to GICD_IIDR prior to VGIC initialization.
Hoist initializaiton of the default value to kvm_vgic_create() and
discard the incorrect comment that assumed userspace could access the
register before initialization (until now).

Subsequent changes will allow the VMM to further provision the GIC
feature set, e.g. the presence of nASSGIcap.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250724062805.2658919-4-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26 08:37:45 -07:00
Oliver Upton
ef364c5b43 KVM: arm64: vgic-v3: Consolidate MAINT_IRQ handling
Consolidate the duplicated handling of the VGICv3 maintenance IRQ
attribute as a regular GICv3 attribute, as it is neither a register nor
a common attribute. As this is now handled separately from the VGIC
registers, the locking is relaxed to only acquire the intended
config_lock.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250724062805.2658919-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26 08:37:45 -07:00
Oliver Upton
82221a4e66 KVM: arm64: Disambiguate support for vSGIs v. vLPIs
vgic_supports_direct_msis() is a bit of a misnomer, as it returns true
if either vSGIs or vLPIs are supported. Pick it apart into a few
predicates and replace some open-coded checks for vSGIs, including an
opportunistic fix to always check if the CPUIF is capable of handling
vSGIs.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250724062805.2658919-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26 08:37:45 -07:00
Marc Zyngier
8af3e8ab09 KVM: arm64: Enforce the sorting of the GICv3 system register table
In order to avoid further embarassing bugs, enforce that the GICv3
sysreg table is actually sorted, just like all the other tables.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20250718111154.104029-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26 08:36:58 -07:00
Marc Zyngier
f5e6ebf285 KVM: arm64: Clarify the check for reset callback in check_sysreg_table()
check_sysreg_table() has a wonky 'is_32" parameter, which is really
an indication that we should enforce the presence of a reset helper.

Clean this up by naming the variable accordingly and inverting the
condition. Contrary to popular belief, system instructions don't
have a reset value (duh!), and therefore do not need to be checked
for reset (they escaped the check through luck...).

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20250718111154.104029-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26 08:36:57 -07:00
Marc Zyngier
0f3046c8f6 KVM: arm64: vgic-v3: Fix ordering of ICH_HCR_EL2
The sysreg tables are supposed to be sorted so that a binary search
can easily find them. However, ICH_HCR_EL2 is obviously at the wrong
spot.

Move it where it belongs.

Fixes: 9fe9663e47 ("KVM: arm64: Expose GICv3 EL2 registers via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20250718111154.104029-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26 08:36:57 -07:00
Catalin Marinas
5b1ae9de71 Merge branch 'for-next/feat_mte_store_only' into for-next/core
* for-next/feat_mte_store_only:
  : MTE feature to restrict tag checking to store only operations
  kselftest/arm64/mte: Add MTE_STORE_ONLY testcases
  kselftest/arm64/mte: Preparation for mte store only test
  kselftest/arm64/abi: Add MTE_STORE_ONLY feature hwcap test
  KVM: arm64: Expose MTE_STORE_ONLY feature to guest
  arm64/hwcaps: Add MTE_STORE_ONLY hwcaps
  arm64/kernel: Support store-only mte tag check
  prctl: Introduce PR_MTE_STORE_ONLY
  arm64/cpufeature: Add MTE_STORE_ONLY feature
2025-07-24 16:03:34 +01:00
Catalin Marinas
3ae8cef210 Merge branches 'for-next/livepatch', 'for-next/user-contig-bbml2', 'for-next/misc', 'for-next/acpi', 'for-next/debug-entry', 'for-next/feat_mte_tagged_far', 'for-next/kselftest', 'for-next/mdscr-cleanup' and 'for-next/vmap-stack', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf: (23 commits)
  drivers/perf: hisi: Support PMUs with no interrupt
  drivers/perf: hisi: Relax the event number check of v2 PMUs
  drivers/perf: hisi: Add support for HiSilicon SLLC v3 PMU driver
  drivers/perf: hisi: Use ACPI driver_data to retrieve SLLC PMU information
  drivers/perf: hisi: Add support for HiSilicon DDRC v3 PMU driver
  drivers/perf: hisi: Simplify the probe process for each DDRC version
  perf/arm-ni: Support sharing IRQs within an NI instance
  perf/arm-ni: Consolidate CPU affinity handling
  perf/cxlpmu: Fix typos in cxl_pmu.c comments and documentation
  perf/cxlpmu: Remove unintended newline from IRQ name format string
  perf/cxlpmu: Fix devm_kcalloc() argument order in cxl_pmu_probe()
  perf: arm_spe: Relax period restriction
  perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
  KVM: arm64: nvhe: Disable branch generation in nVHE guests
  arm64: Handle BRBE booting requirements
  arm64/sysreg: Add BRBE registers and fields
  perf/arm: Add missing .suppress_bind_attrs
  perf/arm-cmn: Reduce stack usage during discovery
  perf: imx9_perf: make the read-only array mask static const
  perf/arm-cmn: Broaden module description for wider interconnect support
  ...

* for-next/livepatch:
  : Support for HAVE_LIVEPATCH on arm64
  arm64: Kconfig: Keep selects somewhat alphabetically ordered
  arm64: Implement HAVE_LIVEPATCH
  arm64: stacktrace: Implement arch_stack_walk_reliable()
  arm64: stacktrace: Check kretprobe_find_ret_addr() return value
  arm64/module: Use text-poke API for late relocations.

* for-next/user-contig-bbml2:
  : Optimise the TLBI when folding/unfolding contigous PTEs on hardware with BBML2 and no TLB conflict aborts
  arm64/mm: Elide tlbi in contpte_convert() under BBML2
  iommu/arm: Add BBM Level 2 smmu feature
  arm64: Add BBM Level 2 cpu feature
  arm64: cpufeature: Introduce MATCH_ALL_EARLY_CPUS capability type

* for-next/misc:
  : Miscellaneous arm64 patches
  arm64/gcs: task_gcs_el0_enable() should use passed task
  arm64: signal: Remove ISB when resetting POR_EL0
  arm64/mm: Drop redundant addr increment in set_huge_pte_at()
  arm64: Mark kernel as tainted on SAE and SError panic
  arm64/gcs: Don't call gcs_free() when releasing task_struct
  arm64: fix unnecessary rebuilding when CONFIG_DEBUG_EFI=y
  arm64/mm: Optimize loop to reduce redundant operations of contpte_ptep_get
  arm64: pi: use 'targets' instead of extra-y in Makefile

* for-next/acpi:
  : Various ACPI arm64 changes
  ACPI: Suppress misleading SPCR console message when SPCR table is absent
  ACPI: Return -ENODEV from acpi_parse_spcr() when SPCR support is disabled

* for-next/debug-entry:
  : Simplify the debug exception entry path
  arm64: debug: remove debug exception registration infrastructure
  arm64: debug: split bkpt32 exception entry
  arm64: debug: split brk64 exception entry
  arm64: debug: split hardware watchpoint exception entry
  arm64: debug: split single stepping exception entry
  arm64: debug: refactor reinstall_suspended_bps()
  arm64: debug: split hardware breakpoint exception entry
  arm64: entry: Add entry and exit functions for debug exceptions
  arm64: debug: remove break/step handler registration infrastructure
  arm64: debug: call step handlers statically
  arm64: debug: call software breakpoint handlers statically
  arm64: refactor aarch32_break_handler()
  arm64: debug: clean up single_step_handler logic

* for-next/feat_mte_tagged_far:
  : Support for reporting the non-address bits during a synchronous MTE tag check fault
  kselftest/arm64/mte: Add mtefar tests on check_mmap_options
  kselftest/arm64/mte: Refactor check_mmap_option test
  kselftest/arm64/mte: Add verification for address tag in signal handler
  kselftest/arm64/mte: Add address tag related macro and function
  kselftest/arm64/mte: Check MTE_FAR feature is supported
  kselftest/arm64/mte: Register mte signal handler with SA_EXPOSE_TAGBITS
  kselftest/arm64: Add MTE_FAR hwcap test
  KVM: arm64: Expose FEAT_MTE_TAGGED_FAR feature to guest
  arm64: Report address tag when FEAT_MTE_TAGGED_FAR is supported
  arm64/cpufeature: Add FEAT_MTE_TAGGED_FAR feature

* for-next/kselftest:
  : Kselftest updates for arm64
  kselftest/arm64: Handle attempts to disable SM on SME only systems
  kselftest/arm64: Fix SVE write data generation for SME only systems
  kselftest/arm64: Test SME on SME only systems in fp-ptrace
  kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace
  kselftest/arm64: Allow sve-ptrace to run on SME only systems
  kselftest/arm4: Provide local defines for AT_HWCAP3
  kselftest/arm64: Specify SVE data when testing VL set in sve-ptrace
  kselftest/arm64: Fix test for streaming FPSIMD write in sve-ptrace
  kselftest/arm64: Fix check for setting new VLs in sve-ptrace
  kselftest/arm64: Convert tpidr2 test to use kselftest.h

* for-next/mdscr-cleanup:
  : Drop redundant DBG_MDSCR_* macros
  KVM: selftests: Change MDSCR_EL1 register holding variables as uint64_t
  arm64/debug: Drop redundant DBG_MDSCR_* macros

* for-next/vmap-stack:
  : Force VMAP_STACK on arm64
  arm64: remove CONFIG_VMAP_STACK checks from entry code
  arm64: remove CONFIG_VMAP_STACK checks from SDEI stack handling
  arm64: remove CONFIG_VMAP_STACK checks from stacktrace overflow logic
  arm64: remove CONFIG_VMAP_STACK conditionals from traps overflow stack
  arm64: remove CONFIG_VMAP_STACK conditionals from irq stack setup
  arm64: Remove CONFIG_VMAP_STACK conditionals from THREAD_SHIFT and THREAD_ALIGN
  arm64: efi: Remove CONFIG_VMAP_STACK check
  arm64: Mandate VMAP_STACK
  arm64: efi: Fix KASAN false positive for EFI runtime stack
  arm64/ptrace: Fix stack-out-of-bounds read in regs_get_kernel_stack_nth()
  arm64/gcs: Don't call gcs_free() during flush_gcs()
  arm64: Restrict pagetable teardown to avoid false warning
  docs: arm64: Fix ICC_SRE_EL2 register typo in booting.rst
2025-07-24 16:01:22 +01:00
Kuninori Morimoto
f509de1b0f arm64: kvm: trace_handle_exit: use string choices helper
We can use string choices helper, let's use it.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://lore.kernel.org/r/87o6ti5ksx.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-23 23:36:55 -07:00
Kuninori Morimoto
30a597e19f arm64: kvm: sys_regs: use string choices helper
We can use string choices helper, let's use it.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://lore.kernel.org/r/87pldy5ktb.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-23 23:36:37 -07:00
Marc Zyngier
5152977340 KVM: arm64: Follow specification when implementing WXN
The R_QXXPC and R_NPBXC rules have some interesting (and pretty
sharp) corners when defining the behaviour of of WXN at S1:

- when S1 overlay is enabled, WXN applies to the overlay and
  will remove W

- when S1 overlay is disabled, WXN applies to the base permissions
  and will remove X.

Today, we lumb the two together in a way that doesn't really match
the rules, making things awkward to follow what is happening, in
particular when overlays are enabled.

Split these two rules over two distinct paths, which makes things
a lot easier to read and validate against the architecture rules.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250701151648.754785-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-23 23:34:50 -07:00
Marc Zyngier
a508d5afb7 KVM: arm64: Remove the wi->{e0,}poe vs wr->{p,u}ov confusion
Some of the POE computation is a bit confused. Specifically, there
is an element of confusion between what wi->{e0,}poe an wr->{p,u}ov
actually represent.

- wi->{e0,}poe is an *input* to the walk, and indicates whether
  POE is enabled at EL0 or EL{1,2}

- wr->{p,u}ov is a *result* of the walk, and indicates whether
  overlays are enabled. Crutially, it is possible to have POE
  enabled, and yet overlays disabled, while the converse isn't
  true

What this all means is that once the base permissions have been
established, checking for wi->{e0,}poe makes little sense, because
the truth about overlays resides in wr->{p,u}ov. So constructs
checking for (wi->poe && wr->pov) only add perplexity.

Refactor compute_s1_overlay_permissions() and the way it is
called according to the above principles. Take the opportunity
to avoid reading registers that are not strictly required.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250701151648.754785-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-23 23:34:50 -07:00
David Woodhouse
4530256f36 KVM: arm64: vgic-its: Return -ENXIO to invalid KVM_DEV_ARM_VGIC_GRP_CTRL attrs
A preliminary version of a hack to invoke unmap_all_vpes() from an ioctl
didn't work very well. We eventually determined this was because we were
invoking it on the wrong file descriptor, but not getting an error.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/bbbddd56135399baf699bc46ffb6e7f08d9f8c9f.camel@infradead.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-23 23:33:52 -07:00
Kees Cook
57fbad15c2 stackleak: Rename STACKLEAK to KSTACK_ERASE
In preparation for adding Clang sanitizer coverage stack depth tracking
that can support stack depth callbacks:

- Add the new top-level CONFIG_KSTACK_ERASE option which will be
  implemented either with the stackleak GCC plugin, or with the Clang
  stack depth callback support.
- Rename CONFIG_GCC_PLUGIN_STACKLEAK as needed to CONFIG_KSTACK_ERASE,
  but keep it for anything specific to the GCC plugin itself.
- Rename all exposed "STACKLEAK" names and files to "KSTACK_ERASE" (named
  for what it does rather than what it protects against), but leave as
  many of the internals alone as possible to avoid even more churn.

While here, also split "prev_lowest_stack" into CONFIG_KSTACK_ERASE_METRICS,
since that's the only place it is referenced from.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250717232519.2984886-1-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-07-21 21:35:01 -07:00
Marc Zyngier
d9c5c23201 KVM: arm64: Make RAS registers UNDEF when RAS isn't advertised
We currently always expose FEAT_RAS when available on the host.

As we are about to make this feature selectable from userspace,
check for it being present before emulating register accesses
as RAZ/WI, and inject an UNDEF otherwise.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250721101955.535159-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-21 09:35:57 -07:00
Marc Zyngier
303084ad12 KVM: arm64: Filter out HCR_EL2 bits when running in hypervisor context
Most HCR_EL2 bits are not supposed to affect EL2 at all, but only
the guest. However, we gladly merge these bits with the host's
HCR_EL2 configuration, irrespective of entering L1 or L2.

This leads to some funky behaviour, such as L1 trying to inject
a virtual SError for L2, and getting a taste of its own medecine.
Not quite what the architecture anticipated.

In the end, the only bits that matter are those we have defined as
invariants, either because we've made them RESx (E2H, HCD...), or
that we actively refuse to merge because the mess with KVM's own
logic.

Use the sanitisation infrastructure to get the RES1 bits, and let
things rip in a safer way.

Fixes: 04ab519bb8 ("KVM: arm64: nv: Configure HCR_EL2 for FEAT_NV2")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250721101955.535159-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-21 09:35:57 -07:00
Marc Zyngier
c6e35dff58 KVM: arm64: Check for SYSREGS_ON_CPU before accessing the CPU state
Mark Brown reports that since we commit to making exceptions
visible without the vcpu being loaded, the external abort selftest
fails.

Upon investigation, it turns out that the code that makes registers
affected by an exception visible to the guest is completely broken
on VHE, as we don't check whether the system registers are loaded
on the CPU at this point. We managed to get away with this so far,
but that's obviously as bad as it gets,

Add the required checksm and document the absolute need to check
for the SYSREGS_ON_CPU flag before calling into any of the
__vcpu_write_sys_reg_to_cpu()__vcpu_read_sys_reg_from_cpu() helpers.

Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/18535df8-e647-4643-af9a-bb780af03a70@sirena.org.uk
Link: https://lore.kernel.org/r/20250720102229.179114-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-21 09:34:57 -07:00
Marc Zyngier
3096d238ec KVM: arm64: Tighten the definition of FEAT_PMUv3p9
The current definition of FEAT_PMUv3p9 doesn't check for the lack
of an IMPDEF PMU, which is encoded as 0b1111, but considered unsigned.

Use the recently introduced helper to address the issue (which is
harmless, as KVM never advertises an IMPDEF PMU).

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714115503.3334242-6-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-15 20:39:42 -07:00
Marc Zyngier
cd64587f10 KVM: arm64: Convert MDCR_EL2 to config-driven sanitisation
As for other registers, convert the determination of the RES0 bits
affecting MDCR_EL2 to be driven by a table extracted from the 2025-06
JSON drop

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714115503.3334242-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-15 20:39:42 -07:00
Marc Zyngier
6bd4a274b0 KVM: arm64: Convert SCTLR_EL1 to config-driven sanitisation
As for other registers, convert the determination of the RES0 bits
affecting SCTLR_EL1 to be driven by a table extracted from the 2025-06
JSON drop

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714115503.3334242-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-15 20:39:42 -07:00
Marc Zyngier
001e032c0f KVM: arm64: Convert TCR2_EL2 to config-driven sanitisation
As for other registers, convert the determination of the RES0 bits
affecting TCR2_EL2 to be driven by a table extracted from the 2025-06
JSON drop.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714115503.3334242-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-15 20:39:42 -07:00
Marc Zyngier
a0aae0a9a7 KVM: arm64: Advertise FGT2 registers to userspace
While a guest is able to use the FEAT_FGT2 registers, we're missing
them being exposed to userspace. Add them to the (very long) list.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714122634.3334816-9-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-15 20:24:29 -07:00
Marc Zyngier
72c62700b2 KVM: arm64: Condition FGT registers on feature availability
We shouldn't expose the FEAT_FGT registers unconditionally. Make
them dependent on FEAT_FGT being actually advertised to the guest.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714122634.3334816-8-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-15 20:24:29 -07:00
Marc Zyngier
9fe9663e47 KVM: arm64: Expose GICv3 EL2 registers via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS
Expose all the GICv3 EL2 registers through the usual GICv3 save/restore
interface, making it possible for a VMM to access the EL2 state.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714122634.3334816-7-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-15 20:24:29 -07:00
Marc Zyngier
1d14c97145 KVM: arm64: Let GICv3 save/restore honor visibility attribute
The GICv3 save/restore code never needed any visibility attribute,
but that's about to change. Make vgic_v3_has_cpu_sysregs_attr()
check the visibility in case a register is hidden.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714122634.3334816-6-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-15 20:24:29 -07:00
Marc Zyngier
ce7a1cff2e KVM: arm64: Define helper for ICH_VTR_EL2
Move the computation of the ICH_VTR_EL2 value to a common location,
so that it can be reused by the save/restore code.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714122634.3334816-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-15 20:24:29 -07:00
Marc Zyngier
c6ef468610 KVM: arm64: Define constant value for ICC_SRE_EL2
Move the bag of bits defining the value of ICC_SRE_EL2 to a common
spot so that it can be reused by the save/restore code.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714122634.3334816-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-15 20:24:29 -07:00
Marc Zyngier
c70a4027f5 KVM: arm64: Don't advertise ICH_*_EL2 registers through GET_ONE_REG
It appears that exposing the GICv3 EL2 registers through the usual
sysreg interface is not consistent with the way we expose the EL1
registers. The latter are exposed via the GICv3 device interface
instead, and there is no reason why the EL2 registers should get
a different treatement.

Hide the registers from userspace until the GICv3 code grows the
required infrastructure.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714122634.3334816-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-15 20:24:28 -07:00
Marc Zyngier
1095b32665 KVM: arm64: Make RVBAR_EL2 accesses UNDEF
We always expose a virtual CPU that has EL3 when NV is enabled,
irrespective of EL3 being actually implemented in HW.

Therefore, as per the architecture, RVBAR_EL2 must UNDEF, since
EL2 is not the highest implemented exception level. This is
consistent with RMR_EL2 also triggering an UNDEF.

Adjust the handling of RVBAR_EL2 accordingly.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714122634.3334816-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-15 20:24:28 -07:00
Oliver Upton
efa1368ba9 KVM: arm64: Commit exceptions from KVM_SET_VCPU_EVENTS immediately
syzkaller has found that it can trip a warning in KVM's exception
emulation infrastructure by repeatedly injecting exceptions into the
guest.

While it's unlikely that a reasonable VMM will do this, further
investigation of the issue reveals that KVM can potentially discard the
"pending" SEA state. While the handling of KVM_GET_VCPU_EVENTS presumes
that userspace-injected SEAs are realized immediately, in reality the
emulated exception entry is deferred until the next call to KVM_RUN.

Hack-a-fix the immediate issues by committing the pending exceptions to
the vCPU's architectural state immediately in KVM_SET_VCPU_EVENTS. This
is no different to the way KVM-injected exceptions are handled in
KVM_RUN where we potentially call __kvm_adjust_pc() before returning to
userspace.

Reported-by: syzbot+4e09b1432de3774b86ae@syzkaller.appspotmail.com
Reported-by: syzbot+1f6f096afda6f4f8f565@syzkaller.appspotmail.com
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-15 20:12:03 -07:00
Oliver Upton
f6e2262dfa KVM: arm64: Populate ESR_ELx.EC for emulated SError injection
The hardware vSError injection mechanism populates ESR_ELx.EC as part of
ESR propagation and the contents of VSESR_EL2 populate the ISS field. Of
course, this means our emulated injection needs to set up the EC
correctly for an SError too.

Fixes: ce66109cec ("KVM: arm64: nv: Take "masked" aborts to EL2 when HCRX_EL2.TMEA is set")
Link: https://lore.kernel.org/r/20250708230632.1954240-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-09 09:55:05 -07:00
Ben Horgan
2265c08ec3 KVM: arm64: Fix enforcement of upper bound on MDCR_EL2.HPMN
Previously, u64_replace_bits() was used to no effect as the return value
was ignored. Convert to u64p_replace_bits() so the value is updated in
place.

Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Fixes: efff9dd2fe ("KVM: arm64: Handle out-of-bound write to MDCR_EL2.HPMN")
Link: https://lore.kernel.org/r/20250709093808.920284-2-ben.horgan@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-09 13:19:24 +01:00
Sascha Bischoff
ff2aa6495d KVM: arm64: gic-v5: Probe for GICv5
Add in a probe function for GICv5 which enables support for GICv3
guests on a GICv5 host, if FEAT_GCIE_LEGACY is supported by the
hardware.

Co-authored-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Link: https://lore.kernel.org/r/20250627100847.1022515-6-sascha.bischoff@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08 14:41:06 -07:00
Sascha Bischoff
c017e49ed1 KVM: arm64: gic-v5: Support GICv3 compat
Add support for GICv3 compat mode (FEAT_GCIE_LEGACY) which allows a
GICv5 host to run GICv3-based VMs. This change enables the
VHE/nVHE/hVHE/protected modes, but does not support nested
virtualization.

A lazy-disable approach is taken for compat mode; it is enabled on the
vgic_v3_load path but not disabled on the vgic_v3_put path. A
non-GICv3 VM, i.e., one based on GICv5, is responsible for disabling
compat mode on the corresponding vgic_v5_load path. Currently, GICv5
is not supported, and hence compat mode is not disabled again once it
is enabled, and this function is intentionally omitted from the code.

Co-authored-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Link: https://lore.kernel.org/r/20250627100847.1022515-5-sascha.bischoff@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08 14:41:06 -07:00
Oliver Upton
bfb7a30b19 KVM: arm64: Don't retire MMIO instruction w/ pending (emulated) SError
KVM might have an emulated SError queued for the guest if userspace
returned an abort for MMIO. Better yet, it could actually be a
*synchronous* exception in disguise if SCTLR2_ELx.EASE is set.

Don't advance PC if KVM owes an emulated SError, just like the handling
of emulated SEA injection.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-24-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08 11:36:36 -07:00
Oliver Upton
a3c4a00dbe KVM: arm64: Advertise support for FEAT_DoubleFault2
KVM's external abort injection now respects the exception routing
wreckage due to FEAT_DoubleFault2. Advertise the feature.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-23-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08 11:36:36 -07:00
Oliver Upton
075c2dc736 KVM: arm64: Advertise support for FEAT_SCTLR2
Everything is in place to handle the additional state for SCTLR2_ELx,
which is all that FEAT_SCTLR2 implies.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-22-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08 11:36:36 -07:00
Oliver Upton
59b6d08666 KVM: arm64: nv: Honor SError routing effects of SCTLR2_ELx.NMEA
As the name might imply, when NMEA is set SErrors are non-maskable and
can be taken regardless of PSTATE.A. As is the recurring theme with
DoubleFault2, the effects on SError routing are entirely backwards to
this.

If at EL1, NMEA is *not* considered for SError routing when TMEA is set
and the exception is taken to EL2 when PSTATE.A is set.

Link: https://lore.kernel.org/r/20250708172532.1699409-20-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08 11:36:35 -07:00