Commit Graph

463 Commits

Author SHA1 Message Date
Paolo Bonzini
314b40b3b6 KVM/arm64 changes for 6.17, round #1

Merge tag 'kvmarm-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 changes for 6.17, round #1

 - Host driver for GICv5, the next generation interrupt controller for
   arm64, including support for interrupt routing, MSIs, interrupt
   translation and wired interrupts.

 - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
   GICv5 hardware, leveraging the legacy VGIC interface.

 - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
   userspace to disable support for SGIs w/o an active state on hardware
   that previously advertised it unconditionally.

 - Map supporting endpoints with cacheable memory attributes on systems
   with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
   maintenance on the address range.

 - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
   hypervisor to inject external aborts into an L2 VM and take traps of
   masked external aborts to the hypervisor.

 - Convert more system register sanitization to the config-driven
   implementation.

 - Fixes to the visibility of EL2 registers, namely making VGICv3 system
   registers accessible through the VGIC device instead of the ONE_REG
   vCPU ioctls.

 - Various cleanups and minor fixes.
2025-07-29 12:27:40 -04:00
Paolo Bonzini
f02b1bcc73 Merge tag 'kvm-x86-irqs-6.17' of https://github.com/kvm-x86/linux into HEAD
KVM IRQ changes for 6.17

 - Rework irqbypass to track/match producers and consumers via an xarray
   instead of a linked list.  Using a linked list leads to O(n^2) insertion
   times, which is hugely problematic for use cases that create large numbers
   of VMs.  Such use cases typically don't actually use irqbypass, but
   eliminating the pointless registration is a future problem to solve as it
   likely requires new uAPI.

 - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *",
   to avoid making a simple concept unnecessarily difficult to understand.

 - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC,
   and PIT emulation at compile time.

 - Drop x86's irq_comm.c, and move a pile of IRQ related code into irq.c.

 - Fix a variety of flaws and bugs in the AVIC device posted IRQ code.

 - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware
   supports) instead of rejecting vCPU creation.

 - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning
   clear in the vCPU's physical ID table entry.

 - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by
   erratum #1235, to allow (safely) enabling AVIC on such CPUs.

 - Dedup x86's device posted IRQ code, as the vast majority of functionality
   can be shared verbatim between SVM and VMX.

 - Harden the device posted IRQ code against bugs and runtime errors.

 - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1)
   instead of O(n).

 - Generate GA Log interrupts if and only if the target vCPU is blocking, i.e.
   only if KVM needs a notification in order to wake the vCPU.

 - Decouple device posted IRQs from VFIO device assignment, as binding a VM to
   a VFIO group is not a requirement for enabling device posted IRQs.

 - Clean up and document/comment the irqfd assignment code.

 - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e.
   ensure an eventfd is bound to at most one irqfd through the entire host,
   and add a selftest to verify eventfd:irqfd bindings are globally unique.
2025-07-29 08:35:46 -04:00
Oliver Upton
3318e42b81 Merge branch 'kvm-arm64/doublefault2' into kvmarm/next
* kvm-arm64/doublefault2: (33 commits)
  : NV Support for FEAT_RAS + DoubleFault2
  :
  : Delegate the vSError context to the guest hypervisor when in a nested
  : state, including registers related to ESR propagation. Additionally,
  : catch up KVM's external abort infrastructure to the architecture,
  : implementing the effects of FEAT_DoubleFault2.
  :
  : This has some impact on non-nested guests, as SErrors deemed unmasked at
  : the time they're made pending are now immediately injected with an
  : emulated exception entry rather than using the VSE bit.
  KVM: arm64: Make RAS registers UNDEF when RAS isn't advertised
  KVM: arm64: Filter out HCR_EL2 bits when running in hypervisor context
  KVM: arm64: Check for SYSREGS_ON_CPU before accessing the CPU state
  KVM: arm64: Commit exceptions from KVM_SET_VCPU_EVENTS immediately
  KVM: arm64: selftests: Test ESR propagation for vSError injection
  KVM: arm64: Populate ESR_ELx.EC for emulated SError injection
  KVM: arm64: selftests: Catch up set_id_regs with the kernel
  KVM: arm64: selftests: Add SCTLR2_EL1 to get-reg-list
  KVM: arm64: selftests: Test SEAs are taken to SError vector when EASE=1
  KVM: arm64: selftests: Add basic SError injection test
  KVM: arm64: Don't retire MMIO instruction w/ pending (emulated) SError
  KVM: arm64: Advertise support for FEAT_DoubleFault2
  KVM: arm64: Advertise support for FEAT_SCTLR2
  KVM: arm64: nv: Enable vSErrors when HCRX_EL2.TMEA is set
  KVM: arm64: nv: Honor SError routing effects of SCTLR2_ELx.NMEA
  KVM: arm64: nv: Take "masked" aborts to EL2 when HCRX_EL2.TMEA is set
  KVM: arm64: Route SEAs to the SError vector when EASE is set
  KVM: arm64: nv: Ensure Address size faults affect correct ESR
  KVM: arm64: Factor out helper for selecting exception target EL
  KVM: arm64: Describe SCTLR2_ELx RESx masks
  ...

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26 08:47:22 -07:00
Oliver Upton
77ee70a073 KVM: arm64: nv: Honor SError exception routing / masking
To date KVM has used HCR_EL2.VSE to track the state of a pending SError
for the guest. With this bit set, hardware respects the EL1 exception
routing / masking rules and injects the vSError when appropriate.

This isn't correct for NV guests as hardware is oblivious to vEL2's
intentions for SErrors. Better yet, with FEAT_NV2 the guest can change
the routing behind our back as HCR_EL2 is redirected to memory. Cope
with this mess by:

 - Using a flag (instead of HCR_EL2.VSE) to track the pending SError
   state when SErrors are unconditionally masked for the current context

 - Resampling the routing / masking of a pending SError on every guest
   entry/exit

 - Emulating exception entry when SError routing implies a translation
   regime change
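
A rough sketch of the resampling step described above; the flag accessors and
injection helpers here are illustrative placeholders, not the actual KVM code:

	/* Illustrative only: re-evaluate a pending vSError at guest entry/exit. */
	static void resample_vserror(struct kvm_vcpu *vcpu)
	{
		if (!vserror_pending(vcpu))		/* hypothetical flag accessor */
			return;

		if (vserror_masked(vcpu))		/* keep tracking it in software */
			return;

		if (vserror_routed_to_vel2(vcpu))	/* regime change: emulate entry */
			inject_vserror_to_vel2(vcpu);
		else
			*vcpu_hcr(vcpu) |= HCR_VSE;	/* let hardware deliver it */
	}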

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-7-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08 11:36:31 -07:00
Oliver Upton
aae35f4ffb KVM: arm64: Treat vCPU with pending SError as runnable
Per R_VRLPB, a pending SError is a WFI wakeup event regardless of
PSTATE.A, meaning that the vCPU is runnable. Sample VSE in addition to
the other IRQ lines.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-5-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08 10:40:30 -07:00
Marc Zyngier
1d6fea7663 KVM: arm64: Add helper to identify a nested context
A common idiom in the KVM code is to check if we are currently
dealing with a "nested" context, defined as having NV enabled,
but being in the EL1&0 translation regime.

This is usually expressed as:

	if (vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu) ... )

which is a mouthful and a bit hard to read, especially when followed
by additional conditions.

Introduce a new helper that encapsulates these two terms, allowing
the above to be written as

	if (is_nested_context(vcpu) ... )

which is both shorter and easier to read, and makes more obvious
the potential for simplification on some code paths.
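
A sketch of what the helper boils down to, per the description above (the
exact definition in the tree may differ slightly):

	static inline bool is_nested_context(struct kvm_vcpu *vcpu)
	{
		/* NV enabled, but currently in the EL1&0 translation regime. */
		return vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu);
	}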

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-4-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08 10:40:30 -07:00
Ankit Agrawal
f55ce5a6cd KVM: arm64: Expose new KVM cap for cacheable PFNMAP
Introduce a new KVM capability to expose to the userspace whether
cacheable mapping of PFNMAP is supported.

The ability to safely do cacheable mappings of PFNMAP is contingent
on S2FWB and ARM64_HAS_CACHE_DIC: S2FWB allows KVM to avoid flushing
the D-cache, while ARM64_HAS_CACHE_DIC allows KVM to avoid flushing the
icache and turns icache_inval_pou() into a NOP. The cap is reported as
false if those requirements are missing, which is checked via
kvm_arch_supports_cacheable_pfnmap().

This capability allows userspace to discover the support. It could,
for instance, be used by userspace to prevent live-migration across
FWB and non-FWB hosts.
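
A minimal userspace probe might look like the sketch below; the capability
name used here is an assumption, check the kvm uapi headers for the
definitive define:

	/* Hedged sketch: KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED is assumed. */
	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	int main(void)
	{
		int kvm = open("/dev/kvm", O_RDWR);

		if (kvm < 0)
			return 1;

		if (ioctl(kvm, KVM_CHECK_EXTENSION,
			  KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED) > 0)
			printf("cacheable PFNMAP mappings supported\n");
		else
			printf("cacheable PFNMAP mappings not supported\n");
		return 0;
	}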

CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Jason Gunthorpe <jgg@nvidia.com>
CC: Oliver Upton <oliver.upton@linux.dev>
CC: David Hildenbrand <david@redhat.com>
Suggested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250705071717.5062-7-ankita@nvidia.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-07 16:54:52 -07:00
Mark Rutland
42ce432522 KVM: arm64: Remove kvm_arch_vcpu_run_map_fp()
Historically KVM hyp code saved the host's FPSIMD state into the host's
fpsimd_state memory, and so it was necessary to map this into the hyp
Stage-1 mappings before running a vCPU.

This is no longer necessary as of commits:

* fbc7e61195 ("KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state")
* 8eca7f6d51 ("KVM: arm64: Remove host FPSIMD saving for non-protected KVM")

Since those commits, we eagerly save the host's FPSIMD state before
calling into hyp to run a vCPU, and hyp code never reads nor writes the
host's fpsimd_state memory. There's no longer any need to map the host's
fpsimd_state memory into the hyp Stage-1, and kvm_arch_vcpu_run_map_fp()
is unnecessary but benign.

Remove kvm_arch_vcpu_run_map_fp(). Currently there is no code to perform
a corresponding unmap, and we never mapped the host's SVE or SME state
into the hyp Stage-1, so no other code needs to be removed.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Cc: kvmarm@lists.linux.dev
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20250619134817.4075340-1-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-03 10:39:24 +01:00
Quentin Perret
0e02219f9c KVM: arm64: Don't free hyp pages with pKVM on GICv2
Marc reported that enabling protected mode on a device with GICv2
doesn't fail gracefully as one would expect, and leads to a host
kernel crash.

As it turns out, the first half of pKVM init happens before the vgic
probe, and so by the time we find out we have a GICv2 we're already
committed to keeping the pKVM vectors installed at EL2 -- pKVM rejects
stub HVCs for obvious security reasons. However, the error path on KVM
init leads to teardown_hyp_mode() which unconditionally frees hypervisor
allocations (including the EL2 stacks and per-cpu pages) under the
assumption that a previous cpu_hyp_uninit() execution has reset the
vectors back to the stubs, which is false with pKVM.

Interestingly, host stage-2 protection is not enabled yet at this point,
so this use-after-free may go unnoticed for a while. The issue becomes
more obvious after the finalize_pkvm() call.

Fix this by keeping track of the CPUs on which pKVM is initialized in
the kvm_hyp_initialized per-cpu variable, and use it from
teardown_hyp_mode() to skip freeing pages that are in fact used.
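
Roughly, the teardown path grows a per-CPU check along these lines (a sketch;
the free helpers below are placeholders for the actual free_pages() calls):

	for_each_possible_cpu(cpu) {
		/*
		 * If pKVM finished initializing on this CPU, its stack and
		 * per-CPU pages are owned by the hypervisor: don't free them.
		 */
		if (per_cpu(kvm_hyp_initialized, cpu))
			continue;

		free_hyp_stack_page(cpu);	/* placeholder */
		free_hyp_percpu_pages(cpu);	/* placeholder */
	}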

Fixes: a770ee80e6 ("KVM: arm64: pkvm: Disable GICv2 support")
Reported-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250626101014.1519345-1-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-26 11:39:15 +01:00
Mostafa Saleh
9a2b9416fd KVM: arm64: Fix error path in init_hyp_mode()
In the unlikely case that pKVM fails to allocate its carveout, the error
path tries to access a NULL pointer when it dereferences the SVE state
from the uninitialized nVHE per-cpu base.

[    1.575420] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[    1.576010] pc : teardown_hyp_mode+0xe4/0x180
[    1.576920] lr : teardown_hyp_mode+0xd0/0x180
[    1.577308] sp : ffff8000826fb9d0
[    1.577600] x29: ffff8000826fb9d0 x28: 0000000000000000 x27: ffff80008209b000
[    1.578383] x26: ffff800081dde000 x25: ffff8000820493c0 x24: ffff80008209eb00
[    1.579180] x23: 0000000000000040 x22: 0000000000000001 x21: 0000000000000000
[    1.579881] x20: 0000000000000002 x19: ffff800081d540b8 x18: 0000000000000000
[    1.580544] x17: ffff800081205230 x16: 0000000000000152 x15: 00000000fffffff8
[    1.581183] x14: 0000000000000008 x13: fff00000ff7f6880 x12: 000000000000003e
[    1.581813] x11: 0000000000000002 x10: 00000000000000ff x9 : 0000000000000000
[    1.582503] x8 : 0000000000000000 x7 : 7f7f7f7f7f7f7f7f x6 : 43485e525851ff30
[    1.583140] x5 : fff00000ff6e9030 x4 : fff00000ff6e8f80 x3 : 0000000000000000
[    1.583780] x2 : 0000000000000000 x1 : 0000000000000002 x0 : 0000000000000000
[    1.584526] Call trace:
[    1.584945]  teardown_hyp_mode+0xe4/0x180 (P)
[    1.585578]  init_hyp_mode+0x920/0x994
[    1.586005]  kvm_arm_init+0xb4/0x25c
[    1.586387]  do_one_initcall+0xe0/0x258
[    1.586819]  do_initcall_level+0xa0/0xd4
[    1.587224]  do_initcalls+0x54/0x94
[    1.587606]  do_basic_setup+0x1c/0x28
[    1.587998]  kernel_init_freeable+0xc8/0x130
[    1.588409]  kernel_init+0x20/0x1a4
[    1.588768]  ret_from_fork+0x10/0x20
[    1.589568] Code: f875db48 8b1c0109 f100011f 9a8903e8 (f9463100)
[    1.590332] ---[ end trace 0000000000000000 ]---

As Quentin pointed out, the order of freeing is also wrong: the SVE
state needs to be freed before the per-CPU pointers.

I initially observed this on 6.12, but could also reproduce it on master.

Signed-off-by: Mostafa Saleh <smostafa@google.com>
Fixes: 66d5b53e20 ("KVM: arm64: Allocate memory mapped at hyp for host sve state in pKVM")
Reviewed-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250625123058.875179-1-smostafa@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-26 08:05:04 +01:00
Sean Christopherson
77bb184ab8 KVM: Fold kvm_arch_irqfd_route_changed() into kvm_arch_update_irqfd_routing()
Fold kvm_arch_irqfd_route_changed() into kvm_arch_update_irqfd_routing().
Calling arch code to know whether or not to call arch code is absurd.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250611224604.313496-35-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-23 09:50:33 -07:00
Sean Christopherson
b33252b9d1 KVM: Don't WARN if updating IRQ bypass route fails
Don't bother WARNing if updating an IRTE route fails now that vendor code
provides much more precise WARNs.  The generic WARN doesn't provide enough
information to actually debug the problem, and has obviously done nothing
to surface the myriad bugs in KVM x86's implementation.

Drop all of the associated return code plumbing that existed just so that
common KVM could WARN.

Link: https://lore.kernel.org/r/20250611224604.313496-34-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-23 09:50:32 -07:00
Sean Christopherson
cb21073767 KVM: Pass new routing entries and irqfd when updating IRTEs
When updating IRTEs in response to a GSI routing or IRQ bypass change,
pass the new/current routing information along with the associated irqfd.
This will allow KVM x86 to harden, simplify, and deduplicate its code.

Since adding/removing a bypass producer is now conveniently protected with
irqfds.lock, i.e. can't run concurrently with kvm_irq_routing_update(),
use the routing information cached in the irqfd instead of looking up
the information in the current GSI routing tables.

Opportunistically convert an existing printk() to pr_info() and put its
string onto a single line (old code that strictly adhered to 80 chars).

Link: https://lore.kernel.org/r/20250611224604.313496-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20 13:52:53 -07:00
Sean Christopherson
1fbe6861a6 KVM: arm64: Explicitly treat routing entry type changes as changes
Explicitly treat type differences as GSI routing changes, as comparing MSI
data between two entries could get a false negative, e.g. if userspace
changed the type but left the type-specific data as-is.

Note, the same bug was fixed in x86 by commit bcda70c56f ("KVM: x86:
Explicitly treat routing entry type changes as changes").
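
The comparison described above amounts to something like the following
sketch (not the literal arm64 hunk):

	static bool routing_entry_changed(const struct kvm_kernel_irq_routing_entry *old,
					  const struct kvm_kernel_irq_routing_entry *new)
	{
		/* A different type is always a change, regardless of the payload. */
		if (old->type != new->type)
			return true;

		if (new->type != KVM_IRQ_ROUTING_MSI)
			return false;

		return memcmp(&old->msi, &new->msi, sizeof(new->msi));
	}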

Fixes: 4bf3693d36 ("KVM: arm64: Unmap vLPIs affected by changes to GSI routing information")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250611224604.313496-3-seanjc@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-19 09:58:21 +01:00
Paolo Bonzini
61374cc145 KVM/arm64 fixes for 6.16, take #1

Merge tag 'kvmarm-fixes-6.16-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.16, take #1

- Make the irqbypass hooks resilient to changes in the GSI<->MSI
  routing, avoiding stale vLPI mappings being left behind. The
  fix is to resolve the VGIC IRQ using the host IRQ (which is stable)
  and nuking the vLPI mapping upon a routing change.

- Close another VGIC race where vCPU creation races with VGIC
  creation, leading to in-flight vCPUs entering the kernel w/o private
  IRQs allocated.

- Fix a build issue triggered by the recently added workaround for
  Ampere's AC04_CPU_23 erratum.

- Correctly sign-extend the VA when emulating a TLBI instruction
  potentially targeting a VNCR mapping.

- Avoid dereferencing a NULL pointer in the VGIC debug code, which can
  happen if the device doesn't have any mapping yet.
2025-06-02 03:05:29 -04:00
Oliver Upton
4bf3693d36 KVM: arm64: Unmap vLPIs affected by changes to GSI routing information
KVM's interrupt infrastructure is dodgy at best, allowing for some ugly
'off label' usage of the various UAPIs. In one example, userspace can
change the routing entry of a particular "GSI" after configuring
irqbypass with KVM_IRQFD. KVM/arm64 is oblivious to this, and winds up
preserving the stale translation in cases where vLPIs are configured.

Honor userspace's intentions and tear down the vLPI mapping if affected
by a "GSI" routing change. Make no attempt to reconstruct vLPIs if the
new target is an MSI and just fall back to software injection.

Tested-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250523194722.4066715-5-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-30 09:11:29 +01:00
Oliver Upton
05b9405f2f KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding()
The virtual mapping and "GSI" routing of a particular vLPI is subject to
change in response to the guest / userspace. This can be pretty annoying
to deal with when KVM needs to track the physical state that's managed
for vLPI direct injection.

Make vgic_v4_unset_forwarding() resilient by using the host IRQ to
resolve the vgic IRQ. Since this uses the LPI xarray directly, finding
the ITS by doorbell address and grabbing its its_lock is no longer
necessary. Note that matching the right ITS / ITE is already handled in
vgic_v4_set_forwarding(), and unless there's a bug in KVM's VGIC ITS
emulation the virtual mapping should remain stable for the lifetime
of the vLPI mapping.

Tested-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250523194722.4066715-4-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-30 09:11:29 +01:00
Maxim Levitsky
b586c5d219 KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs
Use kvm_trylock_all_vcpus instead of a custom implementation when locking
all vCPUs of a VM, to avoid triggering a lockdep warning, in the case in
which the VM is configured to have more than MAX_LOCK_DEPTH vCPUs.

This fixes the following false lockdep warning:

[  328.171264] BUG: MAX_LOCK_DEPTH too low!
[  328.175227] turning off the locking correctness validator.
[  328.180726] Please attach the output of /proc/lock_stat to the bug report
[  328.187531] depth: 48  max: 48!
[  328.190678] 48 locks held by qemu-kvm/11664:
[  328.194957]  #0: ffff800086de5ba0 (&kvm->lock){+.+.}-{3:3}, at: kvm_ioctl_create_device+0x174/0x5b0
[  328.204048]  #1: ffff0800e78800b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
[  328.212521]  #2: ffff07ffeee51e98 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
[  328.220991]  #3: ffff0800dc7d80b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
[  328.229463]  #4: ffff07ffe0c980b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
[  328.237934]  #5: ffff0800a3883c78 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
[  328.246405]  #6: ffff07fffbe480b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-ID: <20250512180407.659015-6-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-05-27 12:16:41 -04:00
Marc Zyngier
7f3225fe8b Merge branch kvm-arm64/nv-nv into kvmarm-master/next
* kvm-arm64/nv-nv:
  : .
  : Flick the switch on the NV support by adding the missing piece
  : in the form of the VNCR page management. From the cover letter:
  :
  : "This is probably the most interesting bit of the whole NV adventure.
  : So far, everything else has been a walk in the park, but this one is
  : where the real fun takes place.
  :
  : With FEAT_NV2, most of the NV support revolves around tricking a guest
  : into accessing memory while it tries to access system registers. The
  : hypervisor's job is to handle the context switch of the actual
  : registers with the state in memory as needed."
  : .
  KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
  KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
  KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
  KVM: arm64: Document NV caps and vcpu flags
  KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2*
  KVM: arm64: nv: Remove dead code from ERET handling
  KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatch
  KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2
  KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap address
  KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers
  KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2
  KVM: arm64: nv: Handle VNCR_EL2-triggered faults
  KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2
  KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2
  KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting
  KVM: arm64: nv: Move TLBI range decoding to a helper
  KVM: arm64: nv: Snapshot S1 ASID tagging information during walk
  KVM: arm64: nv: Extract translation helper from the AT code
  KVM: arm64: nv: Allocate VNCR page when required
  arm64: sysreg: Add layout for VNCR_EL2

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23 10:58:57 +01:00
Marc Zyngier
fef3acf5ae Merge branch kvm-arm64/fgt-masks into kvmarm-master/next
* kvm-arm64/fgt-masks: (43 commits)
  : .
  : Large rework of the way KVM deals with trap bits in conjunction with
  : the CPU feature registers. It now draws a direct link between which
  : the feature set, the system registers that need to UNDEF to match
  : the configuration and bits that need to behave as RES0 or RES1 in
  : the trap registers that are visible to the guest.
  :
  : Best of all, these definitions are mostly automatically generated
  : from the JSON description published by ARM under a permissive
  : license.
  : .
  KVM: arm64: Handle TSB CSYNC traps
  KVM: arm64: Add FGT descriptors for FEAT_FGT2
  KVM: arm64: Allow sysreg ranges for FGT descriptors
  KVM: arm64: Add context-switch for FEAT_FGT2 registers
  KVM: arm64: Add trap routing for FEAT_FGT2 registers
  KVM: arm64: Add sanitisation for FEAT_FGT2 registers
  KVM: arm64: Add FEAT_FGT2 registers to the VNCR page
  KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits
  KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits
  KVM: arm64: Allow kvm_has_feat() to take variable arguments
  KVM: arm64: Use FGT feature maps to drive RES0 bits
  KVM: arm64: Validate FGT register descriptions against RES0 masks
  KVM: arm64: Switch to table-driven FGU configuration
  KVM: arm64: Handle PSB CSYNC traps
  KVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask
  KVM: arm64: Remove hand-crafted masks for FGT registers
  KVM: arm64: Use computed FGT masks to setup FGT registers
  KVM: arm64: Propagate FGT masks to the nVHE hypervisor
  KVM: arm64: Unconditionally configure fine-grain traps
  KVM: arm64: Use computed masks as sanitisers for FGT registers
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23 10:58:15 +01:00
Marc Zyngier
d5702dd224 Merge branch kvm-arm64/pkvm-selftest-6.16 into kvm-arm64/pkvm-np-thp-6.16
* kvm-arm64/pkvm-selftest-6.16:
  : .
  : pKVM selftests covering the memory ownership transitions by
  : Quentin Perret. From the initial cover letter:
  :
  : "We have recently found a bug [1] in the pKVM memory ownership
  : transitions by code inspection, but it could have been caught with a
  : test.
  :
  : Introduce a boot-time selftest exercising all the known pKVM memory
  : transitions and importantly checks the rejection of illegal transitions.
  :
  : The new test is hidden behind a new Kconfig option separate from
  : CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the
  : transition checks ([1] doesn't reproduce with EL2 debug enabled).
  :
  : [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/"
  : .
  KVM: arm64: Extend pKVM selftest for np-guests
  KVM: arm64: Selftest for pKVM transitions
  KVM: arm64: Don't WARN from __pkvm_host_share_guest()
  KVM: arm64: Add .hyp.data section

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21 14:33:43 +01:00
Marc Zyngier
4bc0fe0898 KVM: arm64: Add sanitisation for FEAT_FGT2 registers
Just like the FEAT_FGT registers, treat the FGT2 variant the same
way. This is a large update, but a fairly mechanical one.

The config dependencies are extracted from the 2025-03 JSON drop.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 11:36:10 +01:00
Marc Zyngier
a7484c80e5 KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2*
Since we're (almost) feature complete, let's allow userspace to
request KVM_ARM_VCPU_EL2* by bumping KVM_VCPU_MAX_FEATURES up.

We also now advertise the features to userspace with new capabilities.

It's going to be great...
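
From the VMM side, opting in is the usual vCPU feature dance; a sketch
assuming the KVM_ARM_VCPU_HAS_EL2 feature bit (see the uapi headers for the
E2H0 variant and the matching capabilities):

	struct kvm_vcpu_init init;

	/* Start from the preferred target, then request the EL2 (NV) feature. */
	ioctl(vm_fd, KVM_ARM_PREFERRED_TARGET, &init);
	init.features[0] |= 1U << KVM_ARM_VCPU_HAS_EL2;

	if (ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init))
		perror("KVM_ARM_VCPU_INIT");	/* old kernel or NV not supported */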

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Link: https://lore.kernel.org/r/20250514103501.2225951-17-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 08:01:19 +01:00
Marc Zyngier
ea8d3cf46d KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2
FEAT_NV2 introduces an interesting problem for NV, as VNCR_EL2.BADDR
is a virtual address in the EL2&0 (or EL2, but we thankfully ignore
this) translation regime.

As we need to replicate such mapping in the real EL2, it means that
we need to remember that there is such a translation, and that any
TLBI affecting EL2 can possibly affect this translation.

It also means that any invalidation driven by an MMU notifier must
be able to shoot down any such mapping.

All in all, we need a data structure that represents this mapping,
and that is extremely close to a TLB. Given that we can only use
one of those per vcpu at any given time, we only allocate one.

No effort is made to keep that structure small. If we need to
start caching multiple of them, we may want to revisit that design
point. But for now, it is kept simple so that we can reason about it.

Oh, and add a braindump of how things are supposed to work, because
I will definitely page this out at some point. Yes, pun intended.
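
Conceptually, the cached entry described above is little more than a one-slot
TLB; a hypothetical shape (field names are illustrative, not the actual
layout):

	struct vncr_tlb_entry {			/* illustrative, not the real struct */
		u64	gva;			/* guest VA programmed in VNCR_EL2.BADDR */
		u64	hpa;			/* backing host physical address */
		u16	asid;			/* snapshot of S1 ASID tagging info */
		bool	valid;			/* cleared by TLBI / MMU notifiers */
	};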

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-8-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 07:59:46 +01:00
Marc Zyngier
311ba55a5f KVM: arm64: Propagate FGT masks to the nVHE hypervisor
The nVHE hypervisor needs to have access to its own view of the FGT
masks, which unfortunately results in a bit of data duplication.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06 17:35:25 +01:00
David Brazdil
74b13d5816 KVM: arm64: Add .hyp.data section
The hypervisor has not needed its own .data section because all globals
were either .rodata or .bss. To avoid having to initialize future
data structures at run-time, let's add a .data section to the
hypervisor.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416160900.3078417-2-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06 09:56:18 +01:00
Paolo Bonzini
5f9e169814 KVM: arm64, x86: make kvm_arch_has_irq_bypass() inline
kvm_arch_has_irq_bypass() is a small function and even though it does
not appear in any *really* hot paths, it's also not entirely rare.
Make it inline---it also works out nicely in preparation for using it in
kvm-intel.ko and kvm-amd.ko, since the function is not currently exported.
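
After the change, the helper is roughly a one-liner in a shared header, which
is what lets kvm-intel.ko and kvm-amd.ko use it without a new export (sketch):

	static inline bool kvm_arch_has_irq_bypass(void)
	{
		return IS_ENABLED(CONFIG_HAVE_KVM_IRQ_BYPASS);
	}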

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-04-24 09:46:58 -04:00
Oliver Upton
369c012268 Merge branch 'kvm-arm64/pmu-fixes' into kvmarm/next
* kvm-arm64/pmu-fixes:
  : vPMU fixes for 6.15 courtesy of Akihiko Odaki
  :
  : Various fixes to KVM's vPMU implementation, notably ensuring
  : userspace-directed changes to the PMCs are reflected in the backing perf
  : events.
  KVM: arm64: PMU: Reload when resetting
  KVM: arm64: PMU: Reload when user modifies registers
  KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs
  KVM: arm64: PMU: Assume PMU presence in pmu-emul.c
  KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19 14:54:52 -07:00
Oliver Upton
ca19dd4323 Merge branch 'kvm-arm64/pkvm-6.15' into kvmarm/next
* kvm-arm64/pkvm-6.15:
  : pKVM updates for 6.15
  :
  :  - SecPageTable stats for stage-2 table pages allocated by the protected
  :    hypervisor (Vincent Donnefort)
  :
  :  - HCRX_EL2 trap + vCPU initialization fixes for pKVM (Fuad Tabba)
  KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu
  KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
  KVM: arm64: Initialize HCRX_EL2 traps in pKVM
  KVM: arm64: Factor out setting HCRX_EL2 traps into separate function
  KVM: arm64: Count pKVM stage-2 usage in secondary pagetable stats
  KVM: arm64: Distinct pKVM teardown memcache for stage-2
  KVM: arm64: Add flags to kvm_hyp_memcache

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19 14:54:40 -07:00
Oliver Upton
4f2774c57a Merge branch 'kvm-arm64/writable-midr' into kvmarm/next
* kvm-arm64/writable-midr:
  : Writable implementation ID registers, courtesy of Sebastian Ott
  :
  : Introduce a new capability that allows userspace to set the
  : ID registers that identify a CPU implementation: MIDR_EL1, REVIDR_EL1,
  : and AIDR_EL1. Also plug a hole in KVM's trap configuration where
  : SMIDR_EL1 was readable at EL1, despite the fact that KVM does not
  : support SME.
  KVM: arm64: Fix documentation for KVM_CAP_ARM_WRITABLE_IMP_ID_REGS
  KVM: arm64: Copy MIDR_EL1 into hyp VM when it is writable
  KVM: arm64: Copy guest CTR_EL0 into hyp VM
  KVM: selftests: arm64: Test writes to MIDR,REVIDR,AIDR
  KVM: arm64: Allow userspace to change the implementation ID registers
  KVM: arm64: Load VPIDR_EL2 with the VM's MIDR_EL1 value
  KVM: arm64: Maintain per-VM copy of implementation ID regs
  KVM: arm64: Set HCR_EL2.TID1 unconditionally

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19 14:54:32 -07:00
Oliver Upton
1b1d1b17b8 Merge branch 'kvm-arm64/pmuv3-asahi' into kvmarm/next
* kvm-arm64/pmuv3-asahi:
  : Support PMUv3 for KVM guests on Apple silicon
  :
  : Take advantage of some IMPLEMENTATION DEFINED traps available on Apple
  : parts to trap-and-emulate the PMUv3 registers on behalf of a KVM guest.
  : Constrain the vPMU to a cycle counter and single event counter, as the
  : Apple PMU has events that cannot be counted on every counter.
  :
  : There is a small new interface between the ARM PMU driver and KVM, where
  : the PMU driver owns the PMUv3 -> hardware event mappings.
  arm64: Enable IMP DEF PMUv3 traps on Apple M*
  KVM: arm64: Provide 1 event counter on IMPDEF hardware
  drivers/perf: apple_m1: Provide helper for mapping PMUv3 events
  KVM: arm64: Remap PMUv3 events onto hardware
  KVM: arm64: Advertise PMUv3 if IMPDEF traps are present
  KVM: arm64: Compute synthetic sysreg ESR for Apple PMUv3 traps
  KVM: arm64: Move PMUVer filtering into KVM code
  KVM: arm64: Use guard() to cleanup usage of arm_pmus_lock
  KVM: arm64: Drop kvm_arm_pmu_available static key
  KVM: arm64: Use a cpucap to determine if system supports FEAT_PMUv3
  KVM: arm64: Always support SW_INCR PMU event
  KVM: arm64: Compute PMCEID from arm_pmu's event bitmaps
  drivers/perf: apple_m1: Support host/guest event filtering
  drivers/perf: apple_m1: Refactor event select/filter configuration

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19 14:54:23 -07:00
Oliver Upton
13f64f6d21 Merge branch 'kvm-arm64/nv-idregs' into kvmarm/next
* kvm-arm64/nv-idregs:
  : Changes to exposure of NV features, courtesy of Marc Zyngier
  :
  : Apply NV-specific feature restrictions at reset rather than at the point
  : of KVM_RUN. This makes the true feature set visible to userspace, a
  : necessary step towards save/restore support or NV VMs.
  :
  : Add an additional vCPU feature flag for selecting the E2H0 flavor of NV,
  : such that the VHE-ness of the VM can be applied to the feature set.
  KVM: arm64: selftests: Test that TGRAN*_2 fields are writable
  KVM: arm64: Allow userspace to write ID_AA64MMFR0_EL1.TGRAN*_2
  KVM: arm64: Advertise FEAT_ECV when possible
  KVM: arm64: Make ID_AA64MMFR4_EL1.NV_frac writable
  KVM: arm64: Allow userspace to limit NV support to nVHE
  KVM: arm64: Move NV-specific capping to idreg sanitisation
  KVM: arm64: Enforce NV limits on a per-idregs basis
  KVM: arm64: Make ID_REG_LIMIT_FIELD_ENUM() more widely available
  KVM: arm64: Consolidate idreg callbacks
  KVM: arm64: Advertise NV2 in the boot messages
  KVM: arm64: Mark HCR.EL2.{NV*,AT} RES0 when ID_AA64MMFR4_EL1.NV_frac is 0
  KVM: arm64: Mark HCR.EL2.E2H RES0 when ID_AA64MMFR1_EL1.VH is zero
  KVM: arm64: Hide ID_AA64MMFR2_EL1.NV from guest and userspace
  arm64: cpufeature: Handle NV_frac as a synonym of NV2

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19 14:52:26 -07:00
Oliver Upton
56e3e5c8f7 Merge branch 'kvm-arm64/nv-vgic' into kvmarm/next
* kvm-arm64/nv-vgic:
  : NV VGICv3 support, courtesy of Marc Zyngier
  :
  : Support for emulating the GIC hypervisor controls and managing shadow
  : VGICv3 state for the L1 hypervisor. As part of it, bring in support for
  : taking IRQs to the L1 and UAPI to manage the VGIC maintenance interrupt.
  KVM: arm64: nv: Fail KVM init if asking for NV without GICv3
  KVM: arm64: nv: Allow userland to set VGIC maintenance IRQ
  KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup
  KVM: arm64: nv: Propagate used_lrs between L1 and L0 contexts
  KVM: arm64: nv: Request vPE doorbell upon nested ERET to L2
  KVM: arm64: nv: Respect virtual HCR_EL2.TWx setting
  KVM: arm64: nv: Add Maintenance Interrupt emulation
  KVM: arm64: nv: Handle L2->L1 transition on interrupt injection
  KVM: arm64: nv: Nested GICv3 emulation
  KVM: arm64: nv: Sanitise ICH_HCR_EL2 accesses
  KVM: arm64: nv: Plumb handling of GICv3 EL2 accesses
  KVM: arm64: nv: Add ICH_*_EL2 registers to vpcu_sysreg
  KVM: arm64: nv: Load timer before the GIC
  arm64: sysreg: Add layout for ICH_MISR_EL2
  arm64: sysreg: Add layout for ICH_VTR_EL2
  arm64: sysreg: Add layout for ICH_HCR_EL2

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19 14:51:43 -07:00
Oliver Upton
3ed0dc03f6 Merge branch 'kvm-arm64/misc' into kvmarm/next
* kvm-arm64/misc:
  : Miscellaneous fixes/cleanups for KVM/arm64
  :
  :  - Avoid GICv4 vLPI configuration when confronted with user error
  :
  :  - Only attempt vLPI configuration when the target routing is an MSI
  :
  :  - Document ordering requirements to avoid aforementioned user error
  KVM: arm64: Tear down vGIC on failed vCPU creation
  KVM: arm64: Document ordering requirements for irqbypass
  KVM: arm64: vgic-v4: Fall back to software irqbypass if LPI not found
  KVM: arm64: vgic-v4: Only WARN for HW IRQ mismatch when unmapping vLPI
  KVM: arm64: vgic-v4: Only attempt vLPI mapping for actual MSIs

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19 14:51:15 -07:00
Will Deacon
250f25367b KVM: arm64: Tear down vGIC on failed vCPU creation
If kvm_arch_vcpu_create() fails to share the vCPU page with the
hypervisor, we propagate the error back to the ioctl but leave the
vGIC vCPU data initialised. Not only does this leak the corresponding
memory when the vCPU is destroyed, but it can also lead to a use-after-free
if the redistributor device handling tries to walk into the vCPU.

Add the missing cleanup to kvm_arch_vcpu_create(), ensuring that the
vGIC vCPU structures are destroyed on error.
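
The fix boils down to adding the vGIC teardown on the failing branch, along
these lines (a sketch of the error path, not the verbatim hunk):

	err = create_hyp_mappings(vcpu, vcpu + 1, PAGE_HYP);
	if (err) {
		kvm_vgic_vcpu_destroy(vcpu);	/* the previously missing cleanup */
		return err;
	}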

Cc: <stable@vger.kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250314133409.9123-1-will@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-17 23:30:51 -07:00
Akihiko Odaki
be5ccac3f1 KVM: arm64: PMU: Assume PMU presence in pmu-emul.c
Many functions in pmu-emul.c check kvm_vcpu_has_pmu(vcpu). A favorable
interpretation is defensive programming, but it also has downsides:

- It is confusing as it implies these functions are called without PMU
  although most of them are called only when a PMU is present.

- It makes the semantics of functions fuzzy. For example, calling
  kvm_pmu_disable_counter_mask() without a PMU may result in a no-op as
  there are no enabled counters, but it's unclear what
  kvm_pmu_get_counter_value() returns when there is no PMU.

- It allows callers to proceed without checking kvm_vcpu_has_pmu(vcpu),
  but it is often wrong to call these functions without a PMU.

- It is error-prone to duplicate kvm_vcpu_has_pmu(vcpu) checks into
  multiple functions. Many functions are called for system registers,
  and the system register infrastructure already employs less
  error-prone, comprehensive checks.

Check kvm_vcpu_has_pmu(vcpu) in callers of these functions instead,
and remove the obsolete checks from pmu-emul.c. The only exceptions are
the functions that implement ioctls as they have definitive semantics
even when the PMU is not present.
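
The resulting pattern moves the presence check to the call site, e.g.
(illustrative fragment, not a verbatim hunk):

	/* In a sysreg handler (caller side): UNDEF if there is no PMU at all... */
	if (!kvm_vcpu_has_pmu(vcpu))
		return undef_access(vcpu, p, r);

	/* ...so pmu-emul.c helpers like this one may now assume a PMU exists. */
	kvm_pmu_handle_pmcr(vcpu, val);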

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250315-pmc-v5-2-ecee87dab216@daynix.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-17 10:42:22 -07:00
Fuad Tabba
1eab115486 KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu
Instead of creating and initializing _all_ hyp vcpus in pKVM when
the first host vcpu runs for the first time, initialize _each_
hyp vcpu in conjunction with its corresponding host vcpu.

Some of the host vcpu state (e.g., system registers and trap
values) is not initialized until the first time the host vcpu is
run. Therefore, a hyp vcpu initialized before its corresponding
host vcpu has run for the first time might not see the complete
host state of these vcpus.

Additionally, this behavior is in line with non-protected modes.

Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20250314111832.4137161-5-tabba@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-14 16:06:03 -07:00
Oliver Upton
a38b67d151 KVM: arm64: Drop kvm_arm_pmu_available static key
With the PMUv3 cpucap, kvm_arm_pmu_available is no longer used in the
hot path of guest entry/exit. On top of that, guest support for PMUv3
may not correlate with host support for the feature, e.g. on IMPDEF
hardware.

Throw out the static key and just inspect the list of PMUs to determine
if PMUv3 is supported for KVM guests.

Tested-by: Janne Grunau <j@jannau.net>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250305202641.428114-7-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-11 12:54:29 -07:00
Marc Zyngier
83c6cb2014 KVM: arm64: nv: Fail KVM init if asking for NV without GICv3
Although there is nothing in NV that is fundamentally incompatible
with the lack of GICv3, there is no HW implementation without one,
at least on the virtual side (yes, even fruits have some form of
vGICv3).

We therefore make the decision to require GICv3, which will only
affect models such as QEMU. Booting with a GICv2 or something
even more exotic while asking for NV will result in KVM being
disabled.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-17-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03 14:57:10 -08:00
Marc Zyngier
201c8d40dd KVM: arm64: nv: Add Maintenance Interrupt emulation
Emulating the vGIC means emulating the dreaded Maintenance Interrupt.

This is a two-pronged problem:

- while running L2, getting an MI translates into an MI injected
  in the L1 based on the state of the HW.

- while running L1, we must accurately reflect the state of the
  MI line, based on the in-memory state.

The MI INTID is added to the distributor, as expected on any
virtualisation-capable implementation, and further patches
will allow its configuration.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-11-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03 14:57:10 -08:00
Marc Zyngier
4b1b97f0d7 KVM: arm64: nv: Handle L2->L1 transition on interrupt injection
An interrupt being delivered to L1 while running L2 must result
in the correct exception being delivered to L1.

This means that if, on entry to L2, we found ourselves with pending
interrupts in the L1 distributor, we need to take immediate action.
This is done by posting a request which will prevent the entry in
L2, and deliver an IRQ exception to L1, forcing the switch.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-10-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03 14:57:10 -08:00
Marc Zyngier
16abeb60be KVM: arm64: nv: Load timer before the GIC
In order for vgic_v3_load_nested to be able to observe which timer
interrupts have the HW bit set for the current context, the timers
must have been loaded in the new mode and the right timer mapped
to their corresponding HW IRQs.

At the moment, we load the GIC first, meaning that timer interrupts
injected to an L2 guest will never have the HW bit set (we see the
old configuration).

Swapping the two loads solves this particular problem.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03 14:54:57 -08:00
Oliver Upton
a0d7e2fc61 KVM: arm64: vgic-v4: Only attempt vLPI mapping for actual MSIs
Some 'creative' VMMs out there may assign a VFIO MSI eventfd to an SPI
routing entry.

And yes, I can already hear you shouting about possibly driving a level
interrupt with an edge-sensitive one. You know who you are.

This works for the most part, and interrupt injection winds up taking
the software path. However, when running on GICv4-enabled hardware, KVM
erroneously attempts to setup LPI forwarding, even though the KVM
routing isn't an MSI.

Thanks to misuse of a union, the MSI destination is unlikely to match any
ITS in the VM and kvm_vgic_v4_set_forwarding() bails early. Later on when
the VM is being torn down, this half-configured state triggers the
WARN_ON() in kvm_vgic_v4_unset_forwarding() due to the fact that no HW
IRQ was ever assigned.

Avoid the whole mess by preventing SPI routing entries from getting into
the LPI forwarding helpers.
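
The guard amounts to checking the routing entry type before ever reaching the
GICv4 code, something like this sketch:

	/* Only MSI routing entries are candidates for vLPI forwarding. */
	if (irq_entry->type != KVM_IRQ_ROUTING_MSI)
		return 0;

	return kvm_vgic_v4_set_forwarding(kvm, prod->irq, irq_entry);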

Reported-by: Sudheer Dantuluri <dantuluris@google.com>
Tested-by: Sudheer Dantuluri <dantuluris@google.com>
Fixes: 196b136498 ("KVM: arm/arm64: GICv4: Wire mapping/unmapping of VLPIs in VFIO irq bypass")
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250226183124.82094-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-26 13:22:48 -08:00
Sebastian Ott
3adaee7830 KVM: arm64: Allow userspace to change the implementation ID registers
KVM's treatment of the ID registers that describe the implementation
(MIDR, REVIDR, and AIDR) is interesting, to say the least. On the
userspace-facing end of it, KVM presents the values of the boot CPU on
all vCPUs and treats them as invariant. On the guest side of things KVM
presents the hardware values of the local CPU, which can change during
CPU migration in a big-little system.

While one may call this fragile, there is at least some degree of
predictability around it. For example, if a VMM wanted to present
big-little to a guest, it could affine vCPUs accordingly to the correct
clusters.

All of this makes a giant mess out of adding support for making these
implementation ID registers writable. Avoid breaking the rather subtle
ABI around the old way of doing things by requiring opt-in from
userspace to make the registers writable.

When the cap is enabled, allow userspace to set MIDR, REVIDR, and AIDR
to any non-reserved value and present those values consistently across
all vCPUs.
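
A sketch of the userspace flow: enable the capability on the VM, then write
the register like any other ONE_REG (the MIDR value below is an example only):

	__u64 midr = 0x410fd4f0;	/* example value only */
	struct kvm_enable_cap cap = { .cap = KVM_CAP_ARM_WRITABLE_IMP_ID_REGS };
	struct kvm_one_reg reg = {
		.id   = ARM64_SYS_REG(3, 0, 0, 0, 0),	/* MIDR_EL1 */
		.addr = (__u64)&midr,
	};

	ioctl(vm_fd, KVM_ENABLE_CAP, &cap);	/* opt in before writing */
	ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);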

Signed-off-by: Sebastian Ott <sebott@redhat.com>
[oliver: changelog, capability]
Link: https://lore.kernel.org/r/20250225005401.679536-5-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-26 01:32:16 -08:00
Marc Zyngier
2cd9542a37 KVM: arm64: Advertise NV2 in the boot messages
Make it a bit easier to understand what people are running by
adding a +NV2 string to the message printed on successful KVM initialisation.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250220134907.554085-6-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24 11:06:55 -08:00
Oliver Upton
fa808ed4e1 KVM: arm64: Ensure a VMID is allocated before programming VTTBR_EL2
Vladimir reports that a race condition to attach a VMID to a stage-2 MMU
sometimes results in a vCPU entering the guest with a VMID of 0:

| CPU1                                            |   CPU2
|                                                 |
|                                                 | kvm_arch_vcpu_ioctl_run
|                                                 |   vcpu_load             <= load VTTBR_EL2
|                                                 |                            kvm_vmid->id = 0
|                                                 |
| kvm_arch_vcpu_ioctl_run                         |
|   vcpu_load             <= load VTTBR_EL2       |
|                            with kvm_vmid->id = 0|
|   kvm_arm_vmid_update   <= allocates fresh      |
|                            kvm_vmid->id and     |
|                            reload VTTBR_EL2     |
|                                                 |
|                                                 |   kvm_arm_vmid_update <= observes that kvm_vmid->id
|                                                 |                          already allocated,
|                                                 |                          skips reload VTTBR_EL2

Oh yeah, it's as bad as it looks. Remember that VHE loads the stage-2
MMU eagerly but a VMID only gets attached to the MMU later on in the
KVM_RUN loop.

Even in the "best case" where VTTBR_EL2 correctly gets reprogrammed
before entering the EL1&0 regime, there is a period of time where
hardware is configured with VMID 0. That's completely insane. So, rather
than decorating the 'late' binding with another hack, just allocate the
damn thing up front.

Attaching a VMID from vcpu_load() is still rollover safe since
(surprise!) it'll always get called after a vCPU was preempted.

Excuse me while I go find a brown paper bag.

Cc: stable@vger.kernel.org
Fixes: 934bf871f0 ("KVM: arm64: Load the stage-2 MMU context in kvm_vcpu_load_vhe()")
Reported-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250219220737.130842-1-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-20 16:29:28 +00:00
Will Deacon
102c51c50d KVM: arm64: Fix tcr_el2 initialisation in hVHE mode
When not running in VHE mode, cpu_prepare_hyp_mode() computes the value
of TCR_EL2 using the host's TCR_EL1 settings as a starting point. For
nVHE, this amounts to masking out everything apart from the TG0, SH0,
ORGN0, IRGN0 and T0SZ fields before setting the RES1 bits, shifting the
IPS field down to the PS field and setting DS if LPA2 is enabled.

Unfortunately, for hVHE, things go slightly wonky: EPD1 is correctly set
to disable walks via TTBR1_EL2 but then the T1SZ and IPS fields are
corrupted when we mistakenly attempt to initialise the PS and DS fields
in their E2H=0 positions. Furthermore, many fields are retained from
TCR_EL1 which should not be propagated to TCR_EL2. Notably, this means
we can end up with A1 set despite not initialising TTBR1_EL2 at all.
This has been shown to cause unexpected translation faults at EL2 with
pKVM due to TLB invalidation not taking effect when running with a
non-zero ASID.

Fix the TCR_EL2 initialisation code to set PS and DS only when E2H=0,
masking out HD, HA and A1 when E2H=1.
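
In pseudo-C, the corrected logic looks roughly like this (the lowercase names
below are placeholders for the real TCR_EL2 field definitions):

	if (has_hvhe()) {
		/* E2H=1: never inherit A1/HA/HD from TCR_EL1; TTBR1_EL2 is unused. */
		tcr &= ~(tcr_a1 | tcr_ha | tcr_hd);
	} else {
		/* E2H=0: PS and DS only exist at these positions in this layout. */
		tcr |= ips << tcr_ps_shift;
		if (lpa2_enabled)
			tcr |= tcr_ds;
	}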

Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Fixes: ad744e8cb3 ("arm64: Allow arm64_sw.hvhe on command line")
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250214133724.13179-1-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-19 22:09:24 +00:00
Paolo Bonzini
3bb7dcebd0 KVM/arm64 fixes for 6.14, take #2

Merge tag 'kvmarm-fixes-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.14, take #2

- Large set of fixes for vector handling, especially in the interactions
  between host and guest state. This fixes a number of bugs affecting
  actual deployments, and greatly simplifies the FP/SIMD/SVE handling.
  Thanks to Mark Rutland for dealing with this thankless task.

- Fix an ugly race between vcpu and vgic creation/init, resulting in
  unexpected behaviours.

- Fix use of kernel VAs at EL2 when emulating timers with nVHE.

- Small set of pKVM improvements and cleanups.
2025-02-14 18:32:47 -05:00
Mark Rutland
8eca7f6d51 KVM: arm64: Remove host FPSIMD saving for non-protected KVM
Now that the host eagerly saves its own FPSIMD/SVE/SME state,
non-protected KVM never needs to save the host FPSIMD/SVE/SME state,
and the code to do this is never used. Protected KVM still needs to
save/restore the host FPSIMD/SVE state to avoid leaking guest state to
the host (and to avoid revealing to the host whether the guest used
FPSIMD/SVE/SME), and that code needs to be retained.

Remove the unused code and data structures.

To avoid the need for a stub copy of kvm_hyp_save_fpsimd_host() in the
VHE hyp code, the nVHE/hVHE version is moved into the shared switch
header, where it is only invoked when KVM is in protected mode.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250210195226.1215254-3-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-13 17:54:44 +00:00
Paolo Bonzini
5e21d0c5b9 KVM/arm64 fixes for 6.14, take #1

Merge tag 'kvmarm-fixes-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.14, take #1

- Correctly clean the BSS to the PoC before allowing EL2 to access it
  on nVHE/hVHE/protected configurations

- Propagate ownership of debug registers in protected mode after
  the rework that landed in 6.14-rc1

- Stop pretending that we can run the protected mode without a GICv3
  being present on the host

- Fix a use-after-free situation that can occur if a vcpu fails to
  initialise the NV shadow S2 MMU contexts

- Always evaluate the need to arm a background timer for fully emulated
  guest timers

- Fix the emulation of EL1 timers in the absence of FEAT_ECV

- Correctly handle the EL2 virtual timer, especially when HCR_EL2.E2H==0
2025-02-04 11:14:53 -05:00