linux-loongson

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson synced 2025-09-01 23:46:45 +00:00

Author	SHA1	Message	Date
Oliver Upton	a5c7f011cb	KVM: arm64: vgic: Free LPI vgic_irq structs in an RCU-safe manner Free the vgic_irq structs in an RCU-safe manner to allow reads of the LPI configuration data to happen in parallel with the release of LPIs. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-8-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-23 21:46:02 +00:00
Oliver Upton	05f4d4f5d4	KVM: arm64: vgic: Use atomics to count LPIs Switch to using atomics for LPI accounting, allowing vgic_irq references to be dropped in parallel. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-7-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-23 21:46:02 +00:00
Oliver Upton	9880835af7	KVM: arm64: vgic: Get rid of the LPI linked-list All readers of LPI configuration have been transitioned to use the LPI xarray. Get rid of the linked-list altogether. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-6-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-23 21:46:02 +00:00
Oliver Upton	2798683b8c	KVM: arm64: vgic-its: Walk the LPI xarray in vgic_copy_lpi_list() Start iterating the LPI xarray in anticipation of removing the LPI linked-list. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-23 21:46:02 +00:00
Oliver Upton	49f0a468a1	KVM: arm64: vgic-v3: Iterate the xarray to find pending LPIs Start walking the LPI xarray to find pending LPIs in preparation for the removal of the LPI linked-list. Note that the 'basic' iterator is chosen here as each iteration needs to drop the xarray read lock (RCU) as reads/writes to guest memory can potentially block. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-23 21:46:02 +00:00
Oliver Upton	5a021df719	KVM: arm64: vgic: Use xarray to find LPI in vgic_get_lpi() Iterating over the LPI linked-list is less than ideal when the desired index is already known. Use the INTID to index the LPI xarray instead. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-23 21:46:02 +00:00
Oliver Upton	1d6f83f60f	KVM: arm64: vgic: Store LPIs in an xarray Using a linked-list for LPIs is less than ideal as it of course requires iterative searches to find a particular entry. An xarray is a better data structure for this use case, as it provides faster searches and can still handle a potentially sparse range of INTID allocations. Start by storing LPIs in an xarray, punting usage of the xarray to a subsequent change. The observant among you will notice that we added yet another lock to the chain of locking order rules; document the ordering of the xa_lock. Don't worry, we'll get rid of the lpi_list_lock one day... Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-23 21:46:01 +00:00
Oliver Upton	85a71ee9a0	KVM: arm64: vgic-its: Test for valid IRQ in MOVALL handler It is possible that an LPI mapped in a different ITS gets unmapped while handling the MOVALL command. If that is the case, there is no state that can be migrated to the destination. Silently ignore it and continue migrating other LPIs. Cc: stable@vger.kernel.org Fixes: `ff9c114394` ("KVM: arm/arm64: GICv4: Handle MOVALL applied to a vPE") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240221092732.4126848-3-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-02-21 10:06:41 +00:00
Oliver Upton	8d3a7dfb80	KVM: arm64: vgic-its: Test for valid IRQ in its_sync_lpi_pending_table() vgic_get_irq() may not return a valid descriptor if there is no ITS that holds a valid translation for the specified INTID. If that is the case, it is safe to silently ignore it and continue processing the LPI pending table. Cc: stable@vger.kernel.org Fixes: `33d3bc9556` ("KVM: arm64: vgic-its: Read initial LPI pending table") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240221092732.4126848-2-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-02-21 10:06:41 +00:00
Randy Dunlap	e634ff9598	KVM: arm64: vgic: fix a kernel-doc warning Use the correct function name in a kernel-doc comment to prevent a warning: arch/arm64/kvm/vgic/vgic.c:217: warning: expecting prototype for kvm_vgic_target_oracle(). Prototype was for vgic_target_oracle() instead Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Zenghui Yu <yuzenghui@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.linux.dev Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20240117230714.31025-11-rdunlap@infradead.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-01 20:25:42 +00:00
Randy Dunlap	f779d2c017	KVM: arm64: vgic-its: fix kernel-doc warnings Correct the function parameter name "@save tables" -> "@save_tables". Use the "typedef" keyword in the kernel-doc comment for a typedef. These changes prevent kernel-doc warnings: vgic/vgic-its.c:174: warning: Function parameter or struct member 'save_tables' not described in 'vgic_its_abi' arch/arm64/kvm/vgic/vgic-its.c:2152: warning: expecting prototype for entry_fn_t(). Prototype was for int() instead Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Zenghui Yu <yuzenghui@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.linux.dev Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20240117230714.31025-10-rdunlap@infradead.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-01 20:25:42 +00:00
Randy Dunlap	dd609a574a	KVM: arm64: vgic-init: fix a kernel-doc warning Change the function comment block to kernel-doc format to prevent a kernel-doc warning: arch/arm64/kvm/vgic/vgic-init.c:448: warning: This comment starts with '/*', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Map the MMIO regions depending on the VGIC model exposed to the guest Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Zenghui Yu <yuzenghui@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.linux.dev Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20240117230714.31025-9-rdunlap@infradead.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-01 20:25:42 +00:00
Paolo Bonzini	5f53d88f10	KVM/arm64 updates for Linux 6.8 - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base granule sizes. Branch shared with the arm64 tree. - Large Fine-Grained Trap rework, bringing some sanity to the feature, although there is more to come. This comes with a prefix branch shared with the arm64 tree. - Some additional Nested Virtualization groundwork, mostly introducing the NV2 VNCR support and retargetting the NV support to that version of the architecture. - A small set of vgic fixes and associated cleanups. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmWX4wUACgkQI9DQutE9 ekM0DxAAvOJtM+m8ahv2tCSHZpwowkuKBBc7JWI75l4befHEOSvYMZwQwejrequa lPwLgx9t0sGjba+tRGv1JZMtnUBjV4V/lcrhX95AYTF5dfg7vbuTxUh/YFu1CaQ/ MkuKVJ74PUWqpvDYSzwW8Jjqu6RskjW0HqVPMbFkmUWWc8cgExc8XD9M+nu0SrNT g5261KD53CUeyNaR0/+zkaHouq2Skeqw/u2d5OLdnY23hINMZ0qR1jYHj935suYy YrMTiMje1h/fs7YXWra4LmMcsg0V+3LZVQJXwRARrZdk2xkW5w+eLPIYjVqcA7aT VwhrtzjEzD56trrSZClOpj7MSVfQ8OjV7BgvSUpgLT5+kjVrFLIEMIOakiTOCoIJ weweRawTyomUoIsT1EkRmRYQkPH3Z552tcrztD/slYvqrtCB4JcHKF0O7BT88ZfM t2hRhlT+32KR9cOciLfFMzlZI1uKQYF8Z+CvvBA5TJ9Hv8JsIwF2E/NjYUy2ilca iDzF5KdZ/OLQzjwWVWDq9OlvepB2rLGQKNnw67jd1BSzd9Jj3eVuaI/9xRBrLDYR cBOMoIaZMy7Va+pop1zoFEhC7IbTglVHzsj2ch+4F1NB/1+Dd0zBQKbDUPqp5TR/ OOuonTTVk9yH6RgpUULKlbRZ4oU70UoOBFBxCqnvng0cw1KBbbA= =Q6c+ -----END PGP SIGNATURE----- Merge tag 'kvmarm-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 6.8 - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base granule sizes. Branch shared with the arm64 tree. - Large Fine-Grained Trap rework, bringing some sanity to the feature, although there is more to come. This comes with a prefix branch shared with the arm64 tree. - Some additional Nested Virtualization groundwork, mostly introducing the NV2 VNCR support and retargetting the NV support to that version of the architecture. - A small set of vgic fixes and associated cleanups.	2024-01-08 08:09:53 -05:00
Oliver Upton	ad362fe07f	KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache There is a potential UAF scenario in the case of an LPI translation cache hit racing with an operation that invalidates the cache, such as a DISCARD ITS command. The root of the problem is that vgic_its_check_cache() does not elevate the refcount on the vgic_irq before dropping the lock that serializes refcount changes. Have vgic_its_check_cache() raise the refcount on the returned vgic_irq and add the corresponding decrement after queueing the interrupt. Cc: stable@vger.kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240104183233.3560639-1-oliver.upton@linux.dev	2024-01-04 19:26:34 +00:00
Paolo Bonzini	5c2b2176ea	KVM/arm64 fixes for 6.7, part #2 - Ensure a vCPU's redistributor is unregistered from the MMIO bus if vCPU creation fails - Fix building KVM selftests for arm64 from the top-level Makefile -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCZYCYmAAKCRCivnWIJHzd FhU+AQDqIOIg3VMV+VjxhrG5aiHccq9o1mczO4LL9FQUO9AdYwD/SbTP4puBlfai gOFQDuvJFogTwKmYPDO2jycp1ekTuQ0= =RhfO -----END PGP SIGNATURE----- Merge tag 'kvmarm-fixes-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/arm64 fixes for 6.7, part #2 - Ensure a vCPU's redistributor is unregistered from the MMIO bus if vCPU creation fails - Fix building KVM selftests for arm64 from the top-level Makefile	2023-12-22 18:03:54 -05:00
Oliver Upton	39084ba8d0	KVM: arm64: vgic-v3: Reinterpret user ISPENDR writes as I{C,S}PENDR User writes to ISPENDR for GICv3 are treated specially, as zeroes actually clear the pending state for interrupts (unlike HW). Reimplement it using the ISPENDR and ICPENDR user accessors. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231219065855.1019608-4-oliver.upton@linux.dev	2023-12-22 09:34:27 +00:00
Oliver Upton	561851424d	KVM: arm64: vgic: Use common accessor for writes to ICPENDR Fold MMIO and user accessors into a common helper while maintaining the distinction between the two. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231219065855.1019608-3-oliver.upton@linux.dev	2023-12-22 09:34:17 +00:00
Oliver Upton	13886f3444	KVM: arm64: vgic: Use common accessor for writes to ISPENDR Perhaps unsurprisingly, there is a considerable amount of duplicate code between the MMIO and user accessors for ISPENDR. At the same time there are some important differences between user and guest MMIO, like how SGIs can only be made pending from userspace. Fold user and MMIO accessors into a common helper, maintaining the distinction between the two. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231219065855.1019608-2-oliver.upton@linux.dev	2023-12-22 09:33:54 +00:00
Marc Zyngier	7b95382f96	KVM: arm64: vgic-v4: Restore pending state on host userspace write When the VMM writes to ISPENDR0 to set the state pending state of an SGI, we fail to convey this to the HW if this SGI is already backed by a GICv4.1 vSGI. This is a bit of a corner case, as this would only occur if the vgic state is changed on an already running VM, but this can apparently happen across a guest reset driven by the VMM. Fix this by always writing out the pending_latch value to the HW, and reseting it to false. Reported-by: Kunkun Jiang <jiangkunkun@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Cc: stable@vger.kernel.org # 5.10+ Link: https://lore.kernel.org/r/7e7f2c0c-448b-10a9-8929-4b8f4f6e2a32@huawei.com	2023-12-22 09:27:36 +00:00
Marc Zyngier	6bef365e31	KVM: arm64: vgic: Ensure that slots_lock is held in vgic_register_all_redist_iodevs() Although we implicitly depend on slots_lock being held when registering IO devices with the IO bus infrastructure, we don't enforce this requirement. Make it explicit. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231207151201.3028710-5-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-12-12 07:11:38 +00:00
Marc Zyngier	02e3858f08	KVM: arm64: vgic: Force vcpu vgic teardown on vcpu destroy When failing to create a vcpu because (for example) it has a duplicate vcpu_id, we destroy the vcpu. Amusingly, this leaves the redistributor registered with the KVM_MMIO bus. This is no good, and we should properly clean the mess. Force a teardown of the vgic vcpu interface, including the RD device before returning to the caller. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231207151201.3028710-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-12-12 07:11:38 +00:00
Marc Zyngier	d26b9cb33c	KVM: arm64: vgic: Add a non-locking primitive for kvm_vgic_vcpu_destroy() As we are going to need to call into kvm_vgic_vcpu_destroy() without prior holding of the slots_lock, introduce __kvm_vgic_vcpu_destroy() as a non-locking primitive of kvm_vgic_vcpu_destroy(). Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231207151201.3028710-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-12-12 07:11:38 +00:00
Marc Zyngier	01ad29d224	KVM: arm64: vgic: Simplify kvm_vgic_destroy() When destroying a vgic, we have rather cumbersome rules about when slots_lock and config_lock are held, resulting in fun buglets. The first port of call is to simplify kvm_vgic_map_resources() so that there is only one call to kvm_vgic_destroy() instead of two, with the second only holding half of the locks. For that, we kill the non-locking primitive and move the call outside of the locking altogether. This doesn't change anything (we re-acquire the locks and teardown the whole vgic), and simplifies the code significantly. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231207151201.3028710-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-12-12 07:11:38 +00:00
Kunkun Jiang	8e4ece6889	KVM: arm64: GICv4: Do not perform a map to a mapped vLPI Before performing a map, let's check whether the vLPI has been mapped. Fixes: `196b136498` ("KVM: arm/arm64: GICv4: Wire mapping/unmapping of VLPIs in VFIO irq bypass") Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> Acked-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20231120131210.2039-1-jiangkunkun@huawei.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-11-20 19:13:32 +00:00
Linus Torvalds	6803bd7956	ARM: * Generalized infrastructure for 'writable' ID registers, effectively allowing userspace to opt-out of certain vCPU features for its guest * Optimization for vSGI injection, opportunistically compressing MPIDR to vCPU mapping into a table * Improvements to KVM's PMU emulation, allowing userspace to select the number of PMCs available to a VM * Guest support for memory operation instructions (FEAT_MOPS) * Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing bugs and getting rid of useless code * Changes to the way the SMCCC filter is constructed, avoiding wasted memory allocations when not in use * Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing the overhead of errata mitigations * Miscellaneous kernel and selftest fixes LoongArch: * New architecture. The hardware uses the same model as x86, s390 and RISC-V, where guest/host mode is orthogonal to supervisor/user mode. The virtualization extensions are very similar to MIPS, therefore the code also has some similarities but it's been cleaned up to avoid some of the historical bogosities that are found in arch/mips. The kernel emulates MMU, timer and CSR accesses, while interrupt controllers are only emulated in userspace, at least for now. RISC-V: * Support for the Smstateen and Zicond extensions * Support for virtualizing senvcfg * Support for virtualized SBI debug console (DBCN) S390: * Nested page table management can be monitored through tracepoints and statistics x86: * Fix incorrect handling of VMX posted interrupt descriptor in KVM_SET_LAPIC, which could result in a dropped timer IRQ * Avoid WARN on systems with Intel IPI virtualization * Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs without forcing more common use cases to eat the extra memory overhead. * Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and SBPB, aka Selective Branch Predictor Barrier). * Fix a bug where restoring a vCPU snapshot that was taken within 1 second of creating the original vCPU would cause KVM to try to synchronize the vCPU's TSC and thus clobber the correct TSC being set by userspace. * Compute guest wall clock using a single TSC read to avoid generating an inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads. * "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain about a "Firmware Bug" if the bit isn't set for select F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to appease Windows Server 2022. * Don't apply side effects to Hyper-V's synthetic timer on writes from userspace to fix an issue where the auto-enable behavior can trigger spurious interrupts, i.e. do auto-enabling only for guest writes. * Remove an unnecessary kick of all vCPUs when synchronizing the dirty log without PML enabled. * Advertise "support" for non-serializing FS/GS base MSR writes as appropriate. * Harden the fast page fault path to guard against encountering an invalid root when walking SPTEs. * Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n. * Use the fast path directly from the timer callback when delivering Xen timer events, instead of waiting for the next iteration of the run loop. This was not done so far because previously proposed code had races, but now care is taken to stop the hrtimer at critical points such as restarting the timer or saving the timer information for userspace. * Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future flag. * Optimize injection of PMU interrupts that are simultaneous with NMIs. * Usual handful of fixes for typos and other warts. x86 - MTRR/PAT fixes and optimizations: * Clean up code that deals with honoring guest MTRRs when the VM has non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled. * Zap EPT entries when non-coherent DMA assignment stops/start to prevent using stale entries with the wrong memtype. * Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y. This was done as a workaround for virtual machine BIOSes that did not bother to clear CR0.CD (because ancient KVM/QEMU did not bother to set it, in turn), and there's zero reason to extend the quirk to also ignore guest PAT. x86 - SEV fixes: * Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts SHUTDOWN while running an SEV-ES guest. * Clean up the recognition of emulation failures on SEV guests, when KVM would like to "skip" the instruction but it had already been partially emulated. This makes it possible to drop a hack that second guessed the (insufficient) information provided by the emulator, and just do the right thing. Documentation: * Various updates and fixes, mostly for x86 * MTRR and PAT fixes and optimizations: -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmVBZc0UHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroP1LQf+NgsmZ1lkGQlKdSdijoQ856w+k0or l2SV1wUwiEdFPSGK+RTUlHV5Y1ni1dn/CqCVIJZKEI3ZtZ1m9/4HKIRXvbMwFHIH hx+E4Lnf8YUjsGjKTLd531UKcpphztZavQ6pXLEwazkSkDEra+JIKtooI8uU+9/p bd/eF1V+13a8CHQf1iNztFJVxqBJbVlnPx4cZDRQQvewskIDGnVDtwbrwCUKGtzD eNSzhY7si6O2kdQNkuA8xPhg29dYX9XLaCK2K1l8xOUm8WipLdtF86GAKJ5BVuOL 6ek/2QCYjZ7a+coAZNfgSEUi8JmFHEqCo7cnKmWzPJp+2zyXsdudqAhT1g== =UIxm -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "ARM: - Generalized infrastructure for 'writable' ID registers, effectively allowing userspace to opt-out of certain vCPU features for its guest - Optimization for vSGI injection, opportunistically compressing MPIDR to vCPU mapping into a table - Improvements to KVM's PMU emulation, allowing userspace to select the number of PMCs available to a VM - Guest support for memory operation instructions (FEAT_MOPS) - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing bugs and getting rid of useless code - Changes to the way the SMCCC filter is constructed, avoiding wasted memory allocations when not in use - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing the overhead of errata mitigations - Miscellaneous kernel and selftest fixes LoongArch: - New architecture for kvm. The hardware uses the same model as x86, s390 and RISC-V, where guest/host mode is orthogonal to supervisor/user mode. The virtualization extensions are very similar to MIPS, therefore the code also has some similarities but it's been cleaned up to avoid some of the historical bogosities that are found in arch/mips. The kernel emulates MMU, timer and CSR accesses, while interrupt controllers are only emulated in userspace, at least for now. RISC-V: - Support for the Smstateen and Zicond extensions - Support for virtualizing senvcfg - Support for virtualized SBI debug console (DBCN) S390: - Nested page table management can be monitored through tracepoints and statistics x86: - Fix incorrect handling of VMX posted interrupt descriptor in KVM_SET_LAPIC, which could result in a dropped timer IRQ - Avoid WARN on systems with Intel IPI virtualization - Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs without forcing more common use cases to eat the extra memory overhead. - Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and SBPB, aka Selective Branch Predictor Barrier). - Fix a bug where restoring a vCPU snapshot that was taken within 1 second of creating the original vCPU would cause KVM to try to synchronize the vCPU's TSC and thus clobber the correct TSC being set by userspace. - Compute guest wall clock using a single TSC read to avoid generating an inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads. - "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain about a "Firmware Bug" if the bit isn't set for select F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to appease Windows Server 2022. - Don't apply side effects to Hyper-V's synthetic timer on writes from userspace to fix an issue where the auto-enable behavior can trigger spurious interrupts, i.e. do auto-enabling only for guest writes. - Remove an unnecessary kick of all vCPUs when synchronizing the dirty log without PML enabled. - Advertise "support" for non-serializing FS/GS base MSR writes as appropriate. - Harden the fast page fault path to guard against encountering an invalid root when walking SPTEs. - Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n. - Use the fast path directly from the timer callback when delivering Xen timer events, instead of waiting for the next iteration of the run loop. This was not done so far because previously proposed code had races, but now care is taken to stop the hrtimer at critical points such as restarting the timer or saving the timer information for userspace. - Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future flag. - Optimize injection of PMU interrupts that are simultaneous with NMIs. - Usual handful of fixes for typos and other warts. x86 - MTRR/PAT fixes and optimizations: - Clean up code that deals with honoring guest MTRRs when the VM has non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled. - Zap EPT entries when non-coherent DMA assignment stops/start to prevent using stale entries with the wrong memtype. - Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y This was done as a workaround for virtual machine BIOSes that did not bother to clear CR0.CD (because ancient KVM/QEMU did not bother to set it, in turn), and there's zero reason to extend the quirk to also ignore guest PAT. x86 - SEV fixes: - Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts SHUTDOWN while running an SEV-ES guest. - Clean up the recognition of emulation failures on SEV guests, when KVM would like to "skip" the instruction but it had already been partially emulated. This makes it possible to drop a hack that second guessed the (insufficient) information provided by the emulator, and just do the right thing. Documentation: - Various updates and fixes, mostly for x86 - MTRR and PAT fixes and optimizations" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (164 commits) KVM: selftests: Avoid using forced target for generating arm64 headers tools headers arm64: Fix references to top srcdir in Makefile KVM: arm64: Add tracepoint for MMIO accesses where ISV==0 KVM: arm64: selftest: Perform ISB before reading PAR_EL1 KVM: arm64: selftest: Add the missing .guest_prepare() KVM: arm64: Always invalidate TLB for stage-2 permission faults KVM: x86: Service NMI requests after PMI requests in VM-Enter path KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs KVM: arm64: Refine _EL2 system register list that require trap reinjection arm64: Add missing _EL2 encodings arm64: Add missing _EL12 encodings KVM: selftests: aarch64: vPMU test for validating user accesses KVM: selftests: aarch64: vPMU register test for unimplemented counters KVM: selftests: aarch64: vPMU register test for implemented counters KVM: selftests: aarch64: Introduce vpmu_counter_access test tools: Import arm_pmuv3.h KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} ...	2023-11-02 15:45:15 -10:00
Oliver Upton	54b44ad26c	Merge branch kvm-arm64/sgi-injection into kvmarm/next * kvm-arm64/sgi-injection: : vSGI injection improvements + fixes, courtesy Marc Zyngier : : Avoid linearly searching for vSGI targets using a compressed MPIDR to : index a cache. While at it, fix some egregious bugs in KVM's mishandling : of vcpuid (user-controlled value) and vcpu_idx. KVM: arm64: Clarify the ordering requirements for vcpu/RD creation KVM: arm64: vgic-v3: Optimize affinity-based SGI injection KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is available KVM: arm64: Build MPIDR to vcpu index cache at runtime KVM: arm64: Simplify kvm_vcpu_get_mpidr_aff() KVM: arm64: Use vcpu_idx for invalidation tracking KVM: arm64: vgic: Use vcpu_idx for the debug information KVM: arm64: vgic-v2: Use cpuid from userspace as vcpu_id KVM: arm64: vgic-v3: Refactor GICv3 SGI generation KVM: arm64: vgic-its: Treat the collection target address as a vcpu_id KVM: arm64: vgic: Make kvm_vgic_inject_irq() take a vcpu pointer Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-10-30 20:19:13 +00:00
Marc Zyngier	fe49fd940e	KVM: arm64: Move VTCR_EL2 into struct s2_mmu We currently have a global VTCR_EL2 value for each guest, even if the guest uses NV. This implies that the guest's own S2 must fit in the host's. This is odd, for multiple reasons: - the PARange values and the number of IPA bits don't necessarily match: you can have 33 bits of IPA space, and yet you can only describe 32 or 36 bits of PARange - When userspace set the IPA space, it creates a contract with the kernel saying "this is the IPA space I'm prepared to handle". At no point does it constraint the guest's own IPA space as long as the guest doesn't try to use a [I]PA outside of the IPA space set by userspace - We don't even try to hide the value of ID_AA64MMFR0_EL1.PARange. And then there is the consequence of the above: if a guest tries to create a S2 that has for input address something that is larger than the IPA space defined by the host, we inject a fatal exception. This is no good. For all intent and purposes, a guest should be able to have the S2 it really wants, as long as the output address of that S2 isn't outside of the IPA space. For that, we need to have a per-s2_mmu VTCR_EL2 setting, which allows us to represent the full PARange. Move the vctr field into the s2_mmu structure, which has no impact whatsoever, except for NV. Note that once we are able to override ID_AA64MMFR0_EL1.PARange from userspace, we'll also be able to restrict the size of the shadow S2 that NV uses. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231012205108.3937270-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-10-23 18:48:46 +00:00
Mark Rutland	d8569fba13	arm64: kvm: Use cpus_have_final_cap() explicitly Much of the arm64 KVM code uses cpus_have_const_cap() to check for cpucaps, but this is unnecessary and it would be preferable to use cpus_have_final_cap(). For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. KVM is initialized after cpucaps have been finalized and alternatives have been patched. Since commit: `d86de40dec` ("arm64: cpufeature: upgrade hyp caps to final") ... use of cpus_have_const_cap() in hyp code is automatically converted to use cpus_have_final_cap(): \| static __always_inline bool cpus_have_const_cap(int num) \| { \| if (is_hyp_code()) \| return cpus_have_final_cap(num); \| else if (system_capabilities_finalized()) \| return __cpus_have_const_cap(num); \| else \| return cpus_have_cap(num); \| } Thus, converting hyp code to use cpus_have_final_cap() directly will not result in any functional change. Non-hyp KVM code is also not executed until cpucaps have been finalized, and it would be preferable to extent the same treatment to this code and use cpus_have_final_cap() directly. This patch converts instances of cpus_have_const_cap() in KVM-only code over to cpus_have_final_cap(). As all of this code runs after cpucaps have been finalized, there should be no functional change as a result of this patch, but the redundant instructions generated by cpus_have_const_cap() will be removed from the non-hyp KVM code. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-16 12:57:56 +01:00
Marc Zyngier	b5daffb120	KVM: arm64: vgic-v3: Optimize affinity-based SGI injection Our affinity-based SGI injection code is a bit daft. We iterate over all the CPUs trying to match the set of affinities that the guest is trying to reach, leading to some very bad behaviours if the selected targets are at a high vcpu index. Instead, we can now use the fact that we have an optimised MPIDR to vcpu mapping, and only look at the relevant values. This results in a much faster injection for large VMs, and in a near constant time, irrespective of the position in the vcpu index space. As a bonus, this is mostly deleting a lot of hard-to-read code. Nobody will complain about that. Suggested-by: Xu Zhao <zhaoxu.35@bytedance.com> Tested-by: Joey Gouly <joey.gouly@arm.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230927090911.3355209-11-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-09-30 18:15:44 +00:00
Marc Zyngier	ac0fe56d46	KVM: arm64: vgic: Use vcpu_idx for the debug information When dumping the debug information, use vcpu_idx instead of vcpu_id, as this is independent of any userspace influence. Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230927090911.3355209-6-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-09-30 18:15:43 +00:00
Marc Zyngier	4e7728c81a	KVM: arm64: vgic-v2: Use cpuid from userspace as vcpu_id When parsing a GICv2 attribute that contains a cpuid, handle this as the vcpu_id, not a vcpu_idx, as userspace cannot really know the mapping between the two. For this, use kvm_get_vcpu_by_id() instead of kvm_get_vcpu(). Take this opportunity to get rid of the pointless check against online_vcpus, which doesn't make much sense either, and switch to FIELD_GET as a way to extract the vcpu_id. Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230927090911.3355209-5-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-09-30 18:15:43 +00:00
Marc Zyngier	f3f60a5653	KVM: arm64: vgic-v3: Refactor GICv3 SGI generation As we're about to change the way SGIs are sent, start by splitting out some of the basic functionnality: instead of intermingling the broadcast and non-broadcast cases with the actual SGI generation, perform the following cleanups: - move the SGI queuing into its own helper - split the broadcast code from the affinity-driven code - replace the mask/shift combinations with FIELD_GET() - fix the confusion between vcpu_id and vcpu when handling the broadcast case The result is much more readable, and paves the way for further optimisations. Tested-by: Joey Gouly <joey.gouly@arm.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230927090911.3355209-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-09-30 18:15:43 +00:00
Marc Zyngier	d455d366c4	KVM: arm64: vgic-its: Treat the collection target address as a vcpu_id Since our emulated ITS advertises GITS_TYPER.PTA=0, the target address associated to a collection is a PE number and not an address. So far, so good. However, the PE number is what userspace has provided given us (aka the vcpu_id), and not the internal vcpu index. Make sure we consistently retrieve the vcpu by ID rather than by index, adding a helper that deals with most of the cases. We also get rid of the pointless (and bogus) comparisons to online_vcpus, which don't really make sense. Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230927090911.3355209-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-09-30 18:15:43 +00:00
Marc Zyngier	9a0a75d3cc	KVM: arm64: vgic: Make kvm_vgic_inject_irq() take a vcpu pointer Passing a vcpu_id to kvm_vgic_inject_irq() is silly for two reasons: - we often confuse vcpu_id and vcpu_idx - we eventually have to convert it back to a vcpu - we can't count Instead, pass a vcpu pointer, which is unambiguous. A NULL vcpu is also allowed for interrupts that are not private to a vcpu (such as SPIs). Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230927090911.3355209-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-09-30 18:15:43 +00:00
Yue Haibing	a6b33d009f	KVM: arm64: Remove unused declarations Commit `53692908b0` ("KVM: arm/arm64: vgic: Fix source vcpu issues for GICv2 SGI") removed vgic_v2_set_npie()/vgic_v3_set_npie() but not the declarations. Commit `29eb5a3c57` ("KVM: arm64: Handle PtrAuth traps early") left behind kvm_arm_vcpu_ptrauth_trap(), remove it. Commit `2a0c343386` ("KVM: arm64: Initialize trap registers for protected VMs") declared but never implemented kvm_init_protected_traps() and commit `cf5d318865` ("arm/arm64: KVM: Turn off vcpus on PSCI shutdown/reboot") declared but never implemented force_vm_exit(). Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Zenghui Yu <zenghui.yu@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230814140636.45988-1-yuehaibing@huawei.com	2023-08-15 20:27:32 +01:00
Marc Zyngier	b321c31c9b	KVM: arm64: vgic-v4: Make the doorbell request robust w.r.t preemption Xiang reports that VMs occasionally fail to boot on GICv4.1 systems when running a preemptible kernel, as it is possible that a vCPU is blocked without requesting a doorbell interrupt. The issue is that any preemption that occurs between vgic_v4_put() and schedule() on the block path will mark the vPE as nonresident and not request a doorbell irq. This occurs because when the vcpu thread is resumed on its way to block, vcpu_load() will make the vPE resident again. Once the vcpu actually blocks, we don't request a doorbell anymore, and the vcpu won't be woken up on interrupt delivery. Fix it by tracking that we're entering WFI, and key the doorbell request on that flag. This allows us not to make the vPE resident when going through a preempt/schedule cycle, meaning we don't lose any state. Cc: stable@vger.kernel.org Fixes: `8e01d9a396` ("KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put") Reported-by: Xiang Chen <chenxiang66@hisilicon.com> Suggested-by: Zenghui Yu <yuzenghui@huawei.com> Tested-by: Xiang Chen <chenxiang66@hisilicon.com> Co-developed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20230713070657.3873244-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-07-13 22:23:34 +00:00
Marc Zyngier	1caa71a7a6	KVM: arm64: Restore GICv2-on-GICv3 functionality When reworking the vgic locking, the vgic distributor registration got simplified, which was a very good cleanup. But just a tad too radical, as we now register the native vgic only, ignoring the GICv2-on-GICv3 that allows pre-historic VMs (or so I thought) to run. As it turns out, QEMU still defaults to GICv2 in some cases, and this breaks Nathan's setup! Fix it by propagating the requested vgic type rather than the host's version. Fixes: `59112e9c39` ("KVM: arm64: vgic: Fix a circular locking issue") Reported-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> link: https://lore.kernel.org/r/20230606221525.GA2269598@dev-arch.thelio-3990X	2023-06-07 16:38:25 +01:00
Jean-Philippe Brucker	6254873226	KVM: arm64: vgic: Fix a comment It is host userspace, not the guest, that issues KVM_DEV_ARM_VGIC_GRP_CTRL Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230518100914.2837292-5-jean-philippe@linaro.org	2023-05-19 10:20:00 +01:00
Jean-Philippe Brucker	c38b8400ae	KVM: arm64: vgic: Fix locking comment It is now config_lock that must be held, not kvm lock. Replace the comment with a lockdep annotation. Fixes: `f003277311` ("KVM: arm64: Use config_lock to protect vgic state") Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230518100914.2837292-4-jean-philippe@linaro.org	2023-05-19 10:20:00 +01:00
Jean-Philippe Brucker	9cf2f840c4	KVM: arm64: vgic: Wrap vgic_its_create() with config_lock vgic_its_create() changes the vgic state without holding the config_lock, which triggers a lockdep warning in vgic_v4_init(): [ 358.667941] WARNING: CPU: 3 PID: 178 at arch/arm64/kvm/vgic/vgic-v4.c:245 vgic_v4_init+0x15c/0x7a8 ... [ 358.707410] vgic_v4_init+0x15c/0x7a8 [ 358.708550] vgic_its_create+0x37c/0x4a4 [ 358.709640] kvm_vm_ioctl+0x1518/0x2d80 [ 358.710688] __arm64_sys_ioctl+0x7ac/0x1ba8 [ 358.711960] invoke_syscall.constprop.0+0x70/0x1e0 [ 358.713245] do_el0_svc+0xe4/0x2d4 [ 358.714289] el0_svc+0x44/0x8c [ 358.715329] el0t_64_sync_handler+0xf4/0x120 [ 358.716615] el0t_64_sync+0x190/0x194 Wrap the whole of vgic_its_create() with config_lock since, in addition to calling vgic_v4_init(), it also modifies the global kvm->arch.vgic state. Fixes: `f003277311` ("KVM: arm64: Use config_lock to protect vgic state") Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230518100914.2837292-3-jean-philippe@linaro.org	2023-05-19 10:20:00 +01:00
Jean-Philippe Brucker	59112e9c39	KVM: arm64: vgic: Fix a circular locking issue Lockdep reports a circular lock dependency between the srcu and the config_lock: [ 262.179917] -> #1 (&kvm->srcu){.+.+}-{0:0}: [ 262.182010] __synchronize_srcu+0xb0/0x224 [ 262.183422] synchronize_srcu_expedited+0x24/0x34 [ 262.184554] kvm_io_bus_register_dev+0x324/0x50c [ 262.185650] vgic_register_redist_iodev+0x254/0x398 [ 262.186740] vgic_v3_set_redist_base+0x3b0/0x724 [ 262.188087] kvm_vgic_addr+0x364/0x600 [ 262.189189] vgic_set_common_attr+0x90/0x544 [ 262.190278] vgic_v3_set_attr+0x74/0x9c [ 262.191432] kvm_device_ioctl+0x2a0/0x4e4 [ 262.192515] __arm64_sys_ioctl+0x7ac/0x1ba8 [ 262.193612] invoke_syscall.constprop.0+0x70/0x1e0 [ 262.195006] do_el0_svc+0xe4/0x2d4 [ 262.195929] el0_svc+0x44/0x8c [ 262.196917] el0t_64_sync_handler+0xf4/0x120 [ 262.198238] el0t_64_sync+0x190/0x194 [ 262.199224] [ 262.199224] -> #0 (&kvm->arch.config_lock){+.+.}-{3:3}: [ 262.201094] __lock_acquire+0x2b70/0x626c [ 262.202245] lock_acquire+0x454/0x778 [ 262.203132] __mutex_lock+0x190/0x8b4 [ 262.204023] mutex_lock_nested+0x24/0x30 [ 262.205100] vgic_mmio_write_v3_misc+0x5c/0x2a0 [ 262.206178] dispatch_mmio_write+0xd8/0x258 [ 262.207498] __kvm_io_bus_write+0x1e0/0x350 [ 262.208582] kvm_io_bus_write+0xe0/0x1cc [ 262.209653] io_mem_abort+0x2ac/0x6d8 [ 262.210569] kvm_handle_guest_abort+0x9b8/0x1f88 [ 262.211937] handle_exit+0xc4/0x39c [ 262.212971] kvm_arch_vcpu_ioctl_run+0x90c/0x1c04 [ 262.214154] kvm_vcpu_ioctl+0x450/0x12f8 [ 262.215233] __arm64_sys_ioctl+0x7ac/0x1ba8 [ 262.216402] invoke_syscall.constprop.0+0x70/0x1e0 [ 262.217774] do_el0_svc+0xe4/0x2d4 [ 262.218758] el0_svc+0x44/0x8c [ 262.219941] el0t_64_sync_handler+0xf4/0x120 [ 262.221110] el0t_64_sync+0x190/0x194 Note that the current report, which can be triggered by the vgic_irq kselftest, is a triple chain that includes slots_lock, but after inverting the slots_lock/config_lock dependency, the actual problem reported above remains. In several places, the vgic code calls kvm_io_bus_register_dev(), which synchronizes the srcu, while holding config_lock (#1). And the MMIO handler takes the config_lock while holding the srcu read lock (#0). Break dependency #1, by registering the distributor and redistributors without holding config_lock. The ITS also uses kvm_io_bus_register_dev() but already relies on slots_lock to serialize calls. The distributor iodev is created on the first KVM_RUN call. Multiple threads will race for vgic initialization, and only the first one will see !vgic_ready() under the lock. To serialize those threads, rely on slots_lock rather than config_lock. Redistributors are created earlier, through KVM_DEV_ARM_VGIC_GRP_ADDR ioctls and vCPU creation. Similarly, serialize the iodev creation with slots_lock, and the rest with config_lock. Fixes: `f003277311` ("KVM: arm64: Use config_lock to protect vgic state") Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230518100914.2837292-2-jean-philippe@linaro.org	2023-05-19 10:20:00 +01:00
Marc Zyngier	9a48c597d6	Merge branch kvm-arm64/misc-6.4 into kvmarm-master/fixes * kvm-arm64/misc-6.4: : . : Minor changes for 6.4: : : - Make better use of the bitmap API (bitmap_zero, bitmap_zalloc...) : : - FP/SVE/SME documentation update, in the hope that this field : becomes clearer... : : - Add workaround for the usual Apple SEIS brokenness : : - Random comment fixes : . KVM: arm64: vgic: Add Apple M2 PRO/MAX cpus to the list of broken SEIS implementations KVM: arm64: Clarify host SME state management KVM: arm64: Restructure check for SVE support in FP trap handler KVM: arm64: Document check for TIF_FOREIGN_FPSTATE KVM: arm64: Fix repeated words in comments KVM: arm64: Use the bitmap API to allocate bitmaps KVM: arm64: Slightly optimize flush_context() Signed-off-by: Marc Zyngier <maz@kernel.org>	2023-05-11 15:25:58 +01:00
Marc Zyngier	e910baa9c1	KVM: arm64: vgic: Add Apple M2 PRO/MAX cpus to the list of broken SEIS implementations Unsurprisingly, the M2 PRO is also affected by the SEIS bug, so add it to the naughty list. And since M2 MAX is likely to be of the same ilk, flag it as well. Tested on a M2 PRO mini machine. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20230501182141.39770-1-maz@kernel.org	2023-05-11 15:17:02 +01:00
Marc Zyngier	b22498c484	Merge branch kvm-arm64/timer-vm-offsets into kvmarm-master/next * kvm-arm64/timer-vm-offsets: (21 commits) : . : This series aims at satisfying multiple goals: : : - allow a VMM to atomically restore a timer offset for a whole VM : instead of updating the offset each time a vcpu get its counter : written : : - allow a VMM to save/restore the physical timer context, something : that we cannot do at the moment due to the lack of offsetting : : - provide a framework that is suitable for NV support, where we get : both global and per timer, per vcpu offsetting, and manage : interrupts in a less braindead way. : : Conflict resolution involves using the new per-vcpu config lock instead : of the home-grown timer lock. : . KVM: arm64: Handle 32bit CNTPCTSS traps KVM: arm64: selftests: Augment existing timer test to handle variable offset KVM: arm64: selftests: Deal with spurious timer interrupts KVM: arm64: selftests: Add physical timer registers to the sysreg list KVM: arm64: nv: timers: Support hyp timer emulation KVM: arm64: nv: timers: Add a per-timer, per-vcpu offset KVM: arm64: Document KVM_ARM_SET_CNT_OFFSETS and co KVM: arm64: timers: Abstract the number of valid timers per vcpu KVM: arm64: timers: Fast-track CNTPCT_EL0 trap handling KVM: arm64: Elide kern_hyp_va() in VHE-specific parts of the hypervisor KVM: arm64: timers: Move the timer IRQs into arch_timer_vm_data KVM: arm64: timers: Abstract per-timer IRQ access KVM: arm64: timers: Rationalise per-vcpu timer init KVM: arm64: timers: Allow save/restoring of the physical timer KVM: arm64: timers: Allow userspace to set the global counter offset KVM: arm64: Expose {un,}lock_all_vcpus() to the rest of KVM KVM: arm64: timers: Allow physical offset without CNTPOFF_EL2 KVM: arm64: timers: Use CNTPOFF_EL2 to offset the physical timer arm64: Add HAS_ECV_CNTPOFF capability arm64: Add CNTPOFF_EL2 register definition ... Signed-off-by: Marc Zyngier <maz@kernel.org>	2023-04-21 09:36:40 +01:00
Oliver Upton	49e5d16b6f	KVM: arm64: vgic: Don't acquire its_lock before config_lock commit `f003277311` ("KVM: arm64: Use config_lock to protect vgic state") was meant to rectify a longstanding lock ordering issue in KVM where the kvm->lock is taken while holding vcpu->mutex. As it so happens, the aforementioned commit introduced yet another locking issue by acquiring the its_lock before acquiring the config lock. This is obviously wrong, especially considering that the lock ordering is well documented in vgic.c. Reshuffle the locks once more to take the config_lock before the its_lock. While at it, sprinkle in the lockdep hinting that has become popular as of late to keep lockdep apprised of our ordering. Cc: stable@vger.kernel.org Fixes: `f003277311` ("KVM: arm64: Use config_lock to protect vgic state") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230412062733.988229-1-oliver.upton@linux.dev	2023-04-12 13:50:18 +01:00
Marc Zyngier	81dc9504a7	KVM: arm64: nv: timers: Support hyp timer emulation Emulating EL2 also means emulating the EL2 timers. To do so, we expand our timer framework to deal with at most 4 timers. At any given time, two timers are using the HW timers, and the two others are purely emulated. The role of deciding which is which at any given time is left to a mapping function which is called every time we need to make such a decision. Reviewed-by: Colton Lewis <coltonlewis@google.com> Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230330174800.2677007-18-maz@kernel.org	2023-03-30 19:01:10 +01:00
Marc Zyngier	96906a9150	KVM: arm64: Expose {un,}lock_all_vcpus() to the rest of KVM Being able to lock/unlock all vcpus in one go is a feature that only the vgic has enjoyed so far. Let's be brave and expose it to the world. Reviewed-by: Colton Lewis <coltonlewis@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230330174800.2677007-7-maz@kernel.org	2023-03-30 19:01:09 +01:00
Oliver Upton	f003277311	KVM: arm64: Use config_lock to protect vgic state Almost all of the vgic state is VM-scoped but accessed from the context of a vCPU. These accesses were serialized on the kvm->lock which cannot be nested within a vcpu->mutex critical section. Move over the vgic state to using the config_lock. Tweak the lock ordering where necessary to ensure that the config_lock is acquired after the vcpu->mutex. Acquire the config_lock in kvm_vgic_create() to avoid a race between the converted flows and GIC creation. Where necessary, continue to acquire kvm->lock to avoid a race with vCPU creation (i.e. flows that use lock_all_vcpus()). Finally, promote the locking expectations in comments to lockdep assertions and update the locking documentation for the config_lock as well as vcpu->mutex. Cc: stable@vger.kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230327164747.2466958-5-oliver.upton@linux.dev	2023-03-29 14:08:31 +01:00
Paolo Bonzini	4090871d77	KVM/arm64 updates for 6.3 - Provide a virtual cache topology to the guest to avoid inconsistencies with migration on heterogenous systems. Non secure software has no practical need to traverse the caches by set/way in the first place. - Add support for taking stage-2 access faults in parallel. This was an accidental omission in the original parallel faults implementation, but should provide a marginal improvement to machines w/o FEAT_HAFDBS (such as hardware from the fruit company). - A preamble to adding support for nested virtualization to KVM, including vEL2 register state, rudimentary nested exception handling and masking unsupported features for nested guests. - Fixes to the PSCI relay that avoid an unexpected host SVE trap when resuming a CPU when running pKVM. - VGIC maintenance interrupt support for the AIC - Improvements to the arch timer emulation, primarily aimed at reducing the trap overhead of running nested. - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the interest of CI systems. - Avoid VM-wide stop-the-world operations when a vCPU accesses its own redistributor. - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions in the host. - Aesthetic and comment/kerneldoc fixes - Drop the vestiges of the old Columbia mailing list and add myself as co-maintainer This also drags in a couple of branches to avoid conflicts: - The shared 'kvm-hw-enable-refactor' branch that reworks initialization, as it conflicted with the virtual cache topology changes. - arm64's 'for-next/sme2' branch, as the PSCI relay changes, as both touched the EL2 initialization code. -----BEGIN PGP SIGNATURE----- iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmPw29cPHG1hekBrZXJu ZWwub3JnAAoJECPQ0LrRPXpD9doQAIJyMW0odT6JBe15uGCxTuTnJbb8mniajJdX CuSxPl85WyKLtZbIJLRTQgyt6Nzbu0N38zM0y/qBZT5BvAnWYI8etvnJhYZjooAy jrf0Me/GM5hnORXN+1dByCmlV+DSuBkax86tgIC7HhU71a2SWpjlmWQi/mYvQmIK PBAqpFF+w2cWHi0ZvCq96c5EXBdN4FLEA5cdZhekCbgw1oX8+x+HxdpBuGW5lTEr 9oWOzOzJQC1uFnjP3unFuIaG94QIo+NA4aGLMzfb7wm2wdQUnKebtdj/RxsDZOKe 43Q1+MDFWMsxxFu4FULH8fPMwidIm5rfz3pw3JJloqaZp8vk/vjDLID7AYucMIX8 1G/mjqz6E9lYvv57WBmBhT/+apSDAmeHlAT97piH73Nemga91esDKuHSdtA8uB5j mmzcUYajuB2GH9rsaXJhVKt/HW7l9fbGliCkI99ckq/oOTO9VsKLsnwS/rMRIsPn y2Y8Lyoe4eqokd1DNn5/bo+3qDnfmzm6iDmZOo+JYuJv9KS95zuw17Wu7la9UAPV e13+btoijHDvu8RnTecuXljWfAAKVtEjpEIoS5aP2R2iDvhr0d8POlMPaJ40YuRq D2fKr18b6ngt+aI0TY63/ksEIFexx67HuwQsUZ2lRjyjq5/x+u3YIqUPbKrU4Rnl uxXjSvyr =r4s/ -----END PGP SIGNATURE----- Merge tag 'kvmarm-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.3 - Provide a virtual cache topology to the guest to avoid inconsistencies with migration on heterogenous systems. Non secure software has no practical need to traverse the caches by set/way in the first place. - Add support for taking stage-2 access faults in parallel. This was an accidental omission in the original parallel faults implementation, but should provide a marginal improvement to machines w/o FEAT_HAFDBS (such as hardware from the fruit company). - A preamble to adding support for nested virtualization to KVM, including vEL2 register state, rudimentary nested exception handling and masking unsupported features for nested guests. - Fixes to the PSCI relay that avoid an unexpected host SVE trap when resuming a CPU when running pKVM. - VGIC maintenance interrupt support for the AIC - Improvements to the arch timer emulation, primarily aimed at reducing the trap overhead of running nested. - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the interest of CI systems. - Avoid VM-wide stop-the-world operations when a vCPU accesses its own redistributor. - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions in the host. - Aesthetic and comment/kerneldoc fixes - Drop the vestiges of the old Columbia mailing list and add [Oliver] as co-maintainer This also drags in arm64's 'for-next/sme2' branch, because both it and the PSCI relay changes touch the EL2 initialization code.	2023-02-20 06:12:42 -05:00
Paolo Bonzini	33436335e9	KVM/riscv changes for 6.3 - Fix wrong usage of PGDIR_SIZE to check page sizes - Fix privilege mode setting in kvm_riscv_vcpu_trap_redirect() - Redirect illegal instruction traps to guest - SBI PMU support for guest -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmPifFIACgkQrUjsVaLH LAcEyxAAinMBaBhiPmwWZQvcCzh/UFmJo8BQCwAPuwoc/a4ZGAR7ylzd0oJilP8M wSgX6Ad8XF+CEW2VpxW9nwyi41N25ep1Lrf8vOaWy9L9QNUo0t15WrCIbXT2p399 HrK9fz7HHKKIMsJy+rYb9EepdmMf55xtr1Y/EjyvhoDQbrEMlKsAODYz/SUoriQG Tn3cCYBzLdvzDzu0xXM9v+nsetWXdajK/v4je+mE3NQceXhePAO4oVWP4IpnoROd ZQm3evvVdf0WtKG9curxwMB7jjBqDBFrcLYl0qHGa7pi2o5PzVM7esgaV47KwetH IgA/Mrf1IfzpgM7VYDDax5wUHlKj63KisqU0J8rU3PUloQXaWqv7+ho51t9GzZ/i 9x4uyO/evVntgyTw6HCbqmQJDgEtJiG1ydrR/ydBMYHLnh7LPY2UpKgcqmirtbkK 1/DYDp84vikQ5VW1hc8IACdoBShh9Moh4xsEStzkTrIeHcZCjtORXUh8UIPZ0Mu2 7Mnkktu9I55SLwA3rwH/EYT1ISrOV1G+q3wfqgeLpn8YUWwCIiqWQ5Ur0/WSMJse uJ3HedZDzj9T4n4khX+mKEYh6joAafQZag+4TID2lRSwd0S/mpeC22hYrViMdDmq yhE+JNin/sz4AVaHNzGwfqk2NC2RFl9aRn2X0xTwyBubif9pKMQ= =spUL -----END PGP SIGNATURE----- Merge tag 'kvm-riscv-6.3-1' of https://github.com/kvm-riscv/linux into HEAD KVM/riscv changes for 6.3 - Fix wrong usage of PGDIR_SIZE to check page sizes - Fix privilege mode setting in kvm_riscv_vcpu_trap_redirect() - Redirect illegal instruction traps to guest - SBI PMU support for guest	2023-02-15 12:33:28 -05:00
Oliver Upton	022d3f0800	Merge branch kvm-arm64/misc into kvmarm/next * kvm-arm64/misc: : Miscellaneous updates : : - Convert CPACR_EL1_TTA to the new, generated system register : definitions. : : - Serialize toggling CPACR_EL1.SMEN to avoid unexpected exceptions when : accessing SVCR in the host. : : - Avoid quiescing the guest if a vCPU accesses its own redistributor's : SGIs/PPIs, eliminating the need to IPI. Largely an optimization for : nested virtualization, as the L1 accesses the affected registers : rather often. : : - Conversion to kstrtobool() : : - Common definition of INVALID_GPA across architectures : : - Enable CONFIG_USERFAULTFD for CI runs of KVM selftests KVM: arm64: Fix non-kerneldoc comments KVM: selftests: Enable USERFAULTFD KVM: selftests: Remove redundant setbuf() arm64/sysreg: clean up some inconsistent indenting KVM: MMU: Make the definition of 'INVALID_GPA' common KVM: arm64: vgic-v3: Use kstrtobool() instead of strtobool() KVM: arm64: vgic-v3: Limit IPI-ing when accessing GICR_{C,S}ACTIVER0 KVM: arm64: Synchronize SMEN on vcpu schedule out KVM: arm64: Kill CPACR_EL1_TTA definition Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-02-13 23:33:25 +00:00
Oliver Upton	e4f7417e96	Merge branch kvm-arm64/apple-vgic-mi into kvmarm/next * kvm-arm64/apple-vgic-mi: : VGIC maintenance interrupt support for the AIC, courtesy of Marc Zyngier. : : The AIC provides a non-maskable VGIC maintenance interrupt, which until : now was not supported by KVM. This series (1) allows the registration of : a non-maskable maintenance interrupt and (2) wires in support for this : with the AIC driver. irqchip/apple-aic: Correctly map the vgic maintenance interrupt irqchip/apple-aic: Register vgic maintenance interrupt with KVM KVM: arm64: vgic: Allow registration of a non-maskable maintenance interrupt Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-02-13 23:31:23 +00:00
Oliver Upton	92425e058a	Merge branch kvm/kvm-hw-enable-refactor into kvmarm/next Merge the kvm_init() + hardware enable rework to avoid conflicts with kvmarm. Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-02-13 22:28:34 +00:00
Gavin Shan	6028acbe3a	KVM: arm64: Allow no running vcpu on saving vgic3 pending table We don't have a running VCPU context to save vgic3 pending table due to KVM_DEV_ARM_VGIC_{GRP_CTRL, SAVE_PENDING_TABLES} command on KVM device "kvm-arm-vgic-v3". The unknown case is caught by kvm-unit-tests. # ./kvm-unit-tests/tests/its-pending-migration WARNING: CPU: 120 PID: 7973 at arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3325 \ mark_page_dirty_in_slot+0x60/0xe0 : mark_page_dirty_in_slot+0x60/0xe0 __kvm_write_guest_page+0xcc/0x100 kvm_write_guest+0x7c/0xb0 vgic_v3_save_pending_tables+0x148/0x2a0 vgic_set_common_attr+0x158/0x240 vgic_v3_set_attr+0x4c/0x5c kvm_device_ioctl+0x100/0x160 __arm64_sys_ioctl+0xa8/0xf0 invoke_syscall.constprop.0+0x7c/0xd0 el0_svc_common.constprop.0+0x144/0x160 do_el0_svc+0x34/0x60 el0_svc+0x3c/0x1a0 el0t_64_sync_handler+0xb4/0x130 el0t_64_sync+0x178/0x17c Use vgic_write_guest_lock() to save vgic3 pending table. Reported-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230126235451.469087-5-gshan@redhat.com	2023-01-29 18:46:11 +00:00
Gavin Shan	2f8b1ad222	KVM: arm64: Allow no running vcpu on restoring vgic3 LPI pending status We don't have a running VCPU context to restore vgic3 LPI pending status due to command KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_RESTORE_TABLES} on KVM device "kvm-arm-vgic-its". Use vgic_write_guest_lock() to restore vgic3 LPI pending status. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230126235451.469087-4-gshan@redhat.com	2023-01-29 18:46:11 +00:00
Gavin Shan	a23eaf9368	KVM: arm64: Add helper vgic_write_guest_lock() Currently, the unknown no-running-vcpu sites are reported when a dirty page is tracked by mark_page_dirty_in_slot(). Until now, the only known no-running-vcpu site is saving vgic/its tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on KVM device "kvm-arm-vgic-its". Unfortunately, there are more unknown sites to be handled and no-running-vcpu context will be allowed in these sites: (1) KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_RESTORE_TABLES} command on KVM device "kvm-arm-vgic-its" to restore vgic/its tables. The vgic3 LPI pending status could be restored. (2) Save vgic3 pending table through KVM_DEV_ARM_{VGIC_GRP_CTRL, VGIC_SAVE_PENDING_TABLES} command on KVM device "kvm-arm-vgic-v3". In order to handle those unknown cases, we need a unified helper vgic_write_guest_lock(). struct vgic_dist::save_its_tables_in_progress is also renamed to struct vgic_dist::save_tables_in_progress. No functional change intended. Suggested-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230126235451.469087-3-gshan@redhat.com	2023-01-29 18:46:11 +00:00
Paolo Bonzini	dc7c31e922	Merge branch 'kvm-v6.2-rc4-fixes' into HEAD ARM: * Fix the PMCR_EL0 reset value after the PMU rework * Correctly handle S2 fault triggered by a S1 page table walk by not always classifying it as a write, as this breaks on R/O memslots * Document why we cannot exit with KVM_EXIT_MMIO when taking a write fault from a S1 PTW on a R/O memslot * Put the Apple M2 on the naughty list for not being able to correctly implement the vgic SEIS feature, just like the M1 before it * Reviewer updates: Alex is stepping down, replaced by Zenghui x86: * Fix various rare locking issues in Xen emulation and teach lockdep to detect them * Documentation improvements * Do not return host topology information from KVM_GET_SUPPORTED_CPUID	2023-01-24 06:05:23 -05:00
Marc Zyngier	ef3691683d	KVM: arm64: GICv4.1: Fix race with doorbell on VPE activation/deactivation To save the vgic LPI pending state with GICv4.1, the VPEs must all be unmapped from the ITSs so that the sGIC caches can be flushed. The opposite is done once the state is saved. This is all done by using the activate/deactivate irqdomain callbacks directly from the vgic code. Crutially, this is done without holding the irqdesc lock for the interrupts that represent the VPE. And these callbacks are changing the state of the irqdesc. What could possibly go wrong? If a doorbell fires while we are messing with the irqdesc state, it will acquire the lock and change the interrupt state concurrently. Since we don't hole the lock, curruption occurs in on the interrupt state. Oh well. While acquiring the lock would fix this (and this was Shanker's initial approach), this is still a layering violation we could do without. A better approach is actually to free the VPE interrupt, do what we have to do, and re-request it. It is more work, but this usually happens only once in the lifetime of the VM and we don't really care about this sort of overhead. Fixes: `f66b7b151e` ("KVM: arm64: GICv4.1: Try to save VLPI state in save_pending_tables") Reported-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230118022348.4137094-1-sdonthineni@nvidia.com	2023-01-21 11:02:19 +00:00
Christophe JAILLET	016cbbd2ba	KVM: arm64: vgic-v3: Use kstrtobool() instead of strtobool() strtobool() is the same as kstrtobool(). However, the latter is more used within the kernel. In order to remove strtobool() and slightly simplify kstrtox.h, switch to the other function name. While at it, include the corresponding header file (<linux/kstrtox.h>) Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/f546e636c6d2bbcc0d8c4191ab98ce892fce4584.1673702763.git.christophe.jaillet@wanadoo.fr Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-01-17 20:53:41 +00:00
Marc Zyngier	fd2b165ce2	KVM: arm64: vgic-v3: Limit IPI-ing when accessing GICR_{C,S}ACTIVER0 When a vcpu is accessing its own redistributor's SGIs/PPIs, there is no point in doing a stop-the-world operation. Instead, we can just let the access occur as we do with GICv2. This is a very minor optimisation for a non-nesting guest, but a potentially major one for a nesting L1 hypervisor which is likely to access the emulated registers pretty often (on each vcpu switch, at the very least). Reported-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230112154840.1808595-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-01-12 21:18:08 +00:00
Marc Zyngier	43c5c868bd	KVM: arm64: vgic: Allow registration of a non-maskable maintenance interrupt Our Apple M1/M2 friends do have a per-CPU maintenance interrupt, but no mask to make use of it in the standard Linux framework. Given that KVM directly drives the source of the interrupt and leaves the GIC interrupt always enabled, there is no harm in tolerating such a setup. It will become useful once we enable NV on M2 HW. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230103095022.3230946-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-01-12 21:13:27 +00:00
Marc Zyngier	decb17aeb8	KVM: arm64: vgic: Add Apple M2 cpus to the list of broken SEIS implementations I really hoped that Apple had fixed their not-quite-a-vgic implementation when moving from M1 to M2. Alas, it seems they didn't, and running a buggy EFI version results in the vgic generating SErrors outside of the guest and taking the host down. Apply the same workaround as for M1. Yes, this is all a bit crap. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230103095022.3230946-2-maz@kernel.org	2023-01-05 15:25:19 +00:00
Marc Zyngier	466d27e48d	KVM: arm64: Simplify the CPUHP logic For a number of historical reasons, the KVM/arm64 hotplug setup is pretty complicated, and we have two extra CPUHP notifiers for vGIC and timers. It looks pretty pointless, and gets in the way of further changes. So let's just expose some helpers that can be called from the core CPUHP callback, and get rid of everything else. This gives us the opportunity to drop a useless notifier entry, as well as tidy-up the timer enable/disable, which was a bit odd. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-17-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:41:04 -05:00
Gavin Shan	9cb1096f85	KVM: arm64: Enable ring-based dirty memory tracking Enable ring-based dirty memory tracking on ARM64: - Enable CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL. - Enable CONFIG_NEED_KVM_DIRTY_RING_WITH_BITMAP. - Set KVM_DIRTY_LOG_PAGE_OFFSET for the ring buffer's physical page offset. - Add ARM64 specific kvm_arch_allow_write_without_running_vcpu() to keep the site of saving vgic/its tables out of the no-running-vcpu radar. Signed-off-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110104914.31280-5-gshan@redhat.com	2022-11-10 13:11:58 +00:00
Eric Ren	c000a26071	KVM: arm64: vgic: Fix exit condition in scan_its_table() With some PCIe topologies, restoring a guest fails while parsing the ITS device tables. Reproducer hints: 1. Create ARM virt VM with pxb-pcie bus which adds extra host bridges, with qemu command like: ``` -device pxb-pcie,bus_nr=8,id=pci.x,numa_node=0,bus=pcie.0 \ -device pcie-root-port,..,bus=pci.x \ ... -device pxb-pcie,bus_nr=37,id=pci.y,numa_node=1,bus=pcie.0 \ -device pcie-root-port,..,bus=pci.y \ ... ``` 2. Ensure the guest uses 2-level device table 3. Perform VM migration which calls save/restore device tables In that setup, we get a big "offset" between 2 device_ids, which makes unsigned "len" round up a big positive number, causing the scan loop to continue with a bad GPA. For example: 1. L1 table has 2 entries; 2. and we are now scanning at L2 table entry index 2075 (pointed to by L1 first entry) 3. if next device id is 9472, we will get a big offset: 7397; 4. with unsigned 'len', 'len -= offset * esz', len will underflow to a positive number, mistakenly into next iteration with a bad GPA; (It should break out of the current L2 table scanning, and jump into the next L1 table entry) 5. that bad GPA fails the guest read. Fix it by stopping the L2 table scan when the next device id is outside of the current table, allowing the scan to continue from the next L1 table entry. Thanks to Eric Auger for the fix suggestion. Fixes: `920a7a8fa9` ("KVM: arm64: vgic-its: Add infrastructure for tableookup") Suggested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Eric Ren <renzhengeek@gmail.com> [maz: commit message tidy-up] Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/d9c3a564af9e2c5bf63f48a7dcbf08cd593c5c0b.1665802985.git.renzhengeek@gmail.com	2022-10-15 12:10:54 +01:00
Gavin Shan	096560dd13	KVM: arm64: vgic: Remove duplicate check in update_affinity_collection() The 'coll' parameter to update_affinity_collection() is never NULL, so comparing it with 'ite->collection' is enough to cover both the NULL case and the "another collection" case. Remove the duplicate check in update_affinity_collection(). Signed-off-by: Gavin Shan <gshan@redhat.com> [maz: repainted commit message] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220923065447.323445-1-gshan@redhat.com	2022-09-26 10:46:37 +01:00
Marc Zyngier	619064afa9	KVM: arm64: vgic: Tidy-up calls to vgic_{get,set}_common_attr() The userspace accessors have an early call to vgic_{get,set}_common_attr() that makes the code hard to follow. Move it to the default: clause of the decoding switch statement, which results in a nice cleanup. This requires us to move the handling of the pending table into the common handling, even if it is strictly a GICv3 feature (it has the benefit of keeping the whole control group handling in the same function). Also cleanup vgic_v3_{get,set}_attr() while we're at it, deduplicating the calls to vgic_v3_attr_regs_access(). Suggested-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2022-07-17 11:55:33 +01:00
Marc Zyngier	4b85080f4e	KVM: arm64: vgic: Consolidate userspace access for base address setting Align kvm_vgic_addr() with the rest of the code by moving the userspace accesses into it. kvm_vgic_addr() is also made static. Signed-off-by: Marc Zyngier <maz@kernel.org>	2022-07-17 11:55:33 +01:00
Marc Zyngier	9f968c9266	KVM: arm64: vgic-v2: Add helper for legacy dist/cpuif base address setting We carry a legacy interface to set the base addresses for GICv2. As this is currently plumbed into the same handling code as the modern interface, it limits the evolution we can make there. Add a helper dedicated to this handling, with a view of maybe removing this in the future. Signed-off-by: Marc Zyngier <maz@kernel.org>	2022-07-17 11:55:33 +01:00
Marc Zyngier	d7df6f282d	KVM: arm64: vgic: Use {get,put}_user() instead of copy_{from.to}_user Tidy-up vgic_get_common_attr() and vgic_set_common_attr() to use {get,put}_user() instead of the more complex (and less type-safe) copy_{from,to}_user(). Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2022-07-17 11:55:33 +01:00
Marc Zyngier	7e9f723c2a	KVM: arm64: vgic-v2: Consolidate userspace access for MMIO registers Align the GICv2 MMIO accesses from userspace with the way the GICv3 code is now structured. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2022-07-17 11:55:33 +01:00
Marc Zyngier	e1246f3f2d	KVM: arm64: vgic-v3: Consolidate userspace access for MMIO registers For userspace accesses to GICv3 MMIO registers (and related data), vgic_v3_{get,set}_attr are littered with {get,put}_user() calls, making it hard to audit and reason about. Consolidate all userspace accesses in vgic_v3_attr_regs_access(), making the code far simpler to audit. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2022-07-17 11:55:33 +01:00
Marc Zyngier	38cf0bb762	KVM: arm64: vgic-v3: Use u32 to manage the line level from userspace Despite the userspace ABI clearly defining the bits dealt with by KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO as a __u32, the kernel uses a u64. Use a u32 to match the userspace ABI, which will subsequently lead to some simplifications. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2022-07-17 11:55:33 +01:00
Marc Zyngier	db25081e14	KVM: arm64: vgic-v3: Push user access into vgic_v3_cpu_sysregs_uaccess() In order to start making the vgic sysreg access from userspace similar to all the other sysregs, push the userspace memory access one level down into vgic_v3_cpu_sysregs_uaccess(). The next step will be to rely on the sysreg infrastructure to perform this task. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2022-07-17 11:55:33 +01:00
Marc Zyngier	b61fc0857a	KVM: arm64: vgic-v3: Simplify vgic_v3_has_cpu_sysregs_attr() Finding out whether a sysreg exists has little to do with that register being accessed, so drop the is_write parameter. Also, the reg pointer is completely unused, and we're better off just passing the attr pointer to the function. This result in a small cleanup of the calling site, with a new helper converting the vGIC view of a sysreg into the canonical one (this is purely cosmetic, as the encoding is the same). Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2022-07-17 11:55:33 +01:00
Marc Zyngier	98432ccdec	KVM: arm64: Replace vgic_v3_uaccess_read_pending with vgic_uaccess_read_pending Now that GICv2 has a proper userspace accessor for the pending state, switch GICv3 over to it, dropping the local version, moving over the specific behaviours that CGIv3 requires (such as the distinction between pending latch and line level which were never enforced with GICv2). We also gain extra locking that isn't really necessary for userspace, but that's a small price to pay for getting rid of superfluous code. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20220607131427.1164881-3-maz@kernel.org	2022-06-08 10:16:15 +01:00
Marc Zyngier	2cdea19a34	KVM: arm64: Don't read a HW interrupt pending state in user context Since `5bfa685e62` ("KVM: arm64: vgic: Read HW interrupt pending state from the HW"), we're able to source the pending bit for an interrupt that is stored either on the physical distributor or on a device. However, this state is only available when the vcpu is loaded, and is not intended to be accessed from userspace. Unfortunately, the GICv2 emulation doesn't provide specific userspace accessors, and we fallback with the ones that are intended for the guest, with fatal consequences. Add a new vgic_uaccess_read_pending() accessor for userspace to use, build on top of the existing vgic_mmio_read_pending(). Reported-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Fixes: `5bfa685e62` ("KVM: arm64: vgic: Read HW interrupt pending state from the HW") Link: https://lore.kernel.org/r/20220607131427.1164881-2-maz@kernel.org Cc: stable@vger.kernel.org	2022-06-07 16:28:19 +01:00
Paolo Bonzini	47e8eec832	KVM/arm64 updates for 5.19 - Add support for the ARMv8.6 WFxT extension - Guard pages for the EL2 stacks - Trap and emulate AArch32 ID registers to hide unsupported features - Ability to select and save/restore the set of hypercalls exposed to the guest - Support for PSCI-initiated suspend in collaboration with userspace - GICv3 register-based LPI invalidation support - Move host PMU event merging into the vcpu data structure - GICv3 ITS save/restore fixes - The usual set of small-scale cleanups and fixes -----BEGIN PGP SIGNATURE----- iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmKGAGsPHG1hekBrZXJu ZWwub3JnAAoJECPQ0LrRPXpDB/gQAMhyZ+wCG0OMEZhwFF6iDfxVEX2Kw8L41NtD a/e6LDWuIOGihItpRkYROc5myG74D7XckF2Bz3G7HJoU4vhwHOV/XulE26GFizoC O1GVRekeSUY81wgS1yfo0jojLupBkTjiq3SjTHoDP7GmCM0qDPBtA0QlMRzd2bMs Kx0+UUXZUHFSTXc7Lp4vqNH+tMp7se+yRx7hxm6PCM5zG+XYJjLxnsZ0qpchObgU 7f6YFojsLUs1SexgiUqJ1RChVQ+FkgICh5HyzORvGtHNNzK6D2sIbsW6nqMGAMql Kr3A5O/VOkCztSYnLxaa76/HqD21mvUrXvr3grhabNc7rOmuzWV0dDgr6c6wHKHb uNCtH4d7Ra06gUrEOrfsgLOLn0Zqik89y6aIlMsnTudMg9gMNgFHy1jz4LM7vMkY FS5AVj059heg2uJcfgTvzzcqneyuBLBmF3dS4coowO6oaj8SycpaEmP5e89zkPMI 1kk8d0e6RmXuCh/2AJ8GxxnKvBPgqp2mMKXOCJ8j4AmHEDX/CKpEBBqIWLKkplUU 8DGiOWJUtRZJg398dUeIpiVLoXJthMODjAnkKkuhiFcQbXomlwgg7YSnNAz6TRED Z7KR2leC247kapHnnagf02q2wED8pBeyrxbQPNdrHtSJ9Usm4nTkY443HgVTJW3s aTwPZAQ7 =mh7W -----END PGP SIGNATURE----- Merge tag 'kvmarm-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 5.19 - Add support for the ARMv8.6 WFxT extension - Guard pages for the EL2 stacks - Trap and emulate AArch32 ID registers to hide unsupported features - Ability to select and save/restore the set of hypercalls exposed to the guest - Support for PSCI-initiated suspend in collaboration with userspace - GICv3 register-based LPI invalidation support - Move host PMU event merging into the vcpu data structure - GICv3 ITS save/restore fixes - The usual set of small-scale cleanups and fixes [Due to the conflict, KVM_SYSTEM_EVENT_SEV_TERM is relocated from 4 to 6. - Paolo]	2022-05-25 05:09:23 -04:00
Marc Zyngier	5c0ad551e9	Merge branch kvm-arm64/its-save-restore-fixes-5.19 into kvmarm-master/next * kvm-arm64/its-save-restore-fixes-5.19: : . : Tighten the ITS save/restore infrastructure to fail early rather : than late. Patches courtesy of Rocardo Koller. : . KVM: arm64: vgic: Undo work in failed ITS restores KVM: arm64: vgic: Do not ignore vgic_its_restore_cte failures KVM: arm64: vgic: Add more checks when restoring ITS tables KVM: arm64: vgic: Check that new ITEs could be saved in guest memory Signed-off-by: Marc Zyngier <maz@kernel.org>	2022-05-16 17:48:36 +01:00
Marc Zyngier	822ca7f82b	Merge branch kvm-arm64/misc-5.19 into kvmarm-master/next * kvm-arm64/misc-5.19: : . : Misc fixes and general improvements for KVMM/arm64: : : - Better handle out of sequence sysregs in the global tables : : - Remove a couple of unnecessary loads from constant pool : : - Drop unnecessary pKVM checks : : - Add all known M1 implementations to the SEIS workaround : : - Cleanup kerneldoc warnings : . KVM: arm64: vgic-v3: List M1 Pro/Max as requiring the SEIS workaround KVM: arm64: pkvm: Don't mask already zeroed FEAT_SVE KVM: arm64: pkvm: Drop unnecessary FP/SIMD trap handler KVM: arm64: nvhe: Eliminate kernel-doc warnings KVM: arm64: Avoid unnecessary absolute addressing via literals KVM: arm64: Print emulated register table name when it is unsorted KVM: arm64: Don't BUG_ON() if emulated register table is unsorted Signed-off-by: Marc Zyngier <maz@kernel.org>	2022-05-16 17:48:36 +01:00
Ricardo Koller	8c5e74c90b	KVM: arm64: vgic: Undo work in failed ITS restores Failed ITS restores should clean up all state restored until the failure. There is some cleanup already present when failing to restore some tables, but it's not complete. Add the missing cleanup. Note that this changes the behavior in case of a failed restore of the device tables. restore ioctl: 1. restore collection tables 2. restore device tables With this commit, failures in 2. clean up everything created so far, including state created by 1. Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220510001633.552496-5-ricarkol@google.com	2022-05-16 13:58:04 +01:00
Ricardo Koller	a1ccfd6f6e	KVM: arm64: vgic: Do not ignore vgic_its_restore_cte failures Restoring a corrupted collection entry (like an out of range ID) is being ignored and treated as success. More specifically, a vgic_its_restore_cte failure is treated as success by vgic_its_restore_collection_table. vgic_its_restore_cte uses positive and negative numbers to return error, and +1 to return success. The caller then uses "ret > 0" to check for success. Fix this by having vgic_its_restore_cte only return negative numbers on error. Do this by changing alloc_collection return codes to only return negative numbers on error. Signed-off-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220510001633.552496-4-ricarkol@google.com	2022-05-16 13:58:04 +01:00
Ricardo Koller	243b1f6c8f	KVM: arm64: vgic: Add more checks when restoring ITS tables Try to improve the predictability of ITS save/restores (and debuggability of failed ITS saves) by failing early on restore when trying to read corrupted tables. Restoring the ITS tables does some checks for corrupted tables, but not as many as in a save: an overflowing device ID will be detected on save but not on restore. The consequence is that restoring a corrupted table won't be detected until the next save; including the ITS not working as expected after the restore. As an example, if the guest sets tables overlapping each other, which would most likely result in some corrupted table, this is what we would see from the host point of view: guest sets base addresses that overlap each other save ioctl restore ioctl save ioctl (fails) Ideally, we would like the first save to fail, but overlapping tables could actually be intended by the guest. So, let's at least fail on the restore with some checks: like checking that device and event IDs don't overflow their tables. Signed-off-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220510001633.552496-3-ricarkol@google.com	2022-05-16 13:58:04 +01:00
Ricardo Koller	cafe7e544d	KVM: arm64: vgic: Check that new ITEs could be saved in guest memory Try to improve the predictability of ITS save/restores by failing commands that would lead to failed saves. More specifically, fail any command that adds an entry into an ITS table that is not in guest memory, which would otherwise lead to a failed ITS save ioctl. There are already checks for collection and device entries, but not for ITEs. Add the corresponding check for the ITT when adding ITEs. Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220510001633.552496-2-ricarkol@google.com	2022-05-16 13:58:04 +01:00
Marc Zyngier	cae889302e	KVM: arm64: vgic-v3: List M1 Pro/Max as requiring the SEIS workaround Unsusprisingly, Apple M1 Pro/Max have the exact same defect as the original M1 and generate random SErrors in the host when a guest tickles the GICv3 CPU interface the wrong way. Add the part numbers for both the CPU types found in these two new implementations, and add them to the hall of shame. This also applies to the Ultra version, as it is composed of 2 Max SoCs. Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20220514102524.3188730-1-maz@kernel.org	2022-05-15 11:18:50 +01:00
Marc Zyngier	49a1a2c70a	KVM: arm64: vgic-v3: Advertise GICR_CTLR.{IR, CES} as a new GICD_IIDR revision Since adversising GICR_CTLR.{IC,CES} is directly observable from a guest, we need to make it selectable from userspace. For that, bump the default GICD_IIDR revision and let userspace downgrade it to the previous default. For GICv2, the two distributor revisions are strictly equivalent. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220405182327.205520-5-maz@kernel.org	2022-05-04 14:09:53 +01:00
Marc Zyngier	4645d11f4a	KVM: arm64: vgic-v3: Implement MMIO-based LPI invalidation Since GICv4.1, it has become legal for an implementation to advertise GICR_{INVLPIR,INVALLR,SYNCR} while having an ITS, allowing for a more efficient invalidation scheme (no guest command queue contention when multiple CPUs are generating invalidations). Provide the invalidation registers as a primitive to their ITS counterpart. Note that we don't advertise them to the guest yet (the architecture allows an implementation to do this). Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Oliver Upton <oupton@google.com> Link: https://lore.kernel.org/r/20220405182327.205520-4-maz@kernel.org	2022-05-04 14:09:53 +01:00
Marc Zyngier	94828468a6	KVM: arm64: vgic-v3: Expose GICR_CTLR.RWP when disabling LPIs When disabling LPIs, a guest needs to poll GICR_CTLR.RWP in order to be sure that the write has taken effect. We so far reported it as 0, as we didn't advertise that LPIs could be turned off the first place. Start tracking this state during which LPIs are being disabled, and expose the 'in progress' state via the RWP bit. We also take this opportunity to disallow enabling LPIs and programming GICR_{PEND,PROP}BASER while LPI disabling is in progress, as allowed by the architecture (UNPRED behaviour). We don't advertise the feature to the guest yet (which is allowed by the architecture). Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220405182327.205520-3-maz@kernel.org	2022-05-04 14:09:53 +01:00
Sean Christopherson	f502cc568d	KVM: Add max_vcpus field in common 'struct kvm' For TDX guests, the maximum number of vcpus needs to be specified when the TDX guest VM is initialized (creating the TDX data corresponding to TDX guest) before creating vcpu. It needs to record the maximum number of vcpus on VM creation (KVM_CREATE_VM) and return error if the number of vcpus exceeds it Because there is already max_vcpu member in arm64 struct kvm_arch, move it to common struct kvm and initialize it to KVM_MAX_VCPUS before kvm_arch_init_vm() instead of adding it to x86 struct kvm_arch. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Message-Id: <e53234cdee6a92357d06c80c03d77c19cdefb804.1646422845.git.isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-05-02 11:42:42 -04:00
Yu Zhe	c707663e81	KVM: arm64: vgic: Remove unnecessary type castings Remove unnecessary casts. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220329102059.268983-1-yuzhe@nfschina.com	2022-04-06 10:42:55 +01:00
Paolo Bonzini	714797c98e	KVM/arm64 updates for 5.18 - Proper emulation of the OSLock feature of the debug architecture - Scalibility improvements for the MMU lock when dirty logging is on - New VMID allocator, which will eventually help with SVA in VMs - Better support for PMUs in heterogenous systems - PSCI 1.1 support, enabling support for SYSTEM_RESET2 - Implement CONFIG_DEBUG_LIST at EL2 - Make CONFIG_ARM64_ERRATUM_2077057 default y - Reduce the overhead of VM exit when no interrupt is pending - Remove traces of 32bit ARM host support from the documentation - Updated vgic selftests - Various cleanups, doc updates and spelling fixes -----BEGIN PGP SIGNATURE----- iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmI0lrQPHG1hekBrZXJu ZWwub3JnAAoJECPQ0LrRPXpDy0YQAIX2bWcPFMqHqn3CAYhTSTiOK5s+OWx9im5f 5yTPRj+SJ88SWv030r8a5dxWh2dEK2IetM9KifZ0dvmcCs8lYW/9/IUkHYY9lAYJ 9VLH4iPgs9dOD9wtfovfb+vcM8bso9Ndi3aCFJUj+bcNwYU3kBIJ+8AxA5DZoLty 5LPF38eoxrSEv9N0VwqvhGxdgqDp8Zahykr693r+8Wd3Rj6yRoqoEvqWhHdVWlWJ 3quRNkYN4LzjN3x1T9CLaZUqMofbUjfYCAvbZorALJy6In1FfgoyocFe6/JvsmzZ xOlrWWbJz/1NNI6Hoy5aZtQavTFrHu4XbCkjBDL7RhRxj636KWelVoXAbV05XX2r hQYMnN0bwlnAljTefguIZ7frnQyjg5OV8GMu3CTIPMqu//fA+61z+bXoyVy6pzaV gcXHtDgIdiRaT6BJiHST8ctxZWDTr2GUgTGfdlCde7hgmJ7DjManLXvgYx101/Nz VfvKzz3oSvVTelNa/6ZWxuUlwvly0eKONSkwjp0uq5TZ9G8NLaKitA8nKDSkoegx 41iIUEztivuu9KQvQkl8wdcCPwEk8K2sOTH7ikINS/wJ0khiUztndxCAlEPbQo50 567OiSaj5+vqFPZsxWBVTIbmkdBVKCzrG+4B1H4didMb1Q1n2lHhgj1keHTmZyVP jlFofZxf =J1mn -----END PGP SIGNATURE----- Merge tag 'kvmarm-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 5.18 - Proper emulation of the OSLock feature of the debug architecture - Scalibility improvements for the MMU lock when dirty logging is on - New VMID allocator, which will eventually help with SVA in VMs - Better support for PMUs in heterogenous systems - PSCI 1.1 support, enabling support for SYSTEM_RESET2 - Implement CONFIG_DEBUG_LIST at EL2 - Make CONFIG_ARM64_ERRATUM_2077057 default y - Reduce the overhead of VM exit when no interrupt is pending - Remove traces of 32bit ARM host support from the documentation - Updated vgic selftests - Various cleanups, doc updates and spelling fixes	2022-03-18 12:43:24 -04:00
Julia Lawall	21ea457842	KVM: arm64: fix typos in comments Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220318103729.157574-24-Julia.Lawall@inria.fr	2022-03-18 14:04:15 +00:00
Marc Zyngier	5bfa685e62	KVM: arm64: vgic: Read HW interrupt pending state from the HW It appears that a read access to GIC[DR]_I[CS]PENDRn doesn't always result in the pending interrupts being accurately reported if they are mapped to a HW interrupt. This is particularily visible when acking the timer interrupt and reading the GICR_ISPENDR1 register immediately after, for example (the interrupt appears as not-pending while it really is...). This is because a HW interrupt has its 'active and pending state' kept in the physical distributor, and not in the virtual one, as mandated by the spec (this is what allows the direct deactivation). The virtual distributor only caries the pending and active states (note the plural, as these are two independent and non-overlapping states). Fix it by reading the HW state back, either from the timer itself or from the distributor if necessary. Reported-by: Ricardo Koller <ricarkol@google.com> Tested-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220208123726.3604198-1-maz@kernel.org	2022-02-11 11:01:12 +00:00
Paolo Bonzini	17179d0068	KVM/arm64 fixes for 5.17, take #1 - Correctly update the shadow register on exception injection when running in nVHE mode - Correctly use the mm_ops indirection when performing cache invalidation from the page-table walker - Restrict the vgic-v3 workaround for SEIS to the two known broken implementations -----BEGIN PGP SIGNATURE----- iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmHzv5EPHG1hekBrZXJu ZWwub3JnAAoJECPQ0LrRPXpD0DcQAMF0hcKYxwuXi+UwQ8u5SsrpQQZ1BWC6euvB FFQiUPANXq/u0xM2kV+5FhjEfHqqjnh7nYLVKpBcetcvGSfWUnZlVI4DKI+5pdte PTa/minS5sq9BDZ/clRnnomNw0UwtH2OLeolg7+UAqBMihicddVBBU6IqvY1Nx+z F2qovZa3Qqb1EB+9+hPS+qGcjlguaBOEzrJ9uIaw532G1JD1K9hhMlabdhJhiJA3 gWuUJO+cuYEdctli+OJb9g92zIDt0hVP+/1tndlbib5BUw6e2vkdyKF0+/7u77xr SDKNmUosvZt/fABZpv6ycgRszoKRjBCIC5takQCZI/l2QzZFbiP/414E8L0J/zLV PI8e1bs/H9pBF3c7WG+if/3jYs+D+/nYhkE+PeW3k5lxzsHo7XE5ei6mzoxzBusC l4c0QQ7lpwep4dOWm4oRxzE0/9IONgVKKlIKGBkpSbtznDkAToTWobAIFVeZj+nm BVxf+A6ddcnQSzXYa/FUsfV3ZEsJVPSs/DL6mBBJuG8lxNzZnabkt+ODfXuhyrXe 6kGkF9+4HE9XyItieZVDUgRcZ9x57c+3q7A9b7Kl+Ds1Z+hsu0tVqghf5YVQAj3a 4IkOBdPEtaGCSrJWxupX+oimCXqdNfbnOqf4VsO8l1O0O8WBRvYaqYL2RKR32kX2 n3nzO/vE =BKqv -----END PGP SIGNATURE----- Merge tag 'kvmarm-fixes-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.17, take #1 - Correctly update the shadow register on exception injection when running in nVHE mode - Correctly use the mm_ops indirection when performing cache invalidation from the page-table walker - Restrict the vgic-v3 workaround for SEIS to the two known broken implementations	2022-01-28 07:45:15 -05:00
Marc Zyngier	d11a327ed9	KVM: arm64: vgic-v3: Restrict SEIS workaround to known broken systems Contrary to what `df652bcf11` ("KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors") was asserting, there is at least one other system out there (Cavium ThunderX2) implementing SEIS, and not in an obviously broken way. So instead of imposing the M1 workaround on an innocent bystander, let's limit it to the two known broken Apple implementations. Fixes: `df652bcf11` ("KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors") Reported-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220122103912.795026-1-maz@kernel.org	2022-01-22 11:38:16 +00:00
Paolo Bonzini	7fd55a02a4	KVM/arm64 updates for Linux 5.16 - Simplification of the 'vcpu first run' by integrating it into KVM's 'pid change' flow - Refactoring of the FP and SVE state tracking, also leading to a simpler state and less shared data between EL1 and EL2 in the nVHE case - Tidy up the header file usage for the nvhe hyp object - New HYP unsharing mechanism, finally allowing pages to be unmapped from the Stage-1 EL2 page-tables - Various pKVM cleanups around refcounting and sharing - A couple of vgic fixes for bugs that would trigger once the vcpu xarray rework is merged, but not sooner - Add minimal support for ARMv8.7's PMU extension - Rework kvm_pgtable initialisation ahead of the NV work - New selftest for IRQ injection - Teach selftests about the lack of default IPA space and page sizes - Expand sysreg selftest to deal with Pointer Authentication - The usual bunch of cleanups and doc update -----BEGIN PGP SIGNATURE----- iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmHYIpgPHG1hekBrZXJu ZWwub3JnAAoJECPQ0LrRPXpDndsP/RsBmX6bmQnDEhaaqfGAxOETyq/my1eT9r/V 3Ax4fEqSFfD5yHbYvqNRC8ueycH4r8WAr4ACWDAI6XpS/pYx00nx2N+HCSgjGyQR FeXqITuGPEsn4NkGuPci0PFmI8rVUzanl1ugRGQAETVrZo2ZVH2uqKVGT8XOlu0J FB/0x6Z4vMuIgEXyfa+DZ8WdW1aCRgPU2oyOdSdWE57/grjyLJqk6EdMmLyaQ19E vz6vXuRnA/GQwOtByqYEnQ8a4VXsQedCMqg/f9mj0BxpDzxC1ps8Nrpv36aJXKUN LEXapP9bCWPW9LqaKAOZnQYrUIIEFHsCUom0n3reDHrgObA+jivpz75L8GEr3CdC Bv78N04Yymjpp2WA6CuO3r9HjL1nJ6tYqobXU2pvqln4nNC3Ukucjq9ZVuWgS6Hx qOZXgPcZ/HpS3l/U+dAu8yIcV2SchQXDudaq8BsfLd8M1bD+oirSBolZFSvz7MYZ 6+jtEDLUOEO5s4rXiJF46+MauxiELcjaewAEK4WwrS8NBwEyhYe9EPsYcQ5pcrQF QwAd1+y7oLfhpGHv5KJKWswfvbtlLCm6NOAhawq0UXM8bS+79tu0dGjiDzVPBuSf SyA3VtBSKxcpvCrljw9ubtjxvKrviE0MDvlmTP2B1NU+lwm8xRBiwUwOH29qP9zU HDeUj2fy =HkZk -----END PGP SIGNATURE----- Merge tag 'kvmarm-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 5.16 - Simplification of the 'vcpu first run' by integrating it into KVM's 'pid change' flow - Refactoring of the FP and SVE state tracking, also leading to a simpler state and less shared data between EL1 and EL2 in the nVHE case - Tidy up the header file usage for the nvhe hyp object - New HYP unsharing mechanism, finally allowing pages to be unmapped from the Stage-1 EL2 page-tables - Various pKVM cleanups around refcounting and sharing - A couple of vgic fixes for bugs that would trigger once the vcpu xarray rework is merged, but not sooner - Add minimal support for ARMv8.7's PMU extension - Rework kvm_pgtable initialisation ahead of the NV work - New selftest for IRQ injection - Teach selftests about the lack of default IPA space and page sizes - Expand sysreg selftest to deal with Pointer Authentication - The usual bunch of cleanups and doc update	2022-01-07 10:42:19 -05:00
Marc Zyngier	ce5b5b05c1	Merge branch kvm-arm64/vgic-fixes-5.17 into kvmarm-master/next * kvm-arm64/vgic-fixes-5.17: : . : A few vgic fixes: : - Harden vgic-v3 error handling paths against signed vs unsigned : comparison that will happen once the xarray-based vcpus are in : - Demote userspace-triggered console output to kvm_debug() : . KVM: arm64: vgic: Demote userspace-triggered console prints to kvm_debug() KVM: arm64: vgic-v3: Fix vcpu index comparison Signed-off-by: Marc Zyngier <maz@kernel.org>	2021-12-16 12:54:12 +00:00
Marc Zyngier	440523b92b	KVM: arm64: vgic: Demote userspace-triggered console prints to kvm_debug() Running the KVM selftests results in these messages being dumped in the kernel console: [ 188.051073] kvm [469]: VGIC redist and dist frames overlap [ 188.056820] kvm [469]: VGIC redist and dist frames overlap [ 188.076199] kvm [469]: VGIC redist and dist frames overlap Being amle to trigger this from userspace is definitely not on, so demote these warnings to kvm_debug(). Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211216104507.1482017-1-maz@kernel.org	2021-12-16 10:47:48 +00:00
Marc Zyngier	c95b1d7ca7	KVM: arm64: vgic-v3: Fix vcpu index comparison When handling an error at the point where we try and register all the redistributors, we unregister all the previously registered frames by counting down from the failing index. However, the way the code is written relies on that index being a signed value. Which won't be true once we switch to an xarray-based vcpu set. Since this code is pretty awkward the first place, and that the failure mode is hard to spot, rewrite this loop to iterate over the vcpus upwards rather than downwards. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211216104526.1482124-1-maz@kernel.org	2021-12-16 10:47:24 +00:00
Marc Zyngier	7b6871f670	Merge branch kvm-arm64/pkvm-cleanups-5.17 into kvmarm-master/next * kvm-arm64/pkvm-cleanups-5.17: : . : pKVM cleanups from Quentin Perret: : : This series is a collection of various fixes and cleanups for KVM/arm64 : when running in nVHE protected mode. The first two patches are real : fixes/improvements, the following two are minor cleanups, and the last : two help satisfy my paranoia so they're certainly optional. : . KVM: arm64: pkvm: Make kvm_host_owns_hyp_mappings() robust to VHE KVM: arm64: pkvm: Stub io map functions KVM: arm64: Make __io_map_base static KVM: arm64: Make the hyp memory pool static KVM: arm64: pkvm: Disable GICv2 support KVM: arm64: pkvm: Fix hyp_pool max order Signed-off-by: Marc Zyngier <maz@kernel.org>	2021-12-15 14:21:23 +00:00
Quentin Perret	a770ee80e6	KVM: arm64: pkvm: Disable GICv2 support GICv2 requires having device mappings in guests and the hypervisor, which is incompatible with the current pKVM EL2 page ownership model which only covers memory. While it would be desirable to support pKVM with GICv2, this will require a lot more work, so let's make the current assumption clear until then. Co-developed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20211208152300.2478542-3-qperret@google.com	2021-12-15 14:16:28 +00:00
Marc Zyngier	46808a4cb8	KVM: Use 'unsigned long' as kvm_for_each_vcpu()'s index Everywhere we use kvm_for_each_vpcu(), we use an int as the vcpu index. Unfortunately, we're about to move rework the iterator, which requires this to be upgrade to an unsigned long. Let's bite the bullet and repaint all of it in one go. Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20211116160403.4074052-7-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-12-08 04:24:15 -05:00
Marc Zyngier	94b4a6d521	Merge branch kvm-arm64/misc-5.17 into kvmarm-master/next * kvm-arm64/misc-5.17: : . : - Add minimal support for ARMv8.7's PMU extension : - Constify kvm_io_gic_ops : - Drop kvm_is_transparent_hugepage() prototype : . KVM: Drop stale kvm_is_transparent_hugepage() declaration KVM: arm64: Constify kvm_io_gic_ops KVM: arm64: Add minimal handling for the ARMv8.7 PMU Signed-off-by: Marc Zyngier <maz@kernel.org>	2021-12-07 09:14:53 +00:00
Rikard Falkeborn	636dcd0204	KVM: arm64: Constify kvm_io_gic_ops The only usage of kvm_io_gic_ops is to make a comparison with its address and to pass its address to kvm_iodevice_init() which takes a pointer to const kvm_io_device_ops as input. Make it const to allow the compiler to put it in read-only memory. Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211204213518.83642-1-rikard.falkeborn@gmail.com	2021-12-06 08:34:06 +00:00
Marc Zyngier	cc5705fb1b	KVM: arm64: Drop vcpu->arch.has_run_once for vcpu->pid With the transition to kvm_arch_vcpu_run_pid_change() to handle the "run once" activities, it becomes obvious that has_run_once is now an exact shadow of vcpu->pid. Replace vcpu->arch.has_run_once with a new vcpu_has_run_once() helper that directly checks for vcpu->pid, and get rid of the now unused field. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2021-12-01 11:51:22 +00:00
Marc Zyngier	5f8b2591de	Merge branch kvm-arm64/memory-accounting into kvmarm-master/next * kvm-arm64/memory-accounting: : . : Sprinkle a bunch of GFP_KERNEL_ACCOUNT all over the code base : to better track memory allocation made on behalf of a VM. : . KVM: arm64: Add memcg accounting to KVM allocations KVM: arm64: vgic: Add memcg accounting to vgic allocations Signed-off-by: Marc Zyngier <maz@kernel.org>	2021-10-17 11:29:36 +01:00
Jia He	3ef231670b	KVM: arm64: vgic: Add memcg accounting to vgic allocations Inspired by commit `254272ce65` ("kvm: x86: Add memcg accounting to KVM allocations"), it would be better to make arm64 vgic consistent with common kvm codes. The memory allocations of VM scope should be charged into VM process cgroup, hence change GFP_KERNEL to GFP_KERNEL_ACCOUNT. There remain a few cases since these allocations are global, not in VM scope. Signed-off-by: Jia He <justin.he@arm.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210907123112.10232-2-justin.he@arm.com	2021-10-17 11:25:55 +01:00
Marc Zyngier	20a3043075	Merge branch kvm-arm64/vgic-fixes-5.16 into kvmarm-master/next * kvm-arm64/vgic-fixes-5.16: : . : Multiple updates to the GICv3 emulation in order to better support : the dreadful Apple M1 that only implements half of it, and in a : broken way... : . KVM: arm64: vgic-v3: Align emulated cpuif LPI state machine with the pseudocode KVM: arm64: vgic-v3: Don't advertise ICC_CTLR_EL1.SEIS KVM: arm64: vgic-v3: Reduce common group trapping to ICV_DIR_EL1 when possible KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors KVM: arm64: Force ID_AA64PFR0_EL1.GIC=1 when exposing a virtual GICv3 Signed-off-by: Marc Zyngier <maz@kernel.org>	2021-10-17 11:10:14 +01:00
Marc Zyngier	0924729b21	KVM: arm64: vgic-v3: Reduce common group trapping to ICV_DIR_EL1 when possible On systems that advertise ICH_VTR_EL2.SEIS, we trap all GICv3 sysreg accesses from the guest. From a performance perspective, this is OK as long as the guest doesn't hammer the GICv3 CPU interface. In most cases, this is fine, unless the guest actively uses priorities and switches PMR_EL1 very often. Which is exactly what happens when a Linux guest runs with irqchip.gicv3_pseudo_nmi=1. In these condition, the performance plumets as we hit PMR each time we mask/unmask interrupts. Not good. There is however an opportunity for improvement. Careful reading of the architecture specification indicates that the only GICv3 sysreg belonging to the common group (which contains the SGI registers, PMR, DIR, CTLR and RPR) that is allowed to generate a SError is DIR. Everything else is safe. It is thus possible to substitute the trapping of all the common group with just that of DIR if it supported by the implementation. Yes, that's yet another optional bit of the architecture. So let's just do that, as it leads to some impressive result on the M1: Without this change: bash-5.1# /host/home/maz/hackbench 100 process 1000 Running with 10040 (== 4000) tasks. Time: 56.596 With this change: bash-5.1# /host/home/maz/hackbench 100 process 1000 Running with 10040 (== 4000) tasks. Time: 8.649 which is a pretty convincing result. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20211010150910.2911495-4-maz@kernel.org	2021-10-17 11:06:36 +01:00
Marc Zyngier	df652bcf11	KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors The infamous M1 has a feature nobody else ever implemented, in the form of the "GIC locally generated SError interrupts", also known as SEIS for short. These SErrors are generated when a guest does something that violates the GIC state machine. It would have been simpler to just ignore the damned thing, but that's not what this HW does. Oh well. This part of of the architecture is also amazingly under-specified. There is a whole 10 lines that describe the feature in a spec that is 930 pages long, and some of these lines are factually wrong. Oh, and it is deprecated, so the insentive to clarify it is low. Now, the spec says that this should be a virtual SError when HCR_EL2.AMO is set. As it turns out, that's not always the case on this CPU, and the SError sometimes fires on the host as a physical SError. Goodbye, cruel world. This clearly is a HW bug, and it means that a guest can easily take the host down, on demand. Thankfully, we have seen systems that were just as broken in the past, and we have the perfect vaccine for it. Apple M1, please meet the Cavium ThunderX workaround. All your GIC accesses will be trapped, sanitised, and emulated. Only the signalling aspect of the HW will be used. It won't be super speedy, but it will at least be safe. You're most welcome. Given that this has only ever been seen on this single implementation, that the spec is unclear at best and that we cannot trust it to ever be implemented correctly, gate the workaround solely on ICH_VTR_EL2.SEIS being set. Tested-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010150910.2911495-3-maz@kernel.org	2021-10-17 11:06:36 +01:00
Ricardo Koller	96e9038969	KVM: arm64: vgic: Drop vgic_check_ioaddr() There are no more users of vgic_check_ioaddr(). Move its checks to vgic_check_iorange() and then remove it. Signed-off-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211005011921.437353-6-ricarkol@google.com	2021-10-11 09:31:42 +01:00
Ricardo Koller	2ec02f6c64	KVM: arm64: vgic-v3: Check ITS region is not above the VM IPA size Verify that the ITS region does not extend beyond the VM-specified IPA range (phys_size). base + size > phys_size AND base < phys_size Add the missing check into vgic_its_set_attr() which is called when setting the region. Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211005011921.437353-5-ricarkol@google.com	2021-10-11 09:31:42 +01:00
Ricardo Koller	c56a87da0a	KVM: arm64: vgic-v2: Check cpu interface region is not above the VM IPA size Verify that the GICv2 CPU interface does not extend beyond the VM-specified IPA range (phys_size). base + size > phys_size AND base < phys_size Add the missing check into kvm_vgic_addr() which is called when setting the region. This patch also enables some superfluous checks for the distributor (vgic_check_ioaddr was enough as alignment == size for the distributors). Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211005011921.437353-4-ricarkol@google.com	2021-10-11 09:31:41 +01:00
Ricardo Koller	4612d98f58	KVM: arm64: vgic-v3: Check redist region is not above the VM IPA size Verify that the redistributor regions do not extend beyond the VM-specified IPA range (phys_size). This can happen when using KVM_VGIC_V3_ADDR_TYPE_REDIST or KVM_VGIC_V3_ADDR_TYPE_REDIST_REGIONS with: base + size > phys_size AND base < phys_size Add the missing check into vgic_v3_alloc_redist_region() which is called when setting the regions, and into vgic_v3_check_base() which is called when attempting the first vcpu-run. The vcpu-run check does not apply to KVM_VGIC_V3_ADDR_TYPE_REDIST_REGIONS because the regions size is known before the first vcpu-run. Note that using the REDIST_REGIONS API results in a different check, which already exists, at first vcpu run: that the number of redist regions is enough for all vcpus. Finally, this patch also enables some extra tests in vgic_v3_alloc_redist_region() by calculating "size" early for the legacy redist api: like checking that the REDIST region can fit all the already created vcpus. Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211005011921.437353-3-ricarkol@google.com	2021-10-11 09:31:41 +01:00
Ricardo Koller	f25c5e4daf	kvm: arm64: vgic: Introduce vgic_check_iorange Add the new vgic_check_iorange helper that checks that an iorange is sane: the start address and size have valid alignments, the range is within the addressable PA range, start+size doesn't overflow, and the start wasn't already defined. No functional change. Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211005011921.437353-2-ricarkol@google.com	2021-10-11 09:31:41 +01:00
Marc Zyngier	3134cc8beb	KVM: arm64: vgic: Resample HW pending state on deactivation When a mapped level interrupt (a timer, for example) is deactivated by the guest, the corresponding host interrupt is equally deactivated. However, the fate of the pending state still needs to be dealt with in SW. This is specially true when the interrupt was in the active+pending state in the virtual distributor at the point where the guest was entered. On exit, the pending state is potentially stale (the guest may have put the interrupt in a non-pending state). If we don't do anything, the interrupt will be spuriously injected in the guest. Although this shouldn't have any ill effect (spurious interrupts are always possible), we can improve the emulation by detecting the deactivation-while-pending case and resample the interrupt. While we're at it, move the logic into a common helper that can be shared between the two GIC implementations. Fixes: `e40cc57bac` ("KVM: arm/arm64: vgic: Support level-triggered mapped interrupts") Reported-by: Raghavendra Rao Ananta <rananta@google.com> Tested-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210819180305.1670525-1-maz@kernel.org	2021-08-20 08:53:22 +01:00
Ricardo Koller	b9a51949ce	KVM: arm64: vgic: Drop WARN from vgic_get_irq vgic_get_irq(intid) is used all over the vgic code in order to get a reference to a struct irq. It warns whenever intid is not a valid number (like when it's a reserved IRQ number). The issue is that this warning can be triggered from userspace (e.g., KVM_IRQ_LINE for intid 1020). Drop the WARN call from vgic_get_irq. Signed-off-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210818213205.598471-1-ricarkol@google.com	2021-08-19 11:31:04 +01:00
Jason Wang	013cc4c678	KVM: arm64: Fix comments related to GICv2 PMR reporting Remove the repeated word 'the' from two comments. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210728130623.12017-1-wangborong@cdjrlc.com	2021-08-02 14:49:39 +01:00
Marc Zyngier	354920e794	KVM: arm64: vgic: Implement SW-driven deactivation In order to deal with these systems that do not offer HW-based deactivation of interrupts, let implement a SW-based approach: - When the irq is queued into a LR, treat it as a pure virtual interrupt and set the EOI flag in the LR. - When the interrupt state is read back from the LR, force a deactivation when the state is invalid (neither active nor pending) Interrupts requiring such treatment get the VGIC_SW_RESAMPLE flag. Signed-off-by: Marc Zyngier <maz@kernel.org>	2021-06-01 10:46:00 +01:00
Marc Zyngier	db75f1a33f	KVM: arm64: vgic: move irq->get_input_level into an ops structure We already have the option to attach a callback to an interrupt to retrieve its pending state. As we are planning to expand this facility, move this callback into its own data structure. This will limit the size of individual interrupts as the ops structures can be shared across multiple interrupts. Signed-off-by: Marc Zyngier <maz@kernel.org>	2021-06-01 10:45:59 +01:00
Marc Zyngier	f6c3e24fb7	KVM: arm64: vgic: Let an interrupt controller advertise lack of HW deactivation The vGIC, as architected by ARM, allows a virtual interrupt to trigger the deactivation of a physical interrupt. This allows the following interrupt to be delivered without requiring an exit. However, some implementations have choosen not to implement this, meaning that we will need some unsavoury workarounds to deal with this. On detecting such a case, taint the kernel and spit a nastygram. We'll deal with this in later patches. Signed-off-by: Marc Zyngier <maz@kernel.org>	2021-06-01 10:45:59 +01:00
Marc Zyngier	669062d2a1	KVM: arm64: vgic: Be tolerant to the lack of maintenance interrupt masking As it turns out, not all the interrupt controllers are able to expose a vGIC maintenance interrupt that can be independently enabled/disabled. And to be fair, it doesn't really matter as all we require is for the interrupt to kick us out of guest mode out way or another. To that effect, add gic_kvm_info.no_maint_irq_mask for an interrupt controller to advertise the lack of masking. Signed-off-by: Marc Zyngier <maz@kernel.org>	2021-06-01 10:45:59 +01:00
Marc Zyngier	0e5cb77706	irqchip/gic: Split vGIC probing information from the GIC code The vGIC advertising code is unsurprisingly very much tied to the GIC implementations. However, we are about to extend the support to lesser implementations. Let's dissociate the vgic registration from the GIC code and move it into KVM, where it makes a bit more sense. This also allows us to mark the gic_kvm_info structures as __initdata. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2021-06-01 10:45:58 +01:00
Linus Torvalds	152d32aa84	ARM: - Stage-2 isolation for the host kernel when running in protected mode - Guest SVE support when running in nVHE mode - Force W^X hypervisor mappings in nVHE mode - ITS save/restore for guests using direct injection with GICv4.1 - nVHE panics now produce readable backtraces - Guest support for PTP using the ptp_kvm driver - Performance improvements in the S2 fault handler x86: - Optimizations and cleanup of nested SVM code - AMD: Support for virtual SPEC_CTRL - Optimizations of the new MMU code: fast invalidation, zap under read lock, enable/disably dirty page logging under read lock - /dev/kvm API for AMD SEV live migration (guest API coming soon) - support SEV virtual machines sharing the same encryption context - support SGX in virtual machines - add a few more statistics - improved directed yield heuristics - Lots and lots of cleanups Generic: - Rework of MMU notifier interface, simplifying and optimizing the architecture-specific code - Some selftests improvements -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCJ13kUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroM1HAgAqzPxEtiTPTFeFJV5cnPPJ3dFoFDK y/juZJUQ1AOtvuWzzwuf175ewkv9vfmtG6rVohpNSkUlJYeoc6tw7n8BTTzCVC1b c/4Dnrjeycr6cskYlzaPyV6MSgjSv5gfyj1LA5UEM16LDyekmaynosVWY5wJhju+ Bnyid8l8Utgz+TLLYogfQJQECCrsU0Wm//n+8TWQgLf1uuiwshU5JJe7b43diJrY +2DX+8p9yWXCTz62sCeDWNahUv8AbXpMeJ8uqZPYcN1P0gSEUGu8xKmLOFf9kR7b M4U1Gyz8QQbjd2lqnwiWIkvRLX6gyGVbq2zH0QbhUe5gg3qGUX7JjrhdDQ== =AXUi -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "This is a large update by KVM standards, including AMD PSP (Platform Security Processor, aka "AMD Secure Technology") and ARM CoreSight (debug and trace) changes. ARM: - CoreSight: Add support for ETE and TRBE - Stage-2 isolation for the host kernel when running in protected mode - Guest SVE support when running in nVHE mode - Force W^X hypervisor mappings in nVHE mode - ITS save/restore for guests using direct injection with GICv4.1 - nVHE panics now produce readable backtraces - Guest support for PTP using the ptp_kvm driver - Performance improvements in the S2 fault handler x86: - AMD PSP driver changes - Optimizations and cleanup of nested SVM code - AMD: Support for virtual SPEC_CTRL - Optimizations of the new MMU code: fast invalidation, zap under read lock, enable/disably dirty page logging under read lock - /dev/kvm API for AMD SEV live migration (guest API coming soon) - support SEV virtual machines sharing the same encryption context - support SGX in virtual machines - add a few more statistics - improved directed yield heuristics - Lots and lots of cleanups Generic: - Rework of MMU notifier interface, simplifying and optimizing the architecture-specific code - a handful of "Get rid of oprofile leftovers" patches - Some selftests improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (379 commits) KVM: selftests: Speed up set_memory_region_test selftests: kvm: Fix the check of return value KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt() KVM: SVM: Skip SEV cache flush if no ASIDs have been used KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids() KVM: SVM: Drop redundant svm_sev_enabled() helper KVM: SVM: Move SEV VMCB tracking allocation to sev.c KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup() KVM: SVM: Unconditionally invoke sev_hardware_teardown() KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported) KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features KVM: SVM: Move SEV module params/variables to sev.c KVM: SVM: Disable SEV/SEV-ES if NPT is disabled KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails KVM: SVM: Zero out the VMCB array used to track SEV ASID association x86/sev: Drop redundant and potentially misleading 'sev_enabled' KVM: x86: Move reverse CPUID helpers to separate header file KVM: x86: Rename GPR accessors to make mode-aware variants the defaults ...	2021-05-01 10:14:08 -07:00
Linus Torvalds	57fa2369ab	CFI on arm64 series for v5.13-rc1 - Clean up list_sort prototypes (Sami Tolvanen) - Introduce CONFIG_CFI_CLANG for arm64 (Sami Tolvanen) -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmCHCR8ACgkQiXL039xt wCZyFQ//fnUZaXR2K354zDyW6CJljMf+d94RF6rH+J6eMTH2/HXa5v0iJokwABLf ussP6qF4k5wtmI22Gm9A5Zc3e4iiry5pC0jOdk0mk4gzWwFN9MdgNxJZIGA3xqhS bsBK4AGrVKjtZl48G1/ZxJuNDeJhVp6GNK2n6/Gl4rZF6R7D/Upz0XelyJRdDpcM HIGma7jZl6xfGU0mdWCzpOGK1zdMca1WVs7A4YuurSbLn5PZJrcNVWLouDqt/Si2 AduSri1gyPClicgvqWjMOzhUpuw/nJtBLRl1x1EsWk/KSZ1/uNVjlewfzdN4fZrr zbtFr2gLubYLK6JOX7/LqoHlOTgE3tYLL+WIVN75DsOGZBKgHhmebTmWLyqzV0SL oqcyM5d3ucC6msdtAK5Fv4MSp8rpjqlK1Ha4SGRT6kC2wut7AhZ3KD7eyRIz8mV9 Sa9mhignGFJnTEUp+LSbYdrAudgSKxB40WyXPmswAXX4VJFRD4ONrrcAON/SzkUT Hw/JdFRCKkJjgwNQjIQoZcUNMTbFz2PlNIEnjJWm38YImQKQlCb2mXaZKCwBkf45 aheCZk17eKoxTCXFMd+KxlyNEtS2yBfq/PpZgvw7GW/pfFbWUg1+2O41LnihIe5v zu0hN1wNCQqgfxiMZqX1OTb9C/2vybzGsXILt+9nppjZ8EBU7iU= =wU6U -----END PGP SIGNATURE----- Merge tag 'cfi-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull CFI on arm64 support from Kees Cook: "This builds on last cycle's LTO work, and allows the arm64 kernels to be built with Clang's Control Flow Integrity feature. This feature has happily lived in Android kernels for almost 3 years[1], so I'm excited to have it ready for upstream. The wide diffstat is mainly due to the treewide fixing of mismatched list_sort prototypes. Other things in core kernel are to address various CFI corner cases. The largest code portion is the CFI runtime implementation itself (which will be shared by all architectures implementing support for CFI). The arm64 pieces are Acked by arm64 maintainers rather than coming through the arm64 tree since carrying this tree over there was going to be awkward. CFI support for x86 is still under development, but is pretty close. There are a handful of corner cases on x86 that need some improvements to Clang and objtool, but otherwise works well. Summary: - Clean up list_sort prototypes (Sami Tolvanen) - Introduce CONFIG_CFI_CLANG for arm64 (Sami Tolvanen)" * tag 'cfi-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: arm64: allow CONFIG_CFI_CLANG to be selected KVM: arm64: Disable CFI for nVHE arm64: ftrace: use function_nocfi for ftrace_call arm64: add __nocfi to __apply_alternatives arm64: add __nocfi to functions that jump to a physical address arm64: use function_nocfi with __pa_symbol arm64: implement function_nocfi psci: use function_nocfi for cpu_resume lkdtm: use function_nocfi treewide: Change list_sort to use const pointers bpf: disable CFI in dispatcher functions kallsyms: strip ThinLTO hashes from static functions kthread: use WARN_ON_FUNCTION_MISMATCH workqueue: use WARN_ON_FUNCTION_MISMATCH module: ensure __cfi_check alignment mm: add generic function_nocfi macro cfi: add __cficanonical add support for Clang CFI	2021-04-27 10:16:46 -07:00
Linus Torvalds	91552ab8ff	The usual updates from the irq departement: Core changes: - Provide IRQF_NO_AUTOEN as a flag for request_irq() so drivers can be cleaned up which either use a seperate mechanism to prevent auto-enable at request time or have a racy mechanism which disables the interrupt right after request. - Get rid of the last usage of irq_create_identity_mapping() and remove the interface. - An overhaul of tasklet_disable(). Most usage sites of tasklet_disable() are in task context and usually in cleanup, teardown code pathes. tasklet_disable() spinwaits for a tasklet which is currently executed. That's not only a problem for PREEMPT_RT where this can lead to a live lock when the disabling task preempts the softirq thread. It's also problematic in context of virtualization when the vCPU which runs the tasklet is scheduled out and the disabling code has to spin wait until it's scheduled back in. Though there are a few code pathes which invoke tasklet_disable() from non-sleepable context. For these a new disable variant which still spinwaits is provided which allows to switch tasklet_disable() to a sleep wait mechanism. For the atomic use cases this does not solve the live lock issue on PREEMPT_RT. That is mitigated by blocking on the RT specific softirq lock. - The PREEMPT_RT specific implementation of softirq processing and local_bh_disable/enable(). On RT enabled kernels soft interrupt processing happens always in task context and all interrupt handlers, which are not explicitly marked to be invoked in hard interrupt context are forced into task context as well. This allows to protect against softirq processing with a per CPU lock, which in turn allows to make BH disabled regions preemptible. Most of the softirq handling code is still shared. The RT/non-RT specific differences are addressed with a set of inline functions which provide the context specific functionality. The local_bh_disable() / local_bh_enable() mechanism are obviously seperate. - The usual set of small improvements and cleanups Driver changes: - New drivers for Nuvoton WPCM450 and DT 79rc3243x interrupt controllers - Extended functionality for MStar, STM32 and SC7280 irq chips - Enhanced robustness for ARM GICv3/4.1 drivers - The usual set of cleanups and improvements all over the place -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmCGh5wTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoZ+/EACWBpQ/2ZHizEw1bzjaDzJrR8U228xu wNi7nSP92Y07nJ3cCX7a6TJ53mqd0n3RT+DprlsOuqSN0D7Ktr/x44V/aZtm0d3N GkFOlpeGCRnHusLaUTwk7a8289LuoQ7OhSxIB409n1I4nLI96ZK41D1tYonMYl6E nxDiGADASfjaciBWbjwJO/mlwmiW/VRpSTxswx0wzakFfbIx9iKyKv1bCJQZ5JK+ lHmf0jxpDIs1EVK/ElJ9Ky6TMBlEmZyiX7n6rujtwJ1W+Jc/uL/y8pLJvGwooVmI yHTYsLMqzviCbAMhJiB3h1qs3GbCGlM78prgJTnOd0+xEUOCcopCRQlsTXVBq8Nb OS+HNkYmYXRfiSH6lINJsIok8Xis28bAw/qWz2Ho+8wLq0TI8crK38roD1fPndee FNJRhsPPOBkscpIldJ0Cr0X5lclkJFiAhAxORPHoseKvQSm7gBMB7H99xeGRffTn yB3XqeTJMvPNmAHNN4Brv6ey3OjwnEWBgwcnIM2LtbIlRtlmxTYuR+82OPOgEvzk fSrjFFJqu0LEMLEOXS4pYN824PawjV//UAy4IaG8AodmUUCSGHgw1gTVa4sIf72t tXY54HqWfRWRpujhVRgsZETqBUtZkL6yvpoe8f6H7P91W5tAfv3oj4ch9RkhUo+Z b0/u9T0+Fpbg+w== =id4G -----END PGP SIGNATURE----- Merge tag 'irq-core-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "The usual updates from the irq departement: Core changes: - Provide IRQF_NO_AUTOEN as a flag for request_irq() so drivers can be cleaned up which either use a seperate mechanism to prevent auto-enable at request time or have a racy mechanism which disables the interrupt right after request. - Get rid of the last usage of irq_create_identity_mapping() and remove the interface. - An overhaul of tasklet_disable(). Most usage sites of tasklet_disable() are in task context and usually in cleanup, teardown code pathes. tasklet_disable() spinwaits for a tasklet which is currently executed. That's not only a problem for PREEMPT_RT where this can lead to a live lock when the disabling task preempts the softirq thread. It's also problematic in context of virtualization when the vCPU which runs the tasklet is scheduled out and the disabling code has to spin wait until it's scheduled back in. There are a few code pathes which invoke tasklet_disable() from non-sleepable context. For these a new disable variant which still spinwaits is provided which allows to switch tasklet_disable() to a sleep wait mechanism. For the atomic use cases this does not solve the live lock issue on PREEMPT_RT. That is mitigated by blocking on the RT specific softirq lock. - The PREEMPT_RT specific implementation of softirq processing and local_bh_disable/enable(). On RT enabled kernels soft interrupt processing happens always in task context and all interrupt handlers, which are not explicitly marked to be invoked in hard interrupt context are forced into task context as well. This allows to protect against softirq processing with a per CPU lock, which in turn allows to make BH disabled regions preemptible. Most of the softirq handling code is still shared. The RT/non-RT specific differences are addressed with a set of inline functions which provide the context specific functionality. The local_bh_disable() / local_bh_enable() mechanism are obviously seperate. - The usual set of small improvements and cleanups Driver changes: - New drivers for Nuvoton WPCM450 and DT 79rc3243x interrupt controllers - Extended functionality for MStar, STM32 and SC7280 irq chips - Enhanced robustness for ARM GICv3/4.1 drivers - The usual set of cleanups and improvements all over the place" * tag 'irq-core-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) irqchip/xilinx: Expose Kconfig option for Zynq/ZynqMP irqchip/gic-v3: Do not enable irqs when handling spurious interrups dt-bindings: interrupt-controller: Add IDT 79RC3243x Interrupt Controller irqchip: Add support for IDT 79rc3243x interrupt controller irqdomain: Drop references to recusive irqdomain setup irqdomain: Get rid of irq_create_strict_mappings() irqchip/jcore-aic: Kill use of irq_create_strict_mappings() ARM: PXA: Kill use of irq_create_strict_mappings() irqchip/gic-v4.1: Disable vSGI upon (GIC CPUIF < v4.1) detection irqchip/tb10x: Use 'fallthrough' to eliminate a warning genirq: Reduce irqdebug cacheline bouncing kernel: Initialize cpumask before parsing irqchip/wpcm450: Drop COMPILE_TEST irqchip/irq-mst: Support polarity configuration irqchip: Add driver for WPCM450 interrupt controller dt-bindings: interrupt-controller: Add nuvoton, wpcm450-aic dt-bindings: qcom,pdc: Add compatible for sc7280 irqchip/stm32: Add usart instances exti direct event support irqchip/gic-v3: Fix OF_BAD_ADDR error handling irqchip/sifive-plic: Mark two global variables __ro_after_init ...	2021-04-26 09:43:16 -07:00
Lorenzo Pieralisi	46135d6f87	irqchip/gic-v4.1: Disable vSGI upon (GIC CPUIF < v4.1) detection GIC CPU interfaces versions predating GIC v4.1 were not built to accommodate vINTID within the vSGI range; as reported in the GIC specifications (8.2 "Changes to the CPU interface"), it is CONSTRAINED UNPREDICTABLE to deliver a vSGI to a PE with ID_AA64PFR0_EL1.GIC < b0011. Check the GIC CPUIF version by reading the SYS_ID_AA64_PFR0_EL1. Disable vSGIs if a CPUIF version < 4.1 is detected to prevent using vSGIs on systems where they may misbehave. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Marc Zyngier <maz@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210317100719.3331-2-lorenzo.pieralisi@arm.com	2021-04-22 15:55:21 +01:00
Marc Zyngier	e629003215	Merge branch 'kvm-arm64/vlpi-save-restore' into kvmarm-master/next Signed-off-by: Marc Zyngier <maz@kernel.org>	2021-04-13 15:41:45 +01:00
Eric Auger	94ac083539	KVM: arm/arm64: Fix KVM_VGIC_V3_ADDR_TYPE_REDIST read When reading the base address of the a REDIST region through KVM_VGIC_V3_ADDR_TYPE_REDIST we expect the redistributor region list to be populated with a single element. However list_first_entry() expects the list to be non empty. Instead we should use list_first_entry_or_null which effectively returns NULL if the list is empty. Fixes: `dbd9733ab6` ("KVM: arm/arm64: Replace the single rdist region by a list") Cc: <Stable@vger.kernel.org> # v4.18+ Signed-off-by: Eric Auger <eric.auger@redhat.com> Reported-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210412150034.29185-1-eric.auger@redhat.com	2021-04-13 15:04:50 +01:00
Sami Tolvanen	4f0f586bf0	treewide: Change list_sort to use const pointers list_sort() internally casts the comparison function passed to it to a different type with constant struct list_head pointers, and uses this pointer to call the functions, which trips indirect call Control-Flow Integrity (CFI) checking. Instead of removing the consts, this change defines the list_cmp_func_t type and changes the comparison function types of all list_sort() callers to use const pointers, thus avoiding type mismatches. Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com	2021-04-08 16:04:22 -07:00
Eric Auger	28e9d4bce3	KVM: arm64: vgic-v3: Expose GICR_TYPER.Last for userspace Commit `23bde34771` ("KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace") temporarily fixed a bug identified when attempting to access the GICR_TYPER register before the redistributor region setting, but dropped the support of the LAST bit. Emulating the GICR_TYPER.Last bit still makes sense for architecture compliance though. This patch restores its support (if the redistributor region was set) while keeping the code safe. We introduce a new helper, vgic_mmio_vcpu_rdist_is_last() which computes whether a redistributor is the highest one of a series of redistributor contributor pages. With this new implementation we do not need to have a uaccess read accessor anymore. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210405163941.510258-9-eric.auger@redhat.com	2021-04-06 14:51:38 +01:00
Eric Auger	e5a3563546	kvm: arm64: vgic-v3: Introduce vgic_v3_free_redist_region() To improve the readability, we introduce the new vgic_v3_free_redist_region helper and also rename vgic_v3_insert_redist_region into vgic_v3_alloc_redist_region Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210405163941.510258-8-eric.auger@redhat.com	2021-04-06 14:51:38 +01:00
Eric Auger	da38530976	KVM: arm64: Simplify argument passing to vgic_uaccess_[read\|write] vgic_uaccess() takes a struct vgic_io_device argument, converts it to a struct kvm_io_device and passes it to the read/write accessor functions, which convert it back to a struct vgic_io_device. Avoid the indirection by passing the struct vgic_io_device argument directly to vgic_uaccess_{read,write}. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210405163941.510258-7-eric.auger@redhat.com	2021-04-06 14:51:38 +01:00
Eric Auger	3a52116127	KVM: arm/arm64: vgic: Reset base address on kvm_vgic_dist_destroy() On vgic_dist_destroy(), the addresses are not reset. However for kvm selftest purpose this would allow to continue the test execution even after a failure when running KVM_RUN. So let's reset the base addresses. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210405163941.510258-5-eric.auger@redhat.com	2021-04-06 14:51:38 +01:00
Eric Auger	8542a8f95a	KVM: arm64: vgic-v3: Fix error handling in vgic_v3_set_redist_base() vgic_v3_insert_redist_region() may succeed while vgic_register_all_redist_iodevs fails. For example this happens while adding a redistributor region overlapping a dist region. The failure only is detected on vgic_register_all_redist_iodevs when vgic_v3_check_base() gets called in vgic_register_redist_iodev(). In such a case, remove the newly added redistributor region and free it. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210405163941.510258-4-eric.auger@redhat.com	2021-04-06 14:51:37 +01:00
Eric Auger	53b16dd6ba	KVM: arm64: Fix KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION read The doc says: "The characteristics of a specific redistributor region can be read by presetting the index field in the attr data. Only valid for KVM_DEV_TYPE_ARM_VGIC_V3" Unfortunately the existing code fails to read the input attr data. Fixes: `04c1109322` ("KVM: arm/arm64: Implement KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION") Cc: stable@vger.kernel.org#v4.17+ Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210405163941.510258-3-eric.auger@redhat.com	2021-04-06 14:51:37 +01:00
Eric Auger	d9b201e99c	KVM: arm64: vgic-v3: Fix some error codes when setting RDIST base KVM_DEV_ARM_VGIC_GRP_ADDR group doc says we should return -EEXIST in case the base address of the redist is already set. We currently return -EINVAL. However we need to return -EINVAL in case a legacy REDIST address is attempted to be set while REDIST_REGIONS were set. This case is discriminated by looking at the count field. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210405163941.510258-2-eric.auger@redhat.com	2021-04-06 14:51:37 +01:00
Shenming Lu	8082d50f48	KVM: arm64: GICv4.1: Give a chance to save VLPI state Before GICv4.1, we don't have direct access to the VLPI state. So we simply let it fail early when encountering any VLPI in saving. But now we don't have to return -EACCES directly if on GICv4.1. Let’s change the hard code and give a chance to save the VLPI state (and preserve the UAPI). Signed-off-by: Shenming Lu <lushenming@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210322060158.1584-7-lushenming@huawei.com	2021-03-24 18:12:21 +00:00
Zenghui Yu	12df742921	KVM: arm64: GICv4.1: Restore VLPI pending state to physical side When setting the forwarding path of a VLPI (switch to the HW mode), we can also transfer the pending state from irq->pending_latch to VPT (especially in migration, the pending states of VLPIs are restored into kvm’s vgic first). And we currently send "INT+VSYNC" to trigger a VLPI to pending. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Shenming Lu <lushenming@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210322060158.1584-6-lushenming@huawei.com	2021-03-24 18:12:21 +00:00
Shenming Lu	f66b7b151e	KVM: arm64: GICv4.1: Try to save VLPI state in save_pending_tables After pausing all vCPUs and devices capable of interrupting, in order to save the states of all interrupts, besides flushing the states in kvm’s vgic, we also try to flush the states of VLPIs in the virtual pending tables into guest RAM, but we need to have GICv4.1 and safely unmap the vPEs first. As for the saving of VSGIs, which needs the vPEs to be mapped and might conflict with the saving of VLPIs, but since we will map the vPEs back at the end of save_pending_tables and both savings require the kvm->lock to be held (thus only happen serially), it will work fine. Signed-off-by: Shenming Lu <lushenming@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210322060158.1584-5-lushenming@huawei.com	2021-03-24 18:12:21 +00:00
Shenming Lu	80317fe4a6	KVM: arm64: GICv4.1: Add function to get VLPI state With GICv4.1 and the vPE unmapped, which indicates the invalidation of any VPT caches associated with the vPE, we can get the VLPI state by peeking at the VPT. So we add a function for this. Signed-off-by: Shenming Lu <lushenming@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210322060158.1584-4-lushenming@huawei.com	2021-03-24 18:12:20 +00:00
Marc Zyngier	9739f6ef05	KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility It looks like we have broken firmware out there that wrongly advertises a GICv2 compatibility interface, despite the CPUs not being able to deal with it. To work around this, check that the CPU initialising KVM is actually able to switch to MMIO instead of system registers, and use that as a precondition to enable GICv2 compatibility in KVM. Note that the detection happens on a single CPU. If the firmware is lying and that the CPUs are asymetric, all hope is lost anyway. Reported-by: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20210305185254.3730990-8-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-06 04:18:41 -05:00
Marc Zyngier	b9d699e269	KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config() As we are about to report a bit more information to the rest of the kernel, rename __vgic_v3_get_ich_vtr_el2() to the more explicit __vgic_v3_get_gic_config(). No functional change. Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20210305185254.3730990-7-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-06 04:18:41 -05:00
Paolo Bonzini	774206bc03	KVM/arm64 fixes for 5.11, take #1 - VM init cleanups - PSCI relay cleanups - Kill CONFIG_KVM_ARM_PMU - Fixup __init annotations - Fixup reg_to_encoding() - Fix spurious PMCR_EL0 access -----BEGIN PGP SIGNATURE----- iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl/27REPHG1hekBrZXJu ZWwub3JnAAoJECPQ0LrRPXpDOHoQAJ5uFunaYzBBtiQqXXG0XODGpI7/DXRYfdKX Kp7LS6pJHWvUqYmf1LxXTWXYy1rf3L4JIGKYIo1ZEkKDo2kkGJAKKYdR8aL2m/B4 Q80wFGBBv3DqK2jIQZRH9z3joppsyjKOPJZ6EKJU38t45+TNhiXQSVff2jJychqg KfDh0Oc+UtW5vxVUz8XTvguH3/yrvswk+za/BW/hSDZnUqrUxceCJ0i13agiZ/Zu URdq9MNXt8m6mMssT4Z/339TJlG2e16Y8ZpWWD9t2tQKBuP9UPicABmsOxqyfBrT 42rdhtLacXfXxWzCGe0qf6cxYCH0UuE2gzSk45CJANv/ws6QJn4r/KZaj7U+2Bft ukpruUrDV1+wE7WZRXRo4fpMiTYrijTuyx7ho8TdtyRAcR3Buxhv3l5ZBdvp/fb4 cG27XLBLNEOaUg7NJ/aePVQazjxLdm4uaYKz6T9wO6CFRJ39iMba7K351/nNRYwk bq7cQnfkCgJgWpEPd7rUq8HC2Y0c6FUHWf4FLOAt3en/KDfVjeipN0YvFjf5fCwt Pr3cOgUHOg3sGX8jEGZGm3HhMkeeKn2Op/sRSFzcnwyZGfbPFHvr+55p8WKS4UiK LZ0aa14VEYrqtd4Tha2g2ym138EMPSF3OaeQY3Zsqx6wPD/9gfLydsOSkVezp1JI v38AVi2y =FCg2 -----END PGP SIGNATURE----- Merge tag 'kvmarm-fixes-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.11, take #1 - VM init cleanups - PSCI relay cleanups - Kill CONFIG_KVM_ARM_PMU - Fixup __init annotations - Fixup reg_to_encoding() - Fix spurious PMCR_EL0 access	2021-01-08 05:02:40 -05:00
Paolo Bonzini	bc351f0726	Merge branch 'kvm-master' into kvm-next Fixes to get_mmio_spte, destined to 5.10 stable branch.	2021-01-07 18:06:52 -05:00
Marc Zyngier	101068b566	KVM: arm64: Consolidate dist->ready setting into kvm_vgic_map_resources() dist->ready setting is pointlessly spread across the two vgic backends, while it could be consolidated in kvm_vgic_map_resources(). Move it there, and slightly simplify the flows in both backends. Suggested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2020-12-27 14:39:14 +00:00
Alexandru Elisei	9e5c23b9bd	KVM: arm64: Update comment in kvm_vgic_map_resources() vgic_v3_map_resources() returns -EBUSY if the VGIC isn't initialized, update the comment to kvm_vgic_map_resources() to match what the function does. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201201150157.223625-5-alexandru.elisei@arm.com	2020-12-27 14:37:21 +00:00
Alexandru Elisei	1c91f06d29	KVM: arm64: Move double-checked lock to kvm_vgic_map_resources() kvm_vgic_map_resources() is called when a VCPU if first run and it maps all the VGIC MMIO regions. To prevent double-initialization, the VGIC uses the ready variable to keep track of the state of resources and the global KVM mutex to protect against concurrent accesses. After the lock is taken, the variable is checked again in case another VCPU took the lock between the current VCPU reading ready equals false and taking the lock. The double-checked lock pattern is spread across four different functions: in kvm_vcpu_first_run_init(), in kvm_vgic_map_resource() and in vgic_{v2,v3}_map_resources(), which makes it hard to reason about and introduces minor code duplication. Consolidate the checks in kvm_vgic_map_resources(), where the lock is taken. No functional change intended. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201201150157.223625-4-alexandru.elisei@arm.com	2020-12-23 16:43:43 +00:00
Shenming Lu	57e3cebd02	KVM: arm64: Delay the polling of the GICR_VPENDBASER.Dirty bit In order to reduce the impact of the VPT parsing happening on the GIC, we can split the vcpu reseidency in two phases: - programming GICR_VPENDBASER: this still happens in vcpu_load() - checking for the VPT parsing to be complete: this can happen on vcpu entry (in kvm_vgic_flush_hwstate()) This allows the GIC and the CPU to work in parallel, rewmoving some of the entry overhead. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Shenming Lu <lushenming@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201128141857.983-3-lushenming@huawei.com	2020-11-30 11:18:29 +00:00
Zenghui Yu	23bde34771	KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace It was recently reported that if GICR_TYPER is accessed before the RD base address is set, we'll suffer from the unset @rdreg dereferencing. Oops... gpa_t last_rdist_typer = rdreg->base + GICR_TYPER + (rdreg->free_index - 1) * KVM_VGIC_V3_REDIST_SIZE; It's "expected" that users will access registers in the redistributor if the RD has been properly configured (e.g., the RD base address is set). But it hasn't yet been covered by the existing documentation. Per discussion on the list [1], the reporting of the GICR_TYPER.Last bit for userspace never actually worked. And it's difficult for us to emulate it correctly given that userspace has the flexibility to access it any time. Let's just drop the reporting of the Last bit for userspace for now (userspace should have full knowledge about it anyway) and it at least prevents kernel from panic ;-) [1] https://lore.kernel.org/kvmarm/c20865a267e44d1e2c0d52ce4e012263@kernel.org/ Fixes: `ba7b3f1275` ("KVM: arm/arm64: Revisit Redistributor TYPER last bit computation") Reported-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20201117151629.1738-1-yuzenghui@huawei.com Cc: stable@vger.kernel.org	2020-11-17 18:51:09 +00:00

1 2 3 4 5 ...

263 Commits