linux-loongson

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson synced 2025-09-02 16:44:59 +00:00

Author	SHA1	Message	Date
Thorsten Blum	dea8838128	KVM: arm64: vgic: Use str_enabled_disabled() in vgic_v3_probe() Remove hard-coded strings by using the str_enabled_disabled() helper function. Suggested-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250110225310.369980-2-thorsten.blum@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-01-11 10:26:56 +00:00
Marc Zyngier	f7d03fcbf1	KVM: arm64: Introduce __pkvm_vcpu_{load,put}() Rather than look-up the hyp vCPU on every run hypercall at EL2, introduce a per-CPU 'loaded_hyp_vcpu' tracking variable which is updated by a pair of load/put hypercalls called directly from kvm_arch_vcpu_{load,put}() when pKVM is enabled. Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20241218194059.3670226-10-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-12-20 09:44:00 +00:00
Keisuke Nishimura	be7e611274	KVM: arm64: vgic-its: Add error handling in vgic_its_cache_translation The return value of xa_store() needs to be checked. This fix adds an error handling path that resolves the kref inconsistency on failure. As suggested by Oliver Upton, this function does not return the error code intentionally because the translation cache is best effort. Fixes: `8201d1028c` ("KVM: arm64: vgic-its: Maintain a translation cache per ITS") Signed-off-by: Keisuke Nishimura <keisuke.nishimura@inria.fr> Suggested-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241130144952.23729-1-keisuke.nishimura@inria.fr Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-12-03 16:22:10 -08:00
Marc Zyngier	3b2c81d5fe	KVM: arm64: vgic-its: Add stronger type-checking to the ITS entry sizes The ITS ABI infrastructure allows for some pretty lax code, where the size of the data doesn't have to match the size of the entry, potentially leading to a collection of interesting bugs. Commit `7fe28d7e68` ("KVM: arm64: vgic-its: Add a data length check in vgic_its_save_*") added some checks, but starts by implicitly casting all writes to a 64bit value, hiding some of the issues. Instead, introduce macros that will check the data type actually used for dealing with the table entries. The macros are taking a symbolic entry type that is used to fetch the size of the entry type for the current ABI. This immediately catches a couple of low-impact gotchas (zero values that are implicitly 32bit), easy enough to fix. Given that we currently only have a single ABI, hardcode a couple of BUILD_BUG_ON()s that will fire if we use anything but a 64bit quantity, and some (currently unreachable) fallback code that may become useful one day. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241117165757.247686-5-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-11-20 17:21:08 -08:00
Marc Zyngier	e7619f2a2f	KVM: arm64: vgic: Kill VGIC_MAX_PRIVATE definition VGIC_MAX_PRIVATE is a pretty useless definition, and is better replaced with VGIC_NR_PRIVATE_IRQS. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241117165757.247686-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-11-20 17:21:08 -08:00
Marc Zyngier	add570b39f	KVM: arm64: vgic: Make vgic_get_irq() more robust vgic_get_irq() has an awkward signature, as it takes both a kvm and a vcpu, where the vcpu is allowed to be NULL if the INTID being looked up is a global interrupt (SPI or LPI). This leads to potentially problematic situations where the INTID passed is a private interrupt, but there is no vcpu. In order to make things less ambiguous, let's have two helpers instead: - vgic_get_irq(struct kvm *kvm, u32 intid), which is only concerned with *global* interrupts, as indicated by the lack of vcpu. - vgic_get_vcpu_irq(struct kvm_vcpu *vcpu, u32 intid), which can return *any* interrupt class, but must of course have a non-NULL vcpu. Most of the code nicely falls under one or the other situations, except for a couple of cases (close to the UABI or in the debug code) where we have to distinguish between the two cases. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241117165757.247686-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-11-20 17:21:08 -08:00
Marc Zyngier	d561491ba9	KVM: arm64: vgic-v3: Sanitise guest writes to GICR_INVLPIR Make sure we filter out non-LPI invalidation when handling writes to GICR_INVLPIR. Fixes: `4645d11f4a` ("KVM: arm64: vgic-v3: Implement MMIO-based LPI invalidation") Reported-by: Alexander Potapenko <glider@google.com> Tested-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20241117165757.247686-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-11-20 17:21:07 -08:00
Kunkun Jiang	7602ffd1d5	KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITE When DISCARD frees an ITE, it does not invalidate the corresponding ITE. In the scenario of continuous saves and restores, there may be a situation where an ITE is not saved but is restored. This is unreasonable and may cause restore to fail. This patch clears the corresponding ITE when DISCARD frees an ITE. Cc: stable@vger.kernel.org Fixes: `eff484e029` ("KVM: arm64: vgic-its: ITT save and restore") Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> [Jing: Update with entry write helper] Signed-off-by: Jing Zhang <jingzhangos@google.com> Link: https://lore.kernel.org/r/20241107214137.428439-6-jingzhangos@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-11-11 19:54:03 +00:00
Kunkun Jiang	e9649129d3	KVM: arm64: vgic-its: Clear DTE when MAPD unmaps a device vgic_its_save_device_tables will traverse its->device_list to save DTE for each device. vgic_its_restore_device_tables will traverse each entry of device table and check if it is valid. Restore if valid. But when MAPD unmaps a device, it does not invalidate the corresponding DTE. In the scenario of continuous saves and restores, there may be a situation where a device's DTE is not saved but is restored. This is unreasonable and may cause restore to fail. This patch clears the corresponding DTE when MAPD unmaps a device. Cc: stable@vger.kernel.org Fixes: `57a9a11715` ("KVM: arm64: vgic-its: Device table save/restore") Co-developed-by: Shusen Li <lishusen2@huawei.com> Signed-off-by: Shusen Li <lishusen2@huawei.com> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> [Jing: Update with entry write helper] Signed-off-by: Jing Zhang <jingzhangos@google.com> Link: https://lore.kernel.org/r/20241107214137.428439-5-jingzhangos@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-11-11 19:52:31 +00:00
Jing Zhang	7fe28d7e68	KVM: arm64: vgic-its: Add a data length check in vgic_its_save_* None of the vgic_its_save_*() functions check whether the data length is 8 bytes before calling vgic_write_guest_lock. This patch adds the check. To prevent the kernel from being blown up when the fault occurs, KVM_BUG_ON() is used. And the other BUG_ON()s are replaced together. Cc: stable@vger.kernel.org Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> [Jing: Update with the new entry read/write helpers] Signed-off-by: Jing Zhang <jingzhangos@google.com> Link: https://lore.kernel.org/r/20241107214137.428439-4-jingzhangos@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-11-11 19:50:14 +00:00
Oliver Upton	78a0055555	KVM: arm64: Ensure vgic_ready() is ordered against MMIO registration kvm_vgic_map_resources() prematurely marks the distributor as 'ready', potentially allowing vCPUs to enter the guest before the distributor's MMIO registration has been made visible. Plug the race by marking the distributor as ready only after MMIO registration is completed. Rely on the implied ordering of synchronize_srcu() to ensure the MMIO registration is visible before vgic_dist::ready. This also means that writers to vgic_dist::ready are now serialized by the slots_lock, which was effectively the case already as all writers held the slots_lock in addition to the config_lock. Fixes: `59112e9c39` ("KVM: arm64: vgic: Fix a circular locking issue") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241017001947.2707312-3-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-10-17 09:20:48 +01:00
Oliver Upton	5978d4ec7e	KVM: arm64: vgic: Don't check for vgic_ready() when setting NR_IRQS KVM commits to a particular sizing of SPIs when the vgic is initialized, which is before the point a vgic becomes ready. On top of that, KVM supplies a default amount of SPIs should userspace not explicitly configure this. As such, the check for vgic_ready() in the handling of KVM_DEV_ARM_VGIC_GRP_NR_IRQS is completely wrong, and testing if nr_spis is nonzero is sufficient for preventing userspace from playing games with us. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241017001947.2707312-2-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-10-17 09:20:48 +01:00
Marc Zyngier	df5fd75ee3	KVM: arm64: Don't eagerly teardown the vgic on init error As there is very little ordering in the KVM API, userspace can instantiate a half-baked GIC (missing its memory map, for example) at almost any time. This means that, with the right timing, a thread running vcpu-0 can enter the kernel without a GIC configured and get a GIC created behind its back by another thread. Amusingly, it will pick up that GIC and start messing with the data structures without the GIC having been fully initialised. Similarly, a thread running vcpu-1 can enter the kernel, and try to init the GIC that was previously created. Since this GIC isn't properly configured (no memory map), it fails to correctly initialise. And that's the point where we decide to teardown the GIC, freeing all its resources. Behind vcpu-0's back. Things stop pretty abruptly, with a variety of symptoms. Clearly, this isn't good, we should be a bit more careful about this. It is obvious that this guest is not viable, as it is missing some important part of its configuration. So instead of trying to tear bits of it down, let's just mark it as dead. It means that any further interaction from userspace will result in -EIO. The memory will be released on the "normal" path, when userspace gives up. Cc: stable@vger.kernel.org Reported-by: Alexander Potapenko <glider@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241009183603.3221824-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-10-11 13:40:25 +01:00
Oliver Upton	ae8f8b3761	KVM: arm64: Unregister redistributor for failed vCPU creation Alex reports that syzkaller has managed to trigger a use-after-free when tearing down a VM: BUG: KASAN: slab-use-after-free in kvm_put_kvm+0x300/0xe68 virt/kvm/kvm_main.c:5769 Read of size 8 at addr ffffff801c6890d0 by task syz.3.2219/10758 CPU: 3 UID: 0 PID: 10758 Comm: syz.3.2219 Not tainted 6.11.0-rc6-dirty #64 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x17c/0x1a8 arch/arm64/kernel/stacktrace.c:317 show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:324 __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x94/0xc0 lib/dump_stack.c:119 print_report+0x144/0x7a4 mm/kasan/report.c:377 kasan_report+0xcc/0x128 mm/kasan/report.c:601 __asan_report_load8_noabort+0x20/0x2c mm/kasan/report_generic.c:381 kvm_put_kvm+0x300/0xe68 virt/kvm/kvm_main.c:5769 kvm_vm_release+0x4c/0x60 virt/kvm/kvm_main.c:1409 __fput+0x198/0x71c fs/file_table.c:422 ____fput+0x20/0x30 fs/file_table.c:450 task_work_run+0x1cc/0x23c kernel/task_work.c:228 do_notify_resume+0x144/0x1a0 include/linux/resume_user_mode.h:50 el0_svc+0x64/0x68 arch/arm64/kernel/entry-common.c:169 el0t_64_sync_handler+0x90/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598 Upon closer inspection, it appears that we do not properly tear down the MMIO registration for a vCPU that fails creation late in the game, e.g. a vCPU w/ the same ID already exists in the VM. It is important to consider the context of commit that introduced this bug by moving the unregistration out of __kvm_vgic_vcpu_destroy(). That change correctly sought to avoid an srcu v. config_lock inversion by breaking up the vCPU teardown into two parts, one guarded by the config_lock. Fix the use-after-free while avoiding lock inversion by adding a special-cased unregistration to __kvm_vgic_vcpu_destroy(). This is safe because failed vCPUs are torn down outside of the config_lock. Cc: stable@vger.kernel.org Fixes: `f616506754` ("KVM: arm64: vgic: Don't hold config_lock while unregistering redistributors") Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241007223909.2157336-1-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-10-08 10:40:27 +01:00
Marc Zyngier	5cb57a1aff	KVM: arm64: Zero ID_AA64PFR0_EL1.GIC when no GICv3 is presented to the guest In order to be consistent, we shouldn't advertise a GICv3 when none is actually usable by the guest. Wipe the feature when these conditions apply, and allow the field to be written from userspace. This now allows us to rewrite the kvm_has_gicv3() helper in terms of kvm_has_feat(), given that it is always evaluated at runtime. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827152517.3909653-6-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-27 18:32:55 +01:00
Marc Zyngier	8d917e0a86	KVM: arm64: Force GICv3 trap activation when no irqchip is configured on VHE On a VHE system, no GICv3 traps get configured when no irqchip is present. This is not quite matching the "no GICv3" semantics that we want to present. Force such traps to be configured in this case. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827152517.3909653-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-27 18:32:55 +01:00
Marc Zyngier	5739a961b5	KVM: arm64: Force SRE traps when SRE access is not enabled We so far only write the ICH_HCR_EL2 config in two situations: - when we need to emulate the GICv3 CPU interface due to HW bugs - when we do direct injection, as the virtual CPU interface needs to be enabled This is all good. But it also means that we don't do anything special when we emulate a GICv2, or that there is no GIC at all. What happens in this case when the guest uses the GICv3 system registers? The guest gets a trap for a sysreg access (EC=0x18) while we'd really like it to get an UNDEF. Fixing this is a bit involved: - we need to set all the required trap bits (TC, TALL0, TALL1, TDIR) - for these traps to take effect, we need to (counter-intuitively) set ICC_SRE_EL1.SRE to 1 so that the above traps take priority. Note that doesn't fully work when GICv2 emulation is enabled, as we cannot set ICC_SRE_EL1.SRE to 1 (it breaks Group0 delivery as IRQ). Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827152517.3909653-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-27 18:32:55 +01:00
Marc Zyngier	d2137ba8d8	KVM: arm64: Move GICv3 trap configuration to kvm_calculate_traps() Follow the pattern introduced with vcpu_set_hcr(), and introduce vcpu_set_ich_hcr(), which configures the GICv3 traps at the same point. This will allow future changes to introduce trap configuration on a per-VM basis. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827152517.3909653-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-27 18:32:55 +01:00
Marc Zyngier	3e6245ebe7	KVM: arm64: Make ICC_SGI_EL1 undef in the absence of a vGICv3 On a system with a GICv3, if a guest hasn't been configured with GICv3 and that the host is not capable of GICv2 emulation, a write to any of the ICC_SGI_EL1 registers is trapped to EL2. We therefore try to emulate the SGI access, only to hit a NULL pointer as no private interrupt is allocated (no GIC, remember?). The obvious fix is to give the guest what it deserves, in the shape of a UNDEF exception. Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240820100349.3544850-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-08-22 08:08:37 +00:00
Marc Zyngier	f616506754	KVM: arm64: vgic: Don't hold config_lock while unregistering redistributors We recently moved the teardown of the vgic part of a vcpu inside a critical section guarded by the config_lock. This teardown phase involves calling into kvm_io_bus_unregister_dev(), which takes the kvm->srcu lock. However, this violates the established order where kvm->srcu is taken on a memory fault (such as an MMIO access), possibly followed by taking the config_lock if the GIC emulation requires mutual exclusion from the other vcpus. It therefore results in a bad lockdep splat, as reported by Zenghui. Fix this by moving the call to kvm_io_bus_unregister_dev() outside of the config_lock critical section. At this stage, there shouldn't be any need to hold the config_lock. As an additional bonus, document the ordering between kvm->slots_lock, kvm->srcu and kvm->arch.config_lock so that I cannot pretend I didn't know about those anymore. Fixes: `9eb18136af` ("KVM: arm64: vgic: Hold config_lock while tearing down a CPU interface") Reported-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Tested-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20240819125045.3474845-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-08-19 17:05:21 +00:00
Zenghui Yu	2240a50e62	KVM: arm64: vgic-debug: Don't put unmarked LPIs If there were LPIs being mapped behind our back (i.e., between .start() and .stop()), we would put them at iter_unmark_lpis() without checking if they were actually marked, which is obviously not good. Switch to use the xa_for_each_marked() iterator to fix it. Cc: stable@vger.kernel.org Fixes: `85d3ccc8b7` ("KVM: arm64: vgic-debug: Use an xarray mark for debug iterator") Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240817101541.1664-1-yuzenghui@huawei.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-08-19 17:04:36 +00:00
Marc Zyngier	9eb18136af	KVM: arm64: vgic: Hold config_lock while tearing down a CPU interface Tearing down a vcpu CPU interface involves freeing the private interrupt array. If we don't hold the lock, we may race against another thread trying to configure it. Yeah, fuzzers do wonderful things... Taking the lock early solves this particular problem. Fixes: `03b3d00a70` ("KVM: arm64: vgic: Allocate private interrupts on demand") Reported-by: Alexander Potapenko <glider@google.com> Tested-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240808091546.3262111-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-08-08 16:58:22 +00:00
Zenghui Yu	01ab08cafe	KVM: arm64: vgic-debug: Exit the iterator properly w/o LPI In case the guest doesn't have any LPI, we previously relied on the iterator setting 'intid = nr_spis + VGIC_NR_PRIVATE_IRQS' && 'lpi_idx = 1' to exit the iterator. But it was broken with commit `85d3ccc8b7` ("KVM: arm64: vgic-debug: Use an xarray mark for debug iterator") -- the intid remains at 'nr_spis + VGIC_NR_PRIVATE_IRQS - 1', and we end up endlessly printing the last SPI's state. Consider that it's meaningless to search the LPI xarray and populate lpi_idx when there is no LPI, let's just skip the process for that case. The result is that * If there's no LPI, we focus on the intid and exit the iterator when it runs out of the valid SPI range. * Otherwise we keep the current logic and let the xarray drive the iterator. Fixes: `85d3ccc8b7` ("KVM: arm64: vgic-debug: Use an xarray mark for debug iterator") Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240807052024.2084-1-yuzenghui@huawei.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-08-07 19:10:22 +00:00
Sebastian Ott	19d837bc88	KVM: arm64: vgic: fix unexpected unlock sparse warnings Get rid of unexpected unlock sparse warnings in vgic code by adding an annotation to vgic_queue_irq_unlock(). arch/arm64/kvm/vgic/vgic.c:334:17: warning: context imbalance in 'vgic_queue_irq_unlock' - unexpected unlock arch/arm64/kvm/vgic/vgic.c:419:5: warning: context imbalance in 'kvm_vgic_inject_irq' - different lock contexts for basic block Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240723101204.7356-4-sebott@redhat.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-08-02 18:58:03 +00:00
Sebastian Ott	0aa34b37a7	KVM: arm64: fix kdoc warnings in W=1 builds Fix kdoc warnings by adding missing function parameter descriptions or by conversion to a normal comment. Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240723101204.7356-3-sebott@redhat.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-08-02 18:58:03 +00:00
Marc Zyngier	0d92e4a7ff	KVM: arm64: Disassociate vcpus from redistributor region on teardown When tearing down a redistributor region, make sure we don't have any dangling pointer to that region stored in a vcpu. Fixes: `e5a3563546` ("kvm: arm64: vgic-v3: Introduce vgic_v3_free_redist_region()") Reported-by: Alexander Potapenko <glider@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240605175637.1635653-1-maz@kernel.org Cc: stable@vger.kernel.org	2024-06-06 08:54:15 +01:00
Paolo Bonzini	e5f62e27b1	KVM/arm64 updates for Linux 6.10 - Move a lot of state that was previously stored on a per vcpu basis into a per-CPU area, because it is only pertinent to the host while the vcpu is loaded. This results in better state tracking, and a smaller vcpu structure. - Add full handling of the ERET/ERETAA/ERETAB instructions in nested virtualisation. The last two instructions also require emulating part of the pointer authentication extension. As a result, the trap handling of pointer authentication has been greatly simplified. - Turn the global (and not very scalable) LPI translation cache into a per-ITS, scalable cache, making non directly injected LPIs much cheaper to make visible to the vcpu. - A batch of pKVM patches, mostly fixes and cleanups, as the upstreaming process seems to be resuming. Fingers crossed! - Allocate PPIs and SGIs outside of the vcpu structure, allowing for smaller EL2 mapping and some flexibility in implementing more or less than 32 private IRQs. - Purge stale mpidr_data if a vcpu is created after the MPIDR map has been created. - Preserve vcpu-specific ID registers across a vcpu reset. - Various minor cleanups and improvements. Merge tag 'kvmarm-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD	2024-05-12 03:15:53 -04:00
Marc Zyngier	e28157060c	Merge branch kvm-arm64/misc-6.10 into kvmarm-master/next * kvm-arm64/misc-6.10: : . : Misc fixes and updates targeting 6.10 : : - Improve boot-time diagnostics when the sysreg tables : are not correctly sorted : : - Allow FFA_MSG_SEND_DIRECT_REQ in the FFA proxy : : - Fix duplicate XNX field in the ID_AA64MMFR1_EL1 : writeable mask : : - Allocate PPIs and SGIs outside of the vcpu structure, allowing : for smaller EL2 mapping and some flexibility in implementing : more or less than 32 private IRQs. : : - Use bitmap_gather() instead of its open-coded equivalent : : - Make protected mode use hVHE if available : : - Purge stale mpidr_data if a vcpu is created after the MPIDR : map has been created : . KVM: arm64: Destroy mpidr_data for 'late' vCPU creation KVM: arm64: Use hVHE in pKVM by default on CPUs with VHE support KVM: arm64: Fix hvhe/nvhe early alias parsing KVM: arm64: Convert kvm_mpidr_index() to bitmap_gather() KVM: arm64: vgic: Allocate private interrupts on demand KVM: arm64: Remove duplicated AA64MMFR1_EL1 XNX KVM: arm64: Remove FFA_MSG_SEND_DIRECT_REQ from the denylist KVM: arm64: Improve out-of-order sysreg table diagnostics Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-05-08 16:41:50 +01:00
Marc Zyngier	8540bd1b99	Merge branch kvm-arm64/pkvm-6.10 into kvmarm-master/next * kvm-arm64/pkvm-6.10: (25 commits) : . : At last, a bunch of pKVM patches, courtesy of Fuad Tabba. : From the cover letter: : : "This series is a bit of a bombay-mix of patches we've been : carrying. There's no one overarching theme, but they do improve : the code by fixing existing bugs in pKVM, refactoring code to : make it more readable and easier to re-use for pKVM, or adding : functionality to the existing pKVM code upstream." : . KVM: arm64: Force injection of a data abort on NISV MMIO exit KVM: arm64: Restrict supported capabilities for protected VMs KVM: arm64: Refactor setting the return value in kvm_vm_ioctl_enable_cap() KVM: arm64: Document the KVM/arm64-specific calls in hypercalls.rst KVM: arm64: Rename firmware pseudo-register documentation file KVM: arm64: Reformat/beautify PTP hypercall documentation KVM: arm64: Clarify rationale for ZCR_EL1 value restored on guest exit KVM: arm64: Introduce and use predicates that check for protected VMs KVM: arm64: Add is_pkvm_initialized() helper KVM: arm64: Simplify vgic-v3 hypercalls KVM: arm64: Move setting the page as dirty out of the critical section KVM: arm64: Change kvm_handle_mmio_return() return polarity KVM: arm64: Fix comment for __pkvm_vcpu_init_traps() KVM: arm64: Prevent kmemleak from accessing .hyp.data KVM: arm64: Do not map the host fpsimd state to hyp in pKVM KVM: arm64: Rename __tlb_switch_to_{guest,host}() in VHE KVM: arm64: Support TLB invalidation in guest context KVM: arm64: Avoid BBM when changing only s/w bits in Stage-2 PTE KVM: arm64: Check for PTE validity when checking for executable/cacheable KVM: arm64: Avoid BUG-ing from the host abort path ... Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-05-03 11:39:52 +01:00
Marc Zyngier	03b3d00a70	KVM: arm64: vgic: Allocate private interrupts on demand Private interrupts are currently part of the CPU interface structure that is part of each and every vcpu we create. Currently, we have 32 of them per vcpu, resulting in a per-vcpu array that is just shy of 4kB. On its own, that's no big deal, but it gets in the way of other things: - each vcpu gets mapped at EL2 on nVHE/hVHE configurations. This requires memory that is physically contiguous. However, the EL2 code has no purpose looking at the interrupt structures and could do without them being mapped. - supporting features such as EPPIs, which extend the number of private interrupts past the 32 limit would make the array even larger, even for VMs that do not use the EPPI feature. Address these issues by moving the private interrupt array outside of the vcpu, and replace it with a simple pointer. We take this opportunity to make it obvious what gets initialised when, as that path was remarkably opaque, and tighten the locking. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240502154545.3012089-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-05-03 11:33:50 +01:00
Marc Zyngier	948e1a53c2	KVM: arm64: Simplify vgic-v3 hypercalls Consolidate the GICv3 VMCR accessor hypercalls into the APR save/restore hypercalls so that all of the EL2 GICv3 state is covered by a single pair of hypercalls. Signed-off-by: Fuad Tabba <tabba@google.com> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240423150538.2103045-17-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-05-01 16:48:14 +01:00
Oliver Upton	481c9ee846	KVM: arm64: vgic-its: Get rid of the lpi_list_lock The last genuine use case for the lpi_list_lock was the global LPI translation cache, which has been removed in favor of a per-ITS xarray. Remove a layer from the locking puzzle by getting rid of it. vgic_add_lpi() still has a critical section that needs to protect against the insertion of other LPIs; change it to take the LPI xarray's xa_lock to retain this property. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240422200158.2606761-13-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-04-25 13:19:56 +01:00
Oliver Upton	ec39bbfd55	KVM: arm64: vgic-its: Rip out the global translation cache The MSI injection fast path has been transitioned away from the global translation cache. Rip it out. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240422200158.2606761-12-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-04-25 13:19:56 +01:00
Oliver Upton	e64f2918c6	KVM: arm64: vgic-its: Use the per-ITS translation cache for injection Everything is in place to switch to per-ITS translation caches. Start using the per-ITS cache to avoid the lock serialization related to the global translation cache. Explicitly check for out-of-range device and event IDs as the cache index is packed based on the range the ITS actually supports. Take the RCU read lock to protect against the returned descriptor being freed while trying to take a reference on it, as it is no longer necessary to acquire the lpi_list_lock. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240422200158.2606761-11-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-04-25 13:19:55 +01:00
Oliver Upton	dedfcd17fa	KVM: arm64: vgic-its: Spin off helper for finding ITS by doorbell addr The fast path will soon need to find an ITS by doorbell address, as the translation caches will become local to an ITS. Spin off a helper to do just that. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240422200158.2606761-10-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-04-25 13:19:55 +01:00
Oliver Upton	8201d1028c	KVM: arm64: vgic-its: Maintain a translation cache per ITS Within the context of a single ITS, it is possible to use an xarray to cache the device ID & event ID translation to a particular irq descriptor. Take advantage of this to build a translation cache capable of fitting all valid translations for a given ITS. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240422200158.2606761-9-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-04-25 13:19:55 +01:00
Oliver Upton	c09c8ab99a	KVM: arm64: vgic-its: Scope translation cache invalidations to an ITS As the current LPI translation cache is global, the corresponding invalidation helpers are also globally-scoped. In anticipation of constructing a translation cache per ITS, add a helper for scoped cache invalidations. We still need to support global invalidations when LPIs are toggled on a redistributor, as a property of the translation cache is that all stored LPIs are known to be deliverable. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240422200158.2606761-8-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-04-25 13:19:55 +01:00
Oliver Upton	30a0ce9c49	KVM: arm64: vgic-its: Get rid of vgic_copy_lpi_list() The last user has been transitioned to walking the LPI xarray directly. Cut the wart off, and get rid of the now unneeded lpi_count while doing so. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240422200158.2606761-7-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-04-25 13:19:55 +01:00
Oliver Upton	85d3ccc8b7	KVM: arm64: vgic-debug: Use an xarray mark for debug iterator The vgic debug iterator is the final user of vgic_copy_lpi_list(), but is a bit more complicated to transition to something else. Use a mark in the LPI xarray to record the indices 'known' to the debug iterator. Protect against the LPIs from being freed by associating an additional reference with the xarray mark. Rework iter_next() to let the xarray walk 'drive' the iteration after visiting all of the SGIs, PPIs, and SPIs. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240422200158.2606761-6-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-04-25 13:19:55 +01:00
Oliver Upton	11f4f8f3e6	KVM: arm64: vgic-its: Walk LPI xarray in vgic_its_cmd_handle_movall() The new LPI xarray makes it possible to walk the VM's LPIs without holding a lock, meaning that vgic_copy_lpi_list() is no longer necessary. Prepare for the deletion by walking the LPI xarray directly in vgic_its_cmd_handle_movall(). Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240422200158.2606761-5-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-04-25 13:19:55 +01:00
Oliver Upton	c64115c80f	KVM: arm64: vgic-its: Walk LPI xarray in vgic_its_invall() The new LPI xarray makes it possible to walk the VM's LPIs without holding a lock, meaning that vgic_copy_lpi_list() is no longer necessary. Prepare for the deletion by walking the LPI xarray directly in vgic_its_invall(). Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240422200158.2606761-4-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-04-25 13:19:55 +01:00
Oliver Upton	720f73b750	KVM: arm64: vgic-its: Walk LPI xarray in its_sync_lpi_pending_table() The new LPI xarray makes it possible to walk the VM's LPIs without holding a lock, meaning that vgic_copy_lpi_list() is no longer necessary. Prepare for the deletion by walking the LPI xarray directly in its_sync_lpi_pending_table(). Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240422200158.2606761-3-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-04-25 13:19:55 +01:00
Oliver Upton	6ddb4f372f	KVM: arm64: vgic-v2: Check for non-NULL vCPU in vgic_v2_parse_attr() vgic_v2_parse_attr() is responsible for finding the vCPU that matches the user-provided CPUID, which (of course) may not be valid. If the ID is invalid, kvm_get_vcpu_by_id() returns NULL, which isn't handled gracefully. Similar to the GICv3 uaccess flow, check that kvm_get_vcpu_by_id() actually returns something and fail the ioctl if not. Cc: stable@vger.kernel.org Fixes: `7d450e2821` ("KVM: arm/arm64: vgic-new: Add userland access to VGIC dist registers") Reported-by: Alexander Potapenko <glider@google.com> Tested-by: Alexander Potapenko <glider@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240424173959.3776798-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-04-24 19:09:35 +00:00
Paolo Bonzini	961e2bfcf3	Merge tag 'kvmarm-6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.9 - Infrastructure for building KVM's trap configuration based on the architectural features (or lack thereof) advertised in the VM's ID registers - Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to x86's WC) at stage-2, improving the performance of interacting with assigned devices that can tolerate it - Conversion of KVM's representation of LPIs to an xarray, utilized to address some of the serialization on the LPI injection path - Support for _architectural_ VHE-only systems, advertised through the absence of FEAT_E2H0 in the CPU's ID register - Miscellaneous cleanups, fixes, and spelling corrections to KVM and selftests	2024-03-11 10:02:32 -04:00
Oliver Upton	4a09ddb833	Merge branch kvm-arm64/kerneldoc into kvmarm/next * kvm-arm64/kerneldoc: : kerneldoc warning fixes, courtesy of Randy Dunlap : : Fixes addressing the widespread misuse of kerneldoc-style comments : throughout KVM/arm64. KVM: arm64: vgic: fix a kernel-doc warning KVM: arm64: vgic-its: fix kernel-doc warnings KVM: arm64: vgic-init: fix a kernel-doc warning KVM: arm64: sys_regs: fix kernel-doc warnings KVM: arm64: PMU: fix kernel-doc warnings KVM: arm64: mmu: fix a kernel-doc warning KVM: arm64: vhe: fix a kernel-doc warning KVM: arm64: hyp/aarch32: fix kernel-doc warnings KVM: arm64: guest: fix kernel-doc warnings KVM: arm64: debug: fix kernel-doc warnings Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-03-07 00:56:16 +00:00
Oliver Upton	8dbc41105e	Merge branch kvm-arm64/lpi-xarray into kvmarm/next * kvm-arm64/lpi-xarray: : xarray-based representation of vgic LPIs : : KVM's linked-list of LPI state has proven to be a bottleneck in LPI : injection paths, due to lock serialization when acquiring / releasing a : reference on an IRQ. : : Start the tedious process of reworking KVM's LPI injection by replacing : the LPI linked-list with an xarray, leveraging this to allow RCU readers : to walk it outside of the spinlock. KVM: arm64: vgic: Don't acquire the lpi_list_lock in vgic_put_irq() KVM: arm64: vgic: Ensure the irq refcount is nonzero when taking a ref KVM: arm64: vgic: Rely on RCU protection in vgic_get_lpi() KVM: arm64: vgic: Free LPI vgic_irq structs in an RCU-safe manner KVM: arm64: vgic: Use atomics to count LPIs KVM: arm64: vgic: Get rid of the LPI linked-list KVM: arm64: vgic-its: Walk the LPI xarray in vgic_copy_lpi_list() KVM: arm64: vgic-v3: Iterate the xarray to find pending LPIs KVM: arm64: vgic: Use xarray to find LPI in vgic_get_lpi() KVM: arm64: vgic: Store LPIs in an xarray Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-03-07 00:55:53 +00:00
Bjorn Helgaas	75841d89f3	KVM: arm64: Fix typos Fix typos, most reported by "codespell arch/arm64". Only touches comments, no code changes. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Zenghui Yu <yuzenghui@huawei.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.linux.dev Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20240103231605.1801364-6-helgaas@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-24 09:13:33 +00:00
Oliver Upton	e27f2d561f	KVM: arm64: vgic: Don't acquire the lpi_list_lock in vgic_put_irq() The LPI xarray's xa_lock is sufficient for synchronizing writers when freeing a given LPI. Furthermore, readers can only take a new reference on an IRQ if it was already nonzero. Stop taking the lpi_list_lock unnecessarily and get rid of __vgic_put_lpi_locked(). Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-11-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-23 21:46:02 +00:00
Oliver Upton	50ac89bb70	KVM: arm64: vgic: Ensure the irq refcount is nonzero when taking a ref It will soon be possible for get() and put() calls to happen in parallel, which means in most cases we must ensure the refcount is nonzero when taking a new reference. Switch to using vgic_try_get_irq_kref() where necessary, and document the few conditions where an IRQ's refcount is guaranteed to be nonzero. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-10-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-23 21:46:02 +00:00
Oliver Upton	864d4304ec	KVM: arm64: vgic: Rely on RCU protection in vgic_get_lpi() Stop acquiring the lpi_list_lock in favor of RCU for protecting the read-side critical section in vgic_get_lpi(). In order for this to be safe, we also need to be careful not to take a reference on an irq with a refcount of 0, as it is about to be freed. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-9-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-23 21:46:02 +00:00
Oliver Upton	a5c7f011cb	KVM: arm64: vgic: Free LPI vgic_irq structs in an RCU-safe manner Free the vgic_irq structs in an RCU-safe manner to allow reads of the LPI configuration data to happen in parallel with the release of LPIs. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-8-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-23 21:46:02 +00:00
Oliver Upton	05f4d4f5d4	KVM: arm64: vgic: Use atomics to count LPIs Switch to using atomics for LPI accounting, allowing vgic_irq references to be dropped in parallel. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-7-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-23 21:46:02 +00:00
Oliver Upton	9880835af7	KVM: arm64: vgic: Get rid of the LPI linked-list All readers of LPI configuration have been transitioned to use the LPI xarray. Get rid of the linked-list altogether. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-6-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-23 21:46:02 +00:00
Oliver Upton	2798683b8c	KVM: arm64: vgic-its: Walk the LPI xarray in vgic_copy_lpi_list() Start iterating the LPI xarray in anticipation of removing the LPI linked-list. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-23 21:46:02 +00:00
Oliver Upton	49f0a468a1	KVM: arm64: vgic-v3: Iterate the xarray to find pending LPIs Start walking the LPI xarray to find pending LPIs in preparation for the removal of the LPI linked-list. Note that the 'basic' iterator is chosen here as each iteration needs to drop the xarray read lock (RCU) as reads/writes to guest memory can potentially block. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-23 21:46:02 +00:00
Oliver Upton	5a021df719	KVM: arm64: vgic: Use xarray to find LPI in vgic_get_lpi() Iterating over the LPI linked-list is less than ideal when the desired index is already known. Use the INTID to index the LPI xarray instead. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-23 21:46:02 +00:00
Oliver Upton	1d6f83f60f	KVM: arm64: vgic: Store LPIs in an xarray Using a linked-list for LPIs is less than ideal as it of course requires iterative searches to find a particular entry. An xarray is a better data structure for this use case, as it provides faster searches and can still handle a potentially sparse range of INTID allocations. Start by storing LPIs in an xarray, punting usage of the xarray to a subsequent change. The observant among you will notice that we added yet another lock to the chain of locking order rules; document the ordering of the xa_lock. Don't worry, we'll get rid of the lpi_list_lock one day... Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-23 21:46:01 +00:00
Oliver Upton	85a71ee9a0	KVM: arm64: vgic-its: Test for valid IRQ in MOVALL handler It is possible that an LPI mapped in a different ITS gets unmapped while handling the MOVALL command. If that is the case, there is no state that can be migrated to the destination. Silently ignore it and continue migrating other LPIs. Cc: stable@vger.kernel.org Fixes: `ff9c114394` ("KVM: arm/arm64: GICv4: Handle MOVALL applied to a vPE") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240221092732.4126848-3-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-02-21 10:06:41 +00:00
Oliver Upton	8d3a7dfb80	KVM: arm64: vgic-its: Test for valid IRQ in its_sync_lpi_pending_table() vgic_get_irq() may not return a valid descriptor if there is no ITS that holds a valid translation for the specified INTID. If that is the case, it is safe to silently ignore it and continue processing the LPI pending table. Cc: stable@vger.kernel.org Fixes: `33d3bc9556` ("KVM: arm64: vgic-its: Read initial LPI pending table") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240221092732.4126848-2-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-02-21 10:06:41 +00:00
Randy Dunlap	e634ff9598	KVM: arm64: vgic: fix a kernel-doc warning Use the correct function name in a kernel-doc comment to prevent a warning: arch/arm64/kvm/vgic/vgic.c:217: warning: expecting prototype for kvm_vgic_target_oracle(). Prototype was for vgic_target_oracle() instead Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Zenghui Yu <yuzenghui@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.linux.dev Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20240117230714.31025-11-rdunlap@infradead.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-01 20:25:42 +00:00
Randy Dunlap	f779d2c017	KVM: arm64: vgic-its: fix kernel-doc warnings Correct the function parameter name "@save tables" -> "@save_tables". Use the "typedef" keyword in the kernel-doc comment for a typedef. These changes prevent kernel-doc warnings: vgic/vgic-its.c:174: warning: Function parameter or struct member 'save_tables' not described in 'vgic_its_abi' arch/arm64/kvm/vgic/vgic-its.c:2152: warning: expecting prototype for entry_fn_t(). Prototype was for int() instead Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Zenghui Yu <yuzenghui@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.linux.dev Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20240117230714.31025-10-rdunlap@infradead.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-01 20:25:42 +00:00
Randy Dunlap	dd609a574a	KVM: arm64: vgic-init: fix a kernel-doc warning Change the function comment block to kernel-doc format to prevent a kernel-doc warning: arch/arm64/kvm/vgic/vgic-init.c:448: warning: This comment starts with '/*', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Map the MMIO regions depending on the VGIC model exposed to the guest Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Zenghui Yu <yuzenghui@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.linux.dev Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20240117230714.31025-9-rdunlap@infradead.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-02-01 20:25:42 +00:00
Paolo Bonzini	5f53d88f10	Merge tag 'kvmarm-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 6.8 - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base granule sizes. Branch shared with the arm64 tree. - Large Fine-Grained Trap rework, bringing some sanity to the feature, although there is more to come. This comes with a prefix branch shared with the arm64 tree. - Some additional Nested Virtualization groundwork, mostly introducing the NV2 VNCR support and retargeting the NV support to that version of the architecture. - A small set of vgic fixes and associated cleanups.	2024-01-08 08:09:53 -05:00
Oliver Upton	ad362fe07f	KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache There is a potential UAF scenario in the case of an LPI translation cache hit racing with an operation that invalidates the cache, such as a DISCARD ITS command. The root of the problem is that vgic_its_check_cache() does not elevate the refcount on the vgic_irq before dropping the lock that serializes refcount changes. Have vgic_its_check_cache() raise the refcount on the returned vgic_irq and add the corresponding decrement after queueing the interrupt. Cc: stable@vger.kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240104183233.3560639-1-oliver.upton@linux.dev	2024-01-04 19:26:34 +00:00
Paolo Bonzini	5c2b2176ea	Merge tag 'kvmarm-fixes-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/arm64 fixes for 6.7, part #2 - Ensure a vCPU's redistributor is unregistered from the MMIO bus if vCPU creation fails - Fix building KVM selftests for arm64 from the top-level Makefile	2023-12-22 18:03:54 -05:00
Oliver Upton	39084ba8d0	KVM: arm64: vgic-v3: Reinterpret user ISPENDR writes as I{C,S}PENDR User writes to ISPENDR for GICv3 are treated specially, as zeroes actually clear the pending state for interrupts (unlike HW). Reimplement it using the ISPENDR and ICPENDR user accessors. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231219065855.1019608-4-oliver.upton@linux.dev	2023-12-22 09:34:27 +00:00
Oliver Upton	561851424d	KVM: arm64: vgic: Use common accessor for writes to ICPENDR Fold MMIO and user accessors into a common helper while maintaining the distinction between the two. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231219065855.1019608-3-oliver.upton@linux.dev	2023-12-22 09:34:17 +00:00
Oliver Upton	13886f3444	KVM: arm64: vgic: Use common accessor for writes to ISPENDR Perhaps unsurprisingly, there is a considerable amount of duplicate code between the MMIO and user accessors for ISPENDR. At the same time there are some important differences between user and guest MMIO, like how SGIs can only be made pending from userspace. Fold user and MMIO accessors into a common helper, maintaining the distinction between the two. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231219065855.1019608-2-oliver.upton@linux.dev	2023-12-22 09:33:54 +00:00
Marc Zyngier	7b95382f96	KVM: arm64: vgic-v4: Restore pending state on host userspace write When the VMM writes to ISPENDR0 to set the pending state of an SGI, we fail to convey this to the HW if this SGI is already backed by a GICv4.1 vSGI. This is a bit of a corner case, as this would only occur if the vgic state is changed on an already running VM, but this can apparently happen across a guest reset driven by the VMM. Fix this by always writing out the pending_latch value to the HW, and resetting it to false. Reported-by: Kunkun Jiang <jiangkunkun@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Cc: stable@vger.kernel.org # 5.10+ Link: https://lore.kernel.org/r/7e7f2c0c-448b-10a9-8929-4b8f4f6e2a32@huawei.com	2023-12-22 09:27:36 +00:00
Marc Zyngier	6bef365e31	KVM: arm64: vgic: Ensure that slots_lock is held in vgic_register_all_redist_iodevs() Although we implicitly depend on slots_lock being held when registering IO devices with the IO bus infrastructure, we don't enforce this requirement. Make it explicit. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231207151201.3028710-5-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-12-12 07:11:38 +00:00
Marc Zyngier	02e3858f08	KVM: arm64: vgic: Force vcpu vgic teardown on vcpu destroy When failing to create a vcpu because (for example) it has a duplicate vcpu_id, we destroy the vcpu. Amusingly, this leaves the redistributor registered with the KVM_MMIO bus. This is no good, and we should properly clean the mess. Force a teardown of the vgic vcpu interface, including the RD device before returning to the caller. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231207151201.3028710-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-12-12 07:11:38 +00:00
Marc Zyngier	d26b9cb33c	KVM: arm64: vgic: Add a non-locking primitive for kvm_vgic_vcpu_destroy() As we are going to need to call into kvm_vgic_vcpu_destroy() without prior holding of the slots_lock, introduce __kvm_vgic_vcpu_destroy() as a non-locking primitive of kvm_vgic_vcpu_destroy(). Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231207151201.3028710-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-12-12 07:11:38 +00:00
Marc Zyngier	01ad29d224	KVM: arm64: vgic: Simplify kvm_vgic_destroy() When destroying a vgic, we have rather cumbersome rules about when slots_lock and config_lock are held, resulting in fun buglets. The first port of call is to simplify kvm_vgic_map_resources() so that there is only one call to kvm_vgic_destroy() instead of two, with the second only holding half of the locks. For that, we kill the non-locking primitive and move the call outside of the locking altogether. This doesn't change anything (we re-acquire the locks and teardown the whole vgic), and simplifies the code significantly. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231207151201.3028710-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-12-12 07:11:38 +00:00
Kunkun Jiang	8e4ece6889	KVM: arm64: GICv4: Do not perform a map to a mapped vLPI Before performing a map, let's check whether the vLPI has been mapped. Fixes: `196b136498` ("KVM: arm/arm64: GICv4: Wire mapping/unmapping of VLPIs in VFIO irq bypass") Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> Acked-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20231120131210.2039-1-jiangkunkun@huawei.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-11-20 19:13:32 +00:00
Linus Torvalds	6803bd7956	ARM: * Generalized infrastructure for 'writable' ID registers, effectively allowing userspace to opt-out of certain vCPU features for its guest * Optimization for vSGI injection, opportunistically compressing MPIDR to vCPU mapping into a table * Improvements to KVM's PMU emulation, allowing userspace to select the number of PMCs available to a VM * Guest support for memory operation instructions (FEAT_MOPS) * Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing bugs and getting rid of useless code * Changes to the way the SMCCC filter is constructed, avoiding wasted memory allocations when not in use * Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing the overhead of errata mitigations * Miscellaneous kernel and selftest fixes LoongArch: * New architecture. The hardware uses the same model as x86, s390 and RISC-V, where guest/host mode is orthogonal to supervisor/user mode. The virtualization extensions are very similar to MIPS, therefore the code also has some similarities but it's been cleaned up to avoid some of the historical bogosities that are found in arch/mips. The kernel emulates MMU, timer and CSR accesses, while interrupt controllers are only emulated in userspace, at least for now. RISC-V: * Support for the Smstateen and Zicond extensions * Support for virtualizing senvcfg * Support for virtualized SBI debug console (DBCN) S390: * Nested page table management can be monitored through tracepoints and statistics x86: * Fix incorrect handling of VMX posted interrupt descriptor in KVM_SET_LAPIC, which could result in a dropped timer IRQ * Avoid WARN on systems with Intel IPI virtualization * Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs without forcing more common use cases to eat the extra memory overhead. * Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and SBPB, aka Selective Branch Predictor Barrier). 
* Fix a bug where restoring a vCPU snapshot that was taken within 1 second of creating the original vCPU would cause KVM to try to synchronize the vCPU's TSC and thus clobber the correct TSC being set by userspace. * Compute guest wall clock using a single TSC read to avoid generating an inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads. * "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain about a "Firmware Bug" if the bit isn't set for select F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to appease Windows Server 2022. * Don't apply side effects to Hyper-V's synthetic timer on writes from userspace to fix an issue where the auto-enable behavior can trigger spurious interrupts, i.e. do auto-enabling only for guest writes. * Remove an unnecessary kick of all vCPUs when synchronizing the dirty log without PML enabled. * Advertise "support" for non-serializing FS/GS base MSR writes as appropriate. * Harden the fast page fault path to guard against encountering an invalid root when walking SPTEs. * Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n. * Use the fast path directly from the timer callback when delivering Xen timer events, instead of waiting for the next iteration of the run loop. This was not done so far because previously proposed code had races, but now care is taken to stop the hrtimer at critical points such as restarting the timer or saving the timer information for userspace. * Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future flag. * Optimize injection of PMU interrupts that are simultaneous with NMIs. * Usual handful of fixes for typos and other warts. x86 - MTRR/PAT fixes and optimizations: * Clean up code that deals with honoring guest MTRRs when the VM has non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled. * Zap EPT entries when non-coherent DMA assignment stops/start to prevent using stale entries with the wrong memtype. 
* Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y. This was done as a workaround for virtual machine BIOSes that did not bother to clear CR0.CD (because ancient KVM/QEMU did not bother to set it, in turn), and there's zero reason to extend the quirk to also ignore guest PAT. x86 - SEV fixes: * Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts SHUTDOWN while running an SEV-ES guest. * Clean up the recognition of emulation failures on SEV guests, when KVM would like to "skip" the instruction but it had already been partially emulated. This makes it possible to drop a hack that second guessed the (insufficient) information provided by the emulator, and just do the right thing. Documentation: * Various updates and fixes, mostly for x86 * MTRR and PAT fixes and optimizations: Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "ARM: - Generalized infrastructure for 'writable' ID registers, effectively allowing userspace to opt-out of certain vCPU features for its guest - Optimization for vSGI injection, opportunistically compressing MPIDR to vCPU mapping into a table - Improvements to KVM's PMU emulation, allowing userspace to select the number of PMCs available to a VM - Guest support for memory operation instructions (FEAT_MOPS) - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing bugs and getting rid of useless code - Changes to the way the SMCCC filter is 
constructed, avoiding wasted memory allocations when not in use - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing the overhead of errata mitigations - Miscellaneous kernel and selftest fixes LoongArch: - New architecture for kvm. The hardware uses the same model as x86, s390 and RISC-V, where guest/host mode is orthogonal to supervisor/user mode. The virtualization extensions are very similar to MIPS, therefore the code also has some similarities but it's been cleaned up to avoid some of the historical bogosities that are found in arch/mips. The kernel emulates MMU, timer and CSR accesses, while interrupt controllers are only emulated in userspace, at least for now. RISC-V: - Support for the Smstateen and Zicond extensions - Support for virtualizing senvcfg - Support for virtualized SBI debug console (DBCN) S390: - Nested page table management can be monitored through tracepoints and statistics x86: - Fix incorrect handling of VMX posted interrupt descriptor in KVM_SET_LAPIC, which could result in a dropped timer IRQ - Avoid WARN on systems with Intel IPI virtualization - Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs without forcing more common use cases to eat the extra memory overhead. - Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and SBPB, aka Selective Branch Predictor Barrier). - Fix a bug where restoring a vCPU snapshot that was taken within 1 second of creating the original vCPU would cause KVM to try to synchronize the vCPU's TSC and thus clobber the correct TSC being set by userspace. - Compute guest wall clock using a single TSC read to avoid generating an inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads. - "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain about a "Firmware Bug" if the bit isn't set for select F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to appease Windows Server 2022. 
- Don't apply side effects to Hyper-V's synthetic timer on writes from userspace to fix an issue where the auto-enable behavior can trigger spurious interrupts, i.e. do auto-enabling only for guest writes. - Remove an unnecessary kick of all vCPUs when synchronizing the dirty log without PML enabled. - Advertise "support" for non-serializing FS/GS base MSR writes as appropriate. - Harden the fast page fault path to guard against encountering an invalid root when walking SPTEs. - Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n. - Use the fast path directly from the timer callback when delivering Xen timer events, instead of waiting for the next iteration of the run loop. This was not done so far because previously proposed code had races, but now care is taken to stop the hrtimer at critical points such as restarting the timer or saving the timer information for userspace. - Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future flag. - Optimize injection of PMU interrupts that are simultaneous with NMIs. - Usual handful of fixes for typos and other warts. x86 - MTRR/PAT fixes and optimizations: - Clean up code that deals with honoring guest MTRRs when the VM has non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled. - Zap EPT entries when non-coherent DMA assignment stops/start to prevent using stale entries with the wrong memtype. - Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y This was done as a workaround for virtual machine BIOSes that did not bother to clear CR0.CD (because ancient KVM/QEMU did not bother to set it, in turn), and there's zero reason to extend the quirk to also ignore guest PAT. x86 - SEV fixes: - Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts SHUTDOWN while running an SEV-ES guest. - Clean up the recognition of emulation failures on SEV guests, when KVM would like to "skip" the instruction but it had already been partially emulated. 
This makes it possible to drop a hack that second guessed the (insufficient) information provided by the emulator, and just do the right thing. Documentation: - Various updates and fixes, mostly for x86 - MTRR and PAT fixes and optimizations" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (164 commits) KVM: selftests: Avoid using forced target for generating arm64 headers tools headers arm64: Fix references to top srcdir in Makefile KVM: arm64: Add tracepoint for MMIO accesses where ISV==0 KVM: arm64: selftest: Perform ISB before reading PAR_EL1 KVM: arm64: selftest: Add the missing .guest_prepare() KVM: arm64: Always invalidate TLB for stage-2 permission faults KVM: x86: Service NMI requests after PMI requests in VM-Enter path KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs KVM: arm64: Refine _EL2 system register list that require trap reinjection arm64: Add missing _EL2 encodings arm64: Add missing _EL12 encodings KVM: selftests: aarch64: vPMU test for validating user accesses KVM: selftests: aarch64: vPMU register test for unimplemented counters KVM: selftests: aarch64: vPMU register test for implemented counters KVM: selftests: aarch64: Introduce vpmu_counter_access test tools: Import arm_pmuv3.h KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} ...	2023-11-02 15:45:15 -10:00
Oliver Upton	54b44ad26c	Merge branch kvm-arm64/sgi-injection into kvmarm/next * kvm-arm64/sgi-injection: : vSGI injection improvements + fixes, courtesy Marc Zyngier : : Avoid linearly searching for vSGI targets using a compressed MPIDR to : index a cache. While at it, fix some egregious bugs in KVM's mishandling : of vcpuid (user-controlled value) and vcpu_idx. KVM: arm64: Clarify the ordering requirements for vcpu/RD creation KVM: arm64: vgic-v3: Optimize affinity-based SGI injection KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is available KVM: arm64: Build MPIDR to vcpu index cache at runtime KVM: arm64: Simplify kvm_vcpu_get_mpidr_aff() KVM: arm64: Use vcpu_idx for invalidation tracking KVM: arm64: vgic: Use vcpu_idx for the debug information KVM: arm64: vgic-v2: Use cpuid from userspace as vcpu_id KVM: arm64: vgic-v3: Refactor GICv3 SGI generation KVM: arm64: vgic-its: Treat the collection target address as a vcpu_id KVM: arm64: vgic: Make kvm_vgic_inject_irq() take a vcpu pointer Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-10-30 20:19:13 +00:00
Marc Zyngier	fe49fd940e	KVM: arm64: Move VTCR_EL2 into struct s2_mmu We currently have a global VTCR_EL2 value for each guest, even if the guest uses NV. This implies that the guest's own S2 must fit in the host's. This is odd, for multiple reasons: - the PARange values and the number of IPA bits don't necessarily match: you can have 33 bits of IPA space, and yet you can only describe 32 or 36 bits of PARange - When userspace sets the IPA space, it creates a contract with the kernel saying "this is the IPA space I'm prepared to handle". At no point does it constrain the guest's own IPA space as long as the guest doesn't try to use a [I]PA outside of the IPA space set by userspace - We don't even try to hide the value of ID_AA64MMFR0_EL1.PARange. And then there is the consequence of the above: if a guest tries to create a S2 that has for input address something that is larger than the IPA space defined by the host, we inject a fatal exception. This is no good. For all intents and purposes, a guest should be able to have the S2 it really wants, as long as the output address of that S2 isn't outside of the IPA space. For that, we need to have a per-s2_mmu VTCR_EL2 setting, which allows us to represent the full PARange. Move the vtcr field into the s2_mmu structure, which has no impact whatsoever, except for NV. Note that once we are able to override ID_AA64MMFR0_EL1.PARange from userspace, we'll also be able to restrict the size of the shadow S2 that NV uses. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231012205108.3937270-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-10-23 18:48:46 +00:00
Mark Rutland	d8569fba13	arm64: kvm: Use cpus_have_final_cap() explicitly Much of the arm64 KVM code uses cpus_have_const_cap() to check for cpucaps, but this is unnecessary and it would be preferable to use cpus_have_final_cap(). For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. KVM is initialized after cpucaps have been finalized and alternatives have been patched. Since commit: `d86de40dec` ("arm64: cpufeature: upgrade hyp caps to final") ... use of cpus_have_const_cap() in hyp code is automatically converted to use cpus_have_final_cap(): | static __always_inline bool cpus_have_const_cap(int num) | { | if (is_hyp_code()) | return cpus_have_final_cap(num); | else if (system_capabilities_finalized()) | return __cpus_have_const_cap(num); | else | return cpus_have_cap(num); | } Thus, converting hyp code to use cpus_have_final_cap() directly will not result in any functional change. 
Non-hyp KVM code is also not executed until cpucaps have been finalized, and it would be preferable to extend the same treatment to this code and use cpus_have_final_cap() directly. This patch converts instances of cpus_have_const_cap() in KVM-only code over to cpus_have_final_cap(). As all of this code runs after cpucaps have been finalized, there should be no functional change as a result of this patch, but the redundant instructions generated by cpus_have_const_cap() will be removed from the non-hyp KVM code. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-16 12:57:56 +01:00
Marc Zyngier	b5daffb120	KVM: arm64: vgic-v3: Optimize affinity-based SGI injection Our affinity-based SGI injection code is a bit daft. We iterate over all the CPUs trying to match the set of affinities that the guest is trying to reach, leading to some very bad behaviours if the selected targets are at a high vcpu index. Instead, we can now use the fact that we have an optimised MPIDR to vcpu mapping, and only look at the relevant values. This results in a much faster injection for large VMs, and in a near constant time, irrespective of the position in the vcpu index space. As a bonus, this is mostly deleting a lot of hard-to-read code. Nobody will complain about that. Suggested-by: Xu Zhao <zhaoxu.35@bytedance.com> Tested-by: Joey Gouly <joey.gouly@arm.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230927090911.3355209-11-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-09-30 18:15:44 +00:00
Marc Zyngier	ac0fe56d46	KVM: arm64: vgic: Use vcpu_idx for the debug information When dumping the debug information, use vcpu_idx instead of vcpu_id, as this is independent of any userspace influence. Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230927090911.3355209-6-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-09-30 18:15:43 +00:00
Marc Zyngier	4e7728c81a	KVM: arm64: vgic-v2: Use cpuid from userspace as vcpu_id When parsing a GICv2 attribute that contains a cpuid, handle this as the vcpu_id, not a vcpu_idx, as userspace cannot really know the mapping between the two. For this, use kvm_get_vcpu_by_id() instead of kvm_get_vcpu(). Take this opportunity to get rid of the pointless check against online_vcpus, which doesn't make much sense either, and switch to FIELD_GET as a way to extract the vcpu_id. Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230927090911.3355209-5-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-09-30 18:15:43 +00:00
Marc Zyngier	f3f60a5653	KVM: arm64: vgic-v3: Refactor GICv3 SGI generation As we're about to change the way SGIs are sent, start by splitting out some of the basic functionality: instead of intermingling the broadcast and non-broadcast cases with the actual SGI generation, perform the following cleanups: - move the SGI queuing into its own helper - split the broadcast code from the affinity-driven code - replace the mask/shift combinations with FIELD_GET() - fix the confusion between vcpu_id and vcpu when handling the broadcast case The result is much more readable, and paves the way for further optimisations. Tested-by: Joey Gouly <joey.gouly@arm.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230927090911.3355209-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-09-30 18:15:43 +00:00
Marc Zyngier	d455d366c4	KVM: arm64: vgic-its: Treat the collection target address as a vcpu_id Since our emulated ITS advertises GITS_TYPER.PTA=0, the target address associated to a collection is a PE number and not an address. So far, so good. However, the PE number is what userspace has given us (aka the vcpu_id), and not the internal vcpu index. Make sure we consistently retrieve the vcpu by ID rather than by index, adding a helper that deals with most of the cases. We also get rid of the pointless (and bogus) comparisons to online_vcpus, which don't really make sense. Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230927090911.3355209-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-09-30 18:15:43 +00:00
Marc Zyngier	9a0a75d3cc	KVM: arm64: vgic: Make kvm_vgic_inject_irq() take a vcpu pointer Passing a vcpu_id to kvm_vgic_inject_irq() is silly for two reasons: - we often confuse vcpu_id and vcpu_idx - we eventually have to convert it back to a vcpu - we can't count Instead, pass a vcpu pointer, which is unambiguous. A NULL vcpu is also allowed for interrupts that are not private to a vcpu (such as SPIs). Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230927090911.3355209-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-09-30 18:15:43 +00:00
Yue Haibing	a6b33d009f	KVM: arm64: Remove unused declarations Commit `53692908b0` ("KVM: arm/arm64: vgic: Fix source vcpu issues for GICv2 SGI") removed vgic_v2_set_npie()/vgic_v3_set_npie() but not the declarations. Commit `29eb5a3c57` ("KVM: arm64: Handle PtrAuth traps early") left behind kvm_arm_vcpu_ptrauth_trap(), remove it. Commit `2a0c343386` ("KVM: arm64: Initialize trap registers for protected VMs") declared but never implemented kvm_init_protected_traps() and commit `cf5d318865` ("arm/arm64: KVM: Turn off vcpus on PSCI shutdown/reboot") declared but never implemented force_vm_exit(). Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Zenghui Yu <zenghui.yu@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230814140636.45988-1-yuehaibing@huawei.com	2023-08-15 20:27:32 +01:00
Marc Zyngier	b321c31c9b	KVM: arm64: vgic-v4: Make the doorbell request robust w.r.t preemption Xiang reports that VMs occasionally fail to boot on GICv4.1 systems when running a preemptible kernel, as it is possible that a vCPU is blocked without requesting a doorbell interrupt. The issue is that any preemption that occurs between vgic_v4_put() and schedule() on the block path will mark the vPE as nonresident and not request a doorbell irq. This occurs because when the vcpu thread is resumed on its way to block, vcpu_load() will make the vPE resident again. Once the vcpu actually blocks, we don't request a doorbell anymore, and the vcpu won't be woken up on interrupt delivery. Fix it by tracking that we're entering WFI, and key the doorbell request on that flag. This allows us not to make the vPE resident when going through a preempt/schedule cycle, meaning we don't lose any state. Cc: stable@vger.kernel.org Fixes: `8e01d9a396` ("KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put") Reported-by: Xiang Chen <chenxiang66@hisilicon.com> Suggested-by: Zenghui Yu <yuzenghui@huawei.com> Tested-by: Xiang Chen <chenxiang66@hisilicon.com> Co-developed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20230713070657.3873244-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-07-13 22:23:34 +00:00
Marc Zyngier	1caa71a7a6	KVM: arm64: Restore GICv2-on-GICv3 functionality When reworking the vgic locking, the vgic distributor registration got simplified, which was a very good cleanup. But just a tad too radical, as we now register the native vgic only, ignoring the GICv2-on-GICv3 that allows pre-historic VMs (or so I thought) to run. As it turns out, QEMU still defaults to GICv2 in some cases, and this breaks Nathan's setup! Fix it by propagating the requested vgic type rather than the host's version. Fixes: `59112e9c39` ("KVM: arm64: vgic: Fix a circular locking issue") Reported-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> link: https://lore.kernel.org/r/20230606221525.GA2269598@dev-arch.thelio-3990X	2023-06-07 16:38:25 +01:00
Jean-Philippe Brucker	6254873226	KVM: arm64: vgic: Fix a comment It is host userspace, not the guest, that issues KVM_DEV_ARM_VGIC_GRP_CTRL Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230518100914.2837292-5-jean-philippe@linaro.org	2023-05-19 10:20:00 +01:00
Jean-Philippe Brucker	c38b8400ae	KVM: arm64: vgic: Fix locking comment It is now config_lock that must be held, not kvm lock. Replace the comment with a lockdep annotation. Fixes: `f003277311` ("KVM: arm64: Use config_lock to protect vgic state") Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230518100914.2837292-4-jean-philippe@linaro.org	2023-05-19 10:20:00 +01:00
Jean-Philippe Brucker	9cf2f840c4	KVM: arm64: vgic: Wrap vgic_its_create() with config_lock vgic_its_create() changes the vgic state without holding the config_lock, which triggers a lockdep warning in vgic_v4_init(): [ 358.667941] WARNING: CPU: 3 PID: 178 at arch/arm64/kvm/vgic/vgic-v4.c:245 vgic_v4_init+0x15c/0x7a8 ... [ 358.707410] vgic_v4_init+0x15c/0x7a8 [ 358.708550] vgic_its_create+0x37c/0x4a4 [ 358.709640] kvm_vm_ioctl+0x1518/0x2d80 [ 358.710688] __arm64_sys_ioctl+0x7ac/0x1ba8 [ 358.711960] invoke_syscall.constprop.0+0x70/0x1e0 [ 358.713245] do_el0_svc+0xe4/0x2d4 [ 358.714289] el0_svc+0x44/0x8c [ 358.715329] el0t_64_sync_handler+0xf4/0x120 [ 358.716615] el0t_64_sync+0x190/0x194 Wrap the whole of vgic_its_create() with config_lock since, in addition to calling vgic_v4_init(), it also modifies the global kvm->arch.vgic state. Fixes: `f003277311` ("KVM: arm64: Use config_lock to protect vgic state") Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230518100914.2837292-3-jean-philippe@linaro.org	2023-05-19 10:20:00 +01:00
Jean-Philippe Brucker	59112e9c39	KVM: arm64: vgic: Fix a circular locking issue Lockdep reports a circular lock dependency between the srcu and the config_lock: [ 262.179917] -> #1 (&kvm->srcu){.+.+}-{0:0}: [ 262.182010] __synchronize_srcu+0xb0/0x224 [ 262.183422] synchronize_srcu_expedited+0x24/0x34 [ 262.184554] kvm_io_bus_register_dev+0x324/0x50c [ 262.185650] vgic_register_redist_iodev+0x254/0x398 [ 262.186740] vgic_v3_set_redist_base+0x3b0/0x724 [ 262.188087] kvm_vgic_addr+0x364/0x600 [ 262.189189] vgic_set_common_attr+0x90/0x544 [ 262.190278] vgic_v3_set_attr+0x74/0x9c [ 262.191432] kvm_device_ioctl+0x2a0/0x4e4 [ 262.192515] __arm64_sys_ioctl+0x7ac/0x1ba8 [ 262.193612] invoke_syscall.constprop.0+0x70/0x1e0 [ 262.195006] do_el0_svc+0xe4/0x2d4 [ 262.195929] el0_svc+0x44/0x8c [ 262.196917] el0t_64_sync_handler+0xf4/0x120 [ 262.198238] el0t_64_sync+0x190/0x194 [ 262.199224] [ 262.199224] -> #0 (&kvm->arch.config_lock){+.+.}-{3:3}: [ 262.201094] __lock_acquire+0x2b70/0x626c [ 262.202245] lock_acquire+0x454/0x778 [ 262.203132] __mutex_lock+0x190/0x8b4 [ 262.204023] mutex_lock_nested+0x24/0x30 [ 262.205100] vgic_mmio_write_v3_misc+0x5c/0x2a0 [ 262.206178] dispatch_mmio_write+0xd8/0x258 [ 262.207498] __kvm_io_bus_write+0x1e0/0x350 [ 262.208582] kvm_io_bus_write+0xe0/0x1cc [ 262.209653] io_mem_abort+0x2ac/0x6d8 [ 262.210569] kvm_handle_guest_abort+0x9b8/0x1f88 [ 262.211937] handle_exit+0xc4/0x39c [ 262.212971] kvm_arch_vcpu_ioctl_run+0x90c/0x1c04 [ 262.214154] kvm_vcpu_ioctl+0x450/0x12f8 [ 262.215233] __arm64_sys_ioctl+0x7ac/0x1ba8 [ 262.216402] invoke_syscall.constprop.0+0x70/0x1e0 [ 262.217774] do_el0_svc+0xe4/0x2d4 [ 262.218758] el0_svc+0x44/0x8c [ 262.219941] el0t_64_sync_handler+0xf4/0x120 [ 262.221110] el0t_64_sync+0x190/0x194 Note that the current report, which can be triggered by the vgic_irq kselftest, is a triple chain that includes slots_lock, but after inverting the slots_lock/config_lock dependency, the actual problem reported above remains. 
In several places, the vgic code calls kvm_io_bus_register_dev(), which synchronizes the srcu, while holding config_lock (#1). And the MMIO handler takes the config_lock while holding the srcu read lock (#0). Break dependency #1, by registering the distributor and redistributors without holding config_lock. The ITS also uses kvm_io_bus_register_dev() but already relies on slots_lock to serialize calls. The distributor iodev is created on the first KVM_RUN call. Multiple threads will race for vgic initialization, and only the first one will see !vgic_ready() under the lock. To serialize those threads, rely on slots_lock rather than config_lock. Redistributors are created earlier, through KVM_DEV_ARM_VGIC_GRP_ADDR ioctls and vCPU creation. Similarly, serialize the iodev creation with slots_lock, and the rest with config_lock. Fixes: `f003277311` ("KVM: arm64: Use config_lock to protect vgic state") Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230518100914.2837292-2-jean-philippe@linaro.org	2023-05-19 10:20:00 +01:00
Marc Zyngier	9a48c597d6	Merge branch kvm-arm64/misc-6.4 into kvmarm-master/fixes * kvm-arm64/misc-6.4: : . : Minor changes for 6.4: : : - Make better use of the bitmap API (bitmap_zero, bitmap_zalloc...) : : - FP/SVE/SME documentation update, in the hope that this field : becomes clearer... : : - Add workaround for the usual Apple SEIS brokenness : : - Random comment fixes : . KVM: arm64: vgic: Add Apple M2 PRO/MAX cpus to the list of broken SEIS implementations KVM: arm64: Clarify host SME state management KVM: arm64: Restructure check for SVE support in FP trap handler KVM: arm64: Document check for TIF_FOREIGN_FPSTATE KVM: arm64: Fix repeated words in comments KVM: arm64: Use the bitmap API to allocate bitmaps KVM: arm64: Slightly optimize flush_context() Signed-off-by: Marc Zyngier <maz@kernel.org>	2023-05-11 15:25:58 +01:00
Marc Zyngier	e910baa9c1	KVM: arm64: vgic: Add Apple M2 PRO/MAX cpus to the list of broken SEIS implementations Unsurprisingly, the M2 PRO is also affected by the SEIS bug, so add it to the naughty list. And since M2 MAX is likely to be of the same ilk, flag it as well. Tested on a M2 PRO mini machine. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20230501182141.39770-1-maz@kernel.org	2023-05-11 15:17:02 +01:00
Marc Zyngier	b22498c484	Merge branch kvm-arm64/timer-vm-offsets into kvmarm-master/next * kvm-arm64/timer-vm-offsets: (21 commits) : . : This series aims at satisfying multiple goals: : : - allow a VMM to atomically restore a timer offset for a whole VM : instead of updating the offset each time a vcpu gets its counter : written : : - allow a VMM to save/restore the physical timer context, something : that we cannot do at the moment due to the lack of offsetting : : - provide a framework that is suitable for NV support, where we get : both global and per timer, per vcpu offsetting, and manage : interrupts in a less braindead way. : : Conflict resolution involves using the new per-vcpu config lock instead : of the home-grown timer lock. : . KVM: arm64: Handle 32bit CNTPCTSS traps KVM: arm64: selftests: Augment existing timer test to handle variable offset KVM: arm64: selftests: Deal with spurious timer interrupts KVM: arm64: selftests: Add physical timer registers to the sysreg list KVM: arm64: nv: timers: Support hyp timer emulation KVM: arm64: nv: timers: Add a per-timer, per-vcpu offset KVM: arm64: Document KVM_ARM_SET_CNT_OFFSETS and co KVM: arm64: timers: Abstract the number of valid timers per vcpu KVM: arm64: timers: Fast-track CNTPCT_EL0 trap handling KVM: arm64: Elide kern_hyp_va() in VHE-specific parts of the hypervisor KVM: arm64: timers: Move the timer IRQs into arch_timer_vm_data KVM: arm64: timers: Abstract per-timer IRQ access KVM: arm64: timers: Rationalise per-vcpu timer init KVM: arm64: timers: Allow save/restoring of the physical timer KVM: arm64: timers: Allow userspace to set the global counter offset KVM: arm64: Expose {un,}lock_all_vcpus() to the rest of KVM KVM: arm64: timers: Allow physical offset without CNTPOFF_EL2 KVM: arm64: timers: Use CNTPOFF_EL2 to offset the physical timer arm64: Add HAS_ECV_CNTPOFF capability arm64: Add CNTPOFF_EL2 register definition ... Signed-off-by: Marc Zyngier <maz@kernel.org>	2023-04-21 09:36:40 +01:00
Oliver Upton	49e5d16b6f	KVM: arm64: vgic: Don't acquire its_lock before config_lock commit `f003277311` ("KVM: arm64: Use config_lock to protect vgic state") was meant to rectify a longstanding lock ordering issue in KVM where the kvm->lock is taken while holding vcpu->mutex. As it so happens, the aforementioned commit introduced yet another locking issue by acquiring the its_lock before acquiring the config lock. This is obviously wrong, especially considering that the lock ordering is well documented in vgic.c. Reshuffle the locks once more to take the config_lock before the its_lock. While at it, sprinkle in the lockdep hinting that has become popular as of late to keep lockdep apprised of our ordering. Cc: stable@vger.kernel.org Fixes: `f003277311` ("KVM: arm64: Use config_lock to protect vgic state") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230412062733.988229-1-oliver.upton@linux.dev	2023-04-12 13:50:18 +01:00
Marc Zyngier	81dc9504a7	KVM: arm64: nv: timers: Support hyp timer emulation Emulating EL2 also means emulating the EL2 timers. To do so, we expand our timer framework to deal with at most 4 timers. At any given time, two timers are using the HW timers, and the two others are purely emulated. The role of deciding which is which at any given time is left to a mapping function which is called every time we need to make such a decision. Reviewed-by: Colton Lewis <coltonlewis@google.com> Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230330174800.2677007-18-maz@kernel.org	2023-03-30 19:01:10 +01:00
Marc Zyngier	96906a9150	KVM: arm64: Expose {un,}lock_all_vcpus() to the rest of KVM Being able to lock/unlock all vcpus in one go is a feature that only the vgic has enjoyed so far. Let's be brave and expose it to the world. Reviewed-by: Colton Lewis <coltonlewis@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230330174800.2677007-7-maz@kernel.org	2023-03-30 19:01:09 +01:00
Oliver Upton	f003277311	KVM: arm64: Use config_lock to protect vgic state Almost all of the vgic state is VM-scoped but accessed from the context of a vCPU. These accesses were serialized on the kvm->lock which cannot be nested within a vcpu->mutex critical section. Move over the vgic state to using the config_lock. Tweak the lock ordering where necessary to ensure that the config_lock is acquired after the vcpu->mutex. Acquire the config_lock in kvm_vgic_create() to avoid a race between the converted flows and GIC creation. Where necessary, continue to acquire kvm->lock to avoid a race with vCPU creation (i.e. flows that use lock_all_vcpus()). Finally, promote the locking expectations in comments to lockdep assertions and update the locking documentation for the config_lock as well as vcpu->mutex. Cc: stable@vger.kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230327164747.2466958-5-oliver.upton@linux.dev	2023-03-29 14:08:31 +01:00
Paolo Bonzini	4090871d77	KVM/arm64 updates for 6.3 - Provide a virtual cache topology to the guest to avoid inconsistencies with migration on heterogenous systems. Non secure software has no practical need to traverse the caches by set/way in the first place. - Add support for taking stage-2 access faults in parallel. This was an accidental omission in the original parallel faults implementation, but should provide a marginal improvement to machines w/o FEAT_HAFDBS (such as hardware from the fruit company). - A preamble to adding support for nested virtualization to KVM, including vEL2 register state, rudimentary nested exception handling and masking unsupported features for nested guests. - Fixes to the PSCI relay that avoid an unexpected host SVE trap when resuming a CPU when running pKVM. - VGIC maintenance interrupt support for the AIC - Improvements to the arch timer emulation, primarily aimed at reducing the trap overhead of running nested. - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the interest of CI systems. - Avoid VM-wide stop-the-world operations when a vCPU accesses its own redistributor. - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions in the host. - Aesthetic and comment/kerneldoc fixes - Drop the vestiges of the old Columbia mailing list and add myself as co-maintainer This also drags in a couple of branches to avoid conflicts: - The shared 'kvm-hw-enable-refactor' branch that reworks initialization, as it conflicted with the virtual cache topology changes. - arm64's 'for-next/sme2' branch, as the PSCI relay changes, as both touched the EL2 initialization code. 
Merge tag 'kvmarm-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.3 - Provide a virtual cache topology to the guest to avoid inconsistencies with migration on heterogenous systems. Non secure software has no practical need to traverse the caches by set/way in the first place. - Add support for taking stage-2 access faults in parallel. This was an accidental omission in the original parallel faults implementation, but should provide a marginal improvement to machines w/o FEAT_HAFDBS (such as hardware from the fruit company). - A preamble to adding support for nested virtualization to KVM, including vEL2 register state, rudimentary nested exception handling and masking unsupported features for nested guests. - Fixes to the PSCI relay that avoid an unexpected host SVE trap when resuming a CPU when running pKVM. - VGIC maintenance interrupt support for the AIC - Improvements to the arch timer emulation, primarily aimed at reducing the trap overhead of running nested. - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the interest of CI systems. 
- Avoid VM-wide stop-the-world operations when a vCPU accesses its own redistributor. - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions in the host. - Aesthetic and comment/kerneldoc fixes - Drop the vestiges of the old Columbia mailing list and add [Oliver] as co-maintainer This also drags in arm64's 'for-next/sme2' branch, because both it and the PSCI relay changes touch the EL2 initialization code.	2023-02-20 06:12:42 -05:00
Paolo Bonzini	33436335e9	Merge tag 'kvm-riscv-6.3-1' of https://github.com/kvm-riscv/linux into HEAD KVM/riscv changes for 6.3 - Fix wrong usage of PGDIR_SIZE to check page sizes - Fix privilege mode setting in kvm_riscv_vcpu_trap_redirect() - Redirect illegal instruction traps to guest - SBI PMU support for guest	2023-02-15 12:33:28 -05:00

263 Commits