Add core KVM selftests support for LoongArch, it includes exception
handler, mmu page table setup and vCPU startup entry support.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Add KVM selftests header files for LoongArch, including processor.h
and kvm_util_arch.h. It mainly contains LoongArch CSR register and page
table entry definition.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
On LoongArch system, 16K page is used in general and GVA width is 47 bit
while GPA width is 47 bit also, here add new VM mode VM_MODE_P47V47_16K.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Add a test case to verify x86's bus lock exit functionality, which is now
supported on both Intel and AMD. Trigger bus lock exits by performing a
split-lock access, i.e. an atomic access that splits two cache lines.
Verify that the correct number of bus lock exits are generated, and that
the counter is incremented correctly and at the appropriate time based on
the underlying architecture.
Generate bus locks in both L1 and L2 (if nested virtualization is enabled),
as SVM's functionality in particular requires non-trivial logic to do the
right thing when running nested VMs.
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Co-developed-by: Manali Shukla <manali.shukla@amd.com>
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
Link: https://lore.kernel.org/r/20250502050346.14274-6-manali.shukla@amd.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Use MGLRU's debugfs interface to do access tracking instead of
page_idle. The logic to use the page_idle bitmap is left in, as it is
useful for kernels that do not have MGLRU built in.
When MGLRU is enabled, page_idle will report pages as still idle even
after being accessed, as MGLRU doesn't necessarily clear the Idle folio
flag when accessing an idle page, so the test will not attempt to use
page_idle if MGLRU is enabled but otherwise not usable.
Aging pages with MGLRU is much faster than marking pages as idle with
page_idle.
Co-developed-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: James Houghton <jthoughton@google.com>
Link: https://lore.kernel.org/r/20250508184649.2576210-8-jthoughton@google.com
[sean: print parsed features, not raw string]
Signed-off-by: Sean Christopherson <seanjc@google.com>
libcgroup.o is built separately from KVM selftests and cgroup selftests,
so different compiler flags used by the different selftests will not
conflict with each other.
Signed-off-by: James Houghton <jthoughton@google.com>
Link: https://lore.kernel.org/r/20250508184649.2576210-7-jthoughton@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add an option to skip sanity check of number of still idle pages,
and set it by default to skip, in case hypervisor or NUMA balancing
is detected.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Co-developed-by: James Houghton <jthoughton@google.com>
Signed-off-by: James Houghton <jthoughton@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250508184649.2576210-3-jthoughton@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Extract the guts of thp_configured() and get_trans_hugepagesz() to
standalone helpers so that the core logic can be reused for other sysfs
files, e.g. to query numa_balancing.
Opportunistically assert that the initial fscanf() read at least one byte,
and add a comment explaining the second call to fscanf().
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: James Houghton <jthoughton@google.com>
Link: https://lore.kernel.org/r/20250508184649.2576210-2-jthoughton@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
When MTE is supported but MTE_ASYMM is not (ID_AA64PFR1_EL1.MTE == 2)
ID_AA64PFR1_EL1.MTE_frac == 0xF indicates MTE_ASYNC is unsupported
and MTE_frac == 0 indicates it is supported.
As MTE_frac was previously unconditionally read as 0 from the guest
and user-space, check that using SET_ONE_REG to set it to 0 succeeds
but does not change MTE_frac from unsupported (0xF) to supported (0).
This is required as values originating from KVM from user-space must
be accepted to avoid breaking migration.
Also, to allow this MTE field to be tested, enable KVM_ARM_CAP_MTE
for the set_id_regs test. No effect on existing tests is expected.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Link: https://lore.kernel.org/r/20250512114112.359087-4-ben.horgan@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Add a test to verify KVM's fastops emulation via forced emulation. KVM's
so called "fastop" infrastructure executes the to-be-emulated instruction
directly on hardware instead of manually emulating the instruction in
software, using various shenanigans to glue together the emulator context
and CPU state, e.g. to get RFLAGS fed into the instruction and back out
for the emulator.
Add testcases for all instructions that are low hanging fruit. While the
primary goal of the selftest is to validate the glue code, a secondary
goal is to ensure "emulation" matches hardware exactly, including for
arithmetic flags that are architecturally undefined. While arithmetic
flags may be *architecturally* undefined, their behavior is deterministic
for a given CPU (likely a given uarch, and possibly even an entire family
or class of CPUs). I.e. KVM has effectively been emulating underlying
hardware behavior for years.
Link: https://lore.kernel.org/r/20250506011250.1089254-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Until recently, the kernel could unexpectedly discard SVE state for a
period after a KVM_RUN ioctl, when the guest did not execute any
FPSIMD/SVE/SME instructions. We fixed that issue in commit:
fbc7e61195 ("KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state")
Add a test which tries to provoke that issue by manipulating SVE state
before/after running a guest which does not execute any FPSIMD/SVE/SME
instructions. The test executes a handful of iterations to miminize
the risk that the issue is masked by preemption.
Signed-off--by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250417-kvm-selftest-sve-signal-v1-1-6330c2f3da0c@kernel.org
[maz: Restored MR's SoB, fixed commit message according to MR's write-up]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Trying to cut the branch you are sat on is pretty dumb. And so is
trying to disable the instruction set you are executing on.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Link: https://lore.kernel.org/r/20250429114117.3618800-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Extend sev_smoke_test to also run a minimal SEV-SNP smoke test that
initializes and sets up private memory regions required to run a simple
SEV-SNP guest.
Similar to its SEV-ES smoke test counterpart, this also does not
support GHCB and ucall yet and uses the GHCB MSR protocol to trigger an
exit of the type KVM_EXIT_SYSTEM_EVENT.
Signed-off-by: Pratik R. Sampat <prsampat@amd.com>
Link: https://lore.kernel.org/r/20250305230000.231025-11-prsampat@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
In preparation for SNP, cleanup the smoke test to decouple deriving type
from policy. This will allow reusing the existing interfaces for SNP.
No functional change intended.
Signed-off-by: Pratik R. Sampat <prsampat@amd.com>
Link: https://lore.kernel.org/r/20250305230000.231025-10-prsampat@amd.com
[sean: massage shortlog+changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Force the SEV-SNP VM type to set the KVM_MEM_GUEST_MEMFD flag for the
creation of private memslots.
Signed-off-by: Pratik R. Sampat <prsampat@amd.com>
Link: https://lore.kernel.org/r/20250305230000.231025-9-prsampat@amd.com
[sean: add a comment, don't break non-x86]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Extend the SEV library to include support for SNP ioctl() wrappers,
which aid in launching and interacting with a SEV-SNP guest.
Signed-off-by: Pratik R. Sampat <prsampat@amd.com>
Link: https://lore.kernel.org/r/20250305230000.231025-8-prsampat@amd.com
[sean: use BIT()]
Signed-off-by: Sean Christopherson <seanjc@google.com>
In preparation for SNP, declutter the vm type check by introducing a
SEV-SNP VM type check as well as a transitive set of helper functions.
The SNP VM type is the subset of SEV-ES. Similarly, the SEV-ES and SNP
types are subset of the SEV VM type check.
Signed-off-by: Pratik R. Sampat <prsampat@amd.com>
Link: https://lore.kernel.org/r/20250305230000.231025-7-prsampat@amd.com
[sean: make the helpers static inlines]
Signed-off-by: Sean Christopherson <seanjc@google.com>
For SEV tests, assert() failures on VM type or fd do not provide
sufficient error reporting. Replace assert() with TEST_ASSERT_EQ() to
obtain more detailed information on the assertion condition failure,
including the call stack.
Signed-off-by: Pratik R. Sampat <prsampat@amd.com>
Link: https://lore.kernel.org/r/20250305230000.231025-6-prsampat@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Move the SMT control check out of the hyperv_cpuid selftest so that it
is generally accessible all KVM selftests. Split the functionality into
a helper that populates a buffer with SMT control value which other
helpers can use to ascertain if SMT state is available and active.
Signed-off-by: Pratik R. Sampat <prsampat@amd.com>
Link: https://lore.kernel.org/r/20250305230000.231025-5-prsampat@amd.com
[sean: prepend is_ to the helpers]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Abstract rep vmmcall coded into the vmgexit helper for the sev
library.
No functional change intended.
Signed-off-by: Pratik R. Sampat <prsampat@amd.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Link: https://lore.kernel.org/r/20250305230000.231025-4-prsampat@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add the X86_FEATURE_SEV_SNP CPU feature to the architectural definition
for the SEV-SNP VM type to exercise the KVM_SEV_INIT2 call. Ensure that
the SNP test is skipped in scenarios where CPUID supports it but KVM
does not, preventing reporting of failure in such cases.
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Pratik R. Sampat <prsampat@amd.com>
Link: https://lore.kernel.org/r/20250305230000.231025-3-prsampat@amd.com
[sean: use the same pattern as SEV and SEV-ES]
Signed-off-by: Sean Christopherson <seanjc@google.com>
* Rework heuristics for resolving the fault IPA (HPFAR_EL2 v. re-walk
stage-1 page tables) to align with the architecture. This avoids
possibly taking an SEA at EL2 on the page table walk or using an
architecturally UNKNOWN fault IPA.
* Use acquire/release semantics in the KVM FF-A proxy to avoid reading
a stale value for the FF-A version.
* Fix KVM guest driver to match PV CPUID hypercall ABI.
* Use Inner Shareable Normal Write-Back mappings at stage-1 in KVM
selftests, which is the only memory type for which atomic
instructions are architecturally guaranteed to work.
s390:
* Don't use %pK for debug printing and tracepoints.
x86:
* Use a separate subclass when acquiring KVM's per-CPU posted interrupts
wakeup lock in the scheduled out path, i.e. when adding a vCPU on
the list of vCPUs to wake, to workaround a false positive deadlock.
The schedule out code runs with a scheduler lock that the wakeup
handler takes in the opposite order; but it does so with IRQs disabled
and cannot run concurrently with a wakeup.
* Explicitly zero-initialize on-stack CPUID unions
* Allow building irqbypass.ko as as module when kvm.ko is a module
* Wrap relatively expensive sanity check with KVM_PROVE_MMU
* Acquire SRCU in KVM_GET_MP_STATE to protect guest memory accesses
selftests:
* Add more scenarios to the MONITOR/MWAIT test.
* Add option to rseq test to override /dev/cpu_dma_latency
* Bring list of exit reasons up to date
* Cleanup Makefile to list once tests that are valid on all architectures
Other:
* Documentation fixes
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmf083IUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroN1dgf/QwfpZcHoMNQSnrc1jMy2LHrArln2
XfmsOGZTU7kyoLQsLWGAPNocOveGdiemTDsj5ZXoNMnqV8hCBr+tZuv2gWI1rr/o
kiGerdIgSZ9piTjBlJkVAaOzbWhg2DUnr7qVVzEzFY9+rPNyQ81vgAfU7h56KhYB
optecozmBrHHAxvQZwmPeL9UyPWFjOF1BY/8LTMx7X+aVuCX6qx1JqO3a3ylAw4J
tGXv6qFJfuCnu1d1b4X0ILce0iMUTOjQzvTcIm+BKjYycecl+3j1aczC/BOorIgc
mf0+XeauhcTduK73pirnvx2b05eOxntgkOpwJytO2RP6pE0uK+2Th/C3Qg==
=ba/Y
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Rework heuristics for resolving the fault IPA (HPFAR_EL2 v. re-walk
stage-1 page tables) to align with the architecture. This avoids
possibly taking an SEA at EL2 on the page table walk or using an
architecturally UNKNOWN fault IPA
- Use acquire/release semantics in the KVM FF-A proxy to avoid
reading a stale value for the FF-A version
- Fix KVM guest driver to match PV CPUID hypercall ABI
- Use Inner Shareable Normal Write-Back mappings at stage-1 in KVM
selftests, which is the only memory type for which atomic
instructions are architecturally guaranteed to work
s390:
- Don't use %pK for debug printing and tracepoints
x86:
- Use a separate subclass when acquiring KVM's per-CPU posted
interrupts wakeup lock in the scheduled out path, i.e. when adding
a vCPU on the list of vCPUs to wake, to workaround a false positive
deadlock. The schedule out code runs with a scheduler lock that the
wakeup handler takes in the opposite order; but it does so with
IRQs disabled and cannot run concurrently with a wakeup
- Explicitly zero-initialize on-stack CPUID unions
- Allow building irqbypass.ko as as module when kvm.ko is a module
- Wrap relatively expensive sanity check with KVM_PROVE_MMU
- Acquire SRCU in KVM_GET_MP_STATE to protect guest memory accesses
selftests:
- Add more scenarios to the MONITOR/MWAIT test
- Add option to rseq test to override /dev/cpu_dma_latency
- Bring list of exit reasons up to date
- Cleanup Makefile to list once tests that are valid on all
architectures
Other:
- Documentation fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (26 commits)
KVM: arm64: Use acquire/release to communicate FF-A version negotiation
KVM: arm64: selftests: Explicitly set the page attrs to Inner-Shareable
KVM: arm64: selftests: Introduce and use hardware-definition macros
KVM: VMX: Use separate subclasses for PI wakeup lock to squash false positive
KVM: VMX: Assert that IRQs are disabled when putting vCPU on PI wakeup list
KVM: x86: Explicitly zero-initialize on-stack CPUID unions
KVM: Allow building irqbypass.ko as as module when kvm.ko is a module
KVM: x86/mmu: Wrap sanity check on number of TDP MMU pages with KVM_PROVE_MMU
KVM: selftests: Add option to rseq test to override /dev/cpu_dma_latency
KVM: x86: Acquire SRCU in KVM_GET_MP_STATE to protect guest memory accesses
Documentation: kvm: remove KVM_CAP_MIPS_TE
Documentation: kvm: organize capabilities in the right section
Documentation: kvm: fix some definition lists
Documentation: kvm: drop "Capability" heading from capabilities
Documentation: kvm: give correct name for KVM_CAP_SPAPR_MULTITCE
Documentation: KVM: KVM_GET_SUPPORTED_CPUID now exposes TSC_DEADLINE
selftests: kvm: list once tests that are valid on all architectures
selftests: kvm: bring list of exit reasons up to date
selftests: kvm: revamp MONITOR/MWAIT tests
KVM: arm64: Don't translate FAR if invalid/unsafe
...
- Rework heuristics for resolving the fault IPA (HPFAR_EL2 v. re-walk
stage-1 page tables) to align with the architecture. This avoids
possibly taking an SEA at EL2 on the page table walk or using an
architecturally UNKNOWN fault IPA.
- Use acquire/release semantics in the KVM FF-A proxy to avoid reading
a stale value for the FF-A version.
- Fix KVM guest driver to match PV CPUID hypercall ABI.
- Use Inner Shareable Normal Write-Back mappings at stage-1 in KVM
selftests, which is the only memory type for which atomic
instructions are architecturally guaranteed to work.
-----BEGIN PGP SIGNATURE-----
iI0EABYIADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCZ/RO9hccb2xpdmVyLnVw
dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFmRuAP0YajO4qHJe1vHtCkamuPnEY0Kp
E+t2TwPafPbrPdQ1PgEAq6lHuSdUnid1r/uhRKIT+ywW8tE97eNwQAa1LFma0Ac=
=d4G5
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-fixes-6.15-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64: First batch of fixes for 6.15
- Rework heuristics for resolving the fault IPA (HPFAR_EL2 v. re-walk
stage-1 page tables) to align with the architecture. This avoids
possibly taking an SEA at EL2 on the page table walk or using an
architecturally UNKNOWN fault IPA.
- Use acquire/release semantics in the KVM FF-A proxy to avoid reading
a stale value for the FF-A version.
- Fix KVM guest driver to match PV CPUID hypercall ABI.
- Use Inner Shareable Normal Write-Back mappings at stage-1 in KVM
selftests, which is the only memory type for which atomic
instructions are architecturally guaranteed to work.
Atomic instructions such as 'ldset' in the guest have been observed to
cause an EL1 data abort with FSC 0x35 (IMPLEMENTATION DEFINED fault
(Unsupported Exclusive or Atomic access)) on Neoverse-N3.
Per DDI0487L.a B2.2.6, atomic instructions are only architecturally
guaranteed for Inner/Outer Shareable Normal Write-Back memory. For
anything else the behavior is IMPLEMENTATION DEFINED and can lose
atomicity, or, in this case, generate an abort.
It would appear that selftests sets up the stage-1 mappings as Non
Shareable, leading to the observed abort. Explicitly set the
Shareability field to Inner Shareable for non-LPA2 page tables. Note
that for the LPA2 page table format, translations for cacheable memory
inherit the shareability attribute of the PTW, i.e. TCR_ELx.SH{0,1}.
Suggested-by: Oliver Upton <oupton@google.com>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Link: https://lore.kernel.org/r/20250405001042.1470552-3-rananta@google.com
[oliver: Rephrase changelog]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
The kvm selftest library for arm64 currently configures the hardware
fields, such as shift and mask in the page-table entries and registers,
directly with numbers. While it add comments at places, it's better to
rewrite them with appropriate macros to improve the readability and
reduce the risk of errors. Hence, introduce macros to define the
hardware fields and use them in the arm64 processor library.
Most of the definitions are primary copied from the Linux's header,
arch/arm64/include/asm/pgtable-hwdef.h.
No functional change intended.
Suggested-by: Oliver Upton <oupton@google.com>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Link: https://lore.kernel.org/r/20250405001042.1470552-2-rananta@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* The sub-architecture selection Kconfig system has been cleaned up,
the documentation has been improved, and various detections have been
fixed.
* The vector-related extensions dependencies are now validated when
parsing from device tree and in the DT bindings.
* Misaligned access probing can be overridden via a kernel command-line
parameter, along with various fixes to misalign access handling.
* Support for relocatable !MMU kernels builds.
* Support for hpge pfnmaps, which should improve TLB utilization.
* Support for runtime constants, which improves the d_hash()
performance.
* Support for bfloat16, Zicbom, Zaamo, Zalrsc, Zicntr, Zihpm.
* Various fixes, including:
- We were missing a secondary mmu notifier call when flushing the
tlb which is required for IOMMU.
- Fix ftrace panics by saving the registers as expected by ftrace.
- Fix a couple of stimecmp usage related to cpu hotplug.
- purgatory_start is now aligned as per the STVEC requirements.
- A fix for hugetlb when calculating the size of non-present PTEs.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmfv/soTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYierZEACDwI9lJFCEbQPon3z8rAy1moTj0+AZ
bMfZFqMphUTrJ0cMm2+Bc+XZgck12zHCyu1UljDcZVYMCHA9aOoj5C5NkBBVLCuL
uLYrhIoQXtJaVIANiFl0SHAZmh2s2OoSgmUzrEZ8JGlHpKCF7EVX5bHEsOvzn9ir
B2W992W6q3ISuKXHKsTpa7rmTtf7swGYg6zW3pX3l6HmY+EMEQOcQl0tAB383J/T
lm0K4+YvLpRJdm2ARpNGWlcFXj9/UXUM5hplK3aBAHpPKQ5/83/4tMDsfRvhpEVC
VJXNgK+H4XLD542aQ8d4ZROguyhwn9e2n6Dkv0OqfNk4lg5pUBcJUZftQ+rB7AWg
VYB1KVpxhwcruheXJFz8S3EzjZTcS+JrcD80vvx8JmHdXkZwHTfYUgiFwe/TR7yr
b518fEbXpVwDZiCbaAe3Cmpw0mlNnSVmU4hgNbiwt0fu9DGdPN9WQbyds68RKb7A
TWwDmmD6kV2BTWl0mHPtu9VhX58CDG+0WYbHA7r82p2T50187766C92GYfN2UPpz
lH0iMRDkmucclZ3fEoosJ+HsDntc4oe6Bhdzuj52Q7vBpDd/QB6t5cfrlDpEEdgU
3qoWMN5mb5l1rbvrqENh5ZgmEpzV8K0R5F5quiXh/9wO0y1kepDslTqC2oXK/m0p
DzsvvD6UnNMOUQ==
=nCJo
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.15-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- The sub-architecture selection Kconfig system has been cleaned up,
the documentation has been improved, and various detections have been
fixed
- The vector-related extensions dependencies are now validated when
parsing from device tree and in the DT bindings
- Misaligned access probing can be overridden via a kernel command-line
parameter, along with various fixes to misalign access handling
- Support for relocatable !MMU kernels builds
- Support for hpge pfnmaps, which should improve TLB utilization
- Support for runtime constants, which improves the d_hash()
performance
- Support for bfloat16, Zicbom, Zaamo, Zalrsc, Zicntr, Zihpm
- Various fixes, including:
- We were missing a secondary mmu notifier call when flushing the
tlb which is required for IOMMU
- Fix ftrace panics by saving the registers as expected by ftrace
- Fix a couple of stimecmp usage related to cpu hotplug
- purgatory_start is now aligned as per the STVEC requirements
- A fix for hugetlb when calculating the size of non-present PTEs
* tag 'riscv-for-linus-6.15-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (65 commits)
riscv: Add norvc after .option arch in runtime const
riscv: Make sure toolchain supports zba before using zba instructions
riscv/purgatory: 4B align purgatory_start
riscv/kexec_file: Handle R_RISCV_64 in purgatory relocator
selftests: riscv: fix v_exec_initval_nolibc.c
riscv: Fix hugetlb retrieval of number of ptes in case of !present pte
riscv: print hartid on bringup
riscv: Add norvc after .option arch in runtime const
riscv: Remove CONFIG_PAGE_OFFSET
riscv: Support CONFIG_RELOCATABLE on riscv32
asm-generic: Always define Elf_Rel and Elf_Rela
riscv: Support CONFIG_RELOCATABLE on NOMMU
riscv: Allow NOMMU kernels to access all of RAM
riscv: Remove duplicate CONFIG_PAGE_OFFSET definition
RISC-V: errata: Use medany for relocatable builds
dt-bindings: riscv: document vector crypto requirements
dt-bindings: riscv: add vector sub-extension dependencies
dt-bindings: riscv: d requires f
RISC-V: add f & d extension validation checks
RISC-V: add vector crypto extension validation checks
...
Add a "-l <latency>" param to the rseq test so that the user can override
/dev/cpu_dma_latency, as described by the test's suggested workaround for
not being able to complete enough migrations.
cpu_dma_latency is not a normal file, even as far as procfs files go.
Writes to cpu_dma_latency only persist so long as the file is open, e.g.
so that the kernel automatically reverts back to a power-optimized state
once the sensitive workload completes. Provide the necessary functionality
instead of effectively forcing the user to write a non-obvious wrapper.
Cc: Dongsheng Zhang <dongsheng.x.zhang@intel.com>
Cc: Zide Chen <zide.chen@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250401142238.819487-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Several tests cover infrastructure from virt/kvm/ and userspace APIs that have
only minimal requirements from architecture-specific code. As such, they are
available on all architectures that have libkvm support, and this presumably
will apply also in the future (for example if loongarch gets selftests support).
Put them in a separate variable and list them only once.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20250401141327.785520-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20250331221851.614582-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Run each testcase in a separate VMs to cover more possibilities;
move WRMSR close to MONITOR/MWAIT to test updating CPUID bits
while in the VM.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Nested virtualization support for VGICv3, giving the nested
hypervisor control of the VGIC hardware when running an L2 VM
- Removal of 'late' nested virtualization feature register masking,
making the supported feature set directly visible to userspace
- Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage
of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers
- Paravirtual interface for discovering the set of CPU implementations
where a VM may run, addressing a longstanding issue of guest CPU
errata awareness in big-little systems and cross-implementation VM
migration
- Userspace control of the registers responsible for identifying a
particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1),
allowing VMs to be migrated cross-implementation
- pKVM updates, including support for tracking stage-2 page table
allocations in the protected hypervisor in the 'SecPageTable' stat
- Fixes to vPMU, ensuring that userspace updates to the vPMU after
KVM_RUN are reflected into the backing perf events
-----BEGIN PGP SIGNATURE-----
iI0EABYIADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCZ9s9gBccb2xpdmVyLnVw
dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFp6LAQCOQ1Fidp8RT1NdhLLAhW5D4gLe
MNT619R4qfqu64ZpeQEAidHMAYaGRk5KDNBq6Jn+awcJnwCcMnh2ok0vTOjz3gY=
=RC6A
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-6.15' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.15
- Nested virtualization support for VGICv3, giving the nested
hypervisor control of the VGIC hardware when running an L2 VM
- Removal of 'late' nested virtualization feature register masking,
making the supported feature set directly visible to userspace
- Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage
of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers
- Paravirtual interface for discovering the set of CPU implementations
where a VM may run, addressing a longstanding issue of guest CPU
errata awareness in big-little systems and cross-implementation VM
migration
- Userspace control of the registers responsible for identifying a
particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1),
allowing VMs to be migrated cross-implementation
- pKVM updates, including support for tracking stage-2 page table
allocations in the protected hypervisor in the 'SecPageTable' stat
- Fixes to vPMU, ensuring that userspace updates to the vPMU after
KVM_RUN are reflected into the backing perf events
- Disable the kernel perf counter during configure
- KVM selftests improvements for PMU
- Fix warning at the time of KVM module removal
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmfbsiQACgkQrUjsVaLH
LAfGAxAAlODgZMDt+f/YlOOmQPsLohGrWsZjvheRfrf3X5PFNuXd1frpZSbF1PCs
lk4WZrekJo2VT2tWMEBjFrH6A9slHUzjZs+xIxig1N/Gmxy1clKp/svJVG20WwQI
a5/5jPH3TNDL3KUphwBF0QIi/Tlp6+SZ5RAag6QsBXKnFy5+o9lioR2T6tFzMoxy
ry8VD8e395bQXn5tYkk4vsr9vBRzpgq7ZBOMg6zyNTL86UBOVq3QXPNzSE5kenXW
4zqDmuxbk+GgpUKLXA9aXwLvasGiOr+59uToZd2lHf5ynSdSprQyW11I5ZOFs2DX
800mLeYtyHc4nGTi6OO3VN9toMhmgijtFLNpBDg4LRvwMfB+sp62Ixsmjs3CBCd+
GqcbJaMtdgM1SpZ3tlzEQoWpIBlNYs7BuG/6Jen9xKHHvOg7Ty9UimJeMikV+FUm
srq0+GlgjaniZ/d8arIIm7LVqZLAc6OSzn+A8IcE0KQ9fKBgB0CZrANW+N+b+7ub
Lkq9hA8tM5Ti0bNszqPyJu3bmhWAOEOwF0CjYL79vEhIYsegCKKtruKOppdM+sl2
dzk+DVV9Aar+bDu/v/bYV5TfIq39U86Tqxe+OU7EPzch5NdyDvKcpZ9/JIZnf/j5
tviM3awZfFmgTeBUTAYvaR34KExBOLQLkGb9BqAydYjg+pl9jVE=
=HP3N
-----END PGP SIGNATURE-----
Merge tag 'kvm-riscv-6.15-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv changes for 6.15
- Disable the kernel perf counter during configure
- KVM selftests improvements for PMU
- Fix warning at the time of KVM module removal
* kvm-arm64/writable-midr:
: Writable implementation ID registers, courtesy of Sebastian Ott
:
: Introduce a new capability that allows userspace to set the
: ID registers that identify a CPU implementation: MIDR_EL1, REVIDR_EL1,
: and AIDR_EL1. Also plug a hole in KVM's trap configuration where
: SMIDR_EL1 was readable at EL1, despite the fact that KVM does not
: support SME.
KVM: arm64: Fix documentation for KVM_CAP_ARM_WRITABLE_IMP_ID_REGS
KVM: arm64: Copy MIDR_EL1 into hyp VM when it is writable
KVM: arm64: Copy guest CTR_EL0 into hyp VM
KVM: selftests: arm64: Test writes to MIDR,REVIDR,AIDR
KVM: arm64: Allow userspace to change the implementation ID registers
KVM: arm64: Load VPIDR_EL2 with the VM's MIDR_EL1 value
KVM: arm64: Maintain per-VM copy of implementation ID regs
KVM: arm64: Set HCR_EL2.TID1 unconditionally
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* kvm-arm64/pv-cpuid:
: Paravirtualized implementation ID, courtesy of Shameer Kolothum
:
: Big-little has historically been a pain in the ass to virtualize. The
: implementation ID (MIDR, REVIDR, AIDR) of a vCPU can change at the whim
: of vCPU scheduling. This can be particularly annoying when the guest
: needs to know the underlying implementation to mitigate errata.
:
: "Hyperscalers" face a similar scheduling problem, where VMs may freely
: migrate between hosts in a pool of heterogenous hardware. And yes, our
: server-class friends are equally riddled with errata too.
:
: In absence of an architected solution to this wart on the ecosystem,
: introduce support for paravirtualizing the implementation exposed
: to a VM, allowing the VMM to describe the pool of implementations that a
: VM may be exposed to due to scheduling/migration.
:
: Userspace is expected to intercept and handle these hypercalls using the
: SMCCC filter UAPI, should it choose to do so.
smccc: kvm_guest: Fix kernel builds for 32 bit arm
KVM: selftests: Add test for KVM_REG_ARM_VENDOR_HYP_BMAP_2
smccc/kvm_guest: Enable errata based on implementation CPUs
arm64: Make _midr_in_range_list() an exported function
KVM: arm64: Introduce KVM_REG_ARM_VENDOR_HYP_BMAP_2
KVM: arm64: Specify hypercall ABI for retrieving target implementations
arm64: Modify _midr_range() functions to read MIDR/REVIDR internally
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* kvm-arm64/nv-idregs:
: Changes to exposure of NV features, courtesy of Marc Zyngier
:
: Apply NV-specific feature restrictions at reset rather than at the point
: of KVM_RUN. This makes the true feature set visible to userspace, a
: necessary step towards save/restore support or NV VMs.
:
: Add an additional vCPU feature flag for selecting the E2H0 flavor of NV,
: such that the VHE-ness of the VM can be applied to the feature set.
KVM: arm64: selftests: Test that TGRAN*_2 fields are writable
KVM: arm64: Allow userspace to write ID_AA64MMFR0_EL1.TGRAN*_2
KVM: arm64: Advertise FEAT_ECV when possible
KVM: arm64: Make ID_AA64MMFR4_EL1.NV_frac writable
KVM: arm64: Allow userspace to limit NV support to nVHE
KVM: arm64: Move NV-specific capping to idreg sanitisation
KVM: arm64: Enforce NV limits on a per-idregs basis
KVM: arm64: Make ID_REG_LIMIT_FIELD_ENUM() more widely available
KVM: arm64: Consolidate idreg callbacks
KVM: arm64: Advertise NV2 in the boot messages
KVM: arm64: Mark HCR.EL2.{NV*,AT} RES0 when ID_AA64MMFR4_EL1.NV_frac is 0
KVM: arm64: Mark HCR.EL2.E2H RES0 when ID_AA64MMFR1_EL1.VH is zero
KVM: arm64: Hide ID_AA64MMFR2_EL1.NV from guest and userspace
arm64: cpufeature: Handle NV_frac as a synonym of NV2
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
- Fix a variety of flaws, bugs, and false failures/passes dirty_log_test, and
improve its coverage by collecting all dirty entries on each iteration.
- Fix a few minor bugs related to handling of stats FDs.
- Add infrastructure to make vCPU and VM stats FDs available to tests by
default (open the FDs during VM/vCPU creation).
- Relax an assertion on the number of HLT exits in the xAPIC IPI test when
running on a CPU that supports AMD's Idle HLT (which elides interception of
HLT if a virtual IRQ is pending and unmasked).
- Misc cleanups and fixes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmfZqZYACgkQOlYIJqCj
N/2G0Q/5AZAnOPBU/xmIZrntaWSkPX0wYCBzTCFAbyuXTYdWmCMaVJUjj0W4h15U
rAPxq76moGSHT06EQhw8clrNHbRSr4OZvsJIQMgGK5GBXlxfD+S1bNb/NKijpKm1
yjI4VIKA1ugpPLbb4KM522UQdAwkBI8KuPbhzWCsz2kzFf3Iy/B6ZgfevTjldZ9f
KbocBQhCaEOmk0i1UZWMnuJU+rh06y/BNR/7a/vhaZGy4pPm/WZpSezKbti3j7F5
+/VXJ+fh2T1oPzlycdQ4i9phYAOMJONcNxlzhs0Hp71zkxNykVKyv1zxvCI3ioEn
nnHFYQyremDfDJhxabOJRYB4ETlhLpX9nOlz2euIy7YeurDs73Qtjg2rixgAyq80
/FITPTFVlcHxjPr9YC+wtWzZB2qnGNMfA7QujnoG3XlGQJJsvnFTygXfBMj+W0G/
K8V93lvGTcwtT7x+9MCDhD0wZXfPBXorze8QrDfCWF96G1Y+5gZmYGs38Cu6IWrc
6q3tHcDmUaOGI1Mag97J1smSWw+ZC1+UuHJSW7DMZldOm/2fXkqrRpbHRXEzRQIa
vZjBZKoYHpqGMKayQ+GIRql9R6o+ZGQ6OQXgtLl0fxPPA01/hKX1ITZayBnHDl3C
wm3svDZy6hPWs17kwtLj+cRSMVYItbjOgpOj58gBo16w1NPNYTI=
=t77Q
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-selftests-6.15' of https://github.com/kvm-x86/linux into HEAD
KVM selftests changes for 6.15, part 2
- Fix a variety of flaws, bugs, and false failures/passes dirty_log_test, and
improve its coverage by collecting all dirty entries on each iteration.
- Fix a few minor bugs related to handling of stats FDs.
- Add infrastructure to make vCPU and VM stats FDs available to tests by
default (open the FDs during VM/vCPU creation).
- Relax an assertion on the number of HLT exits in the xAPIC IPI test when
running on a CPU that supports AMD's Idle HLT (which elides interception of
HLT if a virtual IRQ is pending and unmasked).
- Misc cleanups and fixes.
- Misc cleanups and prep work.
- Annotate _no_printf() with "printf" so that pr_debug() statements are
checked by the compiler for default builds (and pr_info() when QUIET).
- Attempt to whack the last LLC references/misses mole in the Intel PMU
counters test by adding a data load and doing CLFLUSH{OPT} on the data
instead of the code being executed. The theory is that modern Intel CPUs
have learned new code prefetching tricks that bypass the PMU counters.
- Fix a flaw in the Intel PMU counters test where it asserts that an event is
counting correctly without actually knowing what the event counts on the
underlying hardware.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmfZpvcACgkQOlYIJqCj
N/3a+BAAmv0iagaYSiigAWZ9EM/2+CZDGp0+g7sB95graJFKnrKCXSkpuUr1UIYr
aDPLHAPBZ8szWNqrJARdSui0QP8cEd2erJkMNRDEQqx/OYKQJ+0W+gmlbleQ6GL6
7C/zUn0JdC0AhpdfePw6nIPpwIYBxDwHMQSuPUOxw5ZJmVCzQxn43MmPWTfL11Eo
ARfh6VWmXSvp+5emyjnlGmcT1/a3UlfAGDRekK0gaPYmEtaf6LfKCXc+nX84q0fr
T2N8nVcmEDg9VUO3JZfexoLcKpFMsTdwg4GbQ4beytYwPx3YELvCP9eFl5yo4lxk
9YL6uPdXyk0/Cgpv/QyB4YtnaQ4tktPmFLYjEtf8NIMLubouTS6dtYErKq9MOgsP
7UaOBVY6O7+cG+iOU2yGn1Mct0XxhKKLu3n5ZlGLJ/+8v1PUIegL3wUwpQaycp7O
xgRd/narmKEmMYFr7D0y7epcUVBlkP17YsFH4w6z9moDjJNVz4Ziqft6ju0N3P1s
a/8CA5gd2lplp/Yae62f2mgJNjw5kbEYtGO9ybINbhacznsl+3JsU85k1h+7dZrW
8lmfNJ8+B95UrZ9Wo0mWsv+UKYYFHRFoHjKlevuUTPQ/QxoghvpQ5iU3P+rT8E5n
858OhTi2vlAhg8a338XdfUOy799F73ecDaIKyPMdO74v7wq1swo=
=RjiY
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-selftests_6.15-1' of https://github.com/kvm-x86/linux into HEAD
KVM selftests changes for 6.15, part 1
- Misc cleanups and prep work.
- Annotate _no_printf() with "printf" so that pr_debug() statements are
checked by the compiler for default builds (and pr_info() when QUIET).
- Attempt to whack the last LLC references/misses mole in the Intel PMU
counters test by adding a data load and doing CLFLUSH{OPT} on the data
instead of the code being executed. The theory is that modern Intel CPUs
have learned new code prefetching tricks that bypass the PMU counters.
- Fix a flaw in the Intel PMU counters test where it asserts that an event is
counting correctly without actually knowing what the event counts on the
underlying hardware.
- Fix a bug in PIC emulation that caused KVM to emit a spurious KVM_REQ_EVENT.
- Add a helper to consolidate handling of mp_state transitions, and use it to
clear pv_unhalted whenever a vCPU is made RUNNABLE.
- Defer runtime CPUID updates until KVM emulates a CPUID instruction, to
coalesce updates when multiple pieces of vCPU state are changing, e.g. as
part of a nested transition.
- Fix a variety of nested emulation bugs, and add VMX support for synthesizing
nested VM-Exit on interception (instead of injecting #UD into L2).
- Drop "support" for PV Async #PF with proctected guests without SEND_ALWAYS,
as KVM can't get the current CPL.
- Misc cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmfZhoYACgkQOlYIJqCj
N/1oSBAAlhsZzv4m6rHtWACjGsSBlAE5WM7HrpnwjOyMW+Desc0WLL4L6qAlxgT7
4ZkBvQ4zsiyUIdLv1XARvkBYHTUkcgG8fQSSJQ6grZit5OcFcMgWafvQrJYI6428
TyzF/x6t0gj8CjDluA6jM/eL5HhWZZDOIg0Wma+pyYpW+7y2tkphXyYyOQPBGwv3
geRUetbDHHXjf042k/8f1j1vjzrNNvAg3YyNyx1YbdU9XKsn5D+SeUW2eVfYk8G7
5QsCOGvUYcbbjrR8kbCZKexvoH6Np9J6YKDe4R9R2yDzgs/96qz6xkYTGCVkHA1y
uursKqRHgbXBxzxa+ban073laT7Qt3S01Gd9bJW3IO7hzG89gl4qfX7fap8T9Yc2
yeBTYIgInpyx+NCdZ2Z/++BzPagBGfa77gFX/eIkmsVA9LWYi9CI3FSjtr/czvWm
a4tfMPvTVBjsBQQ7t/lNksrq0O51lbb3iqqv3ToQpDOOqCWuMEU5xcihhPRr5NSZ
dX4o/jIDhCV8EyXdtASyqMlYBXcuC45ojEZn1elh0QogzYAdSGQ2bIDyxuBtA//k
kSbi+E4GB64jVfBWUyK2QeLOBnBkH7mh6Cg5UYr1Ln9Sm6l8vrcxhcbnchiWxXMI
WCK7BJwI2HojBVpEZ04jMkjHvg36uSfjOzmMLT5yPXfFNebsGmA=
=8SGF
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-misc-6.15' of https://github.com/kvm-x86/linux into HEAD
KVM x86 misc changes for 6.15:
- Fix a bug in PIC emulation that caused KVM to emit a spurious KVM_REQ_EVENT.
- Add a helper to consolidate handling of mp_state transitions, and use it to
clear pv_unhalted whenever a vCPU is made RUNNABLE.
- Defer runtime CPUID updates until KVM emulates a CPUID instruction, to
coalesce updates when multiple pieces of vCPU state are changing, e.g. as
part of a nested transition.
- Fix a variety of nested emulation bugs, and add VMX support for synthesizing
nested VM-Exit on interception (instead of injecting #UD into L2).
- Drop "support" for PV Async #PF with proctected guests without SEND_ALWAYS,
as KVM can't get the current CPL.
- Misc cleanups
It is helpful to vary the number of the LCOFI interrupts generated
by the overflow test. Allow additional argument for overflow test
to accommodate that. It can be easily cross-validated with
/proc/interrupts output in the host.
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250303-kvm_pmu_improve-v2-4-41d177e45929@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
The PMU test commandline option takes an argument to disable a
certain test. The initial assumption behind this was a common use case
is just to run all the test most of the time. However, running a single
test seems more useful instead. Especially, the overflow test has been
helpful to validate PMU virtualizaiton interrupt changes.
Switching the command line option to run a single test instead
of disabling a single test also allows to provide additional
test specific arguments to the test. The default without any options
remains unchanged which continues to run all the tests.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250303-kvm_pmu_improve-v2-3-41d177e45929@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
There is no need to start the counter in the overflow handler as we
intend to trigger precise number of LCOFI interrupts through these
tests. The overflow irq handler has already stopped the counter. As
a result, the stop call from the test function may return already
stopped error which is fine as well.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250303-kvm_pmu_improve-v2-2-41d177e45929@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
Print out the index of mismatching XSAVE bytes using unsigned decimal
format. Some versions of clang complain about trying to print an integer
as an unsigned char.
x86/sev_smoke_test.c:55:51: error: format specifies type 'unsigned char'
but the argument has type 'int' [-Werror,-Wformat]
Fixes: 8c53183dba ("selftests: kvm: add test for transferring FPU state into VMSA")
Link: https://lore.kernel.org/r/20250228233852.3855676-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
During the initial mprotect(RO) stage of mmu_stress_test, keep vCPUs
spinning until all vCPUs have hit -EFAULT, i.e. until all vCPUs have tried
to write to a read-only page. If a vCPU manages to complete an entire
iteration of the loop without hitting a read-only page, *and* the vCPU
observes mprotect_ro_done before starting a second iteration, then the
vCPU will prematurely fall through to GUEST_SYNC(3) (on x86 and arm64) and
get out of sequence.
Replace the "do-while (!r)" loop around the associated _vcpu_run() with
a single invocation, as barring a KVM bug, the vCPU is guaranteed to hit
-EFAULT, and retrying on success is super confusion, hides KVM bugs, and
complicates this fix. The do-while loop was semi-unintentionally added
specifically to fudge around a KVM x86 bug, and said bug is unhittable
without modifying the test to force x86 down the !(x86||arm64) path.
On x86, if forced emulation is enabled, vcpu_arch_put_guest() may trigger
emulation of the store to memory. Due a (very, very) longstanding bug in
KVM x86's emulator, emulate writes to guest memory that fail during
__kvm_write_guest_page() unconditionally return KVM_EXIT_MMIO. While that
is desirable in the !memslot case, it's wrong in this case as the failure
happens due to __copy_to_user() hitting a read-only page, not an emulated
MMIO region.
But as above, x86 only uses vcpu_arch_put_guest() if the __x86_64__ guards
are clobbered to force x86 down the common path, and of course the
unexpected MMIO is a KVM bug, i.e. *should* cause a test failure.
Fixes: b6c304aec6 ("KVM: selftests: Verify KVM correctly handles mprotect(PROT_READ)")
Reported-by: Yan Zhao <yan.y.zhao@intel.com>
Closes: https://lore.kernel.org/all/20250208105318.16861-1-yan.y.zhao@intel.com
Debugged-by: Yan Zhao <yan.y.zhao@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/r/20250228230804.3845860-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
If the CPU supports Idle HLT, which elides HLT VM-Exits if the vCPU has an
unmasked pending IRQ or NMI, relax the xAPIC IPI test's assertion on the
number of HLT exits to only require that the number of exits is less than
or equal to the number of HLT instructions that were executed. I.e. don't
fail the test if Idle HLT does what it's supposed to do.
Note, unfortunately there's no way to determine if *KVM* supports Idle HLT,
as this_cpu_has() checks raw CPU support, and kvm_cpu_has() checks what can
be exposed to L1, i.e. the latter would check if KVM supports nested Idle
HLT. But, since the assert is purely bonus coverage, checking for CPU
support is good enough.
Cc: Manali Shukla <Manali.Shukla@amd.com>
Tested-by: Manali Shukla <Manali.Shukla@amd.com>
Link: https://lore.kernel.org/r/20250226231809.3183093-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add an L1 (guest) assert to the nested exceptions test to verify that KVM
doesn't put VMRUN in an STI shadow (AMD CPUs bleed the shadow into the
guest's int_state if a #VMEXIT occurs before VMRUN fully completes).
Add a similar assert to the VMX side as well, because why not.
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250224165442.2338294-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
One difference here with other pseudo-firmware bitmap registers
is that the default/reset value for the supported hypercall
function-ids is 0 at present. Hence, modify the test accordingly.
Reviewed-by: Sebastian Ott <sebott@redhat.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Link: https://lore.kernel.org/r/20250221140229.12588-7-shameerali.kolothum.thodi@huawei.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Assert that MIDR_EL1, REVIDR_EL1, AIDR_EL1 are writable from userspace,
that the changed values are visible to guests, and that they are
preserved across a vCPU reset.
Signed-off-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20250225005401.679536-6-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Add a rudimentary test for validating KVM's handling of L1 hypervisor
intercepts during instruction emulation on behalf of L2. To minimize
complexity and avoid overlap with other tests, only validate KVM's
handling of instructions that L1 wants to intercept, i.e. that generate a
nested VM-Exit. Full testing of emulation on behalf of L2 is better
achieved by running existing (forced) emulation tests in a VM, (although
on VMX, getting L0 to emulate on #UD requires modifying either L1 KVM to
not intercept #UD, or modifying L0 KVM to prioritize L0's exception
intercepts over L1's intercepts, as is done by KVM for SVM).
Since emulation should never be successful, i.e. L2 always exits to L1,
dynamically generate the L2 code stream instead of adding a helper for
each instruction. Doing so requires hand coding instruction opcodes, but
makes it significantly easier for the test to compute the expected "next
RIP" and instruction length.
Link: https://lore.kernel.org/r/20250201015518.689704-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Now that the binary stats cache infrastructure is largely scope agnostic,
add support for vCPU-scoped stats. Like VM stats, open and cache the
stats FD when the vCPU is created so that it's guaranteed to be valid when
vcpu_get_stats() is invoked.
Account for the extra per-vCPU file descriptor in kvm_set_files_rlimit(),
so that tests that create large VMs don't run afoul of resource limits.
To sanity check that the infrastructure actually works, and to get a bit
of bonus coverage, add an assert in x86's xapic_ipi_test to verify that
the number of HLTs executed by the test matches the number of HLT exits
observed by KVM.
Tested-by: Manali Shukla <Manali.Shukla@amd.com>
Link: https://lore.kernel.org/r/20250111005049.1247555-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Move the max vCPUs test's RLIMIT_NOFILE adjustments to common code, and
use the new helper to adjust the resource limit for non-barebones VMs by
default. x86's recalc_apic_map_test creates 512 vCPUs, and a future
change will open the binary stats fd for all vCPUs, which will put the
recalc APIC test above some distros' default limit of 1024.
Link: https://lore.kernel.org/r/20250111005049.1247555-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Get and cache a VM's binary stats FD when the VM is opened, as opposed to
waiting until the stats are first used. Opening the stats FD outside of
__vm_get_stat() will allow converting it to a scope-agnostic helper.
Note, this doesn't interfere with kvm_binary_stats_test's testcase that
verifies a stats FD can be used after its own VM's FD is closed, as the
cached FD is also closed during kvm_vm_free().
Link: https://lore.kernel.org/r/20250111005049.1247555-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add a struct and helpers to manage the binary stats cache, which is
currently used only for VM-scoped stats. This will allow expanding the
selftests infrastructure to provide support for vCPU-scoped binary stats,
which, except for the ioctl to get the stats FD are identical to VM-scoped
stats.
Defer converting __vm_get_stat() to a scope-agnostic helper to a future
patch, as getting the stats FD from KVM needs to be moved elsewhere
before it can be made completely scope-agnostic.
Link: https://lore.kernel.org/r/20250111005049.1247555-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Turn vm_get_stat() into a macro that generates a string for the stat name,
as opposed to taking a string. This will allow hardening stat usage in
the future to generate errors on unknown stats at compile time.
No functional change intended.
Link: https://lore.kernel.org/r/20250111005049.1247555-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Fail the test if it attempts to read a stat that doesn't exist, e.g. due
to a typo (hooray, strings), or because the test tried to get a stat for
the wrong scope. As is, there's no indiciation of failure and @data is
left untouched, e.g. holds '0' or random stack data in most cases.
Fixes: 8448ec5993 ("KVM: selftests: Add NX huge pages test")
Link: https://lore.kernel.org/r/20250111005049.1247555-4-seanjc@google.com
[sean: fixup spelling mistake, courtesy of Colin Ian King]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Close/free a VM's binary stats cache when the VM is released, not when the
VM is fully freed. When a VM is re-created, e.g. for state save/restore
tests, the stats FD and descriptor points at the old, defunct VM. The FD
is still valid, in that the underlying stats file won't be freed until the
FD is closed, but reading stats will always pull information from the old
VM.
Note, this is a benign bug in the current code base as none of the tests
that recreate VMs use binary stats.
Fixes: 83f6e109f5 ("KVM: selftests: Cache binary stats metadata for duration of test")
Link: https://lore.kernel.org/r/20250111005049.1247555-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
When allocating and freeing a VM's cached binary stats info, check for a
NULL descriptor, not a '0' file descriptor, as '0' is a legal FD. E.g. in
the unlikely scenario the kernel installs the stats FD at entry '0',
selftests would reallocate on the next __vm_get_stat() and/or fail to free
the stats in kvm_vm_free().
Fixes: 83f6e109f5 ("KVM: selftests: Cache binary stats metadata for duration of test")
Link: https://lore.kernel.org/r/20250111005049.1247555-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Now that dirty_log_test doesn't require running multiple iterations to
verify dirty pages, and actually runs the requested number of iterations,
drop the requirement that the test run at least "3" (which was really "2"
at the time the test was written) iterations.
Link: https://lore.kernel.org/r/20250111003004.1235645-21-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Actually run all requested iterations, instead of iterations-1 (the count
starts at '1' due to the need to avoid '0' as an in-memory value for a
dirty page).
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-20-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Set the per-iteration variables at the start of each iteration instead of
setting them before the loop, and at the end of each iteration. To ensure
the vCPU doesn't race ahead before the first iteration, simply have the
vCPU worker want for sem_vcpu_cont, which conveniently avoids the need to
special case posting sem_vcpu_cont from the loop.
Link: https://lore.kernel.org/r/20250111003004.1235645-19-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Now that each iteration collects all dirty entries and ensures the guest
*completes* at least one write, tighten the exemptions for the last dirty
page of the previous iteration. Specifically, the only legal value (other
than the current iteration) is N-1.
Unlike the last page for the current iteration, the in-progress write from
the previous iteration is guaranteed to have completed, otherwise the test
would have hung.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-18-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Ensure the vCPU fully completes at least one write in each dirty_log_test
iteration, as failure to dirty any pages complicates verification and
forces the test to be overly conservative about possible values. E.g.
verification needs to allow the last dirty page from a previous iteration
to have *any* value, because the vCPU could get stuck for multiple
iterations, which is unlikely but can happen in heavily overloaded and/or
nested virtualization setups.
Somewhat arbitrarily set the minimum to 0x100/256; high enough to be
interesting, but not so high as to lead to pointlessly long runtimes.
Opportunistically report the number of writes per iteration for debug
purposes, and so that a human can sanity check the test. Due to each
write targeting a random page, the number of dirty pages will likely be
lower than the number of total writes, but it shouldn't be absurdly lower
(which would suggest the pRNG is broken)
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-17-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add a sanity check that a completely garbage value wasn't written to
the last dirty page in the ring, e.g. that it doesn't contain the *next*
iteration's value.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-16-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Collect all dirty entries during each iteration of dirty_log_test by
doing a final collection after the vCPU has been stopped. To deal with
KVM's destructive approach to getting the dirty bitmaps, use a second
bitmap for the post-stop collection.
Collecting all entries that were dirtied during an iteration simplifies
the verification logic *and* improves test coverage.
- If a page is written during iteration X, but not seen as dirty until
X+1, the test can get a false pass if the page is also written during
X+1.
- If a dirty page used a stale value from a previous iteration, the test
would grant a false pass.
- If a missed dirty log occurs in the last iteration, the test would fail
to detect the issue.
E.g. modifying mark_page_dirty_in_slot() to dirty an unwritten gfn:
if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
unsigned long rel_gfn = gfn - memslot->base_gfn;
u32 slot = (memslot->as_id << 16) | memslot->id;
if (!vcpu->extra_dirty &&
gfn_to_memslot(kvm, gfn + 1) == memslot) {
vcpu->extra_dirty = true;
mark_page_dirty_in_slot(kvm, memslot, gfn + 1);
}
if (kvm->dirty_ring_size && vcpu)
kvm_dirty_ring_push(vcpu, slot, rel_gfn);
else if (memslot->dirty_bitmap)
set_bit_le(rel_gfn, memslot->dirty_bitmap);
}
isn't detected with the current approach, even with an interval of 1ms
(when running nested in a VM; bare metal would be even *less* likely to
detect the bug due to the vCPU being able to dirty more memory). Whereas
collecting all dirty entries consistently detects failures with an
interval of 700ms or more (the longer interval means a higher probability
of an actual write to the prematurely-dirtied page).
Link: https://lore.kernel.org/r/20250111003004.1235645-15-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Print out the last dirty pages from the current and previous iteration on
verification failures. In many cases, bugs (especially test bugs) occur
on the edges, i.e. on or near the last pages, and being able to correlate
failures with the last pages can aid in debug.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-14-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
When verifying pages in dirty_log_test, immediately continue on all "pass"
scenarios to make the logic consistent in how it handles pass vs. fail.
No functional change intended.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-13-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
When running dirty_log_test using the dirty ring, post to sem_vcpu_stop
only when the main thread has explicitly requested that the vCPU stop.
Synchronizing the vCPU and main thread whenever the dirty ring happens to
be full is unnecessary, as KVM's ABI is to actively prevent the vCPU from
running until the ring is no longer full. I.e. attempting to run the vCPU
will simply result in KVM_EXIT_DIRTY_RING_FULL without ever entering the
guest. And if KVM doesn't exit, e.g. let's the vCPU dirty more pages,
then that's a KVM bug worth finding.
Posting to sem_vcpu_stop on ring full also makes it difficult to get the
test logic right, e.g. it's easy to let the vCPU keep running when it
shouldn't, as a ring full can essentially happen at any given time.
Opportunistically rework the handling of dirty_ring_vcpu_ring_full to
leave it set for the remainder of the iteration in order to simplify the
surrounding logic.
Link: https://lore.kernel.org/r/20250111003004.1235645-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
In the dirty_log_test guest code, exit to userspace only when the vCPU is
explicitly told to stop. Periodically exiting just to check if a flag has
been set is unnecessary, weirdly complex, and wastes time handling exits
that could be used to dirty memory.
Opportunistically convert 'i' to a uint64_t to guard against the unlikely
scenario that guest_num_pages exceeds the storage of an int.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-11-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Now that the vCPU doesn't dirty every page on the first iteration for
architectures that support the dirty ring, honor vcpu_stop in the dirty
ring's vCPU worker, i.e. stop when the main thread says "stop". This will
allow plumbing vcpu_stop into the guest so that the vCPU doesn't need to
periodically exit to userspace just to see if it should stop.
Add a comment explaining that marking all pages as dirty is problematic
for the dirty ring, as it results in the guest getting stuck on "ring
full". This could be addressed by adding a GUEST_SYNC() in that initial
loop, but it's not clear how that would interact with s390's behavior.
Link: https://lore.kernel.org/r/20250111003004.1235645-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
s390 specific workaround causes the dirty-log mode of the test to dirty
all guest memory on the first iteration, which is very slow when the test
is run in a nested VM.
Limit this workaround to s390x.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Continue collecting entries from the dirty ring for the entire time the
vCPU is running. Collecting exactly once all but guarantees the vCPU will
encounter a "ring full" event and stop. While testing ring full is
interesting, stopping and doing nothing is not, especially for larger
intervals as the test effectively does nothing for a much longer time.
To balance continuous collection with letting the guest make forward
progress, chunk the interval waiting into 1ms loops (which also makes
the math dead simple).
To maintain coverage for "ring full", collect entries on subsequent
iterations if and only if the ring has been filled at least once. I.e.
let the ring fill up (if the interval allows), but after that contiuously
empty it so that the vCPU can keep running.
Opportunistically drop unnecessary zero-initialization of "count".
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Cache the page's value during verification in a local variable, re-reading
from the pointer is ugly and error prone, e.g. allows for bugs like
checking the pointer itself instead of the value.
No functional change intended.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Track and print the number of dirty and clear pages for each iteration.
This provides parity between all log modes, and will allow collecting the
dirty ring multiple times per iteration without spamming the console.
Opportunistically drop the "Dirtied N pages" print, which is redundant
and wrong. For the dirty ring testcase, the vCPU isn't guaranteed to
complete a loop. And when the vCPU does complete a loot, there are no
guarantees that it has *dirtied* that many pages; because the writes are
to random address, the vCPU may have written the same page over and over,
i.e. only dirtied one page.
While the number of writes performed by the vCPU is also interesting,
e.g. the pr_info() could be tweaked to use different verbiage, pages_count
doesn't correctly track the number of writes either (because loops aren't
guaranteed to a complete). Delete the print for now, as a future patch
will precisely track the number of writes, at which point the verification
phase can report the number of writes performed by each iteration.
Link: https://lore.kernel.org/r/20250111003004.1235645-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Drop an srandom() initialization that was leftover from the conversion to
use selftests' guest_random_xxx() APIs.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Drop the signal/kick from dirty_log_test's dirty ring handling, as kicking
the vCPU adds marginal value, at the cost of adding significant complexity
to the test.
Asynchronously interrupting the vCPU isn't novel; unless the kernel is
fully tickless, the vCPU will be interrupted by IRQs for any decently
large interval.
And exiting to userspace mode in the middle of a sequence isn't novel
either, as the vCPU will do so every time the ring becomes full.
Link: https://lore.kernel.org/r/20250111003004.1235645-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Sync the new iteration to the guest prior to restarting the vCPU, otherwise
it's possible for the vCPU to dirty memory for the next iteration using the
current iteration's value.
Note, because the guest can be interrupted between the vCPU's load of the
iteration and its write to memory, it's still possible for the guest to
store the previous iteration to memory as the previous iteration may be
cached in a CPU register (which the test accounts for).
Note #2, the test's current approach of collecting dirty entries *before*
stopping the vCPU also results dirty memory having the previous iteration.
E.g. if page is dirtied in the previous iteration, but not the current
iteration, the verification phase will observe the previous iteration's
value in memory. That wart will be remedied in the near future, at which
point synchronizing the iteration before restarting the vCPU will guarantee
the only way for verification to observe stale iterations is due to the
CPU register caching case, or due to a dirty entry being collected before
the store retires.
Link: https://lore.kernel.org/r/20250111003004.1235645-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
If dirty_log_test is run nested, it is possible for entries in the emulated
PML log to appear before the actual memory write is committed to the RAM,
due to the way KVM retries memory writes as a response to a MMU fault.
In addition to that in some very rare cases retry can happen more than
once, which will lead to the test failure because once the write is
finally committed it may have a very outdated iteration value.
Detect and avoid this case.
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Use KVM_ASM_SAFE_FEP, not simply KVM_ASM_SAFE, for kvm_asm_safe_fep(), as
the non-FEP version doesn't force emulation (stating the obvious). Note,
there are currently no users of kvm_asm_safe_fep().
Fixes: ab3b6a7de8 ("KVM: selftests: Add a forced emulation variation of KVM_ASM_SAFE()")
Link: https://lore.kernel.org/r/20250130163135.270770-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add testcases to x86's Hyper-V CPUID test to verify that KVM advertises
support for features that require an in-kernel local APIC appropriately,
i.e. that KVM hides support from the vCPU-scoped ioctl if the VM doesn't
have an in-kernel local APIC.
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20250118003454.2619573-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Allocate, get, and free the CPUID array in the Hyper-V CPUID test in the
test's core helper, instead of copy+pasting code at each call site. In
addition to deduplicating a small amount of code, restricting visibility
of the array to a single invocation of the core test prevents "leaking" an
array across test cases. Passing in @vcpu to the helper will also allow
pivoting on VM-scoped information without needing to pass more booleans,
e.g. to conditionally assert on features that require an in-kernel APIC.
To avoid use-after-free bugs due to overzealous and careless developers,
opportunstically add a comment to explain that the system-scoped helper
caches the Hyper-V CPUID entries, i.e. that the caller is not responsible
for freeing the memory.
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20250118003454.2619573-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Make the Hyper-V CPUID test's local helper test_hv_cpuid_e2big() static,
it's not used outside of the test (and isn't intended to be).
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20250118003454.2619573-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Print out the expected vs. actual count of the Top-Down Slots event on
failure in the Intel PMU counters test. GUEST_ASSERT() only expands
constants/macros, i.e. only prints the value of the expected count, which
makes it difficult to debug and triage failures.
Link: https://lore.kernel.org/r/20250117234204.2600624-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Now that validation of event count is tied to hardware support for event,
and not to guest support for an event, drop the unused "event" parameter
from the various helpers.
No functional change intended.
Link: https://lore.kernel.org/r/20250117234204.2600624-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Drop the local "nr_arch_events" in the Intel PMU counters test as the test
asserts that "nr_arch_events <= NR_INTEL_ARCH_EVENTS", and then sets
nr_arch_events to the max of the two. I.e. nr_arch_events is guaranteed
to be NR_INTEL_ARCH_EVENTS for the meat of the test, just use
NR_INTEL_ARCH_EVENTS directly.
No functional change intended.
Link: https://lore.kernel.org/r/20250117234204.2600624-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
In the Intel PMU counters test, only validate the counts for architectural
events that are supported in hardware. If an arch event isn't supported,
the event selector may enable a completely different event, and thus the
logic for the expected count is bogus.
This fixes test failures on pre-Icelake systems due to the encoding for
the architectural Top-Down Slots event corresponding to something else
(at least on the Skylake family of CPUs).
Note, validation relies on *hardware* support, not KVM support and not
guest support. Architectural events are all about enumerating the event
selector encoding; lack of enumeration for an architectural event doesn't
mean the event itself is unsupported, i.e. the event should still count as
expected even if KVM and/or guest CPUID doesn't enumerate the event as
being "architectural".
Note #2, it's desirable to _program_ the architectural event encoding even
if hardware doesn't support the event. The count can't be validated when
the event is fully enabled, but KVM should still let the guest program the
event selector, and the PMC shouldn't count if the event is disabled.
Fixes: 4f1bd6b160 ("KVM: selftests: Test Intel PMU architectural events on gp counters")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202501141009.30c629b4-lkp@intel.com
Debugged-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250117234204.2600624-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Wrap PMU counter test's array of Intel architectrual in a helper function
so that the events can be queried in multiple locations. Add a comment to
explain the need for a wrapper.
No functional change intended.
Link: https://lore.kernel.org/r/20250117234204.2600624-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
There is a spelling mistake in a literal string and in the function
test_get_inital_dirty. Fix them.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Message-ID: <20250204105647.367743-1-colin.i.king@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In some rare situations a non default storage key is already set on the
memory used by the test. Within normal VMs the key is reset / zapped
when the memory is added to the VM. This is not the case for ucontrol
VMs. With the initial iske check removed this test case can work in all
situations. The function of the iske instruction is still validated by
the remaining code.
Fixes: 0185fbc6a2 ("KVM: s390: selftests: Add uc_skey VM test case")
Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Link: https://lore.kernel.org/r/20250128131803.1047388-1-schlameuss@linux.ibm.com
Message-ID: <20250128131803.1047388-1-schlameuss@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
With the latest patch, attempting to create a memslot from userspace
will result in an EEXIST error for UCONTROL VMs, instead of EINVAL,
since the new memslot will collide with the internal memslot. There is
no simple way to bring back the previous behaviour.
This is not a problem, but the test needs to be fixed accordingly.
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Link: https://lore.kernel.org/r/20250123144627.312456-5-imbrenda@linux.ibm.com
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20250123144627.312456-5-imbrenda@linux.ibm.com>
* New features:
- Support for non-protected guest in protected mode, achieving near
feature parity with the non-protected mode
- Support for the EL2 timers as part of the ongoing NV support
- Allow control of hardware tracing for nVHE/hVHE
* Improvements, fixes and cleanups:
- Massive cleanup of the debug infrastructure, making it a bit less
awkward and definitely easier to maintain. This should pave the
way for further optimisations
- Complete rewrite of pKVM's fixed-feature infrastructure, aligning
it with the rest of KVM and making the code easier to follow
- Large simplification of pKVM's memory protection infrastructure
- Better handling of RES0/RES1 fields for memory-backed system
registers
- Add a workaround for Qualcomm's Snapdragon X CPUs, which suffer
from a pretty nasty timer bug
- Small collection of cleanups and low-impact fixes
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmeYqJcQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNLUhCACxUTMVQXhfW3qbh0UQxPd7XXvjI+Hm7SPS
wDuVTle4jrFVGHxuZqtgWLmx8hD7bqO965qmFgbevKlwsRY33onH2nbH4i4AcwbA
jcdM4yMHZI4+Qmnb4G5ZJ89IwjAhHPZTBOV5KRhyHQ/qtRciHHtOgJde7II9fd68
uIESg4SSSyUzI47YSEHmGVmiBIhdQhq2qust0m6NPFalEGYstPbpluPQ6R1CsDqK
v14TIAW7t0vSPucBeODxhA5gEa2JsvNi+sqA+DF/ELH2ZqpkuR7rofgMGblaXCSD
JXa5xamRB9dI5zi8vatwfOzYlog+/gzmPqMh/9JXpiDGHxJe0vlz
=tQ8F
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull KVM/arm64 updates from Will Deacon:
"New features:
- Support for non-protected guest in protected mode, achieving near
feature parity with the non-protected mode
- Support for the EL2 timers as part of the ongoing NV support
- Allow control of hardware tracing for nVHE/hVHE
Improvements, fixes and cleanups:
- Massive cleanup of the debug infrastructure, making it a bit less
awkward and definitely easier to maintain. This should pave the way
for further optimisations
- Complete rewrite of pKVM's fixed-feature infrastructure, aligning
it with the rest of KVM and making the code easier to follow
- Large simplification of pKVM's memory protection infrastructure
- Better handling of RES0/RES1 fields for memory-backed system
registers
- Add a workaround for Qualcomm's Snapdragon X CPUs, which suffer
from a pretty nasty timer bug
- Small collection of cleanups and low-impact fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (87 commits)
arm64/sysreg: Get rid of TRFCR_ELx SysregFields
KVM: arm64: nv: Fix doc header layout for timers
KVM: arm64: nv: Apply RESx settings to sysreg reset values
KVM: arm64: nv: Always evaluate HCR_EL2 using sanitising accessors
KVM: arm64: Fix selftests after sysreg field name update
coresight: Pass guest TRFCR value to KVM
KVM: arm64: Support trace filtering for guests
KVM: arm64: coresight: Give TRBE enabled state to KVM
coresight: trbe: Remove redundant disable call
arm64/sysreg/tools: Move TRFCR definitions to sysreg
tools: arm64: Update sysreg.h header files
KVM: arm64: Drop pkvm_mem_transition for host/hyp donations
KVM: arm64: Drop pkvm_mem_transition for host/hyp sharing
KVM: arm64: Drop pkvm_mem_transition for FF-A
KVM: arm64: Explicitly handle BRBE traps as UNDEFINED
KVM: arm64: vgic: Use str_enabled_disabled() in vgic_v3_probe()
arm64: kvm: Introduce nvhe stack size constants
KVM: arm64: Fix nVHE stacktrace VA bits mask
KVM: arm64: Fix FEAT_MTE in pKVM
Documentation: Update the behaviour of "kvm-arm.mode"
...
- Svvptc, Zabha, and Ziccrse extension support for Guest/VM
- Virtualize SBI system suspend extension for Guest/VM
- Trap related exit statstics as SBI PMU firmware counters for Guest/VM
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmeJ3zYACgkQrUjsVaLH
LAdd4hAAj7MDKBXFfkgrg6M5b3CgjFp/1FbiQQe+gJNEOrDdWJy115imk2IKbEZq
MAHLF1nlavmi3ZxTCGRPamyyVej5A0+oNujl+ktltHXIKPqXVcJvz5Iu15uDJNZf
Ow6Vz74HONl3UKH1H029hFA8oXBTzlxyLv8EXfUJ/HZ/bUeQAfgX52pdvTZ54o4Z
gJy+dODD7LOIh9WSlqsrqRcaZkQ4uWZNzQVuXQyubon/AhyqLZyoX/Kx+eLg0QSG
LwwXQA5ew4fG3heOWGSSVhiBeqKj7Kmk7l3tyWe6f/YUw7BP1EEwY6ufdH2jy6Z3
mEE/3+mYhkBvr6sWCMPgGt8IMGqe6vRPE1SjPRk3xjzQN4a5aSGXA3J2bBHZhdgt
ksGKD0CNhFv/E9LyeQnIylqQqNL9IIb35hrRmVCGzeOJB5PhqDRZiuyyz4LMOgY9
1uY6c5fuSxV0IIZV2Y4oTxZ26dCBkGzR5rrVBSakSF1xWfU+0rzX91FslEUzPyDO
W+RcG9ziFTdCbYkTGlt6sSiXRwkYe/TD6VpDgsxbQECgW/9Itg2BSCZojuy7Lfo9
idhjHLIouruGyQKrJmadUdOuHzLOCX8XMo1oTjlrPudNoILC6GmZs+X7xUUJ6Fzi
mOgxcUBsByzZLhyhPnbwS0o0D7La7HJbuNne8VSHfEjPhr8/yl0=
=EQ0o
-----END PGP SIGNATURE-----
Merge tag 'kvm-riscv-6.14-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv changes for 6.14
- Svvptc, Zabha, and Ziccrse extension support for Guest/VM
- Virtualize SBI system suspend extension for Guest/VM
- Trap related exit statstics as SBI PMU firmware counters for Guest/VM
- Overhaul KVM's CPUID feature infrastructure to replace "governed" features
with per-vCPU tracking of the vCPU's capabailities for all features. Along
the way, refactor the code to make it easier to add/modify features, and
add a variety of self-documenting macro types to again simplify adding new
features and to help readers understand KVM's handling of existing features.
- Rework KVM's handling of VM-Exits during event vectoring to plug holes where
KVM unintentionally puts the vCPU into infinite loops in some scenarios,
e.g. if emulation is triggered by the exit, and to bring parity between VMX
and SVM.
- Add pending request and interrupt injection information to the kvm_exit and
kvm_entry tracepoints respectively.
- Fix a relatively benign flaw where KVM would end up redoing RDPKRU when
loading guest/host PKRU due to a refactoring of the kernel helpers that
didn't account for KVM's pre-checking of the need to do WRPKRU.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmeJngsACgkQOlYIJqCj
N/1dfA/+NIZmnd8OV9Zvc6HGxrzgt4QsM9pmsUmrfkDWefxMYIAMeaW8Vn4CJfRf
zY/UcqyNI7JYxSiuVTckz+Tf54HhqYaLrUwILGCQ49koirZx+aQT1OUfjLroVMlh
ffX1i6GOoLNtxjb9MXM/heLVdUbvmzQMSFkd/AkOH+nrOtDNOiPlZfjHsewj9zrf
BNJGhzvT4M6vc/AsScC7tc0yFD5KKFRv8tVwJ6Zf1nWKyUDOSpMTWkVnq6geKJPZ
iGBZPPNg55Oy1g6uj6VYWmqYTD8Qioz5jtEJ/8pPHdAyIFo21s81bfJc548d+QLh
KfrL1K7TrCOhSAGC3Cb3lTLeq2immmGHaiTBLwGABG4MhpiX4NVpMMdOyFbVLMOS
HIYuwXwDckm1pfU7/w+PgPaakCyPrXQntm+3Y2pvDOoY6e2JbwodK4j8BvvQda35
8TrYKEGFvq5aij7Iw1O9TUoLAocDM/sHIHE6BCazHyzKBIv9xLRFeabiCQ+A1pwv
gZk5u0+j+DPpLdeLhbMYhIXUtr3bvyMYvc+tRkG716f8ubAE3+Kn5BEDo4Ot2DcT
vc+NTRYYWN6zavHiJH3Ddt153yj256JCZhLwCdfbryCQdz3Mpy16m36tgkDRd3lR
QT4IkPQo1Vl/aU0yiE/dhnJgh1rTO26YQjZoHs5Oj16d0HRrKyc=
=32mM
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-misc-6.14' of https://github.com/kvm-x86/linux into HEAD
KVM x86 misc changes for 6.14:
- Overhaul KVM's CPUID feature infrastructure to track all vCPU capabilities
instead of just those where KVM needs to manage state and/or explicitly
enable the feature in hardware. Along the way, refactor the code to make
it easier to add features, and to make it more self-documenting how KVM
is handling each feature.
- Rework KVM's handling of VM-Exits during event vectoring; this plugs holes
where KVM unintentionally puts the vCPU into infinite loops in some scenarios
(e.g. if emulation is triggered by the exit), and brings parity between VMX
and SVM.
- Add pending request and interrupt injection information to the kvm_exit and
kvm_entry tracepoints respectively.
- Fix a relatively benign flaw where KVM would end up redoing RDPKRU when
loading guest/host PKRU, due to a refactoring of the kernel helpers that
didn't account for KVM's pre-checking of the need to do WRPKRU.
* kvm-arm64/coresight-6.14:
: .
: Trace filtering update from James Clark. From the cover letter:
:
: "The guest filtering rules from the Perf session are now honored for both
: nVHE and VHE modes. This is done by either writing to TRFCR_EL12 at the
: start of the Perf session and doing nothing else further, or caching the
: guest value and writing it at guest switch for nVHE. In pKVM, trace is
: now be disabled for both protected and unprotected guests."
: .
KVM: arm64: Fix selftests after sysreg field name update
coresight: Pass guest TRFCR value to KVM
KVM: arm64: Support trace filtering for guests
KVM: arm64: coresight: Give TRBE enabled state to KVM
coresight: trbe: Remove redundant disable call
arm64/sysreg/tools: Move TRFCR definitions to sysreg
tools: arm64: Update sysreg.h header files
Signed-off-by: Marc Zyngier <maz@kernel.org>
1. Clear LLBCTL if secondary mmu mapping changed.
2. Add hypercall service support for usermode VMM.
This is a really small changeset, because the Chinese New Year
(Spring Festival) is coming. Happy New Year!
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmeFGcIWHGNoZW5odWFj
YWlAa2VybmVsLm9yZwAKCRAChivD8uImern1D/9AZ8M+0nBAaONZaq2qKLC+RaW6
KqvFsR1PUUFzVcQZaHh9OZcx5s4EAH12EaxBH68W0o0ejbTUJp8QXT6cmO9bNFj1
tVaczGACss34kDerrddHisOpimFdaP+ECX4Q43oTc5N7vG6zUu3ijOISnIIxkhHP
RlX/+5Djw0NoaVAkhEj4v+LkY33z5QnDFNI0OjJiHpDepP2vLQ1FD573pLMeqcGs
BVYwZv7DP3SnVajSjRhT/r5qjy9EMjrXRLkIIwyjOUArRaPq/Lfg4CTK85e5MZsR
2GTkdjvh/YArpluRki4FX1cVOwpBbEtC+24/NWB+MPijtnYqMyAoIraZGqJMAzhw
P6W70A15GvBhlhQmvKNai1oXkdZaaT7XDcbFT706Cwhu7LvcNM8kK7VrPc59WLTR
uHO+ehJh0DCpBMC2BKH/8sztGx80u7SB4Ph0ytZCK+uYznTMEiBqRup7E/QLLG+1
EotXv8U4+Bwx/inzMxwi6vR1ZXo0dIDsnvFdSZeA6PC/cSoPzdqCdrXjQT/7HUIu
DNgcsRVL3LFE+A/sDVGb5/w9UPdQfCdO10bu97FkY37ftqp7LvPTlWJvDZJx+Wle
KfErCOM1/ZRQ2knzE7fst58auA3ZFNn3jWRkD/0gJ4X1Fgu63VrYeuc4FL7r8ken
HxKLYOLtD6dOzR5DeA==
=z2VU
-----END PGP SIGNATURE-----
Merge tag 'loongarch-kvm-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.14
1. Clear LLBCTL if secondary mmu mapping changed.
2. Add hypercall service support for usermode VMM.
This is a really small changeset, because the Chinese New Year
(Spring Festival) is coming. Happy New Year!
Fix KVM selftests that check for EL0's 64bit-ness, and use a now
removed definition. Kindly point them at the new one.
Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Fix a latent bug when the kernel is compiled in debug mode.
Two small UCONTROL fixes and their selftests.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEoWuZBM6M3lCBSfTnuARItAMU6BMFAmd+aN0ACgkQuARItAMU
6BOtohAApC3dPKw1pwlxnvZLSHm7TizFpvczHqqyF/sglz1js/YYRDmXPy0wJiw+
mYa6qhhzDD1NsCw+7Fq8Zsh8kh0RjeiBLAJY/651wQUJzkXLed1dqei9uegXsGgi
TqOZct07avQaxrlfH3VHsaE+QOeqQZai5INoEmk7G1U+zwoMBYRNCq2tyULEixCt
8iMg+5cMbXICmiQ9X2KxkslkWwR3+hpSFpOvVATu6R1PO42fLT/Sqxk/YKAQIaq1
9tNkHTUISjz5vLJgOUfuB5kLCQoCwwuONWBREoyKRtWsoy+QdevphIB4LfqDq43T
XOFdSVZa2JNUZa3/HiZJ0UJAk2R1pYXsQJBYgOJMnclqThY677DOetrKR8QNmPwB
+a+2WTLkeJzL7T0a2eChhXqpUd03W6MjAnG8AYAK05imV3PKlaZxiV/Nfzik9ghM
ab75HzaofX6zgJK6VxfEAggMCi84jDmHzlrwwr6DLFqFcsZ7tV5PoPcWIm3f6SpG
l0ARCHXGZ0cIeBwYG7ZIzHVMpzATwea61ov7LLTm9jUAIF0di67ry/e6KY/SJYiP
uKI6S6iyjjUl1VbqK2gAiU8fTscAY9oxd54NYdQ9r3gHYjaWI3ikpWz6EMotIvnh
bDWv0ZC9fF2qlajrenD+nc7q+s2N1UWaH8NYEkPBeMrY9FHmmD4=
=NID3
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-master-6.13-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: three small bugfixes
Fix a latent bug when the kernel is compiled in debug mode.
Two small UCONTROL fixes and their selftests.
Copy KVM-Unit-Tests' x86 helpers for emitting STI and CLI, comments and
all, and use them throughout x86 selftests. The safe_halt() and sti_nop()
logic in particular benefits from centralized comments, as the behavior
isn't obvious unless the reader is already aware of the STI shadow.
Cc: Manali Shukla <Manali.Shukla@amd.com>
Link: https://lore.kernel.org/r/20241220012617.3513898-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>