Armv8.9/v9.4 introduces the feature Hardware managed Access Flag
for Table descriptors (FEAT_HAFT). The feature is indicated by
ID_AA64MMFR1_EL1.HAFDBS == 0b0011 and can be enabled by
TCR2_EL1.HAFT so it has a dependency on FEAT_TCR2.
Adds the Kconfig for FEAT_HAFT and support detecting and enabling
the feature. The feature is enabled in __cpu_setup() before MMU on
just like HA. A CPU capability is added to notify the user of the
feature.
Add definition of P{G,4,U,M}D_TABLE_AF bit and set the AF bit
when creating the page table, which will save the hardware
from having to update them at runtime. This will be ignored if
FEAT_HAFT is not enabled.
The AF bit of table descriptors cannot be managed by the software
per spec, unlike the HA. So this should be used only if it's supported
system wide by system_supports_haft().
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241102104235.62560-4-yangyicong@huawei.com
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
[catalin.marinas@arm.com: added the ID check back to __cpu_setup in case of future CPU errata]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Despite KVM now being able to deal with XS-tagged TLBIs, we still don't
expose these feature bits to KVM.
Plumb in the feature in ID_AA64ISAR1_EL1.
Fixes: 0feec7769a ("KVM: arm64: nv: Add handling of NXS-flavoured TLBI operations")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241031083519.364313-1-maz@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
ARMv8.4 adds support for 'Memory Partitioning And Monitoring' (MPAM)
which describes an interface to cache and bandwidth controls wherever
they appear in the system.
Add support to detect MPAM. Like SVE, MPAM has an extra id register that
describes some more properties, including the virtualisation support,
which is optional. Detect this separately so we can detect
mismatched/insane systems, but still use MPAM on the host even if the
virtualisation support is missing.
MPAM needs enabling at the highest implemented exception level, otherwise
the register accesses trap. The 'enabled' flag is accessible to lower
exception levels, but its in a register that traps when MPAM isn't enabled.
The cpufeature 'matches' hook is extended to test this on one of the
CPUs, so that firmware can emulate MPAM as disabled if it is reserved
for use by secure world.
Secondary CPUs that appear late could trip cpufeature's 'lower safe'
behaviour after the MPAM properties have been advertised to user-space.
Add a verify call to ensure late secondaries match the existing CPUs.
(If you have a boot failure that bisects here its likely your CPUs
advertise MPAM in the id registers, but firmware failed to either enable
or MPAM, or emulate the trap as if it were disabled)
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241030160317.2528209-4-joey.gouly@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
We have filled all 64 bits of AT_HWCAP2 so in order to support discovery of
further features provide the framework to use the already defined AT_HWCAP3
for further CPU features.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20241004-arm64-elf-hwcap3-v2-2-799d1daad8b0@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Provide a hwcap to enable userspace to detect support for GCS.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Yury Khrustalev <yury.khrustalev@arm.com>
Link: https://lore.kernel.org/r/20241001-arm64-gcs-v13-18-222b78d87eee@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add a cpufeature for GCS, allowing other code to conditionally support it
at runtime.
Reviewed-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241001-arm64-gcs-v13-12-222b78d87eee@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Expose a HWCAP and ID_AA64MMFR3_EL1_S1POE to userspace, so they can be used to
check if the CPU supports the feature.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20240822151113.1479789-12-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
This indicates if the system supports POE. This is a CPUCAP_BOOT_CPU_FEATURE
as the boot CPU will enable POE if it has it, so secondary CPUs must also
have this feature.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20240822151113.1479789-6-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
This replaces custom macros usage (i.e ID_AA64PFR0_EL1_ELx_64BIT_ONLY and
ID_AA64PFR0_EL1_ELx_32BIT_64BIT) and instead directly uses register fields
from ID_AA64PFR0_EL1 sysreg definition. Finally let's drop off both these
custom macros as they are now redundant.
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20240613102710.3295108-3-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Enable ECBHB bits in ID_AA64MMFR1 register as per ARM DDI 0487K.a
specification.
When guest OS read ID_AA64MMFR1_EL1, kvm emulate this reg using
ftr_id_aa64mmfr1 and always return ID_AA64MMFR1_EL1.ECBHB=0 to guest.
It results in guest syscall jump to tramp ventry, which is not needed
in implementation with ID_AA64MMFR1_EL1.ECBHB=1.
Let's make the guest syscall process the same as the host.
Signed-off-by: Nianyao Tang <tangnianyao@huawei.com>
Link: https://lore.kernel.org/r/20240611122049.2758600-1-tangnianyao@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cortex-X4 and Neoverse-V3 suffer from errata whereby an MSR to the SSBS
special-purpose register does not affect subsequent speculative
instructions, permitting speculative store bypassing for a window of
time. This is described in their Software Developer Errata Notice (SDEN)
documents:
* Cortex-X4 SDEN v8.0, erratum 3194386:
https://developer.arm.com/documentation/SDEN-2432808/0800/
* Neoverse-V3 SDEN v6.0, erratum 3312417:
https://developer.arm.com/documentation/SDEN-2891958/0600/
To workaround these errata, it is necessary to place a speculation
barrier (SB) after MSR to the SSBS special-purpose register. This patch
adds the requisite SB after writes to SSBS within the kernel, and hides
the presence of SSBS from EL0 such that userspace software which cares
about SSBS will manipulate this via prctl(PR_GET_SPECULATION_CTRL, ...).
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240508081400.235362-5-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
* Changes to FPU handling came in via the main s390 pull request
* Only deliver to the guest the SCLP events that userspace has
requested.
* More virtual vs physical address fixes (only a cleanup since
virtual and physical address spaces are currently the same).
* Fix selftests undefined behavior.
x86:
* Fix a restriction that the guest can't program a PMU event whose
encoding matches an architectural event that isn't included in the
guest CPUID. The enumeration of an architectural event only says
that if a CPU supports an architectural event, then the event can be
programmed *using the architectural encoding*. The enumeration does
NOT say anything about the encoding when the CPU doesn't report support
the event *in general*. It might support it, and it might support it
using the same encoding that made it into the architectural PMU spec.
* Fix a variety of bugs in KVM's emulation of RDPMC (more details on
individual commits) and add a selftest to verify KVM correctly emulates
RDMPC, counter availability, and a variety of other PMC-related
behaviors that depend on guest CPUID and therefore are easier to
validate with selftests than with custom guests (aka kvm-unit-tests).
* Zero out PMU state on AMD if the virtual PMU is disabled, it does not
cause any bug but it wastes time in various cases where KVM would check
if a PMC event needs to be synthesized.
* Optimize triggering of emulated events, with a nice ~10% performance
improvement in VM-Exit microbenchmarks when a vPMU is exposed to the
guest.
* Tighten the check for "PMI in guest" to reduce false positives if an NMI
arrives in the host while KVM is handling an IRQ VM-Exit.
* Fix a bug where KVM would report stale/bogus exit qualification information
when exiting to userspace with an internal error exit code.
* Add a VMX flag in /proc/cpuinfo to report 5-level EPT support.
* Rework TDP MMU root unload, free, and alloc to run with mmu_lock held for
read, e.g. to avoid serializing vCPUs when userspace deletes a memslot.
* Tear down TDP MMU page tables at 4KiB granularity (used to be 1GiB). KVM
doesn't support yielding in the middle of processing a zap, and 1GiB
granularity resulted in multi-millisecond lags that are quite impolite
for CONFIG_PREEMPT kernels.
* Allocate write-tracking metadata on-demand to avoid the memory overhead when
a kernel is built with i915 virtualization support but the workloads use
neither shadow paging nor i915 virtualization.
* Explicitly initialize a variety of on-stack variables in the emulator that
triggered KMSAN false positives.
* Fix the debugregs ABI for 32-bit KVM.
* Rework the "force immediate exit" code so that vendor code ultimately decides
how and when to force the exit, which allowed some optimization for both
Intel and AMD.
* Fix a long-standing bug where kvm_has_noapic_vcpu could be left elevated if
vCPU creation ultimately failed, causing extra unnecessary work.
* Cleanup the logic for checking if the currently loaded vCPU is in-kernel.
* Harden against underflowing the active mmu_notifier invalidation
count, so that "bad" invalidations (usually due to bugs elsehwere in the
kernel) are detected earlier and are less likely to hang the kernel.
x86 Xen emulation:
* Overlay pages can now be cached based on host virtual address,
instead of guest physical addresses. This removes the need to
reconfigure and invalidate the cache if the guest changes the
gpa but the underlying host virtual address remains the same.
* When possible, use a single host TSC value when computing the deadline for
Xen timers in order to improve the accuracy of the timer emulation.
* Inject pending upcall events when the vCPU software-enables its APIC to fix
a bug where an upcall can be lost (and to follow Xen's behavior).
* Fall back to the slow path instead of warning if "fast" IRQ delivery of Xen
events fails, e.g. if the guest has aliased xAPIC IDs.
RISC-V:
* Support exception and interrupt handling in selftests
* New self test for RISC-V architectural timer (Sstc extension)
* New extension support (Ztso, Zacas)
* Support userspace emulation of random number seed CSRs.
ARM:
* Infrastructure for building KVM's trap configuration based on the
architectural features (or lack thereof) advertised in the VM's ID
registers
* Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to
x86's WC) at stage-2, improving the performance of interacting with
assigned devices that can tolerate it
* Conversion of KVM's representation of LPIs to an xarray, utilized to
address serialization some of the serialization on the LPI injection
path
* Support for _architectural_ VHE-only systems, advertised through the
absence of FEAT_E2H0 in the CPU's ID register
* Miscellaneous cleanups, fixes, and spelling corrections to KVM and
selftests
LoongArch:
* Set reserved bits as zero in CPUCFG.
* Start SW timer only when vcpu is blocking.
* Do not restart SW timer when it is expired.
* Remove unnecessary CSR register saving during enter guest.
* Misc cleanups and fixes as usual.
Generic:
* cleanup Kconfig by removing CONFIG_HAVE_KVM, which was basically always
true on all architectures except MIPS (where Kconfig determines the
available depending on CPU capabilities). It is replaced either by
an architecture-dependent symbol for MIPS, and IS_ENABLED(CONFIG_KVM)
everywhere else.
* Factor common "select" statements in common code instead of requiring
each architecture to specify it
* Remove thoroughly obsolete APIs from the uapi headers.
* Move architecture-dependent stuff to uapi/asm/kvm.h
* Always flush the async page fault workqueue when a work item is being
removed, especially during vCPU destruction, to ensure that there are no
workers running in KVM code when all references to KVM-the-module are gone,
i.e. to prevent a very unlikely use-after-free if kvm.ko is unloaded.
* Grab a reference to the VM's mm_struct in the async #PF worker itself instead
of gifting the worker a reference, so that there's no need to remember
to *conditionally* clean up after the worker.
Selftests:
* Reduce boilerplate especially when utilize selftest TAP infrastructure.
* Add basic smoke tests for SEV and SEV-ES, along with a pile of library
support for handling private/encrypted/protected memory.
* Fix benign bugs where tests neglect to close() guest_memfd files.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmX0iP8UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroND7wf+JZoNvwZ+bmwWe/4jn/YwNoYi/C5z
eypn8M1gsWEccpCpqPBwznVm9T29rF4uOlcMvqLEkHfTpaL1EKUUjP1lXPz/ileP
6a2RdOGxAhyTiFC9fjy+wkkjtLbn1kZf6YsS0hjphP9+w0chNbdn0w81dFVnXryd
j7XYI8R/bFAthNsJOuZXSEjCfIHxvTTG74OrTf1B1FEBB+arPmrgUeJftMVhffQK
Sowgg8L/Ii/x6fgV5NZQVSIyVf1rp8z7c6UaHT4Fwb0+RAMW8p9pYv9Qp1YkKp8y
5j0V9UzOHP7FRaYimZ5BtwQoqiZXYylQ+VuU/Y2f4X85cvlLzSqxaEMAPA==
=mqOV
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"S390:
- Changes to FPU handling came in via the main s390 pull request
- Only deliver to the guest the SCLP events that userspace has
requested
- More virtual vs physical address fixes (only a cleanup since
virtual and physical address spaces are currently the same)
- Fix selftests undefined behavior
x86:
- Fix a restriction that the guest can't program a PMU event whose
encoding matches an architectural event that isn't included in the
guest CPUID. The enumeration of an architectural event only says
that if a CPU supports an architectural event, then the event can
be programmed *using the architectural encoding*. The enumeration
does NOT say anything about the encoding when the CPU doesn't
report support the event *in general*. It might support it, and it
might support it using the same encoding that made it into the
architectural PMU spec
- Fix a variety of bugs in KVM's emulation of RDPMC (more details on
individual commits) and add a selftest to verify KVM correctly
emulates RDMPC, counter availability, and a variety of other
PMC-related behaviors that depend on guest CPUID and therefore are
easier to validate with selftests than with custom guests (aka
kvm-unit-tests)
- Zero out PMU state on AMD if the virtual PMU is disabled, it does
not cause any bug but it wastes time in various cases where KVM
would check if a PMC event needs to be synthesized
- Optimize triggering of emulated events, with a nice ~10%
performance improvement in VM-Exit microbenchmarks when a vPMU is
exposed to the guest
- Tighten the check for "PMI in guest" to reduce false positives if
an NMI arrives in the host while KVM is handling an IRQ VM-Exit
- Fix a bug where KVM would report stale/bogus exit qualification
information when exiting to userspace with an internal error exit
code
- Add a VMX flag in /proc/cpuinfo to report 5-level EPT support
- Rework TDP MMU root unload, free, and alloc to run with mmu_lock
held for read, e.g. to avoid serializing vCPUs when userspace
deletes a memslot
- Tear down TDP MMU page tables at 4KiB granularity (used to be
1GiB). KVM doesn't support yielding in the middle of processing a
zap, and 1GiB granularity resulted in multi-millisecond lags that
are quite impolite for CONFIG_PREEMPT kernels
- Allocate write-tracking metadata on-demand to avoid the memory
overhead when a kernel is built with i915 virtualization support
but the workloads use neither shadow paging nor i915 virtualization
- Explicitly initialize a variety of on-stack variables in the
emulator that triggered KMSAN false positives
- Fix the debugregs ABI for 32-bit KVM
- Rework the "force immediate exit" code so that vendor code
ultimately decides how and when to force the exit, which allowed
some optimization for both Intel and AMD
- Fix a long-standing bug where kvm_has_noapic_vcpu could be left
elevated if vCPU creation ultimately failed, causing extra
unnecessary work
- Cleanup the logic for checking if the currently loaded vCPU is
in-kernel
- Harden against underflowing the active mmu_notifier invalidation
count, so that "bad" invalidations (usually due to bugs elsehwere
in the kernel) are detected earlier and are less likely to hang the
kernel
x86 Xen emulation:
- Overlay pages can now be cached based on host virtual address,
instead of guest physical addresses. This removes the need to
reconfigure and invalidate the cache if the guest changes the gpa
but the underlying host virtual address remains the same
- When possible, use a single host TSC value when computing the
deadline for Xen timers in order to improve the accuracy of the
timer emulation
- Inject pending upcall events when the vCPU software-enables its
APIC to fix a bug where an upcall can be lost (and to follow Xen's
behavior)
- Fall back to the slow path instead of warning if "fast" IRQ
delivery of Xen events fails, e.g. if the guest has aliased xAPIC
IDs
RISC-V:
- Support exception and interrupt handling in selftests
- New self test for RISC-V architectural timer (Sstc extension)
- New extension support (Ztso, Zacas)
- Support userspace emulation of random number seed CSRs
ARM:
- Infrastructure for building KVM's trap configuration based on the
architectural features (or lack thereof) advertised in the VM's ID
registers
- Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to
x86's WC) at stage-2, improving the performance of interacting with
assigned devices that can tolerate it
- Conversion of KVM's representation of LPIs to an xarray, utilized
to address serialization some of the serialization on the LPI
injection path
- Support for _architectural_ VHE-only systems, advertised through
the absence of FEAT_E2H0 in the CPU's ID register
- Miscellaneous cleanups, fixes, and spelling corrections to KVM and
selftests
LoongArch:
- Set reserved bits as zero in CPUCFG
- Start SW timer only when vcpu is blocking
- Do not restart SW timer when it is expired
- Remove unnecessary CSR register saving during enter guest
- Misc cleanups and fixes as usual
Generic:
- Clean up Kconfig by removing CONFIG_HAVE_KVM, which was basically
always true on all architectures except MIPS (where Kconfig
determines the available depending on CPU capabilities). It is
replaced either by an architecture-dependent symbol for MIPS, and
IS_ENABLED(CONFIG_KVM) everywhere else
- Factor common "select" statements in common code instead of
requiring each architecture to specify it
- Remove thoroughly obsolete APIs from the uapi headers
- Move architecture-dependent stuff to uapi/asm/kvm.h
- Always flush the async page fault workqueue when a work item is
being removed, especially during vCPU destruction, to ensure that
there are no workers running in KVM code when all references to
KVM-the-module are gone, i.e. to prevent a very unlikely
use-after-free if kvm.ko is unloaded
- Grab a reference to the VM's mm_struct in the async #PF worker
itself instead of gifting the worker a reference, so that there's
no need to remember to *conditionally* clean up after the worker
Selftests:
- Reduce boilerplate especially when utilize selftest TAP
infrastructure
- Add basic smoke tests for SEV and SEV-ES, along with a pile of
library support for handling private/encrypted/protected memory
- Fix benign bugs where tests neglect to close() guest_memfd files"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
selftests: kvm: remove meaningless assignments in Makefiles
KVM: riscv: selftests: Add Zacas extension to get-reg-list test
RISC-V: KVM: Allow Zacas extension for Guest/VM
KVM: riscv: selftests: Add Ztso extension to get-reg-list test
RISC-V: KVM: Allow Ztso extension for Guest/VM
RISC-V: KVM: Forward SEED CSR access to user space
KVM: riscv: selftests: Add sstc timer test
KVM: riscv: selftests: Change vcpu_has_ext to a common function
KVM: riscv: selftests: Add guest helper to get vcpu id
KVM: riscv: selftests: Add exception handling support
LoongArch: KVM: Remove unnecessary CSR register saving during enter guest
LoongArch: KVM: Do not restart SW timer when it is expired
LoongArch: KVM: Start SW timer only when vcpu is blocking
LoongArch: KVM: Set reserved bits as zero in CPUCFG
KVM: selftests: Explicitly close guest_memfd files in some gmem tests
KVM: x86/xen: fix recursive deadlock in timer injection
KVM: pfncache: simplify locking and make more self-contained
KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery
KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled
KVM: x86/xen: improve accuracy of Xen timers
...
* for-next/stage1-lpa2: (48 commits)
: Add support for LPA2 and WXN and stage 1
arm64/mm: Avoid ID mapping of kpti flag if it is no longer needed
arm64/mm: Use generic __pud_free() helper in pud_free() implementation
arm64: gitignore: ignore relacheck
arm64: Use Signed/Unsigned enums for TGRAN{4,16,64} and VARange
arm64: mm: Make PUD folding check in set_pud() a runtime check
arm64: mm: add support for WXN memory translation attribute
mm: add arch hook to validate mmap() prot flags
arm64: defconfig: Enable LPA2 support
arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs
arm64: kvm: avoid CONFIG_PGTABLE_LEVELS for runtime levels
arm64: ptdump: Deal with translation levels folded at runtime
arm64: ptdump: Disregard unaddressable VA space
arm64: mm: Add support for folding PUDs at runtime
arm64: kasan: Reduce minimum shadow alignment and enable 5 level paging
arm64: mm: Add 5 level paging support to fixmap and swapper handling
arm64: Enable LPA2 at boot if supported by the system
arm64: mm: add LPA2 and 5 level paging support to G-to-nG conversion
arm64: mm: Add definitions to support 5 levels of paging
arm64: mm: Add LPA2 support to phys<->pte conversion routines
arm64: mm: Wire up TCR.DS bit to PTE shareability fields
...
* arm64/for-next/perf: (39 commits)
docs: perf: Fix build warning of hisi-pcie-pmu.rst
perf: starfive: Only allow COMPILE_TEST for 64-bit architectures
MAINTAINERS: Add entry for StarFive StarLink PMU
docs: perf: Add description for StarFive's StarLink PMU
dt-bindings: perf: starfive: Add JH8100 StarLink PMU
perf: starfive: Add StarLink PMU support
docs: perf: Update usage for target filter of hisi-pcie-pmu
drivers/perf: hisi_pcie: Merge find_related_event() and get_event_idx()
drivers/perf: hisi_pcie: Relax the check on related events
drivers/perf: hisi_pcie: Check the target filter properly
drivers/perf: hisi_pcie: Add more events for counting TLP bandwidth
drivers/perf: hisi_pcie: Fix incorrect counting under metric mode
drivers/perf: hisi_pcie: Introduce hisi_pcie_pmu_get_event_ctrl_val()
drivers/perf: hisi_pcie: Rename hisi_pcie_pmu_{config,clear}_filter()
drivers/perf: hisi: Enable HiSilicon Erratum 162700402 quirk for HIP09
perf/arm_cspmu: Add devicetree support
dt-bindings/perf: Add Arm CoreSight PMU
perf/arm_cspmu: Simplify counter reset
perf/arm_cspmu: Simplify attribute groups
perf/arm_cspmu: Simplify initialisation
...
* for-next/reorg-va-space:
: Reorganise the arm64 kernel VA space in preparation for LPA2 support
: (52-bit VA/PA).
arm64: kaslr: Adjust randomization range dynamically
arm64: mm: Reclaim unused vmemmap region for vmalloc use
arm64: vmemmap: Avoid base2 order of struct page size to dimension region
arm64: ptdump: Discover start of vmemmap region at runtime
arm64: ptdump: Allow all region boundaries to be defined at boot time
arm64: mm: Move fixmap region above vmemmap region
arm64: mm: Move PCI I/O emulation region above the vmemmap region
* for-next/rust-for-arm64:
: Enable Rust support for arm64
arm64: rust: Enable Rust support for AArch64
rust: Refactor the build target to allow the use of builtin targets
* for-next/misc:
: Miscellaneous arm64 patches
ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512
arm64: Remove enable_daif macro
arm64/hw_breakpoint: Directly use ESR_ELx_WNR for an watchpoint exception
arm64: cpufeatures: Clean up temporary variable to simplify code
arm64: Update setup_arch() comment on interrupt masking
arm64: remove unnecessary ifdefs around is_compat_task()
arm64: ftrace: Don't forbid CALL_OPS+CC_OPTIMIZE_FOR_SIZE with Clang
arm64/sme: Ensure that all fields in SMCR_EL1 are set to known values
arm64/sve: Ensure that all fields in ZCR_EL1 are set to known values
arm64/sve: Document that __SVE_VQ_MAX is much larger than needed
arm64: make member of struct pt_regs and it's offset macro in the same order
arm64: remove unneeded BUILD_BUG_ON assertion
arm64: kretprobes: acquire the regs via a BRK exception
arm64: io: permit offset addressing
arm64: errata: Don't enable workarounds for "rare" errata by default
* for-next/daif-cleanup:
: Clean up DAIF handling for EL0 returns
arm64: Unmask Debug + SError in do_notify_resume()
arm64: Move do_notify_resume() to entry-common.c
arm64: Simplify do_notify_resume() DAIF masking
* for-next/kselftest:
: Miscellaneous arm64 kselftest patches
kselftest/arm64: Test that ptrace takes effect in the target process
* for-next/documentation:
: arm64 documentation patches
arm64/sme: Remove spurious 'is' in SME documentation
arm64/fp: Clarify effect of setting an unsupported system VL
arm64/sme: Fix cut'n'paste in ABI document
arm64/sve: Remove bitrotted comment about syscall behaviour
* for-next/sysreg:
: sysreg updates
arm64/sysreg: Update ID_AA64DFR0_EL1 register
arm64/sysreg: Update ID_DFR0_EL1 register fields
arm64/sysreg: Add register fields for ID_AA64DFR1_EL1
* for-next/dpisa:
: Support for 2023 dpISA extensions
kselftest/arm64: Add 2023 DPISA hwcap test coverage
kselftest/arm64: Add basic FPMR test
kselftest/arm64: Handle FPMR context in generic signal frame parser
arm64/hwcap: Define hwcaps for 2023 DPISA features
arm64/ptrace: Expose FPMR via ptrace
arm64/signal: Add FPMR signal handling
arm64/fpsimd: Support FEAT_FPMR
arm64/fpsimd: Enable host kernel access to FPMR
arm64/cpufeature: Hook new identification registers up to cpufeature
The 2023 architecture extensions include a large number of floating point
features, most of which simply add new instructions. Add hwcaps so that
userspace can enumerate these features.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-6-c568edc8ed7f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
FEAT_FPMR defines a new EL0 accessible register FPMR use to configure the
FP8 related features added to the architecture at the same time. Detect
support for this register and context switch it for EL0 when present.
Due to the sharing of responsibility for saving floating point state
between the host kernel and KVM FP8 support is not yet implemented in KVM
and a stub similar to that used for SVCR is provided for FPMR in order to
avoid bisection issues. To make it easier to share host state with the
hypervisor we store FPMR as a hardened usercopy field in uw (along with
some padding).
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-3-c568edc8ed7f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The 2023 architecture extensions have defined several new ID registers,
hook them up to the cpufeature code so we can add feature checks and hwcaps
based on their contents.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-1-c568edc8ed7f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Clean up one temporary variable to simplifiy code in capability
detection.
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20240229105208.456704-1-liaochang1@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Open-coding the feature matching parameters for LVA/LVA2 leads to
issues with upcoming changes to the cpufeature code.
By making TGRAN{4,16,64} and VARange signed/unsigned as per the
architecture, we can use the existing macros, making the feature
match robust against those changes.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Update Kconfig to permit 4k and 16k granule configurations to be built
with 52-bit virtual addressing, now that all the prerequisites are in
place.
While at it, update the feature description so it matches on the
appropriate feature bits depending on the page size. For simplicity,
let's just keep ARM64_HAS_VA52 as the feature name.
Note that LPA2 based 52-bit virtual addressing requires 52-bit physical
addressing support to be enabled as well, as programming TCR.TxSZ to
values below 16 is not allowed unless TCR.DS is set, which is what
activates the 52-bit physical addressing support.
While supporting the converse (52-bit physical addressing without 52-bit
virtual addressing) would be possible in principle, let's keep things
simple, by only allowing these features to be enabled at the same time.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-85-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order to support LPA2 on 16k pages in a way that permits non-LPA2
systems to run the same kernel image, we have to be able to fall back to
at most 48 bits of virtual addressing.
Falling back to 48 bits would result in a level 0 with only 2 entries,
which is suboptimal in terms of TLB utilization. So instead, let's fall
back to 47 bits in that case. This means we need to be able to fold PUDs
dynamically, similar to how we fold P4Ds for 48 bit virtual addressing
on LPA2 with 4k pages.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-81-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add support for 5 level paging in the G-to-nG routine that creates its
own temporary page tables to traverse the swapper page tables. Also add
support for running the 5 level configuration with the top level folded
at runtime, to support CPUs that do not implement the LPA2 extension.
While at it, wire up the level skipping logic so it will also trigger on
4 level configurations with LPA2 enabled at build time but not active at
runtime, as we'll fall back to 3 level paging in that case.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-77-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add support for overriding the VARange field of the MMFR2 CPU ID
register. This permits the associated LVA feature to be overridden early
enough for the boot code that creates the kernel mapping to take it into
account.
Given that LPA2 implies LVA, disabling the latter should disable the
former as well. So override the ID_AA64MMFR0.TGran field of the current
page size as well if it advertises support for 52-bit addressing.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-71-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently, we detect CPU support for 52-bit virtual addressing (LVA)
extremely early, before creating the kernel page tables or enabling the
MMU. We cannot override the feature this early, and so large virtual
addressing is always enabled on CPUs that implement support for it if
the software support for it was enabled at build time. It also means we
rely on non-trivial code in asm to deal with this feature.
Given that both the ID map and the TTBR1 mapping of the kernel image are
guaranteed to be 48-bit addressable, it is not actually necessary to
enable support this early, and instead, we can model it as a CPU
feature. That way, we can rely on code patching to get the correct
TCR.T1SZ values programmed on secondary boot and resume from suspend.
On the primary boot path, we simply enable the MMU with 48-bit virtual
addressing initially, and update TCR.T1SZ if LVA is supported from C
code, right before creating the kernel mapping. Given that TTBR1 still
points to reserved_pg_dir at this point, updating TCR.T1SZ should be
safe without the need for explicit TLB maintenance.
Since this gets rid of all accesses to the vabits_actual variable from
asm code that occurred before TCR.T1SZ had been programmed, we no longer
have a need for this variable, and we can replace it with a C expression
that produces the correct value directly, based on the value of TCR.T1SZ.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-70-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In preparation for moving the first assignment of arm64_use_ng_mappings
to an earlier stage in the boot, ensure that kaslr_requires_kpti() is
accessible without relying on the core kernel's view on whether or not
KASLR is enabled. So make it a static inline, and move the
kaslr_enabled() check out of it and into the callers, one of which will
disappear in a subsequent patch.
Once/when support for the obsolete ThunderX 1 platform is dropped, this
check reduces to a E0PD feature check on the local CPU.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-61-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add some helpers to extract and apply feature overrides to the bare
idreg values. This involves inspecting the value and mask of the
specific field that we are interested in, given that an override
value/mask pair might be invalid for one field but valid for another.
Then, wire up the new helper for the hVHE test - note that we can drop
the sysreg test here, as the override will be invalid when trying to
enable hVHE on non-VHE capable hardware.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-55-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order to allow the CPU feature override detection code to run even
earlier, move the feature override global variables into BSS, which is
the only part of the static kernel image that is mapped read-write in
the initial ID map.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-52-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Using this_cpu_has_cap() has the potential to go wrong when
used system-wide on a preemptible kernel. Instead, use the
__system_matches_cap() helper when checking for FEAT_NV in the
FEAT_NV1 probing helper.
Fixes: 3673d01a2f ("arm64: cpufeatures: Only check for NV1 if NV is present")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/kvmarm/86bk8k5ts3.wl-maz@kernel.org/
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
We handle ID_AA64MMFR4_EL1.E2H0 being 0 as NV1 being present.
However, this is only true if FEAT_NV is implemented.
Add the required check to has_nv1(), avoiding spuriously advertising
NV1 on HW that doesn't have NV at all.
Fixes: da9af5071b ("arm64: cpufeature: Detect HCR_EL2.NV1 being RES0")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240212144736.1933112-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
When triggering a CPU hotplug scenario, we reparse the CPU feature
with SCOPE_LOCAL_CPU, for which we use __read_sysreg_by_encoding()
to get the HW value for this CPU.
As it turns out, we're missing the handling for ID_AA64MMFR4_EL1,
and trigger a BUG(). Funnily enough, Marek isn't completely happy
about that.
Add the damn register to the list.
Fixes: 805bb61f82 ("arm64: cpufeature: Add ID_AA64MMFR4_EL1 handling")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240212144736.1933112-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Although the Apple M2 family of CPUs can have HCR_EL2.NV1 being
set and clear, with the change in trap behaviour being OK, they
explode spectacularily on an EL2 S1 page table using the nVHE
format. This is no good.
Let's pretend this HW doesn't have NV1, and move along.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20240122181344.258974-11-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
A variant of FEAT_E2H0 not being implemented exists in the form of
HCR_EL2.E2H being RES1 *and* HCR_EL2.NV1 being RES0 (indicating that
only VHE is supported on the host and nested guests).
Add the necessary infrastructure for this new CPU capability.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20240122181344.258974-7-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Add ID_AA64MMFR4_EL1 to the list of idregs the kernel knows about,
and describe the E2H0 field.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20240122181344.258974-6-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
When a field gets overriden, the kernel indicates the result of
the override in dmesg. This works well with unsigned fields, but
results in a pretty ugly output when the field is signed.
Truncate the field to its width before displaying it.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20240122181344.258974-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Although we've had signed values for some features such as PMUv3
and FP, the code that handles the comparaison with some limit
has a couple of annoying issues:
- the min_field_value is always unsigned, meaning that we cannot
easily compare it with a negative value
- it is not possible to have a range of values, let alone a range
of negative values
Fix this by:
- adding an upper limit to the comparison, defaulting to all bits
being set to the maximum positive value
- ensuring that the signess of the min and max values are taken into
account
A ARM64_CPUID_FIELDS_NEG() macro is provided for signed features, but
nothing is using it yet.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20240122181344.258974-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
- Use memdup_array_user() to harden against overflow.
- Unconditionally advertise KVM_CAP_DEVICE_CTRL for all architectures.
- Clean up Kconfigs that all KVM architectures were selecting
- New functionality around "guest_memfd", a new userspace API that
creates an anonymous file and returns a file descriptor that refers
to it. guest_memfd files are bound to their owning virtual machine,
cannot be mapped, read, or written by userspace, and cannot be resized.
guest_memfd files do however support PUNCH_HOLE, which can be used to
switch a memory area between guest_memfd and regular anonymous memory.
- New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
per-page attributes for a given page of guest memory; right now the
only attribute is whether the guest expects to access memory via
guest_memfd or not, which in Confidential SVMs backed by SEV-SNP,
TDX or ARM64 pKVM is checked by firmware or hypervisor that guarantees
confidentiality (AMD PSP, Intel TDX module, or EL2 in the case of pKVM).
x86:
- Support for "software-protected VMs" that can use the new guest_memfd
and page attributes infrastructure. This is mostly useful for testing,
since there is no pKVM-like infrastructure to provide a meaningfully
reduced TCB.
- Fix a relatively benign off-by-one error when splitting huge pages during
CLEAR_DIRTY_LOG.
- Fix a bug where KVM could incorrectly test-and-clear dirty bits in non-leaf
TDP MMU SPTEs if a racing thread replaces a huge SPTE with a non-huge SPTE.
- Use more generic lockdep assertions in paths that don't actually care
about whether the caller is a reader or a writer.
- let Xen guests opt out of having PV clock reported as "based on a stable TSC",
because some of them don't expect the "TSC stable" bit (added to the pvclock
ABI by KVM, but never set by Xen) to be set.
- Revert a bogus, made-up nested SVM consistency check for TLB_CONTROL.
- Advertise flush-by-ASID support for nSVM unconditionally, as KVM always
flushes on nested transitions, i.e. always satisfies flush requests. This
allows running bleeding edge versions of VMware Workstation on top of KVM.
- Sanity check that the CPU supports flush-by-ASID when enabling SEV support.
- On AMD machines with vNMI, always rely on hardware instead of intercepting
IRET in some cases to detect unmasking of NMIs
- Support for virtualizing Linear Address Masking (LAM)
- Fix a variety of vPMU bugs where KVM fail to stop/reset counters and other state
prior to refreshing the vPMU model.
- Fix a double-overflow PMU bug by tracking emulated counter events using a
dedicated field instead of snapshotting the "previous" counter. If the
hardware PMC count triggers overflow that is recognized in the same VM-Exit
that KVM manually bumps an event count, KVM would pend PMIs for both the
hardware-triggered overflow and for KVM-triggered overflow.
- Turn off KVM_WERROR by default for all configs so that it's not
inadvertantly enabled by non-KVM developers, which can be problematic for
subsystems that require no regressions for W=1 builds.
- Advertise all of the host-supported CPUID bits that enumerate IA32_SPEC_CTRL
"features".
- Don't force a masterclock update when a vCPU synchronizes to the current TSC
generation, as updating the masterclock can cause kvmclock's time to "jump"
unexpectedly, e.g. when userspace hotplugs a pre-created vCPU.
- Use RIP-relative address to read kvm_rebooting in the VM-Enter fault paths,
partly as a super minor optimization, but mostly to make KVM play nice with
position independent executable builds.
- Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
CONFIG_HYPERV as a minor optimization, and to self-document the code.
- Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV "emulation"
at build time.
ARM64:
- LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB
base granule sizes. Branch shared with the arm64 tree.
- Large Fine-Grained Trap rework, bringing some sanity to the
feature, although there is more to come. This comes with
a prefix branch shared with the arm64 tree.
- Some additional Nested Virtualization groundwork, mostly
introducing the NV2 VNCR support and retargetting the NV
support to that version of the architecture.
- A small set of vgic fixes and associated cleanups.
Loongarch:
- Optimization for memslot hugepage checking
- Cleanup and fix some HW/SW timer issues
- Add LSX/LASX (128bit/256bit SIMD) support
RISC-V:
- KVM_GET_REG_LIST improvement for vector registers
- Generate ISA extension reg_list using macros in get-reg-list selftest
- Support for reporting steal time along with selftest
s390:
- Bugfixes
Selftests:
- Fix an annoying goof where the NX hugepage test prints out garbage
instead of the magic token needed to run the test.
- Fix build errors when a header is delete/moved due to a missing flag
in the Makefile.
- Detect if KVM bugged/killed a selftest's VM and print out a helpful
message instead of complaining that a random ioctl() failed.
- Annotate the guest printf/assert helpers with __printf(), and fix the
various bugs that were lurking due to lack of said annotation.
There are two non-KVM patches buried in the middle of guest_memfd support:
fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure()
mm: Add AS_UNMOVABLE to mark mapping as completely unmovable
The first is small and mostly suggested-by Christian Brauner; the second
a bit less so but it was written by an mm person (Vlastimil Babka).
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmWcMWkUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroO15gf/WLmmg3SET6Uzw9iEq2xo28831ZA+
6kpILfIDGKozV5safDmMvcInlc/PTnqOFrsKyyN4kDZ+rIJiafJdg/loE0kPXBML
wdR+2ix5kYI1FucCDaGTahskBDz8Lb/xTpwGg9BFLYFNmuUeHc74o6GoNvr1uliE
4kLZL2K6w0cSMPybUD+HqGaET80ZqPwecv+s1JL+Ia0kYZJONJifoHnvOUJ7DpEi
rgudVdgzt3EPjG0y1z6MjvDBXTCOLDjXajErlYuZD3Ej8N8s59Dh2TxOiDNTLdP4
a4zjRvDmgyr6H6sz+upvwc7f4M4p+DBvf+TkWF54mbeObHUYliStqURIoA==
=66Ws
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"Generic:
- Use memdup_array_user() to harden against overflow.
- Unconditionally advertise KVM_CAP_DEVICE_CTRL for all
architectures.
- Clean up Kconfigs that all KVM architectures were selecting
- New functionality around "guest_memfd", a new userspace API that
creates an anonymous file and returns a file descriptor that refers
to it. guest_memfd files are bound to their owning virtual machine,
cannot be mapped, read, or written by userspace, and cannot be
resized. guest_memfd files do however support PUNCH_HOLE, which can
be used to switch a memory area between guest_memfd and regular
anonymous memory.
- New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
per-page attributes for a given page of guest memory; right now the
only attribute is whether the guest expects to access memory via
guest_memfd or not, which in Confidential SVMs backed by SEV-SNP,
TDX or ARM64 pKVM is checked by firmware or hypervisor that
guarantees confidentiality (AMD PSP, Intel TDX module, or EL2 in
the case of pKVM).
x86:
- Support for "software-protected VMs" that can use the new
guest_memfd and page attributes infrastructure. This is mostly
useful for testing, since there is no pKVM-like infrastructure to
provide a meaningfully reduced TCB.
- Fix a relatively benign off-by-one error when splitting huge pages
during CLEAR_DIRTY_LOG.
- Fix a bug where KVM could incorrectly test-and-clear dirty bits in
non-leaf TDP MMU SPTEs if a racing thread replaces a huge SPTE with
a non-huge SPTE.
- Use more generic lockdep assertions in paths that don't actually
care about whether the caller is a reader or a writer.
- let Xen guests opt out of having PV clock reported as "based on a
stable TSC", because some of them don't expect the "TSC stable" bit
(added to the pvclock ABI by KVM, but never set by Xen) to be set.
- Revert a bogus, made-up nested SVM consistency check for
TLB_CONTROL.
- Advertise flush-by-ASID support for nSVM unconditionally, as KVM
always flushes on nested transitions, i.e. always satisfies flush
requests. This allows running bleeding edge versions of VMware
Workstation on top of KVM.
- Sanity check that the CPU supports flush-by-ASID when enabling SEV
support.
- On AMD machines with vNMI, always rely on hardware instead of
intercepting IRET in some cases to detect unmasking of NMIs
- Support for virtualizing Linear Address Masking (LAM)
- Fix a variety of vPMU bugs where KVM fail to stop/reset counters
and other state prior to refreshing the vPMU model.
- Fix a double-overflow PMU bug by tracking emulated counter events
using a dedicated field instead of snapshotting the "previous"
counter. If the hardware PMC count triggers overflow that is
recognized in the same VM-Exit that KVM manually bumps an event
count, KVM would pend PMIs for both the hardware-triggered overflow
and for KVM-triggered overflow.
- Turn off KVM_WERROR by default for all configs so that it's not
inadvertantly enabled by non-KVM developers, which can be
problematic for subsystems that require no regressions for W=1
builds.
- Advertise all of the host-supported CPUID bits that enumerate
IA32_SPEC_CTRL "features".
- Don't force a masterclock update when a vCPU synchronizes to the
current TSC generation, as updating the masterclock can cause
kvmclock's time to "jump" unexpectedly, e.g. when userspace
hotplugs a pre-created vCPU.
- Use RIP-relative address to read kvm_rebooting in the VM-Enter
fault paths, partly as a super minor optimization, but mostly to
make KVM play nice with position independent executable builds.
- Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
CONFIG_HYPERV as a minor optimization, and to self-document the
code.
- Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV
"emulation" at build time.
ARM64:
- LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base
granule sizes. Branch shared with the arm64 tree.
- Large Fine-Grained Trap rework, bringing some sanity to the
feature, although there is more to come. This comes with a prefix
branch shared with the arm64 tree.
- Some additional Nested Virtualization groundwork, mostly
introducing the NV2 VNCR support and retargetting the NV support to
that version of the architecture.
- A small set of vgic fixes and associated cleanups.
Loongarch:
- Optimization for memslot hugepage checking
- Cleanup and fix some HW/SW timer issues
- Add LSX/LASX (128bit/256bit SIMD) support
RISC-V:
- KVM_GET_REG_LIST improvement for vector registers
- Generate ISA extension reg_list using macros in get-reg-list
selftest
- Support for reporting steal time along with selftest
s390:
- Bugfixes
Selftests:
- Fix an annoying goof where the NX hugepage test prints out garbage
instead of the magic token needed to run the test.
- Fix build errors when a header is delete/moved due to a missing
flag in the Makefile.
- Detect if KVM bugged/killed a selftest's VM and print out a helpful
message instead of complaining that a random ioctl() failed.
- Annotate the guest printf/assert helpers with __printf(), and fix
the various bugs that were lurking due to lack of said annotation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (185 commits)
x86/kvm: Do not try to disable kvmclock if it was not enabled
KVM: x86: add missing "depends on KVM"
KVM: fix direction of dependency on MMU notifiers
KVM: introduce CONFIG_KVM_COMMON
KVM: arm64: Add missing memory barriers when switching to pKVM's hyp pgd
KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache
RISC-V: KVM: selftests: Add get-reg-list test for STA registers
RISC-V: KVM: selftests: Add steal_time test support
RISC-V: KVM: selftests: Add guest_sbi_probe_extension
RISC-V: KVM: selftests: Move sbi_ecall to processor.c
RISC-V: KVM: Implement SBI STA extension
RISC-V: KVM: Add support for SBI STA registers
RISC-V: KVM: Add support for SBI extension registers
RISC-V: KVM: Add SBI STA info to vcpu_arch
RISC-V: KVM: Add steal-update vcpu request
RISC-V: KVM: Add SBI STA extension skeleton
RISC-V: paravirt: Implement steal-time support
RISC-V: Add SBI STA extension definitions
RISC-V: paravirt: Add skeleton for pv-time support
RISC-V: KVM: Fix indentation in kvm_riscv_vcpu_set_reg_csr()
...
- LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB
base granule sizes. Branch shared with the arm64 tree.
- Large Fine-Grained Trap rework, bringing some sanity to the
feature, although there is more to come. This comes with
a prefix branch shared with the arm64 tree.
- Some additional Nested Virtualization groundwork, mostly
introducing the NV2 VNCR support and retargetting the NV
support to that version of the architecture.
- A small set of vgic fixes and associated cleanups.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmWX4wUACgkQI9DQutE9
ekM0DxAAvOJtM+m8ahv2tCSHZpwowkuKBBc7JWI75l4befHEOSvYMZwQwejrequa
lPwLgx9t0sGjba+tRGv1JZMtnUBjV4V/lcrhX95AYTF5dfg7vbuTxUh/YFu1CaQ/
MkuKVJ74PUWqpvDYSzwW8Jjqu6RskjW0HqVPMbFkmUWWc8cgExc8XD9M+nu0SrNT
g5261KD53CUeyNaR0/+zkaHouq2Skeqw/u2d5OLdnY23hINMZ0qR1jYHj935suYy
YrMTiMje1h/fs7YXWra4LmMcsg0V+3LZVQJXwRARrZdk2xkW5w+eLPIYjVqcA7aT
VwhrtzjEzD56trrSZClOpj7MSVfQ8OjV7BgvSUpgLT5+kjVrFLIEMIOakiTOCoIJ
weweRawTyomUoIsT1EkRmRYQkPH3Z552tcrztD/slYvqrtCB4JcHKF0O7BT88ZfM
t2hRhlT+32KR9cOciLfFMzlZI1uKQYF8Z+CvvBA5TJ9Hv8JsIwF2E/NjYUy2ilca
iDzF5KdZ/OLQzjwWVWDq9OlvepB2rLGQKNnw67jd1BSzd9Jj3eVuaI/9xRBrLDYR
cBOMoIaZMy7Va+pop1zoFEhC7IbTglVHzsj2ch+4F1NB/1+Dd0zBQKbDUPqp5TR/
OOuonTTVk9yH6RgpUULKlbRZ4oU70UoOBFBxCqnvng0cw1KBbbA=
=Q6c+
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for Linux 6.8
- LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB
base granule sizes. Branch shared with the arm64 tree.
- Large Fine-Grained Trap rework, bringing some sanity to the
feature, although there is more to come. This comes with
a prefix branch shared with the arm64 tree.
- Some additional Nested Virtualization groundwork, mostly
introducing the NV2 VNCR support and retargetting the NV
support to that version of the architecture.
- A small set of vgic fixes and associated cleanups.
Merge in arm64 fixes queued for 6.7 so that kpti_install_ng_mappings()
can be updated to use arm64_kernel_unmapped_at_el0() instead of checking
the ARM64_UNMAP_KERNEL_AT_EL0 CPU capability directly.
* for-next/fixes:
arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify
perf/arm-cmn: Fail DTC counter allocation correctly
arm64: Avoid enabling KPTI unnecessarily
* for-next/lpa2-prep:
arm64: mm: get rid of kimage_vaddr global variable
arm64: mm: Take potential load offset into account when KASLR is off
arm64: kernel: Disable latent_entropy GCC plugin in early C runtime
arm64: Add ARM64_HAS_LPA2 CPU capability
arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]
arm64/mm: Update tlb invalidation routines for FEAT_LPA2
arm64/mm: Add lpa2_is_enabled() kvm_lpa2_is_enabled() stubs
arm64/mm: Modify range-based tlbi to decrement scale
* kvm-arm64/nv-6.8-prefix:
: .
: Nested Virtualization support update, focussing on the
: NV2 support (VNCR mapping and such).
: .
KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
KVM: arm64: nv: Map VNCR-capable registers to a separate page
KVM: arm64: nv: Add EL2_REG_VNCR()/EL2_REG_REDIR() sysreg helpers
KVM: arm64: Introduce a bad_trap() primitive for unexpected trap handling
KVM: arm64: nv: Add include containing the VNCR_EL2 offsets
KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers
KVM: arm64: nv: Drop EL12 register traps that are redirected to VNCR
KVM: arm64: nv: Compute NV view of idregs as a one-off
KVM: arm64: nv: Hoist vcpu_has_nv() into is_hyp_ctxt()
arm64: cpufeatures: Restrict NV support to FEAT_NV2
Signed-off-by: Marc Zyngier <maz@kernel.org>
To anyone who has played with FEAT_NV, it is obvious that the level
of performance is rather low due to the trap amplification that it
imposes on the host hypervisor. FEAT_NV2 solves a number of the
problems that FEAT_NV had.
It also turns out that all the existing hardware that has FEAT_NV
also has FEAT_NV2. Finally, it is now allowed by the architecture
to build FEAT_NV2 *only* (as denoted by ID_AA64MMFR4_EL1.NV_frac),
which effectively seals the fate of FEAT_NV.
Restrict the NV support to NV2, and be done with it. Nobody will
cry over the old crap. NV_frac will eventually be supported once
the intrastructure is ready.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Currently the detection+enablement of boot cpucaps is separate from the
patching of boot cpucap alternatives, which means there's a period where
cpus_have_cap($CAP) and alternative_has_cap($CAP) may be mismatched.
It would be preferable to manage the boot cpucaps in the same way as the
system cpucaps, both for clarity and to minimize the risk of accidental
usage of code relying upon an alternative which has not yet been
patched.
This patch aligns the handling of boot cpucaps with the handling of
system cpucaps:
* The existing setup_boot_cpu_capabilities() function is moved to be
closer to the setup_system_capabilities() and setup_system_features()
functions so that they're more clearly related and more likely to be
updated together in future.
* The patching of boot cpucap alternatives is moved into
setup_boot_cpu_capabilities(), immediately after boot cpucaps are
detected and enabled.
* A new setup_boot_cpu_features() function is added to mirror
setup_system_features(); this handles initialization of cpucap data
structures and calls setup_boot_cpu_capabilities(). This makes
init_cpu_features() a closer mirror to update_cpu_features(), and
makes smp_prepare_boot_cpu() a closer mirror to smp_cpus_done().
Importantly, while these changes alter the structure of the code, they
retain the existing order of calls to:
init_cpu_features(); // prefix initializing feature regs
init_cpucap_indirect_list();
detect_system_supports_pseudo_nmi();
update_cpu_capabilities(SCOPE_BOOT_CPU | SCOPE_LOCAL_CPU);
enable_cpu_capabilities(SCOPE_BOOT_CPU);
apply_boot_alternatives();
... and hence there should be no functional change as a result of this
patch; this is purely a structural cleanup.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20231212170910.3745497-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Recent changes to remove cpus_have_const_cap() introduced new users of
cpus_have_cap() in the period between detecting system cpucaps and
patching alternatives. It would be preferable to defer these until after
the relevant cpucaps have been patched so that these can use the usual
feature check helper functions, which is clearer and has less risk of
accidental usage of code relying upon an alternative which has not yet
been patched.
This patch reworks the system-wide cpucap detection and patching to
minimize this transient period:
* The detection, enablement, and patching of system cpucaps is moved
into a new setup_system_capabilities() function so that these can be
grouped together more clearly, with no other functions called in the
period between detection and patching. This is called from
setup_system_features() before the subsequent checks that depend on
the cpucaps.
The logging of TTBR0 PAN and cpucaps with a mask is also moved here to
keep these as close as possible to update_cpu_capabilities().
At the same time, comments are corrected and improved to make the
intent clearer.
* As hyp_mode_check() only tests system register values (not hwcaps) and
must be called prior to patching, the call to hyp_mode_check() is
moved before the call to setup_system_features().
* In setup_system_features(), the use of system_uses_ttbr0_pan() is
restored, now that this occurs after alternatives are patched. This is
a partial revert of commit:
53d62e995d ("arm64: Avoid cpus_have_const_cap() for ARM64_HAS_PAN")
* In sve_setup() and sme_setup(), the use of system_supports_sve() and
system_supports_sme() respectively are restored, now that these occur
after alternatives are patched. This is a partial revert of commit:
a76521d160 ("arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64}")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20231212170910.3745497-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Back in 2016, it was argued that implementations lacking a HW
prefetcher could be helped by sprinkling a number of PRFM
instructions in strategic locations.
In 2023, the one platform that presumably needed this hack is no
longer in active use (let alone maintained), and an quick
experiment shows dropping this hack only leads to a 0.4% drop
on a full kernel compilation (tested on a MT30-GS0 48 CPU system).
Given that this is pretty much in the noise department and that
it may give odd ideas to other implementers, drop the hack for
good.
Suggested-by: Will Deacon <will@kernel.org>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20231122133754.1240687-1-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Commit 42c5a3b04b refactored the KPTI init code in a way that results
in the use of non-global kernel mappings even on systems that have no
need for it, and even when KPTI has been disabled explicitly via the
command line.
Ensure that this only happens when we have decided (based on the
detected system-wide CPU features) that KPTI should be enabled.
Fixes: 42c5a3b04b ("arm64: Split kpti_install_ng_mappings()")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20231127120049.2258650-6-ardb@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Expose FEAT_LPA2 as a capability so that we can take advantage of
alternatives patching in the hypervisor.
Although FEAT_LPA2 presence is advertised separately for stage1 and
stage2, the expectation is that in practice both stages will either
support or not support it. Therefore, we combine both into a single
capability, allowing us to simplify the implementation. KVM requires
support in both stages in order to use LPA2 since the same library is
used for hyp stage 1 and guest stage 2 pgtables.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231127111737.1897081-6-ryan.roberts@arm.com
In commit 44bd78dd2b ("irqchip/gic-v3: Disable pseudo NMIs on
MediaTek devices w/ firmware issues") we added a method for detecting
MediaTek devices with broken firmware and disabled pseudo-NMI. While
that worked, it didn't address the problem at a deep enough level.
The fundamental issue with this broken firmware is that it's not
saving and restoring several important GICR registers. The current
list is believed to be:
* GICR_NUM_IPRIORITYR
* GICR_CTLR
* GICR_ISPENDR0
* GICR_ISACTIVER0
* GICR_NSACR
Pseudo-NMI didn't work because it was the only thing (currently) in
the kernel that relied on the broken registers, so forcing pseudo-NMI
off was an effective fix. However, it could be observed that calling
system_uses_irq_prio_masking() on these systems still returned
"true". That caused confusion and led to the need for
commit a07a594152 ("arm64: smp: avoid NMI IPIs with broken MediaTek
FW"). It's worried that the incorrect value returned by
system_uses_irq_prio_masking() on these systems will continue to
confuse future developers.
Let's fix the issue a little more completely by disabling IRQ
priorities at a deeper level in the kernel. Once we do this we can
revert some of the other bits of code dealing with this quirk.
This includes a partial revert of commit 44bd78dd2b
("irqchip/gic-v3: Disable pseudo NMIs on MediaTek devices w/ firmware
issues"). This isn't a full revert because it leaves some of the
changes to the "quirks" structure around in case future code needs it.
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231107072651.v2.1.Ide945748593cffd8ff0feb9ae22b795935b944d6@changeid
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64/for-next/perf:
perf: hisi: Fix use-after-free when register pmu fails
drivers/perf: hisi_pcie: Initialize event->cpu only on success
drivers/perf: hisi_pcie: Check the type first in pmu::event_init()
perf/arm-cmn: Enable per-DTC counter allocation
perf/arm-cmn: Rework DTC counters (again)
perf/arm-cmn: Fix DTC domain detection
drivers: perf: arm_pmuv3: Drop some unused arguments from armv8_pmu_init()
drivers: perf: arm_pmuv3: Read PMMIR_EL1 unconditionally
drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hisi_hns3_pmu uninit process
drivers/perf: xgene: Use device_get_match_data()
perf/amlogic: add missing MODULE_DEVICE_TABLE
docs/perf: Add ampere_cspmu to toctree to fix a build warning
perf: arm_cspmu: ampere_cspmu: Add support for Ampere SoC PMU
perf: arm_cspmu: Support implementation specific validation
perf: arm_cspmu: Support implementation specific filters
perf: arm_cspmu: Split 64-bit write to 32-bit writes
perf: arm_cspmu: Separate Arm and vendor module
* for-next/sve-remove-pseudo-regs:
: arm64/fpsimd: Remove the vector length pseudo registers
arm64/sve: Remove SMCR pseudo register from cpufeature code
arm64/sve: Remove ZCR pseudo register from cpufeature code
* for-next/backtrace-ipi:
: Add IPI for backtraces/kgdb, use NMI
arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup
arm64: smp: avoid NMI IPIs with broken MediaTek FW
arm64: smp: Mark IPI globals as __ro_after_init
arm64: kgdb: Implement kgdb_roundup_cpus() to enable pseudo-NMI roundup
arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMI
arm64: smp: Add arch support for backtrace using pseudo-NMI
arm64: smp: Remove dedicated wakeup IPI
arm64: idle: Tag the arm64 idle functions as __cpuidle
irqchip/gic-v3: Enable support for SGIs to act as NMIs
* for-next/kselftest:
: Various arm64 kselftest updates
kselftest/arm64: Validate SVCR in streaming SVE stress test
* for-next/misc:
: Miscellaneous patches
arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer
arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=n
arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper
clocksource/drivers/arm_arch_timer: limit XGene-1 workaround
arm64: Remove system_uses_lse_atomics()
arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unused
arm64/mm: Hoist synchronization out of set_ptes() loop
arm64: swiotlb: Reduce the default size if no ZONE_DMA bouncing needed
* for-next/cpufeat-display-cores:
: arm64 cpufeature display enabled cores
arm64: cpufeature: Change DBM to display enabled cores
arm64: cpufeature: Display the set of cores with a feature
Now that we have the ability to display the list of cores
with a feature when its selectivly enabled, lets convert
DBM to use that as well.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Link: https://lore.kernel.org/r/20231017052322.1211099-3-jeremy.linton@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The AMU feature can be enabled on a subset of the cores in a system.
Because of that, it prints a message for each core as it is detected.
This becomes tedious when there are hundreds of cores. Instead, for
CPU features which can be enabled on a subset of the present cores,
lets wait until update_cpu_capabilities() and print the subset of cores
the feature was enabled on.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@bytedance.com>
Tested-by: Punit Agrawal <punit.agrawal@bytedance.com>
Link: https://lore.kernel.org/r/20231017052322.1211099-2-jeremy.linton@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
There are no longer any users of cpus_have_const_cap(), and therefore it
can be removed.
Remove cpus_have_const_cap(). At the same time, remove
__cpus_have_const_cap(), as this is a trivial wrapper of
alternative_has_cap_unlikely(), which can be used directly instead.
The comment for __system_matches_cap() is updated to no longer refer to
cpus_have_const_cap(). As we have a number of ways to check the cpucaps,
the specific suggestions are removed.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Kristina Martsenko <kristina.martsenko@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In has_useable_cnp() we use cpus_have_const_cap() to check for
ARM64_WORKAROUND_NVIDIA_CARMEL_CNP, but this is not necessary and
cpus_have_cap() would be preferable.
For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.
Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.
We use has_useable_cnp() to determine whether we have the system-wide
ARM64_HAS_CNP cpucap. Due to the structure of the cpufeature code, we
call has_useable_cnp() in two distinct cases:
1) When finalizing system capabilities, setup_system_capabilities() will
call has_useable_cnp() with SCOPE_SYSTEM to determine whether all
CPUs have the feature. This is called after we've detected any local
cpucaps including ARM64_WORKAROUND_NVIDIA_CARMEL_CNP, but prior to
patching alternatives.
If the ARM64_WORKAROUND_NVIDIA_CARMEL_CNP was detected, we will not
detect ARM64_HAS_CNP.
2) After finalizing system capabilties, verify_local_cpu_capabilities()
will call has_useable_cnp() with SCOPE_LOCAL_CPU to verify that CPUs
have CNP if we previously detected it.
Note that if ARM64_WORKAROUND_NVIDIA_CARMEL_CNP was detected, we will
not have detected ARM64_HAS_CNP.
For case 1 we must check the system_cpucaps bitmap as this occurs prior
to patching the alternatives. For case 2 we'll only call
has_useable_cnp() once per subsequent onlining of a CPU, and as this
isn't a fast path it's not necessary to optimize for this case.
This patch replaces the use of cpus_have_const_cap() with
cpus_have_cap(), which will only generate the bitmap test and avoid
generating an alternative sequence, resulting in slightly simpler annd
smaller code being generated. The ARM64_WORKAROUND_NVIDIA_CARMEL_CNP
cpucap is added to cpucap_is_possible() so that code can be elided
entirely when this is not possible.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In elf_hwcap_fixup() we use cpus_have_const_cap() to check for
ARM64_WORKAROUND_1742098, but this is not necessary and cpus_have_cap()
would be preferable.
For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.
Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.
The ARM64_WORKAROUND_1742098 cpucap is detected and patched before
elf_hwcap_fixup() can run, and hence it is not necessary to use
cpus_have_const_cap(). We run cpus_have_const_cap() at most twice: once
after finalizing system cpucaps, and potentially once more after
detecting mismatched CPUs which support AArch32 at EL0. Due to this,
it's not necessary to optimize for many calls to elf_hwcap_fixup(), and
it's fine to use cpus_have_cap().
This patch replaces the use of cpus_have_const_cap() with
cpus_have_cap(), which will only generate the bitmap test and avoid
generating an alternative sequence, resulting in slightly simpler annd
smaller code being generated. For consistenct with other cpucaps, the
ARM64_WORKAROUND_1742098 cpucap is added to cpucap_is_possible() so that
code can be elided when this is not possible. However, as we only define
compat_elf_hwcap2 when CONFIG_COMPAT=y, some ifdeffery is still required
within user_feature_fixup() to avoid build errors when CONFIG_COMPAT=n.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In system_uses_hw_pan() we use cpus_have_const_cap() to check for
ARM64_HAS_PAN, but this is only necessary so that the
system_uses_ttbr0_pan() check in setup_cpu_features() can run prior to
alternatives being patched, and otherwise this is not necessary and
alternative_has_cap_*() would be preferable.
For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.
Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.
The ARM64_HAS_PAN cpucap is used by system_uses_hw_pan() and
system_uses_ttbr0_pan() depending on whether CONFIG_ARM64_SW_TTBR0_PAN
is selected, and:
* We only use system_uses_hw_pan() directly in __sdei_handler(), which
isn't reachable until after alternatives have been patched, and for
this it is safe to use alternative_has_cap_*().
* We use system_uses_ttbr0_pan() in a few places:
- In check_and_switch_context() and cpu_uninstall_idmap(), which will
defer installing a translation table into TTBR0 when the
ARM64_HAS_PAN cpucap is not detected.
Prior to patching alternatives, all CPUs will be using init_mm with
the reserved ttbr0 translation tables install in TTBR0, so these can
safely use alternative_has_cap_*().
- In update_saved_ttbr0(), which will only save the active TTBR0 into
a per-thread variable when the ARM64_HAS_PAN cpucap is not detected.
Prior to patching alternatives, all CPUs will be using init_mm with
the reserved ttbr0 translation tables install in TTBR0, so these can
safely use alternative_has_cap_*().
- In efi_set_pgd(), which will handle check_and_switch_context()
deferring the installation of TTBR0 when TTBR0 PAN is detected.
The EFI runtime services are not initialized until after
alternatives have been patched, and so this can safely use
alternative_has_cap_*() or cpus_have_final_cap().
- In uaccess_ttbr0_disable() and uaccess_ttbr0_enable(), where we'll
avoid installing/uninstalling a translation table in TTBR0 when
ARM64_HAS_PAN is detected.
Prior to patching alternatives we will not perform any uaccess and
will not call uaccess_ttbr0_disable() or uaccess_ttbr0_enable(), and
so these can safely use alternative_has_cap_*() or
cpus_have_final_cap().
- In is_el1_permission_fault() where we will consider a translation
fault on a TTBR0 address to be a permission fault when ARM64_HAS_PAN
is not detected *and* we have set the PAN bit in the SPSR (which
tells us that in the interrupted context, TTBR0 pointed at the
reserved zero ttbr).
In the window between detecting system cpucaps and patching
alternatives we should not perform any accesses to TTBR0 addresses,
and no userspace translation tables exist until after patching
alternatives. Thus it is safe for this to use alternative_has_cap*().
This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime.
So that the check for TTBR0 PAN in setup_cpu_features() can run prior to
alternatives being patched, the call to system_uses_ttbr0_pan() is
replaced with an explicit check of the ARM64_HAS_PAN bit in the
system_cpucaps bitmap.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In system_supports_cnp() we use cpus_have_const_cap() to check for
ARM64_HAS_CNP, but this is only necessary so that the cpu_enable_cnp()
callback can run prior to alternatives being patched, and otherwise this
is not necessary and alternative_has_cap_*() would be preferable.
For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.
Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.
The cpu_enable_cnp() callback is run immediately after the ARM64_HAS_CNP
cpucap is detected system-wide under setup_system_capabilities(), prior
to alternatives being patched. During this window cpu_enable_cnp() uses
cpu_replace_ttbr1() to set the CNP bit for the swapper_pg_dir in TTBR1.
No other users of the ARM64_HAS_CNP cpucap need the up-to-date value
during this window:
* As KVM isn't initialized yet, kvm_get_vttbr() isn't reachable.
* As cpuidle isn't initialized yet, __cpu_suspend_exit() isn't
reachable.
* At this point all CPUs are using the swapper_pg_dir with a reserved
ASID in TTBR1, and the idmap_pg_dir in TTBR0, so neither
check_and_switch_context() nor cpu_do_switch_mm() need to do anything
special.
This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime. To allow cpu_enable_cnp() to function prior to alternatives
being patched, cpu_replace_ttbr1() is split into cpu_replace_ttbr1() and
cpu_enable_swapper_cnp(), with the former only used for early TTBR1
replacement, and the latter used by both cpu_enable_cnp() and
__cpu_suspend_exit().
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently we have a negative cpucap which describes the *absence* of
FP/SIMD rather than *presence* of FP/SIMD. This largely works, but is
somewhat awkward relative to other cpucaps that describe the presence of
a feature, and it would be nicer to have a cpucap which describes the
presence of FP/SIMD:
* This will allow the cpucap to be treated as a standard
ARM64_CPUCAP_SYSTEM_FEATURE, which can be detected with the standard
has_cpuid_feature() function and ARM64_CPUID_FIELDS() description.
* This ensures that the cpucap will only transition from not-present to
present, reducing the risk of unintentional and/or unsafe usage of
FP/SIMD before cpucaps are finalized.
* This will allow using arm64_cpu_capabilities::cpu_enable() to enable
the use of FP/SIMD later, with FP/SIMD being disabled at boot time
otherwise. This will ensure that any unintentional and/or unsafe usage
of FP/SIMD prior to this is trapped, and will ensure that FP/SIMD is
never unintentionally enabled for userspace in mismatched big.LITTLE
systems.
This patch replaces the negative ARM64_HAS_NO_FPSIMD cpucap with a
positive ARM64_HAS_FPSIMD cpucap, making changes as described above.
Note that as FP/SIMD will now be trapped when not supported system-wide,
do_fpsimd_acc() must handle these traps in the same way as for SVE and
SME. The commentary in fpsimd_restore_current_state() is updated to
describe the new scheme.
No users of system_supports_fpsimd() need to know that FP/SIMD is
available prior to alternatives being patched, so this is updated to
use alternative_has_cap_likely() to check for the ARM64_HAS_FPSIMD
cpucap, without generating code to test the system_cpucaps bitmap.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The arm64_cpu_capabilities::cpu_enable() callbacks for SVE, SME, SME2,
and FA64 are named with an unusual "${feature}_kernel_enable" pattern
rather than the much more common "cpu_enable_${feature}". Now that we
only use these as cpu_enable() callbacks, it would be nice to have them
match the usual scheme.
This patch renames the cpu_enable() callbacks to match this scheme. At
the same time, the comment above cpu_enable_sve() is removed for
consistency with the other cpu_enable() callbacks.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When a CPUs onlined we first probe for supported features and
propetites, and then we subsequently enable features that have been
detected. This is a little problematic for SVE and SME, as some
properties (e.g. vector lengths) cannot be probed while they are
disabled. Due to this, the code probing for SVE properties has to enable
SVE for EL1 prior to proving, and the code probing for SME properties
has to enable SME for EL1 prior to probing. We never disable SVE or SME
for EL1 after probing.
It would be a little nicer to transiently enable SVE and SME during
probing, leaving them both disabled unless explicitly enabled, as this
would make it much easier to catch unintentional usage (e.g. when they
are not present system-wide).
This patch reworks the SVE and SME feature probing code to only
transiently enable support at EL1, disabling after probing is complete.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The arm64_cpu_capabilities::cpu_enable callbacks are intended for
cpu-local feature enablement (e.g. poking system registers). These get
called for each online CPU when boot/system cpucaps get finalized and
enabled, and get called whenever a CPU is subsequently onlined.
For KPTI with the ARM64_UNMAP_KERNEL_AT_EL0 cpucap, we use the
kpti_install_ng_mappings() function as the cpu_enable callback. This
does a mixture of cpu-local configuration (setting VBAR_EL1 to the
appropriate trampoline vectors) and some global configuration (rewriting
the swapper page tables to sue non-glboal mappings) that must happen at
most once.
This patch splits kpti_install_ng_mappings() into a cpu-local
cpu_enable_kpti() initialization function and a system-wide
kpti_install_ng_mappings() function. The cpu_enable_kpti() function is
responsible for selecting the necessary cpu-local vectors each time a
CPU is onlined, and the kpti_install_ng_mappings() function performs the
one-time rewrite of the translation tables too use non-global mappings.
Splitting the two makes the code a bit easier to follow and also allows
the page table rewriting code to be marked as __init such that it can be
freed after use.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
For ARM64_WORKAROUND_2658417, we use a cpu_enable() callback to hide the
ID_AA64ISAR1_EL1.BF16 ID register field. This is a little awkward as
CPUs may attempt to apply the workaround concurrently, requiring that we
protect the bulk of the callback with a raw_spinlock, and requiring some
pointless work every time a CPU is subsequently hotplugged in.
This patch makes this a little simpler by handling the masking once at
boot time. A new user_feature_fixup() function is called at the start of
setup_user_features() to mask the feature, matching the style of
elf_hwcap_fixup(). The ARM64_WORKAROUND_2658417 cpucap is added to
cpucap_is_possible() so that code can be elided entirely when this is
not possible.
Note that the ARM64_WORKAROUND_2658417 capability is matched with
ERRATA_MIDR_RANGE(), which implicitly gives the capability a
ARM64_CPUCAP_LOCAL_CPU_ERRATUM type, which forbids the late onlining of
a CPU with the erratum if the erratum was not present at boot time.
Therefore this patch doesn't change the behaviour for late onlining.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently setup_cpu_features() handles a mixture of one-time kernel
feature setup (e.g. cpucaps) and one-time user feature setup (e.g. ELF
hwcaps). Subsequent patches will rework other one-time setup and expand
the logic currently in setup_cpu_features(), and in preparation for this
it would be helpful to split the kernel and user setup into separate
functions.
This patch splits setup_user_features() out of setup_cpu_features(),
with a few additional cleanups of note:
* setup_cpu_features() is renamed to setup_system_features() to make it
clear that it handles system-wide feature setup rather than cpu-local
feature setup.
* setup_system_capabilities() is folded into setup_system_features().
* Presence of TTBR0 pan is logged immediately after
update_cpu_capabilities(), so that this is guaranteed to appear
alongside all the other detected system cpucaps.
* The 'cwg' variable is removed as its value is only consumed once and
it's simpler to use cache_type_cwg() directly without assigning its
return value to a variable.
* The call to setup_user_features() is moved after alternatives are
patched, which will allow user feature setup code to depend on
alternative branches and allow for simplifications in subsequent
patches.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
FEAT_LRCPC3 adds more instructions to support the Release Consistency model.
Add a HWCAP so that userspace can make decisions about instructions it can use.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230919162757.2707023-2-joey.gouly@arm.com
[catalin.marinas@arm.com: change the HWCAP number]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
SVE 2.1 introduced a new feature FEAT_SVE_B16B16 which adds instructions
supporting the BFloat16 floating point format. Report this to userspace
through the ID registers and hwcap.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230915-arm64-zfr-b16b16-el0-v1-1-f9aba807bdb5@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
For reasons that are not currently apparent during cpufeature enumeration
we maintain a pseudo register for SMCR which records the maximum supported
vector length using the value that would be written to SMCR_EL1.LEN to
configure it. This is not exposed to userspace and is not sufficient for
detecting unsupportable configurations, we need the more detailed checks in
vec_update_vq_map() for that since we can't cope with missing vector
lengths on late CPUs and KVM requires an exactly matching set of supported
vector lengths as EL1 can enumerate VLs directly with the hardware.
Remove the code, replacing the usage in sme_setup() with a query of the
vq_map.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230913-arm64-vec-len-cpufeature-v1-2-cc69b0600a8a@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
For reasons that are not currently apparent during cpufeature enumeration
we maintain a pseudo register for ZCR which records the maximum supported
vector length using the value that would be written to ZCR_EL1.LEN to
configure it. This is not exposed to userspace and is not sufficient for
detecting unsupportable configurations, we need the more detailed checks in
vec_update_vq_map() for that since we can't cope with missing vector
lengths on late CPUs and KVM requires an exactly matching set of supported
vector lengths as EL1 can enumerate VLs directly with the hardware.
Remove the code, replacing the usage in sve_setup() with a query of the
vq_map.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230913-arm64-vec-len-cpufeature-v1-1-cc69b0600a8a@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
ClearBHB support is indicated by the CLRBHB field in ID_AA64ISAR2_EL1.
Following some refactoring the kernel incorrectly checks the BC field
instead. Fix the detection to use the right field.
(Note: The original ClearBHB support had it as FTR_HIGHER_SAFE, but this
patch uses FTR_LOWER_SAFE, which seems more correct.)
Also fix the detection of BC (hinted conditional branches) to use
FTR_LOWER_SAFE, so that it is not reported on mismatched systems.
Fixes: 356137e68a ("arm64/sysreg: Make BHB clear feature defines match the architecture")
Fixes: 8fcc8285c0 ("arm64/sysreg: Convert ID_AA64ISAR2_EL1 to automatic generation")
Cc: stable@vger.kernel.org
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230912133429.2606875-1-kristina.martsenko@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
* Clean up vCPU targets, always returning generic v8 as the preferred target
* Trap forwarding infrastructure for nested virtualization (used for traps
that are taken from an L2 guest and are needed by the L1 hypervisor)
* FEAT_TLBIRANGE support to only invalidate specific ranges of addresses
when collapsing a table PTE to a block PTE. This avoids that the guest
refills the TLBs again for addresses that aren't covered by the table PTE.
* Fix vPMU issues related to handling of PMUver.
* Don't unnecessary align non-stack allocations in the EL2 VA space
* Drop HCR_VIRT_EXCP_MASK, which was never used...
* Don't use smp_processor_id() in kvm_arch_vcpu_load(),
but the cpu parameter instead
* Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort()
* Remove prototypes without implementations
RISC-V:
* Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for guest
* Added ONE_REG interface for SATP mode
* Added ONE_REG interface to enable/disable multiple ISA extensions
* Improved error codes returned by ONE_REG interfaces
* Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V
* Added get-reg-list selftest for KVM RISC-V
s390:
* PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch)
Allows a PV guest to use crypto cards. Card access is governed by
the firmware and once a crypto queue is "bound" to a PV VM every
other entity (PV or not) looses access until it is not bound
anymore. Enablement is done via flags when creating the PV VM.
* Guest debug fixes (Ilya)
x86:
* Clean up KVM's handling of Intel architectural events
* Intel bugfixes
* Add support for SEV-ES DebugSwap, allowing SEV-ES guests to use debug
registers and generate/handle #DBs
* Clean up LBR virtualization code
* Fix a bug where KVM fails to set the target pCPU during an IRTE update
* Fix fatal bugs in SEV-ES intrahost migration
* Fix a bug where the recent (architecturally correct) change to reinject
#BP and skip INT3 broke SEV guests (can't decode INT3 to skip it)
* Retry APIC map recalculation if a vCPU is added/enabled
* Overhaul emergency reboot code to bring SVM up to par with VMX, tie the
"emergency disabling" behavior to KVM actually being loaded, and move all of
the logic within KVM
* Fix user triggerable WARNs in SVM where KVM incorrectly assumes the TSC
ratio MSR cannot diverge from the default when TSC scaling is disabled
up related code
* Add a framework to allow "caching" feature flags so that KVM can check if
the guest can use a feature without needing to search guest CPUID
* Rip out the ancient MMU_DEBUG crud and replace the useful bits with
CONFIG_KVM_PROVE_MMU
* Fix KVM's handling of !visible guest roots to avoid premature triple fault
injection
* Overhaul KVM's page-track APIs, and KVMGT's usage, to reduce the API surface
that is needed by external users (currently only KVMGT), and fix a variety
of issues in the process
This last item had a silly one-character bug in the topic branch that
was sent to me. Because it caused pretty bad selftest failures in
some configurations, I decided to squash in the fix. So, while the
exact commit ids haven't been in linux-next, the code has (from the
kvm-x86 tree).
Generic:
* Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier events to pass
action specific data without needing to constantly update the main handlers.
* Drop unused function declarations
Selftests:
* Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs
* Add support for printf() in guest code and covert all guest asserts to use
printf-based reporting
* Clean up the PMU event filter test and add new testcases
* Include x86 selftests in the KVM x86 MAINTAINERS entry
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmT1m0kUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMNgggAiN7nz6UC423FznuI+yO3TLm8tkx1
CpKh5onqQogVtchH+vrngi97cfOzZb1/AtifY90OWQi31KEWhehkeofcx7G6ERhj
5a9NFADY1xGBsX4exca/VHDxhnzsbDWaWYPXw5vWFWI6erft9Mvy3tp1LwTvOzqM
v8X4aWz+g5bmo/DWJf4Wu32tEU6mnxzkrjKU14JmyqQTBawVmJ3RYvHVJ/Agpw+n
hRtPAy7FU6XTdkmq/uCT+KUHuJEIK0E/l1js47HFAqSzwdW70UDg14GGo1o4ETxu
RjZQmVNvL57yVgi6QU38/A0FWIsWQm5IlaX1Ug6x8pjZPnUKNbo9BY4T1g==
=W+4p
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Clean up vCPU targets, always returning generic v8 as the preferred
target
- Trap forwarding infrastructure for nested virtualization (used for
traps that are taken from an L2 guest and are needed by the L1
hypervisor)
- FEAT_TLBIRANGE support to only invalidate specific ranges of
addresses when collapsing a table PTE to a block PTE. This avoids
that the guest refills the TLBs again for addresses that aren't
covered by the table PTE.
- Fix vPMU issues related to handling of PMUver.
- Don't unnecessary align non-stack allocations in the EL2 VA space
- Drop HCR_VIRT_EXCP_MASK, which was never used...
- Don't use smp_processor_id() in kvm_arch_vcpu_load(), but the cpu
parameter instead
- Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort()
- Remove prototypes without implementations
RISC-V:
- Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for guest
- Added ONE_REG interface for SATP mode
- Added ONE_REG interface to enable/disable multiple ISA extensions
- Improved error codes returned by ONE_REG interfaces
- Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V
- Added get-reg-list selftest for KVM RISC-V
s390:
- PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch)
Allows a PV guest to use crypto cards. Card access is governed by
the firmware and once a crypto queue is "bound" to a PV VM every
other entity (PV or not) looses access until it is not bound
anymore. Enablement is done via flags when creating the PV VM.
- Guest debug fixes (Ilya)
x86:
- Clean up KVM's handling of Intel architectural events
- Intel bugfixes
- Add support for SEV-ES DebugSwap, allowing SEV-ES guests to use
debug registers and generate/handle #DBs
- Clean up LBR virtualization code
- Fix a bug where KVM fails to set the target pCPU during an IRTE
update
- Fix fatal bugs in SEV-ES intrahost migration
- Fix a bug where the recent (architecturally correct) change to
reinject #BP and skip INT3 broke SEV guests (can't decode INT3 to
skip it)
- Retry APIC map recalculation if a vCPU is added/enabled
- Overhaul emergency reboot code to bring SVM up to par with VMX, tie
the "emergency disabling" behavior to KVM actually being loaded,
and move all of the logic within KVM
- Fix user triggerable WARNs in SVM where KVM incorrectly assumes the
TSC ratio MSR cannot diverge from the default when TSC scaling is
disabled up related code
- Add a framework to allow "caching" feature flags so that KVM can
check if the guest can use a feature without needing to search
guest CPUID
- Rip out the ancient MMU_DEBUG crud and replace the useful bits with
CONFIG_KVM_PROVE_MMU
- Fix KVM's handling of !visible guest roots to avoid premature
triple fault injection
- Overhaul KVM's page-track APIs, and KVMGT's usage, to reduce the
API surface that is needed by external users (currently only
KVMGT), and fix a variety of issues in the process
Generic:
- Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier
events to pass action specific data without needing to constantly
update the main handlers.
- Drop unused function declarations
Selftests:
- Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs
- Add support for printf() in guest code and covert all guest asserts
to use printf-based reporting
- Clean up the PMU event filter test and add new testcases
- Include x86 selftests in the KVM x86 MAINTAINERS entry"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (279 commits)
KVM: x86/mmu: Include mmu.h in spte.h
KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest roots
KVM: x86/mmu: Disallow guest from using !visible slots for page tables
KVM: x86/mmu: Harden TDP MMU iteration against root w/o shadow page
KVM: x86/mmu: Harden new PGD against roots without shadow pages
KVM: x86/mmu: Add helper to convert root hpa to shadow page
drm/i915/gvt: Drop final dependencies on KVM internal details
KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
KVM: x86/mmu: Assert that correct locks are held for page write-tracking
KVM: x86/mmu: Rename page-track APIs to reflect the new reality
KVM: x86/mmu: Drop infrastructure for multiple page-track modes
KVM: x86/mmu: Use page-track notifiers iff there are external users
KVM: x86/mmu: Move KVM-only page-track declarations to internal header
KVM: x86: Remove the unused page-track hook track_flush_slot()
drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region()
KVM: x86: Add a new page-track hook to handle memslot deletion
drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot
KVM: x86: Reject memslot MOVE operations if KVMGT is attached
...
In order to allow us to have shared code for managing fine grained traps
for KVM guests add it as a detected feature rather than relying on it
being a dependency of other features.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
[maz: converted to ARM64_CPUID_FIELDS()]
Link: https://lore.kernel.org/r/20230301-kvm-arm64-fgt-v4-1-1bf8d235ac1f@kernel.org
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Jing Zhang <jingzhangos@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230815183903.2735724-10-maz@kernel.org
Add a HWCAP for FEAT_HBC, so that userspace can make a decision on using
this feature.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230804143746.3900803-2-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
The recently added Enhanced Virtualization Traps cpufeature does not use
the ARM64_CPUID_FIELDS() helper, convert it to do so. No functional
change.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Zenghui Yu <zenghui.yu@linux.dev>
Link: https://lore.kernel.org/r/20230718-arm64-evt-cpuid-helper-v1-1-68375d1e6b92@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
* Eager page splitting optimization for dirty logging, optionally
allowing for a VM to avoid the cost of hugepage splitting in the stage-2
fault path.
* Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact with
services that live in the Secure world. pKVM intervenes on FF-A calls
to guarantee the host doesn't misuse memory donated to the hyp or a
pKVM guest.
* Support for running the split hypervisor with VHE enabled, known as
'hVHE' mode. This is extremely useful for testing the split
hypervisor on VHE-only systems, and paves the way for new use cases
that depend on having two TTBRs available at EL2.
* Generalized framework for configurable ID registers from userspace.
KVM/arm64 currently prevents arbitrary CPU feature set configuration
from userspace, but the intent is to relax this limitation and allow
userspace to select a feature set consistent with the CPU.
* Enable the use of Branch Target Identification (FEAT_BTI) in the
hypervisor.
* Use a separate set of pointer authentication keys for the hypervisor
when running in protected mode, as the host is untrusted at runtime.
* Ensure timer IRQs are consistently released in the init failure
paths.
* Avoid trapping CTR_EL0 on systems with Enhanced Virtualization Traps
(FEAT_EVT), as it is a register commonly read from userspace.
* Erratum workaround for the upcoming AmpereOne part, which has broken
hardware A/D state management.
RISC-V:
* Redirect AMO load/store misaligned traps to KVM guest
* Trap-n-emulate AIA in-kernel irqchip for KVM guest
* Svnapot support for KVM Guest
s390:
* New uvdevice secret API
* CMM selftest and fixes
* fix racy access to target CPU for diag 9c
x86:
* Fix missing/incorrect #GP checks on ENCLS
* Use standard mmu_notifier hooks for handling APIC access page
* Drop now unnecessary TR/TSS load after VM-Exit on AMD
* Print more descriptive information about the status of SEV and SEV-ES during
module load
* Add a test for splitting and reconstituting hugepages during and after
dirty logging
* Add support for CPU pinning in demand paging test
* Add support for AMD PerfMonV2, with a variety of cleanups and minor fixes
included along the way
* Add a "nx_huge_pages=never" option to effectively avoid creating NX hugepage
recovery threads (because nx_huge_pages=off can be toggled at runtime)
* Move handling of PAT out of MTRR code and dedup SVM+VMX code
* Fix output of PIC poll command emulation when there's an interrupt
* Add a maintainer's handbook to document KVM x86 processes, preferred coding
style, testing expectations, etc.
* Misc cleanups, fixes and comments
Generic:
* Miscellaneous bugfixes and cleanups
Selftests:
* Generate dependency files so that partial rebuilds work as expected
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmSgHrIUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroORcAf+KkBlXwQMf+Q0Hy6Mfe0OtkKmh0Ae
6HJ6dsuMfOHhWv5kgukh+qvuGUGzHq+gpVKmZg2yP3h3cLHOLUAYMCDm+rjXyjsk
F4DbnJLfxq43Pe9PHRKFxxSecRcRYCNox0GD5UYL4PLKcH0FyfQrV+HVBK+GI8L3
FDzUcyJkR12Lcj1qf++7fsbzfOshL0AJPmidQCoc6wkLJpUEr/nYUqlI1Kx3YNuQ
LKmxFHS4l4/O/px3GKNDrLWDbrVlwciGIa3GZLS52PZdW3mAqT+cqcPcYK6SW71P
m1vE80VbNELX5q3YSRoOXtedoZ3Pk97LEmz/xQAsJ/jri0Z5Syk0Ok0m/Q==
=AMXp
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM64:
- Eager page splitting optimization for dirty logging, optionally
allowing for a VM to avoid the cost of hugepage splitting in the
stage-2 fault path.
- Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact
with services that live in the Secure world. pKVM intervenes on
FF-A calls to guarantee the host doesn't misuse memory donated to
the hyp or a pKVM guest.
- Support for running the split hypervisor with VHE enabled, known as
'hVHE' mode. This is extremely useful for testing the split
hypervisor on VHE-only systems, and paves the way for new use cases
that depend on having two TTBRs available at EL2.
- Generalized framework for configurable ID registers from userspace.
KVM/arm64 currently prevents arbitrary CPU feature set
configuration from userspace, but the intent is to relax this
limitation and allow userspace to select a feature set consistent
with the CPU.
- Enable the use of Branch Target Identification (FEAT_BTI) in the
hypervisor.
- Use a separate set of pointer authentication keys for the
hypervisor when running in protected mode, as the host is untrusted
at runtime.
- Ensure timer IRQs are consistently released in the init failure
paths.
- Avoid trapping CTR_EL0 on systems with Enhanced Virtualization
Traps (FEAT_EVT), as it is a register commonly read from userspace.
- Erratum workaround for the upcoming AmpereOne part, which has
broken hardware A/D state management.
RISC-V:
- Redirect AMO load/store misaligned traps to KVM guest
- Trap-n-emulate AIA in-kernel irqchip for KVM guest
- Svnapot support for KVM Guest
s390:
- New uvdevice secret API
- CMM selftest and fixes
- fix racy access to target CPU for diag 9c
x86:
- Fix missing/incorrect #GP checks on ENCLS
- Use standard mmu_notifier hooks for handling APIC access page
- Drop now unnecessary TR/TSS load after VM-Exit on AMD
- Print more descriptive information about the status of SEV and
SEV-ES during module load
- Add a test for splitting and reconstituting hugepages during and
after dirty logging
- Add support for CPU pinning in demand paging test
- Add support for AMD PerfMonV2, with a variety of cleanups and minor
fixes included along the way
- Add a "nx_huge_pages=never" option to effectively avoid creating NX
hugepage recovery threads (because nx_huge_pages=off can be toggled
at runtime)
- Move handling of PAT out of MTRR code and dedup SVM+VMX code
- Fix output of PIC poll command emulation when there's an interrupt
- Add a maintainer's handbook to document KVM x86 processes,
preferred coding style, testing expectations, etc.
- Misc cleanups, fixes and comments
Generic:
- Miscellaneous bugfixes and cleanups
Selftests:
- Generate dependency files so that partial rebuilds work as
expected"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (153 commits)
Documentation/process: Add a maintainer handbook for KVM x86
Documentation/process: Add a label for the tip tree handbook's coding style
KVM: arm64: Fix misuse of KVM_ARM_VCPU_POWER_OFF bit index
RISC-V: KVM: Remove unneeded semicolon
RISC-V: KVM: Allow Svnapot extension for Guest/VM
riscv: kvm: define vcpu_sbi_ext_pmu in header
RISC-V: KVM: Expose IMSIC registers as attributes of AIA irqchip
RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC
RISC-V: KVM: Expose APLIC registers as attributes of AIA irqchip
RISC-V: KVM: Add in-kernel emulation of AIA APLIC
RISC-V: KVM: Implement device interface for AIA irqchip
RISC-V: KVM: Skeletal in-kernel AIA irqchip support
RISC-V: KVM: Set kvm_riscv_aia_nr_hgei to zero
RISC-V: KVM: Add APLIC related defines
RISC-V: KVM: Add IMSIC related defines
RISC-V: KVM: Implement guest external interrupt line management
KVM: x86: Remove PRIx* definitions as they are solely for user space
s390/uv: Update query for secret-UVCs
s390/uv: replace scnprintf with sysfs_emit
s390/uvdevice: Add 'Lock Secret Store' UVC
...
* arm64/for-next/perf:
docs: perf: Fix warning from 'make htmldocs' in hisi-pmu.rst
docs: perf: Add new description for HiSilicon UC PMU
drivers/perf: hisi: Add support for HiSilicon UC PMU driver
drivers/perf: hisi: Add support for HiSilicon H60PA and PAv3 PMU driver
perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE
perf/arm-cmn: Add sysfs identifier
perf/arm-cmn: Revamp model detection
perf/arm_dmc620: Add cpumask
dt-bindings: perf: fsl-imx-ddr: Add i.MX93 compatible
drivers/perf: imx_ddr: Add support for NXP i.MX9 SoC DDRC PMU driver
perf/arm_cspmu: Decouple APMT dependency
perf/arm_cspmu: Clean up ACPI dependency
ACPI/APMT: Don't register invalid resource
perf/arm_cspmu: Fix event attribute type
perf: arm_cspmu: Set irq affinitiy only if overflow interrupt is used
drivers/perf: hisi: Don't migrate perf to the CPU going to teardown
drivers/perf: apple_m1: Force 63bit counters for M2 CPUs
perf/arm-cmn: Fix DTC reset
perf: qcom_l2_pmu: Make l2_cache_pmu_probe_cluster() more robust
perf/arm-cci: Slightly optimize cci_pmu_sync_counters()
* for-next/kpti:
: Simplify KPTI trampoline exit code
arm64: entry: Simplify tramp_alias macro and tramp_exit routine
arm64: entry: Preserve/restore X29 even for compat tasks
* for-next/missing-proto-warn:
: Address -Wmissing-prototype warnings
arm64: add alt_cb_patch_nops prototype
arm64: move early_brk64 prototype to header
arm64: signal: include asm/exception.h
arm64: kaslr: add kaslr_early_init() declaration
arm64: flush: include linux/libnvdimm.h
arm64: module-plts: inline linux/moduleloader.h
arm64: hide unused is_valid_bugaddr()
arm64: efi: add efi_handle_corrupted_x18 prototype
arm64: cpuidle: fix #ifdef for acpi functions
arm64: kvm: add prototypes for functions called in asm
arm64: spectre: provide prototypes for internal functions
arm64: move cpu_suspend_set_dbg_restorer() prototype to header
arm64: avoid prototype warnings for syscalls
arm64: add scs_patch_vmlinux prototype
arm64: xor-neon: mark xor_arm64_neon_*() static
* for-next/iss2-decode:
: Add decode of ISS2 to data abort reports
arm64/esr: Add decode of ISS2 to data abort reporting
arm64/esr: Use GENMASK() for the ISS mask
* for-next/kselftest:
: Various arm64 kselftest improvements
kselftest/arm64: Log signal code and address for unexpected signals
kselftest/arm64: Add a smoke test for ptracing hardware break/watch points
* for-next/misc:
: Miscellaneous patches
arm64: alternatives: make clean_dcache_range_nopatch() noinstr-safe
arm64: hibernate: remove WARN_ON in save_processor_state
arm64/fpsimd: Exit streaming mode when flushing tasks
arm64: mm: fix VA-range sanity check
arm64/mm: remove now-superfluous ISBs from TTBR writes
arm64: consolidate rox page protection logic
arm64: set __exception_irq_entry with __irq_entry as a default
arm64: syscall: unmask DAIF for tracing status
arm64: lockdep: enable checks for held locks when returning to userspace
arm64/cpucaps: increase string width to properly format cpucaps.h
arm64/cpufeature: Use helper for ECV CNTPOFF cpufeature
* for-next/feat_mops:
: Support for ARMv8.8 memcpy instructions in userspace
kselftest/arm64: add MOPS to hwcap test
arm64: mops: allow disabling MOPS from the kernel command line
arm64: mops: detect and enable FEAT_MOPS
arm64: mops: handle single stepping after MOPS exception
arm64: mops: handle MOPS exceptions
KVM: arm64: hide MOPS from guests
arm64: mops: don't disable host MOPS instructions from EL2
arm64: mops: document boot requirements for MOPS
KVM: arm64: switch HCRX_EL2 between host and guest
arm64: cpufeature: detect FEAT_HCX
KVM: arm64: initialize HCRX_EL2
* for-next/module-alloc:
: Make the arm64 module allocation code more robust (clean-up, VA range expansion)
arm64: module: rework module VA range selection
arm64: module: mandate MODULE_PLTS
arm64: module: move module randomization to module.c
arm64: kaslr: split kaslr/module initialization
arm64: kasan: remove !KASAN_VMALLOC remnants
arm64: module: remove old !KASAN_VMALLOC logic
* for-next/sysreg: (21 commits)
: More sysreg conversions to automatic generation
arm64/sysreg: Convert TRBIDR_EL1 register to automatic generation
arm64/sysreg: Convert TRBTRG_EL1 register to automatic generation
arm64/sysreg: Convert TRBMAR_EL1 register to automatic generation
arm64/sysreg: Convert TRBSR_EL1 register to automatic generation
arm64/sysreg: Convert TRBBASER_EL1 register to automatic generation
arm64/sysreg: Convert TRBPTR_EL1 register to automatic generation
arm64/sysreg: Convert TRBLIMITR_EL1 register to automatic generation
arm64/sysreg: Rename TRBIDR_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBTRG_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBMAR_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBSR_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBBASER_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBPTR_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBLIMITR_EL1 fields per auto-gen tools format
arm64/sysreg: Convert OSECCR_EL1 to automatic generation
arm64/sysreg: Convert OSDTRTX_EL1 to automatic generation
arm64/sysreg: Convert OSDTRRX_EL1 to automatic generation
arm64/sysreg: Convert OSLAR_EL1 to automatic generation
arm64/sysreg: Standardise naming of bitfield constants in OSL[AS]R_EL1
arm64/sysreg: Convert MDSCR_EL1 to automatic register generation
...
* for-next/cpucap:
: arm64 cpucap clean-up
arm64: cpufeature: fold cpus_set_cap() into update_cpu_capabilities()
arm64: cpufeature: use cpucap naming
arm64: alternatives: use cpucap naming
arm64: standardise cpucap bitmap names
* for-next/acpi:
: Various arm64-related ACPI patches
ACPI: bus: Consolidate all arm specific initialisation into acpi_arm_init()
* for-next/kdump:
: Simplify the crashkernel reservation behaviour of crashkernel=X,high on arm64
arm64: add kdump.rst into index.rst
Documentation: add kdump.rst to present crashkernel reservation on arm64
arm64: kdump: simplify the reservation behaviour of crashkernel=,high
* for-next/acpi-doc:
: Update ACPI documentation for Arm systems
Documentation/arm64: Update ACPI tables from BBR
Documentation/arm64: Update references in arm-acpi
Documentation/arm64: Update ARM and arch reference
* for-next/doc:
: arm64 documentation updates
Documentation/arm64: Add ptdump documentation
* for-next/tpidr2-fix:
: Fix the TPIDR2_EL0 register restoring on sigreturn
kselftest/arm64: Add a test case for TPIDR2 restore
arm64/signal: Restore TPIDR2 register rather than memory state
* kvm-arm64/misc:
: Miscellaneous updates
:
: - Avoid trapping CTR_EL0 on systems with FEAT_EVT, as the register is
: commonly read by userspace
:
: - Make use of FEAT_BTI at hyp stage-1, setting the Guard Page bit to 1
: for executable mappings
:
: - Use a separate set of pointer authentication keys for the hypervisor
: when running in protected mode (i.e. pKVM)
:
: - Plug a few holes in timer initialization where KVM fails to free the
: timer IRQ(s)
KVM: arm64: Use different pointer authentication keys for pKVM
KVM: arm64: timers: Fix resource leaks in kvm_timer_hyp_init()
KVM: arm64: Use BTI for nvhe
KVM: arm64: Relax trapping of CTR_EL0 when FEAT_EVT is available
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* kvm-arm64/configurable-id-regs:
: Configurable ID register infrastructure, courtesy of Jing Zhang
:
: Create generalized infrastructure for allowing userspace to select the
: supported feature set for a VM, so long as the feature set is a subset
: of what hardware + KVM allows. This does not add any new features that
: are user-configurable, and instead focuses on the necessary refactoring
: to enable future work.
:
: As a consequence of the series, feature asymmetry is now deliberately
: disallowed for KVM. It is unlikely that VMMs ever configured VMs with
: asymmetry, nor does it align with the kernel's overall stance that
: features must be uniform across all cores in the system.
:
: Furthermore, KVM incorrectly advertised an IMP_DEF PMU to guests for
: some time. Migrations from affected kernels was supported by explicitly
: allowing such an ID register value from userspace, and forwarding that
: along to the guest. KVM now allows an IMP_DEF PMU version to be restored
: through the ID register interface, but reinterprets the user value as
: not implemented (0).
KVM: arm64: Rip out the vestiges of the 'old' ID register scheme
KVM: arm64: Handle ID register reads using the VM-wide values
KVM: arm64: Use generic sanitisation for ID_AA64PFR0_EL1
KVM: arm64: Use generic sanitisation for ID_(AA64)DFR0_EL1
KVM: arm64: Use arm64_ftr_bits to sanitise ID register writes
KVM: arm64: Save ID registers' sanitized value per guest
KVM: arm64: Reuse fields of sys_reg_desc for idreg
KVM: arm64: Rewrite IMPDEF PMU version as NI
KVM: arm64: Make vCPU feature flags consistent VM-wide
KVM: arm64: Relax invariance of KVM_ARM_VCPU_POWER_OFF
KVM: arm64: Separate out feature sanitisation and initialisation
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Rather than reinventing the wheel in KVM to do ID register sanitisation
we can rely on the work already done in the core kernel. Implement a
generalized sanitisation of ID registers based on the combination of the
arm64_ftr_bits definitions from the core kernel and (optionally) a set
of KVM-specific overrides.
This all amounts to absolutely nothing for now, but will be used in
subsequent changes to realize user-configurable ID registers.
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Link: https://lore.kernel.org/r/20230609190054.1542113-8-oliver.upton@linux.dev
[Oliver: split off from monster patch, rewrote commit description,
reworked RAZ handling, return EINVAL to userspace]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Expose a capability keying the hVHE feature as well as a new
predicate testing it. Nothing is so far using it, and nothing
is enabling it yet.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230609162200.2024064-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Disabling KASLR from the command line is implemented as a feature
override. Repaint it slightly so that it can further be used as
more generic infrastructure for SW override purposes.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230609162200.2024064-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
We only use cpus_set_cap() in update_cpu_capabilities(), where we
open-code an analgous update to boot_cpucaps.
Due to the way the cpucap_ptrs[] array is initialized, we know that the
capability number cannot be greater than or equal to ARM64_NCAPS, so the
warning is superfluous.
Fold cpus_set_cap() into update_cpu_capabilities(), matching what we do
for the boot_cpucaps, and making the relationship between the two a bit
clearer.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230607164846.3967305-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
To more clearly align the various users of the cpucap enumeration, this patch
changes the cpufeature code to use the term `cpucap` in favour of `cpu_hwcap`.
This more clearly aligns with other users of the cpucaps, and avoids confusion
with the ELF hwcaps.
There should be no functional change as a result of this patch; this is
purely a renaming exercise.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230607164846.3967305-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The 'cpu_hwcaps' and 'boot_capabilities' bitmaps are bitmaps have the
same enumerated bits, but are named wildly differently for no good
reason. The terms 'hwcaps' and 'capabilities' have become ambiguous over
time (e.g. due to clashes with ELF hwcaps and the structures used to
manage feature detection), and it would be nicer to use 'cpucaps',
matching the <asm/cpucaps.h> header the enumerated bit indices are
defined in.
While this isn't a functional problem, it makes the code harder than
necessary to understand, and hard to extend with related functionality
(e.g. per-cpu cpucap bitmaps).
To that end, this patch renames `boot_capabilities` to `boot_cpucaps`
and `cpu_hwcaps` to `system_cpucaps`. This more clearly indicates the
relationship between the two and aligns with terminology used elsewhere
in our feature management code.
This change was scripted with:
| find . -type f -name '*.[chS]' -print0 | \
| xargs -0 sed -i 's/\<boot_capabilities\>/boot_cpucaps/'
| find . -type f -name '*.[chS]' -print0 | \
| xargs -0 sed -i 's/\<cpu_hwcaps\>/system_cpucaps/'
... and the instance of "cpu_hwcap" (without a trailing "s") in
<asm/mmu_context.h> corrected manually to "system_cpucaps".
Subsequent patches will adjust the naming of related functions to better
align with the `cpucap` naming.
There should be no functional change as a result of this patch; this is
purely a renaming exercise.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230607164846.3967305-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This indicates if the system supports PIE. This is a CPUCAP_BOOT_CPU_FEATURE
as the boot CPU will enable PIE if it has it, so secondary CPUs must also
have this feature.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230606145859.697944-8-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This capability indicates if the system supports the TCR2_ELx system register.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230606145859.697944-7-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The Arm v8.8/9.3 FEAT_MOPS feature provides new instructions that
perform a memory copy or set. Wire up the cpufeature code to detect the
presence of FEAT_MOPS and enable it.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20230509142235.3284028-10-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Detect if the system has the new HCRX_EL2 register added in ARMv8.7/9.2,
so that subsequent patches can check for its presence.
KVM currently relies on the register being present on all CPUs (or
none), so the kernel will panic if that is not the case. Fortunately no
such systems currently exist, but this can be revisited if they appear.
Note that the kernel will not panic if CONFIG_KVM is disabled.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20230509142235.3284028-3-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The newly added support for ECV CNTPOFF open codes the recently added
helper ARM64_CPUID_FIELDS(), make use of the helper. No functional
change.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20230523-arm64-ecv-helper-v1-1-506dfb5fb199@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
CTR_EL0 can often be used in userspace, and it would be nice if
KVM didn't have to emulate it unnecessarily.
While it isn't possible to trap the cache configuration registers
independently from CTR_EL0 in the base ARMv8.0 architecture, FEAT_EVT
allows these cache configuration registers (CCSIDR_EL1, CCSIDR2_EL1,
CLIDR_EL1 and CSSELR_EL1) to be trapped independently by setting
HCR_EL2.TID4.
Switch to using TID4 instead of TID2 in the cases where FEAT_EVT
is available *and* that KVM doesn't need to sanitise CTR_EL0 to
paper over mismatched cache configurations.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230515170016.965378-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
- Fix regression in CPU erratum workaround when disabling the MMU
- Fix detection of pointer authentication hwcaps
- Avoid writeable, executable ELF sections in vmlinux
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmRTe+UQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNJXqB/9C9DrrbHUg9ZPqAIUpXkyaxem4gpIS+kyU
+ard53uweuQHchuR/x2s2K9Sp/ano5jGnQXEjikNy29Opu2UYI/wmsqdJEn3km8q
kohTRsiFgQ40Y85/3iJ8ug6+llxCxK6AXdZCskdWTP56Jur0WpNiQd0a/ShYQLdX
wBHdInT3QpDVzd5bEWDtUEj4H//tTCy4rESQyGsLhrHgb/x8uZKgZMtPJp6+Q3Eq
ofs+PQc0qHr/Ri3ahQOCMbxTbNaLIgUzkyXbZN+y2JtxgE+l8E3Gsir2+Pv7mcSx
1gCSLCmpwE7rVJpTykN+jA6OSsoSUSJHXs6565nF4n8+ugdL7aqR
=Tba+
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"A few arm64 fixes that came in during the merge window for -rc1.
The main thing is restoring the pointer authentication hwcaps, which
disappeared during some recent refactoring
- Fix regression in CPU erratum workaround when disabling the MMU
- Fix detection of pointer authentication hwcaps
- Avoid writeable, executable ELF sections in vmlinux"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: lds: move .got section out of .text
arm64: kernel: remove SHF_WRITE|SHF_EXECINSTR from .idmap.text
arm64: cpufeature: Fix pointer auth hwcaps
arm64: Fix label placement in record_mmu_state()
The pointer auth hwcaps are not getting reported to userspace, as they
are missing the .matches field. Add the field back.
Fixes: 876e3c8efe ("arm64/cpufeature: Pull out helper for CPUID register definitions")
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230428132546.2513834-1-kristina.martsenko@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
* More phys_to_virt conversions
* Improvement of AP management for VSIE (nested virtualization)
ARM64:
* Numerous fixes for the pathological lock inversion issue that
plagued KVM/arm64 since... forever.
* New framework allowing SMCCC-compliant hypercalls to be forwarded
to userspace, hopefully paving the way for some more features
being moved to VMMs rather than be implemented in the kernel.
* Large rework of the timer code to allow a VM-wide offset to be
applied to both virtual and physical counters as well as a
per-timer, per-vcpu offset that complements the global one.
This last part allows the NV timer code to be implemented on
top.
* A small set of fixes to make sure that we don't change anything
affecting the EL1&0 translation regime just after having having
taken an exception to EL2 until we have executed a DSB. This
ensures that speculative walks started in EL1&0 have completed.
* The usual selftest fixes and improvements.
KVM x86 changes for 6.4:
* Optimize CR0.WP toggling by avoiding an MMU reload when TDP is enabled,
and by giving the guest control of CR0.WP when EPT is enabled on VMX
(VMX-only because SVM doesn't support per-bit controls)
* Add CR0/CR4 helpers to query single bits, and clean up related code
where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long" return
as a bool
* Move AMD_PSFD to cpufeatures.h and purge KVM's definition
* Avoid unnecessary writes+flushes when the guest is only adding new PTEs
* Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s optimizations
when emulating invalidations
* Clean up the range-based flushing APIs
* Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a single
A/D bit using a LOCK AND instead of XCHG, and skip all of the "handle
changed SPTE" overhead associated with writing the entire entry
* Track the number of "tail" entries in a pte_list_desc to avoid having
to walk (potentially) all descriptors during insertion and deletion,
which gets quite expensive if the guest is spamming fork()
* Disallow virtualizing legacy LBRs if architectural LBRs are available,
the two are mutually exclusive in hardware
* Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES)
after KVM_RUN, similar to CPUID features
* Overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES
* Apply PMU filters to emulated events and add test coverage to the
pmu_event_filter selftest
x86 AMD:
* Add support for virtual NMIs
* Fixes for edge cases related to virtual interrupts
x86 Intel:
* Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is
not being reported due to userspace not opting in via prctl()
* Fix a bug in emulation of ENCLS in compatibility mode
* Allow emulation of NOP and PAUSE for L2
* AMX selftests improvements
* Misc cleanups
MIPS:
* Constify MIPS's internal callbacks (a leftover from the hardware enabling
rework that landed in 6.3)
Generic:
* Drop unnecessary casts from "void *" throughout kvm_main.c
* Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the struct
size by 8 bytes on 64-bit kernels by utilizing a padding hole
Documentation:
* Fix goof introduced by the conversion to rST
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmRNExkUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNyjwf+MkzDael9y9AsOZoqhEZ5OsfQYJ32
Im5ZVYsPRU2K5TuoWql6meIihgclCj1iIU32qYHa2F1WYt2rZ72rJp+HoY8b+TaI
WvF0pvNtqQyg3iEKUBKPA4xQ6mj7RpQBw86qqiCHmlfNt0zxluEGEPxH8xrWcfhC
huDQ+NUOdU7fmJ3rqGitCvkUbCuZNkw3aNPR8dhU8RAWrwRzP2hBOmdxIeo81WWY
XMEpJSijbGpXL9CvM0Jz9nOuMJwZwCCBGxg1vSQq0xTfLySNMxzvWZC2GFaBjucb
j0UOQ7yE0drIZDVhd3sdNslubXXU6FcSEzacGQb9aigMUon3Tem9SHi7Kw==
=S2Hq
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"s390:
- More phys_to_virt conversions
- Improvement of AP management for VSIE (nested virtualization)
ARM64:
- Numerous fixes for the pathological lock inversion issue that
plagued KVM/arm64 since... forever.
- New framework allowing SMCCC-compliant hypercalls to be forwarded
to userspace, hopefully paving the way for some more features being
moved to VMMs rather than be implemented in the kernel.
- Large rework of the timer code to allow a VM-wide offset to be
applied to both virtual and physical counters as well as a
per-timer, per-vcpu offset that complements the global one. This
last part allows the NV timer code to be implemented on top.
- A small set of fixes to make sure that we don't change anything
affecting the EL1&0 translation regime just after having having
taken an exception to EL2 until we have executed a DSB. This
ensures that speculative walks started in EL1&0 have completed.
- The usual selftest fixes and improvements.
x86:
- Optimize CR0.WP toggling by avoiding an MMU reload when TDP is
enabled, and by giving the guest control of CR0.WP when EPT is
enabled on VMX (VMX-only because SVM doesn't support per-bit
controls)
- Add CR0/CR4 helpers to query single bits, and clean up related code
where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long"
return as a bool
- Move AMD_PSFD to cpufeatures.h and purge KVM's definition
- Avoid unnecessary writes+flushes when the guest is only adding new
PTEs
- Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s
optimizations when emulating invalidations
- Clean up the range-based flushing APIs
- Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a
single A/D bit using a LOCK AND instead of XCHG, and skip all of
the "handle changed SPTE" overhead associated with writing the
entire entry
- Track the number of "tail" entries in a pte_list_desc to avoid
having to walk (potentially) all descriptors during insertion and
deletion, which gets quite expensive if the guest is spamming
fork()
- Disallow virtualizing legacy LBRs if architectural LBRs are
available, the two are mutually exclusive in hardware
- Disallow writes to immutable feature MSRs (notably
PERF_CAPABILITIES) after KVM_RUN, similar to CPUID features
- Overhaul the vmx_pmu_caps selftest to better validate
PERF_CAPABILITIES
- Apply PMU filters to emulated events and add test coverage to the
pmu_event_filter selftest
- AMD SVM:
- Add support for virtual NMIs
- Fixes for edge cases related to virtual interrupts
- Intel AMX:
- Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if
XTILE_DATA is not being reported due to userspace not opting in
via prctl()
- Fix a bug in emulation of ENCLS in compatibility mode
- Allow emulation of NOP and PAUSE for L2
- AMX selftests improvements
- Misc cleanups
MIPS:
- Constify MIPS's internal callbacks (a leftover from the hardware
enabling rework that landed in 6.3)
Generic:
- Drop unnecessary casts from "void *" throughout kvm_main.c
- Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the
struct size by 8 bytes on 64-bit kernels by utilizing a padding
hole
Documentation:
- Fix goof introduced by the conversion to rST"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (211 commits)
KVM: s390: pci: fix virtual-physical confusion on module unload/load
KVM: s390: vsie: clarifications on setting the APCB
KVM: s390: interrupt: fix virtual-physical confusion for next alert GISA
KVM: arm64: Have kvm_psci_vcpu_on() use WRITE_ONCE() to update mp_state
KVM: arm64: Acquire mp_state_lock in kvm_arch_vcpu_ioctl_vcpu_init()
KVM: selftests: Test the PMU event "Instructions retired"
KVM: selftests: Copy full counter values from guest in PMU event filter test
KVM: selftests: Use error codes to signal errors in PMU event filter test
KVM: selftests: Print detailed info in PMU event filter asserts
KVM: selftests: Add helpers for PMC asserts in PMU event filter test
KVM: selftests: Add a common helper for the PMU event filter guest code
KVM: selftests: Fix spelling mistake "perrmited" -> "permitted"
KVM: arm64: vhe: Drop extra isb() on guest exit
KVM: arm64: vhe: Synchronise with page table walker on MMU update
KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc()
KVM: arm64: nvhe: Synchronise with page table walker on TLBI
KVM: arm64: Handle 32bit CNTPCTSS traps
KVM: arm64: nvhe: Synchronise with page table walker on vcpu run
KVM: arm64: vgic: Don't acquire its_lock before config_lock
KVM: selftests: Add test to verify KVM's supported XCR0
...
Here is the large set of driver core changes for 6.4-rc1.
Once again, a busy development cycle, with lots of changes happening in
the driver core in the quest to be able to move "struct bus" and "struct
class" into read-only memory, a task now complete with these changes.
This will make the future rust interactions with the driver core more
"provably correct" as well as providing more obvious lifetime rules for
all busses and classes in the kernel.
The changes required for this did touch many individual classes and
busses as many callbacks were changed to take const * parameters
instead. All of these changes have been submitted to the various
subsystem maintainers, giving them plenty of time to review, and most of
them actually did so.
Other than those changes, included in here are a small set of other
things:
- kobject logging improvements
- cacheinfo improvements and updates
- obligatory fw_devlink updates and fixes
- documentation updates
- device property cleanups and const * changes
- firwmare loader dependency fixes.
All of these have been in linux-next for a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZEp7Sw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykitQCfamUHpxGcKOAGuLXMotXNakTEsxgAoIquENm5
LEGadNS38k5fs+73UaxV
=7K4B
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the large set of driver core changes for 6.4-rc1.
Once again, a busy development cycle, with lots of changes happening
in the driver core in the quest to be able to move "struct bus" and
"struct class" into read-only memory, a task now complete with these
changes.
This will make the future rust interactions with the driver core more
"provably correct" as well as providing more obvious lifetime rules
for all busses and classes in the kernel.
The changes required for this did touch many individual classes and
busses as many callbacks were changed to take const * parameters
instead. All of these changes have been submitted to the various
subsystem maintainers, giving them plenty of time to review, and most
of them actually did so.
Other than those changes, included in here are a small set of other
things:
- kobject logging improvements
- cacheinfo improvements and updates
- obligatory fw_devlink updates and fixes
- documentation updates
- device property cleanups and const * changes
- firwmare loader dependency fixes.
All of these have been in linux-next for a while with no reported
problems"
* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
device property: make device_property functions take const device *
driver core: update comments in device_rename()
driver core: Don't require dynamic_debug for initcall_debug probe timing
firmware_loader: rework crypto dependencies
firmware_loader: Strip off \n from customized path
zram: fix up permission for the hot_add sysfs file
cacheinfo: Add use_arch[|_cache]_info field/function
arch_topology: Remove early cacheinfo error message if -ENOENT
cacheinfo: Check cache properties are present in DT
cacheinfo: Check sib_leaf in cache_leaves_are_shared()
cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
cacheinfo: Add arm64 early level initializer implementation
cacheinfo: Add arch specific early level initializer
tty: make tty_class a static const structure
driver core: class: remove struct class_interface * from callbacks
driver core: class: mark the struct class in struct class_interface constant
driver core: class: make class_register() take a const *
driver core: class: mark class_release() as taking a const *
driver core: remove incorrect comment for device_create*
MIPS: vpe-cmp: remove module owner pointer from struct class usage.
...
When defining which value to look for in a system register field we
currently manually specify the register, field shift, width and sign and
the value to look for. This opens the potential for error with for example
the wrong field width or sign being specified, an enumeration value for
a different similarly named field or letting something be initialised to 0.
Since we now generate defines for all the ID registers we now have named
constants for all of these things generated from the system register
description, meaning that we can generate initialisation for all the fields
used in matching from a minimal specification of register, field and match
value. This is both shorter and eliminates or makes build failures several
potential errors.
No change in the generated binary.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230303-arm64-cpufeature-helpers-v2-3-4c8f28a6f203@kernel.org
[will: Drop explicit '.sign' assignment for BTI feature]
Signed-off-by: Will Deacon <will@kernel.org>
A number of the cpufeatures use raw numbers for the minimum field values
specified rather than symbolic constants. In preparation for the use of
helper macros replace all these with the appropriate constants.
No change in the generated binary.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230303-arm64-cpufeature-helpers-v2-2-4c8f28a6f203@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
We use the same structure to match hwcaps and CPU features so we can use
the same helper to generate the fields required. Pull the portion of the
current hwcaps helper that initialises the fields out into a separate
define placed earlier in the file so we can use it for cpufeatures.
No functional change.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230303-arm64-cpufeature-helpers-v2-1-4c8f28a6f203@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>