linux-loongson

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson synced 2025-08-29 11:42:36 +00:00

Author	SHA1	Message	Date
Ada Couprie Diaz	31575e11ec	arm64: debug: split brk64 exception entry Currently all debug exceptions share common entry code and are routed to `do_debug_exception()`, which calls dynamically-registered handlers for each specific debug exception. This is unfortunate as different debug exceptions have different entry handling requirements, and it would be better to handle these distinct requirements earlier. The BRK64 instruction can only be triggered by a BRK instruction. Thus, we know that the PC is a legitimate address and isn't being used to train a branch predictor with a bogus address : we don't need to call `arm64_apply_bp_hardening()`. We do not need to handle the Cortex-A76 erratum #1463225 either, as it only relevant for single stepping at EL1. BRK64 does not write FAR_EL1 either, as only hardware watchpoints do so. Split the BRK64 exception entry, adjust the function signature, and its behaviour to match the lack of needed mitigations. Further, as the EL0 and EL1 code paths are cleanly separated, we can split `do_brk64()` into `do_el0_brk64()` and `do_el1_brk64()`, and call them directly from the relevant entry paths. Use `die()` directly for the EL1 error path, as in `do_el1_bti()` and `do_el1_undef()`. We can also remove `NOKRPOBE_SYMBOL` for the EL0 path, as it cannot lead to a kprobe recursion. When taking a BRK64 exception from EL0, the exception handling is safely preemptible : the only possible handler is `uprobe_brk_handler()`. It only operates on task-local data and properly checks its validity, then raises a Thread Information Flag, processed before returning to userspace in `do_notify_resume()`, which is already preemptible. Thus we can safely unmask interrupts and enable preemption before handling the break itself, fixing a PREEMPT_RT issue where the handler could call a sleeping function with preemption disabled. Given that the break hook registration is handled statically in `call_break_hook` since (arm64: debug: call software break handlers statically) and that we now bypass the exception handler registration, this change renders `early_brk64` redundant : its functionality is now handled through the post-init path. This also removes the last usage of `el1_dbg()`. This also removes the last usage of `el0_dbg()` without `CONFIG_COMPAT`. Mark it `__maybe_unused`, to prevent a warning when building this patch without `CONFIG_COMPAT`, as the following patch removes `el0_dbg()`. Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Reviewed-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250707114109.35672-12-ada.coupriediaz@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-07-08 13:27:42 +01:00
Ada Couprie Diaz	413f0bba00	arm64: debug: split hardware watchpoint exception entry Currently all debug exceptions share common entry code and are routed to `do_debug_exception()`, which calls dynamically-registered handlers for each specific debug exception. This is unfortunate as different debug exceptions have different entry handling requirements, and it would be better to handle these distinct requirements earlier. Hardware watchpoints are the only debug exceptions that will write FAR_EL1, so we need to preserve it and pass it down. However, they cannot be used to maliciously train branch predictors, so we can omit calling `arm64_bp_hardening()`, nor do they need to handle the Cortex-A76 erratum #1463225, as it only applies to single stepping exceptions. As the hardware watchpoint handler only returns 0 and never triggers the call to `arm64_notify_die()`, we can call it directly from `entry-common.c`. Split the hardware watchpoint exception entry and adjust the behaviour to match the lack of needed mitigations. Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Reviewed-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250707114109.35672-11-ada.coupriediaz@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-07-08 13:27:42 +01:00
Ada Couprie Diaz	0ac7584c08	arm64: debug: split single stepping exception entry Currently all debug exceptions share common entry code and are routed to `do_debug_exception()`, which calls dynamically-registered handlers for each specific debug exception. This is unfortunate as different debug exceptions have different entry handling requirements, and it would be better to handle these distinct requirements earlier. The single stepping exception has the most constraints : it can be exploited to train branch predictors and it needs special handling at EL1 for the Cortex-A76 erratum #1463225. We need to conserve all those mitigations. However, it does not write an address at FAR_EL1, as only hardware watchpoints do so. The single-step handler does its own signaling if it needs to and only returns 0, so we can call it directly from `entry-common.c`. Split the single stepping exception entry, adjust the function signature, keep the security mitigation and erratum handling. Further, as the EL0 and EL1 code paths are cleanly separated, we can split `do_softstep()` into `do_el0_softstep()` and `do_el1_softstep()` and call them directly from the relevant entry paths. We can also remove `NOKPROBE_SYMBOL` for the EL0 path, as it cannot lead to a kprobe recursion. Move the call to `arm64_apply_bp_hardening()` to `entry-common.c` so that we can do it as early as possible, and only for the exceptions coming from EL0, where it is needed. This is safe to do as it is `noinstr`, as are all the functions it may call. `el0_ia()` and `el0_pc()` already call it this way. When taking a soft-step exception from EL0, most of the single stepping handling is safely preemptible : the only possible handler is `uprobe_single_step_handler()`. It only operates on task-local data and properly checks its validity, then raises a Thread Information Flag, processed before returning to userspace in `do_notify_resume()`, which is already preemptible. However, the soft-step handler first calls `reinstall_suspended_bps()` to check if there is any hardware breakpoint or watchpoint pending or already stepped through. This cannot be preempted as it manipulates the hardware breakpoint and watchpoint registers. Move the call to `try_step_suspended_breakpoints()` to `entry-common.c` and adjust the relevant comments. We can now safely unmask interrupts before handling the step itself, fixing a PREEMPT_RT issue where the handler could call a sleeping function with preemption disabled. Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Closes: https://lore.kernel.org/linux-arm-kernel/Z6YW_Kx4S2tmj2BP@uudg.org/ Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Reviewed-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250707114109.35672-10-ada.coupriediaz@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-07-08 13:27:42 +01:00
Ada Couprie Diaz	80691d3552	arm64: debug: refactor reinstall_suspended_bps() `reinstall_suspended_bps()` plays a key part in the stepping process when we have hardware breakpoints and watchpoints enabled. It checks if we need to step one, will re-enable it if it has been handled and will return whether or not we need to proceed with a single-step. However, the current naming and return values make it harder to understand the logic and goal of the function. Rename it `try_step_suspended_breakpoints()` and change the return value to a boolean, aligning it with similar functions used in `do_el0_undef()` like `try_emulate_mrs()`, and making its behaviour more obvious. Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250707114109.35672-9-ada.coupriediaz@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-07-08 13:27:42 +01:00
Ada Couprie Diaz	43e2ae77fc	arm64: debug: split hardware breakpoint exception entry Currently all debug exceptions share common entry code and are routed to `do_debug_exception()`, which calls dynamically-registered handlers for each specific debug exception. This is unfortunate as different debug exceptions have different entry handling requirements, and it would be better to handle these distinct requirements earlier. Hardware breakpoints exceptions are generated by the hardware after user configuration. As such, they can be exploited when training branch predictors outside of the userspace VA range: they still need to call `arm64_apply_bp_hardening()` if needed to mitigate against this attack. However, they do not need to handle the Cortex-A76 erratum #1463225 as it only applies to single stepping exceptions. It does not set an address in FAR_EL1 either, only the hardware watchpoint does. As the hardware breakpoint handler only returns 0 and never triggers the call to `arm64_notify_die()`, we can call it directly from `entry-common.c`. Split the hardware breakpoint exception entry, adjust the function signature, and handling of the Cortex-A76 erratum to fit the behaviour of the exception. Move the call to `arm64_apply_bp_hardening()` to `entry-common.c` so that we can do it as early as possible, and only for the exceptions coming from EL0, where it is needed. This is safe to do as it is `noinstr`, as are all the functions it may call. `el0_ia()` and `el0_pc()` already call it this way. Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Reviewed-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250707114109.35672-8-ada.coupriediaz@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-07-08 13:27:41 +01:00
Ada Couprie Diaz	eaff68b328	arm64: entry: Add entry and exit functions for debug exceptions Move the `debug_exception_enter()` and `debug_exception_exit()` functions from mm/fault.c, as they are needed to split the debug exceptions entry paths from the current unified one. Make them externally visible in include/asm/exception.h until the caller in mm/fault.c is cleaned up. Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250707114109.35672-7-ada.coupriediaz@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-07-08 13:27:41 +01:00
Ada Couprie Diaz	d4e0b12620	arm64: debug: remove break/step handler registration infrastructure Remove all infrastructure for the dynamic registration previously used by software breakpoints and stepping handlers. Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250707114109.35672-6-ada.coupriediaz@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-07-08 13:27:41 +01:00
Ada Couprie Diaz	403b48aad5	arm64: debug: call step handlers statically Software stepping checks for the correct handler by iterating over a list of dynamically registered handlers and calling all of them until one handles the exception. This is the only generic way to handle software stepping handlers in arm64 as the exception does not provide an immediate that could be checked, contrary to software breakpoints. However, the registration mechanism is not exported and has only two current users : the KGDB stepping handler, and the uprobe single step handler. Given that one comes from user mode and the other from kernel mode, call the appropriate one by checking the source EL of the exception. Add a stand-in that returns DBG_HOOK_ERROR when the configuration options are not enabled. Remove `arch_init_uprobes()` as it is not useful anymore and is specific to arm64. Unify the naming of the handler to XXX_single_step_handler(), making it clear they are related. Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Reviewed-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250707114109.35672-5-ada.coupriediaz@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-07-08 13:27:41 +01:00
Ada Couprie Diaz	6adfdc5e2e	arm64: debug: call software breakpoint handlers statically Software breakpoints pass an immediate value in ESR ("comment") that can be used to call a specialized handler (KGDB, KASAN...). We do so in two different ways : - During early boot, `early_brk64` statically checks against known immediates and calls the corresponding handler, - During init, handlers are dynamically registered into a list. When called, the generic software breakpoint handler will iterate over the list to find the appropriate handler. The dynamic registration does not provide any benefit here as it is not exported and all its uses are within the arm64 tree. It also depends on an RCU list, whose safe access currently relies on the non-preemptible state of `do_debug_exception`. Replace the list iteration logic in `call_break_hooks` to call the breakpoint handlers statically if they are enabled, like in `early_brk64`. Expose the handlers in their respective headers to be reachable from `arch/arm64/kernel/debug-monitors.c` at link time. Unify the naming of the software breakpoint handlers to XXX_brk_handler(), making it clear they are related and to differentiate from the hardware breakpoints. Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Reviewed-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250707114109.35672-4-ada.coupriediaz@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-07-08 13:27:41 +01:00
Ada Couprie Diaz	b1e2d95524	arm64: refactor aarch32_break_handler() `aarch32_break_handler()` is called in `do_el0_undef()` when we are trying to handle an exception whose Exception Syndrome is unknown. It checks if the instruction hit might be a 32-bit arm break (be it A32 or T2), and sends a SIGTRAP to userspace if it is so that it can be handled. However, this is badly represented in the naming of the function, and is not consistent with the other functions called with the same logic in `do_el0_undef()`. Rename it `try_handle_aarch32_break()` and change the return value to a boolean to align with the logic of the other tentative handlers in `do_el0_undef()`, the previous error code being ignored anyway. Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250707114109.35672-3-ada.coupriediaz@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-07-08 13:27:41 +01:00
Ankit Agrawal	0c67288e0c	KVM: arm64: Allow cacheable stage 2 mapping using VMA flags KVM currently forces non-cacheable memory attributes (either Normal-NC or Device-nGnRE) for a region based on pfn_is_map_memory(), i.e. whether or not the kernel has a cacheable alias for it. This is necessary in situations where KVM needs to perform CMOs on the region but is unnecessarily restrictive when hardware obviates the need for CMOs. KVM doesn't need to perform any CMOs on hardware with FEAT_S2FWB and CTR_EL0.DIC. As luck would have it, there are implementations in the wild that need to map regions of a device with cacheable attributes to function properly. An example of this is Nvidia's Grace Hopper/Blackwell systems where GPU memory is interchangeable with DDR and retains properties such as cacheability, unaligned accesses, atomics and handling of executable faults. Of course, for this to work in a VM the GPU memory needs to have a cacheable mapping at stage-2. Allow cacheable stage-2 mappings to be created on supporting hardware when the VMA has cacheable memory attributes. Check these preconditions during memslot creation (in addition to fault handling) to potentially 'fail-fast' as a courtesy to userspace. CC: Oliver Upton <oliver.upton@linux.dev> CC: Sean Christopherson <seanjc@google.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Suggested-by: David Hildenbrand <david@redhat.com> Tested-by: Donald Dutile <ddutile@redhat.com> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250705071717.5062-6-ankita@nvidia.com [ Oliver: refine changelog, squash kvm_supports_cacheable_pfnmap() patch ] Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-07-07 16:54:19 -07:00
Mark Brown	0d1c86b840	arm64/gcs: Don't try to access GCS registers if arm64.nogcs is enabled During EL2 setup if GCS is advertised in the ID registers we will reset the GCS control registers GCSCR_EL1 and GCSCRE0_EL1 to known values in order to ensure it is disabled. This is done without taking into account overrides supplied on the command line, meaning that if the user has configured arm64.nogcs we will still access these GCS specific registers. If this was done because EL3 does not enable GCS this results in traps to EL3 and a failed boot which is not what users would expect from having set that parameter. Move the writes to these registers to finalise_el2_state where we can pay attention to the command line overrides. For simplicity we leave the updates to the traps in HCRX_EL2 and the FGT registers in place since these should only be relevant for KVM guests and KVM will manage them itself for guests. This follows the existing practice for other similar traps for overridable features such as those for TPIDR2_EL0 and SMPRI_EL1. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250619-arm64-fix-nogcs-v1-1-febf2973672e@kernel.org Signed-off-by: Will Deacon <will@kernel.org>	2025-07-04 16:29:12 +01:00
Anshuman Khandual	d3a80c5109	arm64/debug: Drop redundant DBG_MDSCR_* macros MDSCR_EL1 has already been defined in tools sysreg format and hence can be used in all debug monitor related call paths. But using generated sysreg definitions causes build warnings because there is a mismatch between mdscr variable (u32) and GENMASK() based masks (long unsigned int). Convert all variables handling MDSCR_EL1 register as u64 which also reflects its true width as well. -------------------------------------------------------------------------- arch/arm64/kernel/debug-monitors.c: In function ‘disable_debug_monitors’: arch/arm64/kernel/debug-monitors.c:108:13: warning: conversion from ‘long unsigned int’ to ‘u32’ {aka ‘unsigned int’} changes value from ‘18446744073709518847’ to ‘4294934527’ [-Woverflow] 108 \| disable = ~MDSCR_EL1_MDE; \| ^ -------------------------------------------------------------------------- While here, replace an open encoding with MDSCR_EL1_TDCC in __cpu_setup(). Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20250613023646.1215700-2-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-07-03 20:00:24 +01:00
Mark Rutland	42ce432522	KVM: arm64: Remove kvm_arch_vcpu_run_map_fp() Historically KVM hyp code saved the host's FPSIMD state into the hosts's fpsimd_state memory, and so it was necessary to map this into the hyp Stage-1 mappings before running a vCPU. This is no longer necessary as of commits: * `fbc7e61195` ("KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state") * `8eca7f6d51` ("KVM: arm64: Remove host FPSIMD saving for non-protected KVM") Since those commits, we eagerly save the host's FPSIMD state before calling into hyp to run a vCPU, and hyp code never reads nor writes the host's fpsimd_state memory. There's no longer any need to map the host's fpsimd_state memory into the hyp Stage-1, and kvm_arch_vcpu_run_map_fp() is unnecessary but benign. Remove kvm_arch_vcpu_run_map_fp(). Currently there is no code to perform a corresponding unmap, and we never mapped the host's SVE or SME state into the hyp Stage-1, so no other code needs to be removed. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Cc: kvmarm@lists.linux.dev Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20250619134817.4075340-1-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-07-03 10:39:24 +01:00
Yeoreum Yun	f620372209	arm64/hwcaps: Add MTE_STORE_ONLY hwcaps Since ARMv8.9, FEAT_MTE_STORE_ONLY can be used to restrict raise of tag check fault on store operation only. add MTE_STORE_ONLY hwcaps so that user can use this feature. Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Link: https://lore.kernel.org/r/20250618092957.2069907-5-yeoreum.yun@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-07-02 18:49:04 +01:00
Yeoreum Yun	4d51ff5bba	arm64/kernel: Support store-only mte tag check Introduce new flag -- MTE_CTRL_STORE_ONLY used to set store-only tag check. This flag isn't overridden by prefered tcf flag setting but set together with prefered setting of way to report tag check fault. Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250618092957.2069907-4-yeoreum.yun@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-07-02 18:49:04 +01:00
Yeoreum Yun	7c7f55039b	arm64: Report address tag when FEAT_MTE_TAGGED_FAR is supported If FEAT_MTE_TAGGED_FAR (Armv8.9) is supported, bits 63:60 of the fault address are preserved in response to synchronous tag check faults (SEGV_MTESERR). This patch modifies below to support this feature: - Use the original FAR_EL1 value when an MTE tag check fault occurs, if ARM64_MTE_FAR is supported so that not only logical tag (bits 59:56) but also address tag (bits 63:60] being reported too. - Add HWCAP for mtefar to let user know bits 63:60 includes address tag information when when FEAT_MTE_TAGGED_FAR is supported. Applications that require this information should install a signal handler with the SA_EXPOSE_TAGBITS flag. While this introduces a minor ABI change, most applications do not set this flag and therefore will not be affected. Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Link: https://lore.kernel.org/r/20250618084513.1761345-3-yeoreum.yun@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-07-02 17:44:17 +01:00
Song Liu	fd1e0fd71f	arm64: Implement HAVE_LIVEPATCH Allocate a task flag used to represent the patch pending state for the task. When a livepatch is being loaded or unloaded, the livepatch code uses this flag to select the proper version of a being patched kernel functions to use for current task. In arch/arm64/Kconfig, select HAVE_LIVEPATCH and include proper Kconfig. This is largely based on [1] by Suraj Jitindar Singh. [1] https://lore.kernel.org/all/20210604235930.603-1-surajjs@amazon.com/ Cc: Suraj Jitindar Singh <surajjs@amazon.com> Cc: Torsten Duwe <duwe@suse.de> Acked-by: Miroslav Benes <mbenes@suse.cz> Tested-by: Breno Leitao <leitao@debian.org> Tested-by: Andrea della Porta <andrea.porta@suse.com> Signed-off-by: Song Liu <song@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250630174502.842486-1-song@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-07-01 15:18:53 +01:00
Mikołaj Lenczewski	5aa4b62576	arm64: Add BBM Level 2 cpu feature The Break-Before-Make cpu feature supports multiple levels (levels 0-2), and this commit adds a dedicated BBML2 cpufeature to test against support for. To support BBML2 in as wide a range of contexts as we can, we want not only the architectural guarantees that BBML2 makes, but additionally want BBML2 to not create TLB conflict aborts. Not causing aborts avoids us having to prove that no recursive faults can be induced in any path that uses BBML2, allowing its use for arbitrary kernel mappings. This feature builds on the previous ARM64_CPUCAP_EARLY_LOCAL_CPU_FEATURE, as all early cpus must support BBML2 for us to enable it (and any later cpus must also support it to be onlined). Not onlining late cpus that do not support BBML2 is unavoidable, as we might currently be using BBML2 semantics for kernel memory regions. This could cause faults in the late cpus, and would be difficult to unwind, so let us avoid the case altogether. Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20250625113435.26849-3-miko.lenczewski@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-06-30 18:09:05 +01:00
Catalin Marinas	3eb06f6ce3	arm64: cpufeature: Introduce MATCH_ALL_EARLY_CPUS capability type For system-wide capabilities, the kernel has the SCOPE_SYSTEM type. Such capabilities are checked once the SMP boot has completed using the sanitised ID registers. However, there is a need for a new capability type similar in scope to the system one but with checking performed locally on each CPU during boot (e.g. based on MIDR_EL1 which is not a sanitised register). Introduce ARM64_CPUCAP_MATCH_ALL_EARLY_CPUS which, together with ARM64_CPUCAP_SCOPE_LOCAL_CPU, ensures that such capability is enabled only if all early CPUs have it. For ease of use, define ARM64_CPUCAP_EARLY_LOCAL_CPU_FEATURE which combines SCOPE_LOCAL_CPU, PERMITTED_FOR_LATE_CPUS and MATCH_ALL_EARLY_CPUS. Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com> Reviewed-by: Suzuki K Poulose <Suzuki.Poulose@arm.com> Link: https://lore.kernel.org/r/20250625113435.26849-2-miko.lenczewski@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-06-30 17:42:03 +01:00
Mark Rutland	04c5355b2a	KVM: arm64: VHE: Centralize ISBs when returning to host The VHE hyp code has recently gained a few ISBs. Simplify this to one unconditional ISB in __kvm_vcpu_run_vhe(), and remove the unnecessary ISB from the kvm_call_hyp_ret() macro. While kvm_call_hyp_ret() is also used to invoke __vgic_v3_get_gic_config(), but no ISB is necessary in that case either. For the moment, an ISB is left in kvm_call_hyp(), as there are many more users, and removing the ISB would require a more thorough audit. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250617133718.4014181-8-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-06-19 13:34:59 +01:00
Mark Rutland	3a300a33e4	KVM: arm64: Remove cpacr_clear_set() We no longer use cpacr_clear_set(). Remove cpacr_clear_set() and its helper functions. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250617133718.4014181-7-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-06-19 13:06:20 +01:00
Linus Torvalds	dde6379705	ARM: - Rework of system register accessors for system registers that are directly writen to memory, so that sanitisation of the in-memory value happens at the correct time (after the read, or before the write). For convenience, RMW-style accessors are also provided. - Multiple fixes for the so-called "arch-timer-edge-cases' selftest, which was always broken. x86: - Make KVM_PRE_FAULT_MEMORY stricter for TDX, allowing userspace to pass only the "untouched" addresses and flipping the shared/private bit in the implementation. - Disable SEV-SNP support on initialization failure -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmhMVEQUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroNCYwgAoqYz+RTcIx6EwHD/kN/9maUcQVl1 MMmXqfF2jQYmTvk7ocEW1qwx/SV0kB+H9LOThV8SWTvVxDbNqAaUWRDz+wcz3zaO 6/sUwz4dtU4XaTgxYhhB82lsPtJHyM+FM+bNL4rFFnrA1tZ93wRsMEeryZ5h960V C1Bc+PLBdpj+S3gQGvxeMxnG/n0oOAcecUqQa3ViIOKSfZEc/11+BjjvfvkYqExq s206faKSqor8xVXUbgtOk3LZreIExj/mD+pwMiUBvG0H0g4wnaG7Arc41QCFMowF 4l4sQVMWFZiKQvQZSfdQOeNsXcepWw0qISK7UeoWpLnpM78uUfCS6iG1rA== =Hc3G -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "ARM: - Rework of system register accessors for system registers that are directly writen to memory, so that sanitisation of the in-memory value happens at the correct time (after the read, or before the write). For convenience, RMW-style accessors are also provided. - Multiple fixes for the so-called "arch-timer-edge-cases' selftest, which was always broken. x86: - Make KVM_PRE_FAULT_MEMORY stricter for TDX, allowing userspace to pass only the "untouched" addresses and flipping the shared/private bit in the implementation. - Disable SEV-SNP support on initialization failure * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/mmu: Reject direct bits in gpa passed to KVM_PRE_FAULT_MEMORY KVM: x86/mmu: Embed direct bits into gpa for KVM_PRE_FAULT_MEMORY KVM: SEV: Disable SEV-SNP support on initialization failure KVM: arm64: selftests: Determine effective counter width in arch_timer_edge_cases KVM: arm64: selftests: Fix xVAL init in arch_timer_edge_cases KVM: arm64: selftests: Fix thread migration in arch_timer_edge_cases KVM: arm64: selftests: Fix help text for arch_timer_edge_cases KVM: arm64: Make __vcpu_sys_reg() a pure rvalue operand KVM: arm64: Don't use __vcpu_sys_reg() to get the address of a sysreg KVM: arm64: Add RMW specific sysreg accessor KVM: arm64: Add assignment-specific sysreg accessor	2025-06-13 10:05:31 -07:00
Magnus Lindholm	403d1338a4	mm: pgtable: fix pte_swp_exclusive Make pte_swp_exclusive return bool instead of int. This will better reflect how pte_swp_exclusive is actually used in the code. This fixes swap/swapoff problems on Alpha due pte_swp_exclusive not returning correct values when _PAGE_SWP_EXCLUSIVE bit resides in upper 32-bits of PTE (like on alpha). Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Magnus Lindholm <linmag7@gmail.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/lkml/20250218175735.19882-2-linmag7@gmail.com/ Link: https://lore.kernel.org/lkml/20250602041118.GA2675383@ZenIV/ [ Applied as the 'sed' script Al suggested - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2025-06-11 14:52:08 -07:00
Paolo Bonzini	ce360c2bfd	KVM/arm64 fixes for 6.16, take #2 - Rework of system register accessors for system registers that are directly writen to memory, so that sanitisation of the in-memory value happens at the correct time (after the read, or before the write). For convenience, RMW-style accessors are also provided. - Multiple fixes for the so-called "arch-timer-edge-cases' selftest, which was always broken. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmhCs3IACgkQI9DQutE9 ekMxlBAApd03crgHQy8V7I997D9TA/Ph4PkUOZOg091JAABkOZBCLd3H8hbe7Va6 2XPD7IeTQUEP/8Xwc0+sWF3X4bIqU3PlxZ/TI4IgNDxazz2l+1LTHCrWrP47VXMr j5czEzWkSX/59LFc0jL3T0VxKhN9fI+aSE9UZCCXc0BGyLIlRNclO4ho87xkgbxM AuhM0VslXtAZBF9DBrtOQ1EodI5Cc7vH38id/8SCL9f74rKln4UViSuPhRQxgzgy 7T523OERyAINJ8e6UNd0Tg5GFYdj2bMeivnTleaFFxmCH+tAKYtSTV8d6n0fzsOF 1D+6uU93v4ky3DWwCvmEXLzijH6pRrLjMLsC4Sx1kFCPe05Zaui/g65n4REflZm6 0xZ2bnTsZP1/MYrZya/XpXipF0EGITqsOuKpHgEO495TIgmAZKev+GIp3NDooSYk dZWN0U0ctePV2+WFoxNyN+r9nrg/xSujnyU0k3kMmRcfRHcATzZG6jYOj8CrLdNO jWZ56XhghiJj01B1IjVskuSyTwcoRMH4h//C7oAAFQoOuZtEgduGeZUQxz7EoBxX /I4Cg4+9P/m310gjdEVMGPdvrFQgweJc8K3+mT3WGRA8AT4Nhi6pxZxnzWeABuUD 4HpVruNxygMwODilk3YruJ/yat7FqTBTdRZt4w+cwpBTi8VPPqs= =OMHL -----END PGP SIGNATURE----- Merge tag 'kvmarm-fixes-6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.16, take #2 - Rework of system register accessors for system registers that are directly writen to memory, so that sanitisation of the in-memory value happens at the correct time (after the read, or before the write). For convenience, RMW-style accessors are also provided. - Multiple fixes for the so-called "arch-timer-edge-cases' selftest, which was always broken.	2025-06-11 14:25:22 -04:00
Linus Torvalds	e9e668cd27	arm64 fixes for -rc1 - Disable problematic linker assertions for broken versions of LLD. - Work around sporadic link failure with LLD and various randconfig builds. - Fix missing invalidation in the TLB batching code when reclaim races with mprotect() and friends. - Add a command-line override for MPAM to allow booting on systems with broken firmware. -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmhBcycQHHdpbGxAa2Vy bmVsLm9yZwAKCRC3rHDchMFjNDwWCACtc4Jw3wwkmaiiP9Ner1/7wKq8xRLC2WRU tJjWLSkeoTthxf0DZILc61rNpOalfaRK774/Xo0OiYOBpKeAi5cSaUYMyabVJGcK k1R0KXDUu8oS6xKXmXyeuBV2pK4v4aET3E6lzUQZfvamhzuZfCvvKKrF5K8vv5Ph eowBMWKugMrwXMOBkRgVopppobdneFuVvnoMlNNYWOy70wDekoPV3qizoVJG/ulQ BTFunXX8Otufrm48Ye2bYalfwoiGdUQaJz/gRuHko0o3SOhqR3qZp2DWxQgBwJ+g VI6/dRLnVQpdg6toTvS9jzPczVfLt4/5VhLevbBcJuaUOER4SOZl =cfnk -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "We've got a couple of build fixes when using LLD, a missing TLB invalidation and a workaround for broken firmware on SoCs with CPUs that implement MPAM: - Disable problematic linker assertions for broken versions of LLD - Work around sporadic link failure with LLD and various randconfig builds - Fix missing invalidation in the TLB batching code when reclaim races with mprotect() and friends - Add a command-line override for MPAM to allow booting on systems with broken firmware" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Add override for MPAM arm64/mm: Close theoretical race where stale TLB entry remains valid arm64: Work around convergence issue with LLD linker arm64: Disable LLD linker ASSERT()s for the time being	2025-06-05 11:39:17 -07:00
Marc Zyngier	b5fa1f91e1	KVM: arm64: Make __vcpu_sys_reg() a pure rvalue operand Now that we don't have any use of __vcpu_sys_reg() as a lvalue, remove the in-place update, and directly return the sanitised value. Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250603070824.1192795-5-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-06-05 14:18:01 +01:00
Marc Zyngier	8800b7c4bb	KVM: arm64: Add RMW specific sysreg accessor In a number of cases, we perform a Read-Modify-Write operation on a system register, meaning that we would apply the RESx masks twice. Instead, provide a new accessor that performs this RMW operation, allowing the masks to be applied exactly once per operation. Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250603070824.1192795-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-06-05 14:18:01 +01:00
Marc Zyngier	6678791ee3	KVM: arm64: Add assignment-specific sysreg accessor Assigning a value to a system register doesn't do what it is supposed to be doing if that register is one that has RESx bits. The main problem is that we use __vcpu_sys_reg(), which can be used both as a lvalue and rvalue. When used as a lvalue, the bit masking occurs before the new value is assigned, meaning that we (1) do pointless work on the old cvalue, and (2) potentially assign an invalid value as we fail to apply the masks to it. Fix this by providing a new __vcpu_assign_sys_reg() that does what it says on the tin, and sanitises the new value instead of the old one. This comes with a significant amount of churn. Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250603070824.1192795-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-06-05 14:17:32 +01:00
Linus Torvalds	7f9039c524	Generic: * Clean up locking of all vCPUs for a VM by using the _nest_lock() family of functions, and move duplicated code to virt/kvm/. kernel/ patches acked by Peter Zijlstra. Add MGLRU support to the access tracking perf test. ARM fixes: * Make the irqbypass hooks resilient to changes in the GSI<->MSI routing, avoiding behind stale vLPI mappings being left behind. The fix is to resolve the VGIC IRQ using the host IRQ (which is stable) and nuking the vLPI mapping upon a routing change. * Close another VGIC race where vCPU creation races with VGIC creation, leading to in-flight vCPUs entering the kernel w/o private IRQs allocated. * Fix a build issue triggered by the recently added workaround for Ampere's AC04_CPU_23 erratum. * Correctly sign-extend the VA when emulating a TLBI instruction potentially targeting a VNCR mapping. * Avoid dereferencing a NULL pointer in the VGIC debug code, which can happen if the device doesn't have any mapping yet. s390: * Fix interaction between some filesystems and Secure Execution * Some cleanups and refactorings, preparing for an upcoming big series x86: * Wait for target vCPU to acknowledge KVM_REQ_UPDATE_PROTECTED_GUEST_STATE to fix a race between AP destroy and VMRUN. * Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for the VM. * Refine and harden handling of spurious faults. * Add support for ALLOWED_SEV_FEATURES. * Add #VMGEXIT to the set of handlers special cased for CONFIG_RETPOLINE=y. * Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing features that utilize those bits. * Don't account temporary allocations in sev_send_update_data(). * Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock Threshold. * Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU IBPB, between SVM and VMX. * Advertise support to userspace for WRMSRNS and PREFETCHI. * Rescan I/O APIC routes after handling EOI that needed to be intercepted due to the old/previous routing, but not the new/current routing. * Add a module param to control and enumerate support for device posted interrupts. * Fix a potential overflow with nested virt on Intel systems running 32-bit kernels. * Flush shadow VMCSes on emergency reboot. * Add support for SNP to the various SEV selftests. * Add a selftest to verify fastops instructions via forced emulation. * Refine and optimize KVM's software processing of the posted interrupt bitmap, and share the harvesting code between KVM and the kernel's Posted MSI handler -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmg9TjwUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroOUxQf7B7nnWqIKd7jSkGzSD6YsSX9TXktr 2tJIOfWM3zNYg5GRCidg+m4Y5+DqQWd3Hi5hH2P9wUw7RNuOjOFsDe+y0VBr8ysE ve39t/yp+mYalNmHVFl8s3dBDgrIeGKiz+Wgw3zCQIBZ18rJE1dREhv37RlYZ3a2 wSvuObe8sVpCTyKIowDs1xUi7qJUBoopMSuqfleSHawRrcgCpV99U8/KNFF5plLH 7fXOBAHHniVCVc+mqQN2wxtVJDhST+U3TaU4GwlKy9Yevr+iibdOXffveeIgNEU4 D6q1F2zKp6UdV3+p8hxyaTTbiCVDqsp9WOgY/0I/f+CddYn0WVZgOlR+ow== =mYFL -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull more kvm updates from Paolo Bonzini: Generic: - Clean up locking of all vCPUs for a VM by using the _nest_lock() family of functions, and move duplicated code to virt/kvm/. kernel/ patches acked by Peter Zijlstra - Add MGLRU support to the access tracking perf test ARM fixes: - Make the irqbypass hooks resilient to changes in the GSI<->MSI routing, avoiding behind stale vLPI mappings being left behind. The fix is to resolve the VGIC IRQ using the host IRQ (which is stable) and nuking the vLPI mapping upon a routing change - Close another VGIC race where vCPU creation races with VGIC creation, leading to in-flight vCPUs entering the kernel w/o private IRQs allocated - Fix a build issue triggered by the recently added workaround for Ampere's AC04_CPU_23 erratum - Correctly sign-extend the VA when emulating a TLBI instruction potentially targeting a VNCR mapping - Avoid dereferencing a NULL pointer in the VGIC debug code, which can happen if the device doesn't have any mapping yet s390: - Fix interaction between some filesystems and Secure Execution - Some cleanups and refactorings, preparing for an upcoming big series x86: - Wait for target vCPU to ack KVM_REQ_UPDATE_PROTECTED_GUEST_STATE to fix a race between AP destroy and VMRUN - Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for the VM - Refine and harden handling of spurious faults - Add support for ALLOWED_SEV_FEATURES - Add #VMGEXIT to the set of handlers special cased for CONFIG_RETPOLINE=y - Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing features that utilize those bits - Don't account temporary allocations in sev_send_update_data() - Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock Threshold - Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU IBPB, between SVM and VMX - Advertise support to userspace for WRMSRNS and PREFETCHI - Rescan I/O APIC routes after handling EOI that needed to be intercepted due to the old/previous routing, but not the new/current routing - Add a module param to control and enumerate support for device posted interrupts - Fix a potential overflow with nested virt on Intel systems running 32-bit kernels - Flush shadow VMCSes on emergency reboot - Add support for SNP to the various SEV selftests - Add a selftest to verify fastops instructions via forced emulation - Refine and optimize KVM's software processing of the posted interrupt bitmap, and share the harvesting code between KVM and the kernel's Posted MSI handler" tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits) rtmutex_api: provide correct extern functions KVM: arm64: vgic-debug: Avoid dereferencing NULL ITE pointer KVM: arm64: vgic-init: Plug vCPU vs. VGIC creation race KVM: arm64: Unmap vLPIs affected by changes to GSI routing information KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding() KVM: arm64: Protect vLPI translation with vgic_irq::irq_lock KVM: arm64: Use lock guard in vgic_v4_set_forwarding() KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidation arm64: sysreg: Drag linux/kconfig.h to work around vdso build issue KVM: s390: Simplify and move pv code KVM: s390: Refactor and split some gmap helpers KVM: s390: Remove unneeded srcu lock s390: Remove unneeded includes s390/uv: Improve splitting of large folios that cannot be split while dirty s390/uv: Always return 0 from s390_wiggle_split_folio() if successful s390/uv: Don't return 0 from make_hva_secure() if the operation was not successful rust: add helper for mutex_trylock RISC-V: KVM: use kvm_trylock_all_vcpus when locking all vCPUs KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs x86: KVM: SVM: use kvm_lock_all_vcpus instead of a custom implementation ...	2025-06-02 12:24:58 -07:00
Xi Ruoyao	10f885d63a	arm64: Add override for MPAM As the message of the commit `09e6b306f3` ("arm64: cpufeature: discover CPU support for MPAM") already states, if a buggy firmware fails to either enable MPAM or emulate the trap as if it were disabled, the kernel will just fail to boot. While upgrading the firmware should be the best solution, we have some hardware of which the vendor have made no response 2 months after we requested a firmware update. Allow overriding it so our devices don't become some e-waste. Cc: James Morse <james.morse@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Cc: Mingcong Bai <jeffbai@aosc.io> Cc: Shaopeng Tan <tan.shaopeng@fujitsu.com> Cc: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Xi Ruoyao <xry111@xry111.site> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250602043723.216338-1-xry111@xry111.site Signed-off-by: Will Deacon <will@kernel.org>	2025-06-02 13:49:09 +01:00
Ryan Roberts	4b63491838	arm64/mm: Close theoretical race where stale TLB entry remains valid Commit `3ea277194d` ("mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries") describes a race that, prior to the commit, could occur between reclaim and operations such as mprotect() when using reclaim's tlbbatch mechanism. See that commit for details but the summary is: """ Nadav Amit identified a theoritical race between page reclaim and mprotect due to TLB flushes being batched outside of the PTL being held. He described the race as follows: CPU0 CPU1 ---- ---- user accesses memory using RW PTE [PTE now cached in TLB] try_to_unmap_one() ==> ptep_get_and_clear() ==> set_tlb_ubc_flush_pending() mprotect(addr, PROT_READ) ==> change_pte_range() ==> [ PTE non-present - no flush ] user writes using cached RW PTE ... try_to_unmap_flush() """ The solution was to insert flush_tlb_batched_pending() in mprotect() and friends to explcitly drain any pending reclaim TLB flushes. In the modern version of this solution, arch_flush_tlb_batched_pending() is called to do that synchronisation. arm64's tlbbatch implementation simply issues TLBIs at queue-time (arch_tlbbatch_add_pending()), eliding the trailing dsb(ish). The trailing dsb(ish) is finally issued in arch_tlbbatch_flush() at the end of the batch to wait for all the issued TLBIs to complete. Now, the Arm ARM states: """ The completion of the TLB maintenance instruction is guaranteed only by the execution of a DSB by the observer that performed the TLB maintenance instruction. The execution of a DSB by a different observer does not have this effect, even if the DSB is known to be executed after the TLB maintenance instruction is observed by that different observer. """ arch_tlbbatch_add_pending() and arch_tlbbatch_flush() conform to this requirement because they are called from the same task (either kswapd or caller of madvise(MADV_PAGEOUT)), so either they are on the same CPU or if the task was migrated, __switch_to() contains an extra dsb(ish). HOWEVER, arm64's arch_flush_tlb_batched_pending() is also implemented as a dsb(ish). But this may be running on a CPU remote from the one that issued the outstanding TLBIs. So there is no architectural gurantee of synchonization. Therefore we are still vulnerable to the theoretical race described in Commit `3ea277194d` ("mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries"). Fix this by flushing the entire mm in arch_flush_tlb_batched_pending(). This aligns with what the other arches that implement the tlbbatch feature do. Cc: <stable@vger.kernel.org> Fixes: `43b3dfdd04` ("arm64: support batched/deferred tlb shootdown during page reclamation/migration") Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20250530152445.2430295-1-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-06-02 13:02:14 +01:00
Ard Biesheuvel	dc0a083948	arm64: Work around convergence issue with LLD linker LLD will occasionally error out with a '__init_end does not converge' error if INIT_IDMAP_DIR_SIZE is defined in terms of _end, as this results in a circular dependency. Counter this by dimensioning the initial IDMAP page tables based on a new boundary marker 'kimage_limit', and define it such that its value should not change as a result of the initdata segment being pushed over a 64k segment boundary due to changes in INIT_IDMAP_DIR_SIZE, provided that its value doesn't change by more than 2M between linker passes. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250531123005.3866382-2-ardb+git@google.com Signed-off-by: Will Deacon <will@kernel.org>	2025-06-02 12:53:18 +01:00
Paolo Bonzini	61374cc145	KVM/arm64 fixes for 6.16, take #1 - Make the irqbypass hooks resilient to changes in the GSI<->MSI routing, avoiding behind stale vLPI mappings being left behind. The fix is to resolve the VGIC IRQ using the host IRQ (which is stable) and nuking the vLPI mapping upon a routing change. - Close another VGIC race where vCPU creation races with VGIC creation, leading to in-flight vCPUs entering the kernel w/o private IRQs allocated. - Fix a build issue triggered by the recently added workaround for Ampere's AC04_CPU_23 erratum. - Correctly sign-extend the VA when emulating a TLBI instruction potentially targeting a VNCR mapping. - Avoid dereferencing a NULL pointer in the VGIC debug code, which can happen if the device doesn't have any mapping yet. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmg5fbgACgkQI9DQutE9 ekNMyhAAq99rswsyqUrDwTd4qwJBtRYj4FMkBD8rH1oPQcXkqPGVllOMOgAnJhtd mU1/XpU8gk49nsx5kwdg1RORQG4lKCZATH1F2MhInDWMMSSWUTWBu0+qG62GFQWF Db+aXcvyebHdfzunDmT1/NebhV9VGCnkUtbQrFBvBbqOt1xNCRhKNvv5RWNEugTW R2DIg4gCk2x1/iZceP8dDRDpnvFk8txtzZsWE6g1DZRw1mmlrz01rRgD7EJtkyaQ c74mqpc/A2GuJv2mwpz3GCXTLiNgAuNq1X9+yeEqYM+VVvuyGy906BTFXxgvYl6l o4OvE1F6QOtcJIokPb5kBlpxz76J2bNxKuUrY7rxM73Of0ZgGMsSltQxdcoGhJZ7 cyfUWFVQ+H5Gx+BVs/+5u2XdtX7OC0U50eXP0XrzviunTkEUhvRaH1WfTh9Zike/ U150d0oU17MeVZ9HlOWyhXV/nId6CPX5xo7BUjIq2e1Oy4mYkIYYbowdrV5t6NiJ FHwzXNa/T0prZDSPW4t1GL/8o8hcnd4KRGLTJDWBYk58J6S1hHG0QElAz5RvqUj+ nf8HSycZMBrpdXtIK6nu/GRk9Ng+6gT4KHomwjgIDj6QVn7DTFFId911j+HRmoCP DiV4QD4yVRSMBDX5cdQj2XBk2raMTOxdN9A1y6T2H3b+iLavVyo= =JjTn -----END PGP SIGNATURE----- Merge tag 'kvmarm-fixes-6.16-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.16, take #1 - Make the irqbypass hooks resilient to changes in the GSI<->MSI routing, avoiding behind stale vLPI mappings being left behind. The fix is to resolve the VGIC IRQ using the host IRQ (which is stable) and nuking the vLPI mapping upon a routing change. - Close another VGIC race where vCPU creation races with VGIC creation, leading to in-flight vCPUs entering the kernel w/o private IRQs allocated. - Fix a build issue triggered by the recently added workaround for Ampere's AC04_CPU_23 erratum. - Correctly sign-extend the VA when emulating a TLBI instruction potentially targeting a VNCR mapping. - Avoid dereferencing a NULL pointer in the VGIC debug code, which can happen if the device doesn't have any mapping yet.	2025-06-02 03:05:29 -04:00
Linus Torvalds	00c010e130	- The 11 patch series "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of creating a pte which addresses the first page in a folio and reduces the amount of plumbing which architecture must implement to provide this. - The 8 patch series "Misc folio patches for 6.16" from Matthew Wilcox is a shower of largely unrelated folio infrastructure changes which clean things up and better prepare us for future work. - The 3 patch series "memory,x86,acpi: hotplug memory alignment advisement" from Gregory Price adds early-init code to prevent x86 from leaving physical memory unused when physical address regions are not aligned to memory block size. - The 2 patch series "mm/compaction: allow more aggressive proactive compaction" from Michal Clapinski provides some tuning of the (sadly, hard-coded (more sadly, not auto-tuned)) thresholds for our invokation of proactive compaction. In a simple test case, the reduction of a guest VM's memory consumption was dramatic. - The 8 patch series "Minor cleanups and improvements to swap freeing code" from Kemeng Shi provides some code cleaups and a small efficiency improvement to this part of our swap handling code. - The 6 patch series "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin adds the ability for a ptracer to modify syscalls arguments. At this time we can alter only "system call information that are used by strace system call tampering, namely, syscall number, syscall arguments, and syscall return value. This series should have been incorporated into mm.git's "non-MM" branch, but I goofed. - The 3 patch series "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl against /proc/pid/pagemap. This permits CRIU to more efficiently get at the info about guard regions. - The 2 patch series "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan implements that fix. No runtime effect is expected because validate_page_before_insert() happens to fix up this error. - The 3 patch series "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David Hildenbrand basically brings uprobe text poking into the current decade. Remove a bunch of hand-rolled implementation in favor of using more current facilities. - The 3 patch series "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman Khandual provides enhancements and generalizations to the pte dumping code. This might be needed when 128-bit Page Table Descriptors are enabled for ARM. - The 12 patch series "Always call constructor for kernel page tables" from Kevin Brodsky "ensures that the ctor/dtor is always called for kernel pgtables, as it already is for user pgtables". This permits the addition of more functionality such as "insert hooks to protect page tables". This change does result in various architectures performing unnecesary work, but this is fixed up where it is anticipated to occur. - The 9 patch series "Rust support for mm_struct, vm_area_struct, and mmap" from Alice Ryhl adds plumbing to permit Rust access to core MM structures. - The 3 patch series "fix incorrectly disallowed anonymous VMA merges" from Lorenzo Stoakes takes advantage of some VMA merging opportunities which we've been missing for 15 years. - The 4 patch series "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from SeongJae Park optimizes process_madvise()'s TLB flushing. Instead of flushing each address range in the provided iovec, we batch the flushing across all the iovec entries. The syscall's cost was approximately halved with a microbenchmark which was designed to load this particular operation. - The 6 patch series "Track node vacancy to reduce worst case allocation counts" from Sidhartha Kumar makes the maple tree smarter about its node preallocation. stress-ng mmap performance increased by single-digit percentages and the amount of unnecessarily preallocated memory was dramaticelly reduced. - The 3 patch series "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes a few unnecessary things which Baoquan noted when reading the code. - The 3 patch series ""Enhance sysfs handling for memory hotplug in weighted interleave" from Rakie Kim "enhances the weighted interleave policy in the memory management subsystem by improving sysfs handling, fixing memory leaks, and introducing dynamic sysfs updates for memory hotplug support". Fixes things on error paths which we are unlikely to hit. - The 7 patch series "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory" from SeongJae Park introduces new DAMOS quota goal metrics which eliminate the manual tuning which is required when utilizing DAMON for memory tiering. - The 5 patch series "mm/vmalloc.c: code cleanup and improvements" from Baoquan He provides cleanups and small efficiency improvements which Baoquan found via code inspection. - The 2 patch series "vmscan: enforce mems_effective during demotion" from Gregory Price "changes reclaim to respect cpuset.mems_effective during demotion when possible". because "presently, reclaim explicitly ignores cpuset.mems_effective when demoting, which may cause the cpuset settings to violated." "This is useful for isolating workloads on a multi-tenant system from certain classes of memory more consistently." - The 2 patch series ""Clean up split_huge_pmd_locked() and remove unnecessary folio pointers" from Gavin Guo provides minor cleanups and efficiency gains in in the huge page splitting and migrating code. - The 3 patch series "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache for `struct mem_cgroup', yielding improved memory utilization. - The 4 patch series "add max arg to swappiness in memory.reclaim and lru_gen" from Zhongkun He adds a new "max" argument to the "swappiness=" argument for memory.reclaim MGLRU's lru_gen. This directs proactive reclaim to reclaim from only anon folios rather than file-backed folios. - The 17 patch series "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the first step on the path to permitting the kernel to maintain existing VMs while replacing the host kernel via file-based kexec. At this time only memblock's reserve_mem is preserved. - The 7 patch series "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides and uses a smarter way of looping over a pfn range. By skipping ranges of invalid pfns. - The 2 patch series "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning when a task is pinned a single NUMA mode. Dramatic performance benefits were seen in some real world cases. - The 2 patch series "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank Garg addresses a warning which occurs during memory compaction when using JFS. - The 4 patch series "move all VMA allocation, freeing and duplication logic to mm" from Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more appropriate mm/vma.c. - The 6 patch series "mm, swap: clean up swap cache mapping helper" from Kairui Song provides code consolidation and cleanups related to the folio_index() function. - The 2 patch series "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that. - The 8 patch series "memcg: Fix test_memcg_min/low test failures" from Waiman Long addresses some bogus failures which are being reported by the test_memcontrol selftest. - The 3 patch series "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo Stoakes commences the deprecation of file_operations.mmap() in favor of the new file_operations.mmap_prepare(). The latter is more restrictive and prevents drivers from messing with things in ways which, amongst other problems, may defeat VMA merging. - The 4 patch series "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples the per-cpu memcg charge cache from the objcg's one. This is a step along the way to making memcg and objcg charging NMI-safe, which is a BPF requirement. - The 6 patch series "mm/damon: minor fixups and improvements for code, tests, and documents" from SeongJae Park is "yet another batch of miscellaneous DAMON changes. Fix and improve minor problems in code, tests and documents." - The 7 patch series "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg stats to be irq safe. Another step along the way to making memcg charging and stats updates NMI-safe, a BPF requirement. - The 4 patch series "Let unmap_hugepage_range() and several related functions take folio instead of page" from Fan Ni provides folio conversions in the hugetlb code. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaDt5qgAKCRDdBJ7gKXxA ju6XAP9nTiSfRz8Cz1n5LJZpFKEGzLpSihCYyR6P3o1L9oe3mwEAlZ5+XAwk2I5x Qqb/UGMEpilyre1PayQqOnct3aSL9Ao= =tYYm -----END PGP SIGNATURE----- Merge tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of creating a pte which addresses the first page in a folio and reduces the amount of plumbing which architecture must implement to provide this. - "Misc folio patches for 6.16" from Matthew Wilcox is a shower of largely unrelated folio infrastructure changes which clean things up and better prepare us for future work. - "memory,x86,acpi: hotplug memory alignment advisement" from Gregory Price adds early-init code to prevent x86 from leaving physical memory unused when physical address regions are not aligned to memory block size. - "mm/compaction: allow more aggressive proactive compaction" from Michal Clapinski provides some tuning of the (sadly, hard-coded (more sadly, not auto-tuned)) thresholds for our invokation of proactive compaction. In a simple test case, the reduction of a guest VM's memory consumption was dramatic. - "Minor cleanups and improvements to swap freeing code" from Kemeng Shi provides some code cleaups and a small efficiency improvement to this part of our swap handling code. - "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin adds the ability for a ptracer to modify syscalls arguments. At this time we can alter only "system call information that are used by strace system call tampering, namely, syscall number, syscall arguments, and syscall return value. This series should have been incorporated into mm.git's "non-MM" branch, but I goofed. - "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl against /proc/pid/pagemap. This permits CRIU to more efficiently get at the info about guard regions. - "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan implements that fix. No runtime effect is expected because validate_page_before_insert() happens to fix up this error. - "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David Hildenbrand basically brings uprobe text poking into the current decade. Remove a bunch of hand-rolled implementation in favor of using more current facilities. - "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman Khandual provides enhancements and generalizations to the pte dumping code. This might be needed when 128-bit Page Table Descriptors are enabled for ARM. - "Always call constructor for kernel page tables" from Kevin Brodsky ensures that the ctor/dtor is always called for kernel pgtables, as it already is for user pgtables. This permits the addition of more functionality such as "insert hooks to protect page tables". This change does result in various architectures performing unnecesary work, but this is fixed up where it is anticipated to occur. - "Rust support for mm_struct, vm_area_struct, and mmap" from Alice Ryhl adds plumbing to permit Rust access to core MM structures. - "fix incorrectly disallowed anonymous VMA merges" from Lorenzo Stoakes takes advantage of some VMA merging opportunities which we've been missing for 15 years. - "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from SeongJae Park optimizes process_madvise()'s TLB flushing. Instead of flushing each address range in the provided iovec, we batch the flushing across all the iovec entries. The syscall's cost was approximately halved with a microbenchmark which was designed to load this particular operation. - "Track node vacancy to reduce worst case allocation counts" from Sidhartha Kumar makes the maple tree smarter about its node preallocation. stress-ng mmap performance increased by single-digit percentages and the amount of unnecessarily preallocated memory was dramaticelly reduced. - "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes a few unnecessary things which Baoquan noted when reading the code. - ""Enhance sysfs handling for memory hotplug in weighted interleave" from Rakie Kim "enhances the weighted interleave policy in the memory management subsystem by improving sysfs handling, fixing memory leaks, and introducing dynamic sysfs updates for memory hotplug support". Fixes things on error paths which we are unlikely to hit. - "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory" from SeongJae Park introduces new DAMOS quota goal metrics which eliminate the manual tuning which is required when utilizing DAMON for memory tiering. - "mm/vmalloc.c: code cleanup and improvements" from Baoquan He provides cleanups and small efficiency improvements which Baoquan found via code inspection. - "vmscan: enforce mems_effective during demotion" from Gregory Price changes reclaim to respect cpuset.mems_effective during demotion when possible. because presently, reclaim explicitly ignores cpuset.mems_effective when demoting, which may cause the cpuset settings to violated. This is useful for isolating workloads on a multi-tenant system from certain classes of memory more consistently. - "Clean up split_huge_pmd_locked() and remove unnecessary folio pointers" from Gavin Guo provides minor cleanups and efficiency gains in in the huge page splitting and migrating code. - "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache for `struct mem_cgroup', yielding improved memory utilization. - "add max arg to swappiness in memory.reclaim and lru_gen" from Zhongkun He adds a new "max" argument to the "swappiness=" argument for memory.reclaim MGLRU's lru_gen. This directs proactive reclaim to reclaim from only anon folios rather than file-backed folios. - "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the first step on the path to permitting the kernel to maintain existing VMs while replacing the host kernel via file-based kexec. At this time only memblock's reserve_mem is preserved. - "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides and uses a smarter way of looping over a pfn range. By skipping ranges of invalid pfns. - "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning when a task is pinned a single NUMA mode. Dramatic performance benefits were seen in some real world cases. - "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank Garg addresses a warning which occurs during memory compaction when using JFS. - "move all VMA allocation, freeing and duplication logic to mm" from Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more appropriate mm/vma.c. - "mm, swap: clean up swap cache mapping helper" from Kairui Song provides code consolidation and cleanups related to the folio_index() function. - "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that. - "memcg: Fix test_memcg_min/low test failures" from Waiman Long addresses some bogus failures which are being reported by the test_memcontrol selftest. - "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo Stoakes commences the deprecation of file_operations.mmap() in favor of the new file_operations.mmap_prepare(). The latter is more restrictive and prevents drivers from messing with things in ways which, amongst other problems, may defeat VMA merging. - "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples the per-cpu memcg charge cache from the objcg's one. This is a step along the way to making memcg and objcg charging NMI-safe, which is a BPF requirement. - "mm/damon: minor fixups and improvements for code, tests, and documents" from SeongJae Park is yet another batch of miscellaneous DAMON changes. Fix and improve minor problems in code, tests and documents. - "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg stats to be irq safe. Another step along the way to making memcg charging and stats updates NMI-safe, a BPF requirement. - "Let unmap_hugepage_range() and several related functions take folio instead of page" from Fan Ni provides folio conversions in the hugetlb code. * tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits) mm: pcp: increase pcp->free_count threshold to trigger free_high mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range() mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page mm/hugetlb: pass folio instead of page to unmap_ref_private() memcg: objcg stock trylock without irq disabling memcg: no stock lock for cpu hot-unplug memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs memcg: make count_memcg_events re-entrant safe against irqs memcg: make mod_memcg_state re-entrant safe against irqs memcg: move preempt disable to callers of memcg_rstat_updated memcg: memcg_rstat_updated re-entrant safe against irqs mm: khugepaged: decouple SHMEM and file folios' collapse selftests/eventfd: correct test name and improve messages alloc_tag: check mem_profiling_support in alloc_tag_init Docs/damon: update titles and brief introductions to explain DAMOS selftests/damon/_damon_sysfs: read tried regions directories in order mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject() mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat() mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs ...	2025-05-31 15:44:16 -07:00
Linus Torvalds	dee264c16a	require gcc-8 and binutils-2.30 x86 already uses gcc-8 as the minimum version, this changes all other architectures to the same version. gcc-8 is used is Debian 10 and Red Hat Enterprise Linux 8, both of which are still supported, and binutils 2.30 is the oldest corresponding version on those. Ubuntu Pro 18.04 and SUSE Linux Enterprise Server 15 both use gcc-7 as the system compiler but additionally include toolchains that remain supported. With the new minimum toolchain versions, a number of workarounds for older versions can be dropped, in particular on x86_64 and arm64. Importantly, the updated compiler version allows removing two of the five remaining gcc plugins, as support for sancov and structeak features is already included in modern compiler versions. I tried collecting the known changes that are possible based on the new toolchain version, but expect that more cleanups will be possible. Since this touches multiple architectures, I merged the patches through the asm-generic tree. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmg6vNMACgkQmmx57+YA GNkOmg/+LtR9B2P27GPBeG8HnLTZ8hKELiyYeSk6ZFgQv5hevE37HV35Yru7e7gu wcF6CgYr8ff4CVcHM7y0790oGew1thkqq5CklFIH0EwCDJx/mWfZR1SS2jfZIEWM HSDOlQQd1S8oWia14tSnQos3nW3CB9/ABVTHH+Wvl3xn48WMRvgK2LJgGLuxJrt8 5DD9auHiLjchWB5tB4DU98IgWWgFUGMTsI6IayZ4dkF4CdWqd89h0Y3pjJYeBgHS mPxzR2q8WjEmG9hp7QuZQgn/pAYleJAwHvvkoLrkQ2ieqx3FjWiwFbQp4CG1Sc8L eBR1lnkqS2z/e7xJLfe86fOoKWWu4I0tZKhRan/0+UOGm5nXrGpqSxKS8ZDsRuAp 3fvyhIp1cYSa7Xkok8BFhLEFR0tguXJXnXBc3tWE5VXIfFNd0Ohh1GUYhXDAqWKh i0jN9dSNhokM3AqBi6qZl5kmBnRA3UsIaOg3QRrqN8IlBPp+u7i5xsrJIUWvD95o TO06admmLcCJT8n6ZfNVfRjBgzu8+t54UVaDx9YYwxoNGOSFwqOb8CSPTWPxLmDr RKDUOvO8DBlP7uFz9neP+LxluA3DjurRZvb0z0AmCZ8/RXEmTMCyfP5a6esxquXt 0Bqo6hM9q+TeXTHNS1CNvqLSWWikw+AzS/ZPPvriYFn5lxtbq6c= =pdDC -----END PGP SIGNATURE----- Merge tag 'gcc-minimum-version-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull compiler version requirement update from Arnd Bergmann: "Require gcc-8 and binutils-2.30 x86 already uses gcc-8 as the minimum version, this changes all other architectures to the same version. gcc-8 is used is Debian 10 and Red Hat Enterprise Linux 8, both of which are still supported, and binutils 2.30 is the oldest corresponding version on those. Ubuntu Pro 18.04 and SUSE Linux Enterprise Server 15 both use gcc-7 as the system compiler but additionally include toolchains that remain supported. With the new minimum toolchain versions, a number of workarounds for older versions can be dropped, in particular on x86_64 and arm64. Importantly, the updated compiler version allows removing two of the five remaining gcc plugins, as support for sancov and structeak features is already included in modern compiler versions. I tried collecting the known changes that are possible based on the new toolchain version, but expect that more cleanups will be possible. Since this touches multiple architectures, I merged the patches through the asm-generic tree." * tag 'gcc-minimum-version-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: Makefile.kcov: apply needed compiler option unconditionally in CFLAGS_KCOV Documentation: update binutils-2.30 version reference gcc-plugins: remove SANCOV gcc plugin Kbuild: remove structleak gcc plugin arm64: drop binutils version checks raid6: skip avx512 checks kbuild: require gcc-8 and binutils-2.30	2025-05-31 08:16:52 -07:00
Marc Zyngier	94d889713d	arm64: sysreg: Drag linux/kconfig.h to work around vdso build issue Broonie reports that `fed55f49fa` ("arm64: errata: Work around AmpereOne's erratum AC04_CPU_23") breaks one of the vdso selftests (vdso_test_chacha) as it indirectly drags asm/sysreg.h. It is rather unfortunate (and worrying) that userspace gets built with non-UAPI headers. In any case, paper over the issue by dragging linux/kconfig.h in asm/sysreg.h. It is the right thing to do, at least from the kernel perspective. Reported-by: Mark Brown <broonie@kernel.org> Fixes: `fed55f49fa` ("arm64: errata: Work around AmpereOne's erratum AC04_CPU_23") Link: https://lore.kernel.org/r/aDCDGZ-G-nCP3hJI@finisterre.sirena.org.uk Cc: D Scott Phillips <scott@os.amperecomputing.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250523170208.530818-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-30 09:09:01 +01:00
Linus Torvalds	43db111107	ARM: * Add large stage-2 mapping (THP) support for non-protected guests when pKVM is enabled, clawing back some performance. * Enable nested virtualisation support on systems that support it, though it is disabled by default. * Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and protected modes. * Large rework of the way KVM tracks architecture features and links them with the effects of control bits. While this has no functional impact, it ensures correctness of emulation (the data is automatically extracted from the published JSON files), and helps dealing with the evolution of the architecture. * Significant changes to the way pKVM tracks ownership of pages, avoiding page table walks by storing the state in the hypervisor's vmemmap. This in turn enables the THP support described above. * New selftest checking the pKVM ownership transition rules * Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests even if the host didn't have it. * Fixes for the address translation emulation, which happened to be rather buggy in some specific contexts. * Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N from the number of counters exposed to a guest and addressing a number of issues in the process. * Add a new selftest for the SVE host state being corrupted by a guest. * Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, ensuring that the window for interrupts is slightly bigger, and avoiding a pretty bad erratum on the AmpereOne HW. * Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from a pretty bad case of TLB corruption unless accesses to HCR_EL2 are heavily synchronised. * Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables in a human-friendly fashion. * and the usual random cleanups. LoongArch: * Don't flush tlb if the host supports hardware page table walks. * Add KVM selftests support. RISC-V: * Add vector registers to get-reg-list selftest * VCPU reset related improvements * Remove scounteren initialization from VCPU reset * Support VCPU reset from userspace using set_mpstate() ioctl x86: * Initial support for TDX in KVM. This finally makes it possible to use the TDX module to run confidential guests on Intel processors. This is quite a large series, including support for private page tables (managed by the TDX module and mirrored in KVM for efficiency), forwarding some TDVMCALLs to userspace, and handling several special VM exits from the TDX module. This has been in the works for literally years and it's not really possible to describe everything here, so I'll defer to the various merge commits up to and including commit `7bcf7246c4` ("Merge branch 'kvm-tdx-finish-initial' into HEAD"). -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmg02hwUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroNnkwf/db4xeWKSMseCIvBVR+ObDn3LXhwT hAgmTkDkP1zq9RfbfJSbUA1DXRwfP+f1sWySLMWECkFEQW9fGIJF9fOQRDSXKmhX 158U3+FEt+3jxLRCGFd4zyXAqyY3C8JSkPUyJZxCpUbXtB5tdDNac4rZAXKDULwe sUi0OW/kFDM2yt369pBGQAGdN+75/oOrYISGOSvMXHxjccNqvveX8MUhpBjYIuuj 73iBWmsfv3vCtam56Racz3C3v44ie498PmWFtnB0R+CVfWfrnUAaRiGWx+egLiBW dBPDiZywMn++prmphEUFgaStDTQy23JBLJ8+RvHkp+o5GaTISKJB3nedZQ== =adZU -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "As far as x86 goes this pull request "only" includes TDX host support. Quotes are appropriate because (at 6k lines and 100+ commits) it is much bigger than the rest, which will come later this week and consists mostly of bugfixes and selftests. s390 changes will also come in the second batch. ARM: - Add large stage-2 mapping (THP) support for non-protected guests when pKVM is enabled, clawing back some performance. - Enable nested virtualisation support on systems that support it, though it is disabled by default. - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and protected modes. - Large rework of the way KVM tracks architecture features and links them with the effects of control bits. While this has no functional impact, it ensures correctness of emulation (the data is automatically extracted from the published JSON files), and helps dealing with the evolution of the architecture. - Significant changes to the way pKVM tracks ownership of pages, avoiding page table walks by storing the state in the hypervisor's vmemmap. This in turn enables the THP support described above. - New selftest checking the pKVM ownership transition rules - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests even if the host didn't have it. - Fixes for the address translation emulation, which happened to be rather buggy in some specific contexts. - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N from the number of counters exposed to a guest and addressing a number of issues in the process. - Add a new selftest for the SVE host state being corrupted by a guest. - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, ensuring that the window for interrupts is slightly bigger, and avoiding a pretty bad erratum on the AmpereOne HW. - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from a pretty bad case of TLB corruption unless accesses to HCR_EL2 are heavily synchronised. - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables in a human-friendly fashion. - and the usual random cleanups. LoongArch: - Don't flush tlb if the host supports hardware page table walks. - Add KVM selftests support. RISC-V: - Add vector registers to get-reg-list selftest - VCPU reset related improvements - Remove scounteren initialization from VCPU reset - Support VCPU reset from userspace using set_mpstate() ioctl x86: - Initial support for TDX in KVM. This finally makes it possible to use the TDX module to run confidential guests on Intel processors. This is quite a large series, including support for private page tables (managed by the TDX module and mirrored in KVM for efficiency), forwarding some TDVMCALLs to userspace, and handling several special VM exits from the TDX module. This has been in the works for literally years and it's not really possible to describe everything here, so I'll defer to the various merge commits up to and including commit `7bcf7246c4` ('Merge branch 'kvm-tdx-finish-initial' into HEAD')" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (248 commits) x86/tdx: mark tdh_vp_enter() as __flatten Documentation: virt/kvm: remove unreferenced footnote RISC-V: KVM: lock the correct mp_state during reset KVM: arm64: Fix documentation for vgic_its_iter_next() KVM: arm64: np-guest CMOs with PMD_SIZE fixmap KVM: arm64: Stage-2 huge mappings for np-guests KVM: arm64: Add a range to pkvm_mappings KVM: arm64: Convert pkvm_mappings to interval tree KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest() KVM: arm64: Add a range to __pkvm_host_wrprotect_guest() KVM: arm64: Add a range to __pkvm_host_unshare_guest() KVM: arm64: Add a range to __pkvm_host_share_guest() KVM: arm64: Introduce for_each_hyp_page KVM: arm64: Handle huge mappings for np-guest CMOs KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET RISC-V: KVM: Remove scounteren initialization KVM: RISC-V: remove unnecessary SBI reset state ...	2025-05-29 08:10:01 -07:00
Linus Torvalds	47cf96fbe3	arm64 updates for 6.16 ACPI, EFI and PSCI: - Decouple Arm's "Software Delegated Exception Interface" (SDEI) support from the ACPI GHES code so that it can be used by platforms booted with device-tree. - Remove unnecessary per-CPU tracking of the FPSIMD state across EFI runtime calls. - Fix a node refcount imbalance in the PSCI device-tree code. CPU Features: - Ensure register sanitisation is applied to fields in ID_AA64MMFR4. - Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM guests can reliably query the underlying CPU types from the VMM. - Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes to our context-switching, signal handling and ptrace code. Entry code: - Hook up TIF_NEED_RESCHED_LAZY so that CONFIG_PREEMPT_LAZY can be selected. Memory management: - Prevent BSS exports from being used by the early PI code. - Propagate level and stride information to the low-level TLB invalidation routines when operating on hugetlb entries. - Use the page-table contiguous hint for vmap() mappings with VM_ALLOW_HUGE_VMAP where possible. - Optimise vmalloc()/vmap() page-table updates to use "lazy MMU mode" and hook this up on arm64 so that the trailing DSB (used to publish the updates to the hardware walker) can be deferred until the end of the mapping operation. - Extend mmap() randomisation for 52-bit virtual addresses (on par with 48-bit addressing) and remove limited support for randomisation of the linear map. Perf and PMUs: - Add support for probing the CMN-S3 driver using ACPI. - Minor driver fixes to the CMN, Arm-NI and amlogic PMU drivers. Selftests: - Fix FPSIMD and SME tests to align with the freshly re-enabled SME support. - Fix default setting of the OUTPUT variable so that tests are installed in the right location. vDSO: - Replace raw counter access from inline assembly code with a call to the the __arch_counter_get_cntvct() helper function. Miscellaneous: - Add some missing header inclusions to the CCA headers. - Rework rendering of /proc/cpuinfo to follow the x86-approach and avoid repeated buffer expansion (the user-visible format remains identical). - Remove redundant selection of CONFIG_CRC32 - Extend early error message when failing to map the device-tree blob. -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmg1uTgQHHdpbGxAa2Vy bmVsLm9yZwAKCRC3rHDchMFjNFv2CAC9S5OW0btOAo7V/LFBpLhJM3hdIV6Sn6N1 d/K5znuqPBG6VPfBrshaZltEl/C3U8KG4H8xrlX5cSo7CRuf3DgVBw3kiZ6ERZj6 1gnKR54juA1oWhcroPl0s76ETWj3N4gO036u2qOhWNAYflDunh1+bCIGJkG4H/yP wqtWn974YUbad/zQJSbG3IMO1yvxZ/PsNpVF8HjyQ0/ZPWsYTscrhNQ0hWro17sR CTcUaGxH4GrXW24EGNgkLB9aq67X2rtGGtaIlp5oFl8FuLklc7TYbPwJp8cPCTNm 0Sp0mpuR9M675pYIKoCI9m5twc46znRIKmbXi5LvPd77418y3jTf =03N4 -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "The headline feature is the re-enablement of support for Arm's Scalable Matrix Extension (SME) thanks to a bumper crop of fixes from Mark Rutland. If matrices aren't your thing, then Ryan's page-table optimisation work is much more interesting. Summary: ACPI, EFI and PSCI: - Decouple Arm's "Software Delegated Exception Interface" (SDEI) support from the ACPI GHES code so that it can be used by platforms booted with device-tree - Remove unnecessary per-CPU tracking of the FPSIMD state across EFI runtime calls - Fix a node refcount imbalance in the PSCI device-tree code CPU Features: - Ensure register sanitisation is applied to fields in ID_AA64MMFR4 - Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM guests can reliably query the underlying CPU types from the VMM - Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes to our context-switching, signal handling and ptrace code Entry code: - Hook up TIF_NEED_RESCHED_LAZY so that CONFIG_PREEMPT_LAZY can be selected Memory management: - Prevent BSS exports from being used by the early PI code - Propagate level and stride information to the low-level TLB invalidation routines when operating on hugetlb entries - Use the page-table contiguous hint for vmap() mappings with VM_ALLOW_HUGE_VMAP where possible - Optimise vmalloc()/vmap() page-table updates to use "lazy MMU mode" and hook this up on arm64 so that the trailing DSB (used to publish the updates to the hardware walker) can be deferred until the end of the mapping operation - Extend mmap() randomisation for 52-bit virtual addresses (on par with 48-bit addressing) and remove limited support for randomisation of the linear map Perf and PMUs: - Add support for probing the CMN-S3 driver using ACPI - Minor driver fixes to the CMN, Arm-NI and amlogic PMU drivers Selftests: - Fix FPSIMD and SME tests to align with the freshly re-enabled SME support - Fix default setting of the OUTPUT variable so that tests are installed in the right location vDSO: - Replace raw counter access from inline assembly code with a call to the the __arch_counter_get_cntvct() helper function Miscellaneous: - Add some missing header inclusions to the CCA headers - Rework rendering of /proc/cpuinfo to follow the x86-approach and avoid repeated buffer expansion (the user-visible format remains identical) - Remove redundant selection of CONFIG_CRC32 - Extend early error message when failing to map the device-tree blob" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (83 commits) arm64: cputype: Add cputype definition for HIP12 arm64: el2_setup.h: Make __init_el2_fgt labels consistent, again perf/arm-cmn: Add CMN S3 ACPI binding arm64/boot: Disallow BSS exports to startup code arm64/boot: Move global CPU override variables out of BSS arm64/boot: Move init_pgdir[] and init_idmap_pgdir[] into __pi_ namespace perf/arm-cmn: Initialise cmn->cpu earlier kselftest/arm64: Set default OUTPUT path when undefined arm64: Update comment regarding values in __boot_cpu_mode arm64: mm: Drop redundant check in pmd_trans_huge() arm64/mm: Re-organise setting up FEAT_S1PIE registers PIRE0_EL1 and PIR_EL1 arm64/mm: Permit lazy_mmu_mode to be nested arm64/mm: Disable barrier batching in interrupt contexts arm64/cpuinfo: only show one cpu's info in c_show() arm64/mm: Batch barriers when updating kernel mappings mm/vmalloc: Enter lazy mmu mode while manipulating vmalloc ptes arm64/mm: Support huge pte-mapped pages in vmap mm/vmalloc: Gracefully unmap huge ptes mm/vmalloc: Warn on improper use of vunmap_range() arm64/mm: Hoist barriers out of set_ptes_anysz() loop ...	2025-05-28 14:55:35 -07:00
Maxim Levitsky	b586c5d219	KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs Use kvm_trylock_all_vcpus instead of a custom implementation when locking all vCPUs of a VM, to avoid triggering a lockdep warning, in the case in which the VM is configured to have more than MAX_LOCK_DEPTH vCPUs. This fixes the following false lockdep warning: [ 328.171264] BUG: MAX_LOCK_DEPTH too low! [ 328.175227] turning off the locking correctness validator. [ 328.180726] Please attach the output of /proc/lock_stat to the bug report [ 328.187531] depth: 48 max: 48! [ 328.190678] 48 locks held by qemu-kvm/11664: [ 328.194957] #0: ffff800086de5ba0 (&kvm->lock){+.+.}-{3:3}, at: kvm_ioctl_create_device+0x174/0x5b0 [ 328.204048] #1: ffff0800e78800b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.212521] #2: ffff07ffeee51e98 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.220991] #3: ffff0800dc7d80b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.229463] #4: ffff07ffe0c980b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.237934] #5: ffff0800a3883c78 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.246405] #6: ffff07fffbe480b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-ID: <20250512180407.659015-6-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-05-27 12:16:41 -04:00
Will Deacon	217e3cbba3	Merge branch 'for-next/vdso' into for-next/core * for-next/vdso: arm64: vdso: Use __arch_counter_get_cntvct()	2025-05-27 12:26:54 +01:00
Will Deacon	53a087046a	Merge branch 'for-next/sme-fixes' into for-next/core * for-next/sme-fixes: (35 commits) arm64/fpsimd: Allow CONFIG_ARM64_SME to be selected arm64/fpsimd: ptrace: Gracefully handle errors arm64/fpsimd: ptrace: Mandate SVE payload for streaming-mode state arm64/fpsimd: ptrace: Do not present register data for inactive mode arm64/fpsimd: ptrace: Save task state before generating SVE header arm64/fpsimd: ptrace/prctl: Ensure VL changes leave task in a valid state arm64/fpsimd: ptrace/prctl: Ensure VL changes do not resurrect stale data arm64/fpsimd: Make clone() compatible with ZA lazy saving arm64/fpsimd: Clear PSTATE.SM during clone() arm64/fpsimd: Consistently preserve FPSIMD state during clone() arm64/fpsimd: Remove redundant task->mm check arm64/fpsimd: signal: Use SMSTOP behaviour in setup_return() arm64/fpsimd: Add task_smstop_sm() arm64/fpsimd: Factor out {sve,sme}_state_size() helpers arm64/fpsimd: Clarify sve_sync_*() functions arm64/fpsimd: ptrace: Consistently handle partial writes to NT_ARM_(S)SVE arm64/fpsimd: signal: Consistently read FPSIMD context arm64/fpsimd: signal: Mandate SVE payload for streaming-mode state arm64/fpsimd: signal: Clear PSTATE.SM when restoring FPSIMD frame only arm64/fpsimd: Do not discard modified SVE state ...	2025-05-27 12:26:43 +01:00
Will Deacon	c73497194a	Merge branch 'for-next/mm' into for-next/core * for-next/mm: arm64/boot: Disallow BSS exports to startup code arm64/boot: Move global CPU override variables out of BSS arm64/boot: Move init_pgdir[] and init_idmap_pgdir[] into __pi_ namespace arm64: mm: Drop redundant check in pmd_trans_huge() arm64/mm: Permit lazy_mmu_mode to be nested arm64/mm: Disable barrier batching in interrupt contexts arm64/mm: Batch barriers when updating kernel mappings mm/vmalloc: Enter lazy mmu mode while manipulating vmalloc ptes arm64/mm: Support huge pte-mapped pages in vmap mm/vmalloc: Gracefully unmap huge ptes mm/vmalloc: Warn on improper use of vunmap_range() arm64/mm: Hoist barriers out of set_ptes_anysz() loop arm64: hugetlb: Use __set_ptes_anysz() and __ptep_get_and_clear_anysz() arm64/mm: Refactor __set_ptes() and __ptep_get_and_clear() mm/page_table_check: Batch-check pmds/puds just like ptes arm64: hugetlb: Refine tlb maintenance scope arm64: hugetlb: Cleanup huge_pte size discovery mechanisms arm64: pageattr: Explicitly bail out when changing permissions for vmalloc_huge mappings arm64: Support ARM64_VA_BITS=52 when setting ARCH_MMAP_RND_BITS_MAX arm64/mm: Remove randomization of the linear map	2025-05-27 12:26:06 +01:00
Will Deacon	9d27622f7d	Merge branch 'for-next/misc' into for-next/core * for-next/misc: arm64/cpuinfo: only show one cpu's info in c_show() arm64: Extend pr_crit message on invalid FDT arm64: Kconfig: remove unnecessary selection of CRC32 arm64: Add missing includes for mem_encrypt	2025-05-27 12:25:58 +01:00
Will Deacon	48055fb882	Merge branch 'for-next/entry' into for-next/core * for-next/entry: arm64: el2_setup.h: Make __init_el2_fgt labels consistent, again arm64: Update comment regarding values in __boot_cpu_mode arm64/mm: Re-organise setting up FEAT_S1PIE registers PIRE0_EL1 and PIR_EL1 arm64: enable PREEMPT_LAZY	2025-05-27 12:24:12 +01:00
Paolo Bonzini	4d526b02df	KVM/arm64 updates for 6.16 * New features: - Add large stage-2 mapping support for non-protected pKVM guests, clawing back some performance. - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and protected modes. - Enable nested virtualisation support on systems that support it (yes, it has been a long time coming), though it is disabled by default. * Improvements, fixes and cleanups: - Large rework of the way KVM tracks architecture features and links them with the effects of control bits. This ensures correctness of emulation (the data is automatically extracted from the published JSON files), and helps dealing with the evolution of the architecture. - Significant changes to the way pKVM tracks ownership of pages, avoiding page table walks by storing the state in the hypervisor's vmemmap. This in turn enables the THP support described above. - New selftest checking the pKVM ownership transition rules - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests even if the host didn't have it. - Fixes for the address translation emulation, which happened to be rather buggy in some specific contexts. - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N from the number of counters exposed to a guest and addressing a number of issues in the process. - Add a new selftest for the SVE host state being corrupted by a guest. - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, ensuring that the window for interrupts is slightly bigger, and avoiding a pretty bad erratum on the AmpereOne HW. - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from a pretty bad case of TLB corruption unless accesses to HCR_EL2 are heavily synchronised. - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables in a human-friendly fashion. - and the usual random cleanups. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmgwU7UACgkQI9DQutE9 ekN93g//fNnejxf01dBFIbuylzYEyHZSEH0iTGLeM+ES9zvntCzciTYVzb27oqNG RDLShlQYp3w4rAe6ORzyePyHptOmKXCxfj/VXUFp3A7H9QYOxt1nacD3WxI9fCOo LzaSLquvgwFBaeTdDE0KdeTUKQHluId+w1Azh0lnHGeUP+lOHNZ8FqoP1/la0q04 GvVL+l3wz/IhPP8r1YA0Q1bzJ5SLfSpjIw/0F5H/xgI4lyYdHzgFL8sKuSyFeCyM 2STQi+ZnTCsAs4bkXkw2Pp9CFYrfQgZi+sf7Om+noAKhbJo3vb7/RHpgjv+QCjJy Kx4g9CbxHfaM03cH6uSLBoFzsACR1iAuUz8BCSRvvVNH4RVT6H+34nzjLZXLncrP gm1uYs9aMTLr91caeAx0aYIMWGYa1uqV0rum3WxyIHezN9Q/NuQoZyfprUufr8oX wCYE+ot4VT3DwG0UFZKKwj0BiCbYcbph9nBLVyZJsg8OKxpvspkCtPriFp1kb6BP dTTGSXd9JJqwSgP9qJLxijcv6Nfgp2gT42TWwh/dJRZXhnTCvr9IyclFIhoIIq3G Q2BkFCXOoEoNQhBA1tiWzJ9nDHf52P72Z2K1gPyyMZwF49HGa2BZBCJGkqX06wSs Riolf1/cjFhDno1ThiHKsHT0sG1D4oc9k/1NLq5dyNAEGcgATIA= =Jju3 -----END PGP SIGNATURE----- Merge tag 'kvmarm-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.16 * New features: - Add large stage-2 mapping support for non-protected pKVM guests, clawing back some performance. - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and protected modes. - Enable nested virtualisation support on systems that support it (yes, it has been a long time coming), though it is disabled by default. * Improvements, fixes and cleanups: - Large rework of the way KVM tracks architecture features and links them with the effects of control bits. This ensures correctness of emulation (the data is automatically extracted from the published JSON files), and helps dealing with the evolution of the architecture. - Significant changes to the way pKVM tracks ownership of pages, avoiding page table walks by storing the state in the hypervisor's vmemmap. This in turn enables the THP support described above. - New selftest checking the pKVM ownership transition rules - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests even if the host didn't have it. - Fixes for the address translation emulation, which happened to be rather buggy in some specific contexts. - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N from the number of counters exposed to a guest and addressing a number of issues in the process. - Add a new selftest for the SVE host state being corrupted by a guest. - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, ensuring that the window for interrupts is slightly bigger, and avoiding a pretty bad erratum on the AmpereOne HW. - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from a pretty bad case of TLB corruption unless accesses to HCR_EL2 are heavily synchronised. - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables in a human-friendly fashion. - and the usual random cleanups.	2025-05-26 16:19:46 -04:00
Marc Zyngier	1b85d923ba	Merge branch kvm-arm64/misc-6.16 into kvmarm-master/next * kvm-arm64/misc-6.16: : . : Misc changes and improvements for 6.16: : : - Add a new selftest for the SVE host state being corrupted by a guest : : - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, : ensuring that the window for interrupts is slightly bigger, and avoiding : a pretty bad erratum on the AmpereOne HW : : - Replace a couple of open-coded on/off strings with str_on_off() : : - Get rid of the pKVM memblock sorting, which now appears to be superflous : : - Drop superflous clearing of ICH_LR_EOI in the LR when nesting : : - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from : a pretty bad case of TLB corruption unless accesses to HCR_EL2 are : heavily synchronised : : - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables : in a human-friendly fashion : . KVM: arm64: Fix documentation for vgic_its_iter_next() KVM: arm64: vgic-its: Add debugfs interface to expose ITS tables arm64: errata: Work around AmpereOne's erratum AC04_CPU_23 KVM: arm64: nv: Remove clearing of ICH_LR<n>.EOI if ICH_LR<n>.HW == 1 KVM: arm64: Drop sort_memblock_regions() KVM: arm64: selftests: Add test for SVE host corruption KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode KVM: arm64: Replace ternary flags with str_on_off() helper Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-23 10:59:43 +01:00
Marc Zyngier	7f3225fe8b	Merge branch kvm-arm64/nv-nv into kvmarm-master/next * kvm-arm64/nv-nv: : . : Flick the switch on the NV support by adding the missing piece : in the form of the VNCR page management. From the cover letter: : : "This is probably the most interesting bit of the whole NV adventure. : So far, everything else has been a walk in the park, but this one is : where the real fun takes place. : : With FEAT_NV2, most of the NV support revolves around tricking a guest : into accessing memory while it tries to access system registers. The : hypervisor's job is to handle the context switch of the actual : registers with the state in memory as needed." : . KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating KVM: arm64: Document NV caps and vcpu flags KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2* KVM: arm64: nv: Remove dead code from ERET handling KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatch KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2 KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap address KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2 KVM: arm64: nv: Handle VNCR_EL2-triggered faults KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2 KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2 KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting KVM: arm64: nv: Move TLBI range decoding to a helper KVM: arm64: nv: Snapshot S1 ASID tagging information during walk KVM: arm64: nv: Extract translation helper from the AT code KVM: arm64: nv: Allocate VNCR page when required arm64: sysreg: Add layout for VNCR_EL2 Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-23 10:58:57 +01:00
Marc Zyngier	fef3acf5ae	Merge branch kvm-arm64/fgt-masks into kvmarm-master/next * kvm-arm64/fgt-masks: (43 commits) : . : Large rework of the way KVM deals with trap bits in conjunction with : the CPU feature registers. It now draws a direct link between which : the feature set, the system registers that need to UNDEF to match : the configuration and bits that need to behave as RES0 or RES1 in : the trap registers that are visible to the guest. : : Best of all, these definitions are mostly automatically generated : from the JSON description published by ARM under a permissive : license. : . KVM: arm64: Handle TSB CSYNC traps KVM: arm64: Add FGT descriptors for FEAT_FGT2 KVM: arm64: Allow sysreg ranges for FGT descriptors KVM: arm64: Add context-switch for FEAT_FGT2 registers KVM: arm64: Add trap routing for FEAT_FGT2 registers KVM: arm64: Add sanitisation for FEAT_FGT2 registers KVM: arm64: Add FEAT_FGT2 registers to the VNCR page KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits KVM: arm64: Allow kvm_has_feat() to take variable arguments KVM: arm64: Use FGT feature maps to drive RES0 bits KVM: arm64: Validate FGT register descriptions against RES0 masks KVM: arm64: Switch to table-driven FGU configuration KVM: arm64: Handle PSB CSYNC traps KVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask KVM: arm64: Remove hand-crafted masks for FGT registers KVM: arm64: Use computed FGT masks to setup FGT registers KVM: arm64: Propagate FGT masks to the nVHE hypervisor KVM: arm64: Unconditionally configure fine-grain traps KVM: arm64: Use computed masks as sanitisers for FGT registers ... Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-23 10:58:15 +01:00
Marc Zyngier	cb86616c39	Merge branch kvm-arm64/ubsan-el2 into kvmarm-master/next * kvm-arm64/ubsan-el2: : . : Add UBSAN support to the EL2 portion of KVM, reusing most of the : existing logic provided by CONFIG_IBSAN_TRAP. : : Patches courtesy of Mostafa Saleh. : . KVM: arm64: Handle UBSAN faults KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2 ubsan: Remove regs from report_ubsan_failure() arm64: Introduce esr_is_ubsan_brk() Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-23 10:57:32 +01:00
Marc Zyngier	a90e001754	Merge branch kvm-arm64/pkvm-np-thp-6.16 into kvmarm-master/next * kvm-arm64/pkvm-np-thp-6.16: (21 commits) : . : Large mapping support for non-protected pKVM guests, courtesy of : Vincent Donnefort. From the cover letter: : : "This series adds support for stage-2 huge mappings (PMD_SIZE) to pKVM : np-guests, that is installing PMD-level mappings in the stage-2, : whenever the stage-1 is backed by either Hugetlbfs or THPs." : . KVM: arm64: np-guest CMOs with PMD_SIZE fixmap KVM: arm64: Stage-2 huge mappings for np-guests KVM: arm64: Add a range to pkvm_mappings KVM: arm64: Convert pkvm_mappings to interval tree KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest() KVM: arm64: Add a range to __pkvm_host_wrprotect_guest() KVM: arm64: Add a range to __pkvm_host_unshare_guest() KVM: arm64: Add a range to __pkvm_host_share_guest() KVM: arm64: Introduce for_each_hyp_page KVM: arm64: Handle huge mappings for np-guest CMOs KVM: arm64: Extend pKVM selftest for np-guests KVM: arm64: Selftest for pKVM transitions KVM: arm64: Don't WARN from __pkvm_host_share_guest() KVM: arm64: Add .hyp.data section KVM: arm64: Unconditionally cross check hyp state KVM: arm64: Defer EL2 stage-1 mapping on share KVM: arm64: Move hyp state to hyp_vmemmap KVM: arm64: Introduce {get,set}_host_state() helpers KVM: arm64: Use 0b11 for encoding PKVM_NOPAGE KVM: arm64: Fix pKVM page-tracking comments ... Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-23 10:56:25 +01:00
Yicong Yang	226ff35039	arm64: cputype: Add cputype definition for HIP12 Add MIDR encoding for HiSilicon HIP12 which is used on HiSilicon HIP12 SoCs. Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20250425033845.57671-2-yangyicong@huawei.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-22 11:22:18 +01:00
Rob Herring (Arm)	8083499715	arm64: el2_setup.h: Make __init_el2_fgt labels consistent, again Commit `5b39db6037` ("arm64: el2_setup.h: Rename some labels to be more diff-friendly") reworked the labels in __init_el2_fgt to say what's skipped rather than what the target location is. The exception was "set_fgt_" which is where registers are written. In reviewing the BRBE additions, Will suggested "set_debug_fgt_" where HDFGxTR_EL2 are written. Doing that would partially revert commit `5b39db6037` undoing the goal of minimizing additions here, but it would follow the convention for labels where registers are written. So let's do both. Branches that skip something go to a "skip" label and places that set registers have a "set" label. This results in some double labels, but it makes things entirely consistent. While we're here, the SME skip label was incorrectly named, so fix it. Reported-by: Will Deacon <will@kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/r/20250520-arm-brbe-v19-v22-2-c1ddde38e7f8@kernel.org Signed-off-by: Will Deacon <will@kernel.org>	2025-05-22 11:20:59 +01:00
Vincent Donnefort	c353fde17d	KVM: arm64: np-guest CMOs with PMD_SIZE fixmap With the introduction of stage-2 huge mappings in the pKVM hypervisor, guest pages CMO is needed for PMD_SIZE size. Fixmap only supports PAGE_SIZE and iterating over the huge-page is time consuming (mostly due to TLBI on hyp_fixmap_unmap) which is a problem for EL2 latency. Introduce a shared PMD_SIZE fixmap (hyp_fixblock_map/hyp_fixblock_unmap) to improve guest page CMOs when stage-2 huge mappings are installed. On a Pixel6, the iterative solution resulted in a latency of ~700us, while the PMD_SIZE fixmap reduces it to ~100us. Because of the horrendous private range allocation that would be necessary, this is disabled for 64KiB pages systems. Suggested-by: Quentin Perret <qperret@google.com> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-11-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-21 14:33:51 +01:00
Quentin Perret	3669ddd8fa	KVM: arm64: Add a range to pkvm_mappings In preparation for supporting stage-2 huge mappings for np-guest, add a nr_pages member for pkvm_mappings to allow EL1 to track the size of the stage-2 mapping. Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-9-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-21 14:33:51 +01:00
Quentin Perret	b38c9775f7	KVM: arm64: Convert pkvm_mappings to interval tree In preparation for supporting stage-2 huge mappings for np-guest, let's convert pgt.pkvm_mappings to an interval tree. No functional change intended. Suggested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-8-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-21 14:33:51 +01:00
Marc Zyngier	d5702dd224	Merge branch kvm-arm64/pkvm-selftest-6.16 into kvm-arm64/pkvm-np-thp-6.16 * kvm-arm64/pkvm-selftest-6.16: : . : pKVM selftests covering the memory ownership transitions by : Quentin Perret. From the initial cover letter: : : "We have recently found a bug [1] in the pKVM memory ownership : transitions by code inspection, but it could have been caught with a : test. : : Introduce a boot-time selftest exercising all the known pKVM memory : transitions and importantly checks the rejection of illegal transitions. : : The new test is hidden behind a new Kconfig option separate from : CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the : transition checks ([1] doesn't reproduce with EL2 debug enabled). : : [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/" : . KVM: arm64: Extend pKVM selftest for np-guests KVM: arm64: Selftest for pKVM transitions KVM: arm64: Don't WARN from __pkvm_host_share_guest() KVM: arm64: Add .hyp.data section Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-21 14:33:43 +01:00
D Scott Phillips	fed55f49fa	arm64: errata: Work around AmpereOne's erratum AC04_CPU_23 On AmpereOne AC04, updates to HCR_EL2 can rarely corrupt simultaneous translations for data addresses initiated by load/store instructions. Only instruction initiated translations are vulnerable, not translations from prefetches for example. A DSB before the store to HCR_EL2 is sufficient to prevent older instructions from hitting the window for corruption, and an ISB after is sufficient to prevent younger instructions from hitting the window for corruption. Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250513184514.2678288-1-scott@os.amperecomputing.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 12:46:26 +01:00
Marc Zyngier	98dbe56a01	KVM: arm64: Handle TSB CSYNC traps The architecture introduces a trap for TSB CSYNC that fits in the same EC as LS64 and PSB CSYNC. Let's deal with it in a similar way. It's not that we expect this to be useful any time soon anyway. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 11:36:21 +01:00
Marc Zyngier	4bc0fe0898	KVM: arm64: Add sanitisation for FEAT_FGT2 registers Just like the FEAT_FGT registers, treat the FGT2 variant the same way. THis is a large update, but a fairly mechanical one. The config dependencies are extracted from the 2025-03 JSON drop. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 11:36:10 +01:00
Marc Zyngier	df56f1ccb0	KVM: arm64: Add FEAT_FGT2 registers to the VNCR page The FEAT_FGT2 registers are part of the VNCR page. Describe the corresponding offsets and add them to the vcpu sysreg enumeration. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 11:35:30 +01:00
Marc Zyngier	a764b56bf9	KVM: arm64: Allow kvm_has_feat() to take variable arguments In order to be able to write more compact (and easier to read) code, let kvm_has_feat() and co take variable arguments. This enables constructs such as: #define FEAT_SME ID_AA64PFR1_EL1, SME, IMP if (kvm_has_feat(kvm, FEAT_SME)) [...] which is admitedly more readable. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 11:35:30 +01:00
Marc Zyngier	c6cbe6a4c1	KVM: arm64: Use FGT feature maps to drive RES0 bits Another benefit of mapping bits to features is that it becomes trivial to define which bits should be handled as RES0. Let's apply this principle to the guest's view of the FGT registers. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 11:35:00 +01:00
Marc Zyngier	a7484c80e5	KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2* Since we're (almost) feature complete, let's allow userspace to request KVM_ARM_VCPU_EL2* by bumping KVM_VCPU_MAX_FEATURES up. We also now advertise the features to userspace with new capabilities. It's going to be great... Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Link: https://lore.kernel.org/r/20250514103501.2225951-17-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 08:01:19 +01:00
Marc Zyngier	4ffa72ad8f	KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2 A TLBI by VA for S1 must take effect on our pseudo-TLB for VNCR and potentially knock the fixmap mapping. Even worse, that TLBI must be able to work cross-vcpu. For that, we track on a per-VM basis if any VNCR is mapped, using an atomic counter. Whenever a TLBI S1E2 occurs and that this counter is non-zero, we take the long road all the way back to the core code. There, we iterate over all vcpus and check whether this particular invalidation has any damaging effect. If it does, we nuke the pseudo TLB and the corresponding fixmap. Yes, this is costly. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-14-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 08:01:19 +01:00
Marc Zyngier	2a359e0725	KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2 Now that we can handle faults triggered through VNCR_EL2, we need to map the corresponding page at EL2. But where, you'll ask? Since each CPU in the system can run a vcpu, we need a per-CPU mapping. For that, we carve a NR_CPUS range in the fixmap, giving us a per-CPU va at which to map the guest's VNCR's page. The mapping occurs both on vcpu load and on the back of a fault, both generating a request that will take care of the mapping. That mapping will also get dropped on vcpu put. Yes, this is a bit heavy handed, but it is simple. Eventually, we may want to have a per-VM, per-CPU mapping, which would avoid all the TLBI overhead. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-11-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 08:01:19 +01:00
Marc Zyngier	069a05e535	KVM: arm64: nv: Handle VNCR_EL2-triggered faults As VNCR_EL2.BADDR contains a VA, it is bound to trigger faults. These faults can have multiple source: - We haven't mapped anything on the host: we need to compute the resulting translation, populate a TLB, and eventually map the corresponding page - The permissions are out of whack: we need to tell the guest about this state of affairs Note that the kernel doesn't support S1POE for itself yet, so the particular case of a VNCR page mapped with no permissions or with write-only permissions is not correctly handled yet. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-10-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 08:01:19 +01:00
Marc Zyngier	6fb75733f1	KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2 Plug VNCR_EL2 in the vcpu_sysreg enum, define its RES0/RES1 bits, and make it accessible to userspace when the VM is configured to support FEAT_NV2. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-9-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 07:59:46 +01:00
Marc Zyngier	ea8d3cf46d	KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2 FEAT_NV2 introduces an interesting problem for NV, as VNCR_EL2.BADDR is a virtual address in the EL2&0 (or EL2, but we thankfully ignore this) translation regime. As we need to replicate such mapping in the real EL2, it means that we need to remember that there is such a translation, and that any TLBI affecting EL2 can possibly affect this translation. It also means that any invalidation driven by an MMU notifier must be able to shoot down any such mapping. All in all, we need a data structure that represents this mapping, and that is extremely close to a TLB. Given that we can only use one of those per vcpu at any given time, we only allocate one. No effort is made to keep that structure small. If we need to start caching multiple of them, we may want to revisit that design point. But for now, it is kept simple so that we can reason about it. Oh, and add a braindump of how things are supposed to work, because I will definitely page this out at some point. Yes, pun intended. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-8-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 07:59:46 +01:00
Marc Zyngier	bd914a9814	KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting We currently check for HCR_EL2.NV being set to decide whether we need to repaint PSTATE.M to say EL2 instead of EL1 on exit. However, this isn't correct when L2 is itself a hypervisor, and that L1 as set its own HCR_EL2.NV. That's because we "flatten" the state and inherit parts of the guest's own setup. In that case, we shouldn't adjust PSTATE.M, as this is really EL1 for both us and the guest. Instead of trying to try and work out how we ended-up with HCR_EL2.NV being set by introspecting both the host and guest states, use a per-CPU flag to remember the context (HYP or not), and use that information to decide whether PSTATE needs tweaking. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-7-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 07:59:46 +01:00
Marc Zyngier	85bba00425	KVM: arm64: nv: Move TLBI range decoding to a helper As we are about to expand out TLB invalidation capabilities to support recursive virtualisation, move the decoding of a TLBI by range into a helper that returns the base, the range and the ASID. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-6-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 07:59:46 +01:00
Marc Zyngier	a0ec2b822c	KVM: arm64: nv: Snapshot S1 ASID tagging information during walk We currently completely ignore any sort of ASID tagging during a S1 walk, as AT doesn't care about it. However, such information is required if we are going to create anything that looks like a TLB from this walk. Let's capture it both the nG and ASID information while walking the page tables. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-5-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 07:59:46 +01:00
Marc Zyngier	34fa9dece5	KVM: arm64: nv: Extract translation helper from the AT code The address translation infrastructure is currently pretty tied to the AT emulation. However, we also need to features that require the use of VAs, such as VNCR_EL2 (and maybe one of these days SPE), meaning that we need a slightly more generic infrastructure. Start this by introducing a new helper (__kvm_translate_va()) that performs a S1 walk for a given translation regime, EL and PAN settings. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 07:59:46 +01:00
Marc Zyngier	fb3066904a	arm64: sysreg: Add layout for VNCR_EL2 Now that we're about to emulate VNCR_EL2, we need its full layout. Add it to the sysreg file. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 07:59:46 +01:00
Ard Biesheuvel	93d0d6f8a6	arm64/boot: Move init_pgdir[] and init_idmap_pgdir[] into __pi_ namespace init_pgdir[] is only referenced from the startup code, but lives after BSS in the linker map. Before tightening the rules about accessing BSS from startup code, move init_pgdir[] into the __pi_ namespace, so it does not need to be exported explicitly. For symmetry, do the same with init_idmap_pgdir[], although it lives before BSS. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Link: https://lore.kernel.org/r/20250508114328.2460610-6-ardb+git@google.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-16 16:05:21 +01:00
Ben Horgan	694f574f74	arm64: Update comment regarding values in __boot_cpu_mode The values stored in __boot_cpu_mode were changed without updating the comment. Rectify that. Signed-off-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Dave Martin <Dave.Martin@arm.com> Link: https://lore.kernel.org/r/20250513124525.677736-1-ben.horgan@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-16 15:12:21 +01:00
Gavin Shan	13c63ce358	arm64: mm: Drop redundant check in pmd_trans_huge() pmd_val(pmd) is redundant because a positive pmd_present(pmd) ensures a positive pmd_val(pmd) according to their definitions like below. #define pmd_val(x) ((x).pmd) #define pmd_present(pmd) pte_present(pmd_pte(pmd)) #define pte_present(pte) (pte_valid(pte) \|\| pte_present_invalid(pte)) #define pte_valid(pte) (!!(pte_val(pte) & PTE_VALID)) #define pte_present_invalid(pte) \ ((pte_val(pte) & (PTE_VALID \| PTE_PRESENT_INVALID)) == PTE_PRESENT_INVALID) pte_present() can't be positive unless either of the flag PTE_VALID or PTE_PRESENT_INVALID is set. In this case, pmd_val(pmd) should be positive either. So lets drop the redundant check pmd_val(pmd) and no functional changes intended. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20250508085251.204282-1-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-16 15:10:13 +01:00
Ryan Roberts	1ef3095b14	arm64/mm: Permit lazy_mmu_mode to be nested lazy_mmu_mode is not supposed to permit nesting. But in practice this does happen with CONFIG_DEBUG_PAGEALLOC, where a page allocation inside a lazy_mmu_mode section (such as zap_pte_range()) will change permissions on the linear map with apply_to_page_range(), which re-enters lazy_mmu_mode (see stack trace below). The warning checking that nesting was not happening was previously being triggered due to this. So let's relax by removing the warning and tolerate nesting in the arm64 implementation. The first (inner) call to arch_leave_lazy_mmu_mode() will flush and clear the flag such that the remainder of the work in the outer nest behaves as if outside of lazy mmu mode. This is safe and keeps tracking simple. Code review suggests powerpc deals with this issue in the same way. ------------[ cut here ]------------ WARNING: CPU: 6 PID: 1 at arch/arm64/include/asm/pgtable.h:89 __apply_to_page_range+0x85c/0x9f8 Modules linked in: ip_tables x_tables ipv6 CPU: 6 UID: 0 PID: 1 Comm: systemd Not tainted 6.15.0-rc5-00075-g676795fe9cf6 #1 PREEMPT Hardware name: QEMU KVM Virtual Machine, BIOS 2024.08-4 10/25/2024 pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __apply_to_page_range+0x85c/0x9f8 lr : __apply_to_page_range+0x2b4/0x9f8 sp : ffff80008009b3c0 x29: ffff80008009b460 x28: ffff0000c43a3000 x27: ffff0001ff62b108 x26: ffff0000c43a4000 x25: 0000000000000001 x24: 0010000000000001 x23: ffffbf24c9c209c0 x22: ffff80008009b4d0 x21: ffffbf24c74a3b20 x20: ffff0000c43a3000 x19: ffff0001ff609d18 x18: 0000000000000001 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000003 x14: 0000000000000028 x13: ffffbf24c97c1000 x12: ffff0000c43a3fff x11: ffffbf24cacc9a70 x10: ffff0000c43a3fff x9 : ffff0001fffff018 x8 : 0000000000000012 x7 : ffff0000c43a4000 x6 : ffff0000c43a4000 x5 : ffffbf24c9c209c0 x4 : ffff0000c43a3fff x3 : ffff0001ff609000 x2 : 0000000000000d18 x1 : ffff0000c03e8000 x0 : 0000000080000000 Call trace: __apply_to_page_range+0x85c/0x9f8 (P) apply_to_page_range+0x14/0x20 set_memory_valid+0x5c/0xd8 __kernel_map_pages+0x84/0xc0 get_page_from_freelist+0x1110/0x1340 __alloc_frozen_pages_noprof+0x114/0x1178 alloc_pages_mpol+0xb8/0x1d0 alloc_frozen_pages_noprof+0x48/0xc0 alloc_pages_noprof+0x10/0x60 get_free_pages_noprof+0x14/0x90 __tlb_remove_folio_pages_size.isra.0+0xe4/0x140 __tlb_remove_folio_pages+0x10/0x20 unmap_page_range+0xa1c/0x14c0 unmap_single_vma.isra.0+0x48/0x90 unmap_vmas+0xe0/0x200 vms_clear_ptes+0xf4/0x140 vms_complete_munmap_vmas+0x7c/0x208 do_vmi_align_munmap+0x180/0x1a8 do_vmi_munmap+0xac/0x188 __vm_munmap+0xe0/0x1e0 __arm64_sys_munmap+0x20/0x38 invoke_syscall+0x48/0x104 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x4c/0x16c el0t_64_sync_handler+0x10c/0x140 el0t_64_sync+0x198/0x19c irq event stamp: 281312 hardirqs last enabled at (281311): [<ffffbf24c780fd04>] bad_range+0x164/0x1c0 hardirqs last disabled at (281312): [<ffffbf24c89c4550>] el1_dbg+0x24/0x98 softirqs last enabled at (281054): [<ffffbf24c752d99c>] handle_softirqs+0x4cc/0x518 softirqs last disabled at (281019): [<ffffbf24c7450694>] __do_softirq+0x14/0x20 ---[ end trace 0000000000000000 ]--- Fixes: `5fdd05efa1` ("arm64/mm: Batch barriers when updating kernel mappings") Reported-by: Catalin Marinas <catalin.marinas@arm.com> Closes: https://lore.kernel.org/linux-arm-kernel/aCH0TLRQslXHin5Q@arm.com/ Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250512150333.5589-1-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-14 13:28:40 +01:00
Ryan Roberts	b81c688426	arm64/mm: Disable barrier batching in interrupt contexts Commit `5fdd05efa1` ("arm64/mm: Batch barriers when updating kernel mappings") enabled arm64 kernels to track "lazy mmu mode" using TIF flags in order to defer barriers until exiting the mode. At the same time, it added warnings to check that pte manipulations were never performed in interrupt context, because the tracking implementation could not deal with nesting. But it turns out that some debug features (e.g. KFENCE, DEBUG_PAGEALLOC) do manipulate ptes in softirq context, which triggered the warnings. So let's take the simplest and safest route and disable the batching optimization in interrupt contexts. This makes these users no worse off than prior to the optimization. Additionally the known offenders are debug features that only manipulate a single PTE, so there is no performance gain anyway. There may be some obscure case of encrypted/decrypted DMA with the dma_free_coherent called from an interrupt context, but again, this is no worse off than prior to the commit. Some options for supporting nesting were considered, but there is a difficult to solve problem if any code manipulates ptes within interrupt context but outside of a lazy mmu region. If this case exists, the code would expect the updates to be immediate, but because the task context may have already been in lazy mmu mode, the updates would be deferred, which could cause incorrect behaviour. This problem is avoided by always ensuring updates within interrupt context are immediate. Fixes: `5fdd05efa1` ("arm64/mm: Batch barriers when updating kernel mappings") Reported-by: syzbot+5c0d9392e042f41d45c5@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-arm-kernel/681f2a09.050a0220.f2294.0006.GAE@google.com/ Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250512102242.4156463-1-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-14 13:27:55 +01:00
Anshuman Khandual	dbb9c166a0	arm64/mm: define ptdesc_t Define ptdesc_t type which describes the basic page table descriptor layout on arm64 platform. Subsequently all level specific pxxval_t descriptors are derived from ptdesc_t thus establishing a common original format, which can also be appropriate for page table entries, masks and protection values etc which are used at all page table levels. Link: https://lkml.kernel.org/r/20250407053113.746295-4-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Suggested-by: Ryan Roberts <ryan.roberts@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-05-11 17:48:19 -07:00
Anshuman Khandual	e064e7384f	mm/ptdump: split note_page() into level specific callbacks Patch series "mm/ptdump: Drop assumption that pxd_val() is u64", v2. Last argument passed down in note_page() is u64 assuming pxd_val() returned value (all page table levels) is 64 bit - which might not be the case going ahead when D128 page tables is enabled on arm64 platform. Besides pxd_val() is very platform specific and its type should not be assumed in generic MM. A similar problem exists for effective_prot(), although it is restricted to x86 platform. This series splits note_page() and effective_prot() into individual page table level specific callbacks which accepts corresponding pxd_t page table entry as an argument instead and later on all subscribing platforms could derive pxd_val() from the table entries as required and proceed as before. Define ptdesc_t type which describes the basic page table descriptor layout on arm64 platform. Subsequently all level specific pxxval_t descriptors are derived from ptdesc_t thus establishing a common original format, which can also be appropriate for page table entries, masks and protection values etc which are used at all page table levels. This patch (of 3): Last argument passed down in note_page() is u64 assuming pxd_val() returned value (all page table levels) is 64 bit - which might not be the case going ahead when D128 page tables is enabled on arm64 platform. Besides pxd_val() is very platform specific and its type should not be assumed in generic MM. Split note_page() into individual page table level specific callbacks which accepts corresponding pxd_t argument instead and then subscribing platforms just derive pxd_val() from the entries as required and proceed as earlier. Also add a note_page_flush() callback for flushing the last page table page that was being handled earlier via level = -1. Link: https://lkml.kernel.org/r/20250407053113.746295-1-anshuman.khandual@arm.com Link: https://lkml.kernel.org/r/20250407053113.746295-2-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-05-11 17:48:19 -07:00
Dmitry V. Levin	cc6622730b	syscall.h: introduce syscall_set_nr() Similar to syscall_set_arguments() that complements syscall_get_arguments(), introduce syscall_set_nr() that complements syscall_get_nr(). syscall_set_nr() is going to be needed along with syscall_set_arguments() on all HAVE_ARCH_TRACEHOOK architectures to implement PTRACE_SET_SYSCALL_INFO API. Link: https://lkml.kernel.org/r/20250303112020.GD24170@strace.io Signed-off-by: Dmitry V. Levin <ldv@strace.io> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Acked-by: Helge Deller <deller@gmx.de> # parisc Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk> # mips Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexey Gladkov (Intel) <legion@kernel.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: anton ivanov <anton.ivanov@cambridgegreys.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Betkov <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Zankel <chris@zankel.net> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davide Berardi <berardi.dav@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Eugene Syromiatnikov <esyr@redhat.com> Cc: Eugene Syromyatnikov <evgsyr@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Renzo Davoi <renzo@cs.unibo.it> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russel King <linux@armlinux.org.uk> Cc: Shuah Khan <shuah@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-05-11 17:48:15 -07:00
Dmitry V. Levin	17fc7b8f9b	syscall.h: add syscall_set_arguments() This function is going to be needed on all HAVE_ARCH_TRACEHOOK architectures to implement PTRACE_SET_SYSCALL_INFO API. This partially reverts commit `7962c2eddb` ("arch: remove unused function syscall_set_arguments()") by reusing some of old syscall_set_arguments() implementations. [nathan@kernel.org: fix compile time fortify checks] Link: https://lkml.kernel.org/r/20250408213131.GA2872426@ax162 Link: https://lkml.kernel.org/r/20250303112009.GC24170@strace.io Signed-off-by: Dmitry V. Levin <ldv@strace.io> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Acked-by: Helge Deller <deller@gmx.de> # parisc Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk> [mips] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexey Gladkov (Intel) <legion@kernel.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: anton ivanov <anton.ivanov@cambridgegreys.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Betkov <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Zankel <chris@zankel.net> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davide Berardi <berardi.dav@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Eugene Syromiatnikov <esyr@redhat.com> Cc: Eugene Syromyatnikov <evgsyr@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Renzo Davoi <renzo@cs.unibo.it> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russel King <linux@armlinux.org.uk> Cc: Shuah Khan <shuah@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-05-11 17:48:15 -07:00
Matthew Wilcox (Oracle)	5071ea3d7b	arch: remove mk_pmd() There are now no callers of mk_huge_pmd() and mk_pmd(). Remove them. Link: https://lkml.kernel.org/r/20250402181709.2386022-12-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Zi Yan <ziy@nvidia.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Muchun Song <muchun.song@linux.dev> Cc: Richard Weinberger <richard@nod.at> Cc: <x86@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-05-11 17:48:04 -07:00
Matthew Wilcox (Oracle)	cb5b13cd6c	mm: introduce a common definition of mk_pte() Most architectures simply call pfn_pte(). Centralise that as the normal definition and remove the definition of mk_pte() from the architectures which have either that exact definition or something similar. Link: https://lkml.kernel.org/r/20250402181709.2386022-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390 Cc: Zi Yan <ziy@nvidia.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Muchun Song <muchun.song@linux.dev> Cc: Richard Weinberger <richard@nod.at> Cc: <x86@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-05-11 17:48:02 -07:00
Linus Torvalds	627277ba7c	* Add BHB mitigations to cBPF programs. * Add new CPUs local mitigation 'k' values. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEmVzZdC2f8yLvolS4hFk2x3H8xgYFAmgcylYACgkQhFk2x3H8 xga4ag/7Bgh9TpBGVuCzKVW8dqv9N7xMo312PjIxqVcxrYN2OBMBScAkxlUDlBuj zdUnGDfc+4u7jwHdcFG2xh3eg3V7Qktsa3LhfKmRZ0tWS4FGf0Z+ffJFnjk1oUva FPUYvjAyxMZZr8XwMDEYrEE/z/iq9ucdWE8XV2bzxndDrsk17IluHCjH1ZOCDtmW f3658hNowbaHW9l6n7TPIaBGngmATMGbRLGXI6w69+MPhY7Rx3Acdbnpxez8OHvh VW2SpKJ2snYq1q/iGgArFYuVLbHde12UCZQCn1UUQjazuzBt1XjAskZYralGm8Ql /qJBcXS9R30KDSM5B2uNCtVbSc1VyHRTgG7vtLAVkhLcDVMheMsls7wTGiTB9keH fQetOh/WmhTmv7pIhFdGOBHZMRGCijgTlDbfA5EP1lM4JieY7RYhS+RK4KiNkFun WRew/II2fhOCsYgs8QtFMHIn7NOCZi+q2bc1gXaRe/kSzCQXfuJ6z0aCMK3AHVTE 2MTFvh5Pzjhrn03vvVpPsjsWnOInWpWV9peb7o+o86dcSa2V61J3topZyp4mpogV 1yTQ3hDX0X2NmpZ7tfMMZs3qmyqB3A7/k2au353vgiOwyPFf+pazLk3EWnpC8ZqB +/9dSnpLGJkfzZgCDNHWTSu8Wq2kSmtenqSXLR/wMhSBfmroVts= =SlZT -----END PGP SIGNATURE----- Merge tag 'arm64_cbpf_mitigation_2025_05_08' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 cBPF BHB mitigation from James Morse: "This adds the BHB mitigation into the code JITted for cBPF programs as these can be loaded by unprivileged users via features like seccomp. The existing mechanisms to disable the BHB mitigation will also prevent the mitigation being JITted. In addition, cBPF programs loaded by processes with the SYS_ADMIN capability are not mitigated as these could equally load an eBPF program that does the same thing. For good measure, the list of 'k' values for CPU's local mitigations is updated from the version on arm's website" * tag 'arm64_cbpf_mitigation_2025_05_08' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: proton-pack: Add new CPUs 'k' values for branch mitigation arm64: bpf: Only mitigate cBPF programs loaded by unprivileged users arm64: bpf: Add BHB mitigation to the epilogue for cBPF programs arm64: proton-pack: Expose whether the branchy loop k value arm64: proton-pack: Expose whether the platform is mitigated by firmware arm64: insn: Add support for encoding DSB	2025-05-11 17:45:00 -07:00
Linus Torvalds	cd802e7e5f	ARM: * Avoid use of uninitialized memcache pointer in user_mem_abort() * Always set HCR_EL2.xMO bits when running in VHE, allowing interrupts to be taken while TGE=0 and fixing an ugly bug on AmpereOne that occurs when taking an interrupt while clearing the xMO bits (AC03_CPU_36) * Prevent VMMs from hiding support for AArch64 at any EL virtualized by KVM * Save/restore the host value for HCRX_EL2 instead of restoring an incorrect fixed value * Make host_stage2_set_owner_locked() check that the entire requested range is memory rather than just the first page RISC-V: * Add missing reset of smstateen CSRs x86: * Forcibly leave SMM on SHUTDOWN interception on AMD CPUs to avoid causing problems due to KVM stuffing INIT on SHUTDOWN (KVM needs to sanitize the VMCB as its state is undefined after SHUTDOWN, emulating INIT is the least awful choice). * Track the valid sync/dirty fields in kvm_run as a u64 to ensure KVM KVM doesn't goof a sanity check in the future. * Free obsolete roots when (re)loading the MMU to fix a bug where pre-faulting memory can get stuck due to always encountering a stale root. * When dumping GHCB state, use KVM's snapshot instead of the raw GHCB page to print state, so that KVM doesn't print stale/wrong information. * When changing memory attributes (e.g. shared <=> private), add potential hugepage ranges to the mmu_invalidate_range_{start,end} set so that KVM doesn't create a shared/private hugepage when the the corresponding attributes will become mixed (the attributes are commited after KVM finishes the invalidation). * Rework the SRSO mitigation to enable BP_SPEC_REDUCE only when KVM has at least one active VM. Effectively BP_SPEC_REDUCE when KVM is loaded led to very measurable performance regressions for non-KVM workloads. -----BEGIN PGP SIGNATURE----- iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmgfbqAUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroNAywf+J9Ux+RccM8K2my3REQn7Z6WwMevX CYgvdYBGt79AG8mjMKMfISzRDo3PrTi9wr+mEHfCpJ1F7CZTec/qdGY61tIjOhnE 86A5EoJcaoWhZcl4ubtQwRc//ENapwb6qI5uy10Nt30KTqS1S38M7FcZLvTYBYBx A1Xehcnc8NOsOvXMyHvnsAi/X+yvj/wUfzETfzt5CFg8s9MHnmEFWlP+oOgNggbR TKJVIvD0CTQR8lmdEcJYDrgWfhUsRq8qZyPAO37SoAn1tWfYAcpUUHEH2t2C6waW shqmRx0HLshhbIWgySU2AdRx6Q3iyMIPSmTvzUhATEhEzM/IDk/DZstOyQ== =aJFD -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: "ARM: - Avoid use of uninitialized memcache pointer in user_mem_abort() - Always set HCR_EL2.xMO bits when running in VHE, allowing interrupts to be taken while TGE=0 and fixing an ugly bug on AmpereOne that occurs when taking an interrupt while clearing the xMO bits (AC03_CPU_36) - Prevent VMMs from hiding support for AArch64 at any EL virtualized by KVM - Save/restore the host value for HCRX_EL2 instead of restoring an incorrect fixed value - Make host_stage2_set_owner_locked() check that the entire requested range is memory rather than just the first page RISC-V: - Add missing reset of smstateen CSRs x86: - Forcibly leave SMM on SHUTDOWN interception on AMD CPUs to avoid causing problems due to KVM stuffing INIT on SHUTDOWN (KVM needs to sanitize the VMCB as its state is undefined after SHUTDOWN, emulating INIT is the least awful choice). - Track the valid sync/dirty fields in kvm_run as a u64 to ensure KVM KVM doesn't goof a sanity check in the future. - Free obsolete roots when (re)loading the MMU to fix a bug where pre-faulting memory can get stuck due to always encountering a stale root. - When dumping GHCB state, use KVM's snapshot instead of the raw GHCB page to print state, so that KVM doesn't print stale/wrong information. - When changing memory attributes (e.g. shared <=> private), add potential hugepage ranges to the mmu_invalidate_range_{start,end} set so that KVM doesn't create a shared/private hugepage when the the corresponding attributes will become mixed (the attributes are commited after KVM finishes the invalidation). - Rework the SRSO mitigation to enable BP_SPEC_REDUCE only when KVM has at least one active VM. Effectively BP_SPEC_REDUCE when KVM is loaded led to very measurable performance regressions for non-KVM workloads" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Set/clear SRSO's BP_SPEC_REDUCE on 0 <=> 1 VM count transitions KVM: arm64: Fix memory check in host_stage2_set_owner_locked() KVM: arm64: Kill HCRX_HOST_FLAGS KVM: arm64: Properly save/restore HCRX_EL2 KVM: arm64: selftest: Don't try to disable AArch64 support KVM: arm64: Prevent userspace from disabling AArch64 support at any virtualisable EL KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode KVM: arm64: Fix uninitialized memcache pointer in user_mem_abort() KVM: x86/mmu: Prevent installing hugepages when mem attributes are changing KVM: SVM: Update dump_ghcb() to use the GHCB snapshot fields KVM: RISC-V: reset smstateen CSRs KVM: x86/mmu: Check and free obsolete roots in kvm_mmu_reload() KVM: x86: Check that the high 32bits are clear in kvm_arch_vcpu_ioctl_run() KVM: SVM: Forcibly leave SMM mode on SHUTDOWN interception	2025-05-11 11:30:13 -07:00
Marc Zyngier	938a79d0aa	KVM: arm64: Validate FGT register descriptions against RES0 masks In order to point out to the unsuspecting KVM hacker that they are missing something somewhere, validate that the known FGT bits do not intersect with the corresponding RES0 mask, as computed at boot time. THis check is also performed at boot time, ensuring that there is no runtime overhead. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-10 11:04:35 +01:00
Marc Zyngier	63d423a763	KVM: arm64: Switch to table-driven FGU configuration Defining the FGU behaviour is extremely tedious. It relies on matching each set of bits from FGT registers with am architectural feature, and adding them to the FGU list if the corresponding feature isn't advertised to the guest. It is however relatively easy to dump most of that information from the architecture JSON description, and use that to control the FGU bits. Let's introduce a new set of tables descripbing the mapping between FGT bits and features. Most of the time, this is only a lookup in an idreg field, with a few more complex exceptions. While this is obviously many more lines in a new file, this is mostly generated, and is pretty easy to maintain. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-10 11:04:35 +01:00
Marc Zyngier	397411c743	KVM: arm64: Handle PSB CSYNC traps The architecture introduces a trap for PSB CSYNC that fits in the same EC as LS64. Let's deal with it in a similar way as LS64. It's not that we expect this to be useful any time soon anyway. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-10 11:04:35 +01:00
Marc Zyngier	ef6d7d2682	KVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask We do not have a computed table for HCRX_EL2, so statically define the bits we know about. A warning will fire if the architecture grows bits that are not handled yet. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-10 11:04:35 +01:00
Marc Zyngier	3ce9bbba93	KVM: arm64: Remove hand-crafted masks for FGT registers These masks are now useless, and can be removed. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-10 11:04:35 +01:00
Ryan Roberts	5fdd05efa1	arm64/mm: Batch barriers when updating kernel mappings Because the kernel can't tolerate page faults for kernel mappings, when setting a valid, kernel space pte (or pmd/pud/p4d/pgd), it emits a dsb(ishst) to ensure that the store to the pgtable is observed by the table walker immediately. Additionally it emits an isb() to ensure that any already speculatively determined invalid mapping fault gets canceled. We can improve the performance of vmalloc operations by batching these barriers until the end of a set of entry updates. arch_enter_lazy_mmu_mode() and arch_leave_lazy_mmu_mode() provide the required hooks. vmalloc improves by up to 30% as a result. Two new TIF_ flags are created; TIF_LAZY_MMU tells us if the task is in the lazy mode and can therefore defer any barriers until exit from the lazy mode. TIF_LAZY_MMU_PENDING is used to remember if any pte operation was performed while in the lazy mode that required barriers. Then when leaving lazy mode, if that flag is set, we emit the barriers. Since arch_enter_lazy_mmu_mode() and arch_leave_lazy_mmu_mode() are used for both user and kernel mappings, we need the second flag to avoid emitting barriers unnecessarily if only user mappings were updated. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Link: https://lore.kernel.org/r/20250422081822.1836315-12-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-09 13:43:08 +01:00
Ryan Roberts	06fc959fcf	arm64/mm: Support huge pte-mapped pages in vmap Implement the required arch functions to enable use of contpte in the vmap when VM_ALLOW_HUGE_VMAP is specified. This speeds up vmap operations due to only having to issue a DSB and ISB per contpte block instead of per pte. But it also means that the TLB pressure reduces due to only needing a single TLB entry for the whole contpte block. Since vmap uses set_huge_pte_at() to set the contpte, that API is now used for kernel mappings for the first time. Although in the vmap case we never expect it to be called to modify a valid mapping so clear_flush() should never be called, it's still wise to make it robust for the kernel case, so amend the tlb flush function if the mm is for kernel space. Tested with vmalloc performance selftests: # kself/mm/test_vmalloc.sh \ run_test_mask=1 test_repeat_count=5 nr_pages=256 test_loop_count=100000 use_huge=1 Duration reduced from 1274243 usec to 1083553 usec on Apple M2 for 15% reduction in time taken. Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Link: https://lore.kernel.org/r/20250422081822.1836315-10-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-09 13:43:07 +01:00
Ryan Roberts	f89b399e8d	arm64/mm: Hoist barriers out of set_ptes_anysz() loop set_ptes_anysz() previously called __set_pte() for each PTE in the range, which would conditionally issue a DSB and ISB to make the new PTE value immediately visible to the table walker if the new PTE was valid and for kernel space. We can do better than this; let's hoist those barriers out of the loop so that they are only issued once at the end of the loop. We then reduce the cost by the number of PTEs in the range. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Link: https://lore.kernel.org/r/20250422081822.1836315-7-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-09 13:43:07 +01:00
Ryan Roberts	ef493d2343	arm64/mm: Refactor __set_ptes() and __ptep_get_and_clear() Refactor __set_ptes(), set_pmd_at() and set_pud_at() so that they are all a thin wrapper around a new common __set_ptes_anysz(), which takes pgsize parameter. Additionally, refactor __ptep_get_and_clear() and pmdp_huge_get_and_clear() to use a new common __ptep_get_and_clear_anysz() which also takes a pgsize parameter. These changes will permit the huge_pte API to efficiently batch-set pgtable entries and take advantage of the future barrier optimizations. Additionally since the new _anysz() helpers call the correct page_table_check__set() API based on pgsize, this means that huge_ptes will be able to get proper coverage. Currently the huge_pte API always uses the pte API which assumes an entry only covers a single page. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Link: https://lore.kernel.org/r/20250422081822.1836315-5-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-09 13:43:07 +01:00
Ryan Roberts	5b3f891764	arm64: hugetlb: Refine tlb maintenance scope When operating on contiguous blocks of ptes (or pmds) for some hugetlb sizes, we must honour break-before-make requirements and clear down the block to invalid state in the pgtable then invalidate the relevant tlb entries before making the pgtable entries valid again. However, the tlb maintenance is currently always done assuming the worst case stride (PAGE_SIZE), last_level (false) and tlb_level (TLBI_TTL_UNKNOWN). We can do much better with the hinting; In reality, we know the stride from the huge_pte pgsize, we are always operating only on the last level, and we always know the tlb_level, again based on pgsize. So let's start providing these hints. Additionally, avoid tlb maintenace in set_huge_pte_at(). Break-before-make is only required if we are transitioning the contiguous pte block from valid -> valid. So let's elide the clear-and-flush ("break") if the pte range was previously invalid. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Link: https://lore.kernel.org/r/20250422081822.1836315-3-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-09 13:43:06 +01:00
James Morse	efe676a1a7	arm64: proton-pack: Add new CPUs 'k' values for branch mitigation Update the list of 'k' values for the branch mitigation from arm's website. Add the values for Cortex-X1C. The MIDR_EL1 value can be found here: https://developer.arm.com/documentation/101968/0002/Register-descriptions/AArch> Link: https://developer.arm.com/documentation/110280/2-0/?lang=en Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>	2025-05-08 15:29:28 +01:00
Mark Rutland	6ef1d778ce	arm64/fpsimd: Add task_smstop_sm() In a few places we want to transition a task from streaming mode to non-streaming mode, e.g. signal delivery where we historically tried to use an SMSTOP SM instruction. Add a new helper to manipulate a task's state in the same way as an SMSTOP SM instruction. I have not added a corresponding helper to simulate the effects of SMSTART SM. Only ptrace transitions a task into streaming mode, and ptrace has distinct semantics for such transitions. Per ARM DDI 0487 L.a, section B1.4.6: \| RRSWFQ \| When the Effective value of PSTATE.SM is changed by any method from 0 \| to 1, an entry to Streaming SVE mode is performed, and all implemented \| bits of Streaming SVE register state are set to zero. \| RKFRQZ \| When the Effective value of PSTATE.SM is changed by any method from 1 \| to 0, an exit from Streaming SVE mode is performed, and in the \| newly-entered mode, all implemented bits of the SVE scalable vector \| registers, SVE predicate registers, and FFR, are set to zero. Per ARM DDI 0487 L.a, section C5.2.9: \| On entry to or exit from Streaming SVE mode, FPMR is set to 0 Per ARM DDI 0487 L.a, section C5.2.10: \| On entry to or exit from Streaming SVE mode, FPSR.{IOC, DZC, OFC, UFC, \| IXC, IDC, QC} are set to 1 and the remaining bits are set to 0. This means bits 0, 1, 2, 3, 4, 7, and 27 respectively, i.e. 0x0800009f Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-9-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-08 15:29:09 +01:00
Mark Rutland	8738288a08	arm64/fpsimd: Factor out {sve,sme}_state_size() helpers In subsequent patches we'll need to determine the SVE/SME state size for a given SVE VL and SME VL regardless of whether a task is currently configured with those VLs. Split the sizing logic out of sve_state_size() and sme_state_size() so that we don't need to open-code this logic elsewhere. At the same time, apply minor cleanups: * Move sve_state_size() into fpsimd.h, matching the placement of sme_state_size(). * Remove the feature checks from sve_state_size(). We only call sve_state_size() when at least one of SVE and SME are supported, and when either of the two is not supported, the task's corresponding SVE/SME vector length will be zero. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-8-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-08 15:29:09 +01:00

1 2 3 4 5 ...

6076 Commits