linux-loongson

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson synced 2025-08-28 09:22:08 +00:00

Author	SHA1	Message	Date
Peter Zijlstra	e52c1dc745	x86/its: FineIBT-paranoid vs ITS FineIBT-paranoid was using the retpoline bytes for the paranoid check, disabling retpolines, because all parts that have IBT also have eIBRS and thus don't need no stinking retpolines. Except... ITS needs the retpolines for indirect calls must not be in the first half of a cacheline :-/ So what was the paranoid call sequence: <fineibt_paranoid_start>: 0: 41 ba 78 56 34 12 mov $0x12345678, %r10d 6: 45 3b 53 f7 cmp -0x9(%r11), %r10d a: 4d 8d 5b <f0> lea -0x10(%r11), %r11 e: 75 fd jne d <fineibt_paranoid_start+0xd> 10: 41 ff d3 call %r11 13: 90 nop Now becomes: <fineibt_paranoid_start>: 0: 41 ba 78 56 34 12 mov $0x12345678, %r10d 6: 45 3b 53 f7 cmp -0x9(%r11), %r10d a: 4d 8d 5b f0 lea -0x10(%r11), %r11 e: 2e e8 XX XX XX XX cs call __x86_indirect_paranoid_thunk_r11 Where the paranoid_thunk looks like: 1d: <ea> (bad) __x86_indirect_paranoid_thunk_r11: 1e: 75 fd jne 1d __x86_indirect_its_thunk_r11: 20: 41 ff eb jmp %r11 23: cc int3 [ dhansen: remove initialization to false ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>	2025-05-09 13:39:36 -07:00
Peter Zijlstra	872df34d7c	x86/its: Use dynamic thunks for indirect branches ITS mitigation moves the unsafe indirect branches to a safe thunk. This could degrade the prediction accuracy as the source address of indirect branches becomes same for different execution paths. To improve the predictions, and hence the performance, assign a separate thunk for each indirect callsite. This is also a defense-in-depth measure to avoid indirect branches aliasing with each other. As an example, 5000 dynamic thunks would utilize around 16 bits of the address space, thereby gaining entropy. For a BTB that uses 32 bits for indexing, dynamic thunks could provide better prediction accuracy over fixed thunks. Have ITS thunks be variable sized and use EXECMEM_MODULE_TEXT such that they are both more flexible (got to extend them later) and live in 2M TLBs, just like kernel code, avoiding undue TLB pressure. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>	2025-05-09 13:36:58 -07:00
Pawan Gupta	ebebe30794	x86/ibt: Keep IBT disabled during alternative patching cfi_rewrite_callers() updates the fineIBT hash matching at the caller side, but except for paranoid-mode it relies on apply_retpoline() and friends for any ENDBR relocation. This could temporarily cause an indirect branch to land on a poisoned ENDBR. For instance, with para-virtualization enabled, a simple wrmsrl() could have an indirect branch pointing to native_write_msr() who's ENDBR has been relocated due to fineIBT: <wrmsrl>: push %rbp mov %rsp,%rbp mov %esi,%eax mov %rsi,%rdx shr $0x20,%rdx mov %edi,%edi mov %rax,%rsi call 0x21e65d0(%rip) # <pv_ops+0xb8> ^^^^^^^^^^^^^^^^^^^^^^^ Such an indirect call during the alternative patching could #CP if the caller is not yet* adjusted for the new target ENDBR. To prevent a false #CP, keep CET-IBT disabled until all callers are patched. Patching during the module load does not need to be guarded by IBT-disable because the module code is not executed until the patching is complete. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>	2025-05-09 13:33:35 -07:00
Pawan Gupta	facd226f7e	x86/its: Add support for RSB stuffing mitigation When retpoline mitigation is enabled for spectre-v2, enabling call-depth-tracking and RSB stuffing also mitigates ITS. Add cmdline option indirect_target_selection=stuff to allow enabling RSB stuffing mitigation. When retpoline mitigation is not enabled, =stuff option is ignored, and default mitigation for ITS is deployed. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>	2025-05-09 13:22:05 -07:00
Pawan Gupta	2665281a07	x86/its: Add "vmexit" option to skip mitigation on some CPUs Ice Lake generation CPUs are not affected by guest/host isolation part of ITS. If a user is only concerned about KVM guests, they can now choose a new cmdline option "vmexit" that will not deploy the ITS mitigation when CPU is not affected by guest/host isolation. This saves the performance overhead of ITS mitigation on Ice Lake gen CPUs. When "vmexit" option selected, if the CPU is affected by ITS guest/host isolation, the default ITS mitigation is deployed. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>	2025-05-09 13:22:05 -07:00
Pawan Gupta	f4818881c4	x86/its: Enable Indirect Target Selection mitigation Indirect Target Selection (ITS) is a bug in some pre-ADL Intel CPUs with eIBRS. It affects prediction of indirect branch and RETs in the lower half of cacheline. Due to ITS such branches may get wrongly predicted to a target of (direct or indirect) branch that is located in the upper half of the cacheline. Scope of impact =============== Guest/host isolation -------------------- When eIBRS is used for guest/host isolation, the indirect branches in the VMM may still be predicted with targets corresponding to branches in the guest. Intra-mode ---------- cBPF or other native gadgets can be used for intra-mode training and disclosure using ITS. User/kernel isolation --------------------- When eIBRS is enabled user/kernel isolation is not impacted. Indirect Branch Prediction Barrier (IBPB) ----------------------------------------- After an IBPB, indirect branches may be predicted with targets corresponding to direct branches which were executed prior to IBPB. This is mitigated by a microcode update. Add cmdline parameter indirect_target_selection=off\|on\|force to control the mitigation to relocate the affected branches to an ITS-safe thunk i.e. located in the upper half of cacheline. Also add the sysfs reporting. When retpoline mitigation is deployed, ITS safe-thunks are not needed, because retpoline sequence is already ITS-safe. Similarly, when call depth tracking (CDT) mitigation is deployed (retbleed=stuff), ITS safe return thunk is not used, as CDT prevents RSB-underflow. To not overcomplicate things, ITS mitigation is not supported with spectre-v2 lfence;jmp mitigation. Moreover, it is less practical to deploy lfence;jmp mitigation on ITS affected parts anyways. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>	2025-05-09 13:22:05 -07:00
Pawan Gupta	a75bf27fe4	x86/its: Add support for ITS-safe return thunk RETs in the lower half of cacheline may be affected by ITS bug, specifically when the RSB-underflows. Use ITS-safe return thunk for such RETs. RETs that are not patched: - RET in retpoline sequence does not need to be patched, because the sequence itself fills an RSB before RET. - RET in Call Depth Tracking (CDT) thunks __x86_indirect_{call\|jump}_thunk and call_depth_return_thunk are not patched because CDT by design prevents RSB-underflow. - RETs in .init section are not reachable after init. - RETs that are explicitly marked safe with ANNOTATE_UNRET_SAFE. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>	2025-05-09 13:22:05 -07:00
Pawan Gupta	8754e67ad4	x86/its: Add support for ITS-safe indirect thunk Due to ITS, indirect branches in the lower half of a cacheline may be vulnerable to branch target injection attack. Introduce ITS-safe thunks to patch indirect branches in the lower half of cacheline with the thunk. Also thunk any eBPF generated indirect branches in emit_indirect_jump(). Below category of indirect branches are not mitigated: - Indirect branches in the .init section are not mitigated because they are discarded after boot. - Indirect branches that are explicitly marked retpoline-safe. Note that retpoline also mitigates the indirect branches against ITS. This is because the retpoline sequence fills an RSB entry before RET, and it does not suffer from RSB-underflow part of the ITS. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>	2025-05-09 13:22:04 -07:00
Pawan Gupta	159013a7ca	x86/its: Enumerate Indirect Target Selection (ITS) bug ITS bug in some pre-Alderlake Intel CPUs may allow indirect branches in the first half of a cache line get predicted to a target of a branch located in the second half of the cache line. Set X86_BUG_ITS on affected CPUs. Mitigation to follow in later commits. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>	2025-05-09 13:22:04 -07:00
Ingo Molnar	367ed4e357	treewide, timers: Rename try_to_del_timer_sync() as timer_delete_sync_try() Move this API to the canonical timer_*() namespace. Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250507175338.672442-9-mingo@kernel.org	2025-05-08 19:49:33 +02:00
Guenter Roeck	00a241f528	x86: disable image size check for test builds 64-bit allyesconfig builds fail with x86_64-linux-ld: kernel image bigger than KERNEL_IMAGE_SIZE Bisect points to commit `6f110a5e4f` ("Disable SLUB_TINY for build testing") as the responsible commit. Reverting that patch does indeed fix the problem. Further analysis shows that disabling SLUB_TINY enables KASAN, and that KASAN is responsible for the image size increase. Solve the build problem by disabling the image size check for test builds. [akpm@linux-foundation.org: add comment, fix nearby typo (sink->sync)] [akpm@linux-foundation.org: fix comment snafu Link: https://lore.kernel.org/oe-kbuild-all/202504191813.4r9H6Glt-lkp@intel.com/ Link: https://lkml.kernel.org/r/20250417010950.2203847-1-linux@roeck-us.net Fixes: `6f110a5e4f` ("Disable SLUB_TINY for build testing") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: <x86@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-05-07 23:39:37 -07:00
Mostafa Saleh	d683a85618	ubsan: Remove regs from report_ubsan_failure() report_ubsan_failure() doesn't use argument regs, and soon it will be called from the hypervisor context were regs are not available. So, remove the unused argument. Signed-off-by: Mostafa Saleh <smostafa@google.com> Acked-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250430162713.1997569-3-smostafa@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-07 11:21:35 +01:00
Pawan Gupta	073fdbe02c	x86/bhi: Do not set BHI_DIS_S in 32-bit mode With the possibility of intra-mode BHI via cBPF, complete mitigation for BHI is to use IBHF (history fence) instruction with BHI_DIS_S set. Since this new instruction is only available in 64-bit mode, setting BHI_DIS_S in 32-bit mode is only a partial mitigation. Do not set BHI_DIS_S in 32-bit mode so as to avoid reporting misleading mitigated status. With this change IBHF won't be used in 32-bit mode, also remove the CONFIG_X86_64 check from emit_spectre_bhb_barrier(). Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>	2025-05-06 08:18:59 -07:00
Daniel Sneddon	9f725eec8f	x86/bpf: Add IBHF call at end of classic BPF Classic BPF programs can be run by unprivileged users, allowing unprivileged code to execute inside the kernel. Attackers can use this to craft branch history in kernel mode that can influence the target of indirect branches. BHI_DIS_S provides user-kernel isolation of branch history, but cBPF can be used to bypass this protection by crafting branch history in kernel mode. To stop intra-mode attacks via cBPF programs, Intel created a new instruction Indirect Branch History Fence (IBHF). IBHF prevents the predicted targets of subsequent indirect branches from being influenced by branch history prior to the IBHF. IBHF is only effective while BHI_DIS_S is enabled. Add the IBHF instruction to cBPF jitted code's exit path. Add the new fence when the hardware mitigation is enabled (i.e., X86_FEATURE_CLEAR_BHB_HW is set) or after the software sequence (X86_FEATURE_CLEAR_BHB_LOOP) is being used in a virtual machine. Note that X86_FEATURE_CLEAR_BHB_HW and X86_FEATURE_CLEAR_BHB_LOOP are mutually exclusive, so the JIT compiler will only emit the new fence, not the SW sequence, when X86_FEATURE_CLEAR_BHB_HW is set. Hardware that enumerates BHI_NO basically has BHI_DIS_S protections always enabled, regardless of the value of BHI_DIS_S. Since BHI_DIS_S doesn't protect against intra-mode attacks, enumerate BHI bug on BHI_NO hardware as well. Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>	2025-05-06 08:18:48 -07:00
Ingo Molnar	83725bdf94	Linux 6.15-rc4 -----BEGIN PGP SIGNATURE----- iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmgOrWseHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGFyIH/AhXcuA8y8rk43mo t+0GO7JR4dnr4DIl74GgDjCXlXiKCT7EXMfD/ABdofTxV4Pbyv+pUODlg1E6eO9U C1WWM5PPNBGDDEVSQ3Yu756nr0UoiFhvW0R6pVdou5cezCWAtIF9LTN8DEUgis0u EUJD9+/cHAMzfkZwabjm/HNsa1SXv2X47MzYv/PdHKr0htEPcNHF4gqBrBRdACGy FJtaCKhuPf6TcDNXOFi5IEWMXrugReRQmOvrXqVYGa7rfUFkZgsAzRY6n/rUN5Z9 FAgle4Vlv9ohVYj9bXX8b6wWgqiKRpoN+t0PpRd6G6ict1AFBobNGo8LH3tYIKqZ b/dCGNg= =xDGd -----END PGP SIGNATURE----- Merge tag 'v6.15-rc4' into x86/asm, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-05-06 12:03:03 +02:00
Chao Gao	32d5fa804d	x86/fpu: Drop @perm from guest pseudo FPU container Remove @perm from the guest pseudo FPU container. The field is initialized during allocation and never used later. Rename fpu_init_guest_permissions() to show that its sole purpose is to lock down guest permissions. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Chang S. Bae <chang.seok.bae@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Eric Biggers <ebiggers@google.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <kees@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mitchell Levy <levymitchell0@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/kvm/af972fe5981b9e7101b64de43c7be0a8cc165323.camel@redhat.com/ Link: https://lore.kernel.org/r/20250506093740.2864458-3-chao.gao@intel.com	2025-05-06 11:52:22 +02:00
Sean Christopherson	d8414603b2	x86/fpu/xstate: Always preserve non-user xfeatures/flags in __state_perm When granting userspace or a KVM guest access to an xfeature, preserve the entity's existing supervisor and software-defined permissions as tracked by __state_perm, i.e. use __state_perm to track all permissions even though all supported supervisor xfeatures are granted to all FPUs and FPU_GUEST_PERM_LOCKED disallows changing permissions. Effectively clobbering supervisor permissions results in inconsistent behavior, as xstate_get_group_perm() will report supervisor features for process that do NOT request access to dynamic user xfeatures, whereas any and all supervisor features will be absent from the set of permissions for any process that is granted access to one or more dynamic xfeatures (which right now means AMX). The inconsistency isn't problematic because fpu_xstate_prctl() already strips out everything except user xfeatures: case ARCH_GET_XCOMP_PERM: /* * Lockless snapshot as it can also change right after the * dropping the lock. / permitted = xstate_get_host_group_perm(); permitted &= XFEATURE_MASK_USER_SUPPORTED; return put_user(permitted, uptr); case ARCH_GET_XCOMP_GUEST_PERM: permitted = xstate_get_guest_group_perm(); permitted &= XFEATURE_MASK_USER_SUPPORTED; return put_user(permitted, uptr); and similarly KVM doesn't apply the __state_perm to supervisor states (kvm_get_filtered_xcr0() incorporates xstate_get_guest_group_perm()): case 0xd: { u64 permitted_xcr0 = kvm_get_filtered_xcr0(); u64 permitted_xss = kvm_caps.supported_xss; But if KVM in particular were to ever change, dropping supervisor permissions would result in subtle bugs in KVM's reporting of supported CPUID settings. And the above behavior also means that having supervisor xfeatures in __state_perm is correctly handled by all users. Dropping supervisor permissions also creates another landmine for KVM. If more dynamic user xfeatures are ever added, requesting access to multiple xfeatures in separate ARCH_REQ_XCOMP_GUEST_PERM calls will result in the second invocation of __xstate_request_perm() computing the wrong ksize, as as the mask passed to xstate_calculate_size() would not contain any* supervisor features. Commit `781c64bfcb` ("x86/fpu/xstate: Handle supervisor states in XSTATE permissions") fudged around the size issue for userspace FPUs, but for reasons unknown skipped guest FPUs. Lack of a fix for KVM "works" only because KVM doesn't yet support virtualizing features that have supervisor xfeatures, i.e. as of today, KVM guest FPUs will never need the relevant xfeatures. Simply extending the hack-a-fix for guests would temporarily solve the ksize issue, but wouldn't address the inconsistency issue and would leave another lurking pitfall for KVM. KVM support for virtualizing CET will likely add CET_KERNEL as a guest-only xfeature, i.e. CET_KERNEL will not be set in xfeatures_mask_supervisor() and would again be dropped when granting access to dynamic xfeatures. Note, the existing clobbering behavior is rather subtle. The @permitted parameter to __xstate_request_perm() comes from: permitted = xstate_get_group_perm(guest); which is either fpu->guest_perm.__state_perm or fpu->perm.__state_perm, where __state_perm is initialized to: fpu->perm.__state_perm = fpu_kernel_cfg.default_features; and copied to the guest side of things: /* Same defaults for guests / fpu->guest_perm = fpu->perm; fpu_kernel_cfg.default_features contains everything except the dynamic xfeatures, i.e. everything except XFEATURE_MASK_XTILE_DATA: fpu_kernel_cfg.default_features = fpu_kernel_cfg.max_features; fpu_kernel_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC; When __xstate_request_perm() restricts the local "mask" variable to compute the user state size: mask &= XFEATURE_MASK_USER_SUPPORTED; usize = xstate_calculate_size(mask, false); it subtly overwrites the target __state_perm with "mask" containing only user xfeatures: perm = guest ? &fpu->guest_perm : &fpu->perm; / Pairs with the READ_ONCE() in xstate_get_group_perm() */ WRITE_ONCE(perm->__state_perm, mask); Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Chang S. Bae <chang.seok.bae@intel.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: John Allen <john.allen@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mitchell Levy <levymitchell0@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Vignesh Balasubramanian <vigbalas@amd.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Xin Li <xin3.li@intel.com> Cc: kvm@vger.kernel.org Link: https://lore.kernel.org/all/ZTqgzZl-reO1m01I@google.com Link: https://lore.kernel.org/r/20250506093740.2864458-2-chao.gao@intel.com	2025-05-06 11:42:04 +02:00
Ahmed S. Darwish	cc663ba3fe	x86/cpu: Sanitize CPUID(0x80000000) output CPUID(0x80000000).EAX returns the max extended CPUID leaf available. On x86-32 machines without an extended CPUID range, a CPUID(0x80000000) query will just repeat the output of the last valid standard CPUID leaf on the CPU; i.e., a garbage values. Current tip:x86/cpu code protects against this by doing: eax = cpuid_eax(0x80000000); c->extended_cpuid_level = eax; if ((eax & 0xffff0000) == 0x80000000) { // CPU has an extended CPUID range. Check for 0x80000001 if (eax >= 0x80000001) { cpuid(0x80000001, ...); } } This is correct so far. Afterwards though, the same possibly broken EAX value is used to check the availability of other extended CPUID leaves: if (c->extended_cpuid_level >= 0x80000007) ... if (c->extended_cpuid_level >= 0x80000008) ... if (c->extended_cpuid_level >= 0x8000000a) ... if (c->extended_cpuid_level >= 0x8000001f) ... which is invalid. Fix this by immediately setting the CPU's max extended CPUID leaf to zero if CPUID(0x80000000).EAX doesn't indicate a valid CPUID extended range. While at it, add a comment, similar to kernel/head_32.S, clarifying the CPUID(0x80000000) sanity check. References: `8a50e5135a` ("x86-32: Use symbolic constants, safer CPUID when enabling EFER.NX") Fixes: `3da99c9776` ("x86: make (early)_identify_cpu more the same between 32bit and 64 bit") Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: x86-cpuid@lists.linux.dev Link: https://lore.kernel.org/r/20250506050437.10264-3-darwi@linutronix.de	2025-05-06 10:04:57 +02:00
Ingo Molnar	24035886d7	Linux 6.15-rc5 -----BEGIN PGP SIGNATURE----- iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmgX1CgeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGxiIH/A7LHlVatGEQgRFi 0JALDgcuGTMtMU1qD43rv8Z1GXqTpCAlaBt9D1C9cUH/86MGyBTVRWgVy0wkaU2U 8QSfFWQIbrdaIzelHtzmAv5IDtb+KrcX1iYGLcMb6ZYaWkv8/CMzMX1nkgxEr1QT 37Xo3/F17yJumAdNQxdRhVLGy2d3X5rScecpufwh97sMwoddllMCDs2LIoeSAYpG 376/wzni09G2fADa8MEKqcaMue4qcf0FOo/gOkT8YwFGSZLKa6uumlBLg04QoCt0 foK2vfcci1q4H4ZbCu3uQESYGLQHY0f2ICDCwC3m25VF9a81TmlbC3MLum3vhmKe RtLDcXg= =xyaI -----END PGP SIGNATURE----- Merge tag 'v6.15-rc5' into x86/cpu, to resolve conflicts Conflicts: tools/arch/x86/include/asm/cpufeatures.h Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-05-06 10:00:58 +02:00
Yazen Ghannam	ab81310287	x86/CPU/AMD: Print the reason for the last reset The following register contains bits that indicate the cause for the previous reset. PMx000000C0 (FCH::PM::S5_RESET_STATUS) This is useful for debug. The reasons for reset are broken into 6 high level categories. Decode it by category and print during boot. Specifics within a category are split off into debugging documentation. The register is accessed indirectly through a "PM" port in the FCH. Use MMIO access in order to avoid restrictions with legacy port access. Use a late_initcall() to ensure that MMIO has been set up before trying to access the register. This register was introduced with AMD Family 17h, so avoid access on older families. There is no CPUID feature bit for this register. [ bp: Simplify the reason dumping loop. - merge a fix to not access an array element after the last one: https://lore.kernel.org/r/20250505133609.83933-1-superm1@kernel.org Reported-by: James Dutton <james.dutton@gmail.com> ] [ mingo: - Use consistent .rst formatting - Fix 'Sleep' class field to 'ACPI-State' - Standardize pin messages around the 'tripped' verbiage - Remove reference to ring-buffer printing & simplify the wording - Use curly braces for multi-line conditional statements ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Co-developed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250422234830.2840784-6-superm1@kernel.org	2025-05-05 15:51:24 +02:00
Kees Cook	960bc2bcba	x86/fpu: Restore fpu_thread_struct_whitelist() to fix CONFIG_HARDENED_USERCOPY=y crash Borislav Petkov reported the following boot crash on x86-32, with CONFIG_HARDENED_USERCOPY=y: \| usercopy: Kernel memory overwrite attempt detected to SLUB object 'task_struct' (offset 2112, size 160)! \| ... \| kernel BUG at mm/usercopy.c:102! So the useroffset and usersize arguments are what control the allowed window of copying in/out of the "task_struct" kmem cache: /* create a slab on which task_structs can be allocated / task_struct_whitelist(&useroffset, &usersize); task_struct_cachep = kmem_cache_create_usercopy("task_struct", arch_task_struct_size, align, SLAB_PANIC\|SLAB_ACCOUNT, useroffset, usersize, NULL); task_struct_whitelist() positions this window based on the location of the thread_struct within task_struct, and gets the arch-specific details via arch_thread_struct_whitelist(offset, size): static void __init task_struct_whitelist(unsigned long offset, unsigned long size) { / Fetch thread_struct whitelist for the architecture. / arch_thread_struct_whitelist(offset, size); / * Handle zero-sized whitelist or empty thread_struct, otherwise * adjust offset to position of thread_struct in task_struct. / if (unlikely(size == 0)) offset = 0; else offset += offsetof(struct task_struct, thread); } Commit `cb7ca40a38` ("x86/fpu: Make task_struct::thread constant size") removed the logic for the window, leaving: static inline void arch_thread_struct_whitelist(unsigned long offset, unsigned long size) { offset = 0; size = 0; } So now there is no window that usercopy hardening will allow to be copied in/out of task_struct. But as reported above, there is a copy in copy_uabi_to_xstate(). (It seems there are several, actually.) int copy_sigframe_from_user_to_xstate(struct task_struct tsk, const void __user ubuf) { return copy_uabi_to_xstate(x86_task_fpu(tsk)->fpstate, NULL, ubuf, &tsk->thread.pkru); } This appears to be writing into x86_task_fpu(tsk)->fpstate. With or without CONFIG_X86_DEBUG_FPU, this resolves to: ((struct fpu )((void )(task) + sizeof((task)))) i.e. the memory "after task_struct" is cast to "struct fpu", and the uses the "fpstate" pointer. How that pointer gets set looks to be variable, but I think the one we care about here is: fpu->fpstate = &fpu->__fpstate; And struct fpu::__fpstate says: struct fpstate __fpstate; / * WARNING: '__fpstate' is dynamically-sized. Do not put * anything after it here. / So we're still dealing with a dynamically sized thing, even if it's not within the literal struct task_struct -- it's still in the kmem cache, though. Looking at the kmem cache size, it has allocated "arch_task_struct_size" bytes, which is calculated in fpu__init_task_struct_size(): int task_size = sizeof(struct task_struct); task_size += sizeof(struct fpu); / * Subtract off the static size of the register state. * It potentially has a bunch of padding. / task_size -= sizeof(union fpregs_state); / * Add back the dynamically-calculated register state * size. / task_size += fpu_kernel_cfg.default_size; / * We dynamically size 'struct fpu', so we require that * 'state' be at the end of 'it: / CHECK_MEMBER_AT_END_OF(struct fpu, __fpstate); arch_task_struct_size = task_size; So, this is still copying out of the kmem cache for task_struct, and the window seems unchanged (still fpu regs). This is what the window was before: void fpu_thread_struct_whitelist(unsigned long offset, unsigned long size) { offset = offsetof(struct thread_struct, fpu.__fpstate.regs); *size = fpu_kernel_cfg.default_size; } And the same commit I mentioned above removed it. I think the misunderstanding is here: \| The fpu_thread_struct_whitelist() quirk to hardened usercopy can be removed, \| now that the FPU structure is not embedded in the task struct anymore, which \| reduces text footprint a bit. Yes, FPU is no longer in task_struct, but it IS in the kmem cache named "task_struct", since the fpstate is still being allocated there. Partially revert the earlier mentioned commit, along with a recalculation of the fpstate regs location. Fixes: `cb7ca40a38` ("x86/fpu: Make task_struct::thread constant size") Reported-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-hardening@vger.kernel.org Link: https://lore.kernel.org/all/20250409211127.3544993-1-mingo@kernel.org/ # Discussion #1 Link: https://lore.kernel.org/r/202505041418.F47130C4C8@keescook # Discussion #2	2025-05-05 13:24:32 +02:00
Borislav Petkov (AMD)	5214a9f6c0	x86/microcode: Consolidate the loader enablement checking Consolidate the whole logic which determines whether the microcode loader should be enabled or not into a single function and call it everywhere. Well, almost everywhere - not in mk_early_pgtbl_32() because there the kernel is running without paging enabled and checking dis_ucode_ldr et al would require physical addresses and uglification of the code. But since this is 32-bit, the easier thing to do is to simply map the initrd unconditionally especially since that mapping is getting removed later anyway by zap_early_initrd_mapping() and avoid the uglification. In doing so, address the issue of old 486er machines without CPUID support, not booting current kernels. [ mingo: Fix no previous prototype for ‘microcode_loader_disabled’ [-Wmissing-prototypes] ] Fixes: `4c585af718` ("x86/boot/32: Temporarily map initrd for microcode loading") Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/CANpbe9Wm3z8fy9HbgS8cuhoj0TREYEEkBipDuhgkWFvqX0UoVQ@mail.gmail.com	2025-05-05 10:51:00 +02:00
Ard Biesheuvel	419cbaf6a5	x86/boot: Add a bunch of PIC aliases Add aliases for all the data objects that the startup code references - this is needed so that this code can be moved into its own confined area where it can only access symbols that have a __pi_ prefix. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250504095230.2932860-39-ardb+git@google.com	2025-05-04 15:59:43 +02:00
Ard Biesheuvel	bd4a58beaa	x86/boot: Move early_setup_gdt() back into head64.c Move early_setup_gdt() out of the startup code that is callable from the 1:1 mapping - this is not needed, and instead, it is better to expose the helper that does reside in __head directly. This reduces the amount of code that needs special checks for 1:1 execution suitability. In particular, it avoids dealing with the GHCB page (and its physical address) in startup code, which runs from the 1:1 mapping, making physical to virtual translations ambiguous. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250504095230.2932860-26-ardb+git@google.com	2025-05-04 15:27:23 +02:00
Ingo Molnar	39ffd86dd7	Merge branch 'x86/urgent' into x86/boot, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-05-04 12:09:02 +02:00
Oleg Nesterov	016a2e6f8a	x86/fpu: Check TIF_NEED_FPU_LOAD instead of PF_KTHREAD\|PF_USER_WORKER in fpu__drop() PF_KTHREAD\|PF_USER_WORKER tasks should never clear TIF_NEED_FPU_LOAD, so the TIF_NEED_FPU_LOAD check should equally filter them out. And this way an exiting userspace task can avoid the unnecessary "fwait" if it does context_switch() at least once on its way to exit_thread(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Chang S . Bae <chang.seok.bae@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250503143856.GA9009@redhat.com	2025-05-04 10:29:25 +02:00
Oleg Nesterov	2d299e3d77	x86/fpu: Always use memcpy_and_pad() in arch_dup_task_struct() It makes no sense to copy the bytes after sizeof(struct task_struct), FPU state will be initialized in fpu_clone(). A plain memcpy(dst, src, sizeof(struct task_struct)) should work too, but "_and_pad" looks safer. [ mingo: Simplify it a bit more. ] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chang S . Bae <chang.seok.bae@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250503143850.GA8997@redhat.com	2025-05-04 10:29:25 +02:00
Oleg Nesterov	392bbe11c7	x86/fpu: Remove x86_init_fpu It is not actually used after: `55bc30f2e3` ("x86/fpu: Remove the thread::fpu pointer") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Chang S . Bae <chang.seok.bae@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250503143837.GA8985@redhat.com	2025-05-04 10:29:24 +02:00
Oleg Nesterov	730faa15a0	x86/fpu: Simplify the switch_fpu_prepare() + switch_fpu_finish() logic Now that switch_fpu_finish() doesn't load the FPU state, it makes more sense to fold it into switch_fpu_prepare() renamed to switch_fpu(), and more importantly, use the "prev_p" task as a target for TIF_NEED_FPU_LOAD. It doesn't make any sense to delay set_tsk_thread_flag(TIF_NEED_FPU_LOAD) until "prev_p" is scheduled again. There is no worry about the very first context switch, fpu_clone() must always set TIF_NEED_FPU_LOAD. Also, shift the test_tsk_thread_flag(TIF_NEED_FPU_LOAD) from the callers to switch_fpu(). Note that the "PF_KTHREAD \| PF_USER_WORKER" check can be removed but this deserves a separate patch which can change more functions, say, kernel_fpu_begin_mask(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Chang S . Bae <chang.seok.bae@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250503143830.GA8982@redhat.com	2025-05-04 10:29:24 +02:00
Ingo Molnar	a78701fe4b	Linux 6.15-rc4 -----BEGIN PGP SIGNATURE----- iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmgOrWseHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGFyIH/AhXcuA8y8rk43mo t+0GO7JR4dnr4DIl74GgDjCXlXiKCT7EXMfD/ABdofTxV4Pbyv+pUODlg1E6eO9U C1WWM5PPNBGDDEVSQ3Yu756nr0UoiFhvW0R6pVdou5cezCWAtIF9LTN8DEUgis0u EUJD9+/cHAMzfkZwabjm/HNsa1SXv2X47MzYv/PdHKr0htEPcNHF4gqBrBRdACGy FJtaCKhuPf6TcDNXOFi5IEWMXrugReRQmOvrXqVYGa7rfUFkZgsAzRY6n/rUN5Z9 FAgle4Vlv9ohVYj9bXX8b6wWgqiKRpoN+t0PpRd6G6ict1AFBobNGo8LH3tYIKqZ b/dCGNg= =xDGd -----END PGP SIGNATURE----- Merge tag 'v6.15-rc4' into x86/fpu, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-05-04 10:25:52 +02:00
Xin Li (Intel)	444b46a128	x86/msr: Replace wrmsr(msr, low, 0) with wrmsrq(msr, low) The third argument in wrmsr(msr, low, 0) is unnecessary. Instead, use wrmsrq(msr, low), which automatically sets the higher 32 bits of the MSR value to 0. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20250427092027.1598740-15-xin@zytor.com	2025-05-02 10:36:36 +02:00
Xin Li (Intel)	0c2678efed	x86/pvops/msr: Refactor pv_cpu_ops.write_msr{,_safe}() An MSR value is represented as a 64-bit unsigned integer, with existing MSR instructions storing it in EDX:EAX as two 32-bit segments. The new immediate form MSR instructions, however, utilize a 64-bit general-purpose register to store the MSR value. To unify the usage of all MSR instructions, let the default MSR access APIs accept an MSR value as a single 64-bit argument instead of two 32-bit segments. The dual 32-bit APIs are still available as convenient wrappers over the APIs that handle an MSR value as a single 64-bit argument. The following illustrates the updated derivation of the MSR write APIs: __wrmsrq(u32 msr, u64 val) / \ / \ native_wrmsrq(msr, val) native_wrmsr(msr, low, high) \| \| native_write_msr(msr, val) / \ / \ wrmsrq(msr, val) wrmsr(msr, low, high) When CONFIG_PARAVIRT is enabled, wrmsrq() and wrmsr() are defined on top of paravirt_write_msr(): paravirt_write_msr(u32 msr, u64 val) / \ / \ wrmsrq(msr, val) wrmsr(msr, low, high) paravirt_write_msr() invokes cpu.write_msr(msr, val), an indirect layer of pv_ops MSR write call: If on native: cpu.write_msr = native_write_msr If on Xen: cpu.write_msr = xen_write_msr Therefore, refactor pv_cpu_ops.write_msr{_safe}() to accept an MSR value in a single u64 argument, replacing the current dual u32 arguments. No functional change intended. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20250427092027.1598740-14-xin@zytor.com	2025-05-02 10:36:36 +02:00
Xin Li (Intel)	3204877d05	x86/msr: Convert __rdmsr() uses to native_rdmsrq() uses __rdmsr() is the lowest level MSR write API, with native_rdmsr() and native_rdmsrq() serving as higher-level wrappers around it. #define native_rdmsr(msr, val1, val2) \ do { \ u64 __val = __rdmsr((msr)); \ (void)((val1) = (u32)__val); \ (void)((val2) = (u32)(__val >> 32)); \ } while (0) static __always_inline u64 native_rdmsrq(u32 msr) { return __rdmsr(msr); } However, __rdmsr() continues to be utilized in various locations. MSR APIs are designed for different scenarios, such as native or pvops, with or without trace, and safe or non-safe. Unfortunately, the current MSR API names do not adequately reflect these factors, making it challenging to select the most appropriate API for various situations. To pave the way for improving MSR API names, convert __rdmsr() uses to native_rdmsrq() to ensure consistent usage. Later, these APIs can be renamed to better reflect their implications, such as native or pvops, with or without trace, and safe or non-safe. No functional change intended. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20250427092027.1598740-10-xin@zytor.com	2025-05-02 10:36:35 +02:00
Xin Li (Intel)	519be7da37	x86/msr: Convert __wrmsr() uses to native_wrmsr{,q}() uses __wrmsr() is the lowest level MSR write API, with native_wrmsr() and native_wrmsrq() serving as higher-level wrappers around it: #define native_wrmsr(msr, low, high) \ __wrmsr(msr, low, high) #define native_wrmsrl(msr, val) \ __wrmsr((msr), (u32)((u64)(val)), \ (u32)((u64)(val) >> 32)) However, __wrmsr() continues to be utilized in various locations. MSR APIs are designed for different scenarios, such as native or pvops, with or without trace, and safe or non-safe. Unfortunately, the current MSR API names do not adequately reflect these factors, making it challenging to select the most appropriate API for various situations. To pave the way for improving MSR API names, convert __wrmsr() uses to native_wrmsr{,q}() to ensure consistent usage. Later, these APIs can be renamed to better reflect their implications, such as native or pvops, with or without trace, and safe or non-safe. No functional change intended. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20250427092027.1598740-8-xin@zytor.com	2025-05-02 10:27:49 +02:00
Xin Li (Intel)	795ada5287	x86/msr: Convert the rdpmc() macro to an __always_inline function Functions offer type safety and better readability compared to macros. Additionally, always inline functions can match the performance of macros. Converting the rdpmc() macro into an always inline function is simple and straightforward, so just make the change. Moreover, the read result is now the returned value, further enhancing readability. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20250427092027.1598740-6-xin@zytor.com	2025-05-02 10:26:56 +02:00
Xin Li (Intel)	7d9ccde56b	x86/msr: Rename rdpmcl() to rdpmc() Now that rdpmc() is gone, rdpmcl() is the sole PMC read helper, simply rename rdpmcl() to rdpmc(). Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20250427092027.1598740-5-xin@zytor.com	2025-05-02 10:25:24 +02:00
Xin Li (Intel)	efef7f184f	x86/msr: Add explicit includes of <asm/msr.h> For historic reasons there are some TSC-related functions in the <asm/msr.h> header, even though there's an <asm/tsc.h> header. To facilitate the relocation of rdtsc{,_ordered}() from <asm/msr.h> to <asm/tsc.h> and to eventually eliminate the inclusion of <asm/msr.h> in <asm/tsc.h>, add an explicit <asm/msr.h> dependency to the source files that reference definitions from <asm/msr.h>. [ mingo: Clarified the changelog. ] Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20250501054241.1245648-1-xin@zytor.com	2025-05-02 10:23:47 +02:00
Ingo Molnar	c9d8ea9d53	x86/msr: Rename DECLARE_ARGS() to EAX_EDX_DECLARE_ARGS DECLARE_ARGS() is way too generic of a name that says very little about why these args are declared in that fashion - use the EAX_EDX_ prefix to create a common prefix between the three helper methods: EAX_EDX_DECLARE_ARGS() EAX_EDX_VAL() EAX_EDX_RET() Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: linux-kernel@vger.kernel.org	2025-05-02 10:11:17 +02:00
Ingo Molnar	0c7b20b852	Linux 6.15-rc4 -----BEGIN PGP SIGNATURE----- iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmgOrWseHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGFyIH/AhXcuA8y8rk43mo t+0GO7JR4dnr4DIl74GgDjCXlXiKCT7EXMfD/ABdofTxV4Pbyv+pUODlg1E6eO9U C1WWM5PPNBGDDEVSQ3Yu756nr0UoiFhvW0R6pVdou5cezCWAtIF9LTN8DEUgis0u EUJD9+/cHAMzfkZwabjm/HNsa1SXv2X47MzYv/PdHKr0htEPcNHF4gqBrBRdACGy FJtaCKhuPf6TcDNXOFi5IEWMXrugReRQmOvrXqVYGa7rfUFkZgsAzRY6n/rUN5Z9 FAgle4Vlv9ohVYj9bXX8b6wWgqiKRpoN+t0PpRd6G6ict1AFBobNGo8LH3tYIKqZ b/dCGNg= =xDGd -----END PGP SIGNATURE----- Merge tag 'v6.15-rc4' into x86/msr, to pick up fixes and resolve conflicts Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-05-02 09:43:44 +02:00
Ruben Wauters	003f144ca0	x86/CPU/AMD: Replace strcpy() with strscpy() strcpy() is deprecated due to issues with bounds checking and overflows. Replace it with strscpy(). Signed-off-by: Ruben Wauters <rubenru09@aol.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250429230710.54014-1-rubenru09@aol.com	2025-04-30 19:32:36 +02:00
Annie Li	b43dc4ab09	x86/microcode/AMD: Do not return error when microcode update is not necessary After 6f059e634dcd("x86/microcode: Clarify the late load logic"), if the load is up-to-date, the AMD side returns UCODE_OK which leads to load_late_locked() returning -EBADFD. Handle UCODE_OK in the switch case to avoid this error. [ bp: Massage commit message. ] Fixes: `6f059e634d` ("x86/microcode: Clarify the late load logic") Signed-off-by: Annie Li <jiayanli@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250430053424.77438-1-jiayanli@google.com	2025-04-30 17:10:46 +02:00
David Kaplan	1f4bb068b4	x86/bugs: Restructure SRSO mitigation Restructure SRSO to use select/update/apply functions to create consistent vulnerability handling. Like with retbleed, the command line options directly select mitigations which can later be modified. While at it, remove a comment which doesn't apply anymore due to the changed mitigation detection flow. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-17-david.kaplan@amd.com	2025-04-30 10:25:45 +02:00
David Kaplan	d43ba2dc8e	x86/bugs: Restructure L1TF mitigation Restructure L1TF to use select/apply functions to create consistent vulnerability handling. Define new AUTO mitigation for L1TF. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-16-david.kaplan@amd.com	2025-04-29 18:57:30 +02:00
David Kaplan	5ece59a2fc	x86/bugs: Restructure SSB mitigation Restructure SSB to use select/apply functions to create consistent vulnerability handling. Remove __ssb_select_mitigation() and split the functionality between the select/apply functions. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-15-david.kaplan@amd.com	2025-04-29 18:57:26 +02:00
David Kaplan	480e803dac	x86/bugs: Restructure spectre_v2 mitigation Restructure spectre_v2 to use select/update/apply functions to create consistent vulnerability handling. The spectre_v2 mitigation may be updated based on the selected retbleed mitigation. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-14-david.kaplan@amd.com	2025-04-29 18:53:35 +02:00
David Kaplan	efe313827c	x86/bugs: Restructure BHI mitigation Restructure BHI mitigation to use select/update/apply functions to create consistent vulnerability handling. BHI mitigation was previously selected from within spectre_v2_select_mitigation() and now is selected from cpu_select_mitigation() like with all others. Define new AUTO mitigation for BHI. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-13-david.kaplan@amd.com	2025-04-29 18:51:29 +02:00
David Kaplan	ddfca9430a	x86/bugs: Restructure spectre_v2_user mitigation Restructure spectre_v2_user to use select/update/apply functions to create consistent vulnerability handling. The IBPB/STIBP choices are first decided based on the spectre_v2_user command line but can be modified by the spectre_v2 command line option as well. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-12-david.kaplan@amd.com	2025-04-29 18:51:21 +02:00
David Kaplan	e3b78a7ad5	x86/bugs: Restructure retbleed mitigation Restructure retbleed mitigation to use select/update/apply functions to create consistent vulnerability handling. The retbleed_update_mitigation() simplifies the dependency between spectre_v2 and retbleed. The command line options now directly select a preferred mitigation which simplifies the logic. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-11-david.kaplan@amd.com	2025-04-29 10:22:08 +02:00
Eric Biggers	e59236b5a0	x86/sgx: Use SHA-256 library API instead of crypto_shash API This user of SHA-256 does not support any other algorithm, so the crypto_shash abstraction provides no value. Just use the SHA-256 library API instead, which is much simpler and easier to use. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20250428183838.799333-1-ebiggers%40kernel.org	2025-04-28 12:39:33 -07:00
Eric Biggers	c0a62eadb6	x86/microcode/AMD: Use sha256() instead of init/update/final Just call sha256() instead of doing the init/update/final sequence. No functional changes. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250428183006.782501-1-ebiggers@kernel.org	2025-04-28 21:03:58 +02:00
David Kaplan	83d4b19331	x86/bugs: Allow retbleed=stuff only on Intel The retbleed=stuff mitigation is only applicable for Intel CPUs affected by retbleed. If this option is selected for another vendor, print a warning and fall back to the AUTO option. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-10-david.kaplan@amd.com	2025-04-28 19:55:50 +02:00
David Kaplan	46d5925b8e	x86/bugs: Restructure spectre_v1 mitigation Restructure spectre_v1 to use select/apply functions to create consistent vulnerability handling. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-9-david.kaplan@amd.com	2025-04-28 19:40:10 +02:00
David Kaplan	9dcad2fb31	x86/bugs: Restructure GDS mitigation Restructure GDS mitigation to use select/apply functions to create consistent vulnerability handling. Define new AUTO mitigation for GDS. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-8-david.kaplan@amd.com	2025-04-28 15:19:30 +02:00
David Kaplan	2178ac58e1	x86/bugs: Restructure SRBDS mitigation Restructure SRBDS to use select/apply functions to create consistent vulnerability handling. Define new AUTO mitigation for SRBDS. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-7-david.kaplan@amd.com	2025-04-28 15:05:41 +02:00
David Kaplan	6f0960a760	x86/bugs: Remove md_clear_*_mitigation() The functionality in md_clear_update_mitigation() and md_clear_select_mitigation() is now integrated into the select/update functions for the MDS, TAA, MMIO, and RFDS vulnerabilities. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-6-david.kaplan@amd.com	2025-04-28 14:50:33 +02:00
David Kaplan	203d81f8e1	x86/bugs: Restructure RFDS mitigation Restructure RFDS mitigation to use select/update/apply functions to create consistent vulnerability handling. [ bp: Rename the oneline helper to what it checks. ] Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-5-david.kaplan@amd.com	2025-04-28 13:46:11 +02:00
David Kaplan	4a5a04e61d	x86/bugs: Restructure MMIO mitigation Restructure MMIO mitigation to use select/update/apply functions to create consistent vulnerability handling. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-4-david.kaplan@amd.com	2025-04-28 13:22:24 +02:00
David Kaplan	bdd7fce7a8	x86/bugs: Restructure TAA mitigation Restructure TAA mitigation to use select/update/apply functions to create consistent vulnerability handling. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-3-david.kaplan@amd.com	2025-04-28 13:02:04 +02:00
David Kaplan	559c758bc7	x86/bugs: Restructure MDS mitigation Restructure MDS mitigation selection to use select/update/apply functions to create consistent vulnerability handling. [ bp: rename and beef up comment over VERW mitigation selected var for maximum clarity. ] Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-2-david.kaplan@amd.com	2025-04-28 12:53:33 +02:00
Peter Zijlstra	1caafd919e	Merge branch 'perf/urgent' Merge urgent fixes for dependencies. Signed-off-by: Peter Zijlstra <peterz@infradead.org>	2025-04-25 14:55:20 +02:00
Sean Christopherson	edaf3eded3	x86/irq: KVM: Add helper for harvesting PIR to deduplicate KVM and posted MSIs Now that posted MSI and KVM harvesting of PIR is identical, extract the code (and posted MSI's wonderful comment) to a common helper. No functional change intended. Link: https://lore.kernel.org/r/20250401163447.846608-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-04-24 11:19:41 -07:00
Sean Christopherson	f1459315f4	x86/irq: KVM: Track PIR bitmap as an "unsigned long" array Track the PIR bitmap in posted interrupt descriptor structures as an array of unsigned longs instead of using unionized arrays for KVM (u32s) versus IRQ management (u64s). In practice, because the non-KVM usage is (sanely) restricted to 64-bit kernels, all existing usage of the u64 variant is already working with unsigned longs. Using "unsigned long" for the array will allow reworking KVM's processing of the bitmap to read/write in 64-bit chunks on 64-bit kernels, i.e. will allow optimizing KVM by reducing the number of atomic accesses to PIR. Opportunstically replace the open coded literals in the posted MSIs code with the appropriate macro. Deliberately don't use ARRAY_SIZE() in the for-loops, even though it would be cleaner from a certain perspective, in anticipation of decoupling the processing from the array declaration. No functional change intended. Link: https://lore.kernel.org/r/20250401163447.846608-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-04-24 11:19:38 -07:00
Sean Christopherson	3cdb826150	x86/irq: Track if IRQ was found in PIR during initial loop (to load PIR vals) Track whether or not at least one IRQ was found in PIR during the initial loop to load PIR chunks from memory. Doing so generates slightly better code (arguably) for processing the for-loop of XCHGs, especially for the case where there are no pending IRQs. Note, while PIR can be modified between the initial load and the XCHG, it can only _gain_ new IRQs, i.e. there is no danger of a false positive due to the final version of pir_copy[] being empty. Opportunistically convert the boolean to an "unsigned long" and compute the effective boolean result via bitwise-OR. Some compilers, e.g. clang-14, need the extra "hint" to elide conditional branches. Opportunistically rename the variable in anticipation of moving the PIR accesses to a common helper that can be shared by posted MSIs and KVM. Old: <+74>: test %rdx,%rdx <+77>: je 0xffffffff812bbeb0 <handle_pending_pir+144> <pir[0]> <+88>: mov $0x1,%dl> <+90>: test %rsi,%rsi <+93>: je 0xffffffff812bbe8c <handle_pending_pir+108> <pir[1]> <+106>: mov $0x1,%dl <+108>: test %rcx,%rcx <+111>: je 0xffffffff812bbe9e <handle_pending_pir+126> <pir[2]> <+124>: mov $0x1,%dl <+126>: test %rax,%rax <+129>: je 0xffffffff812bbeb9 <handle_pending_pir+153> <pir[3]> <+142>: jmp 0xffffffff812bbec1 <handle_pending_pir+161> <+144>: xor %edx,%edx <+146>: test %rsi,%rsi <+149>: jne 0xffffffff812bbe7f <handle_pending_pir+95> <+151>: jmp 0xffffffff812bbe8c <handle_pending_pir+108> <+153>: test %dl,%dl <+155>: je 0xffffffff812bbf8e <handle_pending_pir+366> New: <+74>: mov %rax,%r8 <+77>: or %rcx,%r8 <+80>: or %rdx,%r8 <+83>: or %rsi,%r8 <+86>: setne %bl <+89>: je 0xffffffff812bbf88 <handle_pending_pir+360> <+95>: test %rsi,%rsi <+98>: je 0xffffffff812bbe8d <handle_pending_pir+109> <pir[0]> <+109>: test %rdx,%rdx <+112>: je 0xffffffff812bbe9d <handle_pending_pir+125> <pir[1]> <+125>: test %rcx,%rcx <+128>: je 0xffffffff812bbead <handle_pending_pir+141> <pir[2]> <+141>: test %rax,%rax <+144>: je 0xffffffff812bbebd <handle_pending_pir+157> <pir[3]> Link: https://lore.kernel.org/r/20250401163447.846608-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-04-24 11:19:37 -07:00
Sean Christopherson	600e960604	x86/irq: Ensure initial PIR loads are performed exactly once Ensure the PIR is read exactly once at the start of handle_pending_pir(), to guarantee that checking for an outstanding posted interrupt in a given chuck doesn't reload the chunk from the "real" PIR. Functionally, a reload is benign, but it would defeat the purpose of pre-loading into a copy. Fixes: `1b03d82ba1` ("x86/irq: Install posted MSI notification handler") Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20250401163447.846608-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-04-24 11:19:37 -07:00
Juergen Gross	4ce385f564	x86/mm: Fix _pgd_alloc() for Xen PV mode Recently _pgd_alloc() was switched from using __get_free_pages() to pagetable_alloc_noprof(), which might return a compound page in case the allocation order is larger than 0. On x86 this will be the case if CONFIG_MITIGATION_PAGE_TABLE_ISOLATION is set, even if PTI has been disabled at runtime. When running as a Xen PV guest (this will always disable PTI), using a compound page for a PGD will result in VM_BUG_ON_PGFLAGS being triggered when the Xen code tries to pin the PGD. Fix the Xen issue together with the not needed 8k allocation for a PGD with PTI disabled by replacing PGD_ALLOCATION_ORDER with an inline helper returning the needed order for PGD allocations. Fixes: `a9b3c355c2` ("asm-generic: pgalloc: provide generic __pgd_{alloc,free}") Reported-by: Petr Vaněk <arkamar@atlas.cz> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Petr Vaněk <arkamar@atlas.cz> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20250422131717.25724-1-jgross%40suse.com	2025-04-23 07:49:14 -07:00
Ingo Molnar	a1b582a3ff	Merge branch 'x86/urgent' into x86/boot, to merge dependent commit and upstream fixes In particular we need this fix before applying subsequent changes: `d54d610243` ("x86/boot/sev: Avoid shared GHCB page for early memory acceptance") Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-04-22 09:09:21 +02:00
Dave Hansen	4e2c719782	x86/cpu: Help users notice when running old Intel microcode Old microcode is bad for users and for kernel developers. For users, it exposes them to known fixed security and/or functional issues. These obviously rarely result in instant dumpster fires in every environment. But it is as important to keep your microcode up to date as it is to keep your kernel up to date. Old microcode also makes kernels harder to debug. A developer looking at an oops need to consider kernel bugs, known CPU issues and unknown CPU issues as possible causes. If they know the microcode is up to date, they can mostly eliminate known CPU issues as the cause. Make it easier to tell if CPU microcode is out of date. Add a list of released microcode. If the loaded microcode is older than the release, tell users in a place that folks can find it: /sys/devices/system/cpu/vulnerabilities/old_microcode Tell kernel kernel developers about it with the existing taint flag: TAINT_CPU_OUT_OF_SPEC == Discussion == When a user reports a potential kernel issue, it is very common to ask them to reproduce the issue on mainline. Running mainline, they will (independently from the distro) acquire a more up-to-date microcode version list. If their microcode is old, they will get a warning about the taint and kernel developers can take that into consideration when debugging. Just like any other entry in "vulnerabilities/", users are free to make their own assessment of their exposure. == Microcode Revision Discussion == The microcode versions in the table were generated from the Intel microcode git repo: 8ac9378a8487 ("microcode-20241112 Release") which as of this writing lags behind the latest microcode-20250211. It can be argued that the versions that the kernel picks to call "old" should be a revision or two old. Which specific version is picked is less important to me than picking a version and enforcing it. This repository contains only microcode versions that Intel has deemed to be OS-loadable. It is quite possible that the BIOS has loaded a newer microcode than the latest in this repo. If this happens, the system is considered to have new microcode, not old. Specifically, the sysfs file and taint flag answer the question: Is the CPU running on the latest OS-loadable microcode, or something even later that the BIOS loaded? In other words, Intel never publishes an authoritative list of CPUs and latest microcode revisions. Until it does, this is the best that Linux can do. Also note that the "intel-ucode-defs.h" file is simple, ugly and has lots of magic numbers. That's on purpose and should allow a single file to be shared across lots of stable kernel regardless of if they have the new "VFM" infrastructure or not. It was generated with a dumb script. == FAQ == Q: Does this tell me if my system is secure or insecure? A: No. It only tells you if your microcode was old when the system booted. Q: Should the kernel warn if the microcode list itself is too old? A: No. New kernels will get new microcode lists, both mainline and stable. The only way to have an old list is to be running an old kernel in which case you have bigger problems. Q: Is this for security or functional issues? A: Both. Q: If a given microcode update only has functional problems but no security issues, will it be considered old? A: Yes. All microcode image versions within a microcode release are treated identically. Intel appears to make security updates without disclosing them in the release notes. Thus, all updates are considered to be security-relevant. Q: Who runs old microcode? A: Anybody with an old distro. This happens all the time inside of Intel where there are lots of weird systems in labs that might not be getting regular distro updates and might also be running rather exotic microcode images. Q: If I update my microcode after booting will it stop saying "Vulnerable"? A: No. Just like all the other vulnerabilies, you need to reboot before the kernel will reassess your vulnerability. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "Ahmed S. Darwish" <darwi@linutronix.de> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/all/20250421195659.CF426C07%40davehans-spike.ostc.intel.com (cherry picked from commit 9127865b15eb0a1bd05ad7efe29489c44394bdc1)	2025-04-22 08:33:52 +02:00
Ingo Molnar	c96f564e6f	Merge branch 'x86/cpu' into x86/microcode, to pick up dependent commits Avoid a conflict in <asm/cpufeatures.h> by merging pending x86/cpu changes. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-04-22 08:31:41 +02:00
Mike Rapoport (Microsoft)	83b2d345e1	x86/e820: Discard high memory that can't be addressed by 32-bit systems Dave Hansen reports the following crash on a 32-bit system with CONFIG_HIGHMEM=y and CONFIG_X86_PAE=y: > 0xf75fe000 is the mem_map[] entry for the first page >4GB. It > obviously wasn't allocated, thus the oops. BUG: unable to handle page fault for address: f75fe000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page pdpt = 0000000002da2001 pde = 000000000300c067 *pte = 0000000000000000 Oops: Oops: 0002 [#1] SMP NOPTI CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc1-00288-ge618ee89561b-dirty #311 PREEMPT(undef) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 EIP: __free_pages_core+0x3c/0x74 ... Call Trace: memblock_free_pages+0x11/0x2c memblock_free_all+0x2ce/0x3a0 mm_core_init+0xf5/0x320 start_kernel+0x296/0x79c i386_start_kernel+0xad/0xb0 startup_32_smp+0x151/0x154 The mem_map[] is allocated up to the end of ZONE_HIGHMEM which is defined by max_pfn. The bug was introduced by this recent commit: `6faea3422e` ("arch, mm: streamline HIGHMEM freeing") Previously, freeing of high memory was also clamped to the end of ZONE_HIGHMEM but after this change, memblock_free_all() tries to free memory above the of ZONE_HIGHMEM as well and that causes access to mem_map[] entries beyond the end of the memory map. To fix this, discard the memory after max_pfn from memblock on 32-bit systems so that core MM would be aware only of actually usable memory. Fixes: `6faea3422e` ("arch, mm: streamline HIGHMEM freeing") Reported-by: Dave Hansen <dave.hansen@intel.com> Tested-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Shevchenko <andy@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Davide Ciminaghi <ciminaghi@gnudd.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: kvm@vger.kernel.org Link: https://lore.kernel.org/r/20250413080858.743221-1-rppt@kernel.org # discussion and submission	2025-04-19 16:48:18 +02:00
Linus Torvalds	3088d26962	Miscellaneous x86 fixes: - Fix hypercall detection on Xen guests - Extend the AMD microcode loader SHA check to Zen5, to block loading of any unreleased standalone Zen5 microcode patches - Add new Intel CPU model number for Bartlett Lake - Fix the workaround for AMD erratum 1054 - Fix buggy early memory acceptance between SEV-SNP guests and the EFI stub Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmgCuUsRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1jnuRAAqj+XgVPE25Yw0qyFjWxJwHg3duPPMBql BTs6766d8DJTZI2x+zGmiJzxUynKyMItQnkhS7w2n/T11YKw/wW2nMOYNqBgFJwS A7au/4PSp0LJARO5I3GxEVUFDQwawsf8ly+OeOZacKhtycmxL3Pr9pMHU98GWb5C TvqKQlohp/YI3SKpLxSE/LCJJZr/8o7uOt3i95Rx8fH9zEp4Bat5OPpFpPZpefDS kWnQKFq+iSwDldPMg00/SdpFDZVqhHItodeMqJdz6MMa7+sPB5dyjYNGuT9Dvf03 zMv3mjWFDTPjezvuQH+kTxhOfgFtV8VI+b35c2JqTyqkvSzrcrOV1W2EJausSt4H D//UXzDaAcJJaq/YWBuX+DaajyRdVl6i8trtgKMM0BWRPa7wTBFiJU7Lvt73gW/s 8/c5+V0iI0tkySkqoCZJKVwVVxHDxf9z5CQomEwupf7SrI+O8gjjwi0F8NwV0Zeo kP8InCOHVWFHKqf5G4lVsF7qqLgCSJFkKeyVXR8ZHtrcqEMDoF4eZDfky36K5d8f OMMWF/LAh3Fa2CyQDdwZkqtDi2D+3+99Bbw3zixOZpElPB90jtRrxbu2tfVOE8nC RCjdqLMYq7EPlrCzPiq85PbQjYLA8gDJw9WUZkeb3KJzFdv3zyV9VHc8eMF83JNq gRnKlwXAXPU= =cqkj -----END PGP SIGNATURE----- Merge tag 'x86-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix hypercall detection on Xen guests - Extend the AMD microcode loader SHA check to Zen5, to block loading of any unreleased standalone Zen5 microcode patches - Add new Intel CPU model number for Bartlett Lake - Fix the workaround for AMD erratum 1054 - Fix buggy early memory acceptance between SEV-SNP guests and the EFI stub * tag 'x86-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/sev: Avoid shared GHCB page for early memory acceptance x86/cpu/amd: Fix workaround for erratum 1054 x86/cpu: Add CPU model number for Bartlett Lake CPUs with Raptor Cove cores x86/microcode/AMD: Extend the SHA check to Zen5, block loading of any unreleased standalone Zen5 microcode patches x86/xen: Fix __xen_hypercall_setfunc()	2025-04-18 14:04:57 -07:00
Sandipan Das	263e55949d	x86/cpu/amd: Fix workaround for erratum 1054 Erratum 1054 affects AMD Zen processors that are a part of Family 17h Models 00-2Fh and the workaround is to not set HWCR[IRPerfEn]. However, when X86_FEATURE_ZEN1 was introduced, the condition to detect unaffected processors was incorrectly changed in a way that the IRPerfEn bit gets set only for unaffected Zen 1 processors. Ensure that HWCR[IRPerfEn] is set for all unaffected processors. This includes a subset of Zen 1 (Family 17h Models 30h and above) and all later processors. Also clear X86_FEATURE_IRPERF on affected processors so that the IRPerfCount register is not used by other entities like the MSR PMU driver. Fixes: `232afb5578` ("x86/CPU/AMD: Add X86_FEATURE_ZEN1") Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/caa057a9d6f8ad579e2f1abaa71efbd5bd4eaf6d.1744956467.git.sandipan.das@amd.com	2025-04-18 14:29:47 +02:00
Uros Bizjak	3ce4b1f1f2	x86/asm: Rename rep_nop() to native_pause() Rename rep_nop() function to what it really does. No functional change intended. Suggested-by: David Laight <david.laight.linux@gmail.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lore.kernel.org/r/20250418080805.83679-1-ubizjak@gmail.com	2025-04-18 10:19:26 +02:00
Uros Bizjak	d109ff4f0b	x86/asm: Replace "REP; NOP" with PAUSE mnemonic Current minimum required version of binutils is 2.25, which supports PAUSE instruction mnemonic. Replace "REP; NOP" with this proper mnemonic. No functional change intended. Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lore.kernel.org/r/20250418080805.83679-2-ubizjak@gmail.com	2025-04-18 10:19:25 +02:00
Uros Bizjak	42c782fae3	x86/asm: Remove semicolon from "rep" prefixes Minimum version of binutils required to compile the kernel is 2.25. This version correctly handles the "rep" prefixes, so it is possible to remove the semicolon, which was used to support ancient versions of GNU as. Due to the semicolon, the compiler considers "rep; insn" (or its alternate "rep\n\tinsn" form) as two separate instructions. Removing the semicolon makes asm length calculations more accurate, consequently making scheduling and inlining decisions of the compiler more accurate. Removing the semicolon also enables assembler checks involving "rep" prefixes. Trying to assemble e.g. "rep addl %eax, %ebx" results in: Error: invalid instruction `add' after `rep' Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20250418071437.4144391-2-ubizjak@gmail.com	2025-04-18 09:33:33 +02:00
Jiri Olsa	610f6e14c2	uprobes/x86: Add support to emulate NOP instructions Add support to emulate all NOP instructions as the original uprobe instruction. This change speeds up uprobe on top of all NOP instructions and is a preparation for usdt probe optimization, that will be done on top of NOP5 instructions. With this change the usdt probe on top of NOP5s won't take the performance hit compared to usdt probe on top of standard NOP instructions. Suggested-by: Oleg Nesterov <oleg@redhat.com> Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Hao Luo <haoluo@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20250414083647.1234007-1-jolsa@kernel.org	2025-04-18 09:03:05 +02:00
Pawan Gupta	d9b79111fd	x86/bugs: Rename mmio_stale_data_clear to cpu_buf_vm_clear The static key mmio_stale_data_clear controls the KVM-only mitigation for MMIO Stale Data vulnerability. Rename it to reflect its purpose. No functional change. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250416-mmio-rename-v2-1-ad1f5488767c@linux.intel.com	2025-04-16 19:40:01 +02:00
Chang S. Bae	de8304c319	x86/fpu: Rename fpu_reset_fpregs() to fpu_reset_fpstate_regs() The original function name came from an overly compressed form of 'fpstate_regs' by commit: `e61d6310a0` ("x86/fpu: Reset permission and fpstate on exec()") However, the term 'fpregs' typically refers to physical FPU registers. In contrast, this function copies the init values to fpu->fpstate->regs, not hardware registers. Rename the function to better reflect what it actually does. No functional change. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250416021720.12305-11-chang.seok.bae@intel.com	2025-04-16 10:01:03 +02:00
Chang S. Bae	70fe4a0266	x86/fpu: Remove export of mxcsr_feature_mask The variable was previously referenced in KVM code but the last usage was removed by: `ea4d6938d4` ("x86/fpu: Replace KVMs home brewed FPU copy from user") Remove its export symbol. Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250416021720.12305-10-chang.seok.bae@intel.com	2025-04-16 10:01:03 +02:00
Chang S. Bae	d1e420772c	x86/pkeys: Simplify PKRU update in signal frame The signal delivery logic was modified to always set the PKRU bit in xregs_state->header->xfeatures by this commit: `ae6012d72f` ("x86/pkeys: Ensure updated PKRU value is XRSTOR'd") However, the change derives the bitmask value using XGETBV(1), rather than simply updating the buffer that already holds the value. Thus, this approach induces an unnecessary dependency on XGETBV1 for PKRU handling. Eliminate the dependency by using the established helper function. Subsequently, remove the now-unused 'mask' argument. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20250416021720.12305-9-chang.seok.bae@intel.com	2025-04-16 10:01:03 +02:00
Chang S. Bae	64e54461ab	x86/fpu: Refactor xfeature bitmask update code for sigframe XSAVE Currently, saving register states in the signal frame, the legacy feature bits are always set in xregs_state->header->xfeatures. This code sequence can be generalized for reuse in similar cases. Refactor the logic to ensure a consistent approach across similar usages. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250416021720.12305-8-chang.seok.bae@intel.com	2025-04-16 10:01:00 +02:00
Chang S. Bae	39cd7fad39	x86/fpu: Log XSAVE disablement consistently Not all paths that lead to fpu__init_disable_system_xstate() currently emit a message indicating that XSAVE has been disabled. Move the print statement into the function to ensure the message in all cases. Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250416021720.12305-7-chang.seok.bae@intel.com	2025-04-16 09:44:15 +02:00
Chang S. Bae	50c5b071e2	x86/fpu/apx: Enable APX state support With securing APX against conflicting MPX, it is now ready to be enabled. Include APX in the enabled xfeature set. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250416021720.12305-5-chang.seok.bae@intel.com	2025-04-16 09:44:14 +02:00
Chang S. Bae	ea68e39190	x86/fpu/apx: Disallow conflicting MPX presence XSTATE components are architecturally independent. There is no rule requiring their offsets in the non-compacted format to be strictly ascending or mutually non-overlapping. However, in practice, such overlaps have not occurred -- until now. APX is introduced as xstate component 19, following AMX. In the non-compacted XSAVE format, its offset overlaps with the space previously occupied by the now-deprecated MPX feature: `45fc24e89b` ("x86/mpx: remove MPX from arch/x86") To prevent conflicts, the kernel must ensure the CPU never expose both features at the same time. If so, it indicates unreliable hardware. In such cases, XSAVE should be disabled entirely as a precautionary measure. Add a sanity check to detect this condition and disable XSAVE if an invalid hardware configuration is identified. Note: MPX state components remain enabled on legacy systems solely for KVM guest support. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250416021720.12305-4-chang.seok.bae@intel.com	2025-04-16 09:44:14 +02:00
Chang S. Bae	bd0b10b795	x86/fpu/apx: Define APX state component Advanced Performance Extensions (APX) is associated with a new state component number 19. To support saving and restoring of the corresponding registers via the XSAVE mechanism, introduce the component definition along with the necessary sanity checks. Define the new component number, state name, and those register data type. Then, extend the size checker to validate the register data type and explicitly list the APX feature flag as a dependency for the new component in xsave_cpuid_features[]. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250416021720.12305-3-chang.seok.bae@intel.com	2025-04-16 09:44:14 +02:00
Chang S. Bae	b02dc185ee	x86/cpufeatures: Add X86_FEATURE_APX Intel Advanced Performance Extensions (APX) introduce a new set of general-purpose registers, managed as an extended state component via the xstate management facility. Before enabling this new xstate, define a feature flag to clarify the dependency in xsave_cpuid_features[]. APX is enumerated under CPUID level 7 with EDX=1. Since this CPUID leaf is not yet allocated, place the flag in a scattered feature word. While this feature is intended only for userspace, exposing it via /proc/cpuinfo is unnecessary. Instead, the existing arch_prctl(2) mechanism with the ARCH_GET_XCOMP_SUPP option can be used to query the feature availability. Finally, clarify that APX depends on XSAVE. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250416021720.12305-2-chang.seok.bae@intel.com	2025-04-16 09:44:13 +02:00
Ingo Molnar	4e2547509f	Merge branch 'x86/cpu' into x86/fpu, to pick up dependent commits Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-04-16 09:35:49 +02:00
Ingo Molnar	06e09002bc	Merge branch 'linus' into x86/cpu, to resolve conflicts Conflicts: tools/arch/x86/include/asm/cpufeatures.h Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-04-16 07:03:58 +02:00
Xin Li (Intel)	3aba0b40ca	x86/cpufeatures: Shorten X86_FEATURE_AMD_HETEROGENEOUS_CORES Shorten X86_FEATURE_AMD_HETEROGENEOUS_CORES to X86_FEATURE_AMD_HTR_CORES to make the last column aligned consistently in the whole file. No functional changes. Suggested-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250415175410.2944032-4-xin@zytor.com	2025-04-15 22:09:20 +02:00
Xin Li (Intel)	13327fada7	x86/cpufeatures: Shorten X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT Shorten X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT to X86_FEATURE_CLEAR_BHB_VMEXIT to make the last column aligned consistently in the whole file. There's no need to explain in the name what the mitigation does. No functional changes. Suggested-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250415175410.2944032-3-xin@zytor.com	2025-04-15 22:09:16 +02:00
Borislav Petkov (AMD)	dd86a1d013	x86/bugs: Remove X86_BUG_MMIO_UNKNOWN Whack this thing because: - the "unknown" handling is done only for this vuln and not for the others - it doesn't do anything besides reporting things differently. It doesn't apply any mitigations - it is simply causing unnecessary complications to the code which don't bring anything besides maintenance overhead to what is already a very nasty spaghetti pile - all the currently unaffected CPUs can also be in "unknown" status so there's no need for special handling here so get rid of it. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: David Kaplan <david.kaplan@amd.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Link: https://lore.kernel.org/r/20250414150951.5345-1-bp@kernel.org	2025-04-14 17:15:27 +02:00
Ingo Molnar	0a35c9280a	x86/platform/amd: Move the <asm/amd_node.h> header to <asm/amd/node.h> Collect AMD specific platform header files in <asm/amd/*.h>. Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mario Limonciello <superm1@kernel.org> Link: https://lore.kernel.org/r/20250413084144.3746608-7-mingo@kernel.org	2025-04-14 09:34:17 +02:00
Ingo Molnar	bcbb655595	x86/platform/amd: Move the <asm/amd_nb.h> header to <asm/amd/nb.h> Collect AMD specific platform header files in <asm/amd/*.h>. Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mario Limonciello <superm1@kernel.org> Link: https://lore.kernel.org/r/20250413084144.3746608-4-mingo@kernel.org	2025-04-14 09:34:14 +02:00
Ingo Molnar	e3a52b67f5	x86/fpu: Clarify FPU context cacheline alignment Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/Z_ejggklB5-IWB5W@gmail.com	2025-04-14 08:18:29 +02:00
Ingo Molnar	8b2a7a7294	x86/fpu: Use 'fpstate' variable names consistently A few uses of 'fps' snuck in, which is rather confusing (to me) as it suggests frames-per-second. ;-) Rename them to the canonical 'fpstate' name. No change in functionality. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250409211127.3544993-9-mingo@kernel.org	2025-04-14 08:18:29 +02:00
Ingo Molnar	22aafe3bcb	x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks init_task's FPU state initialization was a bit of a hack: __x86_init_fpu_begin = .; . = __x86_init_fpu_begin + 128PAGE_SIZE; __x86_init_fpu_end = .; But the init task isn't supposed to be using the FPU context in any case, so remove the hack and add in some debug warnings. As Linus noted in the discussion, the init task (and other PF_KTHREAD tasks) can* use the FPU via kernel_fpu_begin()/_end(), but they don't need the context area because their FPU use is not preemptible or reentrant, and they don't return to user-space. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20250409211127.3544993-8-mingo@kernel.org	2025-04-14 08:18:29 +02:00
Ingo Molnar	c360bdc593	x86/fpu: Make sure x86_task_fpu() doesn't get called for PF_KTHREAD\|PF_USER_WORKER tasks during exit fpu__drop() and arch_release_task_struct() calls x86_task_fpu() unconditionally, while the FPU context area will not be present if it's the init task, and should not be in use when it's some other type of kthread. Return early for PF_KTHREAD or PF_USER_WORKER tasks. The debug warning in x86_task_fpu() will catch any kthreads attempting to use the FPU save area. Fixed-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250409211127.3544993-7-mingo@kernel.org	2025-04-14 08:18:29 +02:00
Ingo Molnar	ec2227e03a	x86/fpu: Push 'fpu' pointer calculation into the fpu__drop() call This encapsulates the fpu__drop() functionality better, and it will also enable other changes that want to check a task for PF_KTHREAD before calling x86_task_fpu(). Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250409211127.3544993-6-mingo@kernel.org	2025-04-14 08:18:29 +02:00
Ingo Molnar	55bc30f2e3	x86/fpu: Remove the thread::fpu pointer As suggested by Oleg, remove the thread::fpu pointer, as we can calculate it via x86_task_fpu() at compile-time. This improves code generation a bit: kepler:~/tip> size vmlinux.before vmlinux.after text data bss dec hex filename 26475405 10435342 1740804 38651551 24dc69f vmlinux.before 26475339 10959630 1216516 38651485 24dc65d vmlinux.after Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20250409211127.3544993-5-mingo@kernel.org	2025-04-14 08:18:29 +02:00
Ingo Molnar	cb7ca40a38	x86/fpu: Make task_struct::thread constant size Turn thread.fpu into a pointer. Since most FPU code internals work by passing around the FPU pointer already, the code generation impact is small. This allows us to remove the old kludge of task_struct being variable size: struct task_struct { ... /* * New fields for task_struct should be added above here, so that * they are included in the randomized portion of task_struct. / randomized_struct_fields_end / CPU-specific state of this task: / struct thread_struct thread; / * WARNING: on x86, 'thread_struct' contains a variable-sized * structure. It MUST be at the end of 'task_struct'. * * Do not put anything below here! */ }; ... which creates a number of problems, such as requiring thread_struct to be the last member of the struct - not allowing it to be struct-randomized, etc. But the primary motivation is to allow the decoupling of task_struct from hardware details (<asm/processor.h> in particular), and to eventually allow the per-task infrastructure: DECLARE_PER_TASK(type, name); ... per_task(current, name) = val; ... which requires task_struct to be a constant size struct. The fpu_thread_struct_whitelist() quirk to hardened usercopy can be removed, now that the FPU structure is not embedded in the task struct anymore, which reduces text footprint a bit. Fixed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250409211127.3544993-4-mingo@kernel.org	2025-04-14 08:18:29 +02:00
Ingo Molnar	e3bfa38599	x86/fpu: Convert task_struct::thread.fpu accesses to use x86_task_fpu() This will make the removal of the task_struct::thread.fpu array easier. No change in functionality - code generated before and after this commit is identical on x86-defconfig: kepler:~/tip> diff -up vmlinux.before.asm vmlinux.after.asm kepler:~/tip> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250409211127.3544993-3-mingo@kernel.org	2025-04-14 08:18:29 +02:00
Chang S. Bae	cbe8e4dab1	x86/fpu/xstate: Adjust xstate copying logic for user ABI == Background == As feature positions in the userspace XSAVE buffer do not always align with their feature numbers, the XSAVE format conversion needs to be reconsidered to align with the revised xstate size calculation logic. * For signal handling, XSAVE and XRSTOR are used directly to save and restore extended registers. * For ptrace, KVM, and signal returns (for 32-bit frame), the kernel copies data between its internal buffer and the userspace XSAVE buffer. If memcpy() were used for these cases, existing offset helpers — such as __raw_xsave_addr() or xstate_offsets[] — would be sufficient to handle the format conversion. == Problem == When copying data from the compacted in-kernel buffer to the non-compacted userspace buffer, the function follows the user_regset_get2_fn() prototype. This means it utilizes struct membuf helpers for the destination buffer. As defined in regset.h, these helpers update the memory pointer during the copy process, enforcing sequential writes within the loop. Since xstate components are processed sequentially, any component whose buffer position does not align with its feature number has an issue. == Solution == Replace for_each_extended_xfeature() with the newly introduced for_each_extended_xfeature_in_order(). This macro ensures xstate components are handled in the correct order based on their actual positions in the destination buffer, rather than their feature numbers. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250320234301.8342-5-chang.seok.bae@intel.com	2025-04-14 08:18:29 +02:00
Chang S. Bae	a758ae2885	x86/fpu/xstate: Adjust XSAVE buffer size calculation The current xstate size calculation assumes that the highest-numbered xstate feature has the highest offset in the buffer, determining the size based on the topmost bit in the feature mask. However, this assumption is not architecturally guaranteed -- higher-numbered features may have lower offsets. With the introduction of the xfeature order table and its helper macro, xstate components can now be traversed in their positional order. Update the non-compacted format handling to iterate through the table to determine the last-positioned feature. Then, set the offset accordingly. Since size calculation primarily occurs during initialization or in non-critical paths, looping to find the last feature is not expected to have a meaningful performance impact. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250320234301.8342-4-chang.seok.bae@intel.com	2025-04-14 08:18:29 +02:00
Chang S. Bae	15d51a2f6f	x86/fpu/xstate: Introduce xfeature order table and accessor macro The kernel has largely assumed that higher xstate component numbers correspond to later offsets in the buffer. However, this assumption no longer holds for the non-compacted format, where a newer state component may have a lower offset. When iterating over xstate components in offset order, using the feature number as an index may be misleading. At the same time, the CPU exposes each component’s size and offset based on its feature number, making it a key for state information. To provide flexibility in handling xstate ordering, introduce a mapping table: feature order -> feature number. The table is dynamically populated based on the CPU-exposed features and is sorted in offset order at boot time. Additionally, add an accessor macro to facilitate sequential traversal of xstate components based on their actual buffer positions, given a feature bitmask. This accessor macro will be particularly useful for computing custom non-compacted format sizes and iterating over xstate offsets in non-compacted buffers. Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250320234301.8342-3-chang.seok.bae@intel.com	2025-04-14 08:18:29 +02:00
Chang S. Bae	031b33ef1a	x86/fpu/xstate: Remove xstate offset check Traditionally, new xstate components have been assigned sequentially, aligning feature numbers with their offsets in the XSAVE buffer. However, this ordering is not architecturally mandated in the non-compacted format, where a component's offset may not correspond to its feature number. The kernel caches CPUID-reported xstate component details, including size and offset in the non-compacted format. As part of this process, a sanity check is also conducted to ensure alignment between feature numbers and offsets. This check was likely intended as a general guideline rather than a strict requirement. Upcoming changes will support out-of-order offsets. Remove the check as becoming obsolete. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250320234301.8342-2-chang.seok.bae@intel.com	2025-04-14 08:18:29 +02:00
Borislav Petkov (AMD)	805b743fc1	x86/microcode/AMD: Extend the SHA check to Zen5, block loading of any unreleased standalone Zen5 microcode patches All Zen5 machines out there should get BIOS updates which update to the correct microcode patches addressing the microcode signature issue. However, silly people carve out random microcode blobs from BIOS packages and think are doing other people a service this way... Block loading of any unreleased standalone Zen5 microcode patches. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: <stable@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Cc: Nikolay Borisov <nik.borisov@suse.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250410114222.32523-1-bp@kernel.org	2025-04-12 21:09:42 +02:00
Ard Biesheuvel	dbe0ad775c	x86/boot: Move early kernel mapping code into startup/ The startup code that constructs the kernel virtual mapping runs from the 1:1 mapping of memory itself, and therefore, cannot use absolute symbol references. Before making changes in subsequent patches, move this code into a separate source file under arch/x86/boot/startup/ where all such code will be kept from now on. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250410134117.3713574-16-ardb+git@google.com	2025-04-12 11:13:05 +02:00
Ard Biesheuvel	4cecebf200	x86/boot: Move the early GDT/IDT setup code into startup/ Move the early GDT/IDT setup code that runs long before the kernel virtual mapping is up into arch/x86/boot/startup/, and build it in a way that ensures that the code tolerates being called from the 1:1 mapping of memory. The code itself is left unchanged by this patch. Also tweak the sed symbol matching pattern in the decompressor to match on lower case 't' or 'b', as these will be emitted by Clang for symbols with hidden linkage. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250410134117.3713574-15-ardb+git@google.com	2025-04-12 11:13:04 +02:00
Ard Biesheuvel	bcceba3c72	x86/asm: Make rip_rel_ptr() usable from fPIC code RIP_REL_REF() is used in non-PIC C code that is called very early, before the kernel virtual mapping is up, which is the mapping that the linker expects. It is currently used in two different ways: - to refer to the value of a global variable, including as an lvalue in assignments; - to take the address of a global variable via the mapping that the code currently executes at. The former case is only needed in non-PIC code, as PIC code will never use absolute symbol references when the address of the symbol is not being used. But taking the address of a variable in PIC code may still require extra care, as a stack allocated struct assignment may be emitted as a memcpy() from a statically allocated copy in .rodata. For instance, this void startup_64_setup_gdt_idt(void) { struct desc_ptr startup_gdt_descr = { .address = (__force unsigned long)gdt_page.gdt, .size = GDT_SIZE - 1, }; may result in an absolute symbol reference in PIC code, even though the struct is allocated on the stack and populated at runtime. To address this case, make rip_rel_ptr() accessible in PIC code, and update any existing uses where the address of a global variable is taken using RIP_REL_REF. Once all code of this nature has been moved into arch/x86/boot/startup and built with -fPIC, RIP_REL_REF() can be retired, and only rip_rel_ptr() will remain. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250410134117.3713574-14-ardb+git@google.com	2025-04-12 11:13:04 +02:00
Peter Zijlstra	4873f494bb	x86/mm: Remove 'mm' argument from unuse_temporary_mm() again Now that unuse_temporary_mm() lives in tlb.c it can access cpu_tlbstate.loaded_mm. [ mingo: Merged it on top of x86/alternatives ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/r/20250402094540.3586683-5-mingo@kernel.org	2025-04-12 10:05:56 +02:00
Andy Lutomirski	d376972c98	x86/mm: Make use_/unuse_temporary_mm() non-static This prepares them for use outside of the alternative machinery. The code is unchanged. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/r/20250402094540.3586683-4-mingo@kernel.org	2025-04-12 10:05:52 +02:00
Peter Zijlstra	0812e096cf	x86/mm: Add 'mm' argument to unuse_temporary_mm() In commit `209954cbc7` ("x86/mm/tlb: Update mm_cpumask lazily") unuse_temporary_mm() grew the assumption that it gets used on poking_mm exclusively. While this is currently true, lets not hard code this assumption. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/r/20250402094540.3586683-2-mingo@kernel.org	2025-04-12 10:05:37 +02:00
Ahmed S. Darwish	62e5652739	x86/cacheinfo: Standardize header files and CPUID references Reference header files using their canonical form <linux/cacheinfo.h>. Standardize on CPUID(0xN), instead of CPUID(N), for all standard leaves. This removes ambiguity and aligns them with their extended counterparts like CPUID(0x8000001d). References: `0dd09e215a` ("x86/cacheinfo: Apply maintainer-tip coding style fixes") Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: John Ogness <john.ogness@linutronix.de> Link: https://lore.kernel.org/r/20250411070401.1358760-3-darwi@linutronix.de	2025-04-11 11:14:55 +02:00
Ingo Molnar	9f13acb240	Linux 6.15-rc1 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmfy3/YeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG/ygIAItY5dzf5fVnVEPy UrF+EzIaWGWRw3N+41AyT5X7z77FPX7E0cA6MD4KxfWW/OYzeAoeZSyrM2xIsEh3 26qiohvJjpHjfHdzvKmxNItvW8+xBv3km00U/CWWqJo89JsIVnJtrSBHOut2/gNp f6sGoOrrR4GXXz8JX3yG/pmizr23lN81ZkVdz0ayYEK4uY92hSsBspvyFWcdffgF o8NCtR+JVGac8xm+f3VPSLyunLMXsh8NWETumMHP6tHQif36I3BQqeU8DgXCgjEK pfZ8gEyRtXIKbEt+qniUetT+2Cwu/lAN2GjTu0LqIe9Ro3HzjtotwQdk5h6kC+Lc BogxIs8= =bf5G -----END PGP SIGNATURE----- Merge tag 'v6.15-rc1' into x86/cpu, to refresh the branch with upstream changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-04-11 11:13:27 +02:00
Nikolay Borisov	23a76739d6	x86/alternatives: Make smp_text_poke_batch_process() subsume smp_text_poke_batch_finish() Simplify the alternatives interface some more by moving the poke_batch_finish check into poke_batch_process and renaming the latter. The net effect is one less function name to consider when reading the code. Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-54-mingo@kernel.org	2025-04-11 11:01:35 +02:00
Ingo Molnar	4f9534719e	x86/alternatives: Add comment about noinstr expectations Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-53-mingo@kernel.org	2025-04-11 11:01:35 +02:00
Ingo Molnar	023f42dd59	x86/alternatives: Rename 'apply_relocation()' to 'text_poke_apply_relocation()' Join the text_poke_*() API namespace. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-52-mingo@kernel.org	2025-04-11 11:01:35 +02:00
Ingo Molnar	dac0d75427	x86/alternatives: Update the comments in smp_text_poke_batch_process() - Capitalize 'INT3' consistently, - make it clear that 'sync cores' means an SMP sync to all CPUs, - fix typos and spelling. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-51-mingo@kernel.org	2025-04-11 11:01:35 +02:00
Ingo Molnar	2c373ca064	x86/alternatives: Remove 'smp_text_poke_batch_flush()' It only has a single user left, merge it into smp_text_poke_batch_add() and remove the helper function. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-50-mingo@kernel.org	2025-04-11 11:01:35 +02:00
Ingo Molnar	b1bb39185d	x86/alternatives: Move declarations of vmlinux.lds.S defined section symbols to <asm/alternative.h> Move it from the middle of a .c file next to the similar declarations of __alt_instructions[] et al. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-49-mingo@kernel.org	2025-04-11 11:01:35 +02:00
Ingo Molnar	db5c68c88c	x86/alternatives: Simplify the #include section We accumulated lots of unnecessary header inclusions over the years, trim them. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-48-mingo@kernel.org	2025-04-11 11:01:35 +02:00
Ingo Molnar	3c8454dfc9	x86/alternatives: Rename 'POKE_MAX_OPCODE_SIZE' to 'TEXT_POKE_MAX_OPCODE_SIZE' Join the TEXT_POKE_ namespace. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-47-mingo@kernel.org	2025-04-11 11:01:35 +02:00
Ingo Molnar	8036fbe5a5	x86/alternatives: Rename 'TP_ARRAY_NR_ENTRIES_MAX' to 'TEXT_POKE_ARRAY_MAX' Standardize on TEXT_POKE_ namespace for CPP constants too. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-46-mingo@kernel.org	2025-04-11 11:01:35 +02:00
Ingo Molnar	22b9662313	x86/alternatives: Standardize on 'tpl' local variable names for 'struct smp_text_poke_loc *' There's no toilet paper in this code. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-45-mingo@kernel.org	2025-04-11 11:01:35 +02:00
Ingo Molnar	3e6f47573e	x86/alternatives: Simplify and clean up patch_cmp() - No need to cast over to 'struct smp_text_poke_loc ', void is just fine for a binary search, - Use the canonical (a, b) input parameter nomenclature of cmp_func_t functions and rename the input parameters from (tp, elt) to (tpl_a, tpl_b). Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-44-mingo@kernel.org	2025-04-11 11:01:35 +02:00
Ingo Molnar	6af9540379	x86/alternatives: Constify text_poke_addr() This will also allow the simplification of patch_cmp(). Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-43-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	0e67e587e2	x86/alternatives: Simplify text_poke_addr_ordered() - Use direct 'void *' pointer comparison, there's no need to force the type to 'unsigned long'. - Remove the 'tp' local variable indirection Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-42-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	6e4955a9d7	x86/alternatives: Rename 'text_poke_sync()' to 'smp_text_poke_sync_each_cpu()' Unlike sync_core(), text_poke_sync() is a very heavy operation, as it sends an IPI to every online CPU in the system and waits for completion. Reflect this in the name. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-41-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	7fbadb50d9	x86/alternatives: Move text_poke_array completion from smp_text_poke_batch_finish() and smp_text_poke_batch_flush() to smp_text_poke_batch_process() Simplifies the code and improves code generation a bit: text data bss dec hex filename 14769 1017 4112 19898 4dba alternative.o.before 14742 1017 4112 19871 4d9f alternative.o.after Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-40-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	cca3473956	x86/alternatives: Add documentation for smp_text_poke_batch_add() Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-39-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	9647ce4652	x86/alternatives: Document 'smp_text_poke_single()' Extend the documentation to better describe its purpose. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-38-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	8a6a1b4e0e	x86/alternatives: Remove the mixed-patching restriction on smp_text_poke_single() At this point smp_text_poke_single(addr, opcode, len, emulate) is equivalent to: smp_text_poke_batch_add(addr, opcode, len, emulate); smp_text_poke_batch_finish(); So remove the restriction on mixing single-instruction patching with multi-instruction patching. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-37-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	0e351aec2b	x86/alternatives: Move the text_poke_array manipulation into text_poke_int3_loc_init() and rename it to __smp_text_poke_batch_add() This simplifies the code and code generation a bit: text data bss dec hex filename 14802 1029 4112 19943 4de7 alternative.o.before 14784 1029 4112 19925 4dd5 alternative.o.after Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-36-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	74e8e2bf95	x86/alternatives: Simplify smp_text_poke_batch_process() This function is now using the text_poke_array state exclusively, make that explicit by removing the redundant input parameters. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-34-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	8e35752f0c	x86/alternatives: Simplify smp_text_poke_int3_handler() Remove the 'desc' local variable indirection and use text_poke_array directly. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-33-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	b6a25841c1	x86/alternatives: Simplify try_get_text_poke_array() There's no need to return a pointer on success - it's always the same pointer. Return a bool instead. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-32-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	3916eec516	x86/alternatives: Rename 'put_desc()' to 'put_text_poke_array()' Just like with try_get_text_poke_array(), this name better reflects what the underlying code is doing, there's no 'descriptor' indirection anymore. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-31-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	46f3d9d329	x86/alternatives: Rename 'try_get_desc()' to 'try_get_text_poke_array()' This better reflects what the underlying code is doing, there's no 'descriptor' indirection anymore. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-30-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	0494b16b9c	x86/alternatives: Remove the tp_vec indirection At this point we are always working out of an uptodate text_poke_array, there's no need for smp_text_poke_int3_handler() to read via the int3_vec indirection - remove it. This simplifies the code: 1 file changed, 5 insertions(+), 15 deletions(-) Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-29-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	6e7dc03aee	x86/alternatives: Introduce 'struct smp_text_poke_array' and move tp_vec and tp_vec_nr to it struct text_poke_array is an equivalent structure to these global variables: static struct smp_text_poke_loc tp_vec[TP_VEC_MAX]; static int tp_vec_nr; Note that we intentionally mirror much of the naming of 'struct text_poke_int3_vec', which will further highlight the unecessary layering going on in this code, and will ease its removal. No change in functionality. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-28-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	37725b64a9	x86/alternatives: Assert input parameters in smp_text_poke_batch_process() At this point the 'tp' input parameter must always be the global 'tp_vec' array, and 'nr_entries' must always be equal to 'tp_vec_nr'. Assert these conditions - which will allow the removal of a layer of indirection between these values. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-27-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	476ad071c6	x86/alternatives: Assert that smp_text_poke_int3_handler() can only ever handle 'tp_vec[]' based requests Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-26-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	c8976ade0c	x86/alternatives: Simplify smp_text_poke_single() by using tp_vec and existing APIs Instead of constructing a vector on-stack, just use the already available batch-patching vector - which should always be empty at this point. This will allow subsequent simplifications. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-25-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	eaa24c9177	x86/alternatives: Remove the 'addr == NULL means forced-flush' hack from smp_text_poke_batch_finish()/smp_text_poke_batch_flush()/text_poke_addr_ordered() There's this weird hack used by smp_text_poke_batch_finish() to indicate a 'forced flush': smp_text_poke_batch_flush(NULL); Just open-code the vector-flush in a straightforward fashion: smp_text_poke_batch_process(tp_vec, tp_vec_nr); tp_vec_nr = 0; And get rid of !addr hack from text_poke_addr_ordered(). Leave a WARN_ON_ONCE(), just in case some external code learned to rely on this behavior. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-24-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	2d0cf10a1e	x86/alternatives: Use non-inverted logic instead of 'tp_order_fail()' tp_order_fail() uses inverted logic: it returns true in case something is false, which is only a plus at the IOCCC. Instead rename it to regular parity as 'text_poke_addr_ordered()', and adjust the code accordingly. Also add a comment explaining how the address ordering should be understood. No change in functionality intended. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-23-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	87836af1ea	x86/alternatives: Add text_mutex) assert to smp_text_poke_batch_flush() It's possible to escape the text_mutex-held assert in smp_text_poke_batch_process() if the caller uses a properly batched and sorted series of patch requests, so add an explicit lockdep_assert_held() to make sure it's held by all callers. All text_poke_int3_*() APIs will call either smp_text_poke_batch_process() or smp_text_poke_batch_flush() internally. The text_mutex must be held, because tp_vec and tp_vec_nr et al are all globals, and the INT3 patching machinery itself relies on external serialization. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-22-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	3bd7546ff2	x86/alternatives: Rename 'int3_desc' to 'int3_vec' Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-21-mingo@kernel.org	2025-04-11 11:01:33 +02:00
Ingo Molnar	a81d43c46e	x86/alternatives: Rename 'struct text_poke_loc' to 'struct smp_text_poke_loc' Make it clear that this structure is part of the INT3 based SMP patching facility, not the regular text_poke*() MM-switch based facility. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-19-mingo@kernel.org	2025-04-11 11:01:33 +02:00
Ingo Molnar	fb802d6393	x86/alternatives: Rename 'text_poke_loc_init()' to 'text_poke_int3_loc_init()' This name is actively confusing as well, because the simple text_poke() APIs use MM-switching based code patching, while text_poke_loc_init() is part of the INT3 based text_poke_int3_() machinery that is an additional layer of functionality on top of regular text_poke*() functionality. Rename it to text_poke_int3_loc_init() to make it clear which layer it belongs to. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-18-mingo@kernel.org	2025-04-11 11:01:33 +02:00
Ingo Molnar	732c7c33a0	x86/alternatives: Rename 'text_poke_queue()' to 'smp_text_poke_batch_add()' This name is actively confusing as well, because the simple text_poke() APIs use MM-switching based code patching, while text_poke_queue() is part of the INT3 based text_poke_int3_() machinery that is an additional layer of functionality on top of regular text_poke*() functionality. Rename it to smp_text_poke_batch_add() to make it clear which layer it belongs to. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-17-mingo@kernel.org	2025-04-11 11:01:33 +02:00
Ingo Molnar	e8d7b8c2bb	x86/alternatives: Rename 'text_poke_finish()' to 'smp_text_poke_batch_finish()' This name is actively confusing as well, because the simple text_poke() APIs use MM-switching based code patching, while text_poke_finish() is part of the INT3 based text_poke_int3_() machinery that is an additional layer of functionality on top of regular text_poke*() functionality. Rename it to smp_text_poke_batch_finish() to make it clear which layer it belongs to. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-16-mingo@kernel.org	2025-04-11 11:01:33 +02:00
Ingo Molnar	aedb60c2c6	x86/alternatives: Rename 'text_poke_flush()' to 'smp_text_poke_batch_flush()' This name is actually actively confusing, because the simple text_poke() APIs use MM-switching based code patching, while text_poke_flush() is part of the INT3 based text_poke_int3_() machinery that is an additional layer of functionality on top of regular text_poke*() functionality. Rename it to smp_text_poke_batch_flush() to make it clear which layer it belongs to. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-15-mingo@kernel.org	2025-04-11 11:01:33 +02:00
Ingo Molnar	f5afa2e8ef	x86/alternatives: Remove the confusing, inaccurate & unnecessary 'temp_mm_state_t' abstraction So the temp_mm_state_t abstraction used by use_temporary_mm() and unuse_temporary_mm() is super confusing: - The whole machinery is about temporarily switching to the text_poke_mm utility MM that got allocated during bootup for text-patching purposes alone: temp_mm_state_t prev; /* * Loading the temporary mm behaves as a compiler barrier, which * guarantees that the PTE will be set at the time memcpy() is done. / prev = use_temporary_mm(text_poke_mm); - Yet the value that gets saved in the temp_mm_state_t variable is not the temporary MM ... but the previous MM... - Ie. we temporarily put the non-temporary MM into a variable that has the temp_mm_state_t type. This makes no sense whatsoever. - The confusion continues in unuse_temporary_mm(): static inline void unuse_temporary_mm(temp_mm_state_t prev_state) Here we unuse an MM that is ... not the temporary MM, but the previous MM. :-/ Fix up all this confusion by removing the unnecessary layer of abstraction and using a bog-standard 'struct mm_struct prev_mm' variable to save the MM to. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-14-mingo@kernel.org	2025-04-11 11:01:33 +02:00
Ingo Molnar	762255b743	x86/alternatives: Remove duplicate 'text_poke_early()' prototype It's declared in <asm/text-patching.h> already. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-12-mingo@kernel.org	2025-04-11 11:01:33 +02:00
Ingo Molnar	e84c31b9c9	x86/alternatives: Rename 'bp_desc' to 'int3_desc' Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-11-mingo@kernel.org	2025-04-11 11:01:33 +02:00
Ingo Molnar	da364fc547	x86/alternatives: Rename 'poking_addr' to 'text_poke_mm_addr' Put it into the text_poke_* namespace of <asm/text-patching.h>. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-10-mingo@kernel.org	2025-04-11 11:01:33 +02:00
Ingo Molnar	a5c832e047	x86/alternatives: Rename 'poking_mm' to 'text_poke_mm' Put it into the text_poke_* namespace of <asm/text-patching.h>. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-9-mingo@kernel.org	2025-04-11 11:01:33 +02:00
Ingo Molnar	5236b6a0fe	x86/alternatives: Rename 'poke_int3_handler()' to 'smp_text_poke_int3_handler()' All related functions in this subsystem already have a text_poke_int3_ prefix - add it to the trap handler as well. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-8-mingo@kernel.org	2025-04-11 11:01:33 +02:00
Ingo Molnar	9586ae48e7	x86/alternatives: Rename 'text_poke_bp()' to 'smp_text_poke_single()' Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-7-mingo@kernel.org	2025-04-11 11:01:33 +02:00
Ingo Molnar	bee4fcfbc1	x86/alternatives: Rename 'text_poke_bp_batch()' to 'smp_text_poke_batch_process()' Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-6-mingo@kernel.org	2025-04-11 11:01:33 +02:00
Ingo Molnar	28fb79092d	x86/alternatives: Rename 'bp_refs' to 'text_poke_array_refs' Make it clear that these reference counts lock access to text_poke_array. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-5-mingo@kernel.org	2025-04-11 11:01:33 +02:00
Ingo Molnar	84e5ba949b	x86/alternatives: Rename 'struct bp_patching_desc' to 'struct text_poke_int3_vec' Follow the INT3 text-poking nomenclature, and also adopt the 'vector' name for the entire object, instead of the rather opaque 'descriptor' naming. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-4-mingo@kernel.org	2025-04-11 11:01:33 +02:00
Peter Zijlstra	d60e4b2410	x86/alternatives: Document the text_poke_bp_batch() synchronization rules a bit more Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20250411054105.2341982-3-mingo@kernel.org	2025-04-11 11:01:33 +02:00
Eric Dumazet	4334336e76	x86/alternatives: Improve code-patching scalability by removing false sharing in poke_int3_handler() eBPF programs can be run 50,000,000 times per second on busy servers. Whenever /proc/sys/kernel/bpf_stats_enabled is turned off, hundreds of calls sites are patched from text_poke_bp_batch() and we see a huge loss of performance due to false sharing on bp_desc.refs lasting up to three seconds. 51.30% server_bin [kernel.kallsyms] [k] poke_int3_handler \| \|--46.45%--poke_int3_handler \| exc_int3 \| asm_exc_int3 \| \| \| \|--24.26%--cls_bpf_classify \| \| tcf_classify \| \| __dev_queue_xmit \| \| ip6_finish_output2 \| \| ip6_output \| \| ip6_xmit \| \| inet6_csk_xmit \| \| __tcp_transmit_skb Fix this by replacing bp_desc.refs with a per-cpu bp_refs. Before the patch, on a host with 240 cores (480 threads): $ sysctl -wq kernel.bpf_stats_enabled=0 text_poke_bp_batch(nr_entries=164) : Took 2655300 usec $ bpftool prog \| grep run_time_ns ... 105: sched_cls name hn_egress tag 699fc5eea64144e3 gpl run_time_ns 3009063719 run_cnt 82757845 : average cost is 36 nsec per call After this patch: $ sysctl -wq kernel.bpf_stats_enabled=0 text_poke_bp_batch(nr_entries=164) : Took 702 usec $ bpftool prog \| grep run_time_ns ... 105: sched_cls name hn_egress tag 699fc5eea64144e3 gpl run_time_ns 1928223019 run_cnt 67682728 : average cost is 28 nsec per call Ie. text-patching performance improved 3700x: from 2.65 seconds to 0.0007 seconds. Since the atomic_cond_read_acquire(refs, !VAL) spin-loop was not triggered even once in my tests, add an unlikely() annotation, because this appears to be the common case. [ mingo: Improved the changelog some more. ] Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20250411054105.2341982-2-mingo@kernel.org	2025-04-11 11:01:33 +02:00
Fernando Fernandez Mancera	3940f5349b	x86/i8253: Call clockevent_i8253_disable() with interrupts disabled There's a lockdep false positive warning related to i8253_lock: WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ... systemd-sleep/3324 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: ffffffffb2c23398 (i8253_lock){+.+.}-{2:2}, at: pcspkr_event+0x3f/0xe0 [pcspkr] ... ... which became HARDIRQ-irq-unsafe at: ... lock_acquire+0xd0/0x2f0 _raw_spin_lock+0x30/0x40 clockevent_i8253_disable+0x1c/0x60 pit_timer_init+0x25/0x50 hpet_time_init+0x46/0x50 x86_late_time_init+0x1b/0x40 start_kernel+0x962/0xa00 x86_64_start_reservations+0x24/0x30 x86_64_start_kernel+0xed/0xf0 common_startup_64+0x13e/0x141 ... Lockdep complains due pit_timer_init() using the lock in an IRQ-unsafe fashion, but it's a false positive, because there is no deadlock possible at that point due to init ordering: at the point where pit_timer_init() is called there is no other possible usage of i8253_lock because the system is still in the very early boot stage with no interrupts. But in any case, pit_timer_init() should disable interrupts before calling clockevent_i8253_disable() out of general principle, and to keep lockdep working even in this scenario. Use scoped_guard() for that, as suggested by Thomas Gleixner. [ mingo: Cleaned up the changelog. ] Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/Z-uwd4Bnn7FcCShX@gmail.com	2025-04-11 07:28:20 +02:00
David Woodhouse	de085ddd49	x86/kexec: Invalidate GDT/IDT from relocate_kernel() instead of earlier Reduce the window during which exceptions are unhandled, by leaving the GDT/IDT in place all the way into the relocate_kernel() function, until the moment that %cr3 gets replaced. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250326142404.256980-4-dwmw2@infradead.org	2025-04-10 12:17:14 +02:00
David Woodhouse	7516e7216b	x86/kexec: Add 8250 MMIO serial port output This supports the same 32-bit MMIO-mapped 8250 as the early_printk code. It's not clear why the early_printk code supports this form and only this form; the actual runtime 8250_pci doesn't seem to support it. But having hacked up QEMU to expose such a device, early_printk does work with it, and now so does the kexec debug code. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250326142404.256980-3-dwmw2@infradead.org	2025-04-10 12:17:14 +02:00
David Woodhouse	d358b45120	x86/kexec: Add 8250 serial port output If a serial port was configured for early_printk, use it for debug output from the relocate_kernel exception handler too. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250326142404.256980-2-dwmw2@infradead.org	2025-04-10 12:17:13 +02:00
Ingo Molnar	eef476f15c	x86/msr: Rename 'wrmsrl_cstar()' to 'wrmsrq_cstar()' Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2025-04-10 11:59:37 +02:00
Ingo Molnar	7cbc2ba7c1	x86/msr: Rename 'native_wrmsrl()' to 'native_wrmsrq()' Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2025-04-10 11:59:28 +02:00
Ingo Molnar	604d15d15e	x86/msr: Rename 'wrmsrl_amd_safe()' to 'wrmsrq_amd_safe()' Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2025-04-10 11:59:24 +02:00
Ingo Molnar	e2b8af0c69	x86/msr: Rename 'rdmsrl_amd_safe()' to 'rdmsrq_amd_safe()' Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2025-04-10 11:59:19 +02:00
Ingo Molnar	8e44e83f57	x86/msr: Rename 'mce_wrmsrl()' to 'mce_wrmsrq()' Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2025-04-10 11:59:14 +02:00
Ingo Molnar	ebe29309c4	x86/msr: Rename 'mce_rdmsrl()' to 'mce_rdmsrq()' Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2025-04-10 11:59:09 +02:00
Ingo Molnar	c895ecdab2	x86/msr: Rename 'wrmsrl_on_cpu()' to 'wrmsrq_on_cpu()' Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2025-04-10 11:59:05 +02:00
Ingo Molnar	d7484babd2	x86/msr: Rename 'rdmsrl_on_cpu()' to 'rdmsrq_on_cpu()' Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2025-04-10 11:59:00 +02:00
Ingo Molnar	27a23a544a	x86/msr: Rename 'wrmsrl_safe_on_cpu()' to 'wrmsrq_safe_on_cpu()' Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2025-04-10 11:58:55 +02:00
Ingo Molnar	5e404cb7ac	x86/msr: Rename 'rdmsrl_safe_on_cpu()' to 'rdmsrq_safe_on_cpu()' Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2025-04-10 11:58:49 +02:00
Ingo Molnar	6fa17efe45	x86/msr: Rename 'wrmsrl_safe()' to 'wrmsrq_safe()' Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2025-04-10 11:58:44 +02:00
Ingo Molnar	6fe22abacd	x86/msr: Rename 'rdmsrl_safe()' to 'rdmsrq_safe()' Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2025-04-10 11:58:38 +02:00
Ingo Molnar	78255eb239	x86/msr: Rename 'wrmsrl()' to 'wrmsrq()' Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2025-04-10 11:58:33 +02:00
Ingo Molnar	c435e608cf	x86/msr: Rename 'rdmsrl()' to 'rdmsrq()' Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2025-04-10 11:58:27 +02:00
Ingo Molnar	73bd1e01e9	x86/msr: Use u64 in rdmsrl_amd_safe() and wrmsrl_amd_safe() Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2025-04-10 11:58:02 +02:00
Ingo Molnar	78a84fbfa4	Linux 6.15-rc1 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmfy3/YeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG/ygIAItY5dzf5fVnVEPy UrF+EzIaWGWRw3N+41AyT5X7z77FPX7E0cA6MD4KxfWW/OYzeAoeZSyrM2xIsEh3 26qiohvJjpHjfHdzvKmxNItvW8+xBv3km00U/CWWqJo89JsIVnJtrSBHOut2/gNp f6sGoOrrR4GXXz8JX3yG/pmizr23lN81ZkVdz0ayYEK4uY92hSsBspvyFWcdffgF o8NCtR+JVGac8xm+f3VPSLyunLMXsh8NWETumMHP6tHQif36I3BQqeU8DgXCgjEK pfZ8gEyRtXIKbEt+qniUetT+2Cwu/lAN2GjTu0LqIe9Ro3HzjtotwQdk5h6kC+Lc BogxIs8= =bf5G -----END PGP SIGNATURE----- Merge tag 'v6.15-rc1' into x86/mm, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-04-09 22:00:25 +02:00
Ingo Molnar	6ce0fdaae0	Linux 6.15-rc1 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmfy3/YeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG/ygIAItY5dzf5fVnVEPy UrF+EzIaWGWRw3N+41AyT5X7z77FPX7E0cA6MD4KxfWW/OYzeAoeZSyrM2xIsEh3 26qiohvJjpHjfHdzvKmxNItvW8+xBv3km00U/CWWqJo89JsIVnJtrSBHOut2/gNp f6sGoOrrR4GXXz8JX3yG/pmizr23lN81ZkVdz0ayYEK4uY92hSsBspvyFWcdffgF o8NCtR+JVGac8xm+f3VPSLyunLMXsh8NWETumMHP6tHQif36I3BQqeU8DgXCgjEK pfZ8gEyRtXIKbEt+qniUetT+2Cwu/lAN2GjTu0LqIe9Ro3HzjtotwQdk5h6kC+Lc BogxIs8= =bf5G -----END PGP SIGNATURE----- Merge tag 'v6.15-rc1' into x86/asm, to refresh the branch Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-04-09 21:39:43 +02:00
Ahmed S. Darwish	d02c83d75f	x86/cacheinfo: Properly parse CPUID(0x80000006) L2/L3 associativity Complete the AMD CPUID(4) emulation logic, which uses CPUID(0x80000006) for L2/L3 cache info and an assocs[] associativity mapping array, by adding entries for 3-way caches and 6-way caches. Properly handle the case where CPUID(0x80000006) returns an L2/L3 associativity of 9. This is not real associativity, but a marker to indicate that the respective L2/L3 cache information should be retrieved from CPUID(0x8000001d) instead. If such a marker is encountered, return early from legacy_amd_cpuid4(), thus effectively emulating an "invalid index" CPUID(4) response with a cache type of zero. When checking if CPUID(0x80000006) L2/L3 cache info output is valid, and given the associtivity marker 9 above, do not just check if the whole ECX/EDX register is zero. Rather, check if the associativity is zero or 9. An associativity of zero implies no L2/L3 cache, which make it the more correct check anyway vs. a zero check of the whole output register. Fixes: `a326e948c5` ("x86, cacheinfo: Fixup L3 cache information for AMD multi-node processors") Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: x86-cpuid@lists.linux.dev Link: https://lore.kernel.org/r/20250409122233.1058601-3-darwi@linutronix.de	2025-04-09 20:47:05 +02:00
Ahmed S. Darwish	d274cde0db	x86/cacheinfo: Properly parse CPUID(0x80000005) L1d/L1i associativity For the AMD CPUID(4) emulation cache info logic, the same associativity mapping array, assocs[], is used for both CPUID(0x80000005) and CPUID(0x80000006). This is incorrect since per the AMD manuals, the mappings for CPUID(0x80000005) L1d/L1i associativity is: n = 0x1 -> 0xfe n n = 0xff fully associative while assocs[] maps these values to: n = 0x1, 0x2, 0x4 n n = 0x3, 0x7, 0x9 0 n = 0x6 8 n = 0x8 16 n = 0xa 32 n = 0xb 48 n = 0xc 64 n = 0xd 96 n = 0xe 128 n = 0xf fully associative which is only valid for CPUID(0x80000006). Parse CPUID(0x80000005) L1d/L1i associativity values as shown in the AMD manuals. Since the 0xffff literal is used to denote full associativity at the AMD CPUID(4)-emulation logic, define AMD_CPUID4_FULLY_ASSOCIATIVE for it instead of spreading that literal in more places. Mark the assocs[] mapping array as only valid for CPUID(0x80000006) L2/L3 cache information. Fixes: `a326e948c5` ("x86, cacheinfo: Fixup L3 cache information for AMD multi-node processors") Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: x86-cpuid@lists.linux.dev Link: https://lore.kernel.org/r/20250409122233.1058601-2-darwi@linutronix.de	2025-04-09 20:47:05 +02:00
Dave Hansen	f0df00ebc5	x86/cpu: Avoid running off the end of an AMD erratum table The NULL array terminator at the end of erratum_1386_microcode was removed during the switch from x86_cpu_desc to x86_cpu_id. This causes readers to run off the end of the array. Replace the NULL. Fixes: `f3f3251526` ("x86/cpu: Move AMD erratum 1386 table over to 'x86_cpu_id'") Reported-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>	2025-04-09 07:57:16 -07:00
Josh Poimboeuf	83f6665a49	x86/bugs: Add RSB mitigation document Create a document to summarize hard-earned knowledge about RSB-related mitigations, with references, and replace the overly verbose yet incomplete comments with a reference to the document. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/ab73f4659ba697a974759f07befd41ae605e33dd.1744148254.git.jpoimboe@kernel.org	2025-04-09 12:42:09 +02:00
Josh Poimboeuf	27ce8299bc	x86/bugs: Don't fill RSB on context switch with eIBRS User->user Spectre v2 attacks (including RSB) across context switches are already mitigated by IBPB in cond_mitigation(), if enabled globally or if either the prev or the next task has opted in to protection. RSB filling without IBPB serves no purpose for protecting user space, as indirect branches are still vulnerable. User->kernel RSB attacks are mitigated by eIBRS. In which case the RSB filling on context switch isn't needed, so remove it. Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Reviewed-by: Amit Shah <amit.shah@amd.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/98cdefe42180358efebf78e3b80752850c7a3e1b.1744148254.git.jpoimboe@kernel.org	2025-04-09 12:42:09 +02:00
Josh Poimboeuf	18bae0dfec	x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline eIBRS protects against guest->host RSB underflow/poisoning attacks. Adding retpoline to the mix doesn't change that. Retpoline has a balanced CALL/RET anyway. So the current full RSB filling on VMEXIT with eIBRS+retpoline is overkill. Disable it or do the VMEXIT_LITE mitigation if needed. Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Reviewed-by: Amit Shah <amit.shah@amd.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: David Woodhouse <dwmw2@infradead.org> Link: https://lore.kernel.org/r/84a1226e5c9e2698eae1b5ade861f1b8bf3677dc.1744148254.git.jpoimboe@kernel.org	2025-04-09 12:41:55 +02:00
Josh Poimboeuf	b1b19cfcf4	x86/bugs: Fix RSB clearing in indirect_branch_prediction_barrier() IBPB is expected to clear the RSB. However, if X86_BUG_IBPB_NO_RET is set, that doesn't happen. Make indirect_branch_prediction_barrier() take that into account by calling write_ibpb() which clears RSB on X86_BUG_IBPB_NO_RET: /* Make sure IBPB clears return stack preductions too. */ FILL_RETURN_BUFFER %rax, RSB_CLEAR_LOOPS, X86_BUG_IBPB_NO_RET Note that, as of the previous patch, write_ibpb() also reads 'x86_pred_cmd' in order to use SBPB when applicable: movl _ASM_RIP(x86_pred_cmd), %eax Therefore that existing behavior in indirect_branch_prediction_barrier() is not lost. Fixes: `50e4b3b940` ("x86/entry: Have entry_ibpb() invalidate return predictions") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/bba68888c511743d4cd65564d1fc41438907523f.1744148254.git.jpoimboe@kernel.org	2025-04-09 12:41:30 +02:00
Josh Poimboeuf	13235d6d50	x86/bugs: Rename entry_ibpb() to write_ibpb() There's nothing entry-specific about entry_ibpb(). In preparation for calling it from elsewhere, rename it to write_ibpb(). Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/1e54ace131e79b760de3fe828264e26d0896e3ac.1744148254.git.jpoimboe@kernel.org	2025-04-09 12:41:29 +02:00
Andy Shevchenko	996457176b	x86/early_printk: Use 'mmio32' for consistency, fix comments First of all, using 'mmio' prevents proper implementation of 8-bit accessors. Second, it's simply inconsistent with uart8250 set of options. Rename it to 'mmio32'. While at it, remove rather misleading comment in the documentation. From now on mmio32 is self-explanatory and pciserial supports not only 32-bit MMIO accessors. Also, while at it, fix the comment for the "pciserial" case. The comment seems to be a copy'n'paste error when mentioning "serial" instead of "pciserial" (with double quotes). Fix this. With that, move it upper, so we don't calculate 'buf' twice. Fixes: `3181424aea` ("x86/early_printk: Add support for MMIO-based UARTs") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Denis Mukhin <dmukhin@ford.com> Link: https://lore.kernel.org/r/20250407172214.792745-1-andriy.shevchenko@linux.intel.com	2025-04-09 12:27:08 +02:00
James Morse	45c2e30bbd	x86/resctrl: Fix rdtgroup_mkdir()'s unlocked use of kernfs_node::name Since `741c10b096` ("kernfs: Use RCU to access kernfs_node::name.") a helper rdt_kn_name() that checks that rdtgroup_mutex is held has been used for all accesses to the kernfs node name. rdtgroup_mkdir() uses the name to determine if a valid monitor group is being created by checking the parent name is "mon_groups". This is done without holding rdtgroup_mutex, and now triggers the following warning: \| WARNING: suspicious RCU usage \| 6.15.0-rc1 #4465 Tainted: G E \| ----------------------------- \| arch/x86/kernel/cpu/resctrl/internal.h:408 suspicious rcu_dereference_check() usage! [...] \| Call Trace: \| <TASK> \| dump_stack_lvl \| lockdep_rcu_suspicious.cold \| is_mon_groups \| rdtgroup_mkdir \| kernfs_iop_mkdir \| vfs_mkdir \| do_mkdirat \| __x64_sys_mkdir \| do_syscall_64 \| entry_SYSCALL_64_after_hwframe Creating a control or monitor group calls mkdir_rdt_prepare(), which uses rdtgroup_kn_lock_live() to take the rdtgroup_mutex. To avoid taking and dropping the lock, move the check for the monitor group name and position into mkdir_rdt_prepare() so that it occurs under rdtgroup_mutex. Hoist is_mon_groups() earlier in the file. [ bp: Massage. ] Fixes: `741c10b096` ("kernfs: Use RCU to access kernfs_node::name.") Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250407124637.2433230-1-james.morse@arm.com	2025-04-09 11:35:08 +02:00
Myrrh Periwinkle	f2f29da9f0	x86/e820: Fix handling of subpage regions when calculating nosave ranges in e820__register_nosave_regions() While debugging kexec/hibernation hangs and crashes, it turned out that the current implementation of e820__register_nosave_regions() suffers from multiple serious issues: - The end of last region is tracked by PFN, causing it to find holes that aren't there if two consecutive subpage regions are present - The nosave PFN ranges derived from holes are rounded out (instead of rounded in) which makes it inconsistent with how explicitly reserved regions are handled Fix this by: - Treating reserved regions as if they were holes, to ensure consistent handling (rounding out nosave PFN ranges is more correct as the kernel does not use partial pages) - Tracking the end of the last RAM region by address instead of pages to detect holes more precisely These bugs appear to have been introduced about ~18 years ago with the very first version of e820_mark_nosave_regions(), and its flawed assumptions were carried forward uninterrupted through various waves of rewrites and renames. [ mingo: Added Git archeology details, for kicks and giggles. ] Fixes: `e8eff5ac29` ("[PATCH] Make swsusp avoid memory holes and reserved memory regions on x86_64") Reported-by: Roberto Ricci <io@r-ricci.it> Tested-by: Roberto Ricci <io@r-ricci.it> Signed-off-by: Myrrh Periwinkle <myrrhperiwinkle@qtmlabs.xyz> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Len Brown <len.brown@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250406-fix-e820-nosave-v3-1-f3787bc1ee1d@qtmlabs.xyz Closes: https://lore.kernel.org/all/Z4WFjBVHpndct7br@desktop0a/	2025-04-07 19:20:08 +02:00
Petr Vaněk	8b37357a78	x86/acpi: Don't limit CPUs to 1 for Xen PV guests due to disabled ACPI Xen disables ACPI for PV guests in DomU, which causes acpi_mps_check() to return 1 when CONFIG_X86_MPPARSE is not set. As a result, the local APIC is disabled and the guest is later limited to a single vCPU, despite being configured with more. This regression was introduced in version 6.9 in commit `7c0edad364` ("x86/cpu/topology: Rework possible CPU management"), which added an early check that limits CPUs to 1 if apic_is_disabled. Update the acpi_mps_check() logic to return 0 early when running as a Xen PV guest in DomU, preventing APIC from being disabled in this specific case and restoring correct multi-vCPU behaviour. Fixes: `7c0edad364` ("x86/cpu/topology: Rework possible CPU management") Signed-off-by: Petr Vaněk <arkamar@atlas.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250407132445.6732-2-arkamar@atlas.cz	2025-04-07 16:35:21 +02:00
Boris Ostrovsky	321550859f	x86/microcode/AMD: Clean the cache if update did not load microcode If microcode did not get loaded there is no reason to keep it in the cache. Moreover, if loading failed it will not be possible to load an earlier version of microcode since the failed revision will always be selected from the cache on the next reload attempt. Since the failed revisions is not easily available at this point just clean the whole cache. It will be rebuilt later if needed. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20250327230503.1850368-3-boris.ostrovsky@oracle.com	2025-04-07 14:46:56 +02:00
Andi Kleen	e37aa1211f	x86/cpuid: Add AMX and SPEC_CTRL dependencies Add some missing dependencies to the CPUID dependency table: - All the AMX features depend on AMX_TILE - All the SPEC_CTRL features depend on SPEC_CTRL [ mingo: Keep the AMX part of the table grouped ... ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Ahmed S. Darwish <darwi@linutronix.de> Link: https://lore.kernel.org/r/20240924170128.2611854-1-ak@linux.intel.com	2025-04-06 19:54:35 +02:00
Linus Torvalds	16cd1c2657	A set of final cleanups for the timer subsystem: 1) Convert all del_timer[_sync]() instances over to the new timer_delete[_sync]() API and remove the legacy wrappers. Conversion was done with coccinelle plus some manual fixups as coccinelle chokes on scoped_guard(). 2) The final cleanup of the hrtimer_init() to hrtimer_setup() conversion. This has been delayed to the end of the merge window, so that all patches which have been merged through other trees are in mainline and all new users are catched. Doing this right before rc1 ensures that new code which is merged post rc1 is not introducing new instances of the original functionality. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmfyXi0THHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoYzlD/4ykDZbUzgTreYOxEQpBJ9elPwBhxfL 1v8OwDjRWlNrmLup8RiUfKrlbmztGl1J/u9ld0qhjcqkywCCBC1N5S+DhCjYetyP MPWLbi2Dc35cFA+M7i8fMgxI2K9MLz2Zj1UKxz1MdsSuNHm07N3mul/3T11Ye4Rz nPlzeQBTBDFCKTEGKjr8zjuoD15Wl48sObM0AjV35BPuQR1jfY4CE6VXo2h78+0c jYwpJpDmcd+o1bDrfFhWUME2DzABEkHhn4wNSETnM4E5RXZRMUbi4UiigzInibQr JOUTKwPJXTMX/Erd0XyXErrYf2qy1X9BQy6NlyDDOv+8kLEVRsC9Efplx9uoEtfi QvVT/UmgmhZFJBfIT3/B8OvasrfwOropaYoG4L0zbDpp1b09VY47N5lCLlNr/mZf jb2TwIln8Szy2EfIT2RSd0ZNupyU8V4aH/mYNpSlbUJ6mfvfIAttBSS/YH+Zeqku 7zOJkoCusaySOCZCOQkeikL3ZBN+FHtNteXxmGnp34ed/tsfgGZj1lsbmkM2rrWo f2mQsYAclUA4KQeY9z/Xf7/c5wJUkME69PxOaaN23dOpBR7GA58Cvb0PQTnPlAiT KnH/JRweBHtcv4KEHMi2f5no4cxcmXyKTj7/TLyYNjc8LATL9Eo/nxG36PLxy4lN QPOWz11zEBLjQQ== =8Ftq -----END PGP SIGNATURE----- Merge tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "A set of final cleanups for the timer subsystem: - Convert all del_timer[_sync]() instances over to the new timer_delete[_sync]() API and remove the legacy wrappers. Conversion was done with coccinelle plus some manual fixups as coccinelle chokes on scoped_guard(). - The final cleanup of the hrtimer_init() to hrtimer_setup() conversion. This has been delayed to the end of the merge window, so that all patches which have been merged through other trees are in mainline and all new users are catched. Doing this right before rc1 ensures that new code which is merged post rc1 is not introducing new instances of the original functionality" * tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tracing/timers: Rename the hrtimer_init event to hrtimer_setup hrtimers: Rename debug_init_on_stack() to debug_setup_on_stack() hrtimers: Rename debug_init() to debug_setup() hrtimers: Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper() hrtimers: Remove unnecessary NULL check in hrtimer_start_range_ns() hrtimers: Make callback function pointer private hrtimers: Merge __hrtimer_init() into __hrtimer_setup() hrtimers: Switch to use __htimer_setup() hrtimers: Delete hrtimer_init() treewide: Convert new and leftover hrtimer_init() users treewide: Switch/rename to timer_delete[_sync]()	2025-04-06 08:35:37 -07:00
Thomas Gleixner	8fa7292fee	treewide: Switch/rename to timer_delete[_sync]() timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree over and remove the historical wrapper inlines. Conversion was done with coccinelle plus manual fixups where necessary. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-04-05 10:30:12 +02:00
Jiri Slaby (SUSE)	825dfab23b	irqdomain: Rename irq_set_default_host() to irq_set_default_domain() Naming interrupt domains host is confusing at best and the irqdomain code uses both domain and host inconsistently. Therefore rename irq_set_default_host() to irq_set_default_domain(). Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250319092951.37667-3-jirislaby@kernel.org	2025-04-04 16:39:10 +02:00
Andrew Cooper	1f13c60d84	x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR in mwait_idle_with_hints() and prefer_mwait_c1_over_halt() The following commit, 12 years ago: `7e98b71920` ("x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers") added barriers around the CLFLUSH in mwait_idle_with_hints(), justified with: ... and add memory barriers around it since the documentation is explicit that CLFLUSH is only ordered with respect to MFENCE. This also triggered, 11 years ago, the same adjustment in: `f8e617f458` ("sched/idle/x86: Optimize unnecessary mwait_idle() resched IPIs") during development, although it failed to get the static_cpu_has_bug() treatment. X86_BUG_CLFLUSH_MONITOR (a.k.a the AAI65 errata) is specific to Intel CPUs, and the SDM currently states: Executions of the CLFLUSH instruction are ordered with respect to each other and with respect to writes, locked read-modify-write instructions, and fence instructions[1]. With footnote 1 reading: Earlier versions of this manual specified that executions of the CLFLUSH instruction were ordered only by the MFENCE instruction. All processors implementing the CLFLUSH instruction also order it relative to the other operations enumerated above. i.e. The SDM was incorrect at the time, and barriers should not have been inserted. Double checking the original AAI65 errata (not available from intel.com any more) shows no mention of barriers either. Note: If this were a general codepath, the MFENCEs would be needed, because AMD CPUs of the same vintage do sport otherwise-unordered CLFLUSHs. Remove the unnecessary barriers. Furthermore, use a plain alternative(), rather than static_cpu_has_bug() and/or no optimisation. The workaround is a single instruction. Use an explicit %rax pointer rather than a general memory operand, because MONITOR takes the pointer implicitly in the same way. [ mingo: Cleaned up the commit a bit. ] Fixes: `7e98b71920` ("x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20250402172458.1378112-1-andrew.cooper3@citrix.com	2025-04-02 22:02:26 +02:00
Linus Torvalds	6cb094583a	* Avoid direct HLT instruction execution in TDX guests -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmfsa8QACgkQaDWVMHDJ krBCAhAAodPYiIEy+qpad1Q8HPhaKYUJ5jzkIdt1GYXCBf2dfY6Zj8w7edSApUhA 7og9gK8ku8hwpf6oCGmp2Lm74FgATIj7q0ac07XBW3OsrfFQc73DfPJn6WMDYjRV ec9baSzX5GcqUyezq7woyJayZT9LRLBexF/vk7dAQ7nuecCOUhqLXWBN5eUT0e+K 58kFjZoZZx/4Y9zh7UIxBQyCbL88IeI6rclW5tZJlRHNuD7B64x606ETwQJKK9GK YHPhqRKtjJRzSOn/xGYT4AQDPbF9u14Q4WGVO+bvgv8Z6BtmiYV2fG0q5GU14h0z +gwjja3Edo+F6zSIIZonQbrSVHspwm1IPJQQZHljhFOEt7Ezu3hLIYouUWVlNRgl mRzubZBmhQUfJOAtfGmHktdg6j+QinYDQr+/CjoXoeh8EknL+KtqamXJnyb8KAMN qH6X+N2coaCcl334zW44m6YTmTipdIhmHFj6edYwqdR3Ux6DDaX9PKopIIpiZEcb GH1o++4JMp9OBIaTu0Yp1WgWJ+EyUSWDJbydqCMOdthuESqKW45IQkLhPxZpIhB4 5Wra4Ot7AdsThyPqNPaEu3ND+BXu4tAAa8r8GK+AP7DqRxXz/bbWTHqNepm9wSvP pnOlLyVTri/difMWWsJJPK6QRYbNnemrny3Do3PbIZVKS08vgLs= =XvoD -----END PGP SIGNATURE----- Merge tag 'x86_tdx_for_6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 TDX updates from Dave Hansen: "Avoid direct HLT instruction execution in TDX guests. TDX guests aren't expected to use the HLT instruction directly. It causes a virtualization exception (#VE). While the #VE _can_ be handled, the current handling is slow and buggy and the easiest thing is just to avoid HLT in the first place. Plus, the kernel already has paravirt infrastructure that makes it relatively painless. Make TDX guests require paravirt and add some TDX-specific paravirt handlers which avoid HLT in the normal halt routines. Also add a warning in case another HLT sneaks in. There was a report that this leads to a "major performance improvement" on specjbb2015, probably because of the extra #VE overhead or missed wakeups from the buggy HLT handling" * tag 'x86_tdx_for_6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tdx: Emit warning if IRQs are enabled during HLT #VE handling x86/tdx: Fix arch_safe_halt() execution for TDX VMs x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT	2025-04-02 11:33:20 -07:00
Sohil Mehta	f2e01dcf6d	x86/nmi: Improve NMI duration console printouts Convert the last remaining printk() in nmi.c to pr_info(). Along with it, use timespec macros to calculate the NMI handler duration. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kai Huang <kai.huang@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/20250327234629.3953536-10-sohil.mehta@intel.com	2025-04-01 22:26:38 +02:00
Sohil Mehta	05279a2863	x86/nmi: Clean up NMI selftest The expected_testcase_failures variable in the NMI selftest has never been set since its introduction. Remove this unused variable along with the related checks to simplify the code. While at it, replace printk() with the corresponding pr_{cont,info}() calls. Also, get rid of the superfluous testname wrapper and the redundant file path comment. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/20250327234629.3953536-9-sohil.mehta@intel.com	2025-04-01 22:26:32 +02:00
Sohil Mehta	59cddd397a	x86/nmi: Improve and relocate NMI handler comments Some of the comments in the default NMI handling code are out of place or inadequate. Move them to the appropriate locations and update them as needed. Move the comment related to CPU-specific NMIs closer to the actual code. Also, add more details about how back-to-back NMIs are detected since that isn't immediately obvious. Opportunistically, replace an #ifdef section in the vicinity with an IS_ENABLED() check to make the code easier to read. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Kai Huang <kai.huang@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/20250327234629.3953536-7-sohil.mehta@intel.com	2025-04-01 22:26:16 +02:00
Sohil Mehta	b4bc3144c1	x86/nmi: Fix comment in unknown_nmi_error() The comment in unknown_nmi_error() is incorrect and misleading. There is no longer a restriction on having a single Unknown NMI handler. Also, nmi_handle() never used the 'b2b' parameter. The commits that made the comment outdated are: `0d443b70cc` ("x86/platform: Remove warning message for duplicate NMI handlers") `bf9f2ee28d` ("x86/nmi: Remove the 'b2b' parameter from nmi_handle()") Remove the old comment and update it to reflect the current logic. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kai Huang <kai.huang@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/20250327234629.3953536-6-sohil.mehta@intel.com	2025-04-01 22:26:11 +02:00
Sohil Mehta	6325f94701	x86/nmi: Remove export of local_touch_nmi() Commit: `feb6cd6a0f` ("thermal/intel_powerclamp: stop sched tick in forced idle") got rid of the last exported user of local_touch_nmi() a while back. Remove the unnecessary export. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kai Huang <kai.huang@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/20250327234629.3953536-5-sohil.mehta@intel.com	2025-04-01 22:26:06 +02:00
Sohil Mehta	4a8fba4be8	x86/nmi: Use a macro to initialize NMI descriptors The NMI descriptors for each NMI type are stored in an array. However, they are currently initialized using raw numbers, which makes it difficult to understand the code. Introduce a macro to initialize the NMI descriptors using the NMI type enum values to make the code more readable. No functional change intended. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kai Huang <kai.huang@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/20250327234629.3953536-4-sohil.mehta@intel.com	2025-04-01 22:26:01 +02:00
Sohil Mehta	78a0323506	x86/nmi: Consolidate NMI panic variables Commit: `c305a4e983` ("x86: Move sysctls into arch/x86") recently moved the sysctl handling of panic_on_unrecovered_nmi and panic_on_io_nmi to x86-specific code. These variables no longer need to be declared in the generic header file. Relocate the variable definitions and declarations closer to where they are used. This makes all the NMI panic options consistent and easier to track. [ mingo: Fixed up the SHA1 of the commit reference. ] Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kai Huang <kai.huang@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Cc: Joel Granados <joel.granados@kernel.org> Link: https://lore.kernel.org/r/20250327234629.3953536-3-sohil.mehta@intel.com	2025-04-01 22:25:56 +02:00
Sohil Mehta	2e016da1cb	x86/nmi: Simplify unknown NMI panic handling The unknown_nmi_panic variable is used to control whether the kernel should panic on unknown NMIs. There is a sysctl entry under: /proc/sys/kernel/unknown_nmi_panic which can be used to change the behavior at runtime. However, it seems that in some places, the option unnecessarily depends on CONFIG_X86_LOCAL_APIC. Other code in nmi.c uses unknown_nmi_panic without such a dependency. This results in a few messy #ifdefs splattered across the code. The dependency was likely introduce due to a potential build bug reported a long time ago: https://lore.kernel.org/lkml/40BC67F9.3000609@myrealbox.com/ This build bug no longer exists. Also, similar NMI panic options, such as panic_on_unrecovered_nmi and panic_on_io_nmi, do not have an explicit dependency on the local APIC either. Though, it's hard to imagine a production system without the local APIC configuration, making a specific NMI sysctl option dependent on it doesn't make sense. Remove the explicit dependency between unknown NMI handling and the local APIC to make the code cleaner and more consistent. While at it, reorder the header includes to maintain alphabetical order. [ mingo: Cleaned up the changelog a bit, truly ordered the headers ... ] Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/20250327234629.3953536-2-sohil.mehta@intel.com	2025-04-01 22:25:41 +02:00
Linus Torvalds	2cd5769fb0	Driver core updates for 6.15-rc1 Here is the big set of driver core updates for 6.15-rc1. Lots of stuff happened this development cycle, including: - kernfs scaling changes to make it even faster thanks to rcu - bin_attribute constify work in many subsystems - faux bus minor tweaks for the rust bindings - rust binding updates for driver core, pci, and platform busses, making more functionaliy available to rust drivers. These are all due to people actually trying to use the bindings that were in 6.14. - make Rafael and Danilo full co-maintainers of the driver core codebase - other minor fixes and updates. This has been in linux-next for a while now, with the only reported issue being some merge conflicts with the rust tree. Depending on which tree you pull first, you will have conflicts in one of them. The merge resolution has been in linux-next as an example of what to do, or can be found here: https://lore.kernel.org/r/CANiq72n3Xe8JcnEjirDhCwQgvWoE65dddWecXnfdnbrmuah-RQ@mail.gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZ+mMrg8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ylRgwCdH58OE3BgL0uoFY5vFImStpmPtqUAoL5HpVWI jtbJ+UuXGsnmO+JVNBEv =gy6W -----END PGP SIGNATURE----- Merge tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updatesk from Greg KH: "Here is the big set of driver core updates for 6.15-rc1. Lots of stuff happened this development cycle, including: - kernfs scaling changes to make it even faster thanks to rcu - bin_attribute constify work in many subsystems - faux bus minor tweaks for the rust bindings - rust binding updates for driver core, pci, and platform busses, making more functionaliy available to rust drivers. These are all due to people actually trying to use the bindings that were in 6.14. - make Rafael and Danilo full co-maintainers of the driver core codebase - other minor fixes and updates" * tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (52 commits) rust: platform: require Send for Driver trait implementers rust: pci: require Send for Driver trait implementers rust: platform: impl Send + Sync for platform::Device rust: pci: impl Send + Sync for pci::Device rust: platform: fix unrestricted &mut platform::Device rust: pci: fix unrestricted &mut pci::Device rust: device: implement device context marker rust: pci: use to_result() in enable_device_mem() MAINTAINERS: driver core: mark Rafael and Danilo as co-maintainers rust/kernel/faux: mark Registration methods inline driver core: faux: only create the device if probe() succeeds rust/faux: Add missing parent argument to Registration::new() rust/faux: Drop #[repr(transparent)] from faux::Registration rust: io: fix devres test with new io accessor functions rust: io: rename `io::Io` accessors kernfs: Move dput() outside of the RCU section. efi: rci2: mark bin_attribute as __ro_after_init rapidio: constify 'struct bin_attribute' firmware: qemu_fw_cfg: constify 'struct bin_attribute' powerpc/perf/hv-24x7: Constify 'struct bin_attribute' ...	2025-04-01 11:02:03 -07:00
Linus Torvalds	d6b02199cd	- The 7 patch series "powerpc/crash: use generic crashkernel reservation" from Sourabh Jain changes powerpc's kexec code to use more of the generic layers. - The 2 patch series "get_maintainer: report subsystem status separately" from Vlastimil Babka makes some long-requested improvements to the get_maintainer output. - The 4 patch series "ucount: Simplify refcounting with rcuref_t" from Sebastian Siewior cleans up and optimizing the refcounting in the ucount code. - The 12 patch series "reboot: support runtime configuration of emergency hw_protection action" from Ahmad Fatoum improves the ability for a driver to perform an emergency system shutdown or reboot. - The 16 patch series "Converge on using secs_to_jiffies() part two" from Easwar Hariharan performs further migrations from msecs_to_jiffies() to secs_to_jiffies(). - The 7 patch series "lib/interval_tree: add some test cases and cleanup" from Wei Yang permits more userspace testing of kernel library code, adds some more tests and performs some cleanups. - The 2 patch series "hung_task: Dump the blocking task stacktrace" from Masami Hiramatsu arranges for the hung_task detector to dump the stack of the blocking task and not just that of the blocked task. - The 4 patch series "resource: Split and use DEFINE_RES() macros" from Andy Shevchenko provides some cleanups to the resource definition macros. - Plus the usual shower of singleton patches - please see the individual changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+nuqwAKCRDdBJ7gKXxA jtNqAQDxqJpjWkzn4yN9CNSs1ivVx3fr6SqazlYCrt3u89WQvwEA1oRrGpETzUGq r6khQUIcQImPPcjFqEFpuiSOU0MBZA0= =Kii8 -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - The series "powerpc/crash: use generic crashkernel reservation" from Sourabh Jain changes powerpc's kexec code to use more of the generic layers. - The series "get_maintainer: report subsystem status separately" from Vlastimil Babka makes some long-requested improvements to the get_maintainer output. - The series "ucount: Simplify refcounting with rcuref_t" from Sebastian Siewior cleans up and optimizing the refcounting in the ucount code. - The series "reboot: support runtime configuration of emergency hw_protection action" from Ahmad Fatoum improves the ability for a driver to perform an emergency system shutdown or reboot. - The series "Converge on using secs_to_jiffies() part two" from Easwar Hariharan performs further migrations from msecs_to_jiffies() to secs_to_jiffies(). - The series "lib/interval_tree: add some test cases and cleanup" from Wei Yang permits more userspace testing of kernel library code, adds some more tests and performs some cleanups. - The series "hung_task: Dump the blocking task stacktrace" from Masami Hiramatsu arranges for the hung_task detector to dump the stack of the blocking task and not just that of the blocked task. - The series "resource: Split and use DEFINE_RES() macros" from Andy Shevchenko provides some cleanups to the resource definition macros. - Plus the usual shower of singleton patches - please see the individual changelogs for details. * tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits) mailmap: consolidate email addresses of Alexander Sverdlin fs/procfs: fix the comment above proc_pid_wchan() relay: use kasprintf() instead of fixed buffer formatting resource: replace open coded variant of DEFINE_RES() resource: replace open coded variants of DEFINE_RES_*_NAMED() resource: replace open coded variant of DEFINE_RES_NAMED_DESC() resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED() samples: add hung_task detector mutex blocking sample hung_task: show the blocker task if the task is hung on mutex kexec_core: accept unaccepted kexec segments' destination addresses watchdog/perf: optimize bytes copied and remove manual NUL-termination lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap() lib/interval_tree: skip the check before go to the right subtree lib/interval_tree: add test case for span iteration lib/interval_tree: add test case for interval_tree_iter_xxx() helpers lib/rbtree: add random seed lib/rbtree: split tests lib/rbtree: enable userland test suite for rbtree related data structure checkpatch: describe --min-conf-desc-length scripts/gdb/symbols: determine KASLR offset on s390 ...	2025-04-01 10:06:52 -07:00
Linus Torvalds	eb0ece1602	- The 6 patch series "Enable strict percpu address space checks" from Uros Bizjak uses x86 named address space qualifiers to provide compile-time checking of percpu area accesses. This has caused a small amount of fallout - two or three issues were reported. In all cases the calling code was founf to be incorrect. - The 4 patch series "Some cleanup for memcg" from Chen Ridong implements some relatively monir cleanups for the memcontrol code. - The 17 patch series "mm: fixes for device-exclusive entries (hmm)" from David Hildenbrand fixes a boatload of issues which David found then using device-exclusive PTE entries when THP is enabled. More work is needed, but this makes thins better - our own HMM selftests now succeed. - The 2 patch series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed remove the z3fold and zbud implementations. They have been deprecated for half a year and nobody has complained. - The 5 patch series "mm: further simplify VMA merge operation" from Lorenzo Stoakes implements numerous simplifications in this area. No runtime effects are anticipated. - The 4 patch series "mm/madvise: remove redundant mmap_lock operations from process_madvise()" from SeongJae Park rationalizes the locking in the madvise() implementation. Performance gains of 20-25% were observed in one MADV_DONTNEED microbenchmark. - The 12 patch series "Tiny cleanup and improvements about SWAP code" from Baoquan He contains a number of touchups to issues which Baoquan noticed when working on the swap code. - The 2 patch series "mm: kmemleak: Usability improvements" from Catalin Marinas implements a couple of improvements to the kmemleak user-visible output. - The 2 patch series "mm/damon/paddr: fix large folios access and schemes handling" from Usama Arif provides a couple of fixes for DAMON's handling of large folios. - The 3 patch series "mm/damon/core: fix wrong and/or useless damos_walk() behaviors" from SeongJae Park fixes a few issues with the accuracy of kdamond's walking of DAMON regions. - The 3 patch series "expose mapping wrprotect, fix fb_defio use" from Lorenzo Stoakes changes the interaction between framebuffer deferred-io and core MM. No functional changes are anticipated - this is preparatory work for the future removal of page structure fields. - The 4 patch series "mm/damon: add support for hugepage_size DAMOS filter" from Usama Arif adds a DAMOS filter which permits the filtering by huge page sizes. - The 4 patch series "mm: permit guard regions for file-backed/shmem mappings" from Lorenzo Stoakes extends the guard region feature from its present "anon mappings only" state. The feature now covers shmem and file-backed mappings. - The 4 patch series "mm: batched unmap lazyfree large folios during reclamation" from Barry Song cleans up and speeds up the unmapping for pte-mapped large folios. - The 18 patch series "reimplement per-vma lock as a refcount" from Suren Baghdasaryan puts the vm_lock back into the vma. Our reasons for pulling it out were largely bogus and that change made the code more messy. This patchset provides small (0-10%) improvements on one microbenchmark. - The 5 patch series "Docs/mm/damon: misc DAMOS filters documentation fixes and improves" from SeongJae Park does some maintenance work on the DAMON docs. - The 27 patch series "hugetlb/CMA improvements for large systems" from Frank van der Linden addresses a pile of issues which have been observed when using CMA on large machines. - The 2 patch series "mm/damon: introduce DAMOS filter type for unmapped pages" from SeongJae Park enables users of DMAON/DAMOS to filter my the page's mapped/unmapped status. - The 19 patch series "zsmalloc/zram: there be preemption" from Sergey Senozhatsky teaches zram to run its compression and decompression operations preemptibly. - The 12 patch series "selftests/mm: Some cleanups from trying to run them" from Brendan Jackman fixes a pile of unrelated issues which Brendan encountered while runnimg our selftests. - The 2 patch series "fs/proc/task_mmu: add guard region bit to pagemap" from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to determine whether a particular page is a guard page. - The 7 patch series "mm, swap: remove swap slot cache" from Kairui Song removes the swap slot cache from the allocation path - it simply wasn't being effective. - The 5 patch series "mm: cleanups for device-exclusive entries (hmm)" from David Hildenbrand implements a number of unrelated cleanups in this code. - The 5 patch series "mm: Rework generic PTDUMP configs" from Anshuman Khandual implements a number of preparatoty cleanups to the GENERIC_PTDUMP Kconfig logic. - The 8 patch series "mm/damon: auto-tune aggregation interval" from SeongJae Park implements a feedback-driven automatic tuning feature for DAMON's aggregation interval tuning. - The 5 patch series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in powerpc, sparc and x86 lazy MMU implementations. Ryan did this in preparation for implementing lazy mmu mode for arm64 to optimize vmalloc. - The 2 patch series "mm/page_alloc: Some clarifications for migratetype fallback" from Brendan Jackman reworks some commentary to make the code easier to follow. - The 3 patch series "page_counter cleanup and size reduction" from Shakeel Butt cleans up the page_counter code and fixes a size increase which we accidentally added late last year. - The 3 patch series "Add a command line option that enables control of how many threads should be used to allocate huge pages" from Thomas Prescher does that. It allows the careful operator to significantly reduce boot time by tuning the parallalization of huge page initialization. - The 3 patch series "Fix calculations in trace_balance_dirty_pages() for cgwb" from Tang Yizhou fixes the tracing output from the dirty page balancing code. - The 9 patch series "mm/damon: make allow filters after reject filters useful and intuitive" from SeongJae Park improves the handling of allow and reject filters. Behaviour is made more consistent and the documention is updated accordingly. - The 5 patch series "Switch zswap to object read/write APIs" from Yosry Ahmed updates zswap to the new object read/write APIs and thus permits the removal of some legacy code from zpool and zsmalloc. - The 6 patch series "Some trivial cleanups for shmem" from Baolin Wang does as it claims. - The 20 patch series "fs/dax: Fix ZONE_DEVICE page reference counts" from Alistair Popple regularizes the weird ZONE_DEVICE page refcount handling in DAX, permittig the removal of a number of special-case checks. - The 4 patch series "refactor mremap and fix bug" from Lorenzo Stoakes is a preparatoty refactoring and cleanup of the mremap() code. - The 20 patch series "mm: MM owner tracking for large folios (!hugetlb) + CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in which we determine whether a large folio is known to be mapped exclusively into a single MM. - The 8 patch series "mm/damon: add sysfs dirs for managing DAMOS filters based on handling layers" from SeongJae Park adds a couple of new sysfs directories to ease the management of DAMON/DAMOS filters. - The 13 patch series "arch, mm: reduce code duplication in mem_init()" from Mike Rapoport consolidates many per-arch implementations of mem_init() into code generic code, where that is practical. - The 13 patch series "mm/damon/sysfs: commit parameters online via damon_call()" from SeongJae Park continues the cleaning up of sysfs access to DAMON internal data. - The 3 patch series "mm: page_ext: Introduce new iteration API" from Luiz Capitulino reworks the page_ext initialization to fix a boot-time crash which was observed with an unusual combination of compile and cmdline options. - The 8 patch series "Buddy allocator like (or non-uniform) folio split" from Zi Yan reworks the code to split a folio into smaller folios. The main benefit is lessened memory consumption: fewer post-split folios are generated. - The 2 patch series "Minimize xa_node allocation during xarry split" from Zi Yan reduces the number of xarray xa_nodes which are generated during an xarray split. - The 2 patch series "drivers/base/memory: Two cleanups" from Gavin Shan performs some maintenance work on the drivers/base/memory code. - The 3 patch series "Add tracepoints for lowmem reserves, watermarks and totalreserve_pages" from Martin Liu adds some more tracepoints to the page allocator code. - The 4 patch series "mm/madvise: cleanup requests validations and classifications" from SeongJae Park cleans up some warts which SeongJae observed during his earlier madvise work. - The 3 patch series "mm/hwpoison: Fix regressions in memory failure handling" from Shuai Xue addresses two quite serious regressions which Shuai has observed in the memory-failure implementation. - The 5 patch series "mm: reliable huge page allocator" from Johannes Weiner makes huge page allocations cheaper and more reliable by reducing fragmentation. - The 5 patch series "Minor memcg cleanups & prep for memdescs" from Matthew Wilcox is preparatory work for the future implementation of memdescs. - The 4 patch series "track memory used by balloon drivers" from Nico Pache introduces a way to track memory used by our various balloon drivers. - The 2 patch series "mm/damon: introduce DAMOS filter type for active pages" from Nhat Pham permits users to filter for active/inactive pages, separately for file and anon pages. - The 2 patch series "Adding Proactive Memory Reclaim Statistics" from Hao Jia separates the proactive reclaim statistics from the direct reclaim statistics. - The 2 patch series "mm/vmscan: don't try to reclaim hwpoison folio" from Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim code. -----BEGIN PGP SIGNATURE----- iHQEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+nZaAAKCRDdBJ7gKXxA jsOWAPiP4r7CJHMZRK4eyJOkvS1a1r+TsIarrFZtjwvf/GIfAQCEG+JDxVfUaUSF Ee93qSSLR1BkNdDw+931Pu0mXfbnBw== =Pn2K -----END PGP SIGNATURE----- Merge tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - The series "Enable strict percpu address space checks" from Uros Bizjak uses x86 named address space qualifiers to provide compile-time checking of percpu area accesses. This has caused a small amount of fallout - two or three issues were reported. In all cases the calling code was found to be incorrect. - The series "Some cleanup for memcg" from Chen Ridong implements some relatively monir cleanups for the memcontrol code. - The series "mm: fixes for device-exclusive entries (hmm)" from David Hildenbrand fixes a boatload of issues which David found then using device-exclusive PTE entries when THP is enabled. More work is needed, but this makes thins better - our own HMM selftests now succeed. - The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed remove the z3fold and zbud implementations. They have been deprecated for half a year and nobody has complained. - The series "mm: further simplify VMA merge operation" from Lorenzo Stoakes implements numerous simplifications in this area. No runtime effects are anticipated. - The series "mm/madvise: remove redundant mmap_lock operations from process_madvise()" from SeongJae Park rationalizes the locking in the madvise() implementation. Performance gains of 20-25% were observed in one MADV_DONTNEED microbenchmark. - The series "Tiny cleanup and improvements about SWAP code" from Baoquan He contains a number of touchups to issues which Baoquan noticed when working on the swap code. - The series "mm: kmemleak: Usability improvements" from Catalin Marinas implements a couple of improvements to the kmemleak user-visible output. - The series "mm/damon/paddr: fix large folios access and schemes handling" from Usama Arif provides a couple of fixes for DAMON's handling of large folios. - The series "mm/damon/core: fix wrong and/or useless damos_walk() behaviors" from SeongJae Park fixes a few issues with the accuracy of kdamond's walking of DAMON regions. - The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo Stoakes changes the interaction between framebuffer deferred-io and core MM. No functional changes are anticipated - this is preparatory work for the future removal of page structure fields. - The series "mm/damon: add support for hugepage_size DAMOS filter" from Usama Arif adds a DAMOS filter which permits the filtering by huge page sizes. - The series "mm: permit guard regions for file-backed/shmem mappings" from Lorenzo Stoakes extends the guard region feature from its present "anon mappings only" state. The feature now covers shmem and file-backed mappings. - The series "mm: batched unmap lazyfree large folios during reclamation" from Barry Song cleans up and speeds up the unmapping for pte-mapped large folios. - The series "reimplement per-vma lock as a refcount" from Suren Baghdasaryan puts the vm_lock back into the vma. Our reasons for pulling it out were largely bogus and that change made the code more messy. This patchset provides small (0-10%) improvements on one microbenchmark. - The series "Docs/mm/damon: misc DAMOS filters documentation fixes and improves" from SeongJae Park does some maintenance work on the DAMON docs. - The series "hugetlb/CMA improvements for large systems" from Frank van der Linden addresses a pile of issues which have been observed when using CMA on large machines. - The series "mm/damon: introduce DAMOS filter type for unmapped pages" from SeongJae Park enables users of DMAON/DAMOS to filter my the page's mapped/unmapped status. - The series "zsmalloc/zram: there be preemption" from Sergey Senozhatsky teaches zram to run its compression and decompression operations preemptibly. - The series "selftests/mm: Some cleanups from trying to run them" from Brendan Jackman fixes a pile of unrelated issues which Brendan encountered while runnimg our selftests. - The series "fs/proc/task_mmu: add guard region bit to pagemap" from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to determine whether a particular page is a guard page. - The series "mm, swap: remove swap slot cache" from Kairui Song removes the swap slot cache from the allocation path - it simply wasn't being effective. - The series "mm: cleanups for device-exclusive entries (hmm)" from David Hildenbrand implements a number of unrelated cleanups in this code. - The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual implements a number of preparatoty cleanups to the GENERIC_PTDUMP Kconfig logic. - The series "mm/damon: auto-tune aggregation interval" from SeongJae Park implements a feedback-driven automatic tuning feature for DAMON's aggregation interval tuning. - The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in powerpc, sparc and x86 lazy MMU implementations. Ryan did this in preparation for implementing lazy mmu mode for arm64 to optimize vmalloc. - The series "mm/page_alloc: Some clarifications for migratetype fallback" from Brendan Jackman reworks some commentary to make the code easier to follow. - The series "page_counter cleanup and size reduction" from Shakeel Butt cleans up the page_counter code and fixes a size increase which we accidentally added late last year. - The series "Add a command line option that enables control of how many threads should be used to allocate huge pages" from Thomas Prescher does that. It allows the careful operator to significantly reduce boot time by tuning the parallalization of huge page initialization. - The series "Fix calculations in trace_balance_dirty_pages() for cgwb" from Tang Yizhou fixes the tracing output from the dirty page balancing code. - The series "mm/damon: make allow filters after reject filters useful and intuitive" from SeongJae Park improves the handling of allow and reject filters. Behaviour is made more consistent and the documention is updated accordingly. - The series "Switch zswap to object read/write APIs" from Yosry Ahmed updates zswap to the new object read/write APIs and thus permits the removal of some legacy code from zpool and zsmalloc. - The series "Some trivial cleanups for shmem" from Baolin Wang does as it claims. - The series "fs/dax: Fix ZONE_DEVICE page reference counts" from Alistair Popple regularizes the weird ZONE_DEVICE page refcount handling in DAX, permittig the removal of a number of special-case checks. - The series "refactor mremap and fix bug" from Lorenzo Stoakes is a preparatoty refactoring and cleanup of the mremap() code. - The series "mm: MM owner tracking for large folios (!hugetlb) + CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in which we determine whether a large folio is known to be mapped exclusively into a single MM. - The series "mm/damon: add sysfs dirs for managing DAMOS filters based on handling layers" from SeongJae Park adds a couple of new sysfs directories to ease the management of DAMON/DAMOS filters. - The series "arch, mm: reduce code duplication in mem_init()" from Mike Rapoport consolidates many per-arch implementations of mem_init() into code generic code, where that is practical. - The series "mm/damon/sysfs: commit parameters online via damon_call()" from SeongJae Park continues the cleaning up of sysfs access to DAMON internal data. - The series "mm: page_ext: Introduce new iteration API" from Luiz Capitulino reworks the page_ext initialization to fix a boot-time crash which was observed with an unusual combination of compile and cmdline options. - The series "Buddy allocator like (or non-uniform) folio split" from Zi Yan reworks the code to split a folio into smaller folios. The main benefit is lessened memory consumption: fewer post-split folios are generated. - The series "Minimize xa_node allocation during xarry split" from Zi Yan reduces the number of xarray xa_nodes which are generated during an xarray split. - The series "drivers/base/memory: Two cleanups" from Gavin Shan performs some maintenance work on the drivers/base/memory code. - The series "Add tracepoints for lowmem reserves, watermarks and totalreserve_pages" from Martin Liu adds some more tracepoints to the page allocator code. - The series "mm/madvise: cleanup requests validations and classifications" from SeongJae Park cleans up some warts which SeongJae observed during his earlier madvise work. - The series "mm/hwpoison: Fix regressions in memory failure handling" from Shuai Xue addresses two quite serious regressions which Shuai has observed in the memory-failure implementation. - The series "mm: reliable huge page allocator" from Johannes Weiner makes huge page allocations cheaper and more reliable by reducing fragmentation. - The series "Minor memcg cleanups & prep for memdescs" from Matthew Wilcox is preparatory work for the future implementation of memdescs. - The series "track memory used by balloon drivers" from Nico Pache introduces a way to track memory used by our various balloon drivers. - The series "mm/damon: introduce DAMOS filter type for active pages" from Nhat Pham permits users to filter for active/inactive pages, separately for file and anon pages. - The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia separates the proactive reclaim statistics from the direct reclaim statistics. - The series "mm/vmscan: don't try to reclaim hwpoison folio" from Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim code. * tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits) mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex() x86/mm: restore early initialization of high_memory for 32-bits mm/vmscan: don't try to reclaim hwpoison folio mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper cgroup: docs: add pswpin and pswpout items in cgroup v2 doc mm: vmscan: split proactive reclaim statistics from direct reclaim statistics selftests/mm: speed up split_huge_page_test selftests/mm: uffd-unit-tests support for hugepages > 2M docs/mm/damon/design: document active DAMOS filter type mm/damon: implement a new DAMOS filter type for active pages fs/dax: don't disassociate zero page entries MM documentation: add "Unaccepted" meminfo entry selftests/mm: add commentary about 9pfs bugs fork: use __vmalloc_node() for stack allocation docs/mm: Physical Memory: Populate the "Zones" section xen: balloon: update the NR_BALLOON_PAGES state hv_balloon: update the NR_BALLOON_PAGES state balloon_compaction: update the NR_BALLOON_PAGES state meminfo: add a per node counter for balloon drivers mm: remove references to folio in __memcg_kmem_uncharge_page() ...	2025-04-01 09:29:18 -07:00
Linus Torvalds	01d5b167dc	Modules changes for 6.15-rc1 - Use RCU instead of RCU-sched The mix of rcu_read_lock(), rcu_read_lock_sched() and preempt_disable() in the module code and its users has been replaced with just rcu_read_lock(). - The rest of changes are smaller fixes and updates. The changes have been on linux-next for at least 2 weeks, with the RCU cleanup present for 2 months. One performance problem was reported with the RCU change when KASAN + lockdep were enabled, but it was effectively addressed by the already merged `ee57ab5a32` ("locking/lockdep: Disable KASAN instrumentation of lockdep.c"). -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEEIduBR9MnFA82q/jtumpXJwqY6poFAmfmwrsUHHBldHIucGF2 bHVAc3VzZS5jb20ACgkQumpXJwqY6prWxgf/S7Pvdywm10vJ6fooYa+GxXNMwhyh XRjZ4m9gjeTNf2KLwX0XHv0XZeFHOmHfjd3iI+pS6CXZnCFTN9J3XPLYsrTxXUb6 U6zzLf8Zsz8TzeI4dgvSBsZln7oICSACkAgdJCq23hpNKeaeRo91dgiZaIwyZJG3 FekqSFtP7pYhfFoNkrFKysqbgl1+RWWZ79L2qRJA0bPzVFlvRUuh6cOHQw+8RMqf BYLwnArjTkW8AcXpxIXSiwphDHVZ81B96xoplavyoprA5FDpv1W+8y4DtxdWFn+1 QVWCs/ZV3KrwXWpZev625w3fIOOIXILqRINOzLfvXTw+1xFS3TzSQEpVeg== =4OKc -----END PGP SIGNATURE----- Merge tag 'modules-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux Pull modules updates from Petr Pavlu: - Use RCU instead of RCU-sched The mix of rcu_read_lock(), rcu_read_lock_sched() and preempt_disable() in the module code and its users has been replaced with just rcu_read_lock() - The rest of changes are smaller fixes and updates * tag 'modules-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux: (32 commits) MAINTAINERS: Update the MODULE SUPPORT section module: Remove unnecessary size argument when calling strscpy() module: Replace deprecated strncpy() with strscpy() params: Annotate struct module_param_attrs with __counted_by() bug: Use RCU instead RCU-sched to protect module_bug_list. static_call: Use RCU in all users of __module_text_address(). kprobes: Use RCU in all users of __module_text_address(). bpf: Use RCU in all users of __module_text_address(). jump_label: Use RCU in all users of __module_text_address(). jump_label: Use RCU in all users of __module_address(). x86: Use RCU in all users of __module_address(). cfi: Use RCU while invoking __module_address(). powerpc/ftrace: Use RCU in all users of __module_text_address(). LoongArch: ftrace: Use RCU in all users of __module_text_address(). LoongArch/orc: Use RCU in all users of __module_address(). arm64: module: Use RCU in all users of __module_text_address(). ARM: module: Use RCU in all users of __module_text_address(). module: Use RCU in all users of __module_text_address(). module: Use RCU in all users of __module_address(). module: Use RCU in search_module_extables(). ...	2025-03-30 15:44:36 -07:00
Linus Torvalds	7405c0f01a	Miscellaneous x86 fixes and updates: - Fix a large number of x86 Kconfig dependency and help text accuracy bugs/problems, by Mateusz Jończyk and David Heideberg. - Fix a VM_PAT interaction with fork() crash. This also touches core kernel code. - Fix an ORC unwinder bug for interrupt entries - Fixes and cleanups. - Fix an AMD microcode loader bug that can promote verification failures into success. - Add early-printk support for MMIO based UARTs on an x86 board that had no other serial debugging facility and also experienced early boot crashes. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfnFBERHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1iVDxAAmiB4soT3/WbaWJJdeVyxEL7sOmUNOm04 5kAVHJVK8QGdje0eWa6h7xmuQD3UOxafE2coCrOxHhZi2qpAAY6CPIIy6oIBRwZK gLgT5xn1CHojfm4UFC3YUOyecBRPUF2C5jfkajWdZHumyPP/sOObqvGanpQRAYd5 bfPHEvrBpeEeS7WkATCdyF2j+I5xYflD4g/MDAsMmqasQHOnjBuFX5VBeVxxkysC dMsFkFpxqcA95MnnyOnxXzgOtRTY0UystX07D3Bk1pqhG9zor+mp8OynsTRCU87T ZPPbUr2qACNmCqEEXl+F1mAkgj5H66xE2gaJdYx0/jBAIbX8Nwih7mMxhJShVU07 Lhc0tukmVrDoDaVIr2HsxqI8iokuYLszUjDAqEQmQDrgelL6usPYghN1b2bDSJ9r 0hCO/s79024H/U9oMrC+CF52D5UH/fE98ipigrbKRIO/hOsoxiiniF3DG2NVWZM2 n5nPnOdbperqjCEteN1nxQfr7XZkvP95Bwmuqqc90XH+tzKJdHruUkbm4ua7NEEz WKgsUIYFjeN5ZrHbJaNtHlQueTyvsyGmL1nlaLi/MaJbSXPsM/WfwvHsaKTh3NrE BFwEAhMZVLDHEfnFT0Ev7Mm1MGpW8MbHoRBR1+E5FWWNS4X0yGLKXWRp8diw25Tm W3ZVsn65E6U= =/qKX -----END PGP SIGNATURE----- Merge tag 'x86-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes and updates from Ingo Molnar: - Fix a large number of x86 Kconfig dependency and help text accuracy bugs/problems, by Mateusz Jończyk and David Heideberg - Fix a VM_PAT interaction with fork() crash. This also touches core kernel code - Fix an ORC unwinder bug for interrupt entries - Fixes and cleanups - Fix an AMD microcode loader bug that can promote verification failures into success - Add early-printk support for MMIO based UARTs on an x86 board that had no other serial debugging facility and also experienced early boot crashes * tag 'x86-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/AMD: Fix __apply_microcode_amd()'s return value x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range() x86/fpu: Update the outdated comment above fpstate_init_user() x86/early_printk: Add support for MMIO-based UARTs x86/dumpstack: Fix inaccurate unwinding from exception stacks due to misplaced assignment x86/entry: Fix ORC unwinder for PUSH_REGS with save_ret=1 x86/Kconfig: Fix lists in X86_EXTENDED_PLATFORM help text x86/Kconfig: Correct X86_X2APIC help text x86/speculation: Remove the extra #ifdef around CALL_NOSPEC x86/Kconfig: Document release year of glibc 2.3.3 x86/Kconfig: Make CONFIG_PCI_CNB20LE_QUIRK depend on X86_32 x86/Kconfig: Document CONFIG_PCI_MMCONFIG x86/Kconfig: Update lists in X86_EXTENDED_PLATFORM x86/Kconfig: Move all X86_EXTENDED_PLATFORM options together x86/Kconfig: Always enable ARCH_SPARSEMEM_ENABLE x86/Kconfig: Enable X86_X2APIC by default and improve help text	2025-03-30 15:25:15 -07:00
Linus Torvalds	b4c5c57c2d	Miscellaneous locking fixes and updates: - Fix a locking self-test FAIL on PREEMPT_RT kernels - Fix nr_unused_locks accounting bug - Simplify the split-lock debugging feature's fast-path Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfnDqERHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1jXVw/+IGVWjYnw5Ac+HyccKwY7+OEmKiGx+9C4 yS3pc1wrofkpIVRsr0GuOv2h5V+YZJvuRrZPbatVidUkEJAQ++Iaeef+gnunfMX2 0mRk5B0CG1M+y3lNFOhIEafkGDQOtdFTOmoRv508Xj3mb8cRMRCgkf8S3600l2IB ZEw67f8waHy1PwKC+dFlr0QjGL0+qbkrDqB1m4VSy0/egaSNMFjYVLLBAe6gn+/X zU/SWBqyV9jSfmwABSiKCpjt/GziuwGoADJpxAbSR/1NaCy866VSBLRjiiLI/c5q xdKd3GyD3IhLDCbzKrwVugeioxoPzXLmZKajVhZARv2kPW8DbzUMymP/eVkOf7VJ 6dDNV3Yyq568YQBi3PQu2vESumHRctLOsRSAr2TnXlbFzBvjBM+Ia11e+p4mg9FU Hcn98Rt3FfgigrzH4IeLaYTbm6A5amj2ymoMwjR/vchnB8tlXddc3/KumIdak04Q hHRHA0cyNg1YvB/8yB0NKf5jETpbYaaKhCO9KlsceLDjuhrwlmi/XFz+bwBJIBgD zug2hevSW2RVRGidJ5Qz91qG76xBkhH038qorcegzGybMQ0p+Hw+/BN7OzPSueaq ZMRFde3CtrB332SS73KToG5XvyuffhW8XyCvkOt5W6z9P7n3UOFzDcKr/wrePV9d 9kEJPdOCOwM= =jcdr -----END PGP SIGNATURE----- Merge tag 'locking-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc locking fixes and updates from Ingo Molnar: - Fix a locking self-test FAIL on PREEMPT_RT kernels - Fix nr_unused_locks accounting bug - Simplify the split-lock debugging feature's fast-path * tag 'locking-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/lockdep: Decrease nr_unused_locks if lock unused in zap_class() lockdep: Fix wait context check on softirq for PREEMPT_RT x86/split_lock: Simplify reenabling	2025-03-30 15:18:36 -07:00
Boris Ostrovsky	31ab12df72	x86/microcode/AMD: Fix __apply_microcode_amd()'s return value When verify_sha256_digest() fails, __apply_microcode_amd() should propagate the failure by returning false (and not -1 which is promoted to true). Fixes: `50cef76d5c` ("x86/microcode/AMD: Load only SHA256-checksummed patches") Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250327230503.1850368-2-boris.ostrovsky@oracle.com	2025-03-28 12:49:02 +01:00
Linus Torvalds	7b667acd69	powerpc updates for 6.15 - Removal of support for IBM Cell Blades - SMP support for microwatt platform - Support for inline static calls on PPC32 - Enable pmu selftests for power11 platform - Enable hardware trace macro (HTM) hcall support - Support for limited address mode capability - Changes to RMA size from 512 MB to 768 MB to handle fadump - Misc fixes and cleanups Thanks to: Abhishek Dubey, Amit Machhiwal, Andreas Schwab, Arnd Bergmann, Athira Rajeev, Avnish Chouhan, Christophe Leroy, Disha Goel, Donet Tom, Gaurav Batra, Gautam Menghani, Hari Bathini, Kajol Jain, Kees Cook, Mahesh Salgaonkar, Michael Ellerman, Paul Mackerras, Ritesh Harjani (IBM), Sathvika Vasireddy, Segher Boessenkool, Sourabh Jain, Vaibhav Jain, Venkat Rao Bagalkote. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqX2DNAOgU8sBX3pRpnEsdPSHZJQFAmfjZnYACgkQpnEsdPSH ZJTmKg/+NGW7wyFY9d8Iai9ncYY7GSzsMSDTaan7qg0QWOd5gjHbsdbava7TM/DW 8p9XsC+17kSeftRNUjtc52bSN8Ei2gBdsXIagQG1alfB2X2e6wkNauifK+dz3Su6 usMEZZTO5R/jFDotFXNM1nsUj+8dvjnPgOUrji/P8k7PT5295wpza0hz1fy5SrOA hM5cliBP36UgFe5Efvgm4OUX2gQIhbc3stt9MVfymW/k0Mit5f41UIPuVGiTWowY s0cUJGkhxUlGXT3VfOVKuZfn4u9KMha7UCl9afSceJzXOdnUIKIbskui1VEv6cD/ iSIxi839uErAobFHlsLYprgYFciYLII3xe2qNZCA/ZxeIMS/Mm6xokESeWLhBnfa P7ke6l0z3GDtTvgI2eSeU9BdrVveF1NgbP9GYSKgT6gtw/kRRnxgHF8tzmLON5PT KXpQlzz8VuSBRtF2jnLFU89+FFwSA1bRUhDrp89HyYFqw1B5g4N7kFFTUJWHOuKS fwPGy+cveKehmCUBedeTRFqHvvqdwpD/WnPlQzCly3WxqdL8U/eTXYftMiAwuK28 ovLuSs3vRThKRQ8DnUa5oB0UGsjMpRV5LdvYkhw+x8mZKUR59oj4fx2ae4TtPakg dbAYuPPkCORdaSga/nV6vQgsLprFpcGX3dq6E19+BVBAY5D+1PE= =GFcj -----END PGP SIGNATURE----- Merge tag 'powerpc-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Madhavan Srinivasan: - Remove support for IBM Cell Blades - SMP support for microwatt platform - Support for inline static calls on PPC32 - Enable pmu selftests for power11 platform - Enable hardware trace macro (HTM) hcall support - Support for limited address mode capability - Changes to RMA size from 512 MB to 768 MB to handle fadump - Misc fixes and cleanups Thanks to Abhishek Dubey, Amit Machhiwal, Andreas Schwab, Arnd Bergmann, Athira Rajeev, Avnish Chouhan, Christophe Leroy, Disha Goel, Donet Tom, Gaurav Batra, Gautam Menghani, Hari Bathini, Kajol Jain, Kees Cook, Mahesh Salgaonkar, Michael Ellerman, Paul Mackerras, Ritesh Harjani (IBM), Sathvika Vasireddy, Segher Boessenkool, Sourabh Jain, Vaibhav Jain, and Venkat Rao Bagalkote. * tag 'powerpc-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (61 commits) powerpc/kexec: fix physical address calculation in clear_utlb_entry() crypto: powerpc: Mark ghashp8-ppc.o as an OBJECT_FILES_NON_STANDARD powerpc: Fix 'intra_function_call not a direct call' warning powerpc/perf: Fix ref-counting on the PMU 'vpa_pmu' KVM: PPC: Enable CAP_SPAPR_TCE_VFIO on pSeries KVM guests powerpc/prom_init: Fixup missing #size-cells on PowerBook6,7 powerpc/microwatt: Add SMP support powerpc: Define config option for processors with broadcast TLBIE powerpc/microwatt: Define an idle power-save function powerpc/microwatt: Device-tree updates powerpc/microwatt: Select COMMON_CLK in order to get the clock framework net: toshiba: Remove reference to PPC_IBM_CELL_BLADE net: spider_net: Remove powerpc Cell driver cpufreq: ppc_cbe: Remove powerpc Cell driver genirq: Remove IRQ_EDGE_EOI_HANDLER docs: Remove reference to removed CBE_CPUFREQ_SPU_GOVERNOR powerpc: Remove UDBG_RTAS_CONSOLE powerpc/io: Use standard barrier macros in io.c powerpc/io: Rename _insw_ns() etc. powerpc/io: Use generic raw accessors ...	2025-03-27 19:39:08 -07:00
Vishal Annapurve	9f98a4f4e7	x86/tdx: Fix arch_safe_halt() execution for TDX VMs Direct HLT instruction execution causes #VEs for TDX VMs which is routed to hypervisor via TDCALL. If HLT is executed in STI-shadow, resulting #VE handler will enable interrupts before TDCALL is routed to hypervisor leading to missed wakeup events, as current TDX spec doesn't expose interruptibility state information to allow #VE handler to selectively enable interrupts. Commit `bfe6ed0c67` ("x86/tdx: Add HLT support for TDX guests") prevented the idle routines from executing HLT instruction in STI-shadow. But it missed the paravirt routine which can be reached via this path as an example: kvm_wait() => safe_halt() => raw_safe_halt() => arch_safe_halt() => irq.safe_halt() => pv_native_safe_halt() To reliably handle arch_safe_halt() for TDX VMs, introduce explicit dependency on CONFIG_PARAVIRT and override paravirt halt()/safe_halt() routines with TDX-safe versions that execute direct TDCALL and needed interrupt flag updates. Executing direct TDCALL brings in additional benefit of avoiding HLT related #VEs altogether. As tested by Ryan Afranji: "Tested with the specjbb2015 benchmark. It has heavy lock contention which leads to many halt calls. TDX VMs suffered a poor score before this patchset. Verified the major performance improvement with this patchset applied." Fixes: `bfe6ed0c67` ("x86/tdx: Add HLT support for TDX guests") Signed-off-by: Vishal Annapurve <vannapurve@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Ryan Afranji <afranji@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250228014416.3925664-3-vannapurve@google.com	2025-03-26 08:51:20 +01:00
Kirill A. Shutemov	22cc5ca5de	x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT CONFIG_PARAVIRT_XXL is mainly defined/used by XEN PV guests. For other VM guest types, features supported under CONFIG_PARAVIRT are self sufficient. CONFIG_PARAVIRT mainly provides support for TLB flush operations and time related operations. For TDX guest as well, paravirt calls under CONFIG_PARVIRT meets most of its requirement except the need of HLT and SAFE_HLT paravirt calls, which is currently defined under CONFIG_PARAVIRT_XXL. Since enabling CONFIG_PARAVIRT_XXL is too bloated for TDX guest like platforms, move HLT and SAFE_HLT paravirt calls under CONFIG_PARAVIRT. Moving HLT and SAFE_HLT paravirt calls are not fatal and should not break any functionality for current users of CONFIG_PARAVIRT. Fixes: `bfe6ed0c67` ("x86/tdx: Add HLT support for TDX guests") Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Vishal Annapurve <vannapurve@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Juergen Gross <jgross@suse.com> Tested-by: Ryan Afranji <afranji@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20250228014416.3925664-2-vannapurve@google.com	2025-03-26 08:48:18 +01:00
Linus Torvalds	ee6740fd34	CRC updates for 6.15 Another set of improvements to the kernel's CRC (cyclic redundancy check) code: - Rework the CRC64 library functions to be directly optimized, like what I did last cycle for the CRC32 and CRC-T10DIF library functions. - Rewrite the x86 PCLMULQDQ-optimized CRC code, and add VPCLMULQDQ support and acceleration for crc64_be and crc64_nvme. - Rewrite the riscv Zbc-optimized CRC code, and add acceleration for crc_t10dif, crc64_be, and crc64_nvme. - Remove crc_t10dif and crc64_rocksoft from the crypto API, since they are no longer needed there. - Rename crc64_rocksoft to crc64_nvme, as the old name was incorrect. - Add kunit test cases for crc64_nvme and crc7. - Eliminate redundant functions for calculating the Castagnoli CRC32, settling on just crc32c(). - Remove unnecessary prompts from some of the CRC kconfig options. - Further optimize the x86 crc32c code. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZ+CGGhQcZWJpZ2dlcnNA Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK3wRAP4tbnzawUmlIHIF0hleoADXehUgAhMt NZn15mGvyiuwIQEA8W9qvnLdFXZkdxhxAEvDDFjyrRauL6eGtr/GvCx4AQY= =wmKG -----END PGP SIGNATURE----- Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull CRC updates from Eric Biggers: "Another set of improvements to the kernel's CRC (cyclic redundancy check) code: - Rework the CRC64 library functions to be directly optimized, like what I did last cycle for the CRC32 and CRC-T10DIF library functions - Rewrite the x86 PCLMULQDQ-optimized CRC code, and add VPCLMULQDQ support and acceleration for crc64_be and crc64_nvme - Rewrite the riscv Zbc-optimized CRC code, and add acceleration for crc_t10dif, crc64_be, and crc64_nvme - Remove crc_t10dif and crc64_rocksoft from the crypto API, since they are no longer needed there - Rename crc64_rocksoft to crc64_nvme, as the old name was incorrect - Add kunit test cases for crc64_nvme and crc7 - Eliminate redundant functions for calculating the Castagnoli CRC32, settling on just crc32c() - Remove unnecessary prompts from some of the CRC kconfig options - Further optimize the x86 crc32c code" * tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (36 commits) x86/crc: drop the avx10_256 functions and rename avx10_512 to avx512 lib/crc: remove unnecessary prompt for CONFIG_CRC64 lib/crc: remove unnecessary prompt for CONFIG_LIBCRC32C lib/crc: remove unnecessary prompt for CONFIG_CRC8 lib/crc: remove unnecessary prompt for CONFIG_CRC7 lib/crc: remove unnecessary prompt for CONFIG_CRC4 lib/crc7: unexport crc7_be_syndrome_table lib/crc_kunit.c: update comment in crc_benchmark() lib/crc_kunit.c: add test and benchmark for crc7_be() x86/crc32: optimize tail handling for crc32c short inputs riscv/crc64: add Zbc optimized CRC64 functions riscv/crc-t10dif: add Zbc optimized CRC-T10DIF function riscv/crc32: reimplement the CRC32 functions using new template riscv/crc: add "template" for Zbc optimized CRC functions x86/crc: add ANNOTATE_NOENDBR to suppress objtool warnings x86/crc32: improve crc32c_arch() code generation with clang x86/crc64: implement crc64_be and crc64_nvme using new template x86/crc-t10dif: implement crc_t10dif using new template x86/crc32: implement crc32_le using new template x86/crc: add "template" for [V]PCLMULQDQ based CRC functions ...	2025-03-25 18:33:04 -07:00
Linus Torvalds	7d20aa5c32	Power management updates for 6.15-rc1 - Manage sysfs attributes and boost frequencies efficiently from cpufreq core to reduce boilerplate code in drivers (Viresh Kumar). - Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider, Dhananjay Ugwekar, Imran Shaik, zuoqian). - Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky Bai). - cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski). - Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng). - Optimize the amd-pstate driver to avoid cases where call paths end up calling the same writes multiple times and needlessly caching variables through code reorganization, locking overhaul and tracing adjustments (Mario Limonciello, Dhananjay Ugwekar). - Make it possible to avoid enabling capacity-aware scheduling (CAS) in the intel_pstate driver and relocate a check for out-of-band (OOB) platform handling in it to make it detect OOB before checking HWP availability (Rafael Wysocki). - Fix dbs_update() to avoid inadvertent conversions of negative integer values to unsigned int which causes CPU frequency selection to be inaccurate in some cases when the "conservative" cpufreq governor is in use (Jie Zhan). - Update the handling of the most recent idle intervals in the menu cpuidle governor to prevent useful information from being discarded by it in some cases and improve the prediction accuracy (Rafael Wysocki). - Make it possible to tell the intel_idle driver to ignore its built-in table of idle states for the given processor, clean up the handling of auto-demotion disabling on Baytrail and Cherrytrail chips in it, and update its MAINTAINERS entry (David Arcari, Artem Bityutskiy, Rafael Wysocki). - Make some cpuidle drivers use for_each_present_cpu() instead of for_each_possible_cpu() during initialization to avoid issues occurring when nosmp or maxcpus=0 are used (Jacky Bai). - Clean up the Energy Model handling code somewhat (Rafael Wysocki). - Use kfree_rcu() to simplify the handling of runtime Energy Model updates (Li RongQing). - Add an entry for the Energy Model framework to MAINTAINERS as properly maintained (Lukasz Luba). - Address RCU-related sparse warnings in the Energy Model code (Rafael Wysocki). - Remove ENERGY_MODEL dependency on SMP and allow it to be selected when DEVFREQ is set without CPUFREQ so it can be used on a wider range of systems (Jeson Gao). - Unify error handling during runtime suspend and runtime resume in the core to help drivers to implement more consistent runtime PM error handling (Rafael Wysocki). - Drop a redundant check from pm_runtime_force_resume() and rearrange documentation related to __pm_runtime_disable() (Rafael Wysocki). - Rework the handling of the "smart suspend" driver flag in the PM core to avoid issues hat may occur when drivers using it depend on some other drivers and clean up the related PM core code (Rafael Wysocki, Colin Ian King). - Fix the handling of devices with the power.direct_complete flag set if device_suspend() returns an error for at least one device to avoid situations in which some of them may not be resumed (Rafael Wysocki). - Use mutex_trylock() in hibernate_compressor_param_set() to avoid a possible deadlock that may occur if the "compressor" hibernation module parameter is accessed during the registration of a new ieee80211 device (Lizhi Xu). - Suppress sleeping parent warning in device_pm_add() in the case when new children are added under a device with the power.direct_complete set after it has been processed by device_resume() (Xu Yang). - Remove needless return in three void functions related to system wakeup (Zijun Hu). - Replace deprecated kmap_atomic() with kmap_local_page() in the hibernation core code (David Reaver). - Remove unused helper functions related to system sleep (David Alan Gilbert). - Clean up s2idle_enter() so it does not lock and unlock CPU offline in vain and update comments in it (Ulf Hansson). - Clean up broken white space in dpm_wait_for_children() (Geert Uytterhoeven). - Update the cpupower utility to fix lib version-ing in it and memory leaks in error legs, remove hard-coded values, and implement CPU physical core querying (Thomas Renninger, John B. Wyatt IV, Shuah Khan, Yiwei Lin, Zhongqiu Han). -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmfhhTYSHHJqd0Byand5 c29ja2kubmV0AAoJEO5fvZ0v1OO16/gIAKuRiG1fFgUcUSXC1iFu42vrB/1i4wpA 02GICACqM3K6/5jd3ct/WOU28GUgDs+xcmqH7CnMaM6y9nXEWjWarmSfFekAO+0q TPtQ7xTy0hBCB3he1P2uLKBJBin4Wn47U9/rvs4J7mQd5zDxTINKIiVoHg2lEE+s HAeSoNRb2sp5IZDm9+/LfhHNYRP1mJ97cbZlymqctGB3xgDL7qMLid/1+gFPHAQS 4/LXj3IgyU8DpA/j5nhtpaAqjN5g2QxIUfQgADRIcESK99Y/7aAMs1/G0WhJKaay 9yx+4/xmkGvVCZQx1DphksFLISEzltY0SFWLsoppPzBTGVEW2GQQsNI= =LqVy -----END PGP SIGNATURE----- Merge tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These are dominated by cpufreq updates which in turn are dominated by updates related to boost support in the core and drivers and amd-pstate driver optimizations. Apart from the above, there are some cpuidle updates including a rework of the most recent idle intervals handling in the venerable menu governor that leads to significant improvements in some performance benchmarks, as the governor is now more likely to predict a shorter idle duration in some cases, and there are updates of the core device power management code, mostly related to system suspend and resume, that should help to avoid potential issues arising when the drivers of devices depending on one another want to use different optimizations. There is also a usual collection of assorted fixes and cleanups, including removal of some unused code. Specifics: - Manage sysfs attributes and boost frequencies efficiently from cpufreq core to reduce boilerplate code in drivers (Viresh Kumar) - Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider, Dhananjay Ugwekar, Imran Shaik, zuoqian) - Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky Bai) - cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski) - Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng) - Optimize the amd-pstate driver to avoid cases where call paths end up calling the same writes multiple times and needlessly caching variables through code reorganization, locking overhaul and tracing adjustments (Mario Limonciello, Dhananjay Ugwekar) - Make it possible to avoid enabling capacity-aware scheduling (CAS) in the intel_pstate driver and relocate a check for out-of-band (OOB) platform handling in it to make it detect OOB before checking HWP availability (Rafael Wysocki) - Fix dbs_update() to avoid inadvertent conversions of negative integer values to unsigned int which causes CPU frequency selection to be inaccurate in some cases when the "conservative" cpufreq governor is in use (Jie Zhan) - Update the handling of the most recent idle intervals in the menu cpuidle governor to prevent useful information from being discarded by it in some cases and improve the prediction accuracy (Rafael Wysocki) - Make it possible to tell the intel_idle driver to ignore its built-in table of idle states for the given processor, clean up the handling of auto-demotion disabling on Baytrail and Cherrytrail chips in it, and update its MAINTAINERS entry (David Arcari, Artem Bityutskiy, Rafael Wysocki) - Make some cpuidle drivers use for_each_present_cpu() instead of for_each_possible_cpu() during initialization to avoid issues occurring when nosmp or maxcpus=0 are used (Jacky Bai) - Clean up the Energy Model handling code somewhat (Rafael Wysocki) - Use kfree_rcu() to simplify the handling of runtime Energy Model updates (Li RongQing) - Add an entry for the Energy Model framework to MAINTAINERS as properly maintained (Lukasz Luba) - Address RCU-related sparse warnings in the Energy Model code (Rafael Wysocki) - Remove ENERGY_MODEL dependency on SMP and allow it to be selected when DEVFREQ is set without CPUFREQ so it can be used on a wider range of systems (Jeson Gao) - Unify error handling during runtime suspend and runtime resume in the core to help drivers to implement more consistent runtime PM error handling (Rafael Wysocki) - Drop a redundant check from pm_runtime_force_resume() and rearrange documentation related to __pm_runtime_disable() (Rafael Wysocki) - Rework the handling of the "smart suspend" driver flag in the PM core to avoid issues hat may occur when drivers using it depend on some other drivers and clean up the related PM core code (Rafael Wysocki, Colin Ian King) - Fix the handling of devices with the power.direct_complete flag set if device_suspend() returns an error for at least one device to avoid situations in which some of them may not be resumed (Rafael Wysocki) - Use mutex_trylock() in hibernate_compressor_param_set() to avoid a possible deadlock that may occur if the "compressor" hibernation module parameter is accessed during the registration of a new ieee80211 device (Lizhi Xu) - Suppress sleeping parent warning in device_pm_add() in the case when new children are added under a device with the power.direct_complete set after it has been processed by device_resume() (Xu Yang) - Remove needless return in three void functions related to system wakeup (Zijun Hu) - Replace deprecated kmap_atomic() with kmap_local_page() in the hibernation core code (David Reaver) - Remove unused helper functions related to system sleep (David Alan Gilbert) - Clean up s2idle_enter() so it does not lock and unlock CPU offline in vain and update comments in it (Ulf Hansson) - Clean up broken white space in dpm_wait_for_children() (Geert Uytterhoeven) - Update the cpupower utility to fix lib version-ing in it and memory leaks in error legs, remove hard-coded values, and implement CPU physical core querying (Thomas Renninger, John B. Wyatt IV, Shuah Khan, Yiwei Lin, Zhongqiu Han)" * tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (139 commits) PM: sleep: Fix bit masking operation dt-bindings: cpufreq: cpufreq-qcom-hw: Narrow properties on SDX75, SA8775p and SM8650 dt-bindings: cpufreq: cpufreq-qcom-hw: Drop redundant minItems:1 dt-bindings: cpufreq: cpufreq-qcom-hw: Add missing constraint for interrupt-names dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCS8300 compatible cpufreq: Init cpufreq only for present CPUs PM: sleep: Fix handling devices with direct_complete set on errors cpuidle: Init cpuidle only for present CPUs PM: clk: Remove unused pm_clk_remove() PM: sleep: core: Fix indentation in dpm_wait_for_children() PM: s2idle: Extend comment in s2idle_enter() PM: s2idle: Drop redundant locks when entering s2idle PM: sleep: Remove unused pm_generic_ wrappers cpufreq: tegra186: Share policy per cluster cpupower: Make lib versioning scheme more obvious and fix version link PM: EM: Rework the depends on for CONFIG_ENERGY_MODEL PM: EM: Address RCU-related sparse warnings cpupower: Implement CPU physical core querying pm: cpupower: remove hard-coded topology depth values pm: cpupower: Fix cmd_monitor() error legs to free cpu_topology ...	2025-03-25 15:00:18 -07:00
Linus Torvalds	21e0ff5b10	ACPI updates for 6.15-rc1 - Use the str_on_off() helper function instead of hard-coded strings in the ACPI power resources handling code (Thorsten Blum). - Add fan speed reporting for ACPI fans that have _FST, but otherwise do not support the entire ACPI 4 fan interface (Joshua Grisham). - Fix a stale comment regarding trip points in acpi_thermal_add() that diverged from the commented code after removing _CRT evaluation from acpi_thermal_get_trip_points() (xueqin Luo). - Make ACPI button driver also subscribe to system events (Mario Limonciello). - Use the str_yes_no() helper function instead of hard-coded strings in the ACPI backlight (video) driver (Thorsten Blum). - Add a missing header file include to the x86 arch CPPC code (Mario Limonciello). - Rework the sysfs attributes implementation in the ACPI platform-profile driver and improve the unregistration code in it (Nathan Chancellor, Kurt Borja). - Prevent the ACPI HED driver from being built as a module and change its initcall level to subsys_initcall to avoid initialization ordering issues related to it (Xiaofei Tan). - Update a maintainer email address in the ACPI PMIC entry in MAINTAINERS (Mika Westerberg). - Address a GCC 15's -Wunterminated-string-initialization warning in the core PNP subsystem code and remove some dead code from it (Kees Cook, David Alan Gilbert). -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmfhhZYSHHJqd0Byand5 c29ja2kubmV0AAoJEO5fvZ0v1OO1yCMH/3IbFftpA2sJg504igRgLdMzDAhTc/Y3 x8Y37v1e1Psxyp0SQni84H4E11QWaytSXngemnp39+LgN+14KW243z6v+PBGioyI +cdJw7kAk8v1aX+Ujel2Z3BIz9QPFqZd6d3R0AsZkZcI/28VW7kHNRXZ6p2kYmxK 7acx0Y1cM1k0UotzpzQ4RDaTnFNKUGJFQwdKTEJU237gsFrVh8ev2hLm9Hy4FU6R zGtjrU2/oBEmoCVgrXG4n6bZYP4dZwCZ8ewckIvepGuTPTHP8tYMxrELw/3A1901 +lnN6zK6nMpvCd9cl0ongT5iFG4gsuBGanvnP7Mf51YI380jmNSLeCI= =/b+r -----END PGP SIGNATURE----- Merge tag 'acpi-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "From the functional perspective, the most significant changes here are the ACPI fan driver update allowing it to handle fans with fine-grained state checking supported, but without fine-grained control, and the ACPI button driver update making it subscribe to system event notifications (in addition to device notifications) which on some systems is requisite for waking up the system from sleep. The rest is fixes and cleanups including removal of some dead code. Specifics: - Use the str_on_off() helper function instead of hard-coded strings in the ACPI power resources handling code (Thorsten Blum) - Add fan speed reporting for ACPI fans that have _FST, but otherwise do not support the entire ACPI 4 fan interface (Joshua Grisham) - Fix a stale comment regarding trip points in acpi_thermal_add() that diverged from the commented code after removing _CRT evaluation from acpi_thermal_get_trip_points() (xueqin Luo) - Make ACPI button driver also subscribe to system events (Mario Limonciello) - Use the str_yes_no() helper function instead of hard-coded strings in the ACPI backlight (video) driver (Thorsten Blum) - Add a missing header file include to the x86 arch CPPC code (Mario Limonciello) - Rework the sysfs attributes implementation in the ACPI platform-profile driver and improve the unregistration code in it (Nathan Chancellor, Kurt Borja) - Prevent the ACPI HED driver from being built as a module and change its initcall level to subsys_initcall to avoid initialization ordering issues related to it (Xiaofei Tan) - Update a maintainer email address in the ACPI PMIC entry in MAINTAINERS (Mika Westerberg) - Address a GCC 15's -Wunterminated-string-initialization warning in the core PNP subsystem code and remove some dead code from it (Kees Cook, David Alan Gilbert)" * tag 'acpi-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PNP: Expand length of fixup id string PNP: Remove prehistoric deadcode ACPI: button: Install notifier for system events as well ACPI: fan: Add fan speed reporting for fans with only _FST ACPI: HED: Always initialize before evged x86/ACPI: CPPC: Add missing include ACPI: video: Use str_yes_no() helper in acpi_video_bus_add() ACPI: platform_profile: Improve platform_profile_unregister() ACPI: platform-profile: Fix CFI violation when accessing sysfs files ACPI: power: Use str_on_off() helper function ACPI: thermal: Fix stale comment regarding trip points MAINTAINERS: Use my kernel.org address for ACPI PMIC work	2025-03-25 14:56:33 -07:00
Linus Torvalds	a5b3d8660b	hyperv-next for 6.15 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmfhlLATHHdlaS5saXVA a2VybmVsLm9yZwAKCRB2FHBfkEGgXgchCADOz33rSm4G4w4r0qT05dTDi/lZkEdK 64dQq322XXP/C9FfR66d30243gsAmuM5a0SvzFHLXAOu6yqM270Xehd/Rud+Um2s lSVnc0Ux0AWBgksqFd0t577aN7zmJEukosEYO5lBNop+zOcadrm3S6Th/AoL2h/D yphPkhH13bsCK+Wll/eBOQLIhC9iA0konYbBLuEQ5MqvUbrzc6Rmb5gxsHHZKOqg vLjkrYR/d3s2gIpKxiFp0RwvzGyffZEHxvU/YF3hTenPMlTlnXWbyspBSTVmWggP 13IFLzqxDdW9RgUnGB4xRc424AC1LKqEr42QPQE7zGvl2jdJriA2Q1LT =BXqj -----END PGP SIGNATURE----- Merge tag 'hyperv-next-signed-20250324' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Add support for running as the root partition in Hyper-V (Microsoft Hypervisor) by exposing /dev/mshv (Nuno and various people) - Add support for CPU offlining in Hyper-V (Hamza Mahfooz) - Misc fixes and cleanups (Roman Kisel, Tianyu Lan, Wei Liu, Michael Kelley, Thorsten Blum) * tag 'hyperv-next-signed-20250324' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (24 commits) x86/hyperv: fix an indentation issue in mshyperv.h x86/hyperv: Add comments about hv_vpset and var size hypercall input args Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs hyperv: Add definitions for root partition driver to hv headers x86: hyperv: Add mshv_handler() irq handler and setup function Drivers: hv: Introduce per-cpu event ring tail Drivers: hv: Export some functions for use by root partition module acpi: numa: Export node_to_pxm() hyperv: Introduce hv_recommend_using_aeoi() arm64/hyperv: Add some missing functions to arm64 x86/mshyperv: Add support for extended Hyper-V features hyperv: Log hypercall status codes as strings x86/hyperv: Fix check of return value from snp_set_vmsa() x86/hyperv: Add VTL mode callback for restarting the system x86/hyperv: Add VTL mode emergency restart callback hyperv: Remove unused union and structs hyperv: Add CONFIG_MSHV_ROOT to gate root partition support hyperv: Change hv_root_partition into a function hyperv: Convert hypercall statuses to linux error codes drivers/hv: add CPU offlining support ...	2025-03-25 14:47:04 -07:00
Linus Torvalds	0d86c23953	- A cleanup to the MCE notification machinery -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmfixMAACgkQEsHwGGHe VUrZ/A/+LmqCE4TQEsS04oLezqu9hfPHiQ1Z1DK1RpjLyx23YZGBoYslGHtltknZ rXt15UDINSYXlbnVBzIFtuTQX8GKWySr/DbN9rV/rb4xW81giOPwW96/M/LfnPBh kV3KCrgsPvIfKE1pmCkK0ThkemcVcjtvq83Jpn3C6ppsqDZdOrSH+e8ZCnKSVK3q n3IwDQXmBJXV+wZFiAMvMTqUpJVNCTeiQj+ACrfOqgnAZsGFsEsEKqZkdO2ouaEy 7QAdY+6AELX3LnAu4mJKDVsZ/HSUip7uVOqqM02TMw5O4z/cpzd3rH8tObSTW9hr DLlLXmfOJliQdJGHAECv79DiViycF6axVdZh6WvfvJHzZYUyNSrjoWUTDUW3JFI1 ZikhBh/hQlfas12k0dYYcObgF1li45LyfFl/uSyIfoO1aIno+Od8yv1/jRrd8s50 7ehS5OFtpb4EsqCED2arAsiDiaoHwrYWAP8aoJVwXg5AZB6ShritSC6QlQpOgDCw 81VOeARaJoYJggxDzxGYCjLQORzoweDuuMs41qZLqn3DfinYvotHUjo6j5DL2JEm iFEce2NeKvi+T2dB8k1EzqyGL0VKSh1ogI53RzGnaWUt1f8JnJuM7Je+VXI5LGL3 Ce8sSVzZbc5MFPOCncoxXw7f68aND+P0lm+yA79lVMT7ytp54KA= =rY2E -----END PGP SIGNATURE----- Merge tag 'ras_core_for_v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS update from Borislav Petkov: - A cleanup to the MCE notification machinery * tag 'ras_core_for_v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce/inject: Remove call to mce_notify_irq()	2025-03-25 14:13:35 -07:00
Linus Torvalds	2899aa3973	- First part of the MPAM work: split the architectural part of resctrl from the filesystem part so that ARM's MPAM varian of resource control can be added later while sharing the user interface with x86 (James Morse) -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmfi118ACgkQEsHwGGHe VUrhZhAAj9brJYnluZpNgOMl231QRJaK0Exz1TLMFvmEZxQSnRs6TJ4PVDqU7QQb lrvqbobf77BfO8u3jtFLZvcoXxG+zzaTEoDEqGc/57Gu4G/kC64S8kYWa88aUf4I lHS5kZvNUxVBh4L/33QaprigN61pZbhLoejOCdr3zWRJ62+/xoNXs1rV8N3Zwgdv 6p/B56MMi1CBXXHbFSzBI1bXSb/gW9jMjTnvrHbg3sOzrvVVuigJMVYgEfcEi0lh npc0Iz/Gz3Bzemxcl05bm2eJ+Z9WR9CIHMp+PAewqL7eJCV0OBHUkClU9Ui92Js+ BA7XhL4XAnnZAaXHQoBfskGzcQ91pWPpkjJwSQO7y3zl8A8lvTFJCb89tZMWiLDl bF9MmbyjJFMtEaIYLHlhoasilN2laRrnTW41ZhxEtSJ0IofE4OInJ2+pPB/TfT7O HfZtkadIDrH6p5qLXy9bRwPxHskuM+NX0bw0OxWfu49DGw3O8pRhTFkiQ/+ofuBb oJNwVBAH11AiXUZBR1ZunpYEkwMFlL4FyNOkq/OS6C51UUE72dYITR5HB0/wkTp2 cc2oiX3CSQPKrA4G8BAvMb7zGTmryXRZ7nOkTzScVTm8BoyyZf9F69aTpg1Deuuf W8Z9WrabVBCEs7EhZ7OH9bvmpBFapoNDUwmt+gnTAw6U0QDspZw= =FRzf -----END PGP SIGNATURE----- Merge tag 'x86_cache_for_v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: - First part of the MPAM work: split the architectural part of resctrl from the filesystem part so that ARM's MPAM varian of resource control can be added later while sharing the user interface with x86 (James Morse) * tag 'x86_cache_for_v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits) x86/resctrl: Move get_{mon,ctrl}_domain_from_cpu() to live with their callers x86/resctrl: Move get_config_index() to a header x86/resctrl: Handle throttle_mode for SMBA resources x86/resctrl: Move RFTYPE flags to be managed by resctrl x86/resctrl: Make resctrl_arch_pseudo_lock_fn() take a plr x86/resctrl: Make prefetch_disable_bits belong to the arch code x86/resctrl: Allow an architecture to disable pseudo lock x86/resctrl: Add resctrl_arch_ prefix to pseudo lock functions x86/resctrl: Move mbm_cfg_mask to struct rdt_resource x86/resctrl: Move mba_mbps_default_event init to filesystem code x86/resctrl: Change mon_event_config_{read,write}() to be arch helpers x86/resctrl: Add resctrl_arch_is_evt_configurable() to abstract BMEC x86/resctrl: Move the is_mbm__enabled() helpers to asm/resctrl.h x86/resctrl: Rewrite and move the for_each__rdt_resource() walkers x86/resctrl: Move monitor init work to a resctrl init call x86/resctrl: Move monitor exit work to a resctrl exit call x86/resctrl: Add an arch helper to reset one resource x86/resctrl: Move resctrl types to a separate header x86/resctrl: Move rdt_find_domain() to be visible to arch and fs code x86/resctrl: Expose resctrl fs's init function to the rest of the kernel ...	2025-03-25 13:51:28 -07:00
Linus Torvalds	906174776c	- Some preparatory work to convert the mitigations machinery to mitigating attack vectors instead of single vulnerabilities - Untangle and remove a now unneeded X86_FEATURE_USE_IBPB flag - Add support for a Zen5-specific SRSO mitigation - Cleanups and minor improvements -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmfixS0ACgkQEsHwGGHe VUpi1xAAgvH2u8Eo8ibT5dABQpD65w3oQiykO+9aDpObG9w9beDVGlld8DJE61Rz 6tcE0Clp2H/tMcCbn8zXIJ92TQ3wIX/85uZwLi1VEM1Tx7A6VtAbPv8WKfZE3FCX 9v92HRKnK3ql+A2ZR+oyy+/8RedUmia7y7/bXH1H7Zf2uozoKkmq5cQnwfq5iU4A qNiKuvSlQwjZ8Zz6Ax1ugHUkE4R7mlKh8rccLXl4+mVr63/lkPHSY3OFTjcYf4HW Ir92N86Spfo0/l0vsOOsWoYKmoaiVP7ouJh7YbKR3B0BGN0pt2MT476mehkEs427 m4J6XhRKhIrsYmzEkLvvpsg12zO4/PKk8BEYNS7YPYlRaOwjV4ivyFS2aY6e55rh yUHyo9s+16f/Mp+/fNFXll3mdMxYBioPWh3M191nJkdfyKMrtf0MdKPRibaJB8wH yMF4D1gMx+hFbs0/VOS6dtqD9DKW7VgPg0LW+RysfhnLTuFFb5iBcH6Of7l7Z/Ca vVK+JxrhB1EDVI1+MKnESKPF9c6j3DRa2xrQHi/XYje1TGqnQ1v4CmsEObYBuJDN 9M9t4QLzNuA/DA5tS7cxxtQ3YUthuJjPLcO4EVHOCvnqCAxkzp0i3dVMUr+YISl+ 2yFqaZdTt8s8FjTI21LOyuloCo30ZLlzaorFa0lp2cIyYup+1vg= =btX/ -----END PGP SIGNATURE----- Merge tag 'x86_bugs_for_v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 speculation mitigation updates from Borislav Petkov: - Some preparatory work to convert the mitigations machinery to mitigating attack vectors instead of single vulnerabilities - Untangle and remove a now unneeded X86_FEATURE_USE_IBPB flag - Add support for a Zen5-specific SRSO mitigation - Cleanups and minor improvements * tag 'x86_bugs_for_v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/bugs: Make spectre user default depend on MITIGATION_SPECTRE_V2 x86/bugs: Use the cpu_smt_possible() helper instead of open-coded code x86/bugs: Add AUTO mitigations for mds/taa/mmio/rfds x86/bugs: Relocate mds/taa/mmio/rfds defines x86/bugs: Add X86_BUG_SPECTRE_V2_USER x86/bugs: Remove X86_FEATURE_USE_IBPB KVM: nVMX: Always use IBPB to properly virtualize IBRS x86/bugs: Use a static branch to guard IBPB on vCPU switch x86/bugs: Remove the X86_FEATURE_USE_IBPB check in ib_prctl_set() x86/mm: Remove X86_FEATURE_USE_IBPB checks in cond_mitigation() x86/bugs: Move the X86_FEATURE_USE_IBPB check into callers x86/bugs: KVM: Add support for SRSO_MSR_FIX	2025-03-25 13:30:18 -07:00
Linus Torvalds	2d09a9449e	arm64 updates for 6.15: Perf and PMUs: - Support for the "Rainier" CPU PMU from Arm - Preparatory driver changes and cleanups that pave the way for BRBE support - Support for partial virtualisation of the Apple-M1 PMU - Support for the second event filter in Arm CSPMU designs - Minor fixes and cleanups (CMN and DWC PMUs) - Enable EL2 requirements for FEAT_PMUv3p9 Power, CPU topology: - Support for AMUv1-based average CPU frequency - Run-time SMT control wired up for arm64 (CONFIG_HOTPLUG_SMT). It adds a generic topology_is_primary_thread() function overridden by x86 and powerpc New(ish) features: - MOPS (memcpy/memset) support for the uaccess routines Security/confidential compute: - Fix the DMA address for devices used in Realms with Arm CCA. The CCA architecture uses the address bit to differentiate between shared and private addresses - Spectre-BHB: assume CPUs Linux doesn't know about vulnerable by default Memory management clean-ups: - Drop the PD_TABLE_BIT definition in preparation for 128-bit PTEs - Some minor page table accessor clean-ups - PIE/POE (permission indirection/overlay) helpers clean-up Kselftests: - MTE: skip hugetlb tests if MTE is not supported on such mappings and user correct naming for sync/async tag checking modes Miscellaneous: - Add a PKEY_UNRESTRICTED definition as 0 to uapi (toolchain people request) - Sysreg updates for new register fields - CPU type info for some Qualcomm Kryo cores -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmfjB2QACgkQa9axLQDI XvGrfg//W3Bx9+jw1G/XHHEQqGEVFmvltvxZUkvgV0Qki0rPSMnappJhZRL9n0Nm V6PvGd2KoKHZuL3g5ViZb3cs2R9BiD2JB6PncwBKuxumHGh3vz3kk1JMkDVfWdHv qAceOckFJD9rXjPZn+PDsfYiEi2i3RRWIP5VglZ14ue8j3prHQ6DJXLUQF2GYvzE /bgLSq44wp5N59ddy23+qH9rxrHzz3bgpbVv/F56W/LErvE873mRmyFwiuGJm+M0 Pn8ra572rI6a4sgSwrMTeNPBU+F9o5AbqwauVhkz428RdMvgfEuW6qHUBnGWJDmt HotXmu+4Eb2KJks/iQkDo4OTJ38yUqvvZZJtP171ms3E4yqESSJngWP6O2A6LF+y xhe0sESF/Ew6jLhM6/hvOmBcE2AyB14JE3ymqLkXbWub4NXddBn2AF1WXFjF4CBw F8KSUhNLekrCYKv1k9M3nhvkcpoS9FkTF/TI+zEg546alI/GLPih6uDRkgMAODh1 RDJYixHsf2NDDRQbfwvt9Xua/KKpDF6qNkHLA4OiqqVUwh1hkas24Lrnp8vmce4o wIpWCLqYWey8Rl3XWuWgWz2Xu58fHH4Dl2k72Z8I0pwp3abCDa9xEj79G0Svk7Si Q+FCYrNlpKee1RXBC+1MUD/Gl5r/28dEUFkAzPD80F7AgafXPd0= =Kc9c -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "Nothing major this time around. Apart from the usual perf/PMU updates, some page table cleanups, the notable features are average CPU frequency based on the AMUv1 counters, CONFIG_HOTPLUG_SMT and MOPS instructions (memcpy/memset) in the uaccess routines. Perf and PMUs: - Support for the 'Rainier' CPU PMU from Arm - Preparatory driver changes and cleanups that pave the way for BRBE support - Support for partial virtualisation of the Apple-M1 PMU - Support for the second event filter in Arm CSPMU designs - Minor fixes and cleanups (CMN and DWC PMUs) - Enable EL2 requirements for FEAT_PMUv3p9 Power, CPU topology: - Support for AMUv1-based average CPU frequency - Run-time SMT control wired up for arm64 (CONFIG_HOTPLUG_SMT). It adds a generic topology_is_primary_thread() function overridden by x86 and powerpc New(ish) features: - MOPS (memcpy/memset) support for the uaccess routines Security/confidential compute: - Fix the DMA address for devices used in Realms with Arm CCA. The CCA architecture uses the address bit to differentiate between shared and private addresses - Spectre-BHB: assume CPUs Linux doesn't know about vulnerable by default Memory management clean-ups: - Drop the PD_TABLE_BIT definition in preparation for 128-bit PTEs - Some minor page table accessor clean-ups - PIE/POE (permission indirection/overlay) helpers clean-up Kselftests: - MTE: skip hugetlb tests if MTE is not supported on such mappings and user correct naming for sync/async tag checking modes Miscellaneous: - Add a PKEY_UNRESTRICTED definition as 0 to uapi (toolchain people request) - Sysreg updates for new register fields - CPU type info for some Qualcomm Kryo cores" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (72 commits) arm64: mm: Don't use %pK through printk perf/arm_cspmu: Fix missing io.h include arm64: errata: Add newer ARM cores to the spectre_bhb_loop_affected() lists arm64: cputype: Add MIDR_CORTEX_A76AE arm64: errata: Add KRYO 2XX/3XX/4XX silver cores to Spectre BHB safe list arm64: errata: Assume that unknown CPUs _are_ vulnerable to Spectre BHB arm64: errata: Add QCOM_KRYO_4XX_GOLD to the spectre_bhb_k24_list arm64/sysreg: Enforce whole word match for open/close tokens arm64/sysreg: Fix unbalanced closing block arm64: Kconfig: Enable HOTPLUG_SMT arm64: topology: Support SMT control on ACPI based system arch_topology: Support SMT control for OF based system cpu/SMT: Provide a default topology_is_primary_thread() arm64/mm: Define PTDESC_ORDER perf/arm_cspmu: Add PMEVFILT2R support perf/arm_cspmu: Generalise event filtering perf/arm_cspmu: Move register definitons to header arm64/kernel: Always use level 2 or higher for early mappings arm64/mm: Drop PXD_TABLE_BIT arm64/mm: Check pmd_table() in pmd_trans_huge() ...	2025-03-25 13:16:16 -07:00
Linus Torvalds	0f40464674	Updates for interrupt chip drivers: - Support for hard indices on RISC-V. The hart index identifies a hart (core) within a specific interrupt domain in RISC-V's Priviledged Architecture. - Rework of the RISC-V MSI driver. This moves the driver over to the generic MSI library and solves the affinity problem of unmaskable PCI/MSI controllers. Unmaskable PCI/MSI controllers are prone to lose interrupts when the MSI message is updated to change the affinity because the message write consists of three 32-bit subsequent writes, which update address and data. As these writes are non-atomic versus the device raising an interrupt, the device can observe a half written update and issue an interrupt on the wrong vector. This is mitiated by a carefully orchestrated step by step update and the observation of an eventually pending interrupt on the CPU which issues the update. The algorithm follows the well established method of the X86 MSI driver. - A new driver for the RISC-V Sophgo SG2042 MSI controller - Overhaul of the Renesas RZQ2L driver. Simplification of the probe function by using devm_() mechanisms, which avoid the endless list of error prone gotos in the failure paths. - Expand the Renesas RZV2H driver to support RZ/G3E SoCs - A workaround for Rockchip 3568002 erratum in the GIC-V3 driver to ensure that the addressing is limited to the lower 32-bit of the physical address space. - Add support for the Allwinner AS23 NMI controller - Expand the IMX irqsteer driver to handle up to 960 input interrupts - The usual small updates, cleanups and device tree changes. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmff454THHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoZqoD/4kdHzbxfLpf7vC3NnG8NWwTq5FpbSx 6grQC9hWNMAs4n2IFjJRFLrjeX3AcdAQXL/BWuM0LfW9tQDQaVmqlSIlB/bn69KB 7HyAR6ozbOgnHKGAqFUXSLf+4pq+6q3mOgGKIF289dy14HFu4ta0DqKgkPZeQnVs R/J8i7REUnn+YuxzSt5eOqyDPyt2EHJosSUABSWQZBlrM9jy1W7f6NqDFwawiVsa +tv4U/bz91vjzVxwTIgt7nJK+b2HVYdxoZYuKJwPaTsj26ANPp6ltjRTeOmZhb5h uKgw+OyzDnk6q+tjGcRqrqwl291VKxCvnRiqHFfu3CERdmI9qvpN9IRcEJqIbkcN cakekhAyt7OO7sEPcql5vBL97e9hpb7EcH78gYxwHf8Dy0rFZUvSC5v+L6VRFnJS XcKA1L+f9B6u5qxnBtLan9IW08HYNdvmPq6AuVjk+ndKioPUFqB2q6AtXpuA3Rmu Y3XH/wh/q5wk0pgeByxQW6swsfpMN3OYK3mpLx475wFh2NKzcdGlwGhDFhiw8DKX m1AESy3UZatj1a0qGaFS/M+mm9KGrDYIMrje832Wf4Yf1LGmTsDkd3/V99oazSsq Jm4qhDASXChJXd0imQICX9hPw0aHTlLYNs54obUXVULH4HivQKIgWhUXrjG0dBDL +tttjuv5FJxr3A== =jPHa -----END PGP SIGNATURE----- Merge tag 'irq-drivers-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq driver updates from Thomas Gleixner: - Support for hard indices on RISC-V. The hart index identifies a hart (core) within a specific interrupt domain in RISC-V's Priviledged Architecture. - Rework of the RISC-V MSI driver This moves the driver over to the generic MSI library and solves the affinity problem of unmaskable PCI/MSI controllers. Unmaskable PCI/MSI controllers are prone to lose interrupts when the MSI message is updated to change the affinity because the message write consists of three 32-bit subsequent writes, which update address and data. As these writes are non-atomic versus the device raising an interrupt, the device can observe a half written update and issue an interrupt on the wrong vector. This is mitiated by a carefully orchestrated step by step update and the observation of an eventually pending interrupt on the CPU which issues the update. The algorithm follows the well established method of the X86 MSI driver. - A new driver for the RISC-V Sophgo SG2042 MSI controller - Overhaul of the Renesas RZQ2L driver Simplification of the probe function by using devm_() mechanisms, which avoid the endless list of error prone gotos in the failure paths. - Expand the Renesas RZV2H driver to support RZ/G3E SoCs - A workaround for Rockchip 3568002 erratum in the GIC-V3 driver to ensure that the addressing is limited to the lower 32-bit of the physical address space. - Add support for the Allwinner AS23 NMI controller - Expand the IMX irqsteer driver to handle up to 960 input interrupts - The usual small updates, cleanups and device tree changes * tag 'irq-drivers-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) irqchip/imx-irqsteer: Support up to 960 input interrupts irqchip/sunxi-nmi: Support Allwinner A523 NMI controller dt-bindings: irq: sun7i-nmi: Document the Allwinner A523 NMI controller irqchip/davinci-cp-intc: Remove public header irqchip/renesas-rzv2h: Add RZ/G3E support irqchip/renesas-rzv2h: Update macros ICU_TSSR_TSSEL_{MASK,PREP} irqchip/renesas-rzv2h: Update TSSR_TIEN macro irqchip/renesas-rzv2h: Add field_width to struct rzv2h_hw_info irqchip/renesas-rzv2h: Add max_tssel to struct rzv2h_hw_info irqchip/renesas-rzv2h: Add struct rzv2h_hw_info with t_offs variable irqchip/renesas-rzv2h: Use devm_pm_runtime_enable() irqchip/renesas-rzv2h: Use devm_reset_control_get_exclusive_deasserted() irqchip/renesas-rzv2h: Simplify rzv2h_icu_init() irqchip/renesas-rzv2h: Drop irqchip from struct rzv2h_icu_priv irqchip/renesas-rzv2h: Fix wrong variable usage in rzv2h_tint_set_type() dt-bindings: interrupt-controller: renesas,rzv2h-icu: Document RZ/G3E SoC riscv: sophgo: dts: Add msi controller for SG2042 irqchip: Add the Sophgo SG2042 MSI interrupt controller dt-bindings: interrupt-controller: Add Sophgo SG2042 MSI arm64: dts: rockchip: rk356x: Move PCIe MSI to use GIC ITS instead of MBI ...	2025-03-25 09:54:36 -07:00
David Woodhouse	3d66af75b0	x86/kexec: Debugging support: Dump registers on exception The actual serial output function is a no-op for now. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20250314173226.3062535-3-dwmw2@infradead.org	2025-03-25 12:49:05 +01:00
David Woodhouse	8df505af7f	x86/kexec: Debugging support: Load an IDT and basic exception entry points [ mingo: Minor readability edits ] Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20250314173226.3062535-2-dwmw2@infradead.org	2025-03-25 12:49:05 +01:00
Ahmed S. Darwish	0dd09e215a	x86/cacheinfo: Apply maintainer-tip coding style fixes The x86/cacheinfo code has been heavily refactored and fleshed out at parent commits, where any necessary coding style fixes were also done in place. Apply Documentation/process/maintainer-tip.rst coding style fixes to the rest of the code, and align its assignment expressions for readability. Standardize on CPUID(n) when mentioning leaf queries. Avoid breaking long lines when doing so helps readability. At cacheinfo_amd_init_llc_id(), rename variable 'msb' to 'index_msb' as this is how it's called at the rest of cacheinfo.c code. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250324133324.23458-30-darwi@linutronix.de	2025-03-25 10:23:37 +01:00
Ahmed S. Darwish	6c963c42fc	x86/cacheinfo: Introduce cpuid_amd_hygon_has_l3_cache() Multiple code paths at cacheinfo.c and amd_nb.c check for AMD/Hygon CPUs L3 cache presensce by directly checking leaf 0x80000006 EDX output. Extract that logic into its own function. While at it, rework the AMD/Hygon LLC topology ID caclculation comments for clarity. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250324133324.23458-29-darwi@linutronix.de	2025-03-25 10:23:30 +01:00
Ahmed S. Darwish	eeeebc4fc6	x86/cacheinfo: Relocate CPUID leaf 0x4 cache_type mapping The cache_type_map[] array is used to map Intel leaf 0x4 cache_type values to their corresponding types at <linux/cacheinfo.h>. Move that array's definition after the actual CPUID leaf 0x4 structures, instead of having it in the middle of AMD leaf 0x4 emulation code. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250324133324.23458-28-darwi@linutronix.de	2025-03-25 10:23:27 +01:00
Ahmed S. Darwish	05d48035e5	x86/cacheinfo: Extract out cache self-snoop checks The logic of not doing a cache flush if the CPU declares cache self snooping support is repeated across the x86/cacheinfo code. Extract it into its own function. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250324133324.23458-27-darwi@linutronix.de	2025-03-25 10:23:24 +01:00
Ahmed S. Darwish	fda5f817ae	x86/cacheinfo: Extract out cache level topology ID calculation For Intel CPUID leaf 0x4 parsing, refactor the cache level topology ID calculation code into its own method instead of repeating the same logic twice for L2 and L3. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250324133324.23458-26-darwi@linutronix.de	2025-03-25 10:23:21 +01:00
Ahmed S. Darwish	66122616e2	x86/cacheinfo: Separate Intel CPUID leaf 0x4 handling init_intel_cacheinfo() was overly complex. It parsed leaf 0x4 data, leaf 0x2 data, and performed post-processing, all within one function. Parent commit moved leaf 0x2 parsing and the post-processing logic into their own functions. Continue the refactoring by extracting leaf 0x4 parsing into its own function. Initialize local L2/L3 topology ID variables to BAD_APICID by default, thus ensuring they can be used unconditionally. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250324133324.23458-25-darwi@linutronix.de	2025-03-25 10:23:18 +01:00
Ahmed S. Darwish	5adfd36758	x86/cacheinfo: Separate CPUID leaf 0x2 handling and post-processing logic The logic of init_intel_cacheinfo() is quite convoluted: it mixes leaf 0x4 parsing, leaf 0x2 parsing, plus some post-processing, in a single place. Begin simplifying its logic by extracting the leaf 0x2 parsing code, and the post-processing logic, into their own functions. While at it, rework the SMT LLC topology ID comment for clarity. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250324133324.23458-24-darwi@linutronix.de	2025-03-25 10:23:15 +01:00
Ahmed S. Darwish	4772304ee6	x86/cpu: Use consolidated CPUID leaf 0x2 descriptor table CPUID leaf 0x2 output is a stream of one-byte descriptors, each implying certain details about the CPU's cache and TLB entries. At previous commits, the mapping tables for such descriptors were merged into one consolidated table. The mapping was also transformed into a hash lookup instead of a loop-based lookup for each descriptor. Use the new consolidated table and its hash-based lookup through the for_each_leaf_0x2_tlb_entry() accessor. Remove the TLB-specific mapping, intel_tlb_table[], as it is now no longer used. Remove the <cpuid/types.h> macro, for_each_leaf_0x2_desc(), since the converted code was its last user. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250324133324.23458-23-darwi@linutronix.de	2025-03-25 10:23:12 +01:00
Ahmed S. Darwish	da23a62598	x86/cacheinfo: Use consolidated CPUID leaf 0x2 descriptor table CPUID leaf 0x2 output is a stream of one-byte descriptors, each implying certain details about the CPU's cache and TLB entries. At previous commits, the mapping tables for such descriptors were merged into one consolidated table. The mapping was also transformed into a hash lookup instead of a loop-based lookup for each descriptor. Use the new consolidated table and its hash-based lookup through the for_each_leaf_0x2_tlb_entry() accessor. Remove the old cache-specific mapping, cache_table[], as it is no longer used. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250324133324.23458-22-darwi@linutronix.de	2025-03-25 10:23:08 +01:00
Thomas Gleixner	37aedb806b	x86/cpu: Consolidate CPUID leaf 0x2 tables CPUID leaf 0x2 describes TLBs and caches. So there are two tables with the respective descriptor constants in intel.c and cacheinfo.c. The tables occupy almost 600 byte and require a loop based lookup for each variant. Combining them into one table occupies exactly 1k rodata and allows to get rid of the loop based lookup by just using the descriptor byte provided by CPUID leaf 0x2 as index into the table, which simplifies the code and reduces text size. The conversion of the intel.c and cacheinfo.c code is done separately. [ darwi: Actually define struct leaf_0x2_table. Tab-align all of cpuid_0x2_table[] mapping entries. Define needed SZ_* macros at <linux/sizes.h> instead (merged commit.) Use CACHE_L1_{INST,DATA} as names for L1 cache descriptor types. Set descriptor 0x63 type as TLB_DATA_1G_2M_4M and explain why. Use enums for cache and TLB descriptor types (parent commits.) Start enum types at 1 since type 0 is reserved for unknown descriptors. Ensure that cache and TLB enum type values do not intersect. Add leaf 0x2 table accessor for_each_leaf_0x2_entry() + documentation. ] Co-developed-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250324133324.23458-21-darwi@linutronix.de	2025-03-25 10:23:04 +01:00
Ahmed S. Darwish	543904cdfe	x86/cpu: Use enums for TLB descriptor types The leaf 0x2 one-byte TLB descriptor types: TLB_INST_4K TLB_INST_4M TLB_INST_2M_4M ... are just discriminators to be used within the intel_tlb_table[] mapping. Their specific values are irrelevant. Use enums for such types. Make the enum packed and static assert that its values remain within a single byte so that the intel_tlb_table[] size do not go out of hand. Use a __CHECKER__ guard for the static_assert(sizeof(enum) == 1) line as sparse ignores the __packed annotation on enums. This is similar to: `fe3944fb24` ("fs: Move enum rw_hint into a new header file") for the core SCSI code. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/Z9rsTirs9lLfEPD9@lx-t490 Link: https://lore.kernel.org/r/20250324133324.23458-20-darwi@linutronix.de	2025-03-25 10:23:00 +01:00
Ahmed S. Darwish	e1e6b57146	x86/cacheinfo: Use enums for cache descriptor types The leaf 0x2 one-byte cache descriptor types: CACHE_L1_INST CACHE_L1_DATA CACHE_L2 CACHE_L3 are just discriminators to be used within the cache_table[] mapping. Their specific values are irrelevant. Use enums for such types. Make the enum packed and static assert that its values remain within a single byte so that the cache_table[] array size do not go out of hand. Use a __CHECKER__ guard for the static_assert(sizeof(enum) == 1) line as sparse ignores the __packed annotation on enums. This is similar to: `fe3944fb24` ("fs: Move enum rw_hint into a new header file") for the core SCSI code. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/Z9rsTirs9lLfEPD9@lx-t490 Link: https://lore.kernel.org/r/20250324133324.23458-19-darwi@linutronix.de	2025-03-25 10:22:56 +01:00
Ahmed S. Darwish	7596ab7a10	x86/cacheinfo: Clarify type markers for CPUID leaf 0x2 cache descriptors CPUID leaf 0x2 output is a stream of one-byte descriptors, each implying certain details about the CPU's cache and TLB entries. Two separate tables exist for interpreting these descriptors: one for TLBs at intel.c and one for caches at cacheinfo.c. These mapping tables will be merged in further commits, among other improvements to their model. In preparation for this, use more descriptive type names for the leaf 0x2 descriptors associated with cpu caches. Namely: LVL_1_INST => CACHE_L1_INST LVL_1_DATA => CACHE_L1_DATA LVL_2 => CACHE_L2 LVL_3 => CACHE_L3 After the TLB and cache descriptors mapping tables are merged, this will make it clear that such descriptors correspond to cpu caches. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250324133324.23458-18-darwi@linutronix.de	2025-03-25 10:22:52 +01:00
Ahmed S. Darwish	eb1c7c08c5	x86/cacheinfo: Rename 'struct _cpuid4_info_regs' to 'struct _cpuid4_info' Parent commits decoupled amd_northbridge from _cpuid4_info_regs, moved AMD L3 northbridge cache_disable_0/1 sysfs code to its own file, and splitted AMD vs. Intel leaf 0x4 handling into: amd_fill_cpuid4_info() intel_fill_cpuid4_info() fill_cpuid4_info() After doing all that, the "_cpuid4_info_regs" name becomes a mouthful. It is also not totally accurate, as the structure holds cpuid4 derived information like cache node ID and size -- not just regs. Rename struct _cpuid4_info_regs to _cpuid4_info. That new name also better matches the AMD/Intel leaf 0x4 functions mentioned above. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250324133324.23458-17-darwi@linutronix.de	2025-03-25 10:22:49 +01:00
Ahmed S. Darwish	2d56cc8722	x86/cacheinfo: Separate Intel and AMD CPUID leaf 0x4 code paths The CPUID leaf 0x4 parsing code at cpuid4_cache_lookup_regs() is ugly and convoluted. It is tangled with multiple nested conditions to handle: * AMD with TOPEXT, or Hygon CPUs via leaf 0x8000001d * Legacy AMD fallback via leaf 0x4 emulation * Intel CPUs via the actual CPUID leaf 0x4 Moreover, AMD L3 northbridge initialization is also awkwardly placed alongside the CPUID calls of the first two scenarios above. Refactor all of that as follows: * Update AMD's leaf 0x4 emulation comment to represent current state * Clearly label the AMD leaf 0x4 emulation function as a fallback * Split AMD/Hygon and Intel code paths into separate functions * Move AMD L3 northbridge initialization out of CPUID leaf 0x4 code, and into populate_cache_leaves() where it belongs. There, ci_info_init() can directly store the initialized object in the private pointer of the <linux/cacheinfo.h> API. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250324133324.23458-16-darwi@linutronix.de	2025-03-25 10:22:45 +01:00
Ahmed S. Darwish	071f4ad649	x86/cacheinfo: Use sysfs_emit() for sysfs attributes show() Per Documentation/filesystems/sysfs.rst, a sysfs attribute's show() method should only use sysfs_emit() or sysfs_emit_at() when returning values to user space. Use sysfs_emit() for the AMD L3 cache sysfs attributes cache_disable_0, cache_disable_1, and subcaches. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250324133324.23458-15-darwi@linutronix.de	2025-03-25 10:22:43 +01:00
Ahmed S. Darwish	365e960d29	x86/cacheinfo: Move AMD cache_disable_0/1 handling to separate file Parent commit decoupled amd_northbridge out of _cpuid4_info_regs, where it was merely "parked" there until ci_info_init() can store it in the private pointer of the <linux/cacheinfo.h> API. Given that decoupling, move the AMD-specific L3 cache_disable_0/1 sysfs code from the generic (and already extremely convoluted) x86/cacheinfo code into its own file. Compile the file only if CONFIG_AMD_NB and CONFIG_SYSFS are both enabled, which mirrors the existing logic. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250324133324.23458-14-darwi@linutronix.de	2025-03-25 10:22:39 +01:00
Ahmed S. Darwish	c58ed2d4da	x86/cacheinfo: Separate amd_northbridge from _cpuid4_info_regs 'struct _cpuid4_info_regs' is meant to hold the CPUID leaf 0x4 output registers (EAX, EBX, and ECX), as well as derived information such as the cache node ID and size. It also contains a reference to amd_northbridge, which is there only to be "parked" until ci_info_init() can store it in the priv pointer of the <linux/cacheinfo.h> API. That priv pointer is then used by AMD-specific L3 cache_disable_0/1 sysfs attributes. Decouple amd_northbridge from _cpuid4_info_regs and pass it explicitly through the functions at x86/cacheinfo. Doing so clarifies when amd_northbridge is actually needed (AMD-only code) and when it is not (Intel-specific code). It also prepares for moving the AMD-specific L3 cache_disable_0/1 sysfs code into its own file in next commit. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250324133324.23458-13-darwi@linutronix.de	2025-03-25 10:22:36 +01:00

... 3 4 5 6 7 ...

20797 Commits