Add the required feature detection glue to bugs.c et al. in order to
support the TSA mitigation.
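A minimal sketch of the kind of detection glue this adds, assuming the
X86_BUG_TSA bug bit from this series and VERW-based CPU buffer
clearing; the helper name is illustrative, not the actual patch:

  /* Hypothetical sketch of mitigation-selection glue in bugs.c. */
  static void __init tsa_select_mitigation(void)
  {
          /* Nothing to do if the CPU is not affected. */
          if (!boot_cpu_has_bug(X86_BUG_TSA))
                  return;

          /* Use VERW-based CPU buffer clearing for the mitigation. */
          setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
  }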
Co-developed-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
In the resctrl subsystem's Sub-NUMA Cluster (SNC) mode, the rdt_mon_domain
structure representing a NUMA node relies on the cacheinfo interface
(rdt_mon_domain::ci) to store L3 cache information (e.g., shared_cpu_map)
for monitoring. The L3 cache information of an SNC NUMA node determines
which domains are summed for the "top-level" L3-scoped events.
rdt_mon_domain::ci is initialized using the first online CPU of a NUMA
node. When this CPU goes offline, its shared_cpu_map is cleared to contain
only the offline CPU itself. Subsequently, attempting to read counters
via smp_call_on_cpu(offline_cpu) fails (and the error is ignored),
returning zero values for "top-level events" without any error
indication.
Replace the cacheinfo references in struct rdt_mon_domain and struct
rmid_read with the cacheinfo ID (a unique identifier for the L3 cache).
rdt_domain_hdr::cpu_mask contains the online CPUs associated with that
domain. When reading "top-level events", select a CPU from
rdt_domain_hdr::cpu_mask and use its L3 shared_cpu_map to determine
valid CPUs for reading the RMID counters via the MSR interface.
Considering all CPUs associated with the L3 cache improves the chances
of picking a housekeeping CPU on which the counter reading work can be
queued, avoiding an unnecessary IPI.
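As a sketch of that CPU selection, assuming generic cpumask and
housekeeping helpers; pick_rmid_read_cpu() is a hypothetical name, not
the function added by this patch:

  #include <linux/cpumask.h>
  #include <linux/sched/isolation.h>

  /* Hypothetical helper illustrating the selection described above. */
  static unsigned int pick_rmid_read_cpu(const struct cpumask *l3_shared_cpu_map)
  {
          unsigned int cpu;

          /* Prefer a housekeeping CPU sharing the L3 to avoid an IPI. */
          cpu = cpumask_any_and(l3_shared_cpu_map,
                                housekeeping_cpumask(HK_TYPE_TICK));
          if (cpu < nr_cpu_ids)
                  return cpu;

          /* Otherwise any online CPU sharing the cache can read the MSRs. */
          return cpumask_any(l3_shared_cpu_map);
  }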
Fixes: 328ea68874 ("x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files")
Signed-off-by: Qinyun Tan <qinyuntan@linux.alibaba.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250530182053.37502-2-qinyuntan@linux.alibaba.com
Merge tag 'x86_urgent_for_6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Dave Hansen:
"This is a pretty scattered set of fixes. The majority of them are
further fixups around the recent ITS mitigations.
The rest don't really have a coherent story:
- Some flavors of Xen PV guests don't support large pages, but the
set_memory.c code assumes all CPUs support them.
Avoid problems with a quick CPU feature check.
- The TDX code has some wrappers to help retry calls to the TDX
module. They use function pointers to assembly functions and the
compiler usually generates direct CALLs. But some newer compilers,
combined with -Os, turned them into indirect CALLs, and the assembly
code was not annotated for indirect calls.
Force inlining of the helper to fix it up.
- Last, a FRED issue showed up when single-stepping. It's fine when
using an external debugger, but was getting stuck returning from a
SIGTRAP handler otherwise.
Clear the FRED 'swevent' bit to ensure that forward progress is
made"
* tag 'x86_urgent_for_6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "mm/execmem: Unify early execmem_cache behaviour"
x86/its: explicitly manage permissions for ITS pages
x86/its: move its_pages array to struct mod_arch_specific
x86/Kconfig: only enable ROX cache in execmem when STRICT_MODULE_RWX is set
x86/mm/pat: don't collapse pages without PSE set
x86/virt/tdx: Avoid indirect calls to TDX assembly functions
selftests/x86: Add a test to detect infinite SIGTRAP handler loop
x86/fred/signal: Prevent immediate repeat of single step trap on return from SIGTRAP handler
It will be used by other x86 mitigations.
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Protect the global edid_info behind CONFIG_FIRMWARE_EDID and remove
the config tests for CONFIG_X86. This makes edid_info available iff
its config option has been enabled.
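A minimal sketch of the resulting guard; the declaration site and form
are assumed for illustration:

  /* edid_info exists iff CONFIG_FIRMWARE_EDID is enabled, rather than
   * being tied to CONFIG_X86. */
  #ifdef CONFIG_FIRMWARE_EDID
  extern struct edid_info edid_info;
  #endif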
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Helge Deller <deller@gmx.de>
Link: https://lore.kernel.org/r/20250602075537.137759-3-tzimmermann@suse.de
Merge cpuidle updates for 6.16-rc2:
- Update data types of variables passed as arguments to
mwait_idle_with_hints() to match the function definition
after recent changes (Uros Bizjak).
- Eliminate mwait_play_dead_cpuid_hint() again after reverting its
elimination during the merge window due to a problem with handling
"dead" SMT siblings, but this time prevent leaving them in C1 after
initialization by taking them online and back offline when a proper
cpuidle driver for the platform has been registered (Rafael Wysocki).
* pm-cpuidle:
intel_idle: Update arguments of mwait_idle_with_hints()
Reapply "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"
ACPI: processor: Rescan "dead" SMT siblings during initialization
intel_idle: Rescan "dead" SMT siblings during initialization
x86/smp: PM/hibernate: Split arch_resume_nosmt()
intel_idle: Use subsys_initcall_sync() for initialization
execmem_alloc() sets permissions differently depending on the kernel
configuration, CPU support for PSE and whether a page is allocated
before or after mark_rodata_ro().
Add tracking for pages allocated for ITS when patching the core kernel
and make sure the permissions for ITS pages are explicitly managed for
both kernel and module allocations.
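A hedged sketch of what explicitly managed permissions can look like;
the helper names are assumptions for illustration, not the actual
hunks:

  /* Hypothetical: allocate an ITS thunk page writable for patching. */
  static void *its_alloc_thunk_page(void)
  {
          void *page = execmem_alloc(EXECMEM_MODULE_TEXT, PAGE_SIZE);

          if (!page)
                  return NULL;

          /* Keep the page non-executable and writable while patching. */
          set_memory_nx((unsigned long)page, 1);
          set_memory_rw((unsigned long)page, 1);
          return page;
  }

  /* Hypothetical: seal the page read-only/executable once patched. */
  static void its_seal_thunk_page(void *page)
  {
          set_memory_rox((unsigned long)page, 1);
  }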
Fixes: 872df34d7c ("x86/its: Use dynamic thunks for indirect branches")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20250603111446.2609381-5-rppt@kernel.org
The pages with ITS thunks allocated for modules are tracked by an
array in 'struct module'.
Since this is a very architecture-specific data structure, move it to
'struct mod_arch_specific'.
No functional changes.
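A minimal sketch of the resulting layout; the struct contents are
assumed for illustration:

  /* Hypothetical sketch: the array moves out of 'struct module' into
   * the x86-only 'struct mod_arch_specific'. */
  struct its_array {
          void **pages;
          int num;
  };

  struct mod_arch_specific {
          /* ... existing x86 fields ... */
          struct its_array its_pages;
  };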
Fixes: 872df34d7c ("x86/its: Use dynamic thunks for indirect branches")
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20250603111446.2609381-4-rppt@kernel.org
Clear the software event flag in the augmented SS to prevent immediate
repeat of single step trap on return from SIGTRAP handler if the trap
flag (TF) is set without an external debugger attached.
Following is a typical single-stepping flow for a user process:
1) The user process is prepared for single-stepping by setting
RFLAGS.TF = 1.
2) When any instruction in user space completes, a #DB is triggered.
3) The kernel handles the #DB and returns to user space, invoking the
SIGTRAP handler with RFLAGS.TF = 0.
4) After the SIGTRAP handler finishes, the user process performs a
sigreturn syscall, restoring the original state, including
RFLAGS.TF = 1.
5) Goto step 2.
According to the FRED specification:
A) Bit 17 in the augmented SS is designated as the software event
flag, which is set to 1 for FRED event delivery of SYSCALL,
SYSENTER, or INT n.
B) If bit 17 of the augmented SS is 1 and ERETU would result in
RFLAGS.TF = 1, a single-step trap will be pending upon completion
of ERETU.
In step 4) above, the software event flag is set upon the sigreturn
syscall, and its corresponding ERETU would restore RFLAGS.TF = 1.
This combination causes a pending single-step trap upon completion of
ERETU. Therefore, another #DB is triggered before any user space
instruction is executed, which leads to an infinite loop in which the
SIGTRAP handler keeps being invoked on the same user space IP.
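A minimal sketch of such a fix, assuming the fred_ss view of the
augmented SS in struct pt_regs; the helper name is an assumption:

  /*
   * Clear the software event flag (bit 17 of the augmented SS) so the
   * subsequent ERETU with RFLAGS.TF = 1 does not immediately re-arm a
   * single-step trap.
   */
  static __always_inline void prevent_single_step_upon_eretu(struct pt_regs *regs)
  {
          if (cpu_feature_enabled(X86_FEATURE_FRED))
                  regs->fred_ss.swevent = 0;
  }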
Fixes: 14619d912b ("x86/fred: FRED entry/exit and dispatch code")
Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250609084054.2083189-2-xin%40zytor.com
Merge tag 'x86-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A small set of x86 fixes:
- Cure IO bitmap inconsistencies
A failed fork cleans up all resources of the newly created thread
via exit_thread(). exit_thread() invokes io_bitmap_exit() which
does the IO bitmap cleanups, which unfortunately assume that the
cleanup is related to the current task, which is obviously bogus.
Make it work correctly
- A lockdep fix in the resctrl code removed the clearing of the
command buffer in two places, which keeps stale error messages
around. Bring them back.
- Remove unused trace events"
* tag 'x86-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
fs/resctrl: Restore the rdt_last_cmd_clear() calls after acquiring rdtgroup_mutex
x86/iopl: Cure TIF_IO_BITMAP inconsistencies
x86/fpu: Remove unused trace events
Merge tag 'kbuild-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Add support for the EXPORT_SYMBOL_GPL_FOR_MODULES() macro, which
exports a symbol only to specified modules
- Improve ABI handling in gendwarfksyms
- Forcibly link lib-y objects to vmlinux even if CONFIG_MODULES=n
- Add checkers for redundant or missing <linux/export.h> inclusion
- Deprecate the extra-y syntax
- Fix a genksyms bug when including enum constants from *.symref files
* tag 'kbuild-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (28 commits)
genksyms: Fix enum consts from a reference affecting new values
arch: use always-$(KBUILD_BUILTIN) for vmlinux.lds
kbuild: set y instead of 1 to KBUILD_{BUILTIN,MODULES}
efi/libstub: use 'targets' instead of extra-y in Makefile
module: make __mod_device_table__* symbols static
scripts/misc-check: check unnecessary #include <linux/export.h> when W=1
scripts/misc-check: check missing #include <linux/export.h> when W=1
scripts/misc-check: add double-quotes to satisfy shellcheck
kbuild: move W=1 check for scripts/misc-check to top-level Makefile
scripts/tags.sh: allow to use alternative ctags implementation
kconfig: introduce menu type enum
docs: symbol-namespaces: fix reST warning with literal block
kbuild: link lib-y objects to vmlinux forcibly even when CONFIG_MODULES=n
tinyconfig: enable CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
docs/core-api/symbol-namespaces: drop table of contents and section numbering
modpost: check forbidden MODULE_IMPORT_NS("module:") at compile time
kbuild: move kbuild syntax processing to scripts/Makefile.build
Makefile: remove dependency on archscripts for header installation
Documentation/kbuild: Add new gendwarfksyms kABI rules
Documentation/kbuild: Drop section numbers
...
Revert commit 70523f3357 ("Revert "x86/smp: Eliminate
mwait_play_dead_cpuid_hint()"") to reapply the changes from commit
96040f7273 ("x86/smp: Eliminate mwait_play_dead_cpuid_hint()")
reverted by it.
Previously, these changes caused idle power to rise on systems booting
with "nosmt" in the kernel command line because they effectively caused
"dead" SMT siblings to remain in idle state C1 after executing the HLT
instruction, which prevented the processor from reaching package idle
states deeper than PC2 going forward.
Now, the "dead" SMT siblings are rescanned after initializing a proper
cpuidle driver for the processor (either intel_idle or ACPI idle), at
which point they are able to enter a sufficiently deep idle state
in native_play_dead() via cpuidle, so the code changes in question can
be reapplied.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://patch.msgid.link/7813065.EvYhyI6sBW@rjwysocki.net
Move the inner part of the arch_resume_nosmt() code into a separate
function called arch_cpu_rescan_dead_smt_siblings(), so it can be
used in other places where "dead" SMT siblings may need to be taken
online and offline again in order to get into deep idle states.
No intentional functional impact.
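A hedged sketch of what the extracted helper might look like; the
hotplug calls and checks here are assumptions, not the actual patch:

  /* Hypothetical sketch of arch_cpu_rescan_dead_smt_siblings(). */
  void arch_cpu_rescan_dead_smt_siblings(void)
  {
          unsigned int cpu;

          for_each_present_cpu(cpu) {
                  /* Only "dead" SMT siblings need the cycle. */
                  if (topology_is_primary_thread(cpu) || cpu_online(cpu))
                          continue;

                  /* Take the sibling online and back offline so it
                   * parks in a deep idle state via cpuidle. */
                  if (!add_cpu(cpu))
                          remove_cpu(cpu);
          }
  }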
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://patch.msgid.link/3361688.44csPzL39Z@rjwysocki.net
[ rjw: Prevent build issues with CONFIG_SMP unset ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The extra-y syntax is deprecated. Instead, use always-$(KBUILD_BUILTIN),
which behaves equivalently.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
Merge tag 'hyperv-next-signed-20250602' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- Support for Virtual Trust Level (VTL) on arm64 (Roman Kisel)
- Fixes for Hyper-V UIO driver (Long Li)
- Fixes for Hyper-V PCI driver (Michael Kelley)
- Select CONFIG_SYSFB for Hyper-V guests (Michael Kelley)
- Documentation updates for Hyper-V VMBus (Michael Kelley)
- Enhance logging for hv_kvp_daemon (Shradha Gupta)
* tag 'hyperv-next-signed-20250602' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (23 commits)
Drivers: hv: Always select CONFIG_SYSFB for Hyper-V guests
Drivers: hv: vmbus: Add comments about races with "channels" sysfs dir
Documentation: hyperv: Update VMBus doc with new features and info
PCI: hv: Remove unnecessary flex array in struct pci_packet
Drivers: hv: Remove hv_alloc/free_* helpers
Drivers: hv: Use kzalloc for panic page allocation
uio_hv_generic: Align ring size to system page
uio_hv_generic: Use correct size for interrupt and monitor pages
Drivers: hv: Allocate interrupt and monitor pages aligned to system page boundary
arch/x86: Provide the CPU number in the wakeup AP callback
x86/hyperv: Fix APIC ID and VP index confusion in hv_snp_boot_ap()
PCI: hv: Get vPCI MSI IRQ domain from DeviceTree
ACPI: irq: Introduce acpi_get_gsi_dispatcher()
Drivers: hv: vmbus: Introduce hv_get_vmbus_root_device()
Drivers: hv: vmbus: Get the IRQ number from DeviceTree
dt-bindings: microsoft,vmbus: Add interrupt and DMA coherence properties
arm64, x86: hyperv: Report the VTL the system boots in
arm64: hyperv: Initialize the Virtual Trust Level field
Drivers: hv: Provide arch-neutral implementation of get_vtl()
Drivers: hv: Enable VTL mode for arm64
...
io_bitmap_exit() is invoked from exit_thread() when a task exits or
when a fork fails. In the latter case exit_thread() cleans up
resources which were allocated during fork().
io_bitmap_exit() invokes task_update_io_bitmap(), which in turn ends up
in tss_update_io_bitmap(). tss_update_io_bitmap() operates on the
current task. If current has TIF_IO_BITMAP set, but no bitmap installed,
tss_update_io_bitmap() crashes with a NULL pointer dereference.
There are two issues, which lead to that problem:
1) io_bitmap_exit() should not invoke task_update_io_bitmap() when
the task, which is cleaned up, is not the current task. That's a
clear indicator for a cleanup after a failed fork().
2) A task should not have TIF_IO_BITMAP set and neither a bitmap
installed nor IOPL emulation level 3 activated.
This happens when a kernel thread is created in the context of
a user space thread, which has TIF_IO_BITMAP set as the thread
flags are copied and the IO bitmap pointer is cleared.
Other than in the failed fork() case this has no impact because
kernel threads including IO workers never return to user space and
therefore never invoke tss_update_io_bitmap().
Cure this by adding the missing cleanups and checks:
1) Prevent io_bitmap_exit() from invoking task_update_io_bitmap() if
the task being cleaned up is not the current task.
2) Clear TIF_IO_BITMAP in copy_thread() unconditionally. For user
space forks it is set later, when the IO bitmap is inherited in
io_bitmap_share().
For paranoia's sake, add a warning to tss_update_io_bitmap() to catch
the case when that code is invoked with inconsistent state.
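A hedged sketch of cure 1), with function signatures assumed for
illustration:

  /* Hypothetical sketch of the guarded cleanup. */
  void io_bitmap_exit(struct task_struct *tsk)
  {
          struct io_bitmap *iobm = tsk->thread.io_bitmap;

          tsk->thread.io_bitmap = NULL;

          /*
           * Only fix up the TSS of the current CPU for the current
           * task. A task cleaned up after a failed fork() never ran.
           */
          if (tsk == current)
                  task_update_io_bitmap();

          if (iobm && refcount_dec_and_test(&iobm->refcnt))
                  kfree(iobm);
  }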
Fixes: ea5f1cd7ab ("x86/ioperm: Remove bitmap if all permissions dropped")
Reported-by: syzbot+e2b1803445d236442e54@syzkaller.appspotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/87wmdceom2.ffs@tglx
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more kvm updates from Paolo Bonzini:
Generic:
- Clean up locking of all vCPUs for a VM by using the *_nest_lock()
family of functions, and move duplicated code to virt/kvm/. kernel/
patches acked by Peter Zijlstra
- Add MGLRU support to the access tracking perf test
ARM fixes:
- Make the irqbypass hooks resilient to changes in the GSI<->MSI
routing, avoiding stale vLPI mappings being left behind. The
fix is to resolve the VGIC IRQ using the host IRQ (which is stable)
and nuke the vLPI mapping upon a routing change
- Close another VGIC race where vCPU creation races with VGIC
creation, leading to in-flight vCPUs entering the kernel w/o
private IRQs allocated
- Fix a build issue triggered by the recently added workaround for
Ampere's AC04_CPU_23 erratum
- Correctly sign-extend the VA when emulating a TLBI instruction
potentially targeting a VNCR mapping
- Avoid dereferencing a NULL pointer in the VGIC debug code, which
can happen if the device doesn't have any mapping yet
s390:
- Fix interaction between some filesystems and Secure Execution
- Some cleanups and refactorings, preparing for an upcoming big
series
x86:
- Wait for target vCPU to ack KVM_REQ_UPDATE_PROTECTED_GUEST_STATE
to fix a race between AP destroy and VMRUN
- Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for
the VM
- Refine and harden handling of spurious faults
- Add support for ALLOWED_SEV_FEATURES
- Add #VMGEXIT to the set of handlers special cased for
CONFIG_RETPOLINE=y
- Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing
features that utilize those bits
- Don't account temporary allocations in sev_send_update_data()
- Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock
Threshold
- Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU
IBPB, between SVM and VMX
- Advertise support to userspace for WRMSRNS and PREFETCHI
- Rescan I/O APIC routes after handling EOI that needed to be
intercepted due to the old/previous routing, but not the
new/current routing
- Add a module param to control and enumerate support for device
posted interrupts
- Fix a potential overflow with nested virt on Intel systems running
32-bit kernels
- Flush shadow VMCSes on emergency reboot
- Add support for SNP to the various SEV selftests
- Add a selftest to verify fastops instructions via forced emulation
- Refine and optimize KVM's software processing of the posted
interrupt bitmap, and share the harvesting code between KVM and the
kernel's Posted MSI handler"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
rtmutex_api: provide correct extern functions
KVM: arm64: vgic-debug: Avoid dereferencing NULL ITE pointer
KVM: arm64: vgic-init: Plug vCPU vs. VGIC creation race
KVM: arm64: Unmap vLPIs affected by changes to GSI routing information
KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding()
KVM: arm64: Protect vLPI translation with vgic_irq::irq_lock
KVM: arm64: Use lock guard in vgic_v4_set_forwarding()
KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidation
arm64: sysreg: Drag linux/kconfig.h to work around vdso build issue
KVM: s390: Simplify and move pv code
KVM: s390: Refactor and split some gmap helpers
KVM: s390: Remove unneeded srcu lock
s390: Remove unneeded includes
s390/uv: Improve splitting of large folios that cannot be split while dirty
s390/uv: Always return 0 from s390_wiggle_split_folio() if successful
s390/uv: Don't return 0 from make_hva_secure() if the operation was not successful
rust: add helper for mutex_trylock
RISC-V: KVM: use kvm_trylock_all_vcpus when locking all vCPUs
KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs
x86: KVM: SVM: use kvm_lock_all_vcpus instead of a custom implementation
...
semaphore" from Lance Yang enhances the hung task detector. The
detector presently dumps the blocking tasks's stack when it is blocked
on a mutex. Lance's series extends this to semaphores.
- The 2 patch series "nilfs2: improve sanity checks in dirty state
propagation" from Wentao Liang addresses a couple of minor flaws in
nilfs2.
- The 2 patch series "scripts/gdb: Fixes related to lx_per_cpu()" from
Illia Ostapyshyn fixes a couple of issues in the gdb scripts.
- The 9 patch series "Support kdump with LUKS encryption by reusing LUKS
volume keys" from Coiby Xu addresses a usability problem with kdump.
When the dump device is LUKS-encrypted, the kdump kernel may not have
the keys to the encrypted filesystem. A full writeup of this is in the
series [0/N] cover letter.
- The 2 patch series "sysfs: add counters for lockups and stalls" from
Max Kellermann adds /sys/kernel/hardlockup_count and
/sys/kernel/hardlockup_count and /sys/kernel/rcu_stall_count.
- The 3 patch series "fork: Page operation cleanups in the fork code"
from Pasha Tatashin implements a number of code cleanups in fork.c.
- The 3 patch series "scripts/gdb/symbols: determine KASLR offset on
s390 during early boot" from Ilya Leoshkevich fixes some s390 issues in
the gdb scripts.
Merge tag 'mm-nonmm-stable-2025-05-31-15-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- "hung_task: extend blocking task stacktrace dump to semaphore" from
Lance Yang enhances the hung task detector.
The detector presently dumps the blocking task's stack when it is
blocked on a mutex. Lance's series extends this to semaphores
- "nilfs2: improve sanity checks in dirty state propagation" from
Wentao Liang addresses a couple of minor flaws in nilfs2
- "scripts/gdb: Fixes related to lx_per_cpu()" from Illia Ostapyshyn
fixes a couple of issues in the gdb scripts
- "Support kdump with LUKS encryption by reusing LUKS volume keys" from
Coiby Xu addresses a usability problem with kdump.
When the dump device is LUKS-encrypted, the kdump kernel may not have
the keys to the encrypted filesystem. A full writeup of this is in
the series [0/N] cover letter
- "sysfs: add counters for lockups and stalls" from Max Kellermann adds
/sys/kernel/hardlockup_count and /sys/kernel/hardlockup_count and
/sys/kernel/rcu_stall_count
- "fork: Page operation cleanups in the fork code" from Pasha Tatashin
implements a number of code cleanups in fork.c
- "scripts/gdb/symbols: determine KASLR offset on s390 during early
boot" from Ilya Leoshkevich fixes some s390 issues in the gdb
scripts
* tag 'mm-nonmm-stable-2025-05-31-15-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (67 commits)
llist: make llist_add_batch() a static inline
delayacct: remove redundant code and adjust indentation
squashfs: add optional full compressed block caching
crash_dump, nvme: select CONFIGFS_FS as built-in
scripts/gdb/symbols: determine KASLR offset on s390 during early boot
scripts/gdb/symbols: factor out pagination_off()
scripts/gdb/symbols: factor out get_vmlinux()
kernel/panic.c: format kernel-doc comments
mailmap: update and consolidate Casey Connolly's name and email
nilfs2: remove wbc->for_reclaim handling
fork: define a local GFP_VMAP_STACK
fork: check charging success before zeroing stack
fork: clean-up naming of vm_stack/vm_struct variables in vmap stacks code
fork: clean-up ifdef logic around stack allocation
kernel/rcu/tree_stall: add /sys/kernel/rcu_stall_count
kernel/watchdog: add /sys/kernel/{hard,soft}lockup_count
x86/crash: make the page that stores the dm crypt keys inaccessible
x86/crash: pass dm crypt keys to kdump kernel
Revert "x86/mm: Remove unused __set_memory_prot()"
crash_dump: retrieve dm crypt keys in kdump kernel
...
Merge tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of
creating a pte which addresses the first page in a folio and reduces
the amount of plumbing which architecture must implement to provide
this.
- "Misc folio patches for 6.16" from Matthew Wilcox is a shower of
largely unrelated folio infrastructure changes which clean things up
and better prepare us for future work.
- "memory,x86,acpi: hotplug memory alignment advisement" from Gregory
Price adds early-init code to prevent x86 from leaving physical
memory unused when physical address regions are not aligned to memory
block size.
- "mm/compaction: allow more aggressive proactive compaction" from
Michal Clapinski provides some tuning of the (sadly, hard-coded (more
sadly, not auto-tuned)) thresholds for our invocation of proactive
compaction. In a simple test case, the reduction of a guest VM's
memory consumption was dramatic.
- "Minor cleanups and improvements to swap freeing code" from Kemeng
Shi provides some code cleanups and a small efficiency improvement to
this part of our swap handling code.
- "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin
adds the ability for a ptracer to modify syscall arguments. At this
time we can alter only the system call information that is used by
strace system call tampering, namely the syscall number, syscall
arguments, and syscall return value.
This series should have been incorporated into mm.git's "non-MM"
branch, but I goofed.
- "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from
Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl
against /proc/pid/pagemap. This permits CRIU to more efficiently get
at the info about guard regions.
- "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan
implements that fix. No runtime effect is expected because
validate_page_before_insert() happens to fix up this error.
- "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David
Hildenbrand basically brings uprobe text poking into the current
decade. Remove a bunch of hand-rolled implementation in favor of
using more current facilities.
- "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman
Khandual provides enhancements and generalizations to the pte dumping
code. This might be needed when 128-bit Page Table Descriptors are
enabled for ARM.
- "Always call constructor for kernel page tables" from Kevin Brodsky
ensures that the ctor/dtor is always called for kernel pgtables, as
it already is for user pgtables.
This permits the addition of more functionality such as "insert hooks
to protect page tables". This change does result in various
architectures performing unnecessary work, but this is fixed up where
it is anticipated to occur.
- "Rust support for mm_struct, vm_area_struct, and mmap" from Alice
Ryhl adds plumbing to permit Rust access to core MM structures.
- "fix incorrectly disallowed anonymous VMA merges" from Lorenzo
Stoakes takes advantage of some VMA merging opportunities which we've
been missing for 15 years.
- "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from
SeongJae Park optimizes process_madvise()'s TLB flushing.
Instead of flushing each address range in the provided iovec, we
batch the flushing across all the iovec entries. The syscall's cost
was approximately halved with a microbenchmark which was designed to
load this particular operation.
- "Track node vacancy to reduce worst case allocation counts" from
Sidhartha Kumar makes the maple tree smarter about its node
preallocation.
stress-ng mmap performance increased by single-digit percentages and
the amount of unnecessarily preallocated memory was dramatically
reduced.
- "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes
a few unnecessary things which Baoquan noted when reading the code.
- ""Enhance sysfs handling for memory hotplug in weighted interleave"
from Rakie Kim "enhances the weighted interleave policy in the memory
management subsystem by improving sysfs handling, fixing memory
leaks, and introducing dynamic sysfs updates for memory hotplug
support". Fixes things on error paths which we are unlikely to hit.
- "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory"
from SeongJae Park introduces new DAMOS quota goal metrics which
eliminate the manual tuning which is required when utilizing DAMON
for memory tiering.
- "mm/vmalloc.c: code cleanup and improvements" from Baoquan He
provides cleanups and small efficiency improvements which Baoquan
found via code inspection.
- "vmscan: enforce mems_effective during demotion" from Gregory Price
changes reclaim to respect cpuset.mems_effective during demotion when
possible. because presently, reclaim explicitly ignores
cpuset.mems_effective when demoting, which may cause the cpuset
settings to violated.
This is useful for isolating workloads on a multi-tenant system from
certain classes of memory more consistently.
- "Clean up split_huge_pmd_locked() and remove unnecessary folio
pointers" from Gavin Guo provides minor cleanups and efficiency gains
in the huge page splitting and migrating code.
- "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache
for `struct mem_cgroup', yielding improved memory utilization.
- "add max arg to swappiness in memory.reclaim and lru_gen" from
Zhongkun He adds a new "max" argument to the "swappiness=" argument
for memory.reclaim MGLRU's lru_gen.
This directs proactive reclaim to reclaim from only anon folios
rather than file-backed folios.
- "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the
first step on the path to permitting the kernel to maintain existing
VMs while replacing the host kernel via file-based kexec. At this
time only memblock's reserve_mem is preserved.
- "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides
and uses a smarter way of looping over a pfn range. By skipping
ranges of invalid pfns.
- "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via
cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning
when a task is pinned to a single NUMA node.
Dramatic performance benefits were seen in some real world cases.
- "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank
Garg addresses a warning which occurs during memory compaction when
using JFS.
- "move all VMA allocation, freeing and duplication logic to mm" from
Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more
appropriate mm/vma.c.
- "mm, swap: clean up swap cache mapping helper" from Kairui Song
provides code consolidation and cleanups related to the folio_index()
function.
- "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that.
- "memcg: Fix test_memcg_min/low test failures" from Waiman Long
addresses some bogus failures which are being reported by the
test_memcontrol selftest.
- "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo
Stoakes commences the deprecation of file_operations.mmap() in favor
of the new file_operations.mmap_prepare().
The latter is more restrictive and prevents drivers from messing with
things in ways which, amongst other problems, may defeat VMA merging.
- "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples
the per-cpu memcg charge cache from the objcg's one.
This is a step along the way to making memcg and objcg charging
NMI-safe, which is a BPF requirement.
- "mm/damon: minor fixups and improvements for code, tests, and
documents" from SeongJae Park is yet another batch of miscellaneous
DAMON changes. Fix and improve minor problems in code, tests and
documents.
- "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg
stats to be irq safe. Another step along the way to making memcg
charging and stats updates NMI-safe, a BPF requirement.
- "Let unmap_hugepage_range() and several related functions take folio
instead of page" from Fan Ni provides folio conversions in the
hugetlb code.
* tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits)
mm: pcp: increase pcp->free_count threshold to trigger free_high
mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range()
mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page
mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page
mm/hugetlb: pass folio instead of page to unmap_ref_private()
memcg: objcg stock trylock without irq disabling
memcg: no stock lock for cpu hot-unplug
memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs
memcg: make count_memcg_events re-entrant safe against irqs
memcg: make mod_memcg_state re-entrant safe against irqs
memcg: move preempt disable to callers of memcg_rstat_updated
memcg: memcg_rstat_updated re-entrant safe against irqs
mm: khugepaged: decouple SHMEM and file folios' collapse
selftests/eventfd: correct test name and improve messages
alloc_tag: check mem_profiling_support in alloc_tag_init
Docs/damon: update titles and brief introductions to explain DAMOS
selftests/damon/_damon_sysfs: read tried regions directories in order
mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject()
mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat()
mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs
...
Merge tag 'pm-6.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These revert an x86 commit that introduced a nasty power regression on
some systems, fix PSCI cpuidle driver and ACPI cpufreq driver
regressions, add Rust abstractions for cpufreq, OPP, clk, and
cpumasks, add a Rust-based cpufreq-dt driver, and do a minor SCMI
cpufreq driver cleanup:
- Revert an x86 commit that went into 6.15 and caused idle power,
including power in suspend-to-idle, to rise rather dramatically on
systems booting with "nosmt" in the kernel command line (Rafael
Wysocki)
- Prevent freeing an uninitialized pointer in error path of
dt_idle_state_present() in the PSCI cpuidle driver (Dan Carpenter)
- Use KHz as the nominal_freq units in get_max_boost_ratio() in the
ACPI cpufreq driver (Gautham Shenoy)
- Add Rust abstractions for CPUFreq framework (Viresh Kumar)
- Add Rust abstractions for OPP framework (Viresh Kumar)
- Add basic Rust abstractions for Clk and Cpumask frameworks (Viresh
Kumar)
- Clean up the SCMI cpufreq driver somewhat (Mike Tipton)"
* tag 'pm-6.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (21 commits)
Revert "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"
acpi-cpufreq: Fix nominal_freq units to KHz in get_max_boost_ratio()
rust: opp: Move `cfg(CONFIG_OF)` attribute to the top of doc test
cpuidle: psci: Fix uninitialized variable in dt_idle_state_present()
rust: opp: Make the doctest example depend on CONFIG_OF
cpufreq: scmi: Skip SCMI devices that aren't used by the CPUs
cpufreq: Add Rust-based cpufreq-dt driver
rust: opp: Extend OPP abstractions with cpufreq support
rust: cpufreq: Extend abstractions for driver registration
rust: cpufreq: Extend abstractions for policy and driver ops
rust: cpufreq: Add initial abstractions for cpufreq framework
rust: opp: Add abstractions for the configuration options
rust: opp: Add abstractions for the OPP table
rust: opp: Add initial abstractions for OPP framework
rust: cpu: Add from_cpu()
rust: macros: enable use of hyphens in module names
rust: clk: Add initial abstractions
rust: clk: Add helpers for Rust code
MAINTAINERS: Add entry for Rust cpumask API
rust: cpumask: Add initial abstractions
...
Fix an issue in the PSCI cpuidle driver introduced recently and a nasty
x86 power regression introduced in 6.15:
- Prevent freeing an uninitialized pointer in error path of
dt_idle_state_present() in the PSCI cpuidle driver (Dan Carpenter).
- Revert an x86 commit that went into 6.15 and caused idle power,
including power in suspend-to-idle, to rise rather dramatically on
systems booting with "nosmt" in the kernel command line (Rafael Wysocki).
* pm-cpuidle:
Revert "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"
cpuidle: psci: Fix uninitialized variable in dt_idle_state_present()
Merge tag 'x86_sgx_for_6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull Intel software guard extension (SGX) updates from Dave Hansen:
"A couple of x86/sgx changes.
The first one is a no-brainer to use the (simple) SHA-256 library.
For the second one, some folks doing testing noticed that SGX systems
under memory pressure were inducing fatal machine checks at pretty
unnerving rates, despite the SGX code having _some_ awareness of
memory poison.
It turns out that the SGX reclaim path was not checking for poison
_and_ it always accesses memory to copy it around. Make sure that
poisoned pages are not reclaimed"
* tag 'x86_sgx_for_6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Prevent attempts to reclaim poisoned pages
x86/sgx: Use SHA-256 library API instead of crypto_shash API
Revert commit 96040f7273 ("x86/smp: Eliminate mwait_play_dead_cpuid_hint()")
because it introduced a significant power regression on systems that
start with "nosmt" in the kernel command line.
Namely, on such systems, SMT siblings permanently go offline early,
when cpuidle has not been initialized yet, so after the above commit,
hlt_play_dead() is called for them. Later on, when the processor
attempts to enter a deep package C-state, including PC10 which is
requisite for reaching minimum power in suspend-to-idle, it is not
able to do that because of the SMT siblings staying in C1 (which
they have been put into by HLT).
As a result, the idle power (including power in suspend-to-idle)
rises quite dramatically on those systems with all of the possible
consequences, which (needless to say) may not be expected by their
users.
This issue is hard to debug and potentially dangerous, so it needs to
be addressed as soon as possible in a way that will work for 6.15.y,
hence the revert.
Of course, after this revert, the issue that commit 96040f7273
attempted to address will be back and it will need to be fixed again
later.
Fixes: 96040f7273 ("x86/smp: Eliminate mwait_play_dead_cpuid_hint()")
Reported-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Cc: 6.15+ <stable@vger.kernel.org> # 6.15+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/12674167.O9o76ZdvQC@rjwysocki.net
* Add large stage-2 mapping (THP) support for non-protected guests when
pKVM is enabled, clawing back some performance.
* Enable nested virtualisation support on systems that support it,
though it is disabled by default.
* Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and
protected modes.
* Large rework of the way KVM tracks architecture features and links
them with the effects of control bits. While this has no functional
impact, it ensures correctness of emulation (the data is automatically
extracted from the published JSON files), and helps dealing with the
evolution of the architecture.
* Significant changes to the way pKVM tracks ownership of pages,
avoiding page table walks by storing the state in the hypervisor's
vmemmap. This in turn enables the THP support described above.
* New selftest checking the pKVM ownership transition rules
* Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
even if the host didn't have it.
* Fixes for the address translation emulation, which happened to be
rather buggy in some specific contexts.
* Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
from the number of counters exposed to a guest and addressing a
number of issues in the process.
* Add a new selftest for the SVE host state being corrupted by a
guest.
* Keep HCR_EL2.xMO set at all times for systems running with the
kernel at EL2, ensuring that the window for interrupts is slightly
bigger, and avoiding a pretty bad erratum on the AmpereOne HW.
* Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
from a pretty bad case of TLB corruption unless accesses to HCR_EL2
are heavily synchronised.
* Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
tables in a human-friendly fashion.
* and the usual random cleanups.
LoongArch:
* Don't flush tlb if the host supports hardware page table walks.
* Add KVM selftests support.
RISC-V:
* Add vector registers to get-reg-list selftest
* VCPU reset related improvements
* Remove scounteren initialization from VCPU reset
* Support VCPU reset from userspace using set_mpstate() ioctl
x86:
* Initial support for TDX in KVM. This finally makes it possible to use the
TDX module to run confidential guests on Intel processors. This is quite a
large series, including support for private page tables (managed by the
TDX module and mirrored in KVM for efficiency), forwarding some TDVMCALLs
to userspace, and handling several special VM exits from the TDX module.
This has been in the works for literally years and it's not really possible
to describe everything here, so I'll defer to the various merge commits
up to and including commit 7bcf7246c4 ("Merge branch 'kvm-tdx-finish-initial'
into HEAD").
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmg02hwUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNnkwf/db4xeWKSMseCIvBVR+ObDn3LXhwT
hAgmTkDkP1zq9RfbfJSbUA1DXRwfP+f1sWySLMWECkFEQW9fGIJF9fOQRDSXKmhX
158U3+FEt+3jxLRCGFd4zyXAqyY3C8JSkPUyJZxCpUbXtB5tdDNac4rZAXKDULwe
sUi0OW/kFDM2yt369pBGQAGdN+75/oOrYISGOSvMXHxjccNqvveX8MUhpBjYIuuj
73iBWmsfv3vCtam56Racz3C3v44ie498PmWFtnB0R+CVfWfrnUAaRiGWx+egLiBW
dBPDiZywMn++prmphEUFgaStDTQy23JBLJ8+RvHkp+o5GaTISKJB3nedZQ==
=adZU
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"As far as x86 goes this pull request "only" includes TDX host support.
Quotes are appropriate because (at 6k lines and 100+ commits) it is
much bigger than the rest, which will come later this week and
consists mostly of bugfixes and selftests. s390 changes will also come
in the second batch.
ARM:
- Add large stage-2 mapping (THP) support for non-protected guests
when pKVM is enabled, clawing back some performance.
- Enable nested virtualisation support on systems that support it,
though it is disabled by default.
- Add UBSAN support to the standalone EL2 object used in nVHE/hVHE
and protected modes.
- Large rework of the way KVM tracks architecture features and links
them with the effects of control bits. While this has no functional
impact, it ensures correctness of emulation (the data is
automatically extracted from the published JSON files), and helps
dealing with the evolution of the architecture.
- Significant changes to the way pKVM tracks ownership of pages,
avoiding page table walks by storing the state in the hypervisor's
vmemmap. This in turn enables the THP support described above.
- New selftest checking the pKVM ownership transition rules
- Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
even if the host didn't have it.
- Fixes for the address translation emulation, which happened to be
rather buggy in some specific contexts.
- Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
from the number of counters exposed to a guest and addressing a
number of issues in the process.
- Add a new selftest for the SVE host state being corrupted by a
guest.
- Keep HCR_EL2.xMO set at all times for systems running with the
kernel at EL2, ensuring that the window for interrupts is slightly
bigger, and avoiding a pretty bad erratum on the AmpereOne HW.
- Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
from a pretty bad case of TLB corruption unless accesses to HCR_EL2
are heavily synchronised.
- Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
tables in a human-friendly fashion.
- and the usual random cleanups.
LoongArch:
- Don't flush tlb if the host supports hardware page table walks.
- Add KVM selftests support.
RISC-V:
- Add vector registers to get-reg-list selftest
- VCPU reset related improvements
- Remove scounteren initialization from VCPU reset
- Support VCPU reset from userspace using set_mpstate() ioctl
x86:
- Initial support for TDX in KVM.
This finally makes it possible to use the TDX module to run
confidential guests on Intel processors. This is quite a large
series, including support for private page tables (managed by the
TDX module and mirrored in KVM for efficiency), forwarding some
TDVMCALLs to userspace, and handling several special VM exits from
the TDX module.
This has been in the works for literally years and it's not really
possible to describe everything here, so I'll defer to the various
merge commits up to and including commit 7bcf7246c4 ('Merge
branch 'kvm-tdx-finish-initial' into HEAD')"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (248 commits)
x86/tdx: mark tdh_vp_enter() as __flatten
Documentation: virt/kvm: remove unreferenced footnote
RISC-V: KVM: lock the correct mp_state during reset
KVM: arm64: Fix documentation for vgic_its_iter_next()
KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
KVM: arm64: Stage-2 huge mappings for np-guests
KVM: arm64: Add a range to pkvm_mappings
KVM: arm64: Convert pkvm_mappings to interval tree
KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
KVM: arm64: Add a range to __pkvm_host_unshare_guest()
KVM: arm64: Add a range to __pkvm_host_share_guest()
KVM: arm64: Introduce for_each_hyp_page
KVM: arm64: Handle huge mappings for np-guest CMOs
KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET
RISC-V: KVM: Remove scounteren initialization
KVM: RISC-V: remove unnecessary SBI reset state
...
accessing the respective MSRs.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmg0rc4ACgkQEsHwGGHe
VUpGwg/7BM6DUWooaDavYaRNKhv5WH06xfBo8VXTdqf+5sTUO6sS+v7wJw51qC5y
wtevkILpI8sfPBfnmyhmUlceng4imhatEnFYN8YFc08srkh9jobQIl5KWi5IMRe1
rLjW19AaliOqQTsN4ksEOqbaEshaD5WweD8rX2NTFVp7R87CXqfouEh/qRWDJ9Ec
fYJMf34MirGSDojlii/DkHl4Xz5ScJ0BKlctsvdYjMC/F63uSkTLsmxn4RvV9PmH
sn59agBb5eV8N1qzt1Xt235B2Qs1Co36TP7kc/EtX0mt+2fGLxLIvYl6tPITdBI/
4tJ1w+zd1bGrex+OtNRMmftQy4el6Akffno7wx/MjAWYQcMFXpPtSZsQ0TUdaZug
tq+v1i00YqxzZ78qkzLfrqURmOAJZqA6bYXpAi6/UiHNJjWgW8v2T5s3ksfhAlmA
Waoa6js+xTNknF+6X5a1zzL6SaGv3gmmg6BJdlQ0nGhcQ7Pn1rfws8BgZcqyAYxN
QW5VTKqXRYPuJzQJrjK2RM0bMIj9JWxXzgIB/XOxmHcDEIYKe6uYhZDTRJZh8+Mc
/u76YAdJjq5/b2mSdbUKFPNRIGTy8D5hVLZ51C4xtxTZcnlb2mYhyPi/PNDgAHyR
UGRU84u1axT1ZN8AvGrP4FHTjEllD9654ccqr1G1PA9/b6f+0iU=
=7rAH
-----END PGP SIGNATURE-----
Merge tag 'x86_mtrr_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull mtrr update from Borislav Petkov:
"A single change to verify the presence of fixed MTRR ranges before
accessing the respective MSRs"
* tag 'x86_mtrr_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mtrr: Check if fixed-range MTRRs exist in mtrr_save_fixed_ranges()
multiple architectures can share the fs API for manipulating their
respective hw resource control implementation. This is the second step
in the work towards sharing the resctrl filesystem interface, the next
one being plugging ARM's MPAM into the aforementioned fs API.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmg0UDwACgkQEsHwGGHe
VUqsZw//SNSNcVHF7Gz2YvHrMXGYQFBETScg6fRWn/pTe3x1NrKEJedzMANXpAIy
1sBAsfDSOyi8MxIZnvMYapLcRdfLGAD+6FQTkyu/IQ3oSsjAxPgrTXornhxUswMY
LUs40hCv/UaEMkg35NVrRqDlT973kWLwA4iDNNnm6IGtrC8qv4EmdJvgVWHyPTjk
D80KA5ta+iPzK4l8noBrqyhUIZN3ZAJVJLrjS3Tx/gabuolLURE6p4IdlF/O6WzC
4NcqUjpwDeFpHpl2M9QJLVEKXHxKz9zZF2gLpT8Eon/ftqqQigBjzsUx/FKp07hZ
fe2AiQsd4gN9GZa3BGX+Lv+bjvyFadARsOoFbY45szuiUb0oceaRYtFF1ihmO0bV
bD4nAROE1kAfZpr/9ZRZT63LfE/DAm9TR1YBsViq1rrJvp4odvL15YbdOlIDHZD3
SmxhTxAokj058MRnhGdHoiMtPa54iw186QYDp0KxLQHLrToBPd7RBtRE8jsYrqrv
2EvwUxYKyO4vtwr9tzr0ZfptZ/DEsGovoTYD5EtlEGjotQUqsmi5Rxx4+SEQuwFw
CKSJ3j73gpxqDXTujjOe9bCeeXJqyEbrIkaWpkiBRwm5of7eFPG3Sw74jaCGvm4L
NM4UufMSDtyVAKfu3HmPkGhujHv0/7h1zYND51aW+GXEroKxy9s=
=eNCr
-----END PGP SIGNATURE-----
Merge tag 'x86_cache_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Borislav Petkov:
"Carve out the resctrl filesystem-related code into fs/resctrl/ so that
multiple architectures can share the fs API for manipulating their
respective hw resource control implementation.
This is the second step in the work towards sharing the resctrl
filesystem interface, the next one being plugging ARM's MPAM into the
aforementioned fs API"
* tag 'x86_cache_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
MAINTAINERS: Add reviewers for fs/resctrl
x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrl
x86/resctrl: Always initialise rid field in rdt_resources_all[]
x86/resctrl: Relax some asm #includes
x86/resctrl: Prefer alloc(sizeof(*foo)) idiom in rdt_init_fs_context()
x86/resctrl: Squelch whitespace anomalies in resctrl core code
x86/resctrl: Move pseudo lock prototypes to include/linux/resctrl.h
x86/resctrl: Fix types in resctrl_arch_mon_ctx_{alloc,free}() stubs
x86/resctrl: Move enum resctrl_event_id to resctrl.h
x86/resctrl: Move the filesystem bits to headers visible to fs/resctrl
fs/resctrl: Add boiler plate for external resctrl code
x86/resctrl: Add 'resctrl' to the title of the resctrl documentation
x86/resctrl: Split trace.h
x86/resctrl: Expand the width of domid by replacing mon_data_bits
x86/resctrl: Add end-marker to the resctrl_event_id enum
x86/resctrl: Move is_mba_sc() out of core.c
x86/resctrl: Drop __init/__exit on assorted symbols
x86/resctrl: Resctrl_exit() teardown resctrl but leave the mount point
x86/resctrl: Check all domains are offline in resctrl_exit()
x86/resctrl: Rename resctrl_sched_in() to begin with "resctrl_arch_"
...
Refine and optimize KVM's software processing of the PIR, and ultimately share
PIR harvesting code between KVM and the kernel's Posted MSI handler
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmgwmWcACgkQOlYIJqCj
N/3mUw/9HN4OLRqFytu+GjEocl8I7JelJdwCsNMsUwZRnNVnYGDqsjvw8rzqeFmx
RoQ8uNqMd1PqZOgAdN6suLES949ItErbnG2+UlBvZeNgR63K8fyNJaPUzSXh0Kyd
vNNzGschI0txZXNEtMHcIsCuQknU/arlE6v+HOAokb1jxaIZH2h06vrBAj6pLAHO
hbcZPkaQEaFoQhqCbYm015ecJQRPv3IZoW7H1cK5nC4q6QdNo3LPfGqUJwgHV3Wq
hbfS+2J78nTqLhSn7HHE/y5z3R5+ZyPwFQwbqfvjjap5/DW5w8Tltg2Oif597lf2
klBukBkJyfzSdhjaPKb3V23kCNabNyyX7KUDZnW5HCiEu62Lnl0MexXCvFvSvtmy
YDSsXMg3KdtlESwUOaxGjd2J81tx36L3ZvWRaopDLzA2A6KVyVQCSANGOGkKrRzq
Qq3R/frzp1uUVpVDtdyDIO1AujoXkRecdOj1uAIr2XQBg8jx0kveAUyrkXFbQVjK
oNbfRlOiu6/vnXkWqwZ2w/Q0kRRrK7M+vensOZlculqDqxPH+BLWB+dfPqjGikb/
cL01KPu6n/GQJpwAxIbGU4eUIQPAVOcHm3iRaIlRqEoDCs7C8fTRIyDx+cD1vW8O
O9j/r05EV/Ck5XF2ks6bHIK+C3wemNrCvoeFbnO1uicqtdO+Tqw=
=dU1G
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-pir-6.16' of https://github.com/kvm-x86/linux into HEAD
KVM x86 posted interrupt changes for 6.16:
Refine and optimize KVM's software processing of the PIR, and ultimately share
PIR harvesting code between KVM and the kernel's Posted MSI handler
- Convert init_timer*(), try_to_del_timer_sync() and
destroy_timer_on_stack() over to the canonical timer_*() namespace
convention.
There is another large conversion pending, which has not been included
because it would have caused a gazillion of merge conflicts in next. The
conversion scripts will be run towards the end of the merge window and a
pull request sent once all conflict dependencies have been merged.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmgzgTkTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYodwVD/97rF1Juqm1JZNIZPN/vMqwCxRoUkc6
tsK0+UC7UXusuJadxJ+Bsv25iPF+qejnThMU+SQ5yTVj/PNfxOe0WPdCEGGiL8Ye
2JCk6GqSOB/360SlLmtR1B1xHDwsuuUcQTz0w57CH66HRV5vpoWSMSwj/ypy+8nU
PlgjItaxdCKa9NJ+SUJZPWIxRkt/PsA1kwlV1OcxkgB++IiIHQEbPxECq9mlzWXF
b4Sq/Sdf2OmEePN+DYoey4fneRwJnkjkeX/o+CqosCPHRIiWUlSu5W/lU5IYojM3
s3XpMNNg/z8PMXR4JA2VaPYWLUZyBOs+3dM7Y6Am+z55EoxMxfzg6pGx2tfM4ftl
vF8wG3Z1c9MmpLk+P9LatNvfHeVLNve8KgOLa5phMDQ/El/a8KqLu6HmRDPONvKp
d6iXdPq1CP8P6jOtlFfzLmKPShgEcp+Zz9W3CaQR/0ZJEsEqrpKOLzdT86hJhBV0
mBCdzixmGtKAh0BdPdmg2FCLScqER3HKIJhZSdV8I+jSETIHCuMiIfbMXR7iwm/H
R1/ayvxrbc1mPseo28scqvo7m6cn5BFBxIUf4Sokp52ZCapz1v2aWzo4vHI0cTgT
ZOjlTrf+fgYLn1dqdD45TJiQPnmRrw4dU+WWSFRFJY2qjfyucj80vdqdkE5zkp5b
UPomlVimG4ccPg==
=FHGU
-----END PGP SIGNATURE-----
Merge tag 'timers-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer cleanups from Thomas Gleixner:
"Another set of timer API cleanups:
- Convert init_timer*(), try_to_del_timer_sync() and
destroy_timer_on_stack() over to the canonical timer_*()
namespace convention.
There is another large conversion pending, which has not been included
because it would have caused a gazillion of merge conflicts in next.
The conversion scripts will be run towards the end of the merge window
and a pull request sent once all conflict dependencies have been
merged"
* tag 'timers-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
treewide, timers: Rename destroy_timer_on_stack() as timer_destroy_on_stack()
treewide, timers: Rename try_to_del_timer_sync() as timer_delete_sync_try()
timers: Rename init_timers() as timers_init()
timers: Rename NEXT_TIMER_MAX_DELTA as TIMER_NEXT_MAX_DELTA
timers: Rename __init_timer_on_stack() as __timer_init_on_stack()
timers: Rename __init_timer() as __timer_init()
timers: Rename init_timer_on_stack_key() as timer_init_key_on_stack()
timers: Rename init_timer_key() as timer_init_key()
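At a call site, the renames in this tag are mechanical, for example
(illustrative snippet):
    timer_destroy_on_stack(&t);             /* was destroy_timer_on_stack(&t) */
    ret = timer_delete_sync_try(&t);        /* was try_to_del_timer_sync(&t) */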
- Consolidate on one set of functions for the interrupt domain code to
get rid of pointlessly duplicated code with only marginal different
semantics.
- Update the documentation accordingly and consolidate the coding style
of the irqdomain header.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmgzd+MTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYodTRD/0RmG5tngCbEJmTw6lPDQzRZH4OO3ja
yRYlyBipemoRmvJRGjV4uHqN2QPrdOuoqMuyBO1aWcMdkpww5bAHcbgSFrlGM1lW
kqtaxVMbufPiLQSGYe7OQf478CE1ykoBd5Va8whFKrtA73qEUdEMfWT0stspg780
7BlmQOemL91p7Ytf03FbDdo8tZ5Xu9uXGAulwY9FZsFtsCNyvhl7nOv5Sk8ZQtGO
xHRCeunjZLWR+IaK59hdakvQybXwSnjT6jODp96nlyKABEKSPShGSPFDWd3g9px7
4911QwgnvTbcrsk6YmQEmPIOgXZzypjbnjpJr8tFpTbkVIy+6chi5cBJzXoqsUaM
ylTwFcUQNvcP8yF447qb+nyPFKM5xsC07W0UpZMuJUDmhhPRtDm5pK0jpsif96GP
l4aMsWe65PUmXHQqLdE89RJXAa8XQ2qspKVtNKq9DmEVgTviQ09Z9SSQIx4U0yIx
w+YPde8kH2+O+YtMUn/MmfHhUP4MKya7j5zd8Bnv8wLBi7XGPPA5EKKh9I0dz9m+
X94lweNXyH+Q8U9mt2cQf8VG8Yzgk0eeC0sliJIlybwRgEgRcQbVWw0VvZUA1ySa
VBlaj3SinO90FEQ0CctT51ss2mUJ/XsGCnxpiGZXfqIZzFbyD1YfZQnXJH0H67DI
CqdHw22I27Mu/A==
=9nLp
-----END PGP SIGNATURE-----
Merge tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq cleanups from Thomas Gleixner:
"A set of cleanups for the generic interrupt subsystem:
- Consolidate on one set of functions for the interrupt domain code
to get rid of pointlessly duplicated code with only marginal
different semantics.
- Update the documentation accordingly and consolidate the coding
style of the irqdomain header"
* tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
irqdomain: Consolidate coding style
irqdomain: Fix kernel-doc and add it to Documentation
Documentation: irqdomain: Update it
Documentation: irq-domain.rst: Simple improvements
Documentation: irq/concepts: Minor improvements
Documentation: irq/concepts: Add commas and reflow
irqdomain: Improve kernel-docs of functions
irqdomain: Make struct irq_domain_info variables const
irqdomain: Use irq_domain_instantiate()'s return value as initializers
irqdomain: Drop irq_linear_revmap()
pinctrl: keembay: Switch to irq_find_mapping()
irqchip/armada-370-xp: Switch to irq_find_mapping()
gpu: ipu-v3: Switch to irq_find_mapping()
gpio: idt3243x: Switch to irq_find_mapping()
sh: Switch to irq_find_mapping()
powerpc: Switch to irq_find_mapping()
irqdomain: Drop irq_domain_add_*() functions
powerpc: Switch irq_domain_add_nomap() to use fwnode
thermal: Switch to irq_domain_create_linear()
soc: Switch to irq_domain_create_*()
...
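Two of the conversions listed above, as they look at a call site
(illustrative snippet):
    virq = irq_find_mapping(domain, hwirq); /* was irq_linear_revmap(domain, hwirq) */
    d = irq_domain_create_linear(of_fwnode_handle(np), size, &ops, host_data);
                                            /* was irq_domain_add_linear(np, ...) */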
other architectures would like to make use of them as well.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmgy+RARHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1jTHA//eIBOFKJdxmhpJ95kzA0tRXue+FUSTAX+
j9rMZOJpR9hnVkr0pBxH8bU42lji4+6b2vujMHaT59n5i2kH5tPFHW1xfEnpbVNw
thSRsFxrUKsNnKPBju0vK9WQs9e1cn2ZvVBbh2SHrATKQrcTCmJroEERZDX0cdnn
VrPeGoc7UUAjxE23c3vnZOzAJDapIc9zPAdfVGRa7xHqlq5grryG+SfHFzT/fd08
5Qwu8TN37jo1HU5v2I4RYIh4Alc1lXtWTfJAc0bks0Cpryu+Et9+N2XANu/VatVw
cve/Ubwdou9m0QxQtUTULttEbMSBB8Ylc7DJ1PdGkhULxNM8cCb+Yx9C8Gk0+8Rf
SP8/ZSVK8EE+3ETP+J8r8VXoXrNgTPSjMeI1s4rZD/b9QpRKE4g/Khu+R9UA8JBV
yuYdy2xkeRbfFVzoGDSVnZItk18MuAoq4hSNqgAxl9/S33HWG84KHQAnjzixCqb4
9Ai7n3/FBEe1edLJXKoqWK96mTa5P/vpGjMnL8wQ0rAnSYI+V2OSwPpZ9HHviw3g
qYYMqsmiU6ChbfcUnuub/YwdJFdRieVSOa7wh3H6mfKAuakpS0At8fIyD5mBtFtA
/qeSD9INII/guT1gdTgqGsirXeObbmNpC+HJjz8hRvsoP6hdoT2L/UZsUH89LcDl
qd8MKeV1Kew=
=xi0h
-----END PGP SIGNATURE-----
Merge tag 'x86-debug-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 debug updates from Ingo Molnar:
"Move the x86 page fault tracepoints to generic code, because other
architectures would like to make use of them as well"
* tag 'x86-debug-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tracing, x86/mm: Move page fault tracepoints to generic
x86/tracing, x86/mm: Remove redundant trace_pagefault_key
Boot code changes:
- A large series of changes to reorganize the x86 boot code into a better isolated
and easier to maintain base of PIC early startup code in arch/x86/boot/startup/,
by Ard Biesheuvel.
Motivation & background:
| Since commit
|
| c88d71508e ("x86/boot/64: Rewrite startup_64() in C")
|
| dated Jun 6 2017, we have been using C code on the boot path in a way
| that is not supported by the toolchain, i.e., to execute non-PIC C
| code from a mapping of memory that is different from the one provided
| to the linker. It should have been obvious at the time that this was a
| bad idea, given the need to sprinkle fixup_pointer() calls left and
| right to manipulate global variables (including non-pointer variables)
| without crashing.
|
| This C startup code has been expanding, and in particular, the SEV-SNP
| startup code has been expanding over the past couple of years, and
| grown many of these warts, where the C code needs to use special
| annotations or helpers to access global objects.
This tree includes the first phase of this work-in-progress x86 boot code
reorganization.
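To make the quoted wart concrete, here is a simplified sketch of the kind
of helper the old startup code needed (modeled on the kernel's
fixup_pointer(), simplified for illustration):
    /* Early C code runs from a physical mapping that differs from the
     * link-time virtual addresses, so a global's link-time address must
     * be rebased by hand before it can be dereferenced. */
    extern char _text[];
    static void *fixup_pointer(void *ptr, unsigned long physaddr)
    {
            return ptr - (void *)_text + (void *)physaddr;
    }
    /* ...and every global access had to be wrapped accordingly: */
    unsigned long *valp = fixup_pointer(&some_global, physaddr);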
Scalability enhancements and micro-optimizations:
- Improve code-patching scalability (Eric Dumazet)
- Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR (Andrew Cooper)
CPU features enumeration updates:
- Thorough reorganization and cleanup of CPUID parsing APIs (Ahmed S. Darwish)
- Fix, refactor and clean up the cacheinfo code (Ahmed S. Darwish, Thomas Gleixner)
- Update CPUID bitfields to x86-cpuid-db v2.3 (Ahmed S. Darwish)
Memory management changes:
- Allow temporary MMs when IRQs are on (Andy Lutomirski)
- Opt-in to IRQs-off activate_mm() (Andy Lutomirski)
- Simplify choose_new_asid() and generate better code (Borislav Petkov)
- Simplify 32-bit PAE page table handling (Dave Hansen)
- Always use dynamic memory layout (Kirill A. Shutemov)
- Make SPARSEMEM_VMEMMAP the only memory model (Kirill A. Shutemov)
- Make 5-level paging support unconditional (Kirill A. Shutemov)
- Stop prefetching current->mm->mmap_lock on page faults (Mateusz Guzik)
- Predict valid_user_address() returning true (Mateusz Guzik)
- Consolidate initmem_init() (Mike Rapoport)
FPU support and vector computing:
- Enable Intel APX support (Chang S. Bae)
- Reorganize and clean up the xstate code (Chang S. Bae)
- Make task_struct::thread constant size (Ingo Molnar)
- Restore fpu_thread_struct_whitelist() to fix CONFIG_HARDENED_USERCOPY=y
(Kees Cook)
- Simplify the switch_fpu_prepare() + switch_fpu_finish() logic (Oleg Nesterov)
- Always preserve non-user xfeatures/flags in __state_perm (Sean Christopherson)
Microcode loader changes:
- Help users notice when running old Intel microcode (Dave Hansen)
- AMD: Do not return error when microcode update is not necessary (Annie Li)
- AMD: Clean the cache if update did not load microcode (Boris Ostrovsky)
Code patching (alternatives) changes:
- Simplify, reorganize and clean up the x86 text-patching code (Ingo Molnar)
- Make smp_text_poke_batch_process() subsume smp_text_poke_batch_finish()
(Nikolay Borisov)
- Refactor the {,un}use_temporary_mm() code (Peter Zijlstra)
Debugging support:
- Add early IDT and GDT loading to debug relocate_kernel() bugs (David Woodhouse)
- Print the reason for the last reset on modern AMD CPUs (Yazen Ghannam)
- Add AMD Zen debugging document (Mario Limonciello)
- Fix opcode map (!REX2) superscript tags (Masami Hiramatsu)
- Stop decoding i64 instructions in x86-64 mode at opcode (Masami Hiramatsu)
CPU bugs and bug mitigations:
- Remove X86_BUG_MMIO_UNKNOWN (Borislav Petkov)
- Fix SRSO reporting on Zen1/2 with SMT disabled (Borislav Petkov)
- Restructure and harmonize the various CPU bug mitigation methods
(David Kaplan)
- Fix spectre_v2 mitigation default on Intel (Pawan Gupta)
MSR API:
- Large MSR code and API cleanup (Xin Li)
- In-kernel MSR API type cleanups and renames (Ingo Molnar)
PKEYS:
- Simplify PKRU update in signal frame (Chang S. Bae)
NMI handling code:
- Clean up, refactor and simplify the NMI handling code (Sohil Mehta)
- Improve NMI duration console printouts (Sohil Mehta)
Paravirt guests interface:
- Restrict PARAVIRT_XXL to 64-bit only (Kirill A. Shutemov)
SEV support:
- Share the sev_secrets_pa value again (Tom Lendacky)
x86 platform changes:
- Introduce the <asm/amd/> header namespace (Ingo Molnar)
- i2c: piix4, x86/platform: Move the SB800 PIIX4 FCH definitions to <asm/amd/fch.h>
(Mario Limonciello)
Fixes and cleanups:
- x86 assembly code cleanups and fixes (Uros Bizjak)
- Misc fixes and cleanups (Andi Kleen, Andy Lutomirski, Andy Shevchenko,
Ard Biesheuvel, Bagas Sanjaya, Baoquan He, Borislav Petkov, Chang S. Bae,
Chao Gao, Dan Williams, Dave Hansen, David Kaplan, David Woodhouse,
Eric Biggers, Ingo Molnar, Josh Poimboeuf, Juergen Gross, Malaya Kumar Rout,
Mario Limonciello, Nathan Chancellor, Oleg Nesterov, Pawan Gupta,
Peter Zijlstra, Shivank Garg, Sohil Mehta, Thomas Gleixner, Uros Bizjak,
Xin Li)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmgy9WARHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1jJSw/+OW2zvAx602doujBIE17vFLU7R10Xwj5H
lVgomkWCoTNscUZPhdT/iI+/kQF1fG8PtN9oZKUsTAUswldKJsqu7KevobviesiW
qI+FqH/fhHaIk7GVh9VP65Dgrdki8zsgd7BFxD8pLRBlbZTxTxXNNkuNJrs6LxJh
SxWp/FVtKo6Wd57qlUcsdo0tilAfcuhlEweFUarX55X2ouhdeHjcGNpxj9dHKOh8
M7R5yMYFrpfdpSms+WaCnKKahWHaIQtQTsPAyKwoVdtfl1kK+7NgaCF55Gbo3ogp
r59JwC/CGruDa5QnnDizCwFIwpZw9M52Q1NhP/eLEZbDGB4Yya3b5NW+Ya+6rPvO
ZZC3e1uUmlxW3lrYflUHurnwrVb2GjkQZOdf0gfnly/7LljIicIS2dk4qIQF9NBd
sQPpW5hjmIz9CsfeL8QaJW38pQyMsQWznFuz4YVuHcLHvleb3hR+n4fNfV5Lx9bw
oirVETSIT5hy/msAgShPqTqFUEiVCgp16ow20YstxxzFu/FQ+VG987tkeUyFkPMe
q1v5yF1hty+TkM4naKendIZ/MJnsrv0AxaegFz9YQrKGL1UPiOajQbSyKbzbto7+
ozmtN0W80E8n4oQq008j8htpgIhDV91UjF5m33qB82uSqKihHPPTsVcbeg5nZwh2
ti5g/a1jk94=
=JgQo
-----END PGP SIGNATURE-----
Merge tag 'x86-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core x86 updates from Ingo Molnar:
"Boot code changes:
- A large series of changes to reorganize the x86 boot code into a
better isolated and easier to maintain base of PIC early startup
code in arch/x86/boot/startup/, by Ard Biesheuvel.
Motivation & background:
| Since commit
|
| c88d71508e ("x86/boot/64: Rewrite startup_64() in C")
|
| dated Jun 6 2017, we have been using C code on the boot path in a way
| that is not supported by the toolchain, i.e., to execute non-PIC C
| code from a mapping of memory that is different from the one provided
| to the linker. It should have been obvious at the time that this was a
| bad idea, given the need to sprinkle fixup_pointer() calls left and
| right to manipulate global variables (including non-pointer variables)
| without crashing.
|
| This C startup code has been expanding, and in particular, the SEV-SNP
| startup code has been expanding over the past couple of years, and
| grown many of these warts, where the C code needs to use special
| annotations or helpers to access global objects.
This tree includes the first phase of this work-in-progress x86
boot code reorganization.
Scalability enhancements and micro-optimizations:
- Improve code-patching scalability (Eric Dumazet)
- Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR (Andrew Cooper)
CPU features enumeration updates:
- Thorough reorganization and cleanup of CPUID parsing APIs (Ahmed S.
Darwish)
- Fix, refactor and clean up the cacheinfo code (Ahmed S. Darwish,
Thomas Gleixner)
- Update CPUID bitfields to x86-cpuid-db v2.3 (Ahmed S. Darwish)
Memory management changes:
- Allow temporary MMs when IRQs are on (Andy Lutomirski)
- Opt-in to IRQs-off activate_mm() (Andy Lutomirski)
- Simplify choose_new_asid() and generate better code (Borislav
Petkov)
- Simplify 32-bit PAE page table handling (Dave Hansen)
- Always use dynamic memory layout (Kirill A. Shutemov)
- Make SPARSEMEM_VMEMMAP the only memory model (Kirill A. Shutemov)
- Make 5-level paging support unconditional (Kirill A. Shutemov)
- Stop prefetching current->mm->mmap_lock on page faults (Mateusz
Guzik)
- Predict valid_user_address() returning true (Mateusz Guzik)
- Consolidate initmem_init() (Mike Rapoport)
FPU support and vector computing:
- Enable Intel APX support (Chang S. Bae)
- Reorganize and clean up the xstate code (Chang S. Bae)
- Make task_struct::thread constant size (Ingo Molnar)
- Restore fpu_thread_struct_whitelist() to fix
CONFIG_HARDENED_USERCOPY=y (Kees Cook)
- Simplify the switch_fpu_prepare() + switch_fpu_finish() logic (Oleg
Nesterov)
- Always preserve non-user xfeatures/flags in __state_perm (Sean
Christopherson)
Microcode loader changes:
- Help users notice when running old Intel microcode (Dave Hansen)
- AMD: Do not return error when microcode update is not necessary
(Annie Li)
- AMD: Clean the cache if update did not load microcode (Boris
Ostrovsky)
Code patching (alternatives) changes:
- Simplify, reorganize and clean up the x86 text-patching code (Ingo
Molnar)
- Make smp_text_poke_batch_process() subsume
smp_text_poke_batch_finish() (Nikolay Borisov)
- Refactor the {,un}use_temporary_mm() code (Peter Zijlstra)
Debugging support:
- Add early IDT and GDT loading to debug relocate_kernel() bugs
(David Woodhouse)
- Print the reason for the last reset on modern AMD CPUs (Yazen
Ghannam)
- Add AMD Zen debugging document (Mario Limonciello)
- Fix opcode map (!REX2) superscript tags (Masami Hiramatsu)
- Stop decoding i64 instructions in x86-64 mode at opcode (Masami
Hiramatsu)
CPU bugs and bug mitigations:
- Remove X86_BUG_MMIO_UNKNOWN (Borislav Petkov)
- Fix SRSO reporting on Zen1/2 with SMT disabled (Borislav Petkov)
- Restructure and harmonize the various CPU bug mitigation methods
(David Kaplan)
- Fix spectre_v2 mitigation default on Intel (Pawan Gupta)
MSR API:
- Large MSR code and API cleanup (Xin Li)
- In-kernel MSR API type cleanups and renames (Ingo Molnar)
PKEYS:
- Simplify PKRU update in signal frame (Chang S. Bae)
NMI handling code:
- Clean up, refactor and simplify the NMI handling code (Sohil Mehta)
- Improve NMI duration console printouts (Sohil Mehta)
Paravirt guests interface:
- Restrict PARAVIRT_XXL to 64-bit only (Kirill A. Shutemov)
SEV support:
- Share the sev_secrets_pa value again (Tom Lendacky)
x86 platform changes:
- Introduce the <asm/amd/> header namespace (Ingo Molnar)
- i2c: piix4, x86/platform: Move the SB800 PIIX4 FCH definitions to
<asm/amd/fch.h> (Mario Limonciello)
Fixes and cleanups:
- x86 assembly code cleanups and fixes (Uros Bizjak)
- Misc fixes and cleanups (Andi Kleen, Andy Lutomirski, Andy
Shevchenko, Ard Biesheuvel, Bagas Sanjaya, Baoquan He, Borislav
Petkov, Chang S. Bae, Chao Gao, Dan Williams, Dave Hansen, David
Kaplan, David Woodhouse, Eric Biggers, Ingo Molnar, Josh Poimboeuf,
Juergen Gross, Malaya Kumar Rout, Mario Limonciello, Nathan
Chancellor, Oleg Nesterov, Pawan Gupta, Peter Zijlstra, Shivank
Garg, Sohil Mehta, Thomas Gleixner, Uros Bizjak, Xin Li)"
* tag 'x86-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (331 commits)
x86/bugs: Fix spectre_v2 mitigation default on Intel
x86/bugs: Restructure ITS mitigation
x86/xen/msr: Fix uninitialized variable 'err'
x86/msr: Remove a superfluous inclusion of <asm/asm.h>
x86/paravirt: Restrict PARAVIRT_XXL to 64-bit only
x86/mm/64: Make 5-level paging support unconditional
x86/mm/64: Make SPARSEMEM_VMEMMAP the only memory model
x86/mm/64: Always use dynamic memory layout
x86/bugs: Fix indentation due to ITS merge
x86/cpuid: Rename hypervisor_cpuid_base()/for_each_possible_hypervisor_cpuid_base() to cpuid_base_hypervisor()/for_each_possible_cpuid_base_hypervisor()
x86/cpu/intel: Rename CPUID(0x2) descriptors iterator parameter
x86/cacheinfo: Rename CPUID(0x2) descriptors iterator parameter
x86/cpuid: Rename cpuid_get_leaf_0x2_regs() to cpuid_leaf_0x2()
x86/cpuid: Rename have_cpuid_p() to cpuid_feature()
x86/cpuid: Set <asm/cpuid/api.h> as the main CPUID header
x86/cpuid: Move CPUID(0x2) APIs into <cpuid/api.h>
x86/msr: Add rdmsrl_on_cpu() compatibility wrapper
x86/mm: Fix kernel-doc descriptions of various pgtable methods
x86/asm-offsets: Export certain 'struct cpuinfo_x86' fields for 64-bit asm use too
x86/boot: Defer initialization of VM space related global variables
...
Core & generic-arch updates:
- Add support for dynamic constraints and propagate it to
the Intel driver (Kan Liang)
- Fix & enhance driver-specific throttling support (Kan Liang)
- Record sample last_period before updating on the
x86 and PowerPC platforms (Mark Barnett)
- Make perf_pmu_unregister() usable (Peter Zijlstra)
- Unify perf_event_free_task() / perf_event_exit_task_context()
(Peter Zijlstra)
- Simplify perf_event_release_kernel() and perf_event_free_task()
(Peter Zijlstra)
- Allocate non-contiguous AUX pages by default (Yabin Cui)
Uprobes updates:
- Add support to emulate NOP instructions (Jiri Olsa)
- selftests/bpf: Add 5-byte NOP uprobe trigger benchmark (Jiri Olsa)
x86 Intel PMU enhancements:
- Support Intel Auto Counter Reload [ACR] (Kan Liang)
- Add PMU support for Clearwater Forest (Dapeng Mi)
- Arch-PEBS preparatory changes: (Dapeng Mi)
- Parse CPUID archPerfmonExt leaves for non-hybrid CPUs
- Decouple BTS initialization from PEBS initialization
- Introduce pairs of PEBS static calls
x86 AMD PMU enhancements:
- Use hrtimer for handling overflows in the AMD uncore driver
(Sandipan Das)
- Prevent UMC counters from saturating (Sandipan Das)
Fixes and cleanups:
- Fix put_ctx() ordering (Frederic Weisbecker)
- Fix irq work dereferencing garbage (Frederic Weisbecker)
- Misc fixes and cleanups (Changbin Du, Frederic Weisbecker,
Ian Rogers, Ingo Molnar, Kan Liang, Peter Zijlstra, Qing Wang,
Sandipan Das, Thorsten Blum)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmgy4zoRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1j6QRAAvQ4GBPrdJLb8oXkLjCmWSp9PfM1h2IW0
reUrcV0BPRAwz4T60QEU2KyiEjvKxNghR6bNw4i3slAZ8EFwP9eWE/0ZYOo5+W/N
wv8vsopv/oZd2L2G5TgxDJf+tLPkqnTvp651LmGAbquPFONN1lsya9UHVPnt2qtv
fvFhjW6D828VoevRcUCsdoEUNlFDkUYQ2c3M1y5H2AI6ILDVxLsp5uYtuVUP+2lQ
7UI/elqRIIblTGT7G9LvTGiXZMm8T58fe1OOLekT6NdweJ3XEt1kMdFo/SCRYfzU
eDVVVLSextZfzBXNPtAEAlM3aSgd8+4m5sACiD1EeOUNjo5J9Sj1OOCa+bZGF/Rl
XNv5Kcp6Kh1T4N5lio8DE/NabmHDqDMbUGfud+VTS8uLLku4kuOWNMxJTD1nQ2Zz
BMfJhP89G9Vk07F9fOGuG1N6mKhIKNOgXh0S92tB7XDHcdJegueu2xh4ZszBL1QK
JVXa4DbnDj+y0LvnV+A5Z6VILr5RiCAipDb9ascByPja6BbN10Nf9Aj4nWwRTwbO
ut5OK/fDKmSjEHn1+a42d4iRxdIXIWhXCyxEhH+hJXEFx9htbQ3oAbXAEedeJTlT
g9QYGAjL96QEd0CqviorV8KyU59nVkEPoLVCumXBZ0WWhNwU6GdAmsW1hLfxQdLN
sp+XHhfxf8M=
=tPRs
-----END PGP SIGNATURE-----
Merge tag 'perf-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:
"Core & generic-arch updates:
- Add support for dynamic constraints and propagate it to the Intel
driver (Kan Liang)
- Fix & enhance driver-specific throttling support (Kan Liang)
- Record sample last_period before updating on the x86 and PowerPC
platforms (Mark Barnett)
- Make perf_pmu_unregister() usable (Peter Zijlstra)
- Unify perf_event_free_task() / perf_event_exit_task_context()
(Peter Zijlstra)
- Simplify perf_event_release_kernel() and perf_event_free_task()
(Peter Zijlstra)
- Allocate non-contiguous AUX pages by default (Yabin Cui)
Uprobes updates:
- Add support to emulate NOP instructions (Jiri Olsa)
- selftests/bpf: Add 5-byte NOP uprobe trigger benchmark (Jiri Olsa)
x86 Intel PMU enhancements:
- Support Intel Auto Counter Reload [ACR] (Kan Liang)
- Add PMU support for Clearwater Forest (Dapeng Mi)
- Arch-PEBS preparatory changes: (Dapeng Mi)
- Parse CPUID archPerfmonExt leaves for non-hybrid CPUs
- Decouple BTS initialization from PEBS initialization
- Introduce pairs of PEBS static calls
x86 AMD PMU enhancements:
- Use hrtimer for handling overflows in the AMD uncore driver
(Sandipan Das)
- Prevent UMC counters from saturating (Sandipan Das)
Fixes and cleanups:
- Fix put_ctx() ordering (Frederic Weisbecker)
- Fix irq work dereferencing garbage (Frederic Weisbecker)
- Misc fixes and cleanups (Changbin Du, Frederic Weisbecker, Ian
Rogers, Ingo Molnar, Kan Liang, Peter Zijlstra, Qing Wang, Sandipan
Das, Thorsten Blum)"
* tag 'perf-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
perf/headers: Clean up <linux/perf_event.h> a bit
perf/uapi: Clean up <uapi/linux/perf_event.h> a bit
perf/uapi: Fix PERF_RECORD_SAMPLE comments in <uapi/linux/perf_event.h>
mips/perf: Remove driver-specific throttle support
xtensa/perf: Remove driver-specific throttle support
sparc/perf: Remove driver-specific throttle support
loongarch/perf: Remove driver-specific throttle support
csky/perf: Remove driver-specific throttle support
arc/perf: Remove driver-specific throttle support
alpha/perf: Remove driver-specific throttle support
perf/apple_m1: Remove driver-specific throttle support
perf/arm: Remove driver-specific throttle support
s390/perf: Remove driver-specific throttle support
powerpc/perf: Remove driver-specific throttle support
perf/x86/zhaoxin: Remove driver-specific throttle support
perf/x86/amd: Remove driver-specific throttle support
perf/x86/intel: Remove driver-specific throttle support
perf: Only dump the throttle log for the leader
perf: Fix the throttle logic for a group
perf/core: Add the is_event_in_freq_mode() helper to simplify the code
...
API:
- Fix memcpy_sglist to handle partially overlapping SG lists.
- Use memcpy_sglist to replace null skcipher.
- Rename CRYPTO_TESTS to CRYPTO_BENCHMARK.
- Flip CRYPTO_MANAGER_DISABLE_TEST into CRYPTO_SELFTESTS.
- Hide CRYPTO_MANAGER.
- Add delayed freeing of driver crypto_alg structures.
Compression:
- Allocate large buffers on first use instead of initialisation in scomp.
- Drop destination linearisation buffer in scomp.
- Move scomp stream allocation into acomp.
- Add acomp scatter-gather walker.
- Remove request chaining.
- Add optional async request allocation.
Hashing:
- Remove request chaining.
- Add optional async request allocation.
- Move partial block handling into API.
- Add ahash support to hmac.
- Fix shash documentation to disallow usage in hard IRQs.
Algorithms:
- Remove unnecessary SIMD fallback code on x86 and arm/arm64.
- Drop avx10_256 xts(aes)/ctr(aes) on x86.
- Improve avx-512 optimisations for xts(aes).
- Move chacha arch implementations into lib/crypto.
- Move poly1305 into lib/crypto and drop unused Crypto API algorithm.
- Disable powerpc/poly1305 as it has no SIMD fallback.
- Move sha256 arch implementations into lib/crypto.
- Convert deflate to acomp.
- Set block size correctly in cbcmac.
Drivers:
- Do not use sg_dma_len before mapping in sun8i-ss.
- Fix warm-reboot failure by making shutdown do more work in qat.
- Add locking in zynqmp-sha.
- Remove cavium/zip.
- Add support for PCI device 0x17D8 to ccp.
- Add qat_6xxx support in qat.
- Add support for RK3576 in rockchip-rng.
- Add support for i.MX8QM in caam.
Others:
- Fix irq_fpu_usable/kernel_fpu_begin inconsistency during CPU bring-up.
- Add new SEV/SNP platform shutdown API in ccp.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmgz47AACgkQxycdCkmx
i6fvKRAAr4Xa903L0r1Q1P1alQqoFFCqimUWeH72m68LiWynHWi0lUo0z/+tKweg
mnPStz7/Ha9HRHJjdNCMPnlJqXQDkuH3bIOuBJCwduDuhHo9VGOd46XGzmGMv3gb
HKuZhI0lk7pznK3CSyD/2nHmbDCHD+7feTZSBMoN9mm875+aSoM6fdxgak8uPFcq
KbB1L+hObTn2kAPSqRrNOR8/xG2N7hdH8eax7Li+LAtqYNVT5HvWVECsB/CKRPfB
sgAv3UTzcIFapSSHUHaONppSeoqPAIAeV7SdQhJvlT+EUUR/h/B6+D9OUQQqbphQ
LBalgTnqMKl0ymDEQFQ6QyYCat9ZfNmDft2WcXEsxc8PxImkgJI1W3B8O51sOjbG
78D8JqVQ96dleo4FsBhM2wfG0b41JM6zU4raC4vS7a3qsUS+Q1MpehvcS1iORicy
SpGdE8e7DLlxKhzWyW1xJnbrtMZDC7Sa2hUnxrvP0/xOvRhChKscRVtWcf0a5q7X
8JmuvwVSOJuSbQ3MeFbQvpo5lR9+0WsNjM6e9miiH6Y7vZUKmWcq2yDp377qVzeh
7NK6+OwGIQZZExrmtPw2BXwssT9Eg+ks6Y7g2Ne7yzvrjVNfEPY7Cws/5w7p8mRS
qhrcpbJNFlWgD7YYkmGZFTQ8DCN25ipP8lklO/hbcfchqLE/o1o=
=O8L5
-----END PGP SIGNATURE-----
Merge tag 'v6.16-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Fix memcpy_sglist to handle partially overlapping SG lists
- Use memcpy_sglist to replace null skcipher
- Rename CRYPTO_TESTS to CRYPTO_BENCHMARK
- Flip CRYPTO_MANAGER_DISABLE_TEST into CRYPTO_SELFTESTS
- Hide CRYPTO_MANAGER
- Add delayed freeing of driver crypto_alg structures
Compression:
- Allocate large buffers on first use instead of initialisation in scomp
- Drop destination linearisation buffer in scomp
- Move scomp stream allocation into acomp
- Add acomp scatter-gather walker
- Remove request chaining
- Add optional async request allocation
Hashing:
- Remove request chaining
- Add optional async request allocation
- Move partial block handling into API
- Add ahash support to hmac
- Fix shash documentation to disallow usage in hard IRQs
Algorithms:
- Remove unnecessary SIMD fallback code on x86 and arm/arm64
- Drop avx10_256 xts(aes)/ctr(aes) on x86
- Improve avx-512 optimisations for xts(aes)
- Move chacha arch implementations into lib/crypto
- Move poly1305 into lib/crypto and drop unused Crypto API algorithm
- Disable powerpc/poly1305 as it has no SIMD fallback
- Move sha256 arch implementations into lib/crypto
- Convert deflate to acomp
- Set block size correctly in cbcmac
Drivers:
- Do not use sg_dma_len before mapping in sun8i-ss
- Fix warm-reboot failure by making shutdown do more work in qat
- Add locking in zynqmp-sha
- Remove cavium/zip
- Add support for PCI device 0x17D8 to ccp
- Add qat_6xxx support in qat
- Add support for RK3576 in rockchip-rng
- Add support for i.MX8QM in caam
Others:
- Fix irq_fpu_usable/kernel_fpu_begin inconsistency during CPU bring-up
- Add new SEV/SNP platform shutdown API in ccp"
* tag 'v6.16-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (382 commits)
x86/fpu: Fix irq_fpu_usable() to return false during CPU onlining
crypto: qat - add missing header inclusion
crypto: api - Redo lookup on EEXIST
Revert "crypto: testmgr - Add hash export format testing"
crypto: marvell/cesa - Do not chain submitted requests
crypto: powerpc/poly1305 - add depends on BROKEN for now
Revert "crypto: powerpc/poly1305 - Add SIMD fallback"
crypto: ccp - Add missing tee info reg for teev2
crypto: ccp - Add missing bootloader info reg for pspv5
crypto: sun8i-ce - move fallback ahash_request to the end of the struct
crypto: octeontx2 - Use dynamic allocated memory region for lmtst
crypto: octeontx2 - Initialize cptlfs device info once
crypto: xts - Only add ecb if it is not already there
crypto: lrw - Only add ecb if it is not already there
crypto: testmgr - Add hash export format testing
crypto: testmgr - Use ahash for generic tfm
crypto: hmac - Add ahash support
crypto: testmgr - Ignore EEXIST on shash allocation
crypto: algapi - Add driver template support to crypto_inst_setname
crypto: shash - Set reqsize in shash_alg
...
* New features:
- Add large stage-2 mapping support for non-protected pKVM guests,
clawing back some performance.
- Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and
protected modes.
- Enable nested virtualisation support on systems that support it
(yes, it has been a long time coming), though it is disabled by
default.
* Improvements, fixes and cleanups:
- Large rework of the way KVM tracks architecture features and links
them with the effects of control bits. This ensures correctness of
emulation (the data is automatically extracted from the published
JSON files), and helps dealing with the evolution of the
architecture.
- Significant changes to the way pKVM tracks ownership of pages,
avoiding page table walks by storing the state in the hypervisor's
vmemmap. This in turn enables the THP support described above.
- New selftest checking the pKVM ownership transition rules
- Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
even if the host didn't have it.
- Fixes for the address translation emulation, which happened to be
rather buggy in some specific contexts.
- Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
from the number of counters exposed to a guest and addressing a
number of issues in the process.
- Add a new selftest for the SVE host state being corrupted by a
guest.
- Keep HCR_EL2.xMO set at all times for systems running with the
kernel at EL2, ensuring that the window for interrupts is slightly
bigger, and avoiding a pretty bad erratum on the AmpereOne HW.
- Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
from a pretty bad case of TLB corruption unless accesses to HCR_EL2
are heavily synchronised.
- Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
tables in a human-friendly fashion.
- and the usual random cleanups.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmgwU7UACgkQI9DQutE9
ekN93g//fNnejxf01dBFIbuylzYEyHZSEH0iTGLeM+ES9zvntCzciTYVzb27oqNG
RDLShlQYp3w4rAe6ORzyePyHptOmKXCxfj/VXUFp3A7H9QYOxt1nacD3WxI9fCOo
LzaSLquvgwFBaeTdDE0KdeTUKQHluId+w1Azh0lnHGeUP+lOHNZ8FqoP1/la0q04
GvVL+l3wz/IhPP8r1YA0Q1bzJ5SLfSpjIw/0F5H/xgI4lyYdHzgFL8sKuSyFeCyM
2STQi+ZnTCsAs4bkXkw2Pp9CFYrfQgZi+sf7Om+noAKhbJo3vb7/RHpgjv+QCjJy
Kx4g9CbxHfaM03cH6uSLBoFzsACR1iAuUz8BCSRvvVNH4RVT6H+34nzjLZXLncrP
gm1uYs9aMTLr91caeAx0aYIMWGYa1uqV0rum3WxyIHezN9Q/NuQoZyfprUufr8oX
wCYE+ot4VT3DwG0UFZKKwj0BiCbYcbph9nBLVyZJsg8OKxpvspkCtPriFp1kb6BP
dTTGSXd9JJqwSgP9qJLxijcv6Nfgp2gT42TWwh/dJRZXhnTCvr9IyclFIhoIIq3G
Q2BkFCXOoEoNQhBA1tiWzJ9nDHf52P72Z2K1gPyyMZwF49HGa2BZBCJGkqX06wSs
Riolf1/cjFhDno1ThiHKsHT0sG1D4oc9k/1NLq5dyNAEGcgATIA=
=Jju3
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.16
* New features:
- Add large stage-2 mapping support for non-protected pKVM guests,
clawing back some performance.
- Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and
protected modes.
- Enable nested virtualisation support on systems that support it
(yes, it has been a long time coming), though it is disabled by
default.
* Improvements, fixes and cleanups:
- Large rework of the way KVM tracks architecture features and links
them with the effects of control bits. This ensures correctness of
emulation (the data is automatically extracted from the published
JSON files), and helps dealing with the evolution of the
architecture.
- Significant changes to the way pKVM tracks ownership of pages,
avoiding page table walks by storing the state in the hypervisor's
vmemmap. This in turn enables the THP support described above.
- New selftest checking the pKVM ownership transition rules
- Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
even if the host didn't have it.
- Fixes for the address translation emulation, which happened to be
rather buggy in some specific contexts.
- Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
from the number of counters exposed to a guest and addressing a
number of issues in the process.
- Add a new selftest for the SVE host state being corrupted by a
guest.
- Keep HCR_EL2.xMO set at all times for systems running with the
kernel at EL2, ensuring that the window for interrupts is slightly
bigger, and avoiding a pretty bad erratum on the AmpereOne HW.
- Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
from a pretty bad case of TLB corruption unless accesses to HCR_EL2
are heavily synchronised.
- Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
tables in a human-friendly fashion.
- and the usual random cleanups.
irq_fpu_usable() incorrectly returned true before the FPU is
initialized. The x86 CPU onlining code can call sha256() to checksum
AMD microcode images, before the FPU is initialized. Since sha256()
recently gained a kernel-mode FPU optimized code path, a crash occurred
in kernel_fpu_begin_mask() during hotplug CPU onlining.
(The crash did not occur during boot-time CPU onlining, since the
optimized sha256() code is not enabled until subsys_initcalls run.)
Fix this by making irq_fpu_usable() return false before fpu__init_cpu()
has run. To do this without adding any additional overhead to
irq_fpu_usable(), replace the existing per-CPU bool in_kernel_fpu with
kernel_fpu_allowed which tracks both initialization and usage rather
than just usage. The initial state is false; FPU initialization sets it
to true; kernel-mode FPU sections toggle it to false and then back to
true; and CPU offlining restores it to the initial state of false.
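In outline, the state machine described above looks like this (a sketch
following the commit text, not the exact patch):
    static DEFINE_PER_CPU(bool, kernel_fpu_allowed);   /* initially false */
    void fpu__init_cpu(void)
    {
            /* ... hardware init ... */
            this_cpu_write(kernel_fpu_allowed, true);
    }
    void kernel_fpu_begin_mask(unsigned int kfpu_mask)
    {
            this_cpu_write(kernel_fpu_allowed, false);
            /* ... save user FPU state as needed ... */
    }
    void kernel_fpu_end(void)
    {
            this_cpu_write(kernel_fpu_allowed, true);
    }
    bool irq_fpu_usable(void)
    {
            /* false before fpu__init_cpu() and inside kernel-FPU sections */
            return this_cpu_read(kernel_fpu_allowed);
    }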
Fixes: 11d7956d52 ("crypto: x86/sha256 - implement library instead of shash")
Reported-by: Ayush Jain <Ayush.Jain3@amd.com>
Closes: https://lore.kernel.org/r/20250516112217.GBaCcf6Yoc6LkIIryP@fat_crate.local
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Ayush Jain <Ayush.Jain3@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When starting APs, confidential guests and paravisor guests
need to know the CPU number, and the pattern of using a linear
search has emerged in several places. With N processors, that leads
to O(N^2) time complexity.
Provide the CPU number in the AP wake up callback so that one can
get the CPU number in constant time.
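Sketched, the shape of the change (the callback signature here is
illustrative, following the description above):
    /* Before: every AP bring-up searched apic_id -> cpu linearly,
     * O(N) per AP and O(N^2) across N processors. */
    for_each_present_cpu(cpu)
            if (cpu_physical_id(cpu) == apicid)
                    break;
    /* After: the wake-up callback is handed the CPU number directly,
     * making the lookup O(1). */
    int (*wakeup_secondary_cpu_64)(u32 apicid, unsigned long start_eip,
                                   unsigned int cpu);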
Suggested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20250507182227.7421-3-romank@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20250507182227.7421-3-romank@linux.microsoft.com>
* kvm-arm64/ubsan-el2:
: .
: Add UBSAN support to the EL2 portion of KVM, reusing most of the
: existing logic provided by CONFIG_UBSAN_TRAP.
:
: Patches courtesy of Mostafa Saleh.
: .
KVM: arm64: Handle UBSAN faults
KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2
ubsan: Remove regs from report_ubsan_failure()
arm64: Introduce esr_is_ubsan_brk()
Signed-off-by: Marc Zyngier <maz@kernel.org>
This adds an additional layer of protection for the saved copy of the dm
crypt key. Trying to access the saved copy will cause a page fault.
Link: https://lkml.kernel.org/r/20250502011246.99238-9-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Suggested-by: Pingfan Liu <kernelfans@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The first kernel will build up the kernel command line parameter dmcryptkeys,
similar to elfcorehdr, to pass the memory address of the stored dm crypt
key info to the kdump kernel.
Link: https://lkml.kernel.org/r/20250502011246.99238-8-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Liu Pingfan <kernelfans@gmail.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit
480e803dac ("x86/bugs: Restructure spectre_v2 mitigation")
inadvertently changed the spectre-v2 mitigation default from eIBRS to IBRS on
Intel. While splitting the spectre_v2 mitigation in select/update/apply
functions, eIBRS and IBRS selection logic was separated in select and update.
This caused IBRS selection to not consider that eIBRS mitigation is already
selected, fix it.
Fixes: 480e803dac ("x86/bugs: Restructure spectre_v2 mitigation")
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250520-eibrs-fix-v1-1-91bacd35ed09@linux.intel.com
Restructure the ITS mitigation to use select/update/apply functions like
the other mitigations.
There is a particularly complex interaction between ITS and Retbleed as CDT
(Call Depth Tracking) is a mitigation for both, and either its=stuff or
retbleed=stuff will attempt to enable CDT.
retbleed_update_mitigation() runs first and will check the necessary
pre-conditions for CDT if either ITS or Retbleed stuffing is selected. If
checks pass and ITS stuffing is selected, it will select stuffing for
Retbleed as well.
its_update_mitigation() runs after and will either select stuffing if
retbleed stuffing was enabled, or fall back to the default (aligned thunks)
if stuffing could not be enabled.
Enablement of CDT is done exclusively in retbleed_apply_mitigation().
its_apply_mitigation() is only used to enable aligned thunks.
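The ordering constraint is easier to see as pseudocode (function names from
the text above; the call site is illustrative):
    retbleed_update_mitigation();   /* first: checks CDT pre-conditions for
                                       its=stuff / retbleed=stuff, and adopts
                                       stuffing when ITS selected it */
    its_update_mitigation();        /* after: keeps stuffing if retbleed
                                       enabled it, else falls back to the
                                       default aligned thunks */
    retbleed_apply_mitigation();    /* the only place CDT is enabled */
    its_apply_mitigation();         /* only enables aligned thunks */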
Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/20250516193212.128782-1-david.kaplan@amd.com
-----BEGIN PGP SIGNATURE-----
iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmgqSbkeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGr6sH/1ICAvlin1GuxffE
ISVNz3xhXQpXG2k8yl9r0umpdCfPQbGrxm30vZyuIDNutY/FuMvkIqfu+Z1NnLg0
GidZW015LtXrp7/puKtTnUD5CPSjdETMXig+Q7c1PrxkkmHwz8sBbbm173AIDbDB
t7wwqSEUQh2AIDouGwN+DXB+6bR2FoOXb/k/njmtappIwR3rBc2f1HQJnP095rKO
5AKw1c9DMv5Wq2cEdBOCP48e4CFZEIN1ycW0nvtjpnOmcPOJjLoEothRbntQolqF
udtj5UeTGdAJqmjigv7KHmlrmFNe+GqBq4+beHl5MRxhBaT2uGGaM9jCJiSxT3Jx
sHyYYr8=
=Ddma
-----END PGP SIGNATURE-----
Merge tag 'v6.15-rc7' into x86/core, to pick up fixes
Pick up build fixes from upstream to make this tree more testable.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- Fix SEV-SNP kdump bugs
- Update the email address of Alexey Makhalov in MAINTAINERS
- Add the CPU feature flag for the Zen6 microarchitecture
- Fix typo in system message
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmgoj3MRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hppg//S/eodSXrgxzTOvZLu0gFeYN4xyxUnfWl
0Dvc+FRasGCpBpQcD9sl3w9xKnTkaGY250NPP4/OKW2JgiizP6E3UcFYvaDnZ96I
TU3/y3acAI5zpvASOuOuDlwDt0w9xIk5L/K0gcVec9dYnGdAOmTE4jjZV6wDm0Q4
rto8k5E0RmSs5HQ4GcpU2sgzJSlaQlkkxZMo6HaUE6oJUiuodmPnxHkjoLgAQiU9
I0ALcrPVtyI1jap52DVxAIDcMsrOddazYley4IyDRqWezwrtrxkNaEzvNkMWO4ZV
iAnTYe/21HrppsQ40KuYa5VY5k0Dkv+QVzb23rGT2sZlPaXAiPIVUtt25z4VGtve
1z/kn1TszfcqC9sPodVcHQkzNrTktlaEKXd3u9GuFlfMkuj7iSnmYnGoPMo6x7T9
vcbBF6PUQ+uNi7QZXDvww8S0OMBVVlMDOjhuGjFBFzkmfVzkFtdyC1oGXppiXNzg
KG0LjiTDlOeI4B8bxG1Wwldwl7vLfwHJag2xWaw0uQR8mjstkCTLXibjdvz3QNwi
bM14hlG3TxmxJSsYl8QNFnF45DwzApWGKz9K81OPz/yZ2Z6KB1uQqrN2l8+blFt9
OUMEukY9sAcmUR1hkt3Rdynb1ri+jGMcJUGOn48w2ne+qiLoVicp8LEgWO6KoI3Z
vgLnVmqIa9o=
=cD7r
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2025-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
- Fix SEV-SNP kdump bugs
- Update the email address of Alexey Makhalov in MAINTAINERS
- Add the CPU feature flag for the Zen6 microarchitecture
- Fix typo in system message
* tag 'x86-urgent-2025-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Remove duplicated word in warning message
x86/CPU/AMD: Add X86_FEATURE_ZEN6
x86/sev: Make sure pages are not skipped during kdump
x86/sev: Do not touch VMSA pages during SNP guest memory kdump
MAINTAINERS: Update Alexey Makhalov's email address
x86/sev: Fix operator precedence in GHCB_MSR_VMPL_REQ_LEVEL macro
PARAVIRT_XXL is exclusively utilized by XEN_PV, which is only compatible
with 64-bit machines.
Clearly designate PARAVIRT_XXL as 64-bit only and remove ifdefs to
support CONFIG_PGTABLE_LEVELS < 5.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250516123306.3812286-5-kirill.shutemov@linux.intel.com
Both Intel and AMD CPUs support 5-level paging, which is expected to
become more widely adopted in the future. All major x86 Linux
distributions have the feature enabled.
Remove CONFIG_X86_5LEVEL and the related #ifdeffery to make the code more readable.
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250516123306.3812286-4-kirill.shutemov@linux.intel.com
Dynamic memory layout is used by KASLR and 5-level paging.
CONFIG_X86_5LEVEL is going to be removed, making 5-level paging support
unconditional which requires unconditional support of dynamic memory
layout.
Remove CONFIG_DYNAMIC_MEMORY_LAYOUT.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250516123306.3812286-2-kirill.shutemov@linux.intel.com
of_node_to_fwnode() is irqdomain's reimplementation of the "officially"
defined of_fwnode_handle(). The former is in the process of being
removed, so use the latter instead.
[ tglx: Fix up subject prefix ]
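At a typical call-site the change is a one-liner (a sketch with hypothetical
variables; both helpers take a struct device_node * and yield a
struct fwnode_handle *):

    struct fwnode_handle *fwnode;

    /* Before: irqdomain's private reimplementation */
    fwnode = of_node_to_fwnode(np);

    /* After: the official OF helper */
    fwnode = of_fwnode_handle(np);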
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-11-jirislaby@kernel.org
Resctrl is a filesystem interface to hardware that provides cache
allocation policy and bandwidth control for groups of tasks or CPUs.
To support more than one architecture, resctrl needs to live in fs/.
Move the code that is concerned with the filesystem interface to
fs/resctrl.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-25-james.morse@arm.com
x86 has an array, rdt_resources_all[], of all possible resources.
The for-each-resource walkers depend on the rid field of all
resources being initialised.
If the array ever grows due to another architecture adding a resource
type that is not defined on x86, the for-each-resource walkers will
loop forever.
Initialise all the rid values in resctrl_arch_late_init() before
any for-each-resource walker can be called.
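A minimal sketch of the idea (the array and field names follow current x86
resctrl code; the exact hunk may differ):

    /* Give every entry a valid rid before any walker can run */
    for (int i = 0; i < RDT_NUM_RESOURCES; i++)
        rdt_resources_all[i].r_resctrl.rid = i;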
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-24-james.morse@arm.com
checkpatch.pl identifies some direct #includes of asm headers that
can be satisfied by including the corresponding <linux/...> header
instead.
Fix them.
No intentional functional change.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-23-james.morse@arm.com
rdt_init_fs_context() sizes a typed allocation using an explicit
sizeof(type) expression, which checkpatch.pl complains about.
Since this code is about to be factored out and made generic, this
is a good opportunity to fix the code to size the allocation based
on the target pointer instead, to reduce the chance of future
mis-maintenance.
Fix it.
No functional change.
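The pattern being changed, roughly (struct rdt_fs_context is the real type;
the hunk shown is illustrative):

    struct rdt_fs_context *rdt_fs_ctx;

    /* Before: sized by naming the type */
    rdt_fs_ctx = kzalloc(sizeof(struct rdt_fs_context), GFP_KERNEL);

    /* After: sized from the target pointer, immune to type renames */
    rdt_fs_ctx = kzalloc(sizeof(*rdt_fs_ctx), GFP_KERNEL);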
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-22-james.morse@arm.com
checkpatch.pl complains about some whitespace anomalies in the
resctrl core code.
This doesn't matter, but since this code is about to be factored
out and made generic, this is a good opportunity to fix these
issues and so reduce future checkpatch fuzz.
Fix them.
No functional change.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-21-james.morse@arm.com
Once the filesystem parts of resctrl move to fs/resctrl, they cannot rely
on definitions in x86's internal.h.
Move definitions in internal.h that need to be shared between the
filesystem and architecture code to header files that fs/resctrl can
include.
Doing this separately means the filesystem code only moves between files
of the same name, instead of having these changes mixed in too.
Co-developed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-17-james.morse@arm.com
Add Makefile and Kconfig for fs/resctrl. Add ARCH_HAS_CPU_RESCTRL
for the common parts of the resctrl interface and make X86_CPU_RESCTRL
select this.
Adding an include of asm/resctrl.h to linux/resctrl.h allows the
fs/resctrl files to switch over to using this header instead.
Co-developed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-16-james.morse@arm.com
In order to let all the APIs under <cpuid/api.h> have a shared "cpuid_"
namespace, rename hypervisor_cpuid_base() to cpuid_base_hypervisor().
To align with the new style, also rename:
for_each_possible_hypervisor_cpuid_base(function)
to:
for_each_possible_cpuid_base_hypervisor(function)
Adjust call-sites accordingly.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: x86-cpuid@lists.linux.dev
Link: https://lore.kernel.org/r/aCZOi0Oohc7DpgTo@lx-t490
The CPUID(0x2) descriptors iterator has been renamed from:
for_each_leaf_0x2_entry()
to:
for_each_cpuid_0x2_desc()
since it iterates over CPUID(0x2) cache and TLB "descriptors", not
"entries".
In the macro's x86/cpu call-site, rename the parameter denoting the
parsed descriptor at each iteration from 'entry' to 'desc'.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: x86-cpuid@lists.linux.dev
Link: https://lore.kernel.org/r/20250508150240.172915-8-darwi@linutronix.de
The CPUID(0x2) descriptors iterator has been renamed from:
for_each_leaf_0x2_entry()
to:
for_each_cpuid_0x2_desc()
since it iterates over CPUID(0x2) cache and TLB "descriptors", not
"entries".
In the macro's x86/cacheinfo call-site, rename the parameter denoting the
parsed descriptor at each iteration from 'entry' to 'desc'.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: x86-cpuid@lists.linux.dev
Link: https://lore.kernel.org/r/20250508150240.172915-7-darwi@linutronix.de
Rename the CPUID(0x2) register accessor function:
cpuid_get_leaf_0x2_regs(regs)
to:
cpuid_leaf_0x2(regs)
for consistency with other <cpuid/api.h> accessors that return full CPUID
registers outputs like:
cpuid_leaf(regs)
cpuid_subleaf(regs)
In the same vein, rename the CPUID(0x2) iteration macro:
for_each_leaf_0x2_entry()
to:
for_each_cpuid_0x2_desc()
to include "cpuid" in the macro name, and since what is iterated upon is
CPUID(0x2) cache and TLB "descriptos", not "entries". Prefix an
underscore to that iterator macro parameters, so that the newly renamed
'desc' parameter do not get mixed with "union leaf_0x2_regs :: desc[]" in
the macro's implementation.
Adjust all the affected call-sites accordingly.
While at it, use "CPUID(0x2)" instead of "CPUID leaf 0x2" as this is the
recommended style.
No change in functionality intended.
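After the rename, a call-site looks roughly like this (a sketch; the
struct leaf_0x2_table type and the iteration variables are assumptions
based on the description above):

    const struct leaf_0x2_table *desc;
    union leaf_0x2_regs regs;
    u8 *ptr;

    cpuid_leaf_0x2(&regs);
    for_each_cpuid_0x2_desc(regs, ptr, desc) {
        /* handle one cache/TLB descriptor */
    }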
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: x86-cpuid@lists.linux.dev
Link: https://lore.kernel.org/r/20250508150240.172915-6-darwi@linutronix.de
trace.h contains all the tracepoints. After the move to fs/resctrl, some
of these will be left behind. All the pseudo_lock tracepoints remain part
of the architecture. The lone tracepoint in monitor.c moves to fs/resctrl.
Split trace.h so that each C file includes a different trace header file.
This means the trace header files are not modified when they are moved.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-14-james.morse@arm.com
MPAM platforms retrieve the cache-id property from the ACPI PPTT table.
The cache-id field is 32 bits wide. Under resctrl, the cache-id becomes
the domain-id, and is packed into the mon_data_bits union bitfield.
The width of cache-id in this field is 14 bits.
Expanding the union would break 32-bit x86 platforms, as this union is
stored as the kernfs kn->priv pointer. This avoided allocating memory
for the priv data storage.
The firmware on MPAM platforms has used the PPTT cache-id field to
expose the interconnect's id for the cache, which is sparse and uses
more than 14 bits. This id is used to enable PCIe direct cache
injection hints. Using this feature with VFIO means the value provided
by the ACPI table should be exposed to user-space.
To support cache-id values greater than 14 bits, convert the
mon_data_bits union to a structure. These are shared between control
and monitor groups, and are allocated on first use. The list of
allocated struct mon_data is free'd when the filesystem is umount()ed.
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-13-james.morse@arm.com
The resctrl_event_id enum gives names to the counter event numbers on x86.
These are used directly by resctrl.
To allow the MPAM driver to keep an array of these, the size of the enum
needs to be known.
Add a 'num_events' enum entry which can be used to size an array. This is
added to the enum to reduce conflicts with another series, which in turn
requires get_arch_mbm_state() to have a default case.
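A sketch of the result (the existing event IDs are the real x86 values;
'QOS_NUM_EVENTS' is one plausible spelling of the new entry):

    enum resctrl_event_id {
        QOS_L3_OCCUP_EVENT_ID     = 0x01,
        QOS_L3_MBM_TOTAL_EVENT_ID = 0x02,
        QOS_L3_MBM_LOCAL_EVENT_ID = 0x03,

        /* Must be the last entry */
        QOS_NUM_EVENTS,
    };

    /* e.g. the MPAM driver can now size an array by event id: */
    static u32 mbm_event_state[QOS_NUM_EVENTS];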
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-12-james.morse@arm.com
trace_pagefault_key is used to optimize the pagefault tracepoints away when
tracing is disabled. However, tracepoints already have a built-in static_key
for this exact purpose.
Remove this redundant key.
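The change at a call-site, roughly (a sketch; the real code wraps this in a
small helper):

    /* Before: hand-rolled static key guarding the tracepoint */
    if (static_branch_unlikely(&trace_pagefault_key))
        trace_page_fault_user(address, regs, error_code);

    /* After: the tracepoint's own built-in static key suffices */
    trace_page_fault_user(address, regs, error_code);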
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-trace-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/827c7666d2989f08742a4fb869b1ed5bfaaf1dbf.1747046848.git.namcao@linutronix.de
is_mba_sc() is defined in core.c, but has no callers there. It does not access
any architecture-private structures.
Move this to rdtgroup.c where the majority of callers are. This makes the move
of the filesystem code to fs/ cleaner.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-11-james.morse@arm.com
Because ARM's MPAM controls are probed using MMIO, resctrl can't be
initialised until enough CPUs are online to have determined the system-wide
supported num_closid. Arm64 also supports 'late onlined secondaries', where
only a subset of CPUs are online during boot.
These two combine to mean the MPAM driver may not be able to initialise
resctrl until user-space has brought 'enough' CPUs online.
To allow MPAM to initialise resctrl after __init text has been free'd, remove
all the __init markings from resctrl.
The existing __exit markings cause these functions to be removed by the linker
as it has never been possible to build resctrl as a module. MPAM has an error
interrupt which causes the driver to reset and disable itself. Remove the
__exit markings to allow the MPAM driver to tear down resctrl when an error
occurs.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-10-james.morse@arm.com
resctrl_exit() was intended for use when the 'resctrl' module was unloaded.
resctrl can't be built as a module, and the kernfs helpers are not exported so
this is unlikely to change. MPAM has an error interrupt which indicates the
MPAM driver has gone haywire. Should this occur, tasks could run with the wrong
control values, leading to bad performance for important tasks.
scenario the MPAM driver will reset the hardware, but it needs a way to tell
resctrl that no further configuration should be attempted.
In particular, moving tasks between control or monitor groups does not
interact with the architecture code, so there is no opportunity for the arch
code to indicate that the hardware is no longer functioning.
Using resctrl_exit() for this leaves the system in a funny state as resctrl is
still mounted, but cannot be unmounted because the sysfs directory that is
typically used has been removed. Dave Martin suggests this may cause systemd
trouble in the future as not all filesystems can be unmounted.
Add calls to remove all the files and directories in resctrl, and remove the
sysfs_remove_mount_point() call that leaves the system in a funny state. When
triggered, this causes all the resctrl files to disappear. resctrl can be
unmounted, but not mounted again.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-9-james.morse@arm.com
resctrl_exit() removes things like the resctrl mount point directory
and unregisters the filesystem prior to freeing data structures that
were allocated during resctrl_init().
This assumes that there are no online domains when resctrl_exit() is
called. If any domain were online, the limbo or overflow handler could
be scheduled to run.
Add a check for any online control or monitor domains, and document that
the architecture code is required to offline all monitor and control
domains before calling resctrl_exit().
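A minimal sketch of such a check (one resource shown for brevity; the
ctrl_domains/mon_domains list names are assumptions based on current resctrl
structures):

    void resctrl_exit(void)
    {
        struct rdt_resource *l3 = resctrl_arch_get_resource(RDT_RESOURCE_L3);

        if (WARN_ON_ONCE(!list_empty(&l3->ctrl_domains) ||
                         !list_empty(&l3->mon_domains)))
            return;

        /* ... proceed with teardown ... */
    }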
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-8-james.morse@arm.com
resctrl_sched_in() loads the architecture specific CPU MSRs with the
CLOSID and RMID values. This function was named before resctrl was
split into architecture-specific code and generic filesystem code.
This function is obviously architecture-specific, but does not begin
with 'resctrl_arch_', making it the odd one out among the functions an
architecture needs to support to enable resctrl.
Rename it for consistency. This is purely cosmetic.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-7-james.morse@arm.com
Resctrl allocates and finds free CLOSID values using the bits of a u32.
This restricts the number of control groups that can be created by
user-space.
MPAM has an architectural limit of 2^16 CLOSID values, and Intel x86 could
be extended beyond 32 values. There is at least one MPAM platform which
supports more than 32 CLOSID values.
Replace the fixed size bitmap with calls to the bitmap API to allocate
an array of a sufficient size.
ffs() returns '1' for bit 0, hence the existing code subtracts 1 from
the index to get the CLOSID value. find_first_bit() returns the bit
number which does not need adjusting.
[ morse: fixed the off-by-one in the allocator and the wrong not-found
value. Removed the limit. Rephrase the commit message. ]
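The before/after shape of the allocator, as a sketch (variable names are
illustrative; 'closid_free_map_len' is an assumed bound):

    /* Before: at most 32 CLOSIDs, and ffs() is 1-based */
    static u32 closid_free_map;

    closid = ffs(closid_free_map);
    if (closid == 0)
        return -ENOSPC;
    closid--;

    /* After: bitmap API, find_first_bit() is 0-based */
    static unsigned long *closid_free_map;

    closid = find_first_bit(closid_free_map, closid_free_map_len);
    if (closid == closid_free_map_len)
        return -ENOSPC;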
Signed-off-by: Amit Singh Tomar <amitsinght@marvell.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-6-james.morse@arm.com
Lacking cpumask_any_andnot_but(), cpumask_any_housekeeping() has to
abuse the cpumask_nth() functions.
Update cpumask_any_housekeeping() to use the new cpumask_any_but()
and cpumask_any_andnot_but(). These two functions understand
RESCTRL_PICK_ANY_CPU, which simplifies cpumask_any_housekeeping()
significantly.
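A simplified sketch of the result (not the exact implementation; per the
description above, both helpers accept RESCTRL_PICK_ANY_CPU as the excluded
CPU):

    static unsigned int cpumask_any_housekeeping(const struct cpumask *mask,
                                                 int exclude_cpu)
    {
        unsigned int cpu;

        if (!tick_nohz_full_enabled())
            return cpumask_any_but(mask, exclude_cpu);

        /* Prefer a CPU in @mask that is not nohz_full... */
        cpu = cpumask_any_andnot_but(mask, tick_nohz_full_mask, exclude_cpu);
        if (cpu < nr_cpu_ids)
            return cpu;

        /* ...otherwise settle for any CPU in @mask. */
        return cpumask_any_but(mask, exclude_cpu);
    }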
Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: James Morse <james.morse@arm.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-5-james.morse@arm.com
TL;DR: SGX page reclaim touches the page to copy its contents to
secondary storage. SGX instructions do not gracefully handle machine
checks. Despite this, the existing SGX code will try to reclaim pages
that it _knows_ are poisoned. Avoid even trying to reclaim poisoned pages.
The longer story:
Pages used by an enclave only get epc_page->poison set in
arch_memory_failure() but they currently stay on sgx_active_page_list until
sgx_encl_release(), with the SGX_EPC_PAGE_RECLAIMER_TRACKED flag untouched.
epc_page->poison is not checked in the reclaimer logic meaning that, if other
conditions are met, an attempt will be made to reclaim an EPC page that was
poisoned. This is bad because 1. we don't want that page to end up added
to another enclave and 2. it is likely to cause one core to shut down
and the kernel to panic.
Specifically, reclaiming uses microcode operations including "EWB" which
accesses the EPC page contents to encrypt and write them out to non-SGX
memory. Those operations cannot handle MCEs in their accesses other than
by putting the executing core into a special shutdown state (affecting
both threads with HT.) The kernel will subsequently panic on the
remaining cores after seeing that the core didn't enter the MCE handler(s)
in time.
Call sgx_unmark_page_reclaimable() to remove the affected EPC page from
sgx_active_page_list on memory error to stop it being considered for
reclaiming.
Testing epc_page->poison in sgx_reclaim_pages() would also work but I assume
it's better to add code in the less likely paths.
The affected EPC page is not added to &node->sgx_poison_page_list until
later in sgx_encl_release()->sgx_free_epc_page() when it is EREMOVEd.
Membership on other lists doesn't change to avoid changing any of the
lists' semantics except for sgx_active_page_list. There's a "TBD" comment
in arch_memory_failure() about pre-emptive actions, the goal here is not
to address everything that it may imply.
This also doesn't completely close the time window when a memory error
notification will be fatal (for a not previously poisoned EPC page) --
the MCE can happen after sgx_reclaim_pages() has selected its candidates
or even *inside* a microcode operation (actually easy to trigger due to
the amount of time spent in them.)
The spinlock in sgx_unmark_page_reclaimable() is safe because
memory_failure() runs in process context and no spinlocks are held,
explicitly noted in a mm/memory-failure.c comment.
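The core of the fix, roughly (a sketch of the arch_memory_failure() hunk;
only the sgx_unmark_page_reclaimable() call is new):

        page->poison = 1;

        /*
         * New: drop the page from sgx_active_page_list so the reclaimer
         * never attempts EWB on poisoned memory.
         */
        sgx_unmark_page_reclaimable(page);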
Signed-off-by: Andrew Zaborowski <andrew.zaborowski@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: balrogg@gmail.com
Cc: linux-sgx@vger.kernel.org
Link: https://lore.kernel.org/r/20250508230429.456271-1-andrew.zaborowski@intel.com
In order to let all the APIs under <cpuid/api.h> have a shared "cpuid_"
namespace, rename have_cpuid_p() to cpuid_feature().
Adjust all call-sites accordingly.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: x86-cpuid@lists.linux.dev
Link: https://lore.kernel.org/r/20250508150240.172915-4-darwi@linutronix.de
The main CPUID header <asm/cpuid.h> was originally a storefront for the
headers:
<asm/cpuid/api.h>
<asm/cpuid/leaf_0x2_api.h>
Now that the latter CPUID(0x2) header has been merged into the former,
there is no practical difference between <asm/cpuid.h> and
<asm/cpuid/api.h>.
Migrate all users to the <asm/cpuid/api.h> header, in preparation of
the removal of <asm/cpuid.h>.
Don't remove <asm/cpuid.h> just yet, in case some new code in -next
started using it.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: x86-cpuid@lists.linux.dev
Link: https://lore.kernel.org/r/20250508150240.172915-3-darwi@linutronix.de
Expose certain 'struct cpuinfo_x86' fields via asm-offsets for x86_64
too, so that it will be possible to set CPU capabilities from 64-bit
asm code.
32-bit already used these fields, so simply move those offset exports into
the unified asm-offsets.c file.
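The entries look roughly like this in asm-offsets.c (symbol names are
hypothetical; OFFSET() is the standard asm-offsets macro):

    OFFSET(CPUINFO_x86, cpuinfo_x86, x86);
    OFFSET(CPUINFO_x86_capability, cpuinfo_x86, x86_capability);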
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250514104242.1275040-12-ardb+git@google.com
The global pseudo-constants 'page_offset_base', 'vmalloc_base' and
'vmemmap_base' are not used extremely early during the boot, and cannot be
used safely until after the KASLR memory randomization code in
kernel_randomize_memory() executes, which may update their values.
So there is no point in setting these variables extremely early, and it
can wait until after the kernel itself is mapped and running from its
permanent virtual mapping.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250513111157.717727-9-ardb+git@google.com
Fix several build errors when CONFIG_MODULES=n, including the following:
../arch/x86/kernel/alternative.c:195:25: error: incomplete definition of type 'struct module'
195 | for (int i = 0; i < mod->its_num_pages; i++) {
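One straightforward shape for such a fix (a sketch, not necessarily the
merged hunk):

    #ifdef CONFIG_MODULES
        for (int i = 0; i < mod->its_num_pages; i++) {
            /* ... operate on the module's ITS thunk pages ... */
        }
    #endif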
Fixes: 872df34d7c ("x86/its: Use dynamic thunks for indirect branches")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a synthetic feature flag for Zen6.
[ bp: Move the feature flag to a free slot and avoid future merge
conflicts from incoming stuff. ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250513204857.3376577-1-yazen.ghannam@amd.com
1f4bb068b4 ("x86/bugs: Restructure SRSO mitigation") does this:
    if (boot_cpu_data.x86 < 0x19 && !cpu_smt_possible()) {
        setup_force_cpu_cap(X86_FEATURE_SRSO_NO);
        srso_mitigation = SRSO_MITIGATION_NONE;
        return;
    }
and, in particular, sets srso_mitigation to NONE. This leads to
reporting
    Speculative Return Stack Overflow: Vulnerable
on Zen2 machines.
There's a far bigger confusion with what SRSO_NO means and how it is
used in the code but this will be a matter of future fixes and
restructuring to how the SRSO mitigation gets determined.
Fix the reporting issue for now.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: David Kaplan <david.kaplan@amd.com>
Link: https://lore.kernel.org/20250513110405.15872-1-bp@kernel.org
Prepare to resolve conflicts with an upstream series of fixes that conflict
with pending x86 changes:
6f5bf947ba Merge tag 'its-for-linus-20250509' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Signed-off-by: Ingo Molnar <mingo@kernel.org>
KHO kernels are special and use only scratch memory for memblock
allocations, but memory below 1M is ignored by the kernel after early boot
and cannot be naturally marked as scratch.
To allow the real-mode trampoline and a few (if any) other very early
allocations to come from below 1M, forcibly mark the memory below 1M as
scratch.
After the real-mode trampoline is allocated, clear that scratch marking.
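In outline (a sketch; memblock_mark_kho_scratch()/memblock_clear_kho_scratch()
are the helper names this series introduces, assumed here):

    /* Early boot: force below-1M memory to be treated as scratch */
    memblock_mark_kho_scratch(0, SZ_1M);

    /* ... the real-mode trampoline gets allocated from below 1M ... */
    init_real_mode();

    /* Done with very early allocations: drop the forced marking */
    memblock_clear_kho_scratch(0, SZ_1M);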
Link: https://lkml.kernel.org/r/20250509074635.3187114-13-changyuanl@google.com
Signed-off-by: Alexander Graf <graf@amazon.com>
Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Co-developed-by: Changyuan Lyu <changyuanl@google.com>
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ashish Kalra <ashish.kalra@amd.com>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Gowans <jgowans@amazon.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pratyush Yadav <ptyadav@amazon.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <thomas.lendacky@amd.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kexec handover (KHO) creates metadata that kernels pass between each
other during kexec. This metadata is stored in memory, and the kexec image
contains a (physical) pointer to that memory.
In addition, KHO keeps "scratch regions" available for kexec: physically
contiguous memory regions that are guaranteed to not have any memory that
KHO would preserve. The new kernel bootstraps itself using the scratch
regions and sets all handed over memory as in use. When subsystems that
support KHO initialize, they introspect the KHO metadata, restore
preserved memory regions, and retrieve their state stored in the preserved
memory.
Enlighten x86 kexec-file and boot path about the KHO metadata and make
sure it gets passed along to the next kernel.
Link: https://lkml.kernel.org/r/20250509074635.3187114-12-changyuanl@google.com
Signed-off-by: Alexander Graf <graf@amazon.com>
Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Co-developed-by: Changyuan Lyu <changyuanl@google.com>
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ashish Kalra <ashish.kalra@amd.com>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Gowans <jgowans@amazon.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pratyush Yadav <ptyadav@amazon.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <thomas.lendacky@amd.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
memblock_reserve() does not distinguish memory used by firmware from
memory used by the kernel.
The distinction is nice to have for accounting of early memory allocations
and reservations, but it is essential for kexec handover (KHO) to know how
much memory the kernel consumes during boot.
Use memblock_reserve_kern() to reserve kernel memory, such as kernel
image, initrd and setup data.
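The change is mechanical at each site, e.g. (illustrative variables):

    /* Before: indistinguishable from a firmware reservation */
    memblock_reserve(ramdisk_image, ramdisk_size);

    /* After: accounted as kernel memory, which KHO can rely on */
    memblock_reserve_kern(ramdisk_image, ramdisk_size);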
Link: https://lkml.kernel.org/r/20250509074635.3187114-11-changyuanl@google.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ashish Kalra <ashish.kalra@amd.com>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Gowans <jgowans@amazon.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pratyush Yadav <ptyadav@amazon.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <thomas.lendacky@amd.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When suspending, save_processor_state() calls mtrr_save_fixed_ranges()
to save fixed-range MTRRs.
On platforms without fixed-range MTRRs, like the ACRN hypervisor which
has removed fixed-range MTRR emulation, accessing these MSRs will
trigger an unchecked MSR access error. Make sure fixed-range MTRRs are
supported before accessing them to prevent such errors.
Since mtrr_state.have_fixed is only set when MTRRs are present and
enabled, checking the CPU feature flag in mtrr_save_fixed_ranges() is
unnecessary.
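The fixed function then reads, roughly (a sketch based on the description
above):

    void mtrr_save_fixed_ranges(void *info)
    {
        if (mtrr_state.have_fixed)
            get_fixed_ranges(mtrr_state.fixed_ranges);
    }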
Fixes: 3ebad59056 ("[PATCH] x86: Save and restore the fixed-range MTRRs of the BSP when suspending")
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250509170633.3411169-2-jiaqing.zhao@linux.intel.com
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmgebIwACgkQaDWVMHDJ
krCGSA/+I+W/uqiz58Z2Zu4RrXMYFfKJxacF7My9wnOyRxaJduS3qrz1E5wHqBId
f6M8wDx9nS24UxDkBbi84NdtlG1zj8nV8djtszGKVeqHG2DcQMMOXBKZSjOmTo2b
GIZ3a3xEqXaFfnGQxXSZrvtHIwCmv10H2oyGHu0vBp/SJuWXNg72oivOGhbm0uWs
0/bdIK8+1sW7OAmhhKdvMVpmzL8TQJnkUHSkQilPB2Tsf9wWDfeY7kDkK5YwQpk2
ZK+hrmwCFXQZELY65F2+y/cFim/F38HiqVdvIkV1wFSVqVVE9hEKJ4BDZl1fXZKB
p4qpDFgxO27E/eMo9IZfxRH4TdSoK6YLWo9FGWHKBPnciJfAeO9EP/AwAIhEQRdx
YZlN9sGS6ja7O1Eh423BBw6cFj6ta0ck2T1PoYk32FXc6sgqCphsfvBD3+tJxz8/
xoZ3BzoErdPqSXbH5cSI972kQW0JLESiMTZa827qnJtT672t6uBcsnnmR0ZbJH1f
TJCC9qgwpBiEkiGW3gwv00SC7CkXo3o0FJw0pa3MkKHGd7csxBtGBHI1b6Jj+oB0
yWf1HxSqwrq2Yek8R7lWd4jIxyWfKriEMTu7xCMUUFlprKmR2RufsADvqclNyedQ
sGBCc4eu1cpZp2no/IFm+IvkuzUHnkS/WNL1LbZ9YI8h8unjZHE=
=UVgZ
-----END PGP SIGNATURE-----
Merge tag 'its-for-linus-20250509' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 ITS mitigation from Dave Hansen:
"Mitigate Indirect Target Selection (ITS) issue.
I'd describe this one as a good old CPU bug where the behavior is
_obviously_ wrong, but since it just results in bad predictions it
wasn't wrong enough to notice. Well, the researchers noticed and also
realized that thus bug undermined a bunch of existing indirect branch
mitigations.
Thus the unusually wide impact on this one. Details:
ITS is a bug in some Intel CPUs that affects indirect branches
including RETs in the first half of a cacheline. Due to ITS, such
branches may get wrongly predicted to the target of a (direct or indirect)
branch that is located in the second half of a cacheline. Researchers
at VUSec found this behavior and reported it to Intel.
Affected processors:
- Cascade Lake, Cooper Lake, Whiskey Lake V, Coffee Lake R, Comet
Lake, Ice Lake, Tiger Lake and Rocket Lake.
Scope of impact:
- Guest/host isolation:
When eIBRS is used for guest/host isolation, the indirect branches
in the VMM may still be predicted with targets corresponding to
direct branches in the guest.
- Intra-mode using cBPF:
cBPF can be used to poison the branch history to exploit ITS.
Realigning the indirect branches and RETs mitigates this attack
vector.
- User/kernel:
With eIBRS enabled user/kernel isolation is *not* impacted by ITS.
- Indirect Branch Prediction Barrier (IBPB):
Due to this bug indirect branches may be predicted with targets
corresponding to direct branches which were executed prior to IBPB.
This will be fixed in the microcode.
Mitigation:
As indirect branches in the first half of a cacheline are affected, the
mitigation is to replace those indirect branches with a call to a thunk that
is aligned to the second half of the cacheline.
RETs that take their prediction from the RSB are not affected, but they may be
affected by an RSB-underflow condition. So, RETs in the first half of a
cacheline are also patched to a return thunk that executes the RET aligned
to the second half of the cacheline"
* tag 'its-for-linus-20250509' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftest/x86/bugs: Add selftests for ITS
x86/its: FineIBT-paranoid vs ITS
x86/its: Use dynamic thunks for indirect branches
x86/ibt: Keep IBT disabled during alternative patching
mm/execmem: Unify early execmem_cache behaviour
x86/its: Align RETs in BHB clear sequence to avoid thunking
x86/its: Add support for RSB stuffing mitigation
x86/its: Add "vmexit" option to skip mitigation on some CPUs
x86/its: Enable Indirect Target Selection mitigation
x86/its: Add support for ITS-safe return thunk
x86/its: Add support for ITS-safe indirect thunk
x86/its: Enumerate Indirect Target Selection (ITS) bug
Documentation: x86/bugs/its: Add ITS documentation
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmgaKIwACgkQaDWVMHDJ
krCAMQ/9EJiWaATpR24pkXYDDPIVgcXtph7kX7WTha3ZBoA5ab0aJEbvSKZWpBRi
lmIja0iqysgwAtxZ3qMfilmpPN8x/G+Y+q11PnKT8yXtWRM5CoMYXUzWXGSWKCl0
D3hIqXFTLIzutpWn56CCvxId1KOz2O+GiiH+4HZnJTdwB3lIaQpHocqc7I9dXSmK
P+hpXHAZV4o3m6/gdiUxy519Vuz+P+oRaZ2CqGEPP9g8b78ZOsivEJh4HLwgtFXr
iajcpBMN/NLMEXGPKWZEt1JFNpW9cV8i2L5cGf4w6FU1YUIfOX1OSHXL+TjvPm7y
XyU+eqquvI5Qv6Er4Nil1yat10RQhdiBWou6MQmURjwKtBiYHGlosIlnaS2A08CK
VaLOmJ+DaknPnF+c41YeO8+ERjuJ6c6iMOhRLNvhnnFuQUq8Ktxm+qIAmskiiz4Q
gvervpVHlC1I7BwbQSEQdOnBj20XqIP+HAwKzSiRt/PrjBgojuYjDx2U9/ci7DWD
EgVqOw17lXq/HMZVDlnZxwqj2neqh/Nu9RTMKQn8zDbLDjuT+eEXnZrhIPaf0EHc
LfzCXRU+JRUiNsvEdksnSUN7W/Si1073sZrlNje9pY0mjXrcB2A6ChPcSBJepdzc
MJ3SU8Owr6HIcpmQiAK+2+bGYNSM/287dIKRLEUN+aPJZYRSOss=
=b2Ix
-----END PGP SIGNATURE-----
Merge tag 'ibti-hisory-for-linus-2025-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 IBTI mitigation from Dave Hansen:
"Mitigate Intra-mode Branch History Injection via classic BFP programs
This adds the branch history clearing mitigation to cBPF programs for
x86. Intra-mode BHI attacks via cBPF, a.k.a. IBTI-History, were reported
by researchers at VUSec.
For hardware that doesn't support BHI_DIS_S, the recommended
mitigation is to run the short software sequence followed by the IBHF
instruction after cBPF execution. On hardware that does support
BHI_DIS_S, enable BHI_DIS_S and execute the IBHF after cBPF execution.
The Indirect Branch History Fence (IBHF) is a new instruction that
prevents indirect branch target predictions after the barrier from
using branch history from before the barrier while BHI_DIS_S is
enabled. On older systems this will map to a NOP. It is recommended to
add this fence at the end of the cBPF program to support VM migration.
This instruction is required on newer parts with BHI_NO to fully
mitigate against these attacks.
The current code disables the mitigation for anything running with the
SYS_ADMIN capability bit set. The intention was not to waste time
mitigating a process that has access to anything it wants anyway"
* tag 'ibti-hisory-for-linus-2025-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bhi: Do not set BHI_DIS_S in 32-bit mode
x86/bpf: Add IBHF call at end of classic BPF
x86/bpf: Call branch history clearing sequence on exit
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmggVx8RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1gCsg/+LsBbk+wJyQpVADyYKyuYRx12mVY6Pb2F
Oee+/ivaFeE92Su30nIfzDPyXe3PYCNFW9D26/y8wfgRNhMic5AWW2I2PRy/iV1V
EjyTdP6ign8HUzHfrlZNPJOsextTYqU7uDhwRocgrPBKlgmK8Z9WUfgK+TnhvvMo
fotb6HF96rfGIOww/kqCbGItKCzZ4app8tnzTWlj07qoYkCgrq+J57Jaj2cvYJYI
vhmIsbaG/qrt+3Q9MYiA5HIkxxv6mvpd/MS18dhlaON070TCmhlQxgN0wD6JAfR9
5lkeqhMq50cK5/cr15KAbzAPuqyK0d8DY+6cmEXmNMe0GwOeyoVZPM+Ul3f3l/ku
93M1qd0r1oIJ0ltzFh1Pfw8c8EDMmHz0opiqJ40efaJYUPUsi3jrwmn3hI5GNXdE
gISlPow9EY8MQe4dH4E20zHOTApEOvgJzhgkrR5jzh8JlEdbmFEezfF6fwf5tKfC
m1HNeX0SkjumtUkvka7v++hgpD28UVBA0dau+nhfoRXtUBuithPapLvoacVhum/j
9QxlGMP6VqBmg3GP6AmiccuTcWNsfFBbIUIJ+KT2GI5E3wvPxb9Q9qJTr+mKtoJ9
n6krAi++WUqfInkd4tx0uEk+x8W2mk4gOS/xV/qV8So4R8cyioO1XLHn/r+kwx6a
wLbdafPaJzY=
=ibsn
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2025-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
"Fix a boot regression on very old x86 CPUs without CPUID support"
* tag 'x86-urgent-2025-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode: Consolidate the loader enablement checking
FineIBT-paranoid was using the retpoline bytes for the paranoid check,
disabling retpolines, because all parts that have IBT also have eIBRS
and thus don't need no stinking retpolines.
Except... ITS requires that indirect calls not be in the first half of
a cacheline, so the retpolines are needed after all :-/
So what was the paranoid call sequence:
<fineibt_paranoid_start>:
0: 41 ba 78 56 34 12 mov $0x12345678, %r10d
6: 45 3b 53 f7 cmp -0x9(%r11), %r10d
a: 4d 8d 5b <f0> lea -0x10(%r11), %r11
e: 75 fd jne d <fineibt_paranoid_start+0xd>
10: 41 ff d3 call *%r11
13: 90 nop
Now becomes:
<fineibt_paranoid_start>:
0: 41 ba 78 56 34 12 mov $0x12345678, %r10d
6: 45 3b 53 f7 cmp -0x9(%r11), %r10d
a: 4d 8d 5b f0 lea -0x10(%r11), %r11
e: 2e e8 XX XX XX XX cs call __x86_indirect_paranoid_thunk_r11
Where the paranoid_thunk looks like:
1d: <ea> (bad)
__x86_indirect_paranoid_thunk_r11:
1e: 75 fd jne 1d
__x86_indirect_its_thunk_r11:
20: 41 ff eb jmp *%r11
23: cc int3
[ dhansen: remove initialization to false ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
ITS mitigation moves the unsafe indirect branches to a safe thunk. This
could degrade the prediction accuracy as the source address of indirect
branches becomes the same for different execution paths.
To improve the predictions, and hence the performance, assign a separate
thunk for each indirect callsite. This is also a defense-in-depth measure
to avoid indirect branches aliasing with each other.
As an example, 5000 dynamic thunks would utilize around 16 bits of the
address space, thereby gaining entropy. For a BTB that uses
32 bits for indexing, dynamic thunks could provide better prediction
accuracy over fixed thunks.
Have ITS thunks be variable sized and use EXECMEM_MODULE_TEXT such that
they are both more flexible (got to extend them later) and live in 2M TLBs,
just like kernel code, avoiding undue TLB pressure.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
cfi_rewrite_callers() updates the FineIBT hash matching at the caller side,
but except for paranoid-mode it relies on apply_retpoline() and friends for
any ENDBR relocation. This could temporarily cause an indirect branch to
land on a poisoned ENDBR.
For instance, with para-virtualization enabled, a simple wrmsrl() could
have an indirect branch pointing to native_write_msr() whose ENDBR has been
relocated due to FineIBT:
<wrmsrl>:
push %rbp
mov %rsp,%rbp
mov %esi,%eax
mov %rsi,%rdx
shr $0x20,%rdx
mov %edi,%edi
mov %rax,%rsi
call *0x21e65d0(%rip) # <pv_ops+0xb8>
^^^^^^^^^^^^^^^^^^^^^^^
Such an indirect call during the alternative patching could #CP if the
caller is not *yet* adjusted for the new target ENDBR. To prevent a false
#CP, keep CET-IBT disabled until all callers are patched.
Patching during the module load does not need to be guarded by IBT-disable
because the module code is not executed until the patching is complete.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
When retpoline mitigation is enabled for spectre-v2, enabling
call-depth-tracking and RSB stuffing also mitigates ITS. Add cmdline option
indirect_target_selection=stuff to allow enabling RSB stuffing mitigation.
When retpoline mitigation is not enabled, the =stuff option is ignored, and
the default mitigation for ITS is deployed.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Ice Lake generation CPUs are not affected by guest/host isolation part of
ITS. If a user is only concerned about KVM guests, they can now choose a
new cmdline option "vmexit" that will not deploy the ITS mitigation when
CPU is not affected by guest/host isolation. This saves the performance
overhead of ITS mitigation on Ice Lake gen CPUs.
When "vmexit" option selected, if the CPU is affected by ITS guest/host
isolation, the default ITS mitigation is deployed.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Indirect Target Selection (ITS) is a bug in some pre-ADL Intel CPUs with
eIBRS. It affects the prediction of indirect branches and RETs in the
lower half of a cacheline. Due to ITS, such branches may get wrongly predicted
to the target of a (direct or indirect) branch that is located in the upper
half of the cacheline.
Scope of impact
===============
Guest/host isolation
--------------------
When eIBRS is used for guest/host isolation, the indirect branches in the
VMM may still be predicted with targets corresponding to branches in the
guest.
Intra-mode
----------
cBPF or other native gadgets can be used for intra-mode training and
disclosure using ITS.
User/kernel isolation
---------------------
When eIBRS is enabled user/kernel isolation is not impacted.
Indirect Branch Prediction Barrier (IBPB)
-----------------------------------------
After an IBPB, indirect branches may be predicted with targets
corresponding to direct branches which were executed prior to IBPB. This is
mitigated by a microcode update.
Add cmdline parameter indirect_target_selection=off|on|force to control the
mitigation to relocate the affected branches to an ITS-safe thunk, i.e. one
located in the upper half of the cacheline. Also add the sysfs reporting.
When retpoline mitigation is deployed, ITS safe-thunks are not needed,
because retpoline sequence is already ITS-safe. Similarly, when call depth
tracking (CDT) mitigation is deployed (retbleed=stuff), ITS safe return
thunk is not used, as CDT prevents RSB-underflow.
To not overcomplicate things, ITS mitigation is not supported with
spectre-v2 lfence;jmp mitigation. Moreover, it is less practical to deploy
lfence;jmp mitigation on ITS-affected parts anyway.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
RETs in the lower half of a cacheline may be affected by the ITS bug,
specifically when the RSB underflows. Use the ITS-safe return thunk for such
RETs.
RETs that are not patched:
- RET in retpoline sequence does not need to be patched, because the
sequence itself fills an RSB before RET.
- RET in Call Depth Tracking (CDT) thunks __x86_indirect_{call|jump}_thunk
and call_depth_return_thunk are not patched because CDT by design
prevents RSB-underflow.
- RETs in .init section are not reachable after init.
- RETs that are explicitly marked safe with ANNOTATE_UNRET_SAFE.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Due to ITS, indirect branches in the lower half of a cacheline may be
vulnerable to branch target injection attacks.
Introduce ITS-safe thunks, and patch indirect branches in the lower half of a
cacheline to use them. Also thunk any eBPF-generated indirect branches
in emit_indirect_jump().
The below categories of indirect branches are not mitigated:
- Indirect branches in the .init section are not mitigated because they are
discarded after boot.
- Indirect branches that are explicitly marked retpoline-safe.
Note that retpoline also mitigates the indirect branches against ITS. This
is because the retpoline sequence fills an RSB entry before RET, and it
does not suffer from the RSB-underflow part of ITS.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
The ITS bug in some pre-Alderlake Intel CPUs may allow indirect branches in
the first half of a cache line to get predicted to the target of a branch
located in the second half of the cache line.
Set X86_BUG_ITS on affected CPUs. Mitigation to follow in later commits.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
report_ubsan_failure() doesn't use its regs argument, and soon it will
be called from the hypervisor context where regs are not available.
So, remove the unused argument.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Acked-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250430162713.1997569-3-smostafa@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
With the possibility of intra-mode BHI via cBPF, the complete mitigation
for BHI is to use the IBHF (history fence) instruction with BHI_DIS_S set.
Since this new instruction is only available in 64-bit mode, setting
BHI_DIS_S in 32-bit mode is only a partial mitigation.
Do not set BHI_DIS_S in 32-bit mode so as to avoid reporting a misleading
mitigated status. With this change IBHF won't be used in 32-bit mode; also
remove the CONFIG_X86_64 check from emit_spectre_bhb_barrier().
Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Classic BPF programs can be run by unprivileged users, allowing
unprivileged code to execute inside the kernel. Attackers can use this to
craft branch history in kernel mode that can influence the target of
indirect branches.
BHI_DIS_S provides user-kernel isolation of branch history, but cBPF can be
used to bypass this protection by crafting branch history in kernel mode.
To stop intra-mode attacks via cBPF programs, Intel created a new
instruction, Indirect Branch History Fence (IBHF). IBHF prevents the
predicted targets of subsequent indirect branches from being influenced by
branch history prior to the IBHF. IBHF is only effective while BHI_DIS_S is
enabled.
Add the IBHF instruction to cBPF jitted code's exit path. Add the new fence
when the hardware mitigation is enabled (i.e., X86_FEATURE_CLEAR_BHB_HW is
set) or after the software sequence (X86_FEATURE_CLEAR_BHB_LOOP) is being
used in a virtual machine. Note that X86_FEATURE_CLEAR_BHB_HW and
X86_FEATURE_CLEAR_BHB_LOOP are mutually exclusive, so the JIT compiler will
only emit the new fence, not the SW sequence, when X86_FEATURE_CLEAR_BHB_HW
is set.
Hardware that enumerates BHI_NO basically has the BHI_DIS_S protections
always enabled, regardless of the value of BHI_DIS_S. Since BHI_DIS_S
doesn't protect against intra-mode attacks, enumerate the BHI bug on BHI_NO
hardware as well.
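The emit decision can be modeled as a small predicate; a runnable toy
sketch of the stated logic (names are illustrative, not JIT internals):

    #include <stdbool.h>
    #include <stdio.h>

    /* clear_bhb_hw / clear_bhb_loop stand in for X86_FEATURE_CLEAR_BHB_HW
     * and X86_FEATURE_CLEAR_BHB_LOOP, which are mutually exclusive. */
    static bool want_ibhf(bool clear_bhb_hw, bool clear_bhb_loop, bool in_vm)
    {
            return clear_bhb_hw || (clear_bhb_loop && in_vm);
    }

    int main(void)
    {
            printf("%d\n", want_ibhf(true, false, false)); /* 1: HW mitigation */
            printf("%d\n", want_ibhf(false, true, true));  /* 1: SW sequence in a VM */
            printf("%d\n", want_ibhf(false, true, false)); /* 0: SW sequence on bare metal */
            return 0;
    }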
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Remove @perm from the guest pseudo FPU container. The field is
initialized during allocation and never used later.
Rename fpu_init_guest_permissions() to show that its sole purpose is to
lock down guest permissions.
Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mitchell Levy <levymitchell0@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/kvm/af972fe5981b9e7101b64de43c7be0a8cc165323.camel@redhat.com/
Link: https://lore.kernel.org/r/20250506093740.2864458-3-chao.gao@intel.com
When granting userspace or a KVM guest access to an xfeature, preserve the
entity's existing supervisor and software-defined permissions as tracked
by __state_perm, i.e. use __state_perm to track *all* permissions even
though all supported supervisor xfeatures are granted to all FPUs and
FPU_GUEST_PERM_LOCKED disallows changing permissions.
Effectively clobbering supervisor permissions results in inconsistent
behavior, as xstate_get_group_perm() will report supervisor features for
processes that do NOT request access to dynamic user xfeatures, whereas any
and all supervisor features will be absent from the set of permissions for
any process that is granted access to one or more dynamic xfeatures (which
right now means AMX).
The inconsistency isn't problematic because fpu_xstate_prctl() already
strips out everything except user xfeatures:
    case ARCH_GET_XCOMP_PERM:
            /*
             * Lockless snapshot as it can also change right after the
             * dropping the lock.
             */
            permitted = xstate_get_host_group_perm();
            permitted &= XFEATURE_MASK_USER_SUPPORTED;
            return put_user(permitted, uptr);

    case ARCH_GET_XCOMP_GUEST_PERM:
            permitted = xstate_get_guest_group_perm();
            permitted &= XFEATURE_MASK_USER_SUPPORTED;
            return put_user(permitted, uptr);
and similarly KVM doesn't apply the __state_perm to supervisor states
(kvm_get_filtered_xcr0() incorporates xstate_get_guest_group_perm()):
    case 0xd: {
            u64 permitted_xcr0 = kvm_get_filtered_xcr0();
            u64 permitted_xss = kvm_caps.supported_xss;
But if KVM in particular were to ever change, dropping supervisor
permissions would result in subtle bugs in KVM's reporting of supported
CPUID settings. And the above behavior also means that having supervisor
xfeatures in __state_perm is correctly handled by all users.
Dropping supervisor permissions also creates another landmine for KVM. If
more dynamic user xfeatures are ever added, requesting access to multiple
xfeatures in separate ARCH_REQ_XCOMP_GUEST_PERM calls will result in the
second invocation of __xstate_request_perm() computing the wrong ksize, as
the mask passed to xstate_calculate_size() would not contain *any*
supervisor features.
Commit 781c64bfcb ("x86/fpu/xstate: Handle supervisor states in XSTATE
permissions") fudged around the size issue for userspace FPUs, but for
reasons unknown skipped guest FPUs. Lack of a fix for KVM "works" only
because KVM doesn't yet support virtualizing features that have supervisor
xfeatures, i.e. as of today, KVM guest FPUs will never need the relevant
xfeatures.
Simply extending the hack-a-fix for guests would temporarily solve the
ksize issue, but wouldn't address the inconsistency issue and would leave
another lurking pitfall for KVM. KVM support for virtualizing CET will
likely add CET_KERNEL as a guest-only xfeature, i.e. CET_KERNEL will not
be set in xfeatures_mask_supervisor() and would again be dropped when
granting access to dynamic xfeatures.
Note, the existing clobbering behavior is rather subtle. The @permitted
parameter to __xstate_request_perm() comes from:
    permitted = xstate_get_group_perm(guest);
which is either fpu->guest_perm.__state_perm or fpu->perm.__state_perm,
where __state_perm is initialized to:
    fpu->perm.__state_perm = fpu_kernel_cfg.default_features;
and copied to the guest side of things:
    /* Same defaults for guests */
    fpu->guest_perm = fpu->perm;
fpu_kernel_cfg.default_features contains everything except the dynamic
xfeatures, i.e. everything except XFEATURE_MASK_XTILE_DATA:
    fpu_kernel_cfg.default_features = fpu_kernel_cfg.max_features;
    fpu_kernel_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC;
When __xstate_request_perm() restricts the local "mask" variable to
compute the user state size:
    mask &= XFEATURE_MASK_USER_SUPPORTED;
    usize = xstate_calculate_size(mask, false);
it subtly overwrites the target __state_perm with "mask" containing only
user xfeatures:
    perm = guest ? &fpu->guest_perm : &fpu->perm;

    /* Pairs with the READ_ONCE() in xstate_get_group_perm() */
    WRITE_ONCE(perm->__state_perm, mask);
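The shape of the eventual fix, sketched from the names quoted above (not
verbatim): compute usize from a masked copy so that "mask" - and with it
__state_perm - keeps the supervisor bits:

    /* Compute the user size without clobbering "mask". */
    usize = xstate_calculate_size(mask & XFEATURE_MASK_USER_SUPPORTED, false);
    ...
    /* "mask" still contains the supervisor xfeatures here. */
    WRITE_ONCE(perm->__state_perm, mask);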
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: John Allen <john.allen@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mitchell Levy <levymitchell0@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Sohil Mehta <sohil.mehta@intel.com>
Cc: Vignesh Balasubramanian <vigbalas@amd.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xin Li <xin3.li@intel.com>
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/all/ZTqgzZl-reO1m01I@google.com
Link: https://lore.kernel.org/r/20250506093740.2864458-2-chao.gao@intel.com
CPUID(0x80000000).EAX returns the max extended CPUID leaf available. On
x86-32 machines without an extended CPUID range, a CPUID(0x80000000)
query will just repeat the output of the last valid standard CPUID leaf
on the CPU; i.e., garbage values. Current tip:x86/cpu code protects
against this by doing:
    eax = cpuid_eax(0x80000000);
    c->extended_cpuid_level = eax;

    if ((eax & 0xffff0000) == 0x80000000) {
            // CPU has an extended CPUID range. Check for 0x80000001
            if (eax >= 0x80000001) {
                    cpuid(0x80000001, ...);
            }
    }
This is correct so far. Afterwards though, the same possibly broken EAX
value is used to check the availability of other extended CPUID leaves:
    if (c->extended_cpuid_level >= 0x80000007)
            ...
    if (c->extended_cpuid_level >= 0x80000008)
            ...
    if (c->extended_cpuid_level >= 0x8000000a)
            ...
    if (c->extended_cpuid_level >= 0x8000001f)
            ...
which is invalid. Fix this by immediately setting the CPU's max extended
CPUID leaf to zero if CPUID(0x80000000).EAX doesn't indicate a valid
CPUID extended range.
While at it, add a comment, similar to kernel/head_32.S, clarifying the
CPUID(0x80000000) sanity check.
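A minimal sketch of the fix, reusing the names from the snippet above:

    eax = cpuid_eax(0x80000000);

    /* Zero the level immediately if the extended range is bogus. */
    if ((eax & 0xffff0000) != 0x80000000)
            eax = 0;

    c->extended_cpuid_level = eax;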
References: 8a50e5135a ("x86-32: Use symbolic constants, safer CPUID when enabling EFER.NX")
Fixes: 3da99c9776 ("x86: make (early)_identify_cpu more the same between 32bit and 64 bit")
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: x86-cpuid@lists.linux.dev
Link: https://lore.kernel.org/r/20250506050437.10264-3-darwi@linutronix.de
The following register contains bits that indicate the cause for the
previous reset.
PMx000000C0 (FCH::PM::S5_RESET_STATUS)
This is useful for debugging. The reasons for reset are broken into 6
high-level categories. Decode it by category and print during boot.
Specifics within a category are split off into debugging documentation.
The register is accessed indirectly through a "PM" port in the FCH. Use
MMIO access in order to avoid restrictions with legacy port access.
Use a late_initcall() to ensure that MMIO has been set up before trying to
access the register.
This register was introduced with AMD Family 17h, so avoid access on older
families. There is no CPUID feature bit for this register.
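A hedged sketch of the decode-and-print loop (the register offset, base
pointer and category names are placeholders, not the real layout):

    #define FCH_PM_S5_RESET_STATUS  0xC0    /* placeholder offset */

    static void fch_decode_reset_reason(void __iomem *fch_pm_base)
    {
            static const char * const reason[] = {
                    "thermal", "power", "watchdog",         /* placeholders */
                    "software", "pin", "ACPI-state",
            };
            u32 status = readl(fch_pm_base + FCH_PM_S5_RESET_STATUS);
            unsigned int i;

            for (i = 0; i < ARRAY_SIZE(reason); i++)
                    if (status & BIT(i))
                            pr_info("Last reset: %s\n", reason[i]);
    }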
[ bp: Simplify the reason dumping loop.
- merge a fix to not access an array element after the last one:
https://lore.kernel.org/r/20250505133609.83933-1-superm1@kernel.org
Reported-by: James Dutton <james.dutton@gmail.com>
]
[ mingo:
- Use consistent .rst formatting
- Fix 'Sleep' class field to 'ACPI-State'
- Standardize pin messages around the 'tripped' verbiage
- Remove reference to ring-buffer printing & simplify the wording
- Use curly braces for multi-line conditional statements ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250422234830.2840784-6-superm1@kernel.org
Borislav Petkov reported the following boot crash on x86-32,
with CONFIG_HARDENED_USERCOPY=y:
| usercopy: Kernel memory overwrite attempt detected to SLUB object 'task_struct' (offset 2112, size 160)!
| ...
| kernel BUG at mm/usercopy.c:102!
So the useroffset and usersize arguments are what control the allowed
window of copying in/out of the "task_struct" kmem cache:
    /* create a slab on which task_structs can be allocated */
    task_struct_whitelist(&useroffset, &usersize);
    task_struct_cachep = kmem_cache_create_usercopy("task_struct",
                    arch_task_struct_size, align,
                    SLAB_PANIC|SLAB_ACCOUNT,
                    useroffset, usersize, NULL);
task_struct_whitelist() positions this window based on the location of
the thread_struct within task_struct, and gets the arch-specific details
via arch_thread_struct_whitelist(offset, size):
    static void __init task_struct_whitelist(unsigned long *offset, unsigned long *size)
    {
            /* Fetch thread_struct whitelist for the architecture. */
            arch_thread_struct_whitelist(offset, size);

            /*
             * Handle zero-sized whitelist or empty thread_struct, otherwise
             * adjust offset to position of thread_struct in task_struct.
             */
            if (unlikely(*size == 0))
                    *offset = 0;
            else
                    *offset += offsetof(struct task_struct, thread);
    }
Commit cb7ca40a38 ("x86/fpu: Make task_struct::thread constant size")
removed the logic for the window, leaving:
    static inline void
    arch_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
    {
            *offset = 0;
            *size = 0;
    }
So now there is no window that usercopy hardening will allow to be copied
in/out of task_struct.
But as reported above, there *is* a copy in copy_uabi_to_xstate(). (It
seems there are several, actually.)
    int copy_sigframe_from_user_to_xstate(struct task_struct *tsk,
                                          const void __user *ubuf)
    {
            return copy_uabi_to_xstate(x86_task_fpu(tsk)->fpstate, NULL, ubuf,
                                       &tsk->thread.pkru);
    }
This appears to be writing into x86_task_fpu(tsk)->fpstate. With or
without CONFIG_X86_DEBUG_FPU, this resolves to:

    ((struct fpu *)((void *)(task) + sizeof(*(task))))

i.e. the memory "after task_struct" is cast to "struct fpu", and then the
"fpstate" pointer is used. How that pointer gets set looks to be
variable, but I think the one we care about here is:

    fpu->fpstate = &fpu->__fpstate;
And struct fpu::__fpstate says:
    struct fpstate __fpstate;
    /*
     * WARNING: '__fpstate' is dynamically-sized. Do not put
     * anything after it here.
     */
So we're still dealing with a dynamically sized thing, even if it's not
within the literal struct task_struct -- it's still in the kmem cache,
though.
Looking at the kmem cache size, it has allocated "arch_task_struct_size"
bytes, which is calculated in fpu__init_task_struct_size():
    int task_size = sizeof(struct task_struct);

    task_size += sizeof(struct fpu);

    /*
     * Subtract off the static size of the register state.
     * It potentially has a bunch of padding.
     */
    task_size -= sizeof(union fpregs_state);

    /*
     * Add back the dynamically-calculated register state
     * size.
     */
    task_size += fpu_kernel_cfg.default_size;

    /*
     * We dynamically size 'struct fpu', so we require that
     * 'state' be at the end of it:
     */
    CHECK_MEMBER_AT_END_OF(struct fpu, __fpstate);

    arch_task_struct_size = task_size;
So, this is still copying out of the kmem cache for task_struct, and the
window seems unchanged (still fpu regs). This is what the window was
before:
    void fpu_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
    {
            *offset = offsetof(struct thread_struct, fpu.__fpstate.regs);
            *size = fpu_kernel_cfg.default_size;
    }
And the same commit I mentioned above removed it.
I think the misunderstanding is here:
| The fpu_thread_struct_whitelist() quirk to hardened usercopy can be removed,
| now that the FPU structure is not embedded in the task struct anymore, which
| reduces text footprint a bit.
Yes, FPU is no longer in task_struct, but it IS in the kmem cache named
"task_struct", since the fpstate is still being allocated there.
Partially revert the earlier mentioned commit, along with a
recalculation of the fpstate regs location.
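A sketch of the restored whitelist under the new layout, assuming the
fpstate still directly follows task_struct in the same slab object (the
offset computation is illustrative; the real fix recalculates it):

    void fpu_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
    {
            /* The window now starts past task_struct proper, at the regs. */
            *offset = sizeof(struct task_struct) +
                      offsetof(struct fpu, __fpstate.regs);
            *size   = fpu_kernel_cfg.default_size;
    }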
Fixes: cb7ca40a38 ("x86/fpu: Make task_struct::thread constant size")
Reported-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-hardening@vger.kernel.org
Link: https://lore.kernel.org/all/20250409211127.3544993-1-mingo@kernel.org/ # Discussion #1
Link: https://lore.kernel.org/r/202505041418.F47130C4C8@keescook # Discussion #2
Consolidate the whole logic which determines whether the microcode loader
should be enabled or not into a single function and call it everywhere.
Well, almost everywhere - not in mk_early_pgtbl_32() because there the kernel
is running without paging enabled and checking dis_ucode_ldr et al would
require physical addresses and uglification of the code.
But since this is 32-bit, the easier thing to do is to simply map the
initrd unconditionally, especially since that mapping is getting removed
later anyway by zap_early_initrd_mapping(), and avoid the uglification.
In doing so, address the issue of old 486-class machines without CPUID
support not booting current kernels.
[ mingo: Fix no previous prototype for ‘microcode_loader_disabled’ [-Wmissing-prototypes] ]
Fixes: 4c585af718 ("x86/boot/32: Temporarily map initrd for microcode loading")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/CANpbe9Wm3z8fy9HbgS8cuhoj0TREYEEkBipDuhgkWFvqX0UoVQ@mail.gmail.com
Add aliases for all the data objects that the startup code references -
this is needed so that this code can be moved into its own confined area
where it can only access symbols that have a __pi_ prefix.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-efi@vger.kernel.org
Link: https://lore.kernel.org/r/20250504095230.2932860-39-ardb+git@google.com
Move early_setup_gdt() out of the startup code that is callable from the
1:1 mapping - this is not needed, and instead, it is better to expose
the helper that does reside in __head directly.
This reduces the amount of code that needs special checks for 1:1
execution suitability. In particular, it avoids dealing with the GHCB
page (and its physical address) in startup code, which runs from the
1:1 mapping, making physical to virtual translations ambiguous.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-efi@vger.kernel.org
Link: https://lore.kernel.org/r/20250504095230.2932860-26-ardb+git@google.com
PF_KTHREAD|PF_USER_WORKER tasks should never clear TIF_NEED_FPU_LOAD,
so the TIF_NEED_FPU_LOAD check should equally filter them out.
And this way an exiting userspace task can avoid the unnecessary "fwait"
if it does context_switch() at least once on its way to exit_thread().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Chang S . Bae <chang.seok.bae@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250503143856.GA9009@redhat.com
It makes no sense to copy the bytes after sizeof(struct task_struct);
the FPU state will be initialized in fpu_clone().
A plain memcpy(dst, src, sizeof(struct task_struct)) should work too,
but "_and_pad" looks safer.
[ mingo: Simplify it a bit more. ]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang S . Bae <chang.seok.bae@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250503143850.GA8997@redhat.com
It is not actually used after:
55bc30f2e3 ("x86/fpu: Remove the thread::fpu pointer")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Chang S . Bae <chang.seok.bae@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250503143837.GA8985@redhat.com
Now that switch_fpu_finish() doesn't load the FPU state, it makes more
sense to fold it into switch_fpu_prepare() renamed to switch_fpu(), and
more importantly, use the "prev_p" task as a target for TIF_NEED_FPU_LOAD.
It doesn't make any sense to delay set_tsk_thread_flag(TIF_NEED_FPU_LOAD)
until "prev_p" is scheduled again.
There is no worry about the very first context switch, fpu_clone() must
always set TIF_NEED_FPU_LOAD.
Also, shift the test_tsk_thread_flag(TIF_NEED_FPU_LOAD) from the callers
to switch_fpu().
Note that the "PF_KTHREAD | PF_USER_WORKER" check can be removed but
this deserves a separate patch which can change more functions, say,
kernel_fpu_begin_mask().
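A hedged sketch of the folded helper (simplified, not the verbatim kernel
code):

    static inline void switch_fpu(struct task_struct *prev_p, int cpu)
    {
            if (!(prev_p->flags & (PF_KTHREAD | PF_USER_WORKER)) &&
                !test_tsk_thread_flag(prev_p, TIF_NEED_FPU_LOAD)) {
                    /* Save the outgoing task's FPU state and mark it for
                     * reload at switch-out time, not when prev_p runs again. */
                    save_fpregs_to_fpstate(x86_task_fpu(prev_p));
                    set_tsk_thread_flag(prev_p, TIF_NEED_FPU_LOAD);
            }
    }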
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Chang S . Bae <chang.seok.bae@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250503143830.GA8982@redhat.com
The third argument in wrmsr(msr, low, 0) is unnecessary. Instead, use
wrmsrq(msr, low), which automatically sets the higher 32 bits of the
MSR value to 0.
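For example (the MSR name is illustrative):

    -       wrmsr(MSR_TEST_CTRL, msr_val, 0);
    +       wrmsrq(MSR_TEST_CTRL, msr_val);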
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20250427092027.1598740-15-xin@zytor.com
An MSR value is represented as a 64-bit unsigned integer, with existing
MSR instructions storing it in EDX:EAX as two 32-bit segments.
The new immediate form MSR instructions, however, utilize a 64-bit
general-purpose register to store the MSR value. To unify the usage of
all MSR instructions, let the default MSR access APIs accept an MSR
value as a single 64-bit argument instead of two 32-bit segments.
The dual 32-bit APIs are still available as convenient wrappers over the
APIs that handle an MSR value as a single 64-bit argument.
The following illustrates the updated derivation of the MSR write APIs:
                 __wrmsrq(u32 msr, u64 val)
                   /                   \
                  /                     \
    native_wrmsrq(msr, val)    native_wrmsr(msr, low, high)
               |
               |
    native_write_msr(msr, val)
          /               \
         /                 \
    wrmsrq(msr, val)    wrmsr(msr, low, high)
When CONFIG_PARAVIRT is enabled, wrmsrq() and wrmsr() are defined on top
of paravirt_write_msr():
        paravirt_write_msr(u32 msr, u64 val)
              /                 \
             /                   \
    wrmsrq(msr, val)    wrmsr(msr, low, high)
paravirt_write_msr() invokes cpu.write_msr(msr, val), an indirect layer
of pv_ops MSR write call:
If on native:
cpu.write_msr = native_write_msr
If on Xen:
cpu.write_msr = xen_write_msr
Therefore, refactor pv_cpu_ops.write_msr{_safe}() to accept an MSR value
in a single u64 argument, replacing the current dual u32 arguments.
No functional change intended.
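The dual-u32 wrapper then reduces to combining the halves; a sketch of the
shape on native (not verbatim):

    static __always_inline void wrmsr(u32 msr, u32 low, u32 high)
    {
            native_write_msr(msr, (u64)high << 32 | low);
    }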
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20250427092027.1598740-14-xin@zytor.com
__rdmsr() is the lowest level MSR read API, with native_rdmsr()
and native_rdmsrq() serving as higher-level wrappers around it.
    #define native_rdmsr(msr, val1, val2)                   \
    do {                                                    \
            u64 __val = __rdmsr((msr));                     \
            (void)((val1) = (u32)__val);                    \
            (void)((val2) = (u32)(__val >> 32));            \
    } while (0)

    static __always_inline u64 native_rdmsrq(u32 msr)
    {
            return __rdmsr(msr);
    }
However, __rdmsr() continues to be utilized in various locations.
MSR APIs are designed for different scenarios, such as native or
pvops, with or without trace, and safe or non-safe. Unfortunately,
the current MSR API names do not adequately reflect these factors,
making it challenging to select the most appropriate API for
various situations.
To pave the way for improving MSR API names, convert __rdmsr()
uses to native_rdmsrq() to ensure consistent usage. Later, these
APIs can be renamed to better reflect their implications, such as
native or pvops, with or without trace, and safe or non-safe.
No functional change intended.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20250427092027.1598740-10-xin@zytor.com
__wrmsr() is the lowest level MSR write API, with native_wrmsr()
and native_wrmsrq() serving as higher-level wrappers around it:
    #define native_wrmsr(msr, low, high)                    \
            __wrmsr(msr, low, high)

    #define native_wrmsrl(msr, val)                         \
            __wrmsr((msr), (u32)((u64)(val)),               \
                    (u32)((u64)(val) >> 32))
However, __wrmsr() continues to be utilized in various locations.
MSR APIs are designed for different scenarios, such as native or
pvops, with or without trace, and safe or non-safe. Unfortunately,
the current MSR API names do not adequately reflect these factors,
making it challenging to select the most appropriate API for
various situations.
To pave the way for improving MSR API names, convert __wrmsr()
uses to native_wrmsr{,q}() to ensure consistent usage. Later,
these APIs can be renamed to better reflect their implications,
such as native or pvops, with or without trace, and safe or
non-safe.
No functional change intended.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20250427092027.1598740-8-xin@zytor.com
Functions offer type safety and better readability compared to macros.
Additionally, always inline functions can match the performance of
macros. Converting the rdpmc() macro into an always inline function
is simple and straightforward, so just make the change.
Moreover, the read result is now the returned value, further enhancing
readability.
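A sketch of the converted helper (simplified; the real function goes
through the kernel's EAX/EDX helpers):

    static __always_inline u64 rdpmc(int counter)
    {
            u32 low, high;

            asm volatile("rdpmc" : "=a" (low), "=d" (high) : "c" (counter));
            return low | (u64)high << 32;
    }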
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20250427092027.1598740-6-xin@zytor.com
For historic reasons there are some TSC-related functions in the
<asm/msr.h> header, even though there's an <asm/tsc.h> header.
To facilitate the relocation of rdtsc{,_ordered}() from <asm/msr.h>
to <asm/tsc.h> and to eventually eliminate the inclusion of
<asm/msr.h> in <asm/tsc.h>, add an explicit <asm/msr.h> dependency
to the source files that reference definitions from <asm/msr.h>.
[ mingo: Clarified the changelog. ]
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20250501054241.1245648-1-xin@zytor.com
DECLARE_ARGS() is way too generic a name and says very little about
why these args are declared in that fashion - use the EAX_EDX_ prefix
to create a common prefix between the three helper methods:
EAX_EDX_DECLARE_ARGS()
EAX_EDX_VAL()
EAX_EDX_RET()
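On 64-bit kernels the three helpers look roughly like this (a sketch; the
32-bit variants differ):

    #define EAX_EDX_DECLARE_ARGS(val, low, high)    unsigned long low, high
    #define EAX_EDX_VAL(val, low, high)             ((low) | (high) << 32)
    #define EAX_EDX_RET(val, low, high)             "=a" (low), "=d" (high)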
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: linux-kernel@vger.kernel.org
strcpy() is deprecated due to issues with bounds checking and overflows.
Replace it with strscpy().
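For example (buffer names illustrative), when the destination is an array
whose size the compiler can see:

    -       strcpy(model_id, src);
    +       strscpy(model_id, src, sizeof(model_id));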
Signed-off-by: Ruben Wauters <rubenru09@aol.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250429230710.54014-1-rubenru09@aol.com
After
6f059e634dcd ("x86/microcode: Clarify the late load logic"),
if the load is up-to-date, the AMD side returns UCODE_OK which leads to
load_late_locked() returning -EBADFD.
Handle UCODE_OK in the switch case to avoid this error.
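A sketch of the shape of the fix (not the verbatim hunk):

    switch (ret) {
    case UCODE_UPDATED:
    case UCODE_OK:          /* already up-to-date: not an error */
            return 0;
    default:
            return -EBADFD;
    }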
[ bp: Massage commit message. ]
Fixes: 6f059e634d ("x86/microcode: Clarify the late load logic")
Signed-off-by: Annie Li <jiayanli@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250430053424.77438-1-jiayanli@google.com
Restructure SRSO to use select/update/apply functions to create
consistent vulnerability handling. Like with retbleed, the command line
options directly select mitigations which can later be modified.
While at it, remove a comment which doesn't apply anymore due to the
changed mitigation detection flow.
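The pattern, sketched abstractly for one vulnerability (bodies elided; the
names follow the described convention, not a literal hunk):

    static void __init srso_select_mitigation(void)
    {
            /* Pick a mitigation from the cmdline / defaults. */
    }

    static void __init srso_update_mitigation(void)
    {
            /* Revise the choice against related mitigations, e.g. retbleed. */
    }

    static void __init srso_apply_mitigation(void)
    {
            /* Enable whatever was finally selected. */
    }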
Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/20250418161721.1855190-17-david.kaplan@amd.com
Restructure L1TF to use select/apply functions to create consistent
vulnerability handling.
Define new AUTO mitigation for L1TF.
Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/20250418161721.1855190-16-david.kaplan@amd.com
Restructure SSB to use select/apply functions to create consistent
vulnerability handling.
Remove __ssb_select_mitigation() and split the functionality between the
select/apply functions.
Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/20250418161721.1855190-15-david.kaplan@amd.com
Restructure spectre_v2 to use select/update/apply functions to create
consistent vulnerability handling.
The spectre_v2 mitigation may be updated based on the selected retbleed
mitigation.
Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/20250418161721.1855190-14-david.kaplan@amd.com
Restructure BHI mitigation to use select/update/apply functions to create
consistent vulnerability handling. BHI mitigation was previously selected
from within spectre_v2_select_mitigation() and now is selected from
cpu_select_mitigation() like with all others.
Define new AUTO mitigation for BHI.
Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/20250418161721.1855190-13-david.kaplan@amd.com
Restructure spectre_v2_user to use select/update/apply functions to
create consistent vulnerability handling.
The IBPB/STIBP choices are first decided based on the spectre_v2_user
command line but can be modified by the spectre_v2 command line option
as well.
Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/20250418161721.1855190-12-david.kaplan@amd.com
Restructure retbleed mitigation to use select/update/apply functions to create
consistent vulnerability handling. The retbleed_update_mitigation()
simplifies the dependency between spectre_v2 and retbleed.
The command line options now directly select a preferred mitigation
which simplifies the logic.
Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/20250418161721.1855190-11-david.kaplan@amd.com
This user of SHA-256 does not support any other algorithm, so the
crypto_shash abstraction provides no value. Just use the SHA-256
library API instead, which is much simpler and easier to use.
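With the library API the whole computation collapses to a single call (a
sketch; buffer names illustrative):

    #include <crypto/sha2.h>

    u8 digest[SHA256_DIGEST_SIZE];

    sha256(data, data_len, digest);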
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20250428183838.799333-1-ebiggers%40kernel.org