Next commits will introduce vendor-specific CPUID ranges like Transmeta's
0x8086000 range and Centaur's 0xc0000000.
Initially explicit vendor detection was implemented, but it turned out to
be not strictly necessary. As Dave Hansen noted, even established tools
like cpuid(1) just tries all ranges indices, and see if the CPU responds
back with something sensible.
Do something similar at setup_cpuid_range(). Query the range's index,
and check the maximum range function value returned. If it's within an
expected interval of [range_index, range_index + MAX_RANGE_INDEX_OFFSET],
accept the range as valid and further query its leaves.
Set MAX_RANGE_INDEX_OFFSET to a heuristic of 0xff. That should be
sensible enough since all the ranges covered by x86-cpuid-db XML database
are:
0x00000000 0x00000023
0x40000000 0x40000000
0x80000000 0x80000026
0x80860000 0x80860007
0xc0000000 0xc0000001
At setup_cpuid_range(), if the range's returned maximum function was not
sane, mark it as invalid by setting its number of leaves, range->nr, to
zero.
Introduce the for_each_valid_cpuid_range() iterator instead of sprinkling
"range->nr != 0" checks throughout the code.
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20250324142042.29010-15-darwi@linutronix.de
Let index_to_cpuid_range() return a CPUID range only if the passed index
is within a CPUID range's maximum supported function on the CPU.
Returning a CPUID range that is invalid on the CPU for the passed index
does not make sense.
This also avoids repeating the "function index is within CPUID range"
checks, both at setup_cpuid_range() and index_to_func().
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20250324142042.29010-14-darwi@linutronix.de
Extend the CPUID index mask macro from 0x80000000 to 0xffff0000. This
accommodates the Transmeta (0x80860000) and Centaur (0xc0000000) index
ranges which will be later added.
This also automatically sets CPUID_FUNCTION_MASK to 0x0000ffff, which is
the actual correct value. Use that macro, instead of the 0xffff literal
where appropriate.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20250324142042.29010-13-darwi@linutronix.de
The kcpuid code assumes only two CPUID index ranges, standard (0x0...)
and extended (0x80000000...).
Since additional CPUID index ranges will be added in further commits,
replace the "is_ext" boolean with enumeration-based range classification.
Collect all CPUID ranges in a structured array and introduce helper
macros to iterate over it. Use such helpers throughout the code.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20250324142042.29010-12-darwi@linutronix.de
Use the __cpuid_count() intrinsic, provided by GCC and LLVM, instead of
rolling a manual version. Both of the kernel's minimum required GCC
version (5.1) and LLVM version (13.0.1) supports it, and it is heavily
used across standard Linux user-space tooling.
This also makes the CPUID call sites more readable.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20250324142042.29010-11-darwi@linutronix.de
Since commit e8c07082a8 ("Kbuild: move to -std=gnu11") and the kernel
allows C99-style variable declarations inside of a for() loop.
Adjust the kcpuid code accordingly.
Note, this helps readability as some of the kcpuid functions have a huge
list of variable declarations on top.
Note, remove the empty lines before cpuid() invocations as it is clearer
to have their parameter initialization and the actual call in one block.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20250324142042.29010-10-darwi@linutronix.de
parse_line() returns an integer but its caller ignored it. Change the
function signature to return void.
While at it, adjust some of the "Skip line" comments for readability.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20250324142042.29010-9-darwi@linutronix.de
The global variable "is_amd" is written to, but is not read from
anywhere. Remove it.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20250324142042.29010-8-darwi@linutronix.de
The local variable "index" is written to, but is not read from
anywhere. Remove it.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20250324142042.29010-7-darwi@linutronix.de
kcpuid --all --detail claims that all bits belong to ECX, in the form
of the header CPUID_${leaf}_ECX[${subleaf}].
Print the correct register name for all CPUID output.
kcpuid --detail also dumps the raw register value if a leaf/subleaf is
covered in the CSV file, but a certain output register within it is not
covered by any CSV entry. Since register names are now properly printed,
and since the CSV file has become exhaustive using x86-cpuid-db, remove
that value dump as it pollutes the output.
While at it, rename decode_bits() to show_reg(). This makes it match its
show_range(), show_leaf() and show_reg_header() counterparts.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20250324142042.29010-6-darwi@linutronix.de
For each CPUID leaf/subleaf query, save the output in an output[] array
instead of spelling it out using EAX to EDX variables.
This allows the CPUID output to be accessed programmatically instead of
calling decode_bits() four times. Loop-based access also allows "kcpuid
--detail" to print the correct output register names in next commit.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20250324142042.29010-5-darwi@linutronix.de
Refactor usage() to accept an exit code parameter and exit the program
after usage output. This streamlines its callers' code paths.
Remove the "Invalid option" error message since getopt_long(3) already
emits a similar message by default.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20250324142042.29010-4-darwi@linutronix.de
If the user passed an invalid CPUID index value through --leaf=index,
kcpuid prints a warning, does nothing, then exits successfully.
Transform the warning to an error, and exit the program with a proper
error code.
Similarly, if the user passed an invalid subleaf, kcpuid prints a
warning, dumps the whole leaf, then exits successfully. Print a clear
error message regarding the invalid subleaf and exit the program with the
proper error code.
Note, moving the "Invalid input index" message from index_to_func() to
show_info() localizes error message handling to the latter, where it
should be. It also allows index_to_func() to be refactored at further
commits.
Note, since after this commit and its parent kcpuid does not just "move
on" on failures, remove the NULL parameter check plus silent exit at
show_func() and show_leaf().
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20250324142042.29010-3-darwi@linutronix.de
Error handling in kcpuid is unreliable. On malloc() failures, the code
prints an error then just goes on. The error messages are also printed
to standard output instead of standard error.
Use err() and errx() from <err.h> to direct all error messages to
standard error and automatically exit the program. Use err() to include
the errno information, and errx() otherwise. Use warnx() for warnings.
While at it, alphabetically reorder the header includes.
[ mingo: Fix capitalization in the help text while at it. ]
Fixes: c6b2f240bf ("tools/x86: Add a kcpuid tool to show raw CPU features")
Reported-by: Remington Brasga <rbrasga@uci.edu>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20250324142042.29010-2-darwi@linutronix.de
Closes: https://lkml.kernel.org/r/20240926223557.2048-1-rbrasga@uci.edu
two locking commits in the locking tree,
part of the locking-core-2025-03-22 pull request. ]
x86 CPU features support:
- Generate the <asm/cpufeaturemasks.h> header based on build config
(H. Peter Anvin, Xin Li)
- x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
- Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
- Enable modifying CPU bug flags with '{clear,set}puid='
(Brendan Jackman)
- Utilize CPU-type for CPU matching (Pawan Gupta)
- Warn about unmet CPU feature dependencies (Sohil Mehta)
- Prepare for new Intel Family numbers (Sohil Mehta)
Percpu code:
- Standardize & reorganize the x86 percpu layout and
related cleanups (Brian Gerst)
- Convert the stackprotector canary to a regular percpu
variable (Brian Gerst)
- Add a percpu subsection for cache hot data (Brian Gerst)
- Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
- Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)
MM:
- Add support for broadcast TLB invalidation using AMD's INVLPGB instruction
(Rik van Riel)
- Rework ROX cache to avoid writable copy (Mike Rapoport)
- PAT: restore large ROX pages after fragmentation
(Kirill A. Shutemov, Mike Rapoport)
- Make memremap(MEMREMAP_WB) map memory as encrypted by default
(Kirill A. Shutemov)
- Robustify page table initialization (Kirill A. Shutemov)
- Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
- Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
(Matthew Wilcox)
KASLR:
- x86/kaslr: Reduce KASLR entropy on most x86 systems,
to support PCI BAR space beyond the 10TiB region
(CONFIG_PCI_P2PDMA=y) (Balbir Singh)
CPU bugs:
- Implement FineIBT-BHI mitigation (Peter Zijlstra)
- speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
- speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan Gupta)
- RFDS: Exclude P-only parts from the RFDS affected list (Pawan Gupta)
System calls:
- Break up entry/common.c (Brian Gerst)
- Move sysctls into arch/x86 (Joel Granados)
Intel LAM support updates: (Maciej Wieczor-Retman)
- selftests/lam: Move cpu_has_la57() to use cpuinfo flag
- selftests/lam: Skip test if LAM is disabled
- selftests/lam: Test get_user() LAM pointer handling
AMD SMN access updates:
- Add SMN offsets to exclusive region access (Mario Limonciello)
- Add support for debugfs access to SMN registers (Mario Limonciello)
- Have HSMP use SMN through AMD_NODE (Yazen Ghannam)
Power management updates: (Patryk Wlazlyn)
- Allow calling mwait_play_dead with an arbitrary hint
- ACPI/processor_idle: Add FFH state handling
- intel_idle: Provide the default enter_dead() handler
- Eliminate mwait_play_dead_cpuid_hint()
Bootup:
Build system:
- Raise the minimum GCC version to 8.1 (Brian Gerst)
- Raise the minimum LLVM version to 15.0.0
(Nathan Chancellor)
Kconfig: (Arnd Bergmann)
- Add cmpxchg8b support back to Geode CPUs
- Drop 32-bit "bigsmp" machine support
- Rework CONFIG_GENERIC_CPU compiler flags
- Drop configuration options for early 64-bit CPUs
- Remove CONFIG_HIGHMEM64G support
- Drop CONFIG_SWIOTLB for PAE
- Drop support for CONFIG_HIGHPTE
- Document CONFIG_X86_INTEL_MID as 64-bit-only
- Remove old STA2x11 support
- Only allow CONFIG_EISA for 32-bit
Headers:
- Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI headers
(Thomas Huth)
Assembly code & machine code patching:
- x86/alternatives: Simplify alternative_call() interface (Josh Poimboeuf)
- x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
- KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
- x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
- x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
- x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
(Uros Bizjak)
- Use named operands in inline asm (Uros Bizjak)
- Improve performance by using asm_inline() for atomic locking instructions
(Uros Bizjak)
Earlyprintk:
- Harden early_serial (Peter Zijlstra)
NMI handler:
- Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()
(Waiman Long)
Miscellaneous fixes and cleanups:
- by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel,
Artem Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst,
Dan Carpenter, Dr. David Alan Gilbert, H. Peter Anvin,
Ingo Molnar, Josh Poimboeuf, Kevin Brodsky, Mike Rapoport,
Lukas Bulwahn, Maciej Wieczor-Retman, Max Grobecker,
Patryk Wlazlyn, Pawan Gupta, Peter Zijlstra,
Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner,
Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak,
Vitaly Kuznetsov, Xin Li, liuye.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfenkQRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1g1FRAAi6OFTSn/5aeLMI0IMNBxJ6ddQiFc3imd
7+C/vU5nul4CyDs8mKyj/+f/DDrbkG9lKz3VG631Yl237lXHjD8XWcVMeC/1z/q0
3zInDIloE9/nBHRPkF6F7fARBLBZ0LFgaBsGrCo7mwpGybiQdqGcqcxllvTbtXaw
OHta4q6ok+lBDNlfc0v6H4cRnzhmmlKu6Ng0j6UI3V7uFhi3vtxas32ltDQtzorq
2+jbV6/+kbrrv+xPC+jlzOFhTEKRupNPQXmvyQteoQg6G3kqAKMDvBthGXd1rHuX
Qa+BoDIifE/2NiVeRwNrhoqYH/pHCzUzDREW5IW8+ca+4XNKuzAC6EuC8CeCzyK1
q8ZjZjooQW4zEeVFeJYllHONzJYfxfSH5CLsnbcuhq99yfGlrQhF1qL72/Omn1w/
DfPJM8Zt5zyKvLqUg3Md+fkVCO2wyDNhB61QPzRgHF+yD+rvuDpoqvUWir+w7cSn
fwEDVZGXlFx6dumtSrqRaTd1nvFt80s8yP2ll09DMvGQ8D/yruS7hndGAmmJVCSW
NAfd8pSjq5v2+ux2UR92/Cc3VF3SjaUqHBOp/Nq9rESya18ZVa3cJpHhVYYtPIVf
THW0h07RIkGVKs1uq+5ekLCr/8uAZg58UPIqmhTuW0ttymRHCNfohR45FQZzy+0M
tJj1oc2TIZw=
=Dcb3
-----END PGP SIGNATURE-----
Merge tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core x86 updates from Ingo Molnar:
"x86 CPU features support:
- Generate the <asm/cpufeaturemasks.h> header based on build config
(H. Peter Anvin, Xin Li)
- x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
- Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
- Enable modifying CPU bug flags with '{clear,set}puid=' (Brendan
Jackman)
- Utilize CPU-type for CPU matching (Pawan Gupta)
- Warn about unmet CPU feature dependencies (Sohil Mehta)
- Prepare for new Intel Family numbers (Sohil Mehta)
Percpu code:
- Standardize & reorganize the x86 percpu layout and related cleanups
(Brian Gerst)
- Convert the stackprotector canary to a regular percpu variable
(Brian Gerst)
- Add a percpu subsection for cache hot data (Brian Gerst)
- Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
- Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)
MM:
- Add support for broadcast TLB invalidation using AMD's INVLPGB
instruction (Rik van Riel)
- Rework ROX cache to avoid writable copy (Mike Rapoport)
- PAT: restore large ROX pages after fragmentation (Kirill A.
Shutemov, Mike Rapoport)
- Make memremap(MEMREMAP_WB) map memory as encrypted by default
(Kirill A. Shutemov)
- Robustify page table initialization (Kirill A. Shutemov)
- Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
- Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
(Matthew Wilcox)
KASLR:
- x86/kaslr: Reduce KASLR entropy on most x86 systems, to support PCI
BAR space beyond the 10TiB region (CONFIG_PCI_P2PDMA=y) (Balbir
Singh)
CPU bugs:
- Implement FineIBT-BHI mitigation (Peter Zijlstra)
- speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
- speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan
Gupta)
- RFDS: Exclude P-only parts from the RFDS affected list (Pawan
Gupta)
System calls:
- Break up entry/common.c (Brian Gerst)
- Move sysctls into arch/x86 (Joel Granados)
Intel LAM support updates: (Maciej Wieczor-Retman)
- selftests/lam: Move cpu_has_la57() to use cpuinfo flag
- selftests/lam: Skip test if LAM is disabled
- selftests/lam: Test get_user() LAM pointer handling
AMD SMN access updates:
- Add SMN offsets to exclusive region access (Mario Limonciello)
- Add support for debugfs access to SMN registers (Mario Limonciello)
- Have HSMP use SMN through AMD_NODE (Yazen Ghannam)
Power management updates: (Patryk Wlazlyn)
- Allow calling mwait_play_dead with an arbitrary hint
- ACPI/processor_idle: Add FFH state handling
- intel_idle: Provide the default enter_dead() handler
- Eliminate mwait_play_dead_cpuid_hint()
Build system:
- Raise the minimum GCC version to 8.1 (Brian Gerst)
- Raise the minimum LLVM version to 15.0.0 (Nathan Chancellor)
Kconfig: (Arnd Bergmann)
- Add cmpxchg8b support back to Geode CPUs
- Drop 32-bit "bigsmp" machine support
- Rework CONFIG_GENERIC_CPU compiler flags
- Drop configuration options for early 64-bit CPUs
- Remove CONFIG_HIGHMEM64G support
- Drop CONFIG_SWIOTLB for PAE
- Drop support for CONFIG_HIGHPTE
- Document CONFIG_X86_INTEL_MID as 64-bit-only
- Remove old STA2x11 support
- Only allow CONFIG_EISA for 32-bit
Headers:
- Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI
headers (Thomas Huth)
Assembly code & machine code patching:
- x86/alternatives: Simplify alternative_call() interface (Josh
Poimboeuf)
- x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
- KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
- x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
- x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
- x86/kexec: Merge x86_32 and x86_64 code using macros from
<asm/asm.h> (Uros Bizjak)
- Use named operands in inline asm (Uros Bizjak)
- Improve performance by using asm_inline() for atomic locking
instructions (Uros Bizjak)
Earlyprintk:
- Harden early_serial (Peter Zijlstra)
NMI handler:
- Add an emergency handler in nmi_desc & use it in
nmi_shootdown_cpus() (Waiman Long)
Miscellaneous fixes and cleanups:
- by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel, Artem
Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst, Dan
Carpenter, Dr. David Alan Gilbert, H. Peter Anvin, Ingo Molnar,
Josh Poimboeuf, Kevin Brodsky, Mike Rapoport, Lukas Bulwahn, Maciej
Wieczor-Retman, Max Grobecker, Patryk Wlazlyn, Pawan Gupta, Peter
Zijlstra, Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner,
Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak, Vitaly
Kuznetsov, Xin Li, liuye"
* tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (211 commits)
zstd: Increase DYNAMIC_BMI2 GCC version cutoff from 4.8 to 11.0 to work around compiler segfault
x86/asm: Make asm export of __ref_stack_chk_guard unconditional
x86/mm: Only do broadcast flush from reclaim if pages were unmapped
perf/x86/intel, x86/cpu: Replace Pentium 4 model checks with VFM ones
perf/x86/intel, x86/cpu: Simplify Intel PMU initialization
x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-UAPI headers
x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI headers
x86/locking/atomic: Improve performance by using asm_inline() for atomic locking instructions
x86/asm: Use asm_inline() instead of asm() in clwb()
x86/asm: Use CLFLUSHOPT and CLWB mnemonics in <asm/special_insns.h>
x86/hweight: Use asm_inline() instead of asm()
x86/hweight: Use ASM_CALL_CONSTRAINT in inline asm()
x86/hweight: Use named operands in inline asm()
x86/stackprotector/64: Only export __ref_stack_chk_guard on CONFIG_SMP
x86/head/64: Avoid Clang < 17 stack protector in startup code
x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
x86/runtime-const: Add the RUNTIME_CONST_PTR assembly macro
x86/cpu/intel: Limit the non-architectural constant_tsc model checks
x86/mm/pat: Replace Intel x86_model checks with VFM ones
x86/cpu/intel: Fix fast string initialization for extended Families
...
* kvm-arm64/pv-cpuid:
: Paravirtualized implementation ID, courtesy of Shameer Kolothum
:
: Big-little has historically been a pain in the ass to virtualize. The
: implementation ID (MIDR, REVIDR, AIDR) of a vCPU can change at the whim
: of vCPU scheduling. This can be particularly annoying when the guest
: needs to know the underlying implementation to mitigate errata.
:
: "Hyperscalers" face a similar scheduling problem, where VMs may freely
: migrate between hosts in a pool of heterogenous hardware. And yes, our
: server-class friends are equally riddled with errata too.
:
: In absence of an architected solution to this wart on the ecosystem,
: introduce support for paravirtualizing the implementation exposed
: to a VM, allowing the VMM to describe the pool of implementations that a
: VM may be exposed to due to scheduling/migration.
:
: Userspace is expected to intercept and handle these hypercalls using the
: SMCCC filter UAPI, should it choose to do so.
smccc: kvm_guest: Fix kernel builds for 32 bit arm
KVM: selftests: Add test for KVM_REG_ARM_VENDOR_HYP_BMAP_2
smccc/kvm_guest: Enable errata based on implementation CPUs
arm64: Make _midr_in_range_list() an exported function
KVM: arm64: Introduce KVM_REG_ARM_VENDOR_HYP_BMAP_2
KVM: arm64: Specify hypercall ABI for retrieving target implementations
arm64: Modify _midr_range() functions to read MIDR/REVIDR internally
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* kvm-arm64/nv-vgic:
: NV VGICv3 support, courtesy of Marc Zyngier
:
: Support for emulating the GIC hypervisor controls and managing shadow
: VGICv3 state for the L1 hypervisor. As part of it, bring in support for
: taking IRQs to the L1 and UAPI to manage the VGIC maintenance interrupt.
KVM: arm64: nv: Fail KVM init if asking for NV without GICv3
KVM: arm64: nv: Allow userland to set VGIC maintenance IRQ
KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup
KVM: arm64: nv: Propagate used_lrs between L1 and L0 contexts
KVM: arm64: nv: Request vPE doorbell upon nested ERET to L2
KVM: arm64: nv: Respect virtual HCR_EL2.TWx setting
KVM: arm64: nv: Add Maintenance Interrupt emulation
KVM: arm64: nv: Handle L2->L1 transition on interrupt injection
KVM: arm64: nv: Nested GICv3 emulation
KVM: arm64: nv: Sanitise ICH_HCR_EL2 accesses
KVM: arm64: nv: Plumb handling of GICv3 EL2 accesses
KVM: arm64: nv: Add ICH_*_EL2 registers to vpcu_sysreg
KVM: arm64: nv: Load timer before the GIC
arm64: sysreg: Add layout for ICH_MISR_EL2
arm64: sysreg: Add layout for ICH_VTR_EL2
arm64: sysreg: Add layout for ICH_HCR_EL2
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
While the GCC and Clang compilers already define __ASSEMBLER__
automatically when compiling assembly code, __ASSEMBLY__ is a
macro that only gets defined by the Makefiles in the kernel.
This can be very confusing when switching between userspace
and kernelspace coding, or when dealing with UAPI headers that
rather should use __ASSEMBLER__ instead. So let's standardize on
the __ASSEMBLER__ macro that is provided by the compilers now.
This is mostly a mechanical patch (done with a simple "sed -i"
statement), with some manual tweaks in <asm/frame.h>, <asm/hw_irq.h>
and <asm/setup.h> that mentioned this macro in comments with some
missing underscores.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250314071013.1575167-38-thuth@redhat.com
The functionalities of {disabled,required}-features.h have been replaced with
the auto-generated generated/<asm/cpufeaturemasks.h> header.
Thus they are no longer needed and can be removed.
None of the macros defined in {disabled,required}-features.h is used in tools,
delete them too.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250305184725.3341760-4-xin@zytor.com
With AMD TCE (translation cache extensions) only the intermediate mappings
that cover the address range zapped by INVLPG / INVLPGB get invalidated,
rather than all intermediate mappings getting zapped at every TLB invalidation.
This can help reduce the TLB miss rate, by keeping more intermediate mappings
in the cache.
From the AMD manual:
Translation Cache Extension (TCE) Bit. Bit 15, read/write. Setting this bit to
1 changes how the INVLPG, INVLPGB, and INVPCID instructions operate on TLB
entries. When this bit is 0, these instructions remove the target PTE from the
TLB as well as all upper-level table entries that are cached in the TLB,
whether or not they are associated with the target PTE. When this bit is set,
these instructions will remove the target PTE and only those upper-level
entries that lead to the target PTE in the page table hierarchy, leaving
unrelated upper-level entries intact.
[ bp: use cpu_has()... I know, it is a mess. ]
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250226030129.530345-13-riel@surriel.com
The VGIC maintenance IRQ signals various conditions about the LRs, when
the GIC's virtualization extension is used.
So far we didn't need it, but nested virtualization needs to know about
this interrupt, so add a userland interface to setup the IRQ number.
The architecture mandates that it must be a PPI, on top of that this code
only exports a per-device option, so the PPI is the same on all VCPUs.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
[added some bits of documentation]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-16-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
The ICH_MISR_EL2-related macros are missing a number of status
bits that we are about to handle. Take this opportunity to fully
describe the layout of that register as part of the automatic
generation infrastructure.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
The ICH_VTR_EL2-related macros are missing a number of config
bits that we are about to handle. Take this opportunity to fully
describe the layout of that register as part of the automatic
generation infrastructure.
This results in a bit of churn to repaint constants that are now
generated with a different format.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
The ICH_HCR_EL2-related macros are missing a number of control
bits that we are about to handle. Take this opportunity to fully
describe the layout of that register as part of the automatic
generation infrastructure.
This results in a bit of churn, unfortunately.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
tools/arch/x86/include/linux doesn't exist but building is working by
virtue of a -I. Building using bazel this fails. Use angle brackets to
include unaligned.h so there isn't an invalid relative include.
Fixes: 5f60d5f6bb ("move asm/unaligned.h to linux/unaligned.h")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20250225193600.90037-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Replace X86_CMPXCHG64 with X86_CX8, as CX8 is the name of the CPUID
flag, thus to make it consistent with X86_FEATURE_CX8 defined in
<asm/cpufeatures.h>.
No functional change intended.
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250228082338.73859-2-xin@zytor.com
X86_FEATURE_USE_IBPB was introduced in:
2961298efe ("x86/cpufeatures: Clean up Spectre v2 related CPUID flags")
to have separate flags for when the CPU supports IBPB (i.e. X86_FEATURE_IBPB)
and when an IBPB is actually used to mitigate Spectre v2.
Ever since then, the uses of IBPB expanded. The name became confusing
because it does not control all IBPB executions in the kernel.
Furthermore, because its name is generic and it's buried within
indirect_branch_prediction_barrier(), it's easy to use it not knowing
that it is specific to Spectre v2.
X86_FEATURE_USE_IBPB is no longer needed because all the IBPB executions
it used to control are now controlled through other means (e.g.
switch_mm_*_ibpb static branches).
Remove the unused feature bit.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/20250227012712.3193063-7-yosry.ahmed@linux.dev
One difference here with other pseudo-firmware bitmap registers
is that the default/reset value for the supported hypercall
function-ids is 0 at present. Hence, modify the test accordingly.
Reviewed-by: Sebastian Ott <sebott@redhat.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Link: https://lore.kernel.org/r/20250221140229.12588-7-shameerali.kolothum.thodi@huawei.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
- Fix tools/ quiet build Makefile infrastructure that was broken when
working on tools/perf/ without testing on other tools/ living
utilities.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZ74OJwAKCRCyPKLppCJ+
J30mAPsHCA8A+CNq/5yW2VhFLV1GgCSL5oWqxXRn7QjhSrCQBQEAot2u4O5zXs7M
sg+mPlYiS1oT+zmvTLlXrN+bVyWP9A4=
=jH1N
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-fixes-for-v6.14-2-2025-02-25' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix tools/ quiet build Makefile infrastructure that was broken when
working on tools/perf/ without testing on other tools/ living
utilities.
* tag 'perf-tools-fixes-for-v6.14-2-2025-02-25' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
tools: Remove redundant quiet setup
tools: Unify top-level quiet infrastructure
Q is exported from Makefile.include so it is not necessary to manually
set it.
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <qmo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Benjamin Tissoires <bentiss@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Lukasz Luba <lukasz.luba@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Mykola Lysenko <mykolal@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20250213-quiet_tools-v3-2-07de4482a581@rivosinc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Convert TRFCR to automatic generation. Add separate definitions for ELx
and EL2 as TRFCR_EL1 doesn't have CX. This also mirrors the previous
definition so no code change is required.
Also add TRFCR_EL12 which will start to be used in a later commit.
Unfortunately, to avoid breaking the Perf build with duplicate
definition errors, the tools copy of the sysreg.h header needs to be
updated at the same time rather than the usual second commit. This is
because the generated version of sysreg
(arch/arm64/include/generated/asm/sysreg-defs.h), is currently shared
and tools/ does not have its own copy.
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250106142446.628923-4-james.clark@linaro.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Created with the following:
cp include/linux/kasan-tags.h tools/include/linux/
cp arch/arm64/include/asm/sysreg.h tools/arch/arm64/include/asm/
Update the tools copy of sysreg.h so that the next commit to add a new
register doesn't have unrelated changes in it. Because the new version
of sysreg.h includes kasan-tags.h, that file also now needs to be copied
into tools.
Acked-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250106142446.628923-3-james.clark@linaro.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
To pick up the changes in this cset:
97413cea1c ("KVM: arm64: Add PSCI v1.3 SYSTEM_OFF2 function for hibernation")
This addresses these perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h
Please see tools/include/uapi/README for further details.
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: kvmarm@lists.linux.dev
Link: https://lore.kernel.org/r/20241203035349.1901262-6-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
To pick up the changes in this cset:
a0423af92c ("x86: KVM: Advertise CPUIDs for new instructions in Clearwater Forest")
0c487010cb ("x86/cpufeatures: Add X86_FEATURE_AMD_WORKLOAD_CLASS feature bit")
1ad4667066 ("x86/cpufeatures: Add X86_FEATURE_AMD_HETEROGENEOUS_CORES")
104edc6efc ("x86/cpufeatures: Rename X86_FEATURE_FAST_CPPC to have AMD prefix")
3ea87dfa31 ("x86/cpufeatures: Add a IBPB_NO_RET BUG flag")
ff898623af ("x86/cpufeatures: Define X86_FEATURE_AMD_IBPB_RET")
dcb988cdac ("KVM: x86: Quirk initialization of feature MSRs to KVM's max configuration")
This addresses these perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
Please see tools/include/uapi/README for further details.
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/r/20241203035349.1901262-5-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
essentially guessing which pfns are refcounted pages. The reason to
do so was that KVM needs to map both non-refcounted pages (for example
BARs of VFIO devices) and VM_PFNMAP/VM_MIXMEDMAP VMAs that contain
refcounted pages. However, the result was security issues in the past,
and more recently the inability to map VM_IO and VM_PFNMAP memory
that _is_ backed by struct page but is not refcounted. In particular
this broke virtio-gpu blob resources (which directly map host graphics
buffers into the guest as "vram" for the virtio-gpu device) with the
amdgpu driver, because amdgpu allocates non-compound higher order pages
and the tail pages could not be mapped into KVM.
This requires adjusting all uses of struct page in the per-architecture
code, to always work on the pfn whenever possible. The large series that
did this, from David Stevens and Sean Christopherson, also cleaned up
substantially the set of functions that provided arch code with the
pfn for a host virtual addresses. The previous maze of twisty little
passages, all different, is replaced by five functions (__gfn_to_page,
__kvm_faultin_pfn, the non-__ versions of these two, and kvm_prefetch_pages)
saving almost 200 lines of code.
ARM:
* Support for stage-1 permission indirection (FEAT_S1PIE) and
permission overlays (FEAT_S1POE), including nested virt + the
emulated page table walker
* Introduce PSCI SYSTEM_OFF2 support to KVM + client driver. This call
was introduced in PSCIv1.3 as a mechanism to request hibernation,
similar to the S4 state in ACPI
* Explicitly trap + hide FEAT_MPAM (QoS controls) from KVM guests. As
part of it, introduce trivial initialization of the host's MPAM
context so KVM can use the corresponding traps
* PMU support under nested virtualization, honoring the guest
hypervisor's trap configuration and event filtering when running a
nested guest
* Fixes to vgic ITS serialization where stale device/interrupt table
entries are not zeroed when the mapping is invalidated by the VM
* Avoid emulated MMIO completion if userspace has requested synchronous
external abort injection
* Various fixes and cleanups affecting pKVM, vCPU initialization, and
selftests
LoongArch:
* Add iocsr and mmio bus simulation in kernel.
* Add in-kernel interrupt controller emulation.
* Add support for virtualization extensions to the eiointc irqchip.
PPC:
* Drop lingering and utterly obsolete references to PPC970 KVM, which was
removed 10 years ago.
* Fix incorrect documentation references to non-existing ioctls
RISC-V:
* Accelerate KVM RISC-V when running as a guest
* Perf support to collect KVM guest statistics from host side
s390:
* New selftests: more ucontrol selftests and CPU model sanity checks
* Support for the gen17 CPU model
* List registers supported by KVM_GET/SET_ONE_REG in the documentation
x86:
* Cleanup KVM's handling of Accessed and Dirty bits to dedup code, improve
documentation, harden against unexpected changes. Even if the hardware
A/D tracking is disabled, it is possible to use the hardware-defined A/D
bits to track if a PFN is Accessed and/or Dirty, and that removes a lot
of special cases.
* Elide TLB flushes when aging secondary PTEs, as has been done in x86's
primary MMU for over 10 years.
* Recover huge pages in-place in the TDP MMU when dirty page logging is
toggled off, instead of zapping them and waiting until the page is
re-accessed to create a huge mapping. This reduces vCPU jitter.
* Batch TLB flushes when dirty page logging is toggled off. This reduces
the time it takes to disable dirty logging by ~3x.
* Remove the shrinker that was (poorly) attempting to reclaim shadow page
tables in low-memory situations.
* Clean up and optimize KVM's handling of writes to MSR_IA32_APICBASE.
* Advertise CPUIDs for new instructions in Clearwater Forest
* Quirk KVM's misguided behavior of initialized certain feature MSRs to
their maximum supported feature set, which can result in KVM creating
invalid vCPU state. E.g. initializing PERF_CAPABILITIES to a non-zero
value results in the vCPU having invalid state if userspace hides PDCM
from the guest, which in turn can lead to save/restore failures.
* Fix KVM's handling of non-canonical checks for vCPUs that support LA57
to better follow the "architecture", in quotes because the actual
behavior is poorly documented. E.g. most MSR writes and descriptor
table loads ignore CR4.LA57 and operate purely on whether the CPU
supports LA57.
* Bypass the register cache when querying CPL from kvm_sched_out(), as
filling the cache from IRQ context is generally unsafe; harden the
cache accessors to try to prevent similar issues from occuring in the
future. The issue that triggered this change was already fixed in 6.12,
but was still kinda latent.
* Advertise AMD_IBPB_RET to userspace, and fix a related bug where KVM
over-advertises SPEC_CTRL when trying to support cross-vendor VMs.
* Minor cleanups
* Switch hugepage recovery thread to use vhost_task. These kthreads can
consume significant amounts of CPU time on behalf of a VM or in response
to how the VM behaves (for example how it accesses its memory); therefore
KVM tried to place the thread in the VM's cgroups and charge the CPU
time consumed by that work to the VM's container. However the kthreads
did not process SIGSTOP/SIGCONT, and therefore cgroups which had KVM
instances inside could not complete freezing. Fix this by replacing the
kthread with a PF_USER_WORKER thread, via the vhost_task abstraction.
Another 100+ lines removed, with generally better behavior too like
having these threads properly parented in the process tree.
* Revert a workaround for an old CPU erratum (Nehalem/Westmere) that didn't
really work; there was really nothing to work around anyway: the broken
patch was meant to fix nested virtualization, but the PERF_GLOBAL_CTRL
MSR is virtualized and therefore unaffected by the erratum.
* Fix 6.12 regression where CONFIG_KVM will be built as a module even
if asked to be builtin, as long as neither KVM_INTEL nor KVM_AMD is 'y'.
x86 selftests:
* x86 selftests can now use AVX.
Documentation:
* Use rST internal links
* Reorganize the introduction to the API document
Generic:
* Protect vcpu->pid accesses outside of vcpu->mutex with a rwlock instead
of RCU, so that running a vCPU on a different task doesn't encounter long
due to having to wait for all CPUs become quiescent. In general both reads
and writes are rare, but userspace that supports confidential computing is
introducing the use of "helper" vCPUs that may jump from one host processor
to another. Those will be very happy to trigger a synchronize_rcu(), and
the effect on performance is quite the disaster.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmc9MRYUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroP00QgArxqxBIGLCW5t7bw7vtNq63QYRyh4
dTiDguLiYQJ+AXmnRu11R6aPC7HgMAvlFCCmH+GEce4WEgt26hxCmncJr/aJOSwS
letCS7TrME16PeZvh25A1nhPBUw6mTF1qqzgcdHMrqXG8LuHoGcKYGSRVbkf3kfI
1ZoMq1r8ChXbVVmCx9DQ3gw1TVr5Dpjs2voLh8rDSE9Xpw0tVVabHu3/NhQEz/F+
t8/nRaqH777icCHIf9PCk5HnarHxLAOvhM2M0Yj09PuBcE5fFQxpxltw/qiKQqqW
ep4oquojGl87kZnhlDaac2UNtK90Ws+WxxvCwUmbvGN0ZJVaQwf4FvTwig==
=lWpE
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"The biggest change here is eliminating the awful idea that KVM had of
essentially guessing which pfns are refcounted pages.
The reason to do so was that KVM needs to map both non-refcounted
pages (for example BARs of VFIO devices) and VM_PFNMAP/VM_MIXMEDMAP
VMAs that contain refcounted pages.
However, the result was security issues in the past, and more recently
the inability to map VM_IO and VM_PFNMAP memory that _is_ backed by
struct page but is not refcounted. In particular this broke virtio-gpu
blob resources (which directly map host graphics buffers into the
guest as "vram" for the virtio-gpu device) with the amdgpu driver,
because amdgpu allocates non-compound higher order pages and the tail
pages could not be mapped into KVM.
This requires adjusting all uses of struct page in the
per-architecture code, to always work on the pfn whenever possible.
The large series that did this, from David Stevens and Sean
Christopherson, also cleaned up substantially the set of functions
that provided arch code with the pfn for a host virtual addresses.
The previous maze of twisty little passages, all different, is
replaced by five functions (__gfn_to_page, __kvm_faultin_pfn, the
non-__ versions of these two, and kvm_prefetch_pages) saving almost
200 lines of code.
ARM:
- Support for stage-1 permission indirection (FEAT_S1PIE) and
permission overlays (FEAT_S1POE), including nested virt + the
emulated page table walker
- Introduce PSCI SYSTEM_OFF2 support to KVM + client driver. This
call was introduced in PSCIv1.3 as a mechanism to request
hibernation, similar to the S4 state in ACPI
- Explicitly trap + hide FEAT_MPAM (QoS controls) from KVM guests. As
part of it, introduce trivial initialization of the host's MPAM
context so KVM can use the corresponding traps
- PMU support under nested virtualization, honoring the guest
hypervisor's trap configuration and event filtering when running a
nested guest
- Fixes to vgic ITS serialization where stale device/interrupt table
entries are not zeroed when the mapping is invalidated by the VM
- Avoid emulated MMIO completion if userspace has requested
synchronous external abort injection
- Various fixes and cleanups affecting pKVM, vCPU initialization, and
selftests
LoongArch:
- Add iocsr and mmio bus simulation in kernel.
- Add in-kernel interrupt controller emulation.
- Add support for virtualization extensions to the eiointc irqchip.
PPC:
- Drop lingering and utterly obsolete references to PPC970 KVM, which
was removed 10 years ago.
- Fix incorrect documentation references to non-existing ioctls
RISC-V:
- Accelerate KVM RISC-V when running as a guest
- Perf support to collect KVM guest statistics from host side
s390:
- New selftests: more ucontrol selftests and CPU model sanity checks
- Support for the gen17 CPU model
- List registers supported by KVM_GET/SET_ONE_REG in the
documentation
x86:
- Cleanup KVM's handling of Accessed and Dirty bits to dedup code,
improve documentation, harden against unexpected changes.
Even if the hardware A/D tracking is disabled, it is possible to
use the hardware-defined A/D bits to track if a PFN is Accessed
and/or Dirty, and that removes a lot of special cases.
- Elide TLB flushes when aging secondary PTEs, as has been done in
x86's primary MMU for over 10 years.
- Recover huge pages in-place in the TDP MMU when dirty page logging
is toggled off, instead of zapping them and waiting until the page
is re-accessed to create a huge mapping. This reduces vCPU jitter.
- Batch TLB flushes when dirty page logging is toggled off. This
reduces the time it takes to disable dirty logging by ~3x.
- Remove the shrinker that was (poorly) attempting to reclaim shadow
page tables in low-memory situations.
- Clean up and optimize KVM's handling of writes to
MSR_IA32_APICBASE.
- Advertise CPUIDs for new instructions in Clearwater Forest
- Quirk KVM's misguided behavior of initialized certain feature MSRs
to their maximum supported feature set, which can result in KVM
creating invalid vCPU state. E.g. initializing PERF_CAPABILITIES to
a non-zero value results in the vCPU having invalid state if
userspace hides PDCM from the guest, which in turn can lead to
save/restore failures.
- Fix KVM's handling of non-canonical checks for vCPUs that support
LA57 to better follow the "architecture", in quotes because the
actual behavior is poorly documented. E.g. most MSR writes and
descriptor table loads ignore CR4.LA57 and operate purely on
whether the CPU supports LA57.
- Bypass the register cache when querying CPL from kvm_sched_out(),
as filling the cache from IRQ context is generally unsafe; harden
the cache accessors to try to prevent similar issues from occuring
in the future. The issue that triggered this change was already
fixed in 6.12, but was still kinda latent.
- Advertise AMD_IBPB_RET to userspace, and fix a related bug where
KVM over-advertises SPEC_CTRL when trying to support cross-vendor
VMs.
- Minor cleanups
- Switch hugepage recovery thread to use vhost_task.
These kthreads can consume significant amounts of CPU time on
behalf of a VM or in response to how the VM behaves (for example
how it accesses its memory); therefore KVM tried to place the
thread in the VM's cgroups and charge the CPU time consumed by that
work to the VM's container.
However the kthreads did not process SIGSTOP/SIGCONT, and therefore
cgroups which had KVM instances inside could not complete freezing.
Fix this by replacing the kthread with a PF_USER_WORKER thread, via
the vhost_task abstraction. Another 100+ lines removed, with
generally better behavior too like having these threads properly
parented in the process tree.
- Revert a workaround for an old CPU erratum (Nehalem/Westmere) that
didn't really work; there was really nothing to work around anyway:
the broken patch was meant to fix nested virtualization, but the
PERF_GLOBAL_CTRL MSR is virtualized and therefore unaffected by the
erratum.
- Fix 6.12 regression where CONFIG_KVM will be built as a module even
if asked to be builtin, as long as neither KVM_INTEL nor KVM_AMD is
'y'.
x86 selftests:
- x86 selftests can now use AVX.
Documentation:
- Use rST internal links
- Reorganize the introduction to the API document
Generic:
- Protect vcpu->pid accesses outside of vcpu->mutex with a rwlock
instead of RCU, so that running a vCPU on a different task doesn't
encounter long due to having to wait for all CPUs become quiescent.
In general both reads and writes are rare, but userspace that
supports confidential computing is introducing the use of "helper"
vCPUs that may jump from one host processor to another. Those will
be very happy to trigger a synchronize_rcu(), and the effect on
performance is quite the disaster"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (298 commits)
KVM: x86: Break CONFIG_KVM_X86's direct dependency on KVM_INTEL || KVM_AMD
KVM: x86: add back X86_LOCAL_APIC dependency
Revert "KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()"
KVM: x86: switch hugepage recovery thread to vhost_task
KVM: x86: expose MSR_PLATFORM_INFO as a feature MSR
x86: KVM: Advertise CPUIDs for new instructions in Clearwater Forest
Documentation: KVM: fix malformed table
irqchip/loongson-eiointc: Add virt extension support
LoongArch: KVM: Add irqfd support
LoongArch: KVM: Add PCHPIC user mode read and write functions
LoongArch: KVM: Add PCHPIC read and write functions
LoongArch: KVM: Add PCHPIC device support
LoongArch: KVM: Add EIOINTC user mode read and write functions
LoongArch: KVM: Add EIOINTC read and write functions
LoongArch: KVM: Add EIOINTC device support
LoongArch: KVM: Add IPI user mode read and write function
LoongArch: KVM: Add IPI read and write function
LoongArch: KVM: Add IPI device support
LoongArch: KVM: Add iocsr and mmio bus simulation in kernel
KVM: arm64: Pass on SVE mapping failures
...
with the purpose of using such hints when making scheduling decisions
- Determine the boost enumerator for each AMD core based on its type: efficiency
or performance, in the cppc driver
- Add the type of a CPU to the topology CPU descriptor with the goal of
supporting and making decisions based on the type of the respective core
- Add a feature flag to denote AMD cores which have heterogeneous topology and
enable SD_ASYM_PACKING for those
- Check microcode revisions before disabling PCID on Intel
- Cleanups and fixlets
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmc7q0UACgkQEsHwGGHe
VUq27Q//TADIn/rZj95OuWLYFXduOpzdyfF6BAOabRjUpIWTGJ5YdKjj1TCA2wUE
6SiHZWQxQropB3NgeICcDT+3OGdGzE2qywzpXspUDsBPraWx+9CA56qREYafpRps
88ZQZJWHla2/0kHN5oM4fYe05mWMLAFgIhG4tPH/7sj54Zqar40nhVksz3WjKAid
yEfzbdVeRI5sNoujyHzGANXI0Fo98nAyi5Qj9kXL9W/UV1JmoQ78Rq7V9IIgOBsc
l6Gv/h0CNtH9voqfrfUb07VHk8ZqSJ37xUnrnKdidncWGCWEAoZRr7wU+I9CHKIs
tzdx+zq6JC3YN0IwsZCjk4me+BqVLJxW2oDgW7esPifye6ElyEo4T9UO9LEpE1qm
ReAByoIMdSXWwXuITwy4NxLPKPCpU7RyJCiqFzpJp0g4qUq2cmlyERDirf6eknXL
s+dmRaglEdcQT/EL+Y+vfFdQtLdwJmOu+nPPjjFxeRcIDB+u1sXJMEFbyvkLL6FE
HOdNxL+5n/3M8Lbh77KIS5uCcjXL2VCkZK2/hyoifUb+JZR/ENoqYjElkMXOplyV
KQIfcTzVCLRVvZApf/MMkTO86cpxMDs7YLYkgFxDsBjRdoq/Mzub8yzWn6kLZtmP
ANNH4uYVtjrHE1nxJSA0JgYQlJKYeNU5yhLiTLKhHL5BwDYfiz8=
=420r
-----END PGP SIGNATURE-----
Merge tag 'x86_cpu_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpuid updates from Borislav Petkov:
- Add a feature flag which denotes AMD CPUs supporting workload
classification with the purpose of using such hints when making
scheduling decisions
- Determine the boost enumerator for each AMD core based on its type:
efficiency or performance, in the cppc driver
- Add the type of a CPU to the topology CPU descriptor with the goal of
supporting and making decisions based on the type of the respective
core
- Add a feature flag to denote AMD cores which have heterogeneous
topology and enable SD_ASYM_PACKING for those
- Check microcode revisions before disabling PCID on Intel
- Cleanups and fixlets
* tag 'x86_cpu_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Remove redundant CONFIG_NUMA guard around numa_add_cpu()
x86/cpu: Fix FAM5_QUARK_X1000 to use X86_MATCH_VFM()
x86/cpu: Fix formatting of cpuid_bits[] in scattered.c
x86/cpufeatures: Add X86_FEATURE_AMD_WORKLOAD_CLASS feature bit
x86/amd: Use heterogeneous core topology for identifying boost numerator
x86/cpu: Add CPU type to struct cpuinfo_topology
x86/cpu: Enable SD_ASYM_PACKING for PKG domain on AMD
x86/cpufeatures: Add X86_FEATURE_AMD_HETEROGENEOUS_CORES
x86/cpufeatures: Rename X86_FEATURE_FAST_CPPC to have AMD prefix
x86/mm: Don't disable PCID when INVLPG has been fixed by microcode
- Support for stage-1 permission indirection (FEAT_S1PIE) and
permission overlays (FEAT_S1POE), including nested virt + the
emulated page table walker
- Introduce PSCI SYSTEM_OFF2 support to KVM + client driver. This call
was introduced in PSCIv1.3 as a mechanism to request hibernation,
similar to the S4 state in ACPI
- Explicitly trap + hide FEAT_MPAM (QoS controls) from KVM guests. As
part of it, introduce trivial initialization of the host's MPAM
context so KVM can use the corresponding traps
- PMU support under nested virtualization, honoring the guest
hypervisor's trap configuration and event filtering when running a
nested guest
- Fixes to vgic ITS serialization where stale device/interrupt table
entries are not zeroed when the mapping is invalidated by the VM
- Avoid emulated MMIO completion if userspace has requested synchronous
external abort injection
- Various fixes and cleanups affecting pKVM, vCPU initialization, and
selftests
-----BEGIN PGP SIGNATURE-----
iI0EABYIADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCZzTZXRccb2xpdmVyLnVw
dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFioUAP0cs2pYcwuCqLgmeHqfz6L5Xsw3
hKBCNuvr5mjU0hZfLAEA5ml2eUKD7OnssAOmUZ/K/NoCdJFCe8mJWQDlURvr9g4=
=u2/3
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-6.13' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.13, part #1
- Support for stage-1 permission indirection (FEAT_S1PIE) and
permission overlays (FEAT_S1POE), including nested virt + the
emulated page table walker
- Introduce PSCI SYSTEM_OFF2 support to KVM + client driver. This call
was introduced in PSCIv1.3 as a mechanism to request hibernation,
similar to the S4 state in ACPI
- Explicitly trap + hide FEAT_MPAM (QoS controls) from KVM guests. As
part of it, introduce trivial initialization of the host's MPAM
context so KVM can use the corresponding traps
- PMU support under nested virtualization, honoring the guest
hypervisor's trap configuration and event filtering when running a
nested guest
- Fixes to vgic ITS serialization where stale device/interrupt table
entries are not zeroed when the mapping is invalidated by the VM
- Avoid emulated MMIO completion if userspace has requested synchronous
external abort injection
- Various fixes and cleanups affecting pKVM, vCPU initialization, and
selftests
- Drop obsolete references to PPC970 KVM, which was removed 10 years ago.
- Fix incorrect references to non-existing ioctls
- List registers supported by KVM_GET/SET_ONE_REG on s390
- Use rST internal links
- Reorganize the introduction to the API document
Check if the PFCR query reported in userspace coincides with the
kernel reported function list. Right now we don't mask the functions
in the kernel so they have to be the same.
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Hariharan Mari <hari55@linux.ibm.com>
Link: https://lore.kernel.org/r/20241107152319.77816-5-brueckner@linux.ibm.com
[frankja@linux.ibm.com: Added commit description]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241107152319.77816-5-brueckner@linux.ibm.com>
As there are duplicated kernel headers in tools/include libc can pick
up the wrong definitions. This was causing the wrong system call for
capget in perf.
Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: e25ebda78e ("perf cap: Tidy up and improve capability testing")
Closes: https://lore.kernel.org/lkml/cc7d6bdf-1aeb-4179-9029-4baf50b59342@intel.com/
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241026055448.312247-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get the changes in:
924725707d ("arm64: cputype: Add Neoverse-N3 definitions")
That makes this perf source code to be rebuilt:
CC /tmp/build/perf-tools/util/arm-spe.o
The changes in the above patch add MIDR_NEOVERSE_N3, that probably need
changes in arm-spe.c, so probably we need to add it to that array? Or
maybe we need to leave this for later when this is all tested on those
machines?
static const struct midr_range neoverse_spe[] = {
MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1),
MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V1),
{},
};
Mark Rutland recommended about arm-spe.c in a previous update to this
file:
"I would not touch this for now -- someone would have to go audit the
TRMs to check that those other cores have the same encoding, and I think
it'd be better to do that as a follow-up."
That addresses this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/arm64/include/asm/cputype.h arch/arm64/include/asm/cputype.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/Zx-dffKdGsgkhG96@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes from these csets:
dc1e67f70f ("KVM VMX: Move MSR_IA32_VMX_MISC bit defines to asm/vmx.h")
d7bfc9ffd5 ("KVM: VMX: Move MSR_IA32_VMX_BASIC bit defines to asm/vmx.h")
beb2e44604 ("x86/cpu: KVM: Move macro to encode PAT value to common header")
e7e80b66fb ("x86/cpu: KVM: Add common defines for architectural memory types (PAT, MTRRs, etc.)")
That cause no changes to tooling:
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
$ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
$ diff -u before after
$
To see how this works take a look at this previous update:
https://git.kernel.org/torvalds/c/174372668933ede51743726689 ("tools arch x86: Sync the msr-index.h copy with the kernel sources to pick IA32_MKTME_KEYID_PARTITIONING")
Just silences this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
Please see tools/include/uapi/README for further details.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Xin Li <xin3.li@intel.com>
Link: https://lore.kernel.org/lkml/ZxpLSBzGin3vjs3b@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
aa8d1f48d3 ("KVM: x86/mmu: Introduce a quirk to control memslot zap behavior")
That don't change functionality in tools/perf, as no new ioctl is added
for the 'perf trace' scripts to harvest.
This addresses these perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
Please see tools/include/uapi/README for further details.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/lkml/ZxgN0O02YrAJ2qIC@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This kselftest update for Linux 6.12-rc3 consists of several fixes
for build, run-time errors, and reporting errors:
-- ftrace: regression test for a kernel crash when running function graph
tracing and then enabling function profiler.
-- rseq: fix for mm_cid test failure.
-- vDSO:
- fixes to reporting skip and other error conditions.
- changes unconditionally build chacha and getrandom tests on
all architectures to make it easier for them to run in CIs.
- build error when sched.h to bring in CLONE_NEWTIME define.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmcJnTAACgkQCwJExA0N
QxxG0w/6AnqINbQIrpn8OmdublNPfh+fMpRFPYQLqDFJhFl7vc8Qp/m0kApcFQCC
B15LSnxSdy4imhCYjtLaXsKsd6oFL4cv0sFswHCT7YFOd4q8yPzU7SvDERetFKJV
1/nhq2VR86jVMnoZioiHvb96xVrCw+YUtbJrG6qUfAO7nlnZ2Rtme5ZUSZ5EhEtt
i5UGolc1qeCOF01H2Cg+Y4SZp1pP4YBZuV04miK3xvzrRx3sB9U8Rzd/1yVAbD1d
BuQ5fIogu0+DiL5yIADaY6KB1IYfQojOtU0lOFCHyn7cpOpH0XT3pmRy+SrYf094
nX+UFDdHuX0565rZJMzqGryqrc3sZz/wtVHrMD04NqwkdNEfoHL2ABlrGepTBMbd
Tqs7BpKnvuBz+WlM7oDTu3S94ZwukT3tZ6A6VOYQDSflOIYLbmVTK4DvG5yVAmoW
u4LxCDC9ODuUTvxeGmHlI4UpR7f0EArY3Rf2UeCGHaHwKgAhAY4Ti/gwmddpg60i
2nbZ2cneTqtVn0C0fc612EXv6uqlk8onj+1Usp9PVqeYrhGKl2DPuFDoBVpBGr0W
Mi0CISrgrFngsVxsLlaHnn31e4SyWSLODKp2y5H7WfFU0eZCRb3efleVADpmEqRr
NI0Pzhw1cvz72G1XFvVZtALeEPdA5cdKwTuutxfgGlkjwAKyw5M=
=1etd
-----END PGP SIGNATURE-----
Merge tag 'linux_kselftest-fixes-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"Fixes for build, run-time errors, and reporting errors:
- ftrace: regression test for a kernel crash when running function
graph tracing and then enabling function profiler.
- rseq: fix for mm_cid test failure.
- vDSO:
- fixes to reporting skip and other error conditions
- changes unconditionally build chacha and getrandom tests on all
architectures to make it easier for them to run in CIs
- build error when sched.h to bring in CLONE_NEWTIME define"
* tag 'linux_kselftest-fixes-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
ftrace/selftest: Test combination of function_graph tracer and function profiler
selftests/rseq: Fix mm_cid test failure
selftests: vDSO: Explicitly include sched.h
selftests: vDSO: improve getrandom and chacha error messages
selftests: vDSO: unconditionally build getrandom test
selftests: vDSO: unconditionally build chacha test
Rather than using symlinks to find the vgetrandom-chacha.S file for each
arch, store this in a file that uses the compiler to determine
architecture, and then make use of weak symbols to skip the test on
architectures that don't provide the code.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
- Fix an assert() to handle captured and unprocessed ARM CoreSight CPU traces.
- Fix static build compilation error when libdw isn't installed or is too old.
- Add missing include when building with !HAVE_DWARF_GETLOCATIONS_SUPPORT.
- Add missing refcount put on 32-bit DSOs.
- Fix disassembly of user space binaries by setting the binary_type of DSO when
loading.
- Update headers with the kernel sources, including asound.h, sched.h, fcntl,
msr-index.h, irq_vectors.h, socket.h, list_sort.c and arm64's cputype.h.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZwU2dgAKCRCyPKLppCJ+
J8uaAQDEbp0lMf1S/Y6vOGbnP6mGQCewQsXtIpSA4gcRMWlCCgD+O6ZxbnBCHOzn
nQfBmbT62qUGuUA38Mg7pCyRXBd8FgU=
=s4JZ
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-fixes-for-v6.12-1-2024-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix an assert() to handle captured and unprocessed ARM CoreSight CPU
traces
- Fix static build compilation error when libdw isn't installed or is
too old
- Add missing include when building with
!HAVE_DWARF_GETLOCATIONS_SUPPORT
- Add missing refcount put on 32-bit DSOs
- Fix disassembly of user space binaries by setting the binary_type of
DSO when loading
- Update headers with the kernel sources, including asound.h, sched.h,
fcntl, msr-index.h, irq_vectors.h, socket.h, list_sort.c and arm64's
cputype.h
* tag 'perf-tools-fixes-for-v6.12-1-2024-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf cs-etm: Fix the assert() to handle captured and unprocessed cpu trace
perf build: Fix build feature-dwarf_getlocations fail for old libdw
perf build: Fix static compilation error when libdw is not installed
perf dwarf-aux: Fix build with !HAVE_DWARF_GETLOCATIONS_SUPPORT
tools headers arm64: Sync arm64's cputype.h with the kernel sources
perf tools: Cope with differences for lib/list_sort.c copy from the kernel
tools check_headers.sh: Add check variant that excludes some hunks
perf beauty: Update copy of linux/socket.h with the kernel sources
tools headers UAPI: Sync the linux/in.h with the kernel sources
perf trace beauty: Update the arch/x86/include/asm/irq_vectors.h copy with the kernel sources
tools arch x86: Sync the msr-index.h copy with the kernel sources
tools include UAPI: Sync linux/fcntl.h copy with the kernel sources
tools include UAPI: Sync linux/sched.h copy with the kernel sources
tools include UAPI: Sync sound/asound.h copy with the kernel sources
perf vdso: Missed put on 32-bit dsos
perf symbol: Set binary_type of dso when loading
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
To get the changes in:
db0d8a8434 ("arm64: errata: Enable the AC03_CPU_38 workaround for ampere1a")
That makes this perf source code to be rebuilt:
CC /tmp/build/perf-tools/util/arm-spe.o
The changes in the above patch add MIDR_AMPERE1A, used in arm-spe.c, so
probably we need to add it to that array? Or maybe we need to leave
this for later when this is all tested on those machines?
static const struct midr_range neoverse_spe[] = {
MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1),
MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V1),
{},
};
Mark Rutland recommended about arm-spe.c in a previous update to this
file:
"I would not touch this for now -- someone would have to go audit the
TRMs to check that those other cores have the same encoding, and I think
it'd be better to do that as a follow-up."
That addresses this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/arm64/include/asm/cputype.h arch/arm64/include/asm/cputype.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: D Scott Phillips <scott@os.amperecomputing.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/lkml/ZvtFu7J-Awy2zuEJ@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes from these csets:
0a3e4e94d1 ("platform/x86/intel/ifs: Add SBAF test image loading support")
That cause no changes to tooling:
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
$ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
$ diff -u before after
$
Just silences this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jithu Joseph <jithu.joseph@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZvrJY68Btx3a_yV4@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* Support for using Zkr to seed KASLR.
* Support for IPI-triggered CPU backtracing.
* Support for generic CPU vulnerabilities reporting to userspace.
* A few cleanups for missing licenses.
* The size limit on the XIP kernel has been removed.
* Support for tracing userspace stacks.
* Support for the Svvptc extension.
* Various cleanups and fixes throughout the tree.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmbykZITHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiTDLEACABTPlMX+Ke1lMYWj6MLBTXMSlQJ6w
TKLVk3slp4POPsd+ViWy4J6EJDpCkNp6iK30Bv4AGainA3RJnbS7neKYy+MTw0ZV
pJWTn73sxHeF+APnZ+geFYcmyteL/SZplgHgwLipFaojs7i/+tOvLFRRD3LueCz6
Q6Muvke9R5uN6LL3tUrmIhJeyZjOwJtKEYoRz6Fo5Tq3RRK0VHYZkdpqRQ86rr9U
yNbRNlBLlANstq8xjtiwm22ImPGIsvvhs0AvFft0H72zhf3Tfy0VcTLlDJYwkdKN
Bz59v4N4jg1ajXuAsj4/BQdj5uRkzJNZ3Yq1yvMmGI+2UV1tInHEMeZi63+4gs8F
0FYCziA+j6/08u8v8CrwdDOpJ9Iq/NMV6359bt5ySu3rM6QnPRGnHGkv4Ptk9Oyh
sSPiuHS8YxpRBOB8xXNp5xFipTZ4+65VqCz253pAsbbp5aZ/o9Jw/bbBFFWcSsRZ
tV2xhELVzPiC7dowfYkQhcNZOLlCaJ/Mvo2nMOtjbUwzaXkcjIRcYEy7+dTlfXyr
3h5sStjWAKEmtLEvCYSI+lAT544tbV1x+lCMJGuvztapsTMtAP4GpQKblXl5qqnV
L+VWIPJV1elI26H/p/Max9Z1s48NtfwoJSRL7Rnaya6uJ3iWG/BtajHlFbvgOkJf
ObPZ//Yr9e91gg==
=zDwL
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.12-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support using Zkr to seed KASLR
- Support IPI-triggered CPU backtracing
- Support for generic CPU vulnerabilities reporting to userspace
- A few cleanups for missing licenses
- The size limit on the XIP kernel has been removed
- Support for tracing userspace stacks
- Support for the Svvptc extension
- Various cleanups and fixes throughout the tree
* tag 'riscv-for-linus-6.12-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (47 commits)
crash: Fix riscv64 crash memory reserve dead loop
perf/riscv-sbi: Add platform specific firmware event handling
tools: Optimize ring buffer for riscv
tools: Add riscv barrier implementation
RISC-V: Don't have MAX_PHYSMEM_BITS exceed phys_addr_t
ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE
riscv: Enable bitops instrumentation
riscv: Omit optimized string routines when using KASAN
ACPI: RISCV: Make acpi_numa_get_nid() to be static
riscv: Randomize lower bits of stack address
selftests: riscv: Allow mmap test to compile on 32-bit
riscv: Make riscv_isa_vendor_ext_andes array static
riscv: Use LIST_HEAD() to simplify code
riscv: defconfig: Disable RZ/Five peripheral support
RISC-V: Implement kgdb_roundup_cpus() to enable future NMI Roundup
riscv: avoid Imbalance in RAS
riscv: cacheinfo: Add back init_cache_level() function
riscv: Remove unused _TIF_WORK_MASK
drivers/perf: riscv: Remove redundant macro check
riscv: define ILLEGAL_POINTER_VALUE for 64bit
...
Many of the other architectures use their custom barrier implementations.
Use the barrier code from the kernel sources to optimize barriers in
tools.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20240806-optimize_ring_buffer_read_riscv-v2-1-ca7e193ae198@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmboHyUACgkQSfxwEqXe
A66wGQ/8DRIjBllwf1YuTWi4T6OcfoYxK6C9bXO6QPP5gzdTyFE9pvDuuPyad6+F
FR086ydTHeodemz1dFiQCL9etcUaxo4+6FRKyXKF9/1ezGbTA5nJd0/fKJGlqbI2
EoA4LNYHOsvCZk1BTpxRNWKeKphU9zQgQdSigy6Rx8p269UkGmIZjD1PtUc+vqfR
Ox0dK/Cswyo236fRi5HzaoMntWI4vXgLfxty0e1R7tfbstkCxSKWAON1lo3uHgkA
0HpJXWgWXAPt9gp++Fs/jGNpOqbt6IaKeV5f7CjYfvWhlFjNMhQxF+PbxknaZn/k
K0gQsItOIoFTfbQdLDIdfnj9awMdLW8FB2A1WXHpNr9pVC4ickPb1bMTF/XRd0tm
wBNu4BL0gklx6017KZg5uINMIduzMLGkBLRFiBW0en/sZMLTJTMg58BJn0CL1Pmh
1ll/Q3ToSMHalvxU2OnJagTwh4fzzCEpK/hW9WiDO4jSCsMXyX0clinrCjNo1JfA
tqgTWEy3uGtg+dg0Du9VD5JASbNQSJ0ZRnas5+qz10IRWWfTolrsk61dliXLQ4Sv
tSryDtsE2znwJF1Krh4aHNSSVhD5/l/8QaXkf9aZc/kkaHxwsx83FuWnqw6nMz8c
l4B2MbH0jUgsEqEyx+0iwk+FXE9kZKWumTVLjFZ6bRnq3q+uq0U=
=mWCw
-----END PGP SIGNATURE-----
Merge tag 'random-6.12-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
"Originally I'd planned on sending each of the vDSO getrandom()
architecture ports to their respective arch trees. But as we started
to work on this, we found lots of interesting issues in the shared
code and infrastructure, the fixes for which the various archs needed
to base their work.
So in the end, this turned into a nice collaborative effort fixing up
issues and porting to 5 new architectures -- arm64, powerpc64,
powerpc32, s390x, and loongarch64 -- with everybody pitching in and
commenting on each other's code. It was a fun development cycle.
This contains:
- Numerous fixups to the vDSO selftest infrastructure, getting it
running successfully on more platforms, and fixing bugs in it.
- Additions to the vDSO getrandom & chacha selftests. Basically every
time manual review unearthed a bug in a revision of an arch patch,
or an ambiguity, the tests were augmented.
By the time the last arch was submitted for review, s390x, v1 of
the series was essentially fine right out of the gate.
- Fixes to the the generic C implementation of vDSO getrandom, to
build and run successfully on all archs, decoupling it from
assumptions we had (unintentionally) made on x86_64 that didn't
carry through to the other architectures.
- Port of vDSO getrandom to LoongArch64, from Xi Ruoyao and acked by
Huacai Chen.
- Port of vDSO getrandom to ARM64, from Adhemerval Zanella and acked
by Will Deacon.
- Port of vDSO getrandom to PowerPC, in both 32-bit and 64-bit
varieties, from Christophe Leroy and acked by Michael Ellerman.
- Port of vDSO getrandom to S390X from Heiko Carstens, the arch
maintainer.
While it'd be natural for there to be things to fix up over the course
of the development cycle, these patches got a decent amount of review
from a fairly diverse crew of folks on the mailing lists, and, for the
most part, they've been cooking in linux-next, which has been helpful
for ironing out build issues.
In terms of architectures, I think that mostly takes care of the
important 64-bit archs with hardware still being produced and running
production loads in settings where vDSO getrandom is likely to help.
Arguably there's still RISC-V left, and we'll see for 6.13 whether
they find it useful and submit a port"
* tag 'random-6.12-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (47 commits)
selftests: vDSO: check cpu caps before running chacha test
s390/vdso: Wire up getrandom() vdso implementation
s390/vdso: Move vdso symbol handling to separate header file
s390/vdso: Allow alternatives in vdso code
s390/module: Provide find_section() helper
s390/facility: Let test_facility() generate static branch if possible
s390/alternatives: Remove ALT_FACILITY_EARLY
s390/facility: Disable compile time optimization for decompressor code
selftests: vDSO: fix vdso_config for s390
selftests: vDSO: fix ELF hash table entry size for s390x
powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO64
powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO32
powerpc/vdso: Refactor CFLAGS for CVDSO build
powerpc/vdso32: Add crtsavres
mm: Define VM_DROPPABLE for powerpc/32
powerpc/vdso: Fix VDSO data access when running in a non-root time namespace
selftests: vDSO: don't include generated headers for chacha test
arm64: vDSO: Wire up getrandom() vDSO implementation
arm64: alternative: make alternative_has_cap_likely() VDSO compatible
selftests: vDSO: also test counter in vdso_test_chacha
...
- Rework kcpuid to handle the the autogenerated CSV file correctly and
update the CSV file to cover the whole zoo of CPUID.
- Avoid memcpy() for ia32 syscall_get_arguments() and use direct
assignments as fortified memcpy() is unhappy about writing/reading
beyond the end of the addresses destination/source struct member
- A few new PCI IDs for AMD
- Update MAINTAINERS to cover x86 specific selftests
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmbpOZ8THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYofVUEACt8JjMxanswpMy1O6HbJcdVf2wwZ3q
n30BKIFXucvqE6Opc7tWy5THh1+YjHuNXZMkfuuEe2Qjc69z2m3YwUmF0oAB9/AI
6HU4yoePHTbEiPbTjNZMaKL+9CaYJbWkgoEjQpdQGWmo6gJqJxoRF5fY2assLfdJ
zik2faebMNj3l1C1R1w646Zu3CScfZUE8512zwBfOxTqkpVBO4uDrspTzLYljlQN
+gPZ41XDvQKu6SVoVC/TH/oRdshtLBg74fUDoL14yMkWqx3N5IKulFIMCeD2dEHv
pJcbYb8x0pJ1iLx8q/k+spzbvTewY3sAAzbo5JLvcHy1PhW8jc+uCWorMpqLEhH0
LzH1XZwC+kYvJytzZ9EEyYJAAMbh3KRBaphEXmRVec19tujwRy2NGjhRyVmLyqYr
aShIGEVqigCGY8dF0mJgyVu5kd7X4vDZw4xH92c5/G41Ui19cXp1nXh61KMs1WMR
sQm9FDvtRgcX9Pc89RyRRgYz2U75p3gcNyXKio4Oa2VfIlGRYUB5kg5/qDx3RjJx
kZZ44TqPA/oJjpJyNjVrYqD6Gd3WUsjuH2gn6IAohKiSEKDdGTtHu7LEnKEcdkQk
TomxWk1fTR8513GNXgEy2YhXdRN8iTlhgRI9G2BA5c4B6MCGHzPRFzWrosogB3+g
tAOsEN8Sp3ea+g==
=XVR5
-----END PGP SIGNATURE-----
Merge tag 'x86-misc-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Thomas Gleixner:
- Rework kcpuid to handle the the autogenerated CSV file correctly and
update the CSV file to cover the whole zoo of CPUID.
- Avoid memcpy() for ia32 syscall_get_arguments() and use direct
assignments as fortified memcpy() is unhappy about writing/reading
beyond the end of the addresses destination/source struct member
- A few new PCI IDs for AMD
- Update MAINTAINERS to cover x86 specific selftests
* tag 'x86-misc-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Add selftests/x86 entry
x86/amd_nb: Add new PCI IDs for AMD family 1Ah model 60h-70h
x86/syscall: Avoid memcpy() for ia32 syscall_get_arguments()
MAINTAINERS: Add x86 cpuid database entry
tools/x86/kcpuid: Introduce a complete cpuid bitfields CSV file
tools/x86/kcpuid: Parse subleaf ranges if provided
tools/x86/kcpuid: Recognize all leaves with subleaves
tools/x86/kcpuid: Strip bitfield names leading/trailing whitespace
tools/x86/kcpuid: Protect against faulty "max subleaf" values
tools/x86/kcpuid: Set max possible subleaves count to 64
tools/x86/kcpuid: Properly align long-description columns
tools/x86/kcpuid: Remove unused variable
x86/amd_nb: Add new PCI IDs for AMD family 1Ah model 60h
Provide the s390 specific vdso getrandom() architecture backend.
_vdso_rng_data required data is placed within the _vdso_data vvar page,
by using a hardcoded offset larger than vdso_data.
As required the chacha20 implementation does not write to the stack.
The implementation follows more or less the arm64 implementations and
makes use of vector instructions. It has a fallback to the getrandom()
system call for machines where the vector facility is not installed.
The check if the vector facility is installed, as well as an
optimization for machines with the vector-enhancements facility 2, is
implemented with alternatives, avoiding runtime checks.
Note that __kernel_getrandom() is implemented without the vdso user
wrapper which would setup a stack frame for odd cases (aka very old
glibc variants) where the caller has not done that. All callers of
__kernel_getrandom() are required to setup a stack frame, like the C ABI
requires it.
The vdso testcases vdso_test_getrandom and vdso_test_chacha pass.
Benchmark on a z16:
$ ./vdso_test_getrandom bench-single
vdso: 25000000 times in 0.493703559 seconds
syscall: 25000000 times in 6.584025337 seconds
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
To be consistent with other VDSO functions, the function is called
__kernel_getrandom()
__arch_chacha20_blocks_nostack() fonction is implemented basically
with 32 bits operations. It performs 4 QUARTERROUND operations in
parallele. There are enough registers to avoid using the stack:
On input:
r3: output bytes
r4: 32-byte key input
r5: 8-byte counter input/output
r6: number of 64-byte blocks to write to output
During operation:
stack: pointer to counter (r5) and non-volatile registers (r14-131)
r0: counter of blocks (initialised with r6)
r4: Value '4' after key has been read, used for indexing
r5-r12: key
r14-r15: block counter
r16-r31: chacha state
At the end:
r0, r6-r12: Zeroised
r5, r14-r31: Restored
Performance on powerpc 885 (using kernel selftest):
~# ./vdso_test_getrandom bench-single
vdso: 25000000 times in 62.938002291 seconds
libc: 25000000 times in 535.581916866 seconds
syscall: 25000000 times in 531.525042806 seconds
Performance on powerpc 8321 (using kernel selftest):
~# ./vdso_test_getrandom bench-single
vdso: 25000000 times in 16.899318858 seconds
libc: 25000000 times in 131.050596522 seconds
syscall: 25000000 times in 129.794790389 seconds
This first patch adds support for VDSO32. As selftests cannot easily
be generated only for VDSO32, and because the following patch brings
support for VDSO64 anyway, this patch opts out all code in
__arch_chacha20_blocks_nostack() so that vdso_test_chacha will not
fail to compile and will not crash on PPC64/PPC64LE, allthough the
selftest itself will fail.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Hook up the generic vDSO implementation to the aarch64 vDSO data page.
The _vdso_rng_data required data is placed within the _vdso_data vvar
page, by using a offset larger than the vdso_data.
The vDSO function requires a ChaCha20 implementation that does not write
to the stack, and that can do an entire ChaCha20 permutation. The one
provided uses NEON on the permute operation, with a fallback to the
syscall for chips that do not support AdvSIMD.
This also passes the vdso_test_chacha test along with
vdso_test_getrandom. The vdso_test_getrandom bench-single result on
Neoverse-N1 shows:
vdso: 25000000 times in 0.783884250 seconds
libc: 25000000 times in 8.780275399 seconds
syscall: 25000000 times in 8.786581518 seconds
A small fixup to arch/arm64/include/asm/mman.h was required to avoid
pulling kernel code into the vDSO, similar to what's already done in
arch/arm64/include/asm/rwonce.h.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Hook up the generic vDSO implementation to the LoongArch vDSO data page
by providing the required __arch_chacha20_blocks_nostack,
__arch_get_k_vdso_rng_data, and getrandom_syscall implementations. Also
wire up the selftests.
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Acked-by: Huacai Chen <chenhuacai@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Architectures use different location for vDSO sources:
arch/mips/vdso
arch/sparc/vdso
arch/arm64/kernel/vdso
arch/riscv/kernel/vdso
arch/csky/kernel/vdso
arch/x86/um/vdso
arch/x86/entry/vdso
arch/powerpc/kernel/vdso
arch/arm/vdso
arch/loongarch/vdso
Don't hard-code vdso sources location in selftest Makefile. Instead
create a vdso/ symbolic link in tools/arch/$arch/ and update Makefile
accordingly.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
To pick up changes from:
9ef54a3845 arm64: cputype: Add Cortex-A725 definitions
58d245e03c arm64: cputype: Add Cortex-X1C definitions
fd2ff5f0b3 arm64: cputype: Add Cortex-X925 definitions
add332c403 arm64: cputype: Add Cortex-A720 definitions
be5a6f2387 arm64: cputype: Add Cortex-X3 definitions
This should be used to beautify x86 syscall arguments and it addresses
these tools/perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/arch/arm64/include/asm/cputype.h arch/arm64/include/asm/cputype.h
Please see tools/include/uapi/README for details (it's in the first patch
of this series).
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
To pick up changes from:
149fd4712b perf/x86/intel: Support Perfmon MSRs aliasing
21b362cc76 x86/resctrl: Enable shared RMID mode on Sub-NUMA Cluster (SNC) systems
4f460bff7b cpufreq: acpi: move MSR_K7_HWCR_CPB_DIS_BIT into msr-index.h
7ea81936b8 x86/cpufeatures: Add HWP highest perf change feature flag
78ce84b9e0 x86/cpufeatures: Flip the /proc/cpuinfo appearance logic
1beb348d5c x86/sev: Provide SVSM discovery support
This should be used to beautify x86 syscall arguments and it addresses
these tools/perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
Please see tools/include/uapi/README for details (it's in the first patch
of this series).
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
And other arch-specific UAPI headers to pick up changes from:
4b23e0c199 KVM: Ensure new code that references immediate_exit gets extra scrutiny
85542adb65 KVM: x86: Add KVM_RUN_X86_GUEST_MODE kvm_run flag
6fef518594 KVM: x86: Add a capability to configure bus frequency for APIC timer
34ff659017 x86/sev: Use kernel provided SVSM Calling Areas
5dcc1e7614 Merge tag 'kvm-x86-misc-6.11' of https://github.com/kvm-x86/linux into HEAD
9a0d2f4995 KVM: PPC: Book3S HV: Add one-reg interface for HASHPKEYR register
e9eb790b25 KVM: PPC: Book3S HV: Add one-reg interface for HASHKEYR register
1a1e6865f5 KVM: PPC: Book3S HV: Add one-reg interface for DEXCR register
This should be used to beautify KVM syscall arguments and it addresses
these tools/perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h
diff -u tools/arch/powerpc/include/uapi/asm/kvm.h arch/powerpc/include/uapi/asm/kvm.h
Please see tools/include/uapi/README for details (it's in the first patch
of this series).
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
For parsing the cpuid bitfields, kcpuid uses an incomplete CSV file with
300+ bitfields.
Use an auto-generated CSV file from the x86-cpuid.org project instead.
It provides complete bitfields coverage: 830+ bitfields, all with proper
descriptions.
The auto-generated file has the following blurb automatically added:
# SPDX-License-Identifier: CC0-1.0
# Generator: x86-cpuid-db v1.0
The generator tag includes the project's workspace "git describe"
version string. It is intended for projects like KernelCI, to aid in
verifying that the auto-generated files have not been tampered with.
The file also has the blurb:
# Auto-generated file.
# Please submit all updates and bugfixes to https://x86-cpuid.org
It's thus kindly requested that the Linux kernel's x86 tree maintainers
enforce sending all updates to x86-cpuid.org's upstream database first,
thus benefiting the whole ecosystem.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://gitlab.com/x86-cpuid.org/x86-cpuid-db/-/blob/v1.0/LICENSE.rst
Link: https://gitlab.com/x86-cpuid.org/x86-cpuid-db
Link: https://lore.kernel.org/all/20240718134755.378115-9-darwi@linutronix.de
It's a common pattern in cpuid leaves to have the same bitfields format
repeated across a number of subleaves. Typically, this is used for
enumerating hierarchial structures like cache and TLB levels, CPU
topology levels, etc.
Modify kcpuid.c to handle subleaf ranges in the CSV file subleaves
column. For example, make it able to parse lines in the form:
# LEAF, SUBLEAVES, reg, bits, short_name , ...
0xb, 1:0, eax, 4:0, x2apic_id_shift , ...
0xb, 1:0, ebx, 15:0, domain_lcpus_count , ...
0xb, 1:0, ecx, 7:0, domain_nr , ...
This way, full output can be printed to the user.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240718134755.378115-8-darwi@linutronix.de
cpuid.csv will be extended in further commits with all-publicly-known
CPUID leaves and bitfields. Thus, modify has_subleafs() to identify all
known leaves with subleaves.
Remove the redundant "is_amd" check since all x86 vendors already report
the maxium supported extended leaf at leaf 0x80000000 EAX register.
The extra mentioned leaves are:
- Leaf 0x12, Intel Software Guard Extensions (SGX) enumeration
- Leaf 0x14, Intel process trace (PT) enumeration
- Leaf 0x17, Intel SoC vendor attributes enumeration
- Leaf 0x1b, Intel PCONFIG (Platform configuration) enumeration
- Leaf 0x1d, Intel AMX (Advanced Matrix Extensions) tile information
- Leaf 0x1f, Intel v2 extended topology enumeration
- Leaf 0x23, Intel ArchPerfmonExt (Architectural PMU ext) enumeration
- Leaf 0x80000020, AMD Platform QoS extended features enumeration
- Leaf 0x80000026, AMD v2 extended topology enumeration
Set the 'max_subleaf' variable for all the newly marked leaves with extra
subleaves. Ideally, this should be fetched from the CSV file instead,
but the current kcpuid code architecture has two runs: one run to
serially invoke the cpuid instructions and save all the output in-memory,
and one run to parse this in-memory output through the CSV specification.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240718134755.378115-7-darwi@linutronix.de
While parsing and saving bitfield names from the CSV file, an extra
leading space is copied verbatim. That extra space is not a big issue
now, but further commits will add a new CSV file with much more padding
for the bitfield's name column.
Strip leading/trailing whitespaces while saving bitfield names.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240718134755.378115-6-darwi@linutronix.de
Protect against the kcpuid code parsing faulty max subleaf numbers
through a min() expression. Thus, ensuring that max_subleaf will always
be ≤ MAX_SUBLEAF_NUM.
Use "u32" for the subleaf numbers since kcpuid is compiled with -Wextra,
which includes signed/unsigned comparisons warnings.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240718134755.378115-5-darwi@linutronix.de
cpuid.csv will be extended in further commits with all-publicly-known
CPUID leaves and bitfields. One of the new leaves is 0xd for extended
CPU state enumeration. Depending on XCR0 dword bits, it can export up to
64 subleaves.
Set kcpuid.c MAX_SUBLEAF_NUM to 64.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240718134755.378115-4-darwi@linutronix.de
When kcpuid is invoked with "--all --details", the detailed description
column is not properly aligned for all bitfield rows:
CPUID_0x4_ECX[0x0]:
cache_level : 0x1 - Cache Level ...
cache_self_init - Cache Self Initialization
This is due to differences in output handling between boolean single-bit
"bitflags" and multi-bit bitfields. For the former, the bitfield's value
is not outputted as it is implied to be true by just outputting the
bitflag's name in its respective line.
If long descriptions were requested through the --all parameter, properly
align the bitflag's description columns through extra tabs. With that,
the sample output above becomes:
CPUID_0x4_ECX[0x0]:
cache_level : 0x1 - Cache Level ...
cache_self_init - Cache Self Initialization
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240718134755.378115-3-darwi@linutronix.de
Most of this is part of my ongoing work to clean up the system call
tables. In this bit, all of the newer architectures are converted to
use the machine readable syscall.tbl format instead in place of complex
macros in include/uapi/asm-generic/unistd.h.
This follows an earlier series that fixed various API mismatches
and in turn is used as the base for planned simplifications.
The other two patches are dead code removal and a warning fix.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmaVB1cACgkQYKtH/8kJ
UicMqxAAnYKOxfjoMIhYYK6bl126wg/vIcDcjIR9cNWH21Nhn3qxn11ZXau3S7xv
3l/HreEhyEQr4gC2a70IlXyHUadYOlrk+83OURrunWk1oKPmZlMKcfPVbtp8GL7x
PUNXQfwM1XZLveKwufY24hoZdwKC+Y/5WLc1t0ReznJuAqgeO2rM9W5dnV5bAfCp
he3F5hFcr196Dz3/GJjJIWrY+cbwfmZWsNtj1vFTL5/r/LuCu8HTkqhsGj8tE5BJ
NGVEEXbp5eaVTCIGqJWhnuZcsnKN9kM51M7CtdwWf8OTckUVuJap5OsDVKQkWkGl
bLPbd2jhDltph0sah51hAIvv4WdkThW76u9FRW7KR3fo7ra67eF7l5j7wc1lE2JB
GwLJ1X56Bxe1GhvvNTlDmb7DrnlP/DMPuRv3Z6xyH6l8iZ2pMGlnAxuw6Bs1s6Y5
WSs36ZpnS0ctgjfx37ZITsZSvbKFPpQFJP4siwS8aRNv/NFALNNdFyOCY5lNzspZ
0dxwjn6/7UpHE4MKh6/hvCg2QwupXXBTRytibw+75/rOsR+EYlmtuONtyq2sLUHe
ktJ5pg+8XuZm27+wLffuluzmY7sv2F8OU4cTYeM60Ynmc6pRzwUY6/VhG52S1/mU
Ua4VgYIpzOtlLrYmz5QTWIZpdSFSVbIc/3pLriD6hn4Mvg+BwdA=
=XOhL
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"Most of this is part of my ongoing work to clean up the system call
tables. In this bit, all of the newer architectures are converted to
use the machine readable syscall.tbl format instead in place of
complex macros in include/uapi/asm-generic/unistd.h.
This follows an earlier series that fixed various API mismatches and
in turn is used as the base for planned simplifications.
The other two patches are dead code removal and a warning fix"
* tag 'asm-generic-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
vmlinux.lds.h: catch .bss..L* sections into BSS")
fixmap: Remove unused set_fixmap_offset_io()
riscv: convert to generic syscall table
openrisc: convert to generic syscall table
nios2: convert to generic syscall table
loongarch: convert to generic syscall table
hexagon: use new system call table
csky: convert to generic syscall table
arm64: rework compat syscall macros
arm64: generate 64-bit syscall.tbl
arm64: convert unistd_32.h to syscall.tbl format
arc: convert to generic syscall table
clone3: drop __ARCH_WANT_SYS_CLONE3 macro
kbuild: add syscall table generation to scripts/Makefile.asm-headers
kbuild: verify asm-generic header list
loongarch: avoid generating extra header files
um: don't generate asm/bpf_perf_event.h
csky: drop asm/gpio.h wrapper
syscalls: add generic scripts/syscall.tbl
When clone3() was introduced, it was not obvious how each architecture
deals with setting up the stack and keeping the register contents in
a fork()-like system call, so this was left for the architecture
maintainers to implement, with __ARCH_WANT_SYS_CLONE3 defined by those
that already implement it.
Five years later, we still have a few architectures left that are missing
clone3(), and the macro keeps getting in the way as it's fundamentally
different from all the other __ARCH_WANT_SYS_* macros that are meant
to provide backwards-compatibility with applications using older
syscalls that are no longer provided by default.
Address this by reversing the polarity of the macro, adding an
__ARCH_BROKEN_SYS_CLONE3 macro to all architectures that don't
already provide the syscall, and remove __ARCH_WANT_SYS_CLONE3
from all the other ones.
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
So far the Makefile just installed the csv into $(HWDATADIR)/cpuid.csv, which
made it unaware about $DESTDIR. Add $DESTDIR to the install command and while
at it also create the directory, should it not exist already. This eases the
packaging of kcpuid and allows i.e. for the install on Arch to look like this:
$ make BINDIR=/usr/bin DESTDIR="$pkgdir" -C tools/arch/x86/kcpuid install
Some background on DESTDIR:
DESTDIR is commonly used in packaging for staged installs (regardless of the
used package manager):
https://www.gnu.org/prep/standards/html_node/DESTDIR.html
So the package is built and installed into a directory which the package
manager later picks up and creates some archive from it.
What is specific to Arch Linux here is only the usage of $pkgdir in the
example, DESTDIR itself is widely used.
[ bp: Extend the commit message with Christian's info on DESTDIR as a GNU
coding standards thing. ]
Signed-off-by: Christian Heusel <christian@heusel.eu>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240531111757.719528-2-christian@heusel.eu
To get the changes in:
0ce85db6c2 ("arm64: cputype: Add Neoverse-V3 definitions")
02a0a04676 ("arm64: cputype: Add Cortex-X4 definitions")
f4d9d9dcc7 ("arm64: Add Neoverse-V2 part")
That makes this perf source code to be rebuilt:
CC /tmp/build/perf-tools/util/arm-spe.o
The changes in the above patch add MIDR_NEOVERSE_V[23] and
MIDR_NEOVERSE_V1 is used in arm-spe.c, so probably we need to add those
and perhaps MIDR_CORTEX_X4 to that array? Or maybe we need to leave this
for later when this is all tested on those machines?
static const struct midr_range neoverse_spe[] = {
MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1),
MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V1),
{},
};
Mark Rutland recommended about arm-spe.c:
"I would not touch this for now -- someone would have to go audit the
TRMs to check that those other cores have the same encoding, and I think
it'd be better to do that as a follow-up."
That addresses this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/arm64/include/asm/cputype.h arch/arm64/include/asm/cputype.h
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Besar Wicaksono <bwicaksono@nvidia.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/lkml/Zl8cYk0Tai2fs7aM@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
4af663c2f6 ("KVM: SEV: Allow per-guest configuration of GHCB protocol version")
4f5defae70 ("KVM: SEV: introduce KVM_SEV_INIT2 operation")
26c44aa9e0 ("KVM: SEV: define VM types for SEV and SEV-ES")
ac5c48027b ("KVM: SEV: publish supported VMSA features")
651d61bc8b ("KVM: PPC: Fix documentation for ppc mmu caps")
That don't change functionality in tools/perf, as no new ioctl is added
for the 'perf trace' scripts to harvest.
This addresses these perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Roth <michael.roth@amd.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/lkml/ZlYxAdHjyAkvGtMW@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes from these csets:
53bc516ade ("x86/msr: Move ARCH_CAP_XAPIC_DISABLE bit definition to its rightful place")
That patch just move definitions around, so this just silences this perf
build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Link: https://lore.kernel.org/lkml/ZlYe8jOzd1_DyA7X@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
- Add Kan Liang to MAINTAINERS as a perf tools reviewer.
- Add support for using the 'capstone' disassembler library in various tools,
such as 'perf script' and 'perf annotate'. This is an alternative for the
use of the 'xed' and 'objdump' disassemblers.
- Data-type profiling improvements:
Resolve types for a->b->c by backtracking the assignments until it finds
DWARF info for one of those members
Support for global variables, keeping a cache to speed up lookups.
Handle the 'call' instruction, dealing with effects on registers and handling
its return when tracking register data types.
Handle x86's segment based addressing like %gs:0x28, to support things like
per CPU variables, the stack canary, etc.
Data-type profiling got big speedups when using capstone for disassembling.
The objdump outoput parsing method is left as a fallback when capstone fails or
isn't available. There are patches posted for 6.11 that to use a LLVM
disassembler.
Support event group display in the TUI when annotating types with --data-type,
for instance to show memory load and store events for the data type fields.
Optimize the 'perf annotate' data structures, reducing memory usage.
Add a initial 'perf test' for 'perf annotate', checking that a target symbol
appears on the output, specifying objdump via the command line, etc.
- Integrate the shellcheck utility with the build of perf to allow catching
shell problems early in areas such as 'perf test', 'perf trace' scrape
scripts, etc.
- Add 'uretprobe' variant in the 'perf bench uprobe' tool.
- Add script to run instances of 'perf script' in parallel.
- Allow parsing tracepoint names that start with digits, such as
9p/9p_client_req, etc. Make sure 'perf test' tests it even on systems
where those tracepoints aren't available.
Vendor Events:
- Update Intel JSON files for Cascade Lake X, Emerald Rapids, Grand Ridge, Ice
Lake X, Lunar Lake, Meteor Lake, Sapphire Rapids, Sierra Forest, Sky Lake X,
Sky Lake and Snow Ridge X. Remove info metrics erroneously in TopdownL1.
- Add AMD's Zen 5 core and uncore events and metrics. Those come from the
"Performance Monitor Counters for AMD Family 1Ah Model 00h- 0Fh Processors"
document, with events that capture information on op dispatch, execution and
retirement, branch prediction, L1 and L2 cache activity, TLB activity, etc.
- Mark L1D_CACHE_INVAL impacted by errata for ARM64's AmpereOne/AmpereOneX.
Miscellaneous:
- Sync header copies with the kernel sources.
- Move some header copies used only for generating translation string tables
for ioctl cmds and other syscall integer arguments to a new directory under
tools/perf/beauty/, to separate from copies in tools/include/ that are used
to build the tools.
- Introduce scrape script for several syscall 'flags'/'mask' arguments.
- Improve cpumap utilization, fixing up pairing of refcounts, using the right
iterators (perf_cpu_map__for_each_cpu), etc.
- Give more details about raw event encodings in 'perf list', show tracepoint
encoding in the detailed output.
- Refactor the DSOs handling code, reducing memory usage.
- Document the BPF event modifier and add a 'perf test' for it.
- Improve the event parser, better error messages and add further 'perf test's
for it.
- Add reference count checking to 'struct comm_str' and 'struct mem_info'.
- Make ARM64's 'perf test' entries for the Neoverse N1 more robust.
- Tweak the ARM64's Coresight 'perf test's.
- Improve ARM64's CoreSight ETM version detection and error reporting.
- Fix handling of symbols when using kcore.
- Fix PAI (Processor Activity Instrumentation) counter names for s390 virtual
machines in 'perf report'.
- Fix -g/--call-graph option failure in 'perf sched timehist'.
- Add LIBTRACEEVENT_DIR build option to allow building with libtraceevent
installed in non-standard directories, such as when doing cross builds.
- Various 'perf test' and 'perf bench' fixes.
- Improve 'perf probe' error message for long C++ probe names.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZkzdjgAKCRCyPKLppCJ+
J8ZYAP46rcbOuDWol6tjD9FDXd+spkWc40bnqeSnOR+TWlmJXwEA87XU4+3LAh6p
HQxKXehJRh90I90yn954mK2NuN+58Q0=
=sie+
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Arnaldo Carvalho de Melo:
"General:
- Integrate the shellcheck utility with the build of perf to allow
catching shell problems early in areas such as 'perf test', 'perf
trace' scrape scripts, etc
- Add 'uretprobe' variant in the 'perf bench uprobe' tool
- Add script to run instances of 'perf script' in parallel
- Allow parsing tracepoint names that start with digits, such as
9p/9p_client_req, etc. Make sure 'perf test' tests it even on
systems where those tracepoints aren't available
- Add Kan Liang to MAINTAINERS as a perf tools reviewer
- Add support for using the 'capstone' disassembler library in
various tools, such as 'perf script' and 'perf annotate'. This is
an alternative for the use of the 'xed' and 'objdump' disassemblers
Data-type profiling improvements:
- Resolve types for a->b->c by backtracking the assignments until it
finds DWARF info for one of those members
- Support for global variables, keeping a cache to speed up lookups
- Handle the 'call' instruction, dealing with effects on registers
and handling its return when tracking register data types
- Handle x86's segment based addressing like %gs:0x28, to support
things like per CPU variables, the stack canary, etc
- Data-type profiling got big speedups when using capstone for
disassembling. The objdump outoput parsing method is left as a
fallback when capstone fails or isn't available. There are patches
posted for 6.11 that to use a LLVM disassembler
- Support event group display in the TUI when annotating types with
--data-type, for instance to show memory load and store events for
the data type fields
- Optimize the 'perf annotate' data structures, reducing memory usage
- Add a initial 'perf test' for 'perf annotate', checking that a
target symbol appears on the output, specifying objdump via the
command line, etc
Vendor Events:
- Update Intel JSON files for Cascade Lake X, Emerald Rapids, Grand
Ridge, Ice Lake X, Lunar Lake, Meteor Lake, Sapphire Rapids, Sierra
Forest, Sky Lake X, Sky Lake and Snow Ridge X. Remove info metrics
erroneously in TopdownL1
- Add AMD's Zen 5 core and uncore events and metrics. Those come from
the "Performance Monitor Counters for AMD Family 1Ah Model 00h- 0Fh
Processors" document, with events that capture information on op
dispatch, execution and retirement, branch prediction, L1 and L2
cache activity, TLB activity, etc
- Mark L1D_CACHE_INVAL impacted by errata for ARM64's AmpereOne/
AmpereOneX
Miscellaneous:
- Sync header copies with the kernel sources
- Move some header copies used only for generating translation string
tables for ioctl cmds and other syscall integer arguments to a new
directory under tools/perf/beauty/, to separate from copies in
tools/include/ that are used to build the tools
- Introduce scrape script for several syscall 'flags'/'mask'
arguments
- Improve cpumap utilization, fixing up pairing of refcounts, using
the right iterators (perf_cpu_map__for_each_cpu), etc
- Give more details about raw event encodings in 'perf list', show
tracepoint encoding in the detailed output
- Refactor the DSOs handling code, reducing memory usage
- Document the BPF event modifier and add a 'perf test' for it
- Improve the event parser, better error messages and add further
'perf test's for it
- Add reference count checking to 'struct comm_str' and 'struct
mem_info'
- Make ARM64's 'perf test' entries for the Neoverse N1 more robust
- Tweak the ARM64's Coresight 'perf test's
- Improve ARM64's CoreSight ETM version detection and error reporting
- Fix handling of symbols when using kcore
- Fix PAI (Processor Activity Instrumentation) counter names for s390
virtual machines in 'perf report'
- Fix -g/--call-graph option failure in 'perf sched timehist'
- Add LIBTRACEEVENT_DIR build option to allow building with
libtraceevent installed in non-standard directories, such as when
doing cross builds
- Various 'perf test' and 'perf bench' fixes
- Improve 'perf probe' error message for long C++ probe names"
* tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (260 commits)
tools lib subcmd: Show parent options in help
perf pmu: Count sys and cpuid JSON events separately
perf stat: Don't display metric header for non-leader uncore events
perf annotate-data: Ensure the number of type histograms
perf annotate: Fix segfault on sample histogram
perf daemon: Fix file leak in daemon_session__control
libsubcmd: Fix parse-options memory leak
perf lock: Avoid memory leaks from strdup()
perf sched: Rename 'switches' column header to 'count' and add usage description, options for latency
perf tools: Ignore deleted cgroups
perf parse: Allow tracepoint names to start with digits
perf parse-events: Add new 'fake_tp' parameter for tests
perf parse-events: pass parse_state to add_tracepoint
perf symbols: Fix ownership of string in dso__load_vmlinux()
perf symbols: Update kcore map before merging in remaining symbols
perf maps: Re-use __maps__free_maps_by_name()
perf symbols: Remove map from list before updating addresses
perf tracepoint: Don't scan all tracepoints to test if one exists
perf dwarf-aux: Fix build with HAVE_DWARF_CFI_SUPPORT
perf thread: Fixes to thread__new() related to initializing comm
...
- Extend the x86 instruction decoder with APX and
other new instructions
- Misc cleanups
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmZIa8ERHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1gHbw//Zet6K5cbgp5QB570J+rdDyViAl+spYxt
sbWk8CUg0/jk5oSo45psl9xR8mmSPpeEOpTsuJPzEGfbunvTLU8G6HV/l1EDAk8I
Yeia3zLvssTfsirIfSck6spSDRmCRQTiKWibj2mlXSFlXRuVXiIKmbSYZGyx4vk6
5zkKuC6+k77X1qlWYCl9M9Sn0nWr/oEuXPXotliDqhev/DdhP5iBniKHEhkzUOEn
KHtfFTu0B4GbTC1w3hZ3Dmbqz3nrdXf56Py1Vf/uMyzP3UhuE0vE+tC4h7TnfZf6
LBTLEpw+K4KRuppcI2PbEMvzfMT41rtx7S8u83gzKIBhqrfSm1L6OSi8UEOph68G
+p1IS1H4c4woY+0JefaFLiTeweuws4L45PiNNa4qnQp9HX/3G3bTt+kc1vddbfjg
x7pnIntSDKwLtKfo5GYJ+OtTfKQRC13dQroLujsmFa0/me3MbFao+i50UlAoWWBa
1qSCsJpSpGAhYlchxBVfitiiLVpGU7+O39m6ZosA6n2HGSpfgfW1p3xigaPYRISq
GcedKmx8lIThe483T0Y8/Bk2QtCeVCryZb9Qij3B2NKFttlNJaGx/iabE2AuLheY
qnEEQ5UqYgrXEJz1Vu/QqR5Yb9dqkC2MID8llawK66M+kH91cXSXg7RcBEkoLBF4
eT9AuGGWMp4=
=mmyf
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-2024-05-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event updates from Ingo Molnar:
- Extend the x86 instruction decoder with APX and
other new instructions
- Misc cleanups
* tag 'perf-urgent-2024-05-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/cstate: Remove unused 'struct perf_cstate_msr'
perf/x86/rapl: Rename 'maxdie' to nr_rapl_pmu and 'dieid' to rapl_pmu_idx
x86/insn: Add support for APX EVEX instructions to the opcode map
x86/insn: Add support for APX EVEX to the instruction decoder logic
x86/insn: x86/insn: Add support for REX2 prefix to the instruction decoder opcode map
x86/insn: Add support for REX2 prefix to the instruction decoder logic
x86/insn: Add misc new Intel instructions
x86/insn: Add VEX versions of VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS
x86/insn: Fix PUSH instruction in x86 instruction decoder opcode map
x86/insn: Add Key Locker instructions to the opcode map
Highlights:
- New drivers/platform/arm64 directory for arm64 embedded-controller drivers
- New drivers for:
- Acer Aspire 1 embedded controllers (for arm64 models)
- ACPI quickstart PNP0C32 buttons
- Dell All-In-One backlight support (dell-uart-backlight)
- Lenovo WMI camera buttons
- Lenovo Yoga Tablet 2 Pro 1380F/L fast charging
- MeeGoPad ANX7428 Type-C Cross Switch (power sequencing only)
- MSI WMI sensors (fan speed sensors only for now)
- Asus WMI:
- 2024 ROG Mini-LED support
- MCU powersave support
- Vivobook GPU MUX support
- Misc. other improvements
- Ideapad laptop:
- Export FnLock LED as LED class device
- Switch platform profiles using thermal management key
- Intel drivers:
- IFS: various improvements
- PMC: Lunar Lake support
- SDSI: various improvements
- TPMI/ISST: various improvements
- tools: intel-speed-select: various improvements
- MS Surface drivers:
- Fan profile switching support
- Surface Pro thermal sensors support
- ThinkPad ACPI:
- Reworked hotkey support to use sparse keymaps
- Add support for new trackpoint-doubletap, Fn+N and Fn+G hotkeys
- WMI core:
- New WMI driver development guide
- x86 Android tablets:
- Lenovo Yoga Tablet 2 Pro 1380F/L support
- Xiaomi MiPad 2 status LED and bezel touch buttons backlight support
- Miscellaneous cleanups / fixes / improvements
The following is an automated git shortlog grouped by driver:
ACPI:
- platform-profile: add platform_profile_cycle()
Add ACPI quickstart button (PNP0C32) driver:
- Add ACPI quickstart button (PNP0C32) driver
Add lenovo-yoga-tab2-pro-1380-fastcharger driver:
- Add lenovo-yoga-tab2-pro-1380-fastcharger driver
Add new Dell UART backlight driver:
- Add new Dell UART backlight driver
Add lenovo WMI camera button driver:
- Add lenovo WMI camera button driver
Add new MeeGoPad ANX7428 Type-C Cross Switch driver:
- Add new MeeGoPad ANX7428 Type-C Cross Switch driver
ISST:
- Support SST-BF and SST-TF per level
- Add missing MODULE_DESCRIPTION
- Add dev_fmt
- Use in_range() to check package ID validity
- Support partitioned systems
- Shorten the assignments for power_domain_info
- Use local variable for auxdev->dev
MAINTAINERS:
- drop Daniel Oliveira Nascimento
arm64:
- dts: qcom: acer-aspire1: Add embedded controller
asus-laptop:
- Use sysfs_emit() and sysfs_emit_at() to replace sprintf()
asus-wmi:
- cleanup main struct to avoid some holes
- Add support for MCU powersave
- ROG Ally increase wait time, allow MCU powersave
- adjust formatting of ppt-<name>() functions
- store a min default for ppt options
- support toggling POST sound
- add support variant of TUF RGB
- add support for Vivobook GPU MUX
- add support for 2024 ROG Mini-LED
- use sysfs_emit() instead of sprintf()
classmate-laptop:
- Add missing MODULE_DESCRIPTION()
devm-helpers:
- Fix a misspelled cancellation in the comments
dt-bindings:
- leds: Add LED_FUNCTION_FNLOCK
- platform: Add Acer Aspire 1 EC
hp-wmi:
- use sysfs_emit() instead of sprintf()
huawei-wmi:
- use sysfs_emit() instead of sprintf()
ideapad-laptop:
- switch platform profiles using thermal management key
- add FnLock LED class device
- add fn_lock_get/set functions
intel-vbtn:
- Log event code on unexpected button events
intel/pmc:
- Enable S0ix blocker show in Lunar Lake
- Add support to show S0ix blocker counter
- Update LNL signal status map
msi-laptop:
- Use sysfs_emit() to replace sprintf()
p2sb:
- Don't init until unassigned resources have been assigned
- Make p2sb_get_devfn() return void
platform:
- arm64: Add Acer Aspire 1 embedded controller driver
- Add ARM64 platform directory
platform/surface:
- aggregator: Log critical errors during SAM probing
- aggregator_registry: Add support for thermal sensors on the Surface Pro 9
- platform_profile: add fan profile switching
platform/x86/amd:
- pmc: Add new ACPI ID AMDI000B
- pmf: Add new ACPI ID AMDI0105
platform/x86/amd/hsmp:
- switch to use device_add_groups()
platform/x86/amd/pmc:
- Fix implicit declaration error on i386
- Add AMD MP2 STB functionality
platform/x86/fujitsu-laptop:
- Replace sprintf() with sysfs_emit()
platform/x86/intel-uncore-freq:
- Don't present root domain on error
platform/x86/intel/ifs:
- Disable irq during one load stage
- trace: display batch num in hex
- Classify error scenarios correctly
platform/x86/intel/pmc:
- Fix PCH names in comments
platform/x86/intel/sdsi:
- Add attribute to read the current meter state
- Add in-band BIOS lock support
- Combine read and write mailbox flows
- Set message size during writes
platform/x86/intel/tpmi:
- Add additional TPMI header fields
- Align comments in kernel-doc
- Check major version change for TPMI Information
- Handle error from tpmi_process_info()
quickstart:
- Fix race condition when reporting input event
- fix Kconfig selects
- Miscellaneous improvements
samsung-laptop:
- Use sysfs_emit() to replace the old interface sprintf()
think-lmi:
- Convert container_of() macros to static inline
thinkpad_acpi:
- Use false to set acpi_send_ev to false
- Support hotkey to disable trackpoint doubletap
- Support for system debug info hotkey
- Support for trackpoint doubletap
- Simplify known_ev handling
- Add mappings for adaptive kbd clipping-tool and cloud keys
- Switch to using sparse-keymap helpers
- Drop KEY_RESERVED special handling
- Use correct keycodes for volume and brightness keys
- Change hotkey_reserved_mask initialization
- Do not send ACPI netlink events for unknown hotkeys
- Move tpacpi_driver_event() call to tpacpi_input_send_key()
- Move hkey > scancode mapping to tpacpi_input_send_key()
- Drop tpacpi_input_send_key_masked() and hotkey_driver_event()
- Always call tpacpi_driver_event() for hotkeys
- Move hotkey_user_mask check to tpacpi_input_send_key()
- Move special original hotkeys handling out of switch-case
- Move adaptive kbd event handling to tpacpi_driver_event()
- Make tpacpi_driver_event() return if it handled the event
- Do hkey to scancode translation later
- Use tpacpi_input_send_key() in adaptive kbd code
- Drop ignore_acpi_ev
- Drop setting send_/ignore_acpi_ev defaults twice
- Provide hotkey_poll_stop_sync() dummy
- Take hotkey_mutex during hotkey_exit()
- change sprintf() to sysfs_emit()
- use platform_profile_cycle()
tools arch x86:
- Add dell-uart-backlight-emulator
tools/arch/x86/intel_sdsi:
- Add current meter support
- Simplify ascii printing
- Fix meter_certificate decoding
- Fix meter_show display
- Fix maximum meter bundle length
tools/power/x86/intel-speed-select:
- v1.19 release
- Display CPU as None for -1
- SST BF/TF support per level
- Increase number of CPUs displayed
- Present all TRL levels for turbo-freq
- Fix display for unsupported levels
- Support multiple dies
- Increase die count
toshiba_acpi:
- Add quirk for buttons on Z830
uv_sysfs:
- use sysfs_emit() instead of sprintf()
wmi:
- Add MSI WMI Platform driver
- Add driver development guide
- Mark simple WMI drivers as legacy-free
- Avoid returning AE_OK upon unknown error
- Support reading/writing 16 bit EC values
x86-android-tablets:
- Create LED device for Xiaomi Pad 2 bottom bezel touch buttons
- Xiaomi pad2 RGB LED fwnode updates
- Pass struct device to init()
- Add Lenovo Yoga Tablet 2 Pro 1380F/L data
- Unregister devices in reverse order
- Add swnode for Xiaomi pad2 indicator LED
- Use GPIO_LOOKUP() macro
xiaomi-wmi:
- Drop unnecessary NULL checks
- Fix race condition when reporting key events
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmZF1kwUHGhkZWdvZWRl
QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9wSXwgAsaSH6Sawn5sHOj52lQY7gNI0uf3V
YfZFawRpreCrlwLPU2f7SX0mLW+hh+ekQ2C1NvaUUVqQwzONELh0DWSYJpzz/v1r
jD14EcY2dnTv+FVyvCj5jZsiYxo/ViTvthMduiO7rrJKN7aOej9iNn68P0lvcY8s
HDJ2lPFNGnY01snz3C1NyjyIWw8YsfwqXEqOmhrDyyoKLXpsDs8H/Jqq5yXfeLax
hSpjbGB85EGJPXna6Ux5TziPh/MYMtF1+8R4Fn0sGvfcZO6/H1fDne0uI9UwrKnN
d2g4VHXU2DIhTshUc14YT2AU27eQiZVN+J3VpuYIbC9cmlQ2F6bjN3uxoQ==
=UWbu
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Hans de Goede:
- New drivers/platform/arm64 directory for arm64 embedded-controller
drivers
- New drivers:
- Acer Aspire 1 embedded controllers (for arm64 models)
- ACPI quickstart PNP0C32 buttons
- Dell All-In-One backlight support (dell-uart-backlight)
- Lenovo WMI camera buttons
- Lenovo Yoga Tablet 2 Pro 1380F/L fast charging
- MeeGoPad ANX7428 Type-C Cross Switch (power sequencing only)
- MSI WMI sensors (fan speed sensors only for now)
- Asus WMI:
- 2024 ROG Mini-LED support
- MCU powersave support
- Vivobook GPU MUX support
- Misc. other improvements
- Ideapad laptop:
- Export FnLock LED as LED class device
- Switch platform profiles using thermal management key
- Intel drivers:
- IFS: various improvements
- PMC: Lunar Lake support
- SDSI: various improvements
- TPMI/ISST: various improvements
- tools: intel-speed-select: various improvements
- MS Surface drivers:
- Fan profile switching support
- Surface Pro thermal sensors support
- ThinkPad ACPI:
- Reworked hotkey support to use sparse keymaps
- Add support for new trackpoint-doubletap, Fn+N and Fn+G hotkeys
- WMI core:
- New WMI driver development guide
- x86 Android tablets:
- Lenovo Yoga Tablet 2 Pro 1380F/L support
- Xiaomi MiPad 2 status LED and bezel touch buttons backlight
support
- Miscellaneous cleanups / fixes / improvements
* tag 'platform-drivers-x86-v6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (128 commits)
platform/x86: Add new MeeGoPad ANX7428 Type-C Cross Switch driver
devm-helpers: Fix a misspelled cancellation in the comments
tools arch x86: Add dell-uart-backlight-emulator
platform/x86: Add new Dell UART backlight driver
platform/x86: x86-android-tablets: Create LED device for Xiaomi Pad 2 bottom bezel touch buttons
platform/x86: x86-android-tablets: Xiaomi pad2 RGB LED fwnode updates
platform/x86: x86-android-tablets: Pass struct device to init()
platform/x86/amd: pmc: Add new ACPI ID AMDI000B
platform/x86/amd: pmf: Add new ACPI ID AMDI0105
platform/x86: p2sb: Don't init until unassigned resources have been assigned
platform/surface: aggregator: Log critical errors during SAM probing
platform/x86: ISST: Support SST-BF and SST-TF per level
platform/x86/fujitsu-laptop: Replace sprintf() with sysfs_emit()
tools/power/x86/intel-speed-select: v1.19 release
tools/power/x86/intel-speed-select: Display CPU as None for -1
tools/power/x86/intel-speed-select: SST BF/TF support per level
tools/power/x86/intel-speed-select: Increase number of CPUs displayed
tools/power/x86/intel-speed-select: Present all TRL levels for turbo-freq
tools/power/x86/intel-speed-select: Fix display for unsupported levels
tools/power/x86/intel-speed-select: Support multiple dies
...
ACPI:
* Support for the Firmware ACPI Control Structure (FACS) signature
feature which is used to reboot out of hibernation on some systems.
Kbuild:
* Support for building Flat Image Tree (FIT) images, where the kernel
Image is compressed alongside a set of devicetree blobs.
Memory management:
* Optimisation of our early page-table manipulation for creation of the
linear mapping.
* Support for userfaultfd write protection, which brings along some nice
cleanups to our handling of invalid but present ptes.
* Extend our use of range TLBI invalidation at EL1.
Perf and PMUs:
* Ensure that the 'pmu->parent' pointer is correctly initialised by PMU
drivers.
* Avoid allocating 'cpumask_t' types on the stack in some PMU drivers.
* Fix parsing of the CPU PMU "version" field in assembly code, as it
doesn't follow the usual architectural rules.
* Add best-effort unwinding support for USER_STACKTRACE
* Minor driver fixes and cleanups.
Selftests:
* Minor cleanups to the arm64 selftests (missing NULL check, unused
variable).
Miscellaneous
* Add a command-line alias for disabling 32-bit application support.
* Add part number for Neoverse-V2 CPUs.
* Minor fixes and cleanups.
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmY+IWkQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNBVNB/9JG4jlmgxzbTDoer0md31YFvWCDGeOKx1x
g3XhE24W5w8eLXnc75p7/tOUKfo0TNWL4qdUs0hJCEUAOSy6a4Qz13bkkkvvBtDm
nnHvEjidx5yprHggocsoTF29CKgHMJ3bt8rJe6g+O3Lp1JAFlXXNgplX5koeaVtm
TtaFvX9MGyDDNkPIcQ/SQTFZJ2Oz51+ik6O8SYuGYtmAcR7MzlxH77lHl2mrF1bf
Jzv/f5n0lS+Gt9tRuFWhbfEm4aKdUlLha4ufzUq42/vJvELboZbG3LqLxRG8DbqR
+HvyZOG/xtu2dbzDqHkRumMToWmwzD4oBGSK4JAoJxeHavEdAvSG
=JMvT
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"The most interesting parts are probably the mm changes from Ryan which
optimise the creation of the linear mapping at boot and (separately)
implement write-protect support for userfaultfd.
Outside of our usual directories, the Kbuild-related changes under
scripts/ have been acked by Masahiro whilst the drivers/acpi/ parts
have been acked by Rafael and the addition of cpumask_any_and_but()
has been acked by Yury.
ACPI:
- Support for the Firmware ACPI Control Structure (FACS) signature
feature which is used to reboot out of hibernation on some systems
Kbuild:
- Support for building Flat Image Tree (FIT) images, where the kernel
Image is compressed alongside a set of devicetree blobs
Memory management:
- Optimisation of our early page-table manipulation for creation of
the linear mapping
- Support for userfaultfd write protection, which brings along some
nice cleanups to our handling of invalid but present ptes
- Extend our use of range TLBI invalidation at EL1
Perf and PMUs:
- Ensure that the 'pmu->parent' pointer is correctly initialised by
PMU drivers
- Avoid allocating 'cpumask_t' types on the stack in some PMU drivers
- Fix parsing of the CPU PMU "version" field in assembly code, as it
doesn't follow the usual architectural rules
- Add best-effort unwinding support for USER_STACKTRACE
- Minor driver fixes and cleanups
Selftests:
- Minor cleanups to the arm64 selftests (missing NULL check, unused
variable)
Miscellaneous:
- Add a command-line alias for disabling 32-bit application support
- Add part number for Neoverse-V2 CPUs
- Minor fixes and cleanups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (64 commits)
arm64/mm: Fix pud_user_accessible_page() for PGTABLE_LEVELS <= 2
arm64/mm: Add uffd write-protect support
arm64/mm: Move PTE_PRESENT_INVALID to overlay PTE_NG
arm64/mm: Remove PTE_PROT_NONE bit
arm64/mm: generalize PMD_PRESENT_INVALID for all levels
arm64: simplify arch_static_branch/_jump function
arm64: Add USER_STACKTRACE support
arm64: Add the arm64.no32bit_el0 command line option
drivers/perf: hisi: hns3: Actually use devm_add_action_or_reset()
drivers/perf: hisi: hns3: Fix out-of-bound access when valid event group
drivers/perf: hisi_pcie: Fix out-of-bound access when valid event group
kselftest: arm64: Add a null pointer check
arm64: defer clearing DAIF.D
arm64: assembler: update stale comment for disable_step_tsk
arm64/sysreg: Update PIE permission encodings
kselftest/arm64: Remove unused parameters in abi test
perf/arm-spe: Assign parents for event_source device
perf/arm-smmuv3: Assign parents for event_source device
perf/arm-dsu: Assign parents for event_source device
perf/arm-dmc620: Assign parents for event_source device
...
Dell All In One (AIO) models released after 2017 use a backlight controller
board connected to an UART.
Add a small emulator to allow development and testing of
the drivers/platform/x86/dell/dell-uart-backlight.c driver for
this board, without requiring access to an actual Dell All In One.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240513144603.93874-3-hdegoede@redhat.com
To support APX functionality, the EVEX prefix is used to:
- promote legacy instructions
- promote VEX instructions
- add new instructions
Promoted VEX instructions require no extra annotation because the opcodes
do not change and the permissive nature of the instruction decoder already
allows them to have an EVEX prefix.
Promoted legacy instructions and new instructions are placed in map 4 which
has not been used before.
Create a new table for map 4 and add APX instructions.
Annotate SCALABLE instructions with "(es)" - refer to patch "x86/insn: Add
support for APX EVEX to the instruction decoder logic". SCALABLE
instructions must be represented in both no-prefix (NP) and 66 prefix
forms.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240502105853.5338-9-adrian.hunter@intel.com
Intel Advanced Performance Extensions (APX) extends the EVEX prefix to
support:
- extended general purpose registers (EGPRs) i.e. r16 to r31
- Push-Pop Acceleration (PPX) hints
- new data destination (NDD) register
- suppress status flags writes (NF) of common instructions
- new instructions
Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
Specification for details.
The extended EVEX prefix does not need amended instruction decoder logic,
except in one area. Some instructions are defined as SCALABLE which means
the EVEX.W bit and EVEX.pp bits are used to determine operand size.
Specifically, if an instruction is SCALABLE and EVEX.W is zero, then
EVEX.pp value 0 (representing no prefix NP) means default operand size,
whereas EVEX.pp value 1 (representing 66 prefix) means operand size
override i.e. 16 bits
Add an attribute (INAT_EVEX_SCALABLE) to identify such instructions, and
amend the logic appropriately.
Amend the awk script that generates the attribute tables from the opcode
map, to recognise "(es)" as attribute INAT_EVEX_SCALABLE.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240502105853.5338-8-adrian.hunter@intel.com
Support for REX2 has been added to the instruction decoder logic and the
awk script that generates the attribute tables from the opcode map.
Add REX2 prefix byte (0xD5) to the opcode map.
Add annotation (!REX2) for map 0/1 opcodes that are reserved under REX2.
Add JMPABS to the opcode map and add annotation (REX2) to identify that it
has a mandatory REX2 prefix. A separate opcode attribute table is not
needed at this time because JMPABS has the same attribute encoding as the
MOV instruction that it shares an opcode with i.e. INAT_MOFFSET.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240502105853.5338-7-adrian.hunter@intel.com
Intel Advanced Performance Extensions (APX) uses a new 2-byte prefix named
REX2 to select extended general purpose registers (EGPRs) i.e. r16 to r31.
The REX2 prefix is effectively an extended version of the REX prefix.
REX2 and EVEX are also used with PUSH/POP instructions to provide a
Push-Pop Acceleration (PPX) hint. With PPX hints, a CPU will attempt to
fast-forward register data between matching PUSH and POP instructions.
REX2 is valid only with opcodes in maps 0 and 1. Similar extension for
other maps is provided by the EVEX prefix, covered in a separate patch.
Some opcodes in maps 0 and 1 are reserved under REX2. One of these is used
for a new 64-bit absolute direct jump instruction JMPABS.
Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
Specification for details.
Define a code value for the REX2 prefix (INAT_PFX_REX2), and add attribute
flags for opcodes reserved under REX2 (INAT_NO_REX2) and to identify
opcodes (only JMPABS) that require a mandatory REX2 prefix
(INAT_REX2_VARIANT).
Amend logic to read the REX2 prefix and get the opcode attribute for the
map number (0 or 1) encoded in the REX2 prefix.
Amend the awk script that generates the attribute tables from the opcode
map, to recognise "REX2" as attribute INAT_PFX_REX2, and "(!REX2)"
as attribute INAT_NO_REX2, and "(REX2)" as attribute INAT_REX2_VARIANT.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240502105853.5338-6-adrian.hunter@intel.com
The x86 instruction decoder is used not only for decoding kernel
instructions. It is also used by perf uprobes (user space probes) and by
perf tools Intel Processor Trace decoding. Consequently, it needs to
support instructions executed by user space also.
Add instructions documented in Intel Architecture Instruction Set
Extensions and Future Features Programming Reference March 2024
319433-052, that have not been added yet:
AADD
AAND
AOR
AXOR
CMPccXADD
PBNDKB
RDMSRLIST
URDMSR
UWRMSR
VBCSTNEBF162PS
VBCSTNESH2PS
VCVTNEEBF162PS
VCVTNEEPH2PS
VCVTNEOBF162PS
VCVTNEOPH2PS
VCVTNEPS2BF16
VPDPB[SU,UU,SS]D[,S]
VPDPW[SU,US,UU]D[,S]
VPMADD52HUQ
VPMADD52LUQ
VSHA512MSG1
VSHA512MSG2
VSHA512RNDS2
VSM3MSG1
VSM3MSG2
VSM3RNDS2
VSM4KEY4
VSM4RNDS4
WRMSRLIST
TCMMIMFP16PS
TCMMRLFP16PS
TDPFP16PS
PREFETCHIT1
PREFETCHIT0
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240502105853.5338-5-adrian.hunter@intel.com
The x86 instruction decoder is used not only for decoding kernel
instructions. It is also used by perf uprobes (user space probes) and by
perf tools Intel Processor Trace decoding. Consequently, it needs to
support instructions executed by user space also.
Intel Architecture Instruction Set Extensions and Future Features manual
number 319433-044 of May 2021, documented VEX versions of instructions
VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS, but the opcode map has them
listed as EVEX only.
Remove EVEX-only (ev) annotation from instructions VPDPBUSD, VPDPBUSDS,
VPDPWSSD and VPDPWSSDS, which allows them to be decoded with either a VEX
or EVEX prefix.
Fixes: 0153d98f2d ("x86/insn: Add misc instructions to x86 instruction decoder")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240502105853.5338-4-adrian.hunter@intel.com
The x86 instruction decoder is used not only for decoding kernel
instructions. It is also used by perf uprobes (user space probes) and by
perf tools Intel Processor Trace decoding. Consequently, it needs to
support instructions executed by user space also.
Opcode 0x68 PUSH instruction is currently defined as 64-bit operand size
only i.e. (d64). That was based on Intel SDM Opcode Map. However that is
contradicted by the Instruction Set Reference section for PUSH in the
same manual.
Remove 64-bit operand size only annotation from opcode 0x68 PUSH
instruction.
Example:
$ cat pushw.s
.global _start
.text
_start:
pushw $0x1234
mov $0x1,%eax # system call number (sys_exit)
int $0x80
$ as -o pushw.o pushw.s
$ ld -s -o pushw pushw.o
$ objdump -d pushw | tail -4
0000000000401000 <.text>:
401000: 66 68 34 12 pushw $0x1234
401004: b8 01 00 00 00 mov $0x1,%eax
401009: cd 80 int $0x80
$ perf record -e intel_pt//u ./pushw
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data ]
Before:
$ perf script --insn-trace=disasm
Warning:
1 instruction trace errors
pushw 10349 [000] 10586.869237014: 401000 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) pushw $0x1234
pushw 10349 [000] 10586.869237014: 401006 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb %al, (%rax)
pushw 10349 [000] 10586.869237014: 401008 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb %cl, %ch
pushw 10349 [000] 10586.869237014: 40100a [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb $0x2e, (%rax)
instruction trace error type 1 time 10586.869237224 cpu 0 pid 10349 tid 10349 ip 0x40100d code 6: Trace doesn't match instruction
After:
$ perf script --insn-trace=disasm
pushw 10349 [000] 10586.869237014: 401000 [unknown] (./pushw) pushw $0x1234
pushw 10349 [000] 10586.869237014: 401004 [unknown] (./pushw) movl $1, %eax
Fixes: eb13296cfa ("x86: Instruction decoder API")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240502105853.5338-3-adrian.hunter@intel.com
The x86 instruction decoder needs to know these new instructions that
are going to be used in the crypto library as well as the x86 core
code. Add the following:
LOADIWKEY:
Load a CPU-internal wrapping key.
ENCODEKEY128:
Wrap a 128-bit AES key to a key handle.
ENCODEKEY256:
Wrap a 256-bit AES key to a key handle.
AESENC128KL:
Encrypt a 128-bit block of data using a 128-bit AES key
indicated by a key handle.
AESENC256KL:
Encrypt a 128-bit block of data using a 256-bit AES key
indicated by a key handle.
AESDEC128KL:
Decrypt a 128-bit block of data using a 128-bit AES key
indicated by a key handle.
AESDEC256KL:
Decrypt a 128-bit block of data using a 256-bit AES key
indicated by a key handle.
AESENCWIDE128KL:
Encrypt 8 128-bit blocks of data using a 128-bit AES key
indicated by a key handle.
AESENCWIDE256KL:
Encrypt 8 128-bit blocks of data using a 256-bit AES key
indicated by a key handle.
AESDECWIDE128KL:
Decrypt 8 128-bit blocks of data using a 128-bit AES key
indicated by a key handle.
AESDECWIDE256KL:
Decrypt 8 128-bit blocks of data using a 256-bit AES key
indicated by a key handle.
The detail can be found in Intel Software Developer Manual.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20240502105853.5338-2-adrian.hunter@intel.com
Add support to read the 'meter_current' file. The display is the same as
the 'meter_certificate', but will show the current snapshot of the
counters.
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20240411025856.2782476-10-david.e.box@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Add #define for feature length and move NUL assignment from callers to
get_feature().
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20240411025856.2782476-9-david.e.box@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Fix errors in the calculation of the start position of the counters and in
the display loop. While here, use a #define for the bundle count and size.
Fixes: 7fdc03a737 ("tools/arch/x86: intel_sdsi: Add support for reading meter certificates")
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20240411025856.2782476-8-david.e.box@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Fixes sdsi_meter_cert_show() to correctly decode and display the meter
certificate output. Adds and displays a missing version field, displays the
ASCII name of the signature, and fixes the print alignment.
Fixes: 7fdc03a737 ("tools/arch/x86: intel_sdsi: Add support for reading meter certificates")
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20240411025856.2782476-7-david.e.box@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
The maximum number of bundles in the meter certificate was set to 8 which
is much less than the maximum. Instead, since the bundles appear at the end
of the file, set it based on the remaining file size from the bundle start
position.
Fixes: 7fdc03a737 ("tools/arch/x86: intel_sdsi: Add support for reading meter certificates")
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20240411025856.2782476-6-david.e.box@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Fix left shift overflow issue when the parameter idx is greater than or
equal to 8 in the calculation of perm in PIRx_ELx_PERM macro.
Fix this by modifying the encoding to use a long integer type.
Signed-off-by: Shiqi Liu <shiqiliu@hust.edu.cn>
Acked-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20240421063328.29710-1-shiqiliu@hust.edu.cn
Signed-off-by: Will Deacon <will@kernel.org>