Add a new hwprobe key "RISCV_HWPROBE_KEY_VENDOR_EXT_SIFIVE_0" which allows
userspace to probe for the new vendor extensions from SiFive. Also, add
new hwprobe for SiFive "xsfvqmaccdod" and "xsfvqmaccqoq" vendor
extensions.
Signed-off-by: Cyan Yang <cyan.yang@sifive.com>
Link: https://lore.kernel.org/r/20250418053239.4351-5-cyan.yang@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Document the support for sifive vendor extensions using the key
RISCV_HWPROBE_KEY_VENDOR_EXT_SIFIVE_0 and two vendor extensions for SiFive
Int8 Matrix Multiplication Instructions using
RISCV_HWPROBE_VENDOR_EXT_XSFVQMACCDOD and
RISCV_HWPROBE_VENDOR_EXT_XSFVQMACCQOQ.
Signed-off-by: Cyan Yang <cyan.yang@sifive.com>
Link: https://lore.kernel.org/r/20250418053239.4351-4-cyan.yang@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add SiFive vendor extension support to the kernel with the target of
"xsfvqmaccdod" and "xsfvqmaccqoq".
Signed-off-by: Cyan Yang <cyan.yang@sifive.com>
Link: https://lore.kernel.org/r/20250418053239.4351-3-cyan.yang@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When RISC-V borned, DT_GNU_HASH had already became the de-facto
standard so DT_HASH is just wasting storage space. Remove the explicit
--hash-style=both setting and rely on the distro toolchain default,
which is most likely "gnu" (i.e. generating only DT_GNU_HASH, no
DT_HASH).
Following the logic of commit 48f6430505
("arm64/vdso: Remove --hash-style=sysv").
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Link: https://lore.kernel.org/r/20250224112042.60282-2-xry111@xry111.site
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Cyril Bur <cyrilbur@tenstorrent.com> says:
This series tries to optimize riscv uaccess by allowing the use of
user_access_begin() and user_access_end() which permits grouping user accesses
and avoiding the CSR write penalty for each access.
The error path can also be optimised using asm goto which patches 3 and 4
achieve. This will speed up jumping to labels by avoiding the need of an
intermediary error type variable within the uaccess macros
I did read the discussion this series generated. It isn't clear to me
which direction to take the patches, if any.
* b4-shazam-merge:
riscv: uaccess: use 'asm_goto_output' for get_user()
riscv: uaccess: use 'asm goto' for put_user()
riscv: uaccess: use input constraints for ptr of __put_user()
riscv: implement user_access_begin() and families
riscv: save the SR_SUM status over switches
Link: https://lore.kernel.org/r/20250410070526.3160847-1-cyrilbur@tenstorrent.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
With 'asm goto' we don't need to test the error etc, the exception just
jumps to the error handling directly.
Unlike put_user(), get_user() must work around GCC bugs [1] when using
output clobbers in an asm goto statement.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 # 1
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
[Cyril Bur: Rewritten commit message]
Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250410070526.3160847-6-cyrilbur@tenstorrent.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
With 'asm goto' we don't need to test the error etc, the exception just
jumps to the error handling directly.
Because there are no output clobbers which could trigger gcc bugs [1]
the use of asm_goto_output() macro is not necessary here. Not using
asm_goto_output() is desirable as the generated output asm will be
cleaner.
Use of the volatile keyword is redundant as per gcc 14.2.0 manual section
6.48.2.7 Goto Labels:
> Also note that an asm goto statement is always implicitly considered
volatile.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 # 1
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
[Cyril Bur: Rewritten commit message]
Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250410070526.3160847-5-cyrilbur@tenstorrent.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Putting ptr in the inputs as opposed to output may seem incorrect but
this is done for a few reasons:
- Not having it in the output permits the use of asm goto in a
subsequent patch. There are bugs in gcc [1] which would otherwise
prevent it.
- Since the output memory is userspace there isn't any real benefit from
telling the compiler about the memory clobber.
- x86, arm and powerpc all use this technique.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 # 1
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
[Cyril Bur: Rewritten commit message]
Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250410070526.3160847-4-cyrilbur@tenstorrent.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Currently, when a function like strncpy_from_user() is called,
the userspace access protection is disabled and enabled
for every word read.
By implementing user_access_begin() and families, the protection
is disabled at the beginning of the copy and enabled at the end.
The __inttype macro is borrowed from x86 implementation.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250410070526.3160847-3-cyrilbur@tenstorrent.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When threads/tasks are switched we need to ensure the old execution's
SR_SUM state is saved and the new thread has the old SR_SUM state
restored.
The issue was seen under heavy load especially with the syz-stress tool
running, with crashes as follows in schedule_tail:
Unable to handle kernel access to user memory without uaccess routines
at virtual address 000000002749f0d0
Oops [#1]
Modules linked in:
CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
Hardware name: riscv-virtio,qemu (DT)
epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
ra : task_pid_vnr include/linux/sched.h:1421 [inline]
ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
t5 : ffffffc4043cafba t6 : 0000000000040000
status: 0000000000000120 badaddr: 000000002749f0d0 cause:
000000000000000f
Call Trace:
[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
[<ffffffe000005570>] ret_from_exception+0x0/0x14
Dumping ftrace buffer:
(ftrace buffer empty)
---[ end trace b5f8f9231dc87dda ]---
The issue comes from the put_user() in schedule_tail
(kernel/sched/core.c) doing the following:
asmlinkage __visible void schedule_tail(struct task_struct *prev)
{
...
if (current->set_child_tid)
put_user(task_pid_vnr(current), current->set_child_tid);
...
}
the put_user() macro causes the code sequence to come out as follows:
1: __enable_user_access()
2: reg = task_pid_vnr(current);
3: *current->set_child_tid = reg;
4: __disable_user_access()
The problem is that we may have a sleeping function as argument which
could clear SR_SUM causing the panic above. This was fixed by
evaluating the argument of the put_user() macro outside the user-enabled
section in commit 285a76bb2c ("riscv: evaluate put_user() arg before
enabling user access")"
In order for riscv to take advantage of unsafe_get/put_XXX() macros and
to avoid the same issue we had with put_user() and sleeping functions we
must ensure code flow can go through switch_to() from within a region of
code with SR_SUM enabled and come back with SR_SUM still enabled. This
patch addresses the problem allowing future work to enable full use of
unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
on every access. Make switch_to() save and restore SR_SUM.
Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Deepak Gupta <debug@rivosinc.com>
Link: https://lore.kernel.org/r/20250410070526.3160847-2-cyrilbur@tenstorrent.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When the prctl() interface for pointer masking was added, it did not
check that the pointer masking ISA extension was supported, only the
individual submodes. Userspace could still attempt to disable pointer
masking and query the pointer masking state. commit 81de1afb2dd1
("riscv: Fix kernel crash due to PR_SET_TAGGED_ADDR_CTRL") disallowed
the former, as the senvcfg write could crash on older systems.
PR_GET_TAGGED_ADDR_CTRL state does not crash, because it reads only
kernel-internal state and not senvcfg, but it should still be disallowed
for consistency.
Fixes: 09d6775f50 ("riscv: Add support for userspace pointer masking")
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20250507145230.2272871-1-samuel.holland@sifive.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
When userspace does PR_SET_TAGGED_ADDR_CTRL, but Supm extension is not
available, the kernel crashes:
Oops - illegal instruction [#1]
[snip]
epc : set_tagged_addr_ctrl+0x112/0x15a
ra : set_tagged_addr_ctrl+0x74/0x15a
epc : ffffffff80011ace ra : ffffffff80011a30 sp : ffffffc60039be10
[snip]
status: 0000000200000120 badaddr: 0000000010a79073 cause: 0000000000000002
set_tagged_addr_ctrl+0x112/0x15a
__riscv_sys_prctl+0x352/0x73c
do_trap_ecall_u+0x17c/0x20c
andle_exception+0x150/0x15c
Fix it by checking if Supm is available.
Fixes: 09d6775f50 ("riscv: Add support for userspace pointer masking")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20250504101920.3393053-1-namcao@linutronix.de
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Now that we can safely handle user memory accesses while in the
misaligned access handlers, use get_user() instead of __get_user() to
have user memory access checks.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250422162324.956065-4-cleger@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
We can safely reenable IRQs if coming from userspace. This allows to
access user memory that could potentially trigger a page fault.
Fixes: b686ecdeac ("riscv: misaligned: Restrict user access to kernel memory")
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250422162324.956065-3-cleger@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Since both load/store and user/kernel should use almost the same path and
that we are going to add some code around that, factorize it.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250422162324.956065-2-cleger@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
VO clocks reside in a different address space from the AP clocks on the
T-HEAD SoC. Add the device tree node of a clock-controller to handle
VO address space as well.
Reviewed-by: Drew Fustini <drew@pdp7.com>
Signed-off-by: Michal Wilczynski <m.wilczynski@samsung.com>
Signed-off-by: Drew Fustini <drew@pdp7.com>
Add CRYPTO_ARCH_HAVE_LIB_SHA256_SIMD and a SIMD block function
so that the caller can decide whether to use SIMD.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Export the block functions as GPL only, there is no reason
to let arbitrary modules use these internal functions.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This reverts commit c4741b2305.
Crypto API self-tests no longer run at registration time and now
occur either at late_initcall or upon the first use.
Therefore the premise of the above commit no longer exists. Revert
it and subsequent additions of subsys_initcall and arch_initcall.
Note that lib/crypto calls will stay at subsys_initcall (or rather
downgraded from arch_initcall) because they may need to occur
before Crypto API registration.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Instead of providing crypto_shash algorithms for the arch-optimized
SHA-256 code, instead implement the SHA-256 library. This is much
simpler, it makes the SHA-256 library functions be arch-optimized, and
it fixes the longstanding issue where the arch-optimized SHA-256 was
disabled by default. SHA-256 still remains available through
crypto_shash, but individual architectures no longer need to handle it.
To match sha256_blocks_arch(), change the type of the nblocks parameter
of the assembly function from int to size_t. The assembly function
actually already treated it as size_t.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Merge mainline to pick up bcachefs poly1305 patch 4bf4b5046d
("bcachefs: use library APIs for ChaCha20 and Poly1305"). This
is a prerequisite for removing the poly1305 shash algorithm.
Not resetting smstateen is a potential security hole, because VU might
be able to access state that VS does not properly context-switch.
Fixes: 81f0f314fe ("RISCV: KVM: Add sstateen0 context save/restore")
Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com>
Link: https://lore.kernel.org/r/20250403112522.1566629-8-rkrcmar@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
The atomic cpumask_assign_cpu() follows non-atomic cpumask_setall(),
which makes the whole operation non-atomic. Fix this by relaxing to
non-atomic __assign_cpu().
Fixes: 7c1e5b9690 ("riscv: Disable preemption while handling PR_RISCV_CTX_SW_FENCEI_OFF")
Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
This function was unified into a single function in commit ab9164dae2
("riscv: entry: Consolidate ret_from_kernel_thread into ret_from_fork").
However that imposed a performance degradation.
Partially reverting this commit to have ret_from_fork() split again,
results in a 1% increase on the number of times fork is able to be called
per second.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/all/20250320-riscv_optimize_entry-v6-2-63e187e26041@rivosinc.com
Now that the architecture-optimized ChaCha kconfig symbols are defined
regardless of CRYPTO, there is no need for CRYPTO_LIB_CHACHA to select
CRYPTO. So, remove that. This makes the indirection through the
CRYPTO_LIB_CHACHA_INTERNAL symbol unnecessary, so get rid of that and
just use CRYPTO_LIB_CHACHA directly. Finally, make the fallback to the
generic implementation use a default value instead of a select; this
makes it consistent with how the arch-optimized code gets enabled and
also with how CRYPTO_LIB_BLAKE2S_GENERIC gets enabled.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Continue disentangling the crypto library functions from the generic
crypto infrastructure by moving the riscv ChaCha library functions into
a new directory arch/riscv/lib/crypto/ that does not depend on CRYPTO.
This mirrors the distinction between crypto/ and lib/crypto/.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Clock controller unit, or CCU, generates various clocks frequency for
peripherals integrated in SpacemiT K1 SoC and is essential for normal
operation. Let's enable it as built-in driver in defconfig.
Signed-off-by: Haylen Chu <heylenay@4d2.org>
Reviewed-by: Alex Elder <elder@riscstar.com>
Reviewed-by: Yixun Lan <dlan@gentoo.org>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
* A fix for a missing icache flush in uprobes, which manifests as at
least a BFF selftest failure on the Spacemit X1.
* A workaround for build warnings in flush_icache_range().
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmgLvt0THHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiSXTEACYyBa3WiAywufa8qeMli8s5McgIp76
WQhTakhp2/2KWegbFfn4+ugvY9NhpY3CGGESJPo1tgHvvDh6kceVEV1jqVuQuPBl
IEh4NImUc9UvimL9QhqsIEkGiF3lTSB+gsmp9VE/tk2BsPcq07oJgtTCaB87JQND
mLpcBkHSu8T6YbN7ZM8XSVc0sbvTEzKFGBtXyVu5y/7OehiNXeFKVIZ1Qdv1tBf/
QbfC43UBQ++mO7v6QTDy7BwWcBR3r6EBR/rLxutjyefhXNwHQVEGk2WP9pIVcc2P
B4nihfiCfDFCwijgRQFtH54Cwdxr253bdci5tEw640hQCRHAJSVXKjjdxcdX9FXm
qlXyXDgmDXUqGR5gC34i1eUT+ahjyvOeS39SJbE2QT8ZRbNRXIU6KNN9sO6h5x7P
sN6pJin04nT0No9ADGnUza7rtVlEgCpM1KYTOVgVltHYJ5PtSdHv1dIiYjqlJx7G
LaAbs93FUE6f7aJd6un/RMAb6g9sCu2GnDTU+iivU4IKRXQ+Bf1LSEst3lYNxMln
r6o7jg0ZZFa9cqBKGVl4G74YOBzRi5wzlwtJsxo7uA3tc0WnlFVVSZv6lryGYR1S
e8dMttYDQ+vpWrP5Qvo55YnebRt3khmxcWbVTcA8EBiX5cPHAvo456JBby7h7vqU
BA8wEj0K7O/+Eg==
=SrfS
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for a missing icache flush in uprobes, which manifests as at
least a BFF selftest failure on the Spacemit X1
- A workaround for build warnings in flush_icache_range()
* tag 'riscv-for-linus-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: uprobes: Add missing fence.i after building the XOL buffer
riscv: Replace function-like macro by static inline function
T-HEAD TH1520 SoC requires to put the GPU out of the reset state as part
of the power-up sequence.
Link: https://lore.kernel.org/linux-riscv/81e53e3a-5873-44c7-9070-5596021daa42@samsung.com/
Reviewed-by: Drew Fustini <drew@pdp7.com>
Signed-off-by: Michal Wilczynski <m.wilczynski@samsung.com>
[drew: remove hunk that included thead,th1520-reset.h]
Signed-off-by: Drew Fustini <drew@pdp7.com>
Enable GPIO support, in order to activate follow-up GPIO LED,
and ethernet reset pin.
Signed-off-by: Yixun Lan <dlan@gentoo.org>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
After some recent changes to the RISC-V crypto code that turned some
indirect function calls into direct ones, builds with CONFIG_CFI_CLANG
fail with:
ld.lld: error: undefined symbol: __kcfi_typeid_sm3_transform_zvksh_zvkb
>>> referenced by arch/riscv/crypto/sm3-riscv64-zvksh-zvkb.o:(.text+0x2) in archive vmlinux.a
ld.lld: error: undefined symbol: __kcfi_typeid_sha512_transform_zvknhb_zvkb
>>> referenced by arch/riscv/crypto/sha512-riscv64-zvknhb-zvkb.o:(.text+0x2) in archive vmlinux.a
ld.lld: error: undefined symbol: __kcfi_typeid_sha256_transform_zvknha_or_zvknhb_zvkb
>>> referenced by arch/riscv/crypto/sha256-riscv64-zvknha_or_zvknhb-zvkb.o:(.text+0x2) in archive vmlinux.a
As these functions are no longer indirectly called (i.e., have their
address taken), the compiler will not emit __kcfi_typeid symbols for
them but SYM_TYPED_FUNC_START expects these to exist at link time.
Switch the definitions of these functions to use SYM_FUNC_START, as they
no longer need kCFI type information since they are only called
directly.
Fixes: 1523eaed0a ("crypto: riscv/sm3 - Use API partial block handling")
Fixes: 561aab1104 ("crypto: riscv/sha512 - Use API partial block handling")
Fixes: e6c5597bad ("crypto: riscv/sha256 - Use API partial block handling")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The XOL (execute out-of-line) buffer is used to single-step the
replaced instruction(s) for uprobes. The RISC-V port was missing a
proper fence.i (i$ flushing) after constructing the XOL buffer, which
can result in incorrect execution of stale/broken instructions.
This was found running the BPF selftests "test_progs:
uprobe_autoattach, attach_probe" on the Spacemit K1/X60, where the
uprobes tests randomly blew up.
Reviewed-by: Guo Ren <guoren@kernel.org>
Fixes: 74784081aa ("riscv: Add uprobes supported")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250419111402.1660267-2-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The flush_icache_range() function is implemented as a "function-like
macro with unused parameters", which can result in "unused variables"
warnings.
Replace the macro with a static inline function, as advised by
Documentation/process/coding-style.rst.
Fixes: 08f051eda3 ("RISC-V: Flush I$ when making a dirty page executable")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250419111402.1660267-1-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This was triggered by one of my mis-uses causing odd build warnings on
sparc in linux-next, but while figuring out why the "obviously correct"
use of cc-option caused such odd breakage, I found eight other cases of
the same thing in the tree.
The root cause is that 'cc-option' doesn't work for checking negative
warning options (ie things like '-Wno-stringop-overflow') because gcc
will silently accept options it doesn't recognize, and so 'cc-option'
ends up thinking they are perfectly fine.
And it all works, until you have a situation where _another_ warning is
emitted. At that point the compiler will go "Hmm, maybe the user
intended to disable this warning but used that wrong option that I
didn't recognize", and generate a warning for the unrecognized negative
option.
Which explains why we have several cases of this in the tree: the
'cc-option' test really doesn't work for this situation, but most of the
time it simply doesn't matter that ity doesn't work.
The reason my recently added case caused problems on sparc was pointed
out by Thomas Weißschuh: the sparc build had a previous explicit warning
that then triggered the new one.
I think the best fix for this would be to make 'cc-option' a bit smarter
about this sitation, possibly by adding an intentional warning to the
test case that then triggers the unrecognized option warning reliably.
But the short-term fix is to replace 'cc-option' with an existing helper
designed for this exact case: 'cc-disable-warning', which picks the
negative warning but uses the positive form for testing the compiler
support.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/all/20250422204718.0b4e3f81@canb.auug.org.au/
Explained-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the Crypto API partial block handling.
As this was the last user relying on crypto/ghash.h for gf128mul.h,
remove the unnecessary inclusion of gf128mul.h from crypto/ghash.h.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Though the module_exit functions are now no-ops, they should still be
defined, since otherwise the modules become unremovable.
Fixes: 08820553f3 ("crypto: arm/chacha - remove the redundant skcipher algorithms")
Fixes: 8c28abede1 ("crypto: arm64/chacha - remove the skcipher algorithms")
Fixes: f7915484c0 ("crypto: powerpc/chacha - remove the skcipher algorithms")
Fixes: ceba0eda83 ("crypto: riscv/chacha - implement library instead of skcipher")
Fixes: 632ab0978f ("crypto: x86/chacha - remove the skcipher algorithms")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for the Zcb extension's compressed half-word instructions
(C.LHU, C.LH, and C.SH) in the RISC-V misaligned access trap handler.
Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Nylon Chen <nylon.chen@sifive.com>
Link: https://lore.kernel.org/r/20250411073850.3699180-2-nylon.chen@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
- A couple of fixes regarding module relocations
- Fix a build error by implementing missing alternative macros
- Another fix for kexec by fixing /proc/iomem
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQgN2CKhD/Nf5v80u9kP7K8koXvigUCZ//mhQAKCRBkP7K8koXv
iqjNAQCke5JNfU6ZhIfwZokrWCIU5CJO0WF9uGl0u972YiazkQD/fKC7nL0e6BDZ
2/Ha31y8pmjLUDAlC8+yxzuhec5QyQ4=
=A6JZ
-----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmf/8pgTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiQikD/95qOZZXzE55oTgD7noMwyDZQ8x41ND
5fsH2YF+oziC3Z97IkEtgj24Bz0QY3Q5YLdB1lY7Ps0RrAXhC5KHAZyhOYe5sBwF
oaB3b/MyuiPot7GbqSMN6400wV9mTiU4+os+GNoWutsH/Z3tQuFNTryO+Qy9jlzX
raX/TxLKd8AVqva9y1U9wqoH2iLN2cXvY3/lfXCgTgDsYKVNMIFyOgEuaMiWq3Yt
EjkT1A9RmNOaxpfPOhcmq5DkCKlNpih9t4OLurAFJTcmajVyjxP12SrmcfNQPaPz
HdV9pGjC5ZfNribm2NrHtl8sJXppXUBGeKxFdT+7MGYfqCAsiBUXQ/PByybPU3Na
Zy7576h9kXhsmwXQJvt8x7lrBhjBfGWwuVtyVZFcCKuvj9aPOoZPBOHA7iX2uuC3
hF/6DWha8IDLO2PGYo6B4lCUnPZkAqBaOpaXmxCYAaCwKaiCx9x9zIYDsJfbD9ZC
H+GpAizu8iCMtZXrg9pA5QvMEBTbFOLhjgD1KCk01KY8hsUihnAGfu4mktYsXYEe
PokxhYIIvzcCRD6Yi5vCtVxL6YXjuTYHo8tNEENoFxi8nvdST3AhqkgbrP9MorOW
879Zqiz/G37aFGOUh7uDGKoP1EcGdlZPWfnfKc+jSUNdStWWLBcHbkcd7EGeg+ov
1MTELcNHvxl6EQ==
=M/+3
-----END PGP SIGNATURE-----
Merge tag 'riscv-fixes-6.15-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/alexghiti/linux into fixes
riscv fixes for 6.15-rc3
- A couple of fixes regarding module relocations
- Fix a build error by implementing missing alternative macros
- Another fix for kexec by fixing /proc/iomem
* tag 'riscv-fixes-6.15-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/alexghiti/linux:
riscv: Avoid fortify warning in syscall_get_arguments()
riscv: Provide all alternative macros all the time
riscv: module: Allocate PLT entries for R_RISCV_PLT32
riscv: module: Fix out-of-bounds relocation access
riscv: Properly export reserved regions in /proc/iomem
riscv: Fix unaligned access info messages
The setting of EXPERT is a leftover from when the riscv defconfig was
first added. As mentioned in the EXPERT Kconfig help text it is not
intended to be set in the usual case.
Upon removal a bunch of intrusive debug-related kernel options are no
longer set, which is good. A few may want to come back in the future but
let those be advocated for on a case by case basis.
NAMESPACES, SYSFS_SYSCALL and MEDIA_SUPPORT_FILTER default on and thus
fall out of the defconfig.
Set VIDEO_CADENCE_CSI2RX=y to ensure VIDEO_CADENCE_CSI2RX stays enabled.
Set DEBUG_KERNEL=y in line with other arch defconfigs. This turns on
tracing.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Link: https://lore.kernel.org/r/20250415122832.982610-1-joel@jms.id.au
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
WangYuli <wangyuli@uniontech.com> says:
1. The arch_kgdb_breakpoint() function defines the kgdb_compiled_break
symbol using inline assembly.
There's a potential issue where the compiler might inline
arch_kgdb_breakpoint(), which would then define the kgdb_compiled_break
symbol multiple times, leading to fail to link vmlinux.o.
This isn't merely a potential compilation problem. The intent here
is to determine the global symbol address of kgdb_compiled_break,
and if this function is inlined multiple times, it would logically
be a grave error.
2. Remove ".option norvc/.option rvc" to fix a bug that the C extension
would unconditionally enable even if the kernel is being built with
CONFIG_RISCV_ISA_C=n.
* b4-shazam-merge:
riscv: KGDB: Remove ".option norvc/.option rvc" for kgdb_compiled_break
riscv: KGDB: Do not inline arch_kgdb_breakpoint()
Link: https://lore.kernel.org/r/D5A83DF3A06E1DF9+20250411072905.55134-1-wangyuli@uniontech.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
[ Quoting Samuel Holland: ]
This is a separate issue, but using ".option rvc" here is a bug.
It will unconditionally enable the C extension for the rest of
the file, even if the kernel is being built with CONFIG_RISCV_ISA_C=n.
[ Quoting Palmer Dabbelt: ]
We're just looking at the address of kgdb_compiled_break, so it's
fine if it ends up as a c.ebreak.
[ Quoting Alexandre Ghiti: ]
.option norvc is used to prevent the assembler from using compressed
instructions, but it's generally used when we need to ensure the
size of the instructions that are used, which is not the case here
as noted by Palmer since we only care about the address. So yes
it will work fine with C enabled :)
So let's just remove them all.
Link: https://lore.kernel.org/all/4b4187c1-77e5-44b7-885f-d6826723dd9a@sifive.com/
Link: https://lore.kernel.org/all/mhng-69513841-5068-441d-be8f-2aeebdc56a08@palmer-ri-x1c9a/
Link: https://lore.kernel.org/all/23693e7f-4fff-40f3-a437-e06d827278a5@ghiti.fr/
Fixes: fe89bd2be8 ("riscv: Add KGDB support")
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Link: https://lore.kernel.org/r/8B431C6A4626225C+20250411073222.56820-2-wangyuli@uniontech.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Hook up the generic vDSO implementation to the generic vDSO getrandom
implementation by providing the required __arch_chacha20_blocks_nostack
and getrandom_syscall implementations. Also wire up the selftests.
The benchmark result:
vdso: 25000000 times in 2.466341333 seconds
libc: 25000000 times in 41.447720005 seconds
syscall: 25000000 times in 41.043926672 seconds
vdso: 25000000 x 256 times in 162.286219353 seconds
libc: 25000000 x 256 times in 2953.855018685 seconds
syscall: 25000000 x 256 times in 2796.268546000 seconds
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Link: https://lore.kernel.org/r/20250411024600.16045-1-xry111@xry111.site
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When building with CONFIG_FORTIFY_SOURCE=y and W=1, there is a warning
because of the memcpy() in syscall_get_arguments():
In file included from include/linux/string.h:392,
from include/linux/bitmap.h:13,
from include/linux/cpumask.h:12,
from arch/riscv/include/asm/processor.h:55,
from include/linux/sched.h:13,
from kernel/ptrace.c:13:
In function 'fortify_memcpy_chk',
inlined from 'syscall_get_arguments.isra' at arch/riscv/include/asm/syscall.h:66:2:
include/linux/fortify-string.h:580:25: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
580 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
The fortified memcpy() routine enforces that the source is not overread
and the destination is not overwritten if the size of either field and
the size of the copy are known at compile time. The memcpy() in
syscall_get_arguments() intentionally overreads from a1 to a5 in
'struct pt_regs' but this is bigger than the size of a1.
Normally, this could be solved by wrapping a1 through a5 with
struct_group() but there was already a struct_group() applied to these
members in commit bba547810c ("riscv: tracing: Fix
__write_overflow_field in ftrace_partial_regs()").
Just avoid memcpy() altogether and write the copying of args from regs
manually, which clears up the warning at the expense of three extra
lines of code.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Dmitry V. Levin <ldv@strace.io>
Link: https://lore.kernel.org/r/20250409-riscv-avoid-fortify-warning-syscall_get_arguments-v1-1-7853436d4755@kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
The DRM Imagination GPU requires a power-domain driver. In the T-HEAD
TH1520 SoC implements power management capabilities through the E902
core, which can be communicated with through the mailbox, using firmware
protocol.
Add AON node, which servers as a power-domain controller.
Signed-off-by: Michal Wilczynski <m.wilczynski@samsung.com>
Reviewed-by: Drew Fustini <drew@pdp7.com>
Signed-off-by: Drew Fustini <drew@pdp7.com>
We need to provide all six forms of the alternative macros
(ALTERNATIVE, ALTERNATIVE_2, _ALTERNATIVE_CFG, _ALTERNATIVE_CFG_2,
__ALTERNATIVE_CFG, __ALTERNATIVE_CFG_2) for all four cases derived
from the two ifdefs (RISCV_ALTERNATIVE, __ASSEMBLY__) in order to
ensure all configs can compile. Define this missing ones and ensure
all are defined to consume all parameters passed.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202504130710.3IKz6Ibs-lkp@intel.com/
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250414120947.135173-2-ajones@ventanamicro.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
apply_r_riscv_plt32_rela() may need to emit a PLT entry for the
referenced symbol, so there must be space allocated in the PLT.
Fixes: 8fd6c51423 ("riscv: Add remaining module relocations")
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250409171526.862481-2-samuel.holland@sifive.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
The current code allows rel[j] to access one element past the end of the
relocation section. Simplify to num_relocations which is equivalent to
the existing size expression.
Fixes: 080c4324fa ("riscv: optimize ELF relocation function in riscv")
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Maxim Kochetkov <fido_max@inbox.ru>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250409171526.862481-1-samuel.holland@sifive.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
The /proc/iomem represents the kernel's memory map. Regions marked
with "Reserved" tells the user that the range should not be tampered
with. Kexec-tools, when using the older kexec_load syscall relies on
the "Reserved" regions to build the memory segments, that will be the
target of the new kexec'd kernel.
The RISC-V port tries to expose all reserved regions to userland, but
some regions were not properly exposed: Regions that resided in both
the "regular" and reserved memory block, e.g. the EFI Memory Map. A
missing entry could result in reserved memory being overwritten.
It turns out, that arm64, and loongarch had a similar issue a while
back:
commit d91680e687 ("arm64: Fix /proc/iomem for reserved but not memory regions")
commit 50d7ba36b9 ("arm64: export memblock_reserve()d regions via /proc/iomem")
Similar to the other ports, resolve the issue by splitting the regions
in an arch initcall, since we need a working allocator.
Fixes: ffe0e52612 ("RISC-V: Improve init_resources()")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250409182129.634415-1-bjorn@kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
When building with CONFIG_FORTIFY_SOURCE=y and W=1, there is a warning
because of the memcpy() in syscall_get_arguments():
In file included from include/linux/string.h:392,
from include/linux/bitmap.h:13,
from include/linux/cpumask.h:12,
from arch/riscv/include/asm/processor.h:55,
from include/linux/sched.h:13,
from kernel/ptrace.c:13:
In function 'fortify_memcpy_chk',
inlined from 'syscall_get_arguments.isra' at arch/riscv/include/asm/syscall.h:66:2:
include/linux/fortify-string.h:580:25: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
580 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
The fortified memcpy() routine enforces that the source is not overread
and the destination is not overwritten if the size of either field and
the size of the copy are known at compile time. The memcpy() in
syscall_get_arguments() intentionally overreads from a1 to a5 in
'struct pt_regs' but this is bigger than the size of a1.
Normally, this could be solved by wrapping a1 through a5 with
struct_group() but there was already a struct_group() applied to these
members in commit bba547810c ("riscv: tracing: Fix
__write_overflow_field in ftrace_partial_regs()").
Just avoid memcpy() altogether and write the copying of args from regs
manually, which clears up the warning at the expense of three extra
lines of code.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Dmitry V. Levin <ldv@strace.io>
Fixes: e2c0cdfba7 ("RISC-V: User-facing API")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250409-riscv-avoid-fortify-warning-syscall_get_arguments-v1-1-7853436d4755@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
T-Head SoCs feature separate power domains (power islands) for major
components like the GPU, Audio, and NPU. To manage the power states of
these components effectively, the kernel requires generic power domain
support.
This commit enables `CONFIG_PM_GENERIC_DOMAINS` for T-Head SoCs,
allowing the power domain driver for these components to be compiled and
integrated. This ensures proper power management and energy efficiency
on T-Head platforms.
By selecting `PM_GENERIC_DOMAINS`, we provide the necessary framework
for the power domain drivers to function correctly on RISC-V
architecture with T-Head SoCs.
Signed-off-by: Michal Wilczynski <m.wilczynski@samsung.com>
Reviewed-by: Drew Fustini <drew@pdp7.com>
Acked-by: Drew Fustini <drew@pdp7.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
The number of relocations may be a huge value that is unallocatable
by kmalloc. Use kvmalloc instead so that it does not fail.
Fixes: 8fd6c51423 ("riscv: Add remaining module relocations")
Suggested-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Will Pierce <wgpierce17@gmail.com>
Link: https://lore.kernel.org/r/20250402081426.5197-1-wgpierce17@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Following the example of the crc32 and crc32c code, make the crypto
subsystem register both generic and architecture-optimized chacha20,
xchacha20, and xchacha12 skcipher algorithms, all implemented on top of
the appropriate library functions. This eliminates the need for every
architecture to implement the same skcipher glue code.
To register the architecture-optimized skciphers only when
architecture-optimized code is actually being used, add a function
chacha_is_arch_optimized() and make each arch implement it. Change each
architecture's ChaCha module_init function to arch_initcall so that the
CPU feature detection is guaranteed to run before
chacha_is_arch_optimized() gets called by crypto/chacha.c. In the case
of s390, remove the CPU feature based module autoloading, which is no
longer needed since the module just gets pulled in via function linkage.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently the RISC-V optimized ChaCha20 is only wired up to the
crypto_skcipher API, which makes it unavailable to users of the library
API. The crypto_skcipher API for ChaCha20 is going to change to be
implemented on top of the library API, so the library API needs to be
supported. And of course it's needed anyway to serve the library users.
Therefore, change the RISC-V ChaCha20 code to implement the library API
instead of the crypto_skcipher API.
The library functions take the ChaCha state matrix directly (instead of
key and IV) and support both ChaCha20 and ChaCha12. To make the RISC-V
code work properly for that, change the assembly code to take the state
matrix directly and add a nrounds parameter.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
- Improve performance in gendwarfksyms
- Remove deprecated EXTRA_*FLAGS and KBUILD_ENABLE_EXTRA_GCC_CHECKS
- Support CONFIG_HEADERS_INSTALL for ARCH=um
- Use more relative paths to sources files for better reproducibility
- Support the loong64 Debian architecture
- Add Kbuild bash completion
- Introduce intermediate vmlinux.unstripped for architectures that need
static relocations to be stripped from the final vmlinux
- Fix versioning in Debian packages for -rc releases
- Treat missing MODULE_DESCRIPTION() as an error
- Convert Nios2 Makefiles to use the generic rule for built-in DTB
- Add debuginfo support to the RPM package
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmfxp2EVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGkIUP/AgNiP6or6fmY5+HSyjlrdutBWAh
QNW0AiKh5vytmBIv63/i103OE0SRbt+U6IApn9c7FQKkeuyIlD1e9NfSwFMZixmP
P7t6JqDCL61G5d3W2Iisqle1cpBoVvNgUwu0k3sTSXl0vNsDbiyxcCzQzLhZMKsd
O+Ppwp3zNGE2vIUwpIjzJsR5Dt/Z5MfuKDi4UShsyWpFZ1rg9X93YKc9QJOXjKwj
4Np2x2cukDo2oz4uXuZQ8F1+bOFsKYoilCwjtxlrC6BO0lSPiJsRTN6nGJ0ejns9
GGD56mBNGcGk+NEPGhAMQmZHqNAP4JfjEvAgaoSBn0Rdnjd9Cj/2T+4n61xkR4Wu
MXCP/LEJ3MyctmkZjUq+0fDAe2wjxuaAG15kAHCha+9KxIG2NzHbf2XXb4E49DDU
2rw3fqA41/cKCq1ZEaqRn3pZZgU6ysfsEW42JmnNxO+7zz9k8RX4rk8CVaVIEUuw
Xojkis//KnE6+OCBe6Tb0H2Rzo0JF3AG2eNF4zY/xnc562FRIMS19WYS38tKZng6
Gr1BRG0bA4t9mf2Vck1W1LcAb3Jh0mddtyrgYKhbcwq0YOj2q/H6F50DkC+wL282
wvhV6B/vKAH8BByEWAn3rBcN0N+w/VFc0uPCz//tkoAm4nPg8PvKq63JHPrHsyZe
mOMhifoiVbjF4KFo
=GiQ6
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Improve performance in gendwarfksyms
- Remove deprecated EXTRA_*FLAGS and KBUILD_ENABLE_EXTRA_GCC_CHECKS
- Support CONFIG_HEADERS_INSTALL for ARCH=um
- Use more relative paths to sources files for better reproducibility
- Support the loong64 Debian architecture
- Add Kbuild bash completion
- Introduce intermediate vmlinux.unstripped for architectures that need
static relocations to be stripped from the final vmlinux
- Fix versioning in Debian packages for -rc releases
- Treat missing MODULE_DESCRIPTION() as an error
- Convert Nios2 Makefiles to use the generic rule for built-in DTB
- Add debuginfo support to the RPM package
* tag 'kbuild-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (40 commits)
kbuild: rpm-pkg: build a debuginfo RPM
kconfig: merge_config: use an empty file as initfile
nios2: migrate to the generic rule for built-in DTB
rust: kbuild: skip `--remap-path-prefix` for `rustdoc`
kbuild: pacman-pkg: hardcode module installation path
kbuild: deb-pkg: don't set KBUILD_BUILD_VERSION unconditionally
modpost: require a MODULE_DESCRIPTION()
kbuild: make all file references relative to source root
x86: drop unnecessary prefix map configuration
kbuild: deb-pkg: add comment about future removal of KDEB_COMPRESS
kbuild: Add a help message for "headers"
kbuild: deb-pkg: remove "version" variable in mkdebian
kbuild: deb-pkg: fix versioning for -rc releases
Documentation/kbuild: Fix indentation in modules.rst example
x86: Get rid of Makefile.postlink
kbuild: Create intermediate vmlinux build with relocations preserved
kbuild: Introduce Kconfig symbol for linking vmlinux with relocations
kbuild: link-vmlinux.sh: Make output file name configurable
kbuild: do not generate .tmp_vmlinux*.map when CONFIG_VMLINUX_MAP=y
Revert "kheaders: Ignore silly-rename files"
...
* The sub-architecture selection Kconfig system has been cleaned up,
the documentation has been improved, and various detections have been
fixed.
* The vector-related extensions dependencies are now validated when
parsing from device tree and in the DT bindings.
* Misaligned access probing can be overridden via a kernel command-line
parameter, along with various fixes to misalign access handling.
* Support for relocatable !MMU kernels builds.
* Support for hpge pfnmaps, which should improve TLB utilization.
* Support for runtime constants, which improves the d_hash()
performance.
* Support for bfloat16, Zicbom, Zaamo, Zalrsc, Zicntr, Zihpm.
* Various fixes, including:
- We were missing a secondary mmu notifier call when flushing the
tlb which is required for IOMMU.
- Fix ftrace panics by saving the registers as expected by ftrace.
- Fix a couple of stimecmp usage related to cpu hotplug.
- purgatory_start is now aligned as per the STVEC requirements.
- A fix for hugetlb when calculating the size of non-present PTEs.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmfv/soTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYierZEACDwI9lJFCEbQPon3z8rAy1moTj0+AZ
bMfZFqMphUTrJ0cMm2+Bc+XZgck12zHCyu1UljDcZVYMCHA9aOoj5C5NkBBVLCuL
uLYrhIoQXtJaVIANiFl0SHAZmh2s2OoSgmUzrEZ8JGlHpKCF7EVX5bHEsOvzn9ir
B2W992W6q3ISuKXHKsTpa7rmTtf7swGYg6zW3pX3l6HmY+EMEQOcQl0tAB383J/T
lm0K4+YvLpRJdm2ARpNGWlcFXj9/UXUM5hplK3aBAHpPKQ5/83/4tMDsfRvhpEVC
VJXNgK+H4XLD542aQ8d4ZROguyhwn9e2n6Dkv0OqfNk4lg5pUBcJUZftQ+rB7AWg
VYB1KVpxhwcruheXJFz8S3EzjZTcS+JrcD80vvx8JmHdXkZwHTfYUgiFwe/TR7yr
b518fEbXpVwDZiCbaAe3Cmpw0mlNnSVmU4hgNbiwt0fu9DGdPN9WQbyds68RKb7A
TWwDmmD6kV2BTWl0mHPtu9VhX58CDG+0WYbHA7r82p2T50187766C92GYfN2UPpz
lH0iMRDkmucclZ3fEoosJ+HsDntc4oe6Bhdzuj52Q7vBpDd/QB6t5cfrlDpEEdgU
3qoWMN5mb5l1rbvrqENh5ZgmEpzV8K0R5F5quiXh/9wO0y1kepDslTqC2oXK/m0p
DzsvvD6UnNMOUQ==
=nCJo
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.15-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- The sub-architecture selection Kconfig system has been cleaned up,
the documentation has been improved, and various detections have been
fixed
- The vector-related extensions dependencies are now validated when
parsing from device tree and in the DT bindings
- Misaligned access probing can be overridden via a kernel command-line
parameter, along with various fixes to misalign access handling
- Support for relocatable !MMU kernels builds
- Support for hpge pfnmaps, which should improve TLB utilization
- Support for runtime constants, which improves the d_hash()
performance
- Support for bfloat16, Zicbom, Zaamo, Zalrsc, Zicntr, Zihpm
- Various fixes, including:
- We were missing a secondary mmu notifier call when flushing the
tlb which is required for IOMMU
- Fix ftrace panics by saving the registers as expected by ftrace
- Fix a couple of stimecmp usage related to cpu hotplug
- purgatory_start is now aligned as per the STVEC requirements
- A fix for hugetlb when calculating the size of non-present PTEs
* tag 'riscv-for-linus-6.15-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (65 commits)
riscv: Add norvc after .option arch in runtime const
riscv: Make sure toolchain supports zba before using zba instructions
riscv/purgatory: 4B align purgatory_start
riscv/kexec_file: Handle R_RISCV_64 in purgatory relocator
selftests: riscv: fix v_exec_initval_nolibc.c
riscv: Fix hugetlb retrieval of number of ptes in case of !present pte
riscv: print hartid on bringup
riscv: Add norvc after .option arch in runtime const
riscv: Remove CONFIG_PAGE_OFFSET
riscv: Support CONFIG_RELOCATABLE on riscv32
asm-generic: Always define Elf_Rel and Elf_Rela
riscv: Support CONFIG_RELOCATABLE on NOMMU
riscv: Allow NOMMU kernels to access all of RAM
riscv: Remove duplicate CONFIG_PAGE_OFFSET definition
RISC-V: errata: Use medany for relocatable builds
dt-bindings: riscv: document vector crypto requirements
dt-bindings: riscv: add vector sub-extension dependencies
dt-bindings: riscv: d requires f
RISC-V: add f & d extension validation checks
RISC-V: add vector crypto extension validation checks
...
* A bunch of fixes:
- 2 fixes in the purgatory code which prevented kexec to work
- Workaround an issue with gcc-15
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQgN2CKhD/Nf5v80u9kP7K8koXvigUCZ+uRYgAKCRBkP7K8koXv
iiroAQCIF7ojJGZvdRfAeknzb1WKM2GFucVTRxwyicyg/9omGQD5AYGYsaQSbN4H
j8ToELbTEnsY8YqRaQgm/AiuIkpM1AE=
=db0s
-----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAmfurxYTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRDvTKFQLMurQSEaD/9Lp/ZQxW2+oCZQ/MxXPnn7MVBn4ncY
SC6xVzdSye14A9RyaTUnCZIklNhOA5iKs5uZBm3mH0MTaL5K/LqtO+gKCkduT30k
Rt5DKJWaXzsi3QNVq12Lakun7xJV8auYJP3rWj6rLdSo8wOG3PF2T4+L9kie+WtB
EhxdfOK3oF1JV0a0q7E0D599K9E/y0T/kXeNzjvt4uhJVHY096LJY3GcMBvAUuDx
aS4gF3DS2iURvZfCFZKIJhzXMz5p2bQqngDvpNit1cmHgT/4EBM7hahqfGXlP1oG
pUTsktsnPBntXWCIKLfsac6XMKVCj3J5pqmn8cvhyALO3AjB+kU921xhRe9OyMpL
zlBb/4B9AB4Yf6EMGLecbzaf/WX2m+L/vS+AJdD7D88X9k4kSnT4WBs90gUwyD/I
wCGadWQtZvrwH6LENdiuuyLdHldmG76hnHjglIBJSkQCqTBFlnvHwlYI7QQ3AXd7
TrRS2G7tcMaAd0tyIJ9FaaZdlgmc7wTQjvaJz1oTwx8nHRo4ApwEnRs1oCxyDO3I
L+cfVcQLdsHnxdUeCssLkJHAfjU4HC8jweh7u1Q0LDcrdJb3nMkwzYrHTXunGrK6
TELnsbnRNrYmvsV0AWEiJB0ymgoCuhapqSUuaLyWaeoKXOPKrZltkRsPBKbUGaIm
V3z/XjtMJenbOQ==
=tmof
-----END PGP SIGNATURE-----
Merge tag 'riscv-mw2-6.15-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/alexghiti/linux into for-next
riscv patches for 6.15-rc1, part 2
* A bunch of fixes:
- 2 fixes in the purgatory code which prevented kexec to work
- Workaround an issue with gcc-15
* tag 'riscv-mw2-6.15-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/alexghiti/linux:
riscv: Add norvc after .option arch in runtime const
riscv: Make sure toolchain supports zba before using zba instructions
riscv/purgatory: 4B align purgatory_start
riscv/kexec_file: Handle R_RISCV_64 in purgatory relocator
selftests: riscv: fix v_exec_initval_nolibc.c
riscv: Fix hugetlb retrieval of number of ptes in case of !present pte
riscv: print hartid on bringup
dt-bindings: riscv: document vector crypto requirements
dt-bindings: riscv: add vector sub-extension dependencies
dt-bindings: riscv: d requires f
RISC-V: add f & d extension validation checks
RISC-V: add vector crypto extension validation checks
RISC-V: add vector extension validation checks
To support fast gup, the commit 69be3fb111 ("riscv: enable
MMU_GATHER_RCU_TABLE_FREE for SMP && MMU") did the following:
1) use tlb_remove_page_ptdesc() for those platforms which use IPI to
perform TLB shootdown
2) use tlb_remove_ptdesc() for those platforms which use SBI to perform
TLB shootdown
The tlb_remove_page_ptdesc() is the wrapper of the tlb_remove_page(). By
design, the tlb_remove_page() should be used to remove a normal page from
a page table entry, and should not be used for page table pages.
The tlb_remove_ptdesc() is the wrapper of the tlb_remove_table(), which is
designed specifically for freeing page table pages. If the
CONFIG_MMU_GATHER_TABLE_FREE is enabled, the tlb_remove_table() will use
semi RCU to free page table pages, that is:
- batch table freeing: asynchronous free by RCU
- single table freeing: IPI + synchronous free
If the CONFIG_MMU_GATHER_TABLE_FREE is disabled, the tlb_remove_table()
will fall back to pagetable_dtor() + tlb_remove_page().
For case 1), since we need to perform TLB shootdown before freeing the
page table page, the local_irq_save() in fast gup can block the freeing
and protect the fast gup page walker. Therefore we can ensure safety by
just using tlb_remove_page_ptdesc(). In addition, we can also the
tlb_remove_ptdesc()/tlb_remove_table() to achieve it, and it doesn't
matter whether CONFIG_MMU_GATHER_RCU_TABLE_FREE is selected. And in
theory, the performance of freeing pages asynchronously via RCU will not
be lower than synchronous free.
For case 2), since local_irq_save() only disable S-privilege IPI irq but
not M-privilege's, which is used by the SBI implementation to perform TLB
shootdown, so we must select CONFIG_MMU_GATHER_RCU_TABLE_FREE and use
tlb_remove_ptdesc() to ensure safety. The riscv selects this config for
SMP && MMU, the CONFIG_RISCV_SBI is dependent on MMU. Therefore, only the
UP system may have the situation where CONFIG_MMU_GATHER_RCU_TABLE_FREE is
disabled but CONFIG_RISCV_SBI is enabled. But there is no freeing vs fast
gup race in the UP system.
So, in summary, we can use tlb_remove_ptdesc() to support fast gup in all
cases, and this interface is specifically designed for page table pages.
So let's use it unconditionally.
Link: https://lkml.kernel.org/r/9025595e895515515c95e48db54b29afa489c41d.1740454179.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickens <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
reservation" from Sourabh Jain changes powerpc's kexec code to use more
of the generic layers.
- The 2 patch series "get_maintainer: report subsystem status
separately" from Vlastimil Babka makes some long-requested improvements
to the get_maintainer output.
- The 4 patch series "ucount: Simplify refcounting with rcuref_t" from
Sebastian Siewior cleans up and optimizing the refcounting in the ucount
code.
- The 12 patch series "reboot: support runtime configuration of
emergency hw_protection action" from Ahmad Fatoum improves the ability
for a driver to perform an emergency system shutdown or reboot.
- The 16 patch series "Converge on using secs_to_jiffies() part two"
from Easwar Hariharan performs further migrations from
msecs_to_jiffies() to secs_to_jiffies().
- The 7 patch series "lib/interval_tree: add some test cases and
cleanup" from Wei Yang permits more userspace testing of kernel library
code, adds some more tests and performs some cleanups.
- The 2 patch series "hung_task: Dump the blocking task stacktrace" from
Masami Hiramatsu arranges for the hung_task detector to dump the stack
of the blocking task and not just that of the blocked task.
- The 4 patch series "resource: Split and use DEFINE_RES*() macros" from
Andy Shevchenko provides some cleanups to the resource definition
macros.
- Plus the usual shower of singleton patches - please see the individual
changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+nuqwAKCRDdBJ7gKXxA
jtNqAQDxqJpjWkzn4yN9CNSs1ivVx3fr6SqazlYCrt3u89WQvwEA1oRrGpETzUGq
r6khQUIcQImPPcjFqEFpuiSOU0MBZA0=
=Kii8
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- The series "powerpc/crash: use generic crashkernel reservation" from
Sourabh Jain changes powerpc's kexec code to use more of the generic
layers.
- The series "get_maintainer: report subsystem status separately" from
Vlastimil Babka makes some long-requested improvements to the
get_maintainer output.
- The series "ucount: Simplify refcounting with rcuref_t" from
Sebastian Siewior cleans up and optimizing the refcounting in the
ucount code.
- The series "reboot: support runtime configuration of emergency
hw_protection action" from Ahmad Fatoum improves the ability for a
driver to perform an emergency system shutdown or reboot.
- The series "Converge on using secs_to_jiffies() part two" from Easwar
Hariharan performs further migrations from msecs_to_jiffies() to
secs_to_jiffies().
- The series "lib/interval_tree: add some test cases and cleanup" from
Wei Yang permits more userspace testing of kernel library code, adds
some more tests and performs some cleanups.
- The series "hung_task: Dump the blocking task stacktrace" from Masami
Hiramatsu arranges for the hung_task detector to dump the stack of
the blocking task and not just that of the blocked task.
- The series "resource: Split and use DEFINE_RES*() macros" from Andy
Shevchenko provides some cleanups to the resource definition macros.
- Plus the usual shower of singleton patches - please see the
individual changelogs for details.
* tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
mailmap: consolidate email addresses of Alexander Sverdlin
fs/procfs: fix the comment above proc_pid_wchan()
relay: use kasprintf() instead of fixed buffer formatting
resource: replace open coded variant of DEFINE_RES()
resource: replace open coded variants of DEFINE_RES_*_NAMED()
resource: replace open coded variant of DEFINE_RES_NAMED_DESC()
resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED()
samples: add hung_task detector mutex blocking sample
hung_task: show the blocker task if the task is hung on mutex
kexec_core: accept unaccepted kexec segments' destination addresses
watchdog/perf: optimize bytes copied and remove manual NUL-termination
lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap()
lib/interval_tree: skip the check before go to the right subtree
lib/interval_tree: add test case for span iteration
lib/interval_tree: add test case for interval_tree_iter_xxx() helpers
lib/rbtree: add random seed
lib/rbtree: split tests
lib/rbtree: enable userland test suite for rbtree related data structure
checkpatch: describe --min-conf-desc-length
scripts/gdb/symbols: determine KASLR offset on s390
...
Uros Bizjak uses x86 named address space qualifiers to provide
compile-time checking of percpu area accesses.
This has caused a small amount of fallout - two or three issues were
reported. In all cases the calling code was founf to be incorrect.
- The 4 patch series "Some cleanup for memcg" from Chen Ridong
implements some relatively monir cleanups for the memcontrol code.
- The 17 patch series "mm: fixes for device-exclusive entries (hmm)"
from David Hildenbrand fixes a boatload of issues which David found then
using device-exclusive PTE entries when THP is enabled. More work is
needed, but this makes thins better - our own HMM selftests now succeed.
- The 2 patch series "mm: zswap: remove z3fold and zbud" from Yosry
Ahmed remove the z3fold and zbud implementations. They have been
deprecated for half a year and nobody has complained.
- The 5 patch series "mm: further simplify VMA merge operation" from
Lorenzo Stoakes implements numerous simplifications in this area. No
runtime effects are anticipated.
- The 4 patch series "mm/madvise: remove redundant mmap_lock operations
from process_madvise()" from SeongJae Park rationalizes the locking in
the madvise() implementation. Performance gains of 20-25% were observed
in one MADV_DONTNEED microbenchmark.
- The 12 patch series "Tiny cleanup and improvements about SWAP code"
from Baoquan He contains a number of touchups to issues which Baoquan
noticed when working on the swap code.
- The 2 patch series "mm: kmemleak: Usability improvements" from Catalin
Marinas implements a couple of improvements to the kmemleak user-visible
output.
- The 2 patch series "mm/damon/paddr: fix large folios access and
schemes handling" from Usama Arif provides a couple of fixes for DAMON's
handling of large folios.
- The 3 patch series "mm/damon/core: fix wrong and/or useless
damos_walk() behaviors" from SeongJae Park fixes a few issues with the
accuracy of kdamond's walking of DAMON regions.
- The 3 patch series "expose mapping wrprotect, fix fb_defio use" from
Lorenzo Stoakes changes the interaction between framebuffer deferred-io
and core MM. No functional changes are anticipated - this is
preparatory work for the future removal of page structure fields.
- The 4 patch series "mm/damon: add support for hugepage_size DAMOS
filter" from Usama Arif adds a DAMOS filter which permits the filtering
by huge page sizes.
- The 4 patch series "mm: permit guard regions for file-backed/shmem
mappings" from Lorenzo Stoakes extends the guard region feature from its
present "anon mappings only" state. The feature now covers shmem and
file-backed mappings.
- The 4 patch series "mm: batched unmap lazyfree large folios during
reclamation" from Barry Song cleans up and speeds up the unmapping for
pte-mapped large folios.
- The 18 patch series "reimplement per-vma lock as a refcount" from
Suren Baghdasaryan puts the vm_lock back into the vma. Our reasons for
pulling it out were largely bogus and that change made the code more
messy. This patchset provides small (0-10%) improvements on one
microbenchmark.
- The 5 patch series "Docs/mm/damon: misc DAMOS filters documentation
fixes and improves" from SeongJae Park does some maintenance work on the
DAMON docs.
- The 27 patch series "hugetlb/CMA improvements for large systems" from
Frank van der Linden addresses a pile of issues which have been observed
when using CMA on large machines.
- The 2 patch series "mm/damon: introduce DAMOS filter type for unmapped
pages" from SeongJae Park enables users of DMAON/DAMOS to filter my the
page's mapped/unmapped status.
- The 19 patch series "zsmalloc/zram: there be preemption" from Sergey
Senozhatsky teaches zram to run its compression and decompression
operations preemptibly.
- The 12 patch series "selftests/mm: Some cleanups from trying to run
them" from Brendan Jackman fixes a pile of unrelated issues which
Brendan encountered while runnimg our selftests.
- The 2 patch series "fs/proc/task_mmu: add guard region bit to pagemap"
from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
determine whether a particular page is a guard page.
- The 7 patch series "mm, swap: remove swap slot cache" from Kairui Song
removes the swap slot cache from the allocation path - it simply wasn't
being effective.
- The 5 patch series "mm: cleanups for device-exclusive entries (hmm)"
from David Hildenbrand implements a number of unrelated cleanups in this
code.
- The 5 patch series "mm: Rework generic PTDUMP configs" from Anshuman
Khandual implements a number of preparatoty cleanups to the
GENERIC_PTDUMP Kconfig logic.
- The 8 patch series "mm/damon: auto-tune aggregation interval" from
SeongJae Park implements a feedback-driven automatic tuning feature for
DAMON's aggregation interval tuning.
- The 5 patch series "Fix lazy mmu mode" from Ryan Roberts fixes some
issues in powerpc, sparc and x86 lazy MMU implementations. Ryan did
this in preparation for implementing lazy mmu mode for arm64 to optimize
vmalloc.
- The 2 patch series "mm/page_alloc: Some clarifications for migratetype
fallback" from Brendan Jackman reworks some commentary to make the code
easier to follow.
- The 3 patch series "page_counter cleanup and size reduction" from
Shakeel Butt cleans up the page_counter code and fixes a size increase
which we accidentally added late last year.
- The 3 patch series "Add a command line option that enables control of
how many threads should be used to allocate huge pages" from Thomas
Prescher does that. It allows the careful operator to significantly
reduce boot time by tuning the parallalization of huge page
initialization.
- The 3 patch series "Fix calculations in trace_balance_dirty_pages()
for cgwb" from Tang Yizhou fixes the tracing output from the dirty page
balancing code.
- The 9 patch series "mm/damon: make allow filters after reject filters
useful and intuitive" from SeongJae Park improves the handling of allow
and reject filters. Behaviour is made more consistent and the
documention is updated accordingly.
- The 5 patch series "Switch zswap to object read/write APIs" from Yosry
Ahmed updates zswap to the new object read/write APIs and thus permits
the removal of some legacy code from zpool and zsmalloc.
- The 6 patch series "Some trivial cleanups for shmem" from Baolin Wang
does as it claims.
- The 20 patch series "fs/dax: Fix ZONE_DEVICE page reference counts"
from Alistair Popple regularizes the weird ZONE_DEVICE page refcount
handling in DAX, permittig the removal of a number of special-case
checks.
- The 4 patch series "refactor mremap and fix bug" from Lorenzo Stoakes
is a preparatoty refactoring and cleanup of the mremap() code.
- The 20 patch series "mm: MM owner tracking for large folios (!hugetlb)
+ CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
which we determine whether a large folio is known to be mapped
exclusively into a single MM.
- The 8 patch series "mm/damon: add sysfs dirs for managing DAMOS
filters based on handling layers" from SeongJae Park adds a couple of
new sysfs directories to ease the management of DAMON/DAMOS filters.
- The 13 patch series "arch, mm: reduce code duplication in mem_init()"
from Mike Rapoport consolidates many per-arch implementations of
mem_init() into code generic code, where that is practical.
- The 13 patch series "mm/damon/sysfs: commit parameters online via
damon_call()" from SeongJae Park continues the cleaning up of sysfs
access to DAMON internal data.
- The 3 patch series "mm: page_ext: Introduce new iteration API" from
Luiz Capitulino reworks the page_ext initialization to fix a boot-time
crash which was observed with an unusual combination of compile and
cmdline options.
- The 8 patch series "Buddy allocator like (or non-uniform) folio split"
from Zi Yan reworks the code to split a folio into smaller folios. The
main benefit is lessened memory consumption: fewer post-split folios are
generated.
- The 2 patch series "Minimize xa_node allocation during xarry split"
from Zi Yan reduces the number of xarray xa_nodes which are generated
during an xarray split.
- The 2 patch series "drivers/base/memory: Two cleanups" from Gavin Shan
performs some maintenance work on the drivers/base/memory code.
- The 3 patch series "Add tracepoints for lowmem reserves, watermarks
and totalreserve_pages" from Martin Liu adds some more tracepoints to
the page allocator code.
- The 4 patch series "mm/madvise: cleanup requests validations and
classifications" from SeongJae Park cleans up some warts which SeongJae
observed during his earlier madvise work.
- The 3 patch series "mm/hwpoison: Fix regressions in memory failure
handling" from Shuai Xue addresses two quite serious regressions which
Shuai has observed in the memory-failure implementation.
- The 5 patch series "mm: reliable huge page allocator" from Johannes
Weiner makes huge page allocations cheaper and more reliable by reducing
fragmentation.
- The 5 patch series "Minor memcg cleanups & prep for memdescs" from
Matthew Wilcox is preparatory work for the future implementation of
memdescs.
- The 4 patch series "track memory used by balloon drivers" from Nico
Pache introduces a way to track memory used by our various balloon
drivers.
- The 2 patch series "mm/damon: introduce DAMOS filter type for active
pages" from Nhat Pham permits users to filter for active/inactive pages,
separately for file and anon pages.
- The 2 patch series "Adding Proactive Memory Reclaim Statistics" from
Hao Jia separates the proactive reclaim statistics from the direct
reclaim statistics.
- The 2 patch series "mm/vmscan: don't try to reclaim hwpoison folio"
from Jinjiang Tu fixes our handling of hwpoisoned pages within the
reclaim code.
-----BEGIN PGP SIGNATURE-----
iHQEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+nZaAAKCRDdBJ7gKXxA
jsOWAPiP4r7CJHMZRK4eyJOkvS1a1r+TsIarrFZtjwvf/GIfAQCEG+JDxVfUaUSF
Ee93qSSLR1BkNdDw+931Pu0mXfbnBw==
=Pn2K
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- The series "Enable strict percpu address space checks" from Uros
Bizjak uses x86 named address space qualifiers to provide
compile-time checking of percpu area accesses.
This has caused a small amount of fallout - two or three issues were
reported. In all cases the calling code was found to be incorrect.
- The series "Some cleanup for memcg" from Chen Ridong implements some
relatively monir cleanups for the memcontrol code.
- The series "mm: fixes for device-exclusive entries (hmm)" from David
Hildenbrand fixes a boatload of issues which David found then using
device-exclusive PTE entries when THP is enabled. More work is
needed, but this makes thins better - our own HMM selftests now
succeed.
- The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed
remove the z3fold and zbud implementations. They have been deprecated
for half a year and nobody has complained.
- The series "mm: further simplify VMA merge operation" from Lorenzo
Stoakes implements numerous simplifications in this area. No runtime
effects are anticipated.
- The series "mm/madvise: remove redundant mmap_lock operations from
process_madvise()" from SeongJae Park rationalizes the locking in the
madvise() implementation. Performance gains of 20-25% were observed
in one MADV_DONTNEED microbenchmark.
- The series "Tiny cleanup and improvements about SWAP code" from
Baoquan He contains a number of touchups to issues which Baoquan
noticed when working on the swap code.
- The series "mm: kmemleak: Usability improvements" from Catalin
Marinas implements a couple of improvements to the kmemleak
user-visible output.
- The series "mm/damon/paddr: fix large folios access and schemes
handling" from Usama Arif provides a couple of fixes for DAMON's
handling of large folios.
- The series "mm/damon/core: fix wrong and/or useless damos_walk()
behaviors" from SeongJae Park fixes a few issues with the accuracy of
kdamond's walking of DAMON regions.
- The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo
Stoakes changes the interaction between framebuffer deferred-io and
core MM. No functional changes are anticipated - this is preparatory
work for the future removal of page structure fields.
- The series "mm/damon: add support for hugepage_size DAMOS filter"
from Usama Arif adds a DAMOS filter which permits the filtering by
huge page sizes.
- The series "mm: permit guard regions for file-backed/shmem mappings"
from Lorenzo Stoakes extends the guard region feature from its
present "anon mappings only" state. The feature now covers shmem and
file-backed mappings.
- The series "mm: batched unmap lazyfree large folios during
reclamation" from Barry Song cleans up and speeds up the unmapping
for pte-mapped large folios.
- The series "reimplement per-vma lock as a refcount" from Suren
Baghdasaryan puts the vm_lock back into the vma. Our reasons for
pulling it out were largely bogus and that change made the code more
messy. This patchset provides small (0-10%) improvements on one
microbenchmark.
- The series "Docs/mm/damon: misc DAMOS filters documentation fixes and
improves" from SeongJae Park does some maintenance work on the DAMON
docs.
- The series "hugetlb/CMA improvements for large systems" from Frank
van der Linden addresses a pile of issues which have been observed
when using CMA on large machines.
- The series "mm/damon: introduce DAMOS filter type for unmapped pages"
from SeongJae Park enables users of DMAON/DAMOS to filter my the
page's mapped/unmapped status.
- The series "zsmalloc/zram: there be preemption" from Sergey
Senozhatsky teaches zram to run its compression and decompression
operations preemptibly.
- The series "selftests/mm: Some cleanups from trying to run them" from
Brendan Jackman fixes a pile of unrelated issues which Brendan
encountered while runnimg our selftests.
- The series "fs/proc/task_mmu: add guard region bit to pagemap" from
Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
determine whether a particular page is a guard page.
- The series "mm, swap: remove swap slot cache" from Kairui Song
removes the swap slot cache from the allocation path - it simply
wasn't being effective.
- The series "mm: cleanups for device-exclusive entries (hmm)" from
David Hildenbrand implements a number of unrelated cleanups in this
code.
- The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual
implements a number of preparatoty cleanups to the GENERIC_PTDUMP
Kconfig logic.
- The series "mm/damon: auto-tune aggregation interval" from SeongJae
Park implements a feedback-driven automatic tuning feature for
DAMON's aggregation interval tuning.
- The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in
powerpc, sparc and x86 lazy MMU implementations. Ryan did this in
preparation for implementing lazy mmu mode for arm64 to optimize
vmalloc.
- The series "mm/page_alloc: Some clarifications for migratetype
fallback" from Brendan Jackman reworks some commentary to make the
code easier to follow.
- The series "page_counter cleanup and size reduction" from Shakeel
Butt cleans up the page_counter code and fixes a size increase which
we accidentally added late last year.
- The series "Add a command line option that enables control of how
many threads should be used to allocate huge pages" from Thomas
Prescher does that. It allows the careful operator to significantly
reduce boot time by tuning the parallalization of huge page
initialization.
- The series "Fix calculations in trace_balance_dirty_pages() for cgwb"
from Tang Yizhou fixes the tracing output from the dirty page
balancing code.
- The series "mm/damon: make allow filters after reject filters useful
and intuitive" from SeongJae Park improves the handling of allow and
reject filters. Behaviour is made more consistent and the documention
is updated accordingly.
- The series "Switch zswap to object read/write APIs" from Yosry Ahmed
updates zswap to the new object read/write APIs and thus permits the
removal of some legacy code from zpool and zsmalloc.
- The series "Some trivial cleanups for shmem" from Baolin Wang does as
it claims.
- The series "fs/dax: Fix ZONE_DEVICE page reference counts" from
Alistair Popple regularizes the weird ZONE_DEVICE page refcount
handling in DAX, permittig the removal of a number of special-case
checks.
- The series "refactor mremap and fix bug" from Lorenzo Stoakes is a
preparatoty refactoring and cleanup of the mremap() code.
- The series "mm: MM owner tracking for large folios (!hugetlb) +
CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
which we determine whether a large folio is known to be mapped
exclusively into a single MM.
- The series "mm/damon: add sysfs dirs for managing DAMOS filters based
on handling layers" from SeongJae Park adds a couple of new sysfs
directories to ease the management of DAMON/DAMOS filters.
- The series "arch, mm: reduce code duplication in mem_init()" from
Mike Rapoport consolidates many per-arch implementations of
mem_init() into code generic code, where that is practical.
- The series "mm/damon/sysfs: commit parameters online via
damon_call()" from SeongJae Park continues the cleaning up of sysfs
access to DAMON internal data.
- The series "mm: page_ext: Introduce new iteration API" from Luiz
Capitulino reworks the page_ext initialization to fix a boot-time
crash which was observed with an unusual combination of compile and
cmdline options.
- The series "Buddy allocator like (or non-uniform) folio split" from
Zi Yan reworks the code to split a folio into smaller folios. The
main benefit is lessened memory consumption: fewer post-split folios
are generated.
- The series "Minimize xa_node allocation during xarry split" from Zi
Yan reduces the number of xarray xa_nodes which are generated during
an xarray split.
- The series "drivers/base/memory: Two cleanups" from Gavin Shan
performs some maintenance work on the drivers/base/memory code.
- The series "Add tracepoints for lowmem reserves, watermarks and
totalreserve_pages" from Martin Liu adds some more tracepoints to the
page allocator code.
- The series "mm/madvise: cleanup requests validations and
classifications" from SeongJae Park cleans up some warts which
SeongJae observed during his earlier madvise work.
- The series "mm/hwpoison: Fix regressions in memory failure handling"
from Shuai Xue addresses two quite serious regressions which Shuai
has observed in the memory-failure implementation.
- The series "mm: reliable huge page allocator" from Johannes Weiner
makes huge page allocations cheaper and more reliable by reducing
fragmentation.
- The series "Minor memcg cleanups & prep for memdescs" from Matthew
Wilcox is preparatory work for the future implementation of memdescs.
- The series "track memory used by balloon drivers" from Nico Pache
introduces a way to track memory used by our various balloon drivers.
- The series "mm/damon: introduce DAMOS filter type for active pages"
from Nhat Pham permits users to filter for active/inactive pages,
separately for file and anon pages.
- The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia
separates the proactive reclaim statistics from the direct reclaim
statistics.
- The series "mm/vmscan: don't try to reclaim hwpoison folio" from
Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim
code.
* tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits)
mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex()
x86/mm: restore early initialization of high_memory for 32-bits
mm/vmscan: don't try to reclaim hwpoison folio
mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper
cgroup: docs: add pswpin and pswpout items in cgroup v2 doc
mm: vmscan: split proactive reclaim statistics from direct reclaim statistics
selftests/mm: speed up split_huge_page_test
selftests/mm: uffd-unit-tests support for hugepages > 2M
docs/mm/damon/design: document active DAMOS filter type
mm/damon: implement a new DAMOS filter type for active pages
fs/dax: don't disassociate zero page entries
MM documentation: add "Unaccepted" meminfo entry
selftests/mm: add commentary about 9pfs bugs
fork: use __vmalloc_node() for stack allocation
docs/mm: Physical Memory: Populate the "Zones" section
xen: balloon: update the NR_BALLOON_PAGES state
hv_balloon: update the NR_BALLOON_PAGES state
balloon_compaction: update the NR_BALLOON_PAGES state
meminfo: add a per node counter for balloon drivers
mm: remove references to folio in __memcg_kmem_uncharge_page()
...
When a crashkernel is launched on RISC-V, the entry to purgatory is
done by trapping via the stvec CSR. From riscv_kexec_norelocate():
| ...
| /*
| * Switch to physical addressing
| * This will also trigger a jump to CSR_STVEC
| * which in this case is the address of the new
| * kernel.
| */
| csrw CSR_STVEC, a2
| csrw CSR_SATP, zero
stvec requires that the address is 4B aligned, which was not the case,
e.g.:
| Loaded purgatory at 0xffffc000
| kexec_file: kexec_file_load: type:1, start:0xffffd232 head:0x4 flags:0x6
The address 0xffffd232 not 4B aligned.
Correct by adding proper function alignment.
With this change, crashkernels loaded with kexec-file will be able to
properly enter the purgatory.
Fixes: 736e30af58 ("RISC-V: Add purgatory")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250328085313.1193815-1-bjorn@kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Commit 58ff537109 ("riscv: Omit optimized string routines when
using KASAN") introduced calls to EXPORT_SYMBOL() in assembly string
routines, which result in R_RISCV_64 relocations against
.export_symbol section. As these rountines are reused by RISC-V
purgatory and our relocator doesn't recognize these relocations, this
fails kexec-file-load with dmesg like
[ 11.344251] kexec_image: Unknown rela relocation: 2
[ 11.345972] kexec_image: Error loading purgatory ret=-8
Let's support R_RISCV_64 relocation to fix kexec on 64-bit RISC-V.
32-bit variant isn't covered since KEXEC_FILE and KEXEC_PURGATORY isn't
available.
Fixes: 58ff537109 ("riscv: Omit optimized string routines when using KASAN")
Signed-off-by: Yao Zi <ziyao@disroot.org>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250326051445.55131-2-ziyao@disroot.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Conor Dooley <conor@kernel.org> says:
From: Conor Dooley <conor.dooley@microchip.com>
Yo,
This series is partly leveraging Clement's work adding a validate
callback in the extension detection code so that things like checking
for whether a vector crypto extension is usable can be done like:
has_extension(<vector crypto>)
rather than
has_vector() && has_extension(<vector crypto>)
which Eric pointed out was a poor design some months ago.
The rest of this is adding some requirements to the bindings that
prevent combinations of extensions disallowed by the ISA.
There's a bunch of over-long lines in here, but I thought that the
over-long lines were clearer than breaking them up.
Cheers,
Conor.
* patches from https://lore.kernel.org/r/20250312-abide-pancreas-3576b8c44d2c@spud:
dt-bindings: riscv: document vector crypto requirements
dt-bindings: riscv: add vector sub-extension dependencies
dt-bindings: riscv: d requires f
RISC-V: add f & d extension validation checks
RISC-V: add vector crypto extension validation checks
RISC-V: add vector extension validation checks
Link: https://lore.kernel.org/r/20250312-abide-pancreas-3576b8c44d2c@spud
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Ryan sent a fix [1] for arm64 that applies to riscv too: in some hugetlb
functions, we must not use the pte value to get the size of a mapping
because the pte may not be present.
So use the already present size parameter for huge_pte_clear() and the
newly introduced size parameter for huge_ptep_get_and_clear(). And make
sure to gather A/D bits only on present ptes.
Fixes: 82a1a1f3bf ("riscv: mm: support Svnapot in hugetlb page")
Link: https://lore.kernel.org/all/20250217140419.1702389-1-ryan.roberts@arm.com/ [1]
Link: https://lore.kernel.org/r/20250317072551.572169-1-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Firmware randomly releases cores, so CPU numbers don't linearly map
to hartids. When the system has an exception, we care more about hartids.
Adding "dyndbg="file smpboot.c +p" loglevel=8" to the cmdline can output
the hartid.
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250303083424.14309-1-cuiyunhui@bytedance.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
There is new support for additional on-chip devices on Apple, Mediatek,
Renesas, Rockchip, Samsung, Google, TI, ST, Nvidia and Amlogic devices.
The Arm Morello reference platform gets a devicetree for booting in
normal aarch64 mode. The hardware supports experimental CHERI support,
which requires a modified kernel.
The AMD (formerly Xilinx) Versal NET SoC gets added, this is a combined
FPGA with Cortex-A78 CPUs in a SoC.
Six new ST STM32MP2 SoC variants are added. Like the earlier STM32MP25,
the MP211, MP213, MP215, MP231, MP233 and MP235 models are based on one
or two Cortex-A35 cores but each feature a different set of I/O devices.
Mediatek MT8370 is a minor variation of MT8390 with fewer CPU and
GPU cores
Apple T2 is the baseboard management controller on earlier Intel CPU
based Macs, with 16 models now gaining initial support.
All the above come with dts files for the reference boards. In
addition, these boards are added for the SoCs that are already supported.
- The Milk-V Jupiter board based on SpacemiT K1/M1
- NetCube Systems Kumquat board based on the 32-bit Allwinner V3s SoC
- Three boards based on 32-bit stm32mp1
- 11 distinct board variants from Toradex and one from Variscite,
all based on i.MX6
- Google Pixel Pro 6 phone based on gs101 (Tensor)
- Three additional variants of the i.MX8MP based "Skov" board
- A second variant of the i.MX95 EVK board
- Two boards based on Renesas SoCs
- Four boards based the Rockchip RK35xx series, plus the RK3588
"MNT Reform 2" laptop
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmfkN4kACgkQYKtH/8kJ
Uied/g/+OA7VTWS9K9DMjmrDvrMn73GPRRuC3Se7NutHt5kQeQ8I0oFk3VILxiLx
/Rck8imIac65ANejy7M0omPYbVak6lyC0PRT9v/gZT++3rQzbd5/MAXHyKftVk6V
aEgIcFcbu3805dPf+ioIsyk5DshhlqRg0o3u9iMIJlzaykoLapSirnOW0/dcbz8V
kZlxuJRkQopx/+7/sbOtKXcL6ybif9jmgflQHUdGBT7Z7lAnhnIceelfKWPCQPki
TrQXG5YITJXEebyM399SwnAOl9pXs6z4KSYsULAfCONKOGwIXmcD1wDhZOtDUvoH
tu/V07R70fbEhJMTY5eZEc4Ym0firfk3IU3VRGxxOZJEuBOXN49snZzutNZEyqjX
WOjy4jgHTHhQQKc7++mB+5G7zfBPI2gcMnLWgU18Bq8LRcdLGOYPfQ6+PvkMJI8K
Vn/otYr80cSoJIPAmJZ9KvZiRmdqkK1YLxfOcNNjmFwk2YJxta7wmnbYX4g+zkkT
ZdlU5+iQp0BQnd37m3WU+b4Ed60dH/g5bflj6TzXEYqiHY4Kxoud9Z7AQTPj4FIu
PJiz+7J1p4a/uKK8BbVMjvnomARggdacrVDTB2mCfr6Av6c9RifAcREm0x6hZ2L8
dTR5fnh8zJ7sbmFwoS6ncUp4hgqotfZ/fYfK91xJ9rpyucOpnTE=
=6ZZ6
-----END PGP SIGNATURE-----
Merge tag 'soc-dt-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC devicetree updates from Arnd Bergmann:
"There is new support for additional on-chip devices on Apple,
Mediatek, Renesas, Rockchip, Samsung, Google, TI, ST, Nvidia and
Amlogic devices.
The Arm Morello reference platform gets a devicetree for booting in
normal aarch64 mode. The hardware supports experimental CHERI support,
which requires a modified kernel.
The AMD (formerly Xilinx) Versal NET SoC gets added, this is a
combined FPGA with Cortex-A78 CPUs in a SoC.
Six new ST STM32MP2 SoC variants are added. Like the earlier
STM32MP25, the MP211, MP213, MP215, MP231, MP233 and MP235 models are
based on one or two Cortex-A35 cores but each feature a different set
of I/O devices.
Mediatek MT8370 is a minor variation of MT8390 with fewer CPU and GPU
cores
Apple T2 is the baseboard management controller on earlier Intel CPU
based Macs, with 16 models now gaining initial support.
All the above come with dts files for the reference boards. In
addition, these boards are added for the SoCs that are already
supported:
- The Milk-V Jupiter board based on SpacemiT K1/M1
- NetCube Systems Kumquat board based on the 32-bit Allwinner V3s SoC
- Three boards based on 32-bit stm32mp1
- 11 distinct board variants from Toradex and one from Variscite, all
based on i.MX6
- Google Pixel Pro 6 phone based on gs101 (Tensor)
- Three additional variants of the i.MX8MP based "Skov" board
- A second variant of the i.MX95 EVK board
- Two boards based on Renesas SoCs
- Four boards based the Rockchip RK35xx series, plus the RK3588 'MNT
Reform 2' laptop"
* tag 'soc-dt-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (538 commits)
arm64: dts: Add gpio_intc node for Amlogic A5 SoCs
arm64: dts: Add gpio_intc node for Amlogic A4 SoCs
arm64: dts: hi3660: Add property for fixing CPUIdle
arm64: dts: rockchip: remove ethm0_clk0_25m_out from Sige5 gmac0
arm64: dts: marvell: Use preferred node names for "simple-bus"
arm64: dts: marvell: Drop unused CP11X_TYPE define
arm64: dts: marvell: Move arch timer and pmu nodes to top-level
arm64: dts: rockchip: Fix PWM pinctrl names
arm64: dts: rockchip: fix RK3576 SCMI clock IDs
dt-bindings: clock: rk3576: add SCMI clocks
arm64: dts: rockchip: Fix pcie reset gpio on Orange Pi 5 Max
arm64: dts: amd/seattle: Drop undocumented "spi-controller" properties
arm64: dts: amd/seattle: Fix bus, mmc, and ethernet node names
arm64: dts: amd/seattle: Move and simplify fixed clocks
arm64: dts: amd/seattle: Base Overdrive B1 on top of B0 version
arm64: dts: rockchip: Enable HDMI audio output for ArmSoM Sige7
arm64: dts: rockchip: Enable onboard eMMC on Radxa E20C
arm64: dts: rockchip: Add SDHCI controller for RK3528
arm64: dts: rockchip: Remove bluetooth node from rock-3a
arm64: dts: rockchip: Move rk356x scmi SHMEM to reserved memory
...
Core & protocols
----------------
- Continue Netlink conversions to per-namespace RTNL lock
(IPv4 routing, routing rules, routing next hops, ARP ioctls).
- Continue extending the use of netdev instance locks. As a driver
opt-in protect queue operations and (in due course) ethtool
operations with the instance lock and not RTNL lock.
- Support collecting TCP timestamps (data submitted, sent, acked)
in BPF, allowing for transparent (to the application) and lower
overhead tracking of TCP RPC performance.
- Tweak existing networking Rx zero-copy infra to support zero-copy
Rx via io_uring.
- Optimize MPTCP performance in single subflow mode by 29%.
- Enable GRO on packets which went thru XDP CPU redirect (were queued
for processing on a different CPU). Improving TCP stream performance
up to 2x.
- Improve performance of contended connect() by 200% by searching
for an available 4-tuple under RCU rather than a spin lock.
Bring an additional 229% improvement by tweaking hash distribution.
- Avoid unconditionally touching sk_tsflags on RX, improving
performance under UDP flood by as much as 10%.
- Avoid skb_clone() dance in ping_rcv() to improve performance under
ping flood.
- Avoid FIB lookup in netfilter if socket is available, 20% perf win.
- Rework network device creation (in-kernel) API to more clearly
identify network namespaces and their roles.
There are up to 4 namespace roles but we used to have just 2 netns
pointer arguments, interpreted differently based on context.
- Use sysfs_break_active_protection() instead of trylock to avoid
deadlocks between unregistering objects and sysfs access.
- Add a new sysctl and sockopt for capping max retransmit timeout
in TCP.
- Support masking port and DSCP in routing rule matches.
- Support dumping IPv4 multicast addresses with RTM_GETMULTICAST.
- Support specifying at what time packet should be sent on AF_XDP
sockets.
- Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin users.
- Add Netlink YAML spec for WiFi (nl80211) and conntrack.
- Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols
which only need to be exported when IPv6 support is built as a module.
- Age FDB entries based on Rx not Tx traffic in VxLAN, similar
to normal bridging.
- Allow users to specify source port range for GENEVE tunnels.
- netconsole: allow attaching kernel release, CPU ID and task name
to messages as metadata
Driver API
----------
- Continue rework / fixing of Energy Efficient Ethernet (EEE) across
the SW layers. Delegate the responsibilities to phylink where possible.
Improve its handling in phylib.
- Support symmetric OR-XOR RSS hashing algorithm.
- Support tracking and preserving IRQ affinity by NAPI itself.
- Support loopback mode speed selection for interface selftests.
Device drivers
--------------
- Remove the IBM LCS driver for s390.
- Remove the sb1000 cable modem driver.
- Add support for SFP module access over SMBus.
- Add MCTP transport driver for MCTP-over-USB.
- Enable XDP metadata support in multiple drivers.
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- add PCIe TLP Processing Hints (TPH) support for new AMD platforms
- support dumping RoCE queue state for debug
- opt into instance locking
- Intel (100G, ice, idpf):
- ice: rework MSI-X IRQ management and distribution
- ice: support for E830 devices
- iavf: add support for Rx timestamping
- iavf: opt into instance locking
- nVidia/Mellanox:
- mlx4: use page pool memory allocator for Rx
- mlx5: support for one PTP device per hardware clock
- mlx5: support for 200Gbps per-lane link modes
- mlx5: move IPSec policy check after decryption
- AMD/Solarflare:
- support FW flashing via devlink
- Cisco (enic):
- use page pool memory allocator for Rx
- enable 32, 64 byte CQEs
- get max rx/tx ring size from the device
- Meta (fbnic):
- support flow steering and RSS configuration
- report queue stats
- support TCP segmentation
- support IRQ coalescing
- support ring size configuration
- Marvell/Cavium:
- support AF_XDP
- Wangxun:
- support for PTP clock and timestamping
- Huawei (hibmcge):
- checksum offload
- add more statistics
- Ethernet virtual:
- VirtIO net:
- aggressively suppress Tx completions, improve perf by 96% with
1 CPU and 55% with 2 CPUs
- expose NAPI to IRQ mapping and persist NAPI settings
- Google (gve):
- support XDP in DQO RDA Queue Format
- opt into instance locking
- Microsoft vNIC:
- support BIG TCP
- Ethernet NICs consumer, and embedded:
- Synopsys (stmmac):
- cleanup Tx and Tx clock setting and other link-focused cleanups
- enable SGMII and 2500BASEX mode switching for Intel platforms
- support Sophgo SG2044
- Broadcom switches (b53):
- support for BCM53101
- TI:
- iep: add perout configuration support
- icssg: support XDP
- Cadence (macb):
- implement BQL
- Xilinx (axinet):
- support dynamic IRQ moderation and changing coalescing at runtime
- implement BQL
- report standard stats
- MediaTek:
- support phylink managed EEE
- Intel:
- igc: don't restart the interface on every XDP program change
- RealTek (r8169):
- support reading registers of internal PHYs directly
- increase max jumbo packet size on RTL8125/RTL8126
- Airoha:
- support for RISC-V NPU packet processing unit
- enable scatter-gather and support MTU up to 9kB
- Tehuti (tn40xx):
- support cards with TN4010 MAC and an Aquantia AQR105 PHY
- Ethernet PHYs:
- support for TJA1102S, TJA1121
- dp83tg720: add randomized polling intervals for link detection
- dp83822: support changing the transmit amplitude voltage
- support for LEDs on 88q2xxx
- CAN:
- canxl: support Remote Request Substitution bit access
- flexcan: add S32G2/S32G3 SoC
- WiFi:
- remove cooked monitor support
- strict mode for better AP testing
- basic EPCS support
- OMI RX bandwidth reduction support
- batman-adv: add support for jumbo frames
- WiFi drivers:
- RealTek (rtw88):
- support RTL8814AE and RTL8814AU
- RealTek (rtw89):
- switch using wiphy_lock and wiphy_work
- add BB context to manipulate two PHY as preparation of MLO
- improve BT-coexistence mechanism to play A2DP smoothly
- Intel (iwlwifi):
- add new iwlmld sub-driver for latest HW/FW combinations
- MediaTek (mt76):
- preparation for mt7996 Multi-Link Operation (MLO) support
- Qualcomm/Atheros (ath12k):
- continued work on MLO
- Silabs (wfx):
- Wake-on-WLAN support
- Bluetooth:
- add support for skb TX SND/COMPLETION timestamping
- hci_core: enable buffer flow control for SCO/eSCO
- coredump: log devcd dumps into the monitor
- Bluetooth drivers:
- intel: add support to configure TX power
- nxp: handle bootloader error during cmd5 and cmd7
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmfkLC8ACgkQMUZtbf5S
Irsb5g/+L7oKOf0ALbaV9kxFsoz8AymZfAW9i/27F07omGJGpks8oX6j6rQLgIRO
OQOFcp7XEdDh1+jh82gHVuPrw2/6lchLtW8ARtzdiQKFr5DRjrsbtua6GRc8iBqA
DIRCBFoV2HuMkF39Vr09HMa9AZAT7QR2RLsRGpSq8E8Z8xxKz0X7oujs10PFpMTE
IVKhTrVrk+NDot/IU2hzVpnpup+0ld+T2/ZaBklJGcU8uDffImsqNepHRyCG5UC3
xz74Ju23MAj24Gct+og0yFUooF+lUltKyVm0FYCDCY3bASTwgY01NR3kEH/0NQvM
cywLzd/ngHm/SMD2ggVAHkjZUieiIVHdaZ53dgjDeBOQoVP6p0dgUK7EumXX8Mx4
8ReR2UiGoYRPaq9c4o+IjG4K027MwVK2p+mF1a6MLa+20XcyMbev8FIRbbHtC/V4
z5/FsOAxcuICWkA1hU9bODrrGzIqemmdRgKG8sGuTJCt/kYGAn72/TCATGNSaCJ0
00n2jN1aepa7wtywHJ5MhVzxN9iQX7+geUHXz0BI+lK4e1Pmk+vjGksymb9ai2fk
eQAUV9ekub6q68/J16scD7XeOUM37bTLiMBQeIF8UtZBOJscKiS71zn9QP9Twwxv
P2pm01RDZUI+z5ZX3hc12Pm1vjRHaAh9S1JpAw/pTOVlQ+mAJEM=
=XY0S
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Continue Netlink conversions to per-namespace RTNL lock
(IPv4 routing, routing rules, routing next hops, ARP ioctls)
- Continue extending the use of netdev instance locks. As a driver
opt-in protect queue operations and (in due course) ethtool
operations with the instance lock and not RTNL lock.
- Support collecting TCP timestamps (data submitted, sent, acked) in
BPF, allowing for transparent (to the application) and lower
overhead tracking of TCP RPC performance.
- Tweak existing networking Rx zero-copy infra to support zero-copy
Rx via io_uring.
- Optimize MPTCP performance in single subflow mode by 29%.
- Enable GRO on packets which went thru XDP CPU redirect (were queued
for processing on a different CPU). Improving TCP stream
performance up to 2x.
- Improve performance of contended connect() by 200% by searching for
an available 4-tuple under RCU rather than a spin lock. Bring an
additional 229% improvement by tweaking hash distribution.
- Avoid unconditionally touching sk_tsflags on RX, improving
performance under UDP flood by as much as 10%.
- Avoid skb_clone() dance in ping_rcv() to improve performance under
ping flood.
- Avoid FIB lookup in netfilter if socket is available, 20% perf win.
- Rework network device creation (in-kernel) API to more clearly
identify network namespaces and their roles. There are up to 4
namespace roles but we used to have just 2 netns pointer arguments,
interpreted differently based on context.
- Use sysfs_break_active_protection() instead of trylock to avoid
deadlocks between unregistering objects and sysfs access.
- Add a new sysctl and sockopt for capping max retransmit timeout in
TCP.
- Support masking port and DSCP in routing rule matches.
- Support dumping IPv4 multicast addresses with RTM_GETMULTICAST.
- Support specifying at what time packet should be sent on AF_XDP
sockets.
- Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin
users.
- Add Netlink YAML spec for WiFi (nl80211) and conntrack.
- Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols
which only need to be exported when IPv6 support is built as a
module.
- Age FDB entries based on Rx not Tx traffic in VxLAN, similar to
normal bridging.
- Allow users to specify source port range for GENEVE tunnels.
- netconsole: allow attaching kernel release, CPU ID and task name to
messages as metadata
Driver API:
- Continue rework / fixing of Energy Efficient Ethernet (EEE) across
the SW layers. Delegate the responsibilities to phylink where
possible. Improve its handling in phylib.
- Support symmetric OR-XOR RSS hashing algorithm.
- Support tracking and preserving IRQ affinity by NAPI itself.
- Support loopback mode speed selection for interface selftests.
Device drivers:
- Remove the IBM LCS driver for s390
- Remove the sb1000 cable modem driver
- Add support for SFP module access over SMBus
- Add MCTP transport driver for MCTP-over-USB
- Enable XDP metadata support in multiple drivers
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- add PCIe TLP Processing Hints (TPH) support for new AMD
platforms
- support dumping RoCE queue state for debug
- opt into instance locking
- Intel (100G, ice, idpf):
- ice: rework MSI-X IRQ management and distribution
- ice: support for E830 devices
- iavf: add support for Rx timestamping
- iavf: opt into instance locking
- nVidia/Mellanox:
- mlx4: use page pool memory allocator for Rx
- mlx5: support for one PTP device per hardware clock
- mlx5: support for 200Gbps per-lane link modes
- mlx5: move IPSec policy check after decryption
- AMD/Solarflare:
- support FW flashing via devlink
- Cisco (enic):
- use page pool memory allocator for Rx
- enable 32, 64 byte CQEs
- get max rx/tx ring size from the device
- Meta (fbnic):
- support flow steering and RSS configuration
- report queue stats
- support TCP segmentation
- support IRQ coalescing
- support ring size configuration
- Marvell/Cavium:
- support AF_XDP
- Wangxun:
- support for PTP clock and timestamping
- Huawei (hibmcge):
- checksum offload
- add more statistics
- Ethernet virtual:
- VirtIO net:
- aggressively suppress Tx completions, improve perf by 96%
with 1 CPU and 55% with 2 CPUs
- expose NAPI to IRQ mapping and persist NAPI settings
- Google (gve):
- support XDP in DQO RDA Queue Format
- opt into instance locking
- Microsoft vNIC:
- support BIG TCP
- Ethernet NICs consumer, and embedded:
- Synopsys (stmmac):
- cleanup Tx and Tx clock setting and other link-focused
cleanups
- enable SGMII and 2500BASEX mode switching for Intel platforms
- support Sophgo SG2044
- Broadcom switches (b53):
- support for BCM53101
- TI:
- iep: add perout configuration support
- icssg: support XDP
- Cadence (macb):
- implement BQL
- Xilinx (axinet):
- support dynamic IRQ moderation and changing coalescing at
runtime
- implement BQL
- report standard stats
- MediaTek:
- support phylink managed EEE
- Intel:
- igc: don't restart the interface on every XDP program change
- RealTek (r8169):
- support reading registers of internal PHYs directly
- increase max jumbo packet size on RTL8125/RTL8126
- Airoha:
- support for RISC-V NPU packet processing unit
- enable scatter-gather and support MTU up to 9kB
- Tehuti (tn40xx):
- support cards with TN4010 MAC and an Aquantia AQR105 PHY
- Ethernet PHYs:
- support for TJA1102S, TJA1121
- dp83tg720: add randomized polling intervals for link detection
- dp83822: support changing the transmit amplitude voltage
- support for LEDs on 88q2xxx
- CAN:
- canxl: support Remote Request Substitution bit access
- flexcan: add S32G2/S32G3 SoC
- WiFi:
- remove cooked monitor support
- strict mode for better AP testing
- basic EPCS support
- OMI RX bandwidth reduction support
- batman-adv: add support for jumbo frames
- WiFi drivers:
- RealTek (rtw88):
- support RTL8814AE and RTL8814AU
- RealTek (rtw89):
- switch using wiphy_lock and wiphy_work
- add BB context to manipulate two PHY as preparation of MLO
- improve BT-coexistence mechanism to play A2DP smoothly
- Intel (iwlwifi):
- add new iwlmld sub-driver for latest HW/FW combinations
- MediaTek (mt76):
- preparation for mt7996 Multi-Link Operation (MLO) support
- Qualcomm/Atheros (ath12k):
- continued work on MLO
- Silabs (wfx):
- Wake-on-WLAN support
- Bluetooth:
- add support for skb TX SND/COMPLETION timestamping
- hci_core: enable buffer flow control for SCO/eSCO
- coredump: log devcd dumps into the monitor
- Bluetooth drivers:
- intel: add support to configure TX power
- nxp: handle bootloader error during cmd5 and cmd7"
* tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1681 commits)
unix: fix up for "apparmor: add fine grained af_unix mediation"
mctp: Fix incorrect tx flow invalidation condition in mctp-i2c
net: usb: asix: ax88772: Increase phy_name size
net: phy: Introduce PHY_ID_SIZE — minimum size for PHY ID string
net: libwx: fix Tx L4 checksum
net: libwx: fix Tx descriptor content for some tunnel packets
atm: Fix NULL pointer dereference
net: tn40xx: add pci-id of the aqr105-based Tehuti TN4010 cards
net: tn40xx: prepare tn40xx driver to find phy of the TN9510 card
net: tn40xx: create swnode for mdio and aqr105 phy and add to mdiobus
net: phy: aquantia: add essential functions to aqr105 driver
net: phy: aquantia: search for firmware-name in fwnode
net: phy: aquantia: add probe function to aqr105 for firmware loading
net: phy: Add swnode support to mdiobus_scan
gve: add XDP DROP and PASS support for DQ
gve: update XDP allocation path support RX buffer posting
gve: merge packet buffer size fields
gve: update GQ RX to use buf_size
gve: introduce config-based allocation for XDP
gve: remove xdp_xsk_done and xdp_xsk_wakeup statistics
...
Samuel Holland <samuel.holland@sifive.com> says:
Currently, RISC-V NOMMU kernels are linked at CONFIG_PAGE_OFFSET, and
since they are not relocatable, must be loaded at this address as well.
CONFIG_PAGE_OFFSET is not a user-visible Kconfig option, so its value is
not obvious, and users must patch the kernel source if they want to load
it at a different address.
Make NOMMU kernels more portable by making them relocatable by default.
This allows a single kernel binary to work when loaded at any address.
* b4-shazam-merge:
riscv: Remove CONFIG_PAGE_OFFSET
riscv: Support CONFIG_RELOCATABLE on riscv32
asm-generic: Always define Elf_Rel and Elf_Rela
riscv: Support CONFIG_RELOCATABLE on NOMMU
riscv: Allow NOMMU kernels to access all of RAM
riscv: Remove duplicate CONFIG_PAGE_OFFSET definition
Link: https://lore.kernel.org/r/20241026171441.3047904-1-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The current definition of CONFIG_PAGE_OFFSET is problematic for a couple
of reasons:
1) The value is misleading for normal 64-bit kernels, where it is
overridden at runtime if Sv48 or Sv39 is chosen. This is especially
the case for XIP kernels, which always use Sv39.
2) The option is not user-visible, but for NOMMU kernels it must be a
valid RAM address, and for !RELOCATABLE it must additionally be the
exact address where the kernel is loaded.
Fix both of these by removing the option.
1) For MMU kernels, drop the indirection through Kconfig. Additionally,
for XIP, drop the indirection through kernel_map.
2) For NOMMU kernels, use the user-visible physical RAM base if
provided. Otherwise, force the kernel to be relocatable.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Jesse Taube <mr.bossman075@gmail.com>
Link: https://lore.kernel.org/r/20241026171441.3047904-7-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When adjusted to use the correctly-sized ELF types, relocate_kernel()
works on riscv32 as well. The caveat about crossing an intermediate page
table boundary does not apply to riscv32, since for Sv32 the early
kernel mapping uses only PGD entries. Since KASLR is not yet supported
on riscv32, this option is mostly useful for NOMMU.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20241026171441.3047904-6-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Move relocate_kernel() out of the CONFIG_MMU block so it can be called
from the NOMMU version of setup_vm(). Set some offsets in kernel_map so
relocate_kernel() does not need to be modified. Relocatable NOMMU
kernels can be loaded to any physical memory address; they no longer
depend on CONFIG_PAGE_OFFSET.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20241026171441.3047904-4-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
NOMMU kernels currently cannot access memory below the kernel link
address. Remove this restriction by setting PAGE_OFFSET to the actual
start of RAM, as determined from the devicetree. The kernel link address
must be a constant, so keep using CONFIG_PAGE_OFFSET for that purpose.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Jesse Taube <mr.bossman075@gmail.com>
Link: https://lore.kernel.org/r/20241026171441.3047904-3-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This definition is already provided by include/generated/autoconf.h,
so it does not need to be provided on the command line.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Jesse Taube <mr.bossman075@gmail.com>
Link: https://lore.kernel.org/r/20241026171441.3047904-2-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
We're trying to mix non-PIC/PIE objects into the otherwise-PIE
relocatable kernels, to avoid GOT/PLT references during early boot
alternative resolution (which happens before the GOT/PLT are set up).
riscv64-unknown-linux-gnu-ld: arch/riscv/errata/sifive/errata.o: relocation R_RISCV_HI20 against `tlb_flush_all_threshold' can not be used when making a shared object; recompile with -fPIC
riscv64-unknown-linux-gnu-ld: arch/riscv/errata/thead/errata.o: relocation R_RISCV_HI20 against `riscv_cbom_block_size' can not be used when making a shared object; recompile with -fPIC
Fixes: 8dc2a7e802 ("riscv: Fix relocatable kernels with early alternatives using -fno-pie")
Link: https://lore.kernel.org/r/20250326224506.27165-2-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Another set of improvements to the kernel's CRC (cyclic redundancy
check) code:
- Rework the CRC64 library functions to be directly optimized, like what
I did last cycle for the CRC32 and CRC-T10DIF library functions.
- Rewrite the x86 PCLMULQDQ-optimized CRC code, and add VPCLMULQDQ
support and acceleration for crc64_be and crc64_nvme.
- Rewrite the riscv Zbc-optimized CRC code, and add acceleration for
crc_t10dif, crc64_be, and crc64_nvme.
- Remove crc_t10dif and crc64_rocksoft from the crypto API, since they
are no longer needed there.
- Rename crc64_rocksoft to crc64_nvme, as the old name was incorrect.
- Add kunit test cases for crc64_nvme and crc7.
- Eliminate redundant functions for calculating the Castagnoli CRC32,
settling on just crc32c().
- Remove unnecessary prompts from some of the CRC kconfig options.
- Further optimize the x86 crc32c code.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZ+CGGhQcZWJpZ2dlcnNA
Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK3wRAP4tbnzawUmlIHIF0hleoADXehUgAhMt
NZn15mGvyiuwIQEA8W9qvnLdFXZkdxhxAEvDDFjyrRauL6eGtr/GvCx4AQY=
=wmKG
-----END PGP SIGNATURE-----
Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC updates from Eric Biggers:
"Another set of improvements to the kernel's CRC (cyclic redundancy
check) code:
- Rework the CRC64 library functions to be directly optimized, like
what I did last cycle for the CRC32 and CRC-T10DIF library
functions
- Rewrite the x86 PCLMULQDQ-optimized CRC code, and add VPCLMULQDQ
support and acceleration for crc64_be and crc64_nvme
- Rewrite the riscv Zbc-optimized CRC code, and add acceleration for
crc_t10dif, crc64_be, and crc64_nvme
- Remove crc_t10dif and crc64_rocksoft from the crypto API, since
they are no longer needed there
- Rename crc64_rocksoft to crc64_nvme, as the old name was incorrect
- Add kunit test cases for crc64_nvme and crc7
- Eliminate redundant functions for calculating the Castagnoli CRC32,
settling on just crc32c()
- Remove unnecessary prompts from some of the CRC kconfig options
- Further optimize the x86 crc32c code"
* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (36 commits)
x86/crc: drop the avx10_256 functions and rename avx10_512 to avx512
lib/crc: remove unnecessary prompt for CONFIG_CRC64
lib/crc: remove unnecessary prompt for CONFIG_LIBCRC32C
lib/crc: remove unnecessary prompt for CONFIG_CRC8
lib/crc: remove unnecessary prompt for CONFIG_CRC7
lib/crc: remove unnecessary prompt for CONFIG_CRC4
lib/crc7: unexport crc7_be_syndrome_table
lib/crc_kunit.c: update comment in crc_benchmark()
lib/crc_kunit.c: add test and benchmark for crc7_be()
x86/crc32: optimize tail handling for crc32c short inputs
riscv/crc64: add Zbc optimized CRC64 functions
riscv/crc-t10dif: add Zbc optimized CRC-T10DIF function
riscv/crc32: reimplement the CRC32 functions using new template
riscv/crc: add "template" for Zbc optimized CRC functions
x86/crc: add ANNOTATE_NOENDBR to suppress objtool warnings
x86/crc32: improve crc32c_arch() code generation with clang
x86/crc64: implement crc64_be and crc64_nvme using new template
x86/crc-t10dif: implement crc_t10dif using new template
x86/crc32: implement crc32_le using new template
x86/crc: add "template" for [V]PCLMULQDQ based CRC functions
...
* Nested virtualization support for VGICv3, giving the nested
hypervisor control of the VGIC hardware when running an L2 VM
* Removal of 'late' nested virtualization feature register masking,
making the supported feature set directly visible to userspace
* Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage
of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers
* Paravirtual interface for discovering the set of CPU implementations
where a VM may run, addressing a longstanding issue of guest CPU
errata awareness in big-little systems and cross-implementation VM
migration
* Userspace control of the registers responsible for identifying a
particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1),
allowing VMs to be migrated cross-implementation
* pKVM updates, including support for tracking stage-2 page table
allocations in the protected hypervisor in the 'SecPageTable' stat
* Fixes to vPMU, ensuring that userspace updates to the vPMU after
KVM_RUN are reflected into the backing perf events
LoongArch:
* Remove unnecessary header include path
* Assume constant PGD during VM context switch
* Add perf events support for guest VM
RISC-V:
* Disable the kernel perf counter during configure
* KVM selftests improvements for PMU
* Fix warning at the time of KVM module removal
x86:
* Add support for aging of SPTEs without holding mmu_lock. Not taking mmu_lock
allows multiple aging actions to run in parallel, and more importantly avoids
stalling vCPUs. This includes an implementation of per-rmap-entry locking;
aging the gfn is done with only a per-rmap single-bin spinlock taken, whereas
locking an rmap for write requires taking both the per-rmap spinlock and
the mmu_lock.
Note that this decreases slightly the accuracy of accessed-page information,
because changes to the SPTE outside aging might not use atomic operations
even if they could race against a clear of the Accessed bit. This is
deliberate because KVM and mm/ tolerate false positives/negatives for
accessed information, and testing has shown that reducing the latency of
aging is far more beneficial to overall system performance than providing
"perfect" young/old information.
* Defer runtime CPUID updates until KVM emulates a CPUID instruction, to
coalesce updates when multiple pieces of vCPU state are changing, e.g. as
part of a nested transition.
* Fix a variety of nested emulation bugs, and add VMX support for synthesizing
nested VM-Exit on interception (instead of injecting #UD into L2).
* Drop "support" for async page faults for protected guests that do not set
SEND_ALWAYS (i.e. that only want async page faults at CPL3)
* Bring a bit of sanity to x86's VM teardown code, which has accumulated
a lot of cruft over the years. Particularly, destroy vCPUs before
the MMU, despite the latter being a VM-wide operation.
* Add common secure TSC infrastructure for use within SNP and in the
future TDX
* Block KVM_CAP_SYNC_REGS if guest state is protected. It does not make
sense to use the capability if the relevant registers are not
available for reading or writing.
* Don't take kvm->lock when iterating over vCPUs in the suspend notifier to
fix a largely theoretical deadlock.
* Use the vCPU's actual Xen PV clock information when starting the Xen timer,
as the cached state in arch.hv_clock can be stale/bogus.
* Fix a bug where KVM could bleed PVCLOCK_GUEST_STOPPED across different
PV clocks; restrict PVCLOCK_GUEST_STOPPED to kvmclock, as KVM's suspend
notifier only accounts for kvmclock, and there's no evidence that the
flag is actually supported by Xen guests.
* Clean up the per-vCPU "cache" of its reference pvclock, and instead only
track the vCPU's TSC scaling (multipler+shift) metadata (which is moderately
expensive to compute, and rarely changes for modern setups).
* Don't write to the Xen hypercall page on MSR writes that are initiated by
the host (userspace or KVM) to fix a class of bugs where KVM can write to
guest memory at unexpected times, e.g. during vCPU creation if userspace has
set the Xen hypercall MSR index to collide with an MSR that KVM emulates.
* Restrict the Xen hypercall MSR index to the unofficial synthetic range to
reduce the set of possible collisions with MSRs that are emulated by KVM
(collisions can still happen as KVM emulates Hyper-V MSRs, which also reside
in the synthetic range).
* Clean up and optimize KVM's handling of Xen MSR writes and xen_hvm_config.
* Update Xen TSC leaves during CPUID emulation instead of modifying the CPUID
entries when updating PV clocks; there is no guarantee PV clocks will be
updated between TSC frequency changes and CPUID emulation, and guest reads
of the TSC leaves should be rare, i.e. are not a hot path.
x86 (Intel):
* Fix a bug where KVM unnecessarily reads XFD_ERR from hardware and thus
modifies the vCPU's XFD_ERR on a #NM due to CR0.TS=1.
* Pass XFD_ERR as the payload when injecting #NM, as a preparatory step
for upcoming FRED virtualization support.
* Decouple the EPT entry RWX protection bit macros from the EPT Violation
bits, both as a general cleanup and in anticipation of adding support for
emulating Mode-Based Execution Control (MBEC).
* Reject KVM_RUN if userspace manages to gain control and stuff invalid guest
state while KVM is in the middle of emulating nested VM-Enter.
* Add a macro to handle KVM's sanity checks on entry/exit VMCS control pairs
in anticipation of adding sanity checks for secondary exit controls (the
primary field is out of bits).
x86 (AMD):
* Ensure the PSP driver is initialized when both the PSP and KVM modules are
built-in (the initcall framework doesn't handle dependencies).
* Use long-term pins when registering encrypted memory regions, so that the
pages are migrated out of MIGRATE_CMA/ZONE_MOVABLE and don't lead to
excessive fragmentation.
* Add macros and helpers for setting GHCB return/error codes.
* Add support for Idle HLT interception, which elides interception if the vCPU
has a pending, unmasked virtual IRQ when HLT is executed.
* Fix a bug in INVPCID emulation where KVM fails to check for a non-canonical
address.
* Don't attempt VMRUN for SEV-ES+ guests if the vCPU's VMSA is invalid, e.g.
because the vCPU was "destroyed" via SNP's AP Creation hypercall.
* Reject SNP AP Creation if the requested SEV features for the vCPU don't
match the VM's configured set of features.
Selftests:
* Fix again the Intel PMU counters test; add a data load and do CLFLUSH{OPT} on the data
instead of executing code. The theory is that modern Intel CPUs have
learned new code prefetching tricks that bypass the PMU counters.
* Fix a flaw in the Intel PMU counters test where it asserts that an event is
counting correctly without actually knowing what the event counts on the
underlying hardware.
* Fix a variety of flaws, bugs, and false failures/passes dirty_log_test, and
improve its coverage by collecting all dirty entries on each iteration.
* Fix a few minor bugs related to handling of stats FDs.
* Add infrastructure to make vCPU and VM stats FDs available to tests by
default (open the FDs during VM/vCPU creation).
* Relax an assertion on the number of HLT exits in the xAPIC IPI test when
running on a CPU that supports AMD's Idle HLT (which elides interception of
HLT if a virtual IRQ is pending and unmasked).
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmfcTkEUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMnQAf/cPx72hJOdNy4Qrm8M33YLXVRVV00
yEZ8eN8TWdOclr0ltE/w/ELGh/qS4CU8pjURAk0A6lPioU+mdcTn3dPEqMDMVYom
uOQ2lusEHw0UuSnGZSEjvZJsE/Ro2NSAsHIB6PWRqig1ZBPJzyu0frce34pMpeQH
diwriJL9lKPAhBWXnUQ9BKoi1R0P5OLW9ahX4SOWk7cAFg4DLlDE66Nqf6nKqViw
DwEucTiUEg5+a3d93gihdD4JNl+fb3vI2erxrMxjFjkacl0qgqRu3ei3DG0MfdHU
wNcFSG5B1n0OECKxr80lr1Ip1KTVNNij0Ks+w6Gc6lSg9c4PptnNkfLK3A==
=nnCN
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Nested virtualization support for VGICv3, giving the nested
hypervisor control of the VGIC hardware when running an L2 VM
- Removal of 'late' nested virtualization feature register masking,
making the supported feature set directly visible to userspace
- Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage
of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers
- Paravirtual interface for discovering the set of CPU
implementations where a VM may run, addressing a longstanding issue
of guest CPU errata awareness in big-little systems and
cross-implementation VM migration
- Userspace control of the registers responsible for identifying a
particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1),
allowing VMs to be migrated cross-implementation
- pKVM updates, including support for tracking stage-2 page table
allocations in the protected hypervisor in the 'SecPageTable' stat
- Fixes to vPMU, ensuring that userspace updates to the vPMU after
KVM_RUN are reflected into the backing perf events
LoongArch:
- Remove unnecessary header include path
- Assume constant PGD during VM context switch
- Add perf events support for guest VM
RISC-V:
- Disable the kernel perf counter during configure
- KVM selftests improvements for PMU
- Fix warning at the time of KVM module removal
x86:
- Add support for aging of SPTEs without holding mmu_lock.
Not taking mmu_lock allows multiple aging actions to run in
parallel, and more importantly avoids stalling vCPUs. This includes
an implementation of per-rmap-entry locking; aging the gfn is done
with only a per-rmap single-bin spinlock taken, whereas locking an
rmap for write requires taking both the per-rmap spinlock and the
mmu_lock.
Note that this decreases slightly the accuracy of accessed-page
information, because changes to the SPTE outside aging might not
use atomic operations even if they could race against a clear of
the Accessed bit.
This is deliberate because KVM and mm/ tolerate false
positives/negatives for accessed information, and testing has shown
that reducing the latency of aging is far more beneficial to
overall system performance than providing "perfect" young/old
information.
- Defer runtime CPUID updates until KVM emulates a CPUID instruction,
to coalesce updates when multiple pieces of vCPU state are
changing, e.g. as part of a nested transition
- Fix a variety of nested emulation bugs, and add VMX support for
synthesizing nested VM-Exit on interception (instead of injecting
#UD into L2)
- Drop "support" for async page faults for protected guests that do
not set SEND_ALWAYS (i.e. that only want async page faults at CPL3)
- Bring a bit of sanity to x86's VM teardown code, which has
accumulated a lot of cruft over the years. Particularly, destroy
vCPUs before the MMU, despite the latter being a VM-wide operation
- Add common secure TSC infrastructure for use within SNP and in the
future TDX
- Block KVM_CAP_SYNC_REGS if guest state is protected. It does not
make sense to use the capability if the relevant registers are not
available for reading or writing
- Don't take kvm->lock when iterating over vCPUs in the suspend
notifier to fix a largely theoretical deadlock
- Use the vCPU's actual Xen PV clock information when starting the
Xen timer, as the cached state in arch.hv_clock can be stale/bogus
- Fix a bug where KVM could bleed PVCLOCK_GUEST_STOPPED across
different PV clocks; restrict PVCLOCK_GUEST_STOPPED to kvmclock, as
KVM's suspend notifier only accounts for kvmclock, and there's no
evidence that the flag is actually supported by Xen guests
- Clean up the per-vCPU "cache" of its reference pvclock, and instead
only track the vCPU's TSC scaling (multipler+shift) metadata (which
is moderately expensive to compute, and rarely changes for modern
setups)
- Don't write to the Xen hypercall page on MSR writes that are
initiated by the host (userspace or KVM) to fix a class of bugs
where KVM can write to guest memory at unexpected times, e.g.
during vCPU creation if userspace has set the Xen hypercall MSR
index to collide with an MSR that KVM emulates
- Restrict the Xen hypercall MSR index to the unofficial synthetic
range to reduce the set of possible collisions with MSRs that are
emulated by KVM (collisions can still happen as KVM emulates
Hyper-V MSRs, which also reside in the synthetic range)
- Clean up and optimize KVM's handling of Xen MSR writes and
xen_hvm_config
- Update Xen TSC leaves during CPUID emulation instead of modifying
the CPUID entries when updating PV clocks; there is no guarantee PV
clocks will be updated between TSC frequency changes and CPUID
emulation, and guest reads of the TSC leaves should be rare, i.e.
are not a hot path
x86 (Intel):
- Fix a bug where KVM unnecessarily reads XFD_ERR from hardware and
thus modifies the vCPU's XFD_ERR on a #NM due to CR0.TS=1
- Pass XFD_ERR as the payload when injecting #NM, as a preparatory
step for upcoming FRED virtualization support
- Decouple the EPT entry RWX protection bit macros from the EPT
Violation bits, both as a general cleanup and in anticipation of
adding support for emulating Mode-Based Execution Control (MBEC)
- Reject KVM_RUN if userspace manages to gain control and stuff
invalid guest state while KVM is in the middle of emulating nested
VM-Enter
- Add a macro to handle KVM's sanity checks on entry/exit VMCS
control pairs in anticipation of adding sanity checks for secondary
exit controls (the primary field is out of bits)
x86 (AMD):
- Ensure the PSP driver is initialized when both the PSP and KVM
modules are built-in (the initcall framework doesn't handle
dependencies)
- Use long-term pins when registering encrypted memory regions, so
that the pages are migrated out of MIGRATE_CMA/ZONE_MOVABLE and
don't lead to excessive fragmentation
- Add macros and helpers for setting GHCB return/error codes
- Add support for Idle HLT interception, which elides interception if
the vCPU has a pending, unmasked virtual IRQ when HLT is executed
- Fix a bug in INVPCID emulation where KVM fails to check for a
non-canonical address
- Don't attempt VMRUN for SEV-ES+ guests if the vCPU's VMSA is
invalid, e.g. because the vCPU was "destroyed" via SNP's AP
Creation hypercall
- Reject SNP AP Creation if the requested SEV features for the vCPU
don't match the VM's configured set of features
Selftests:
- Fix again the Intel PMU counters test; add a data load and do
CLFLUSH{OPT} on the data instead of executing code. The theory is
that modern Intel CPUs have learned new code prefetching tricks
that bypass the PMU counters
- Fix a flaw in the Intel PMU counters test where it asserts that an
event is counting correctly without actually knowing what the event
counts on the underlying hardware
- Fix a variety of flaws, bugs, and false failures/passes
dirty_log_test, and improve its coverage by collecting all dirty
entries on each iteration
- Fix a few minor bugs related to handling of stats FDs
- Add infrastructure to make vCPU and VM stats FDs available to tests
by default (open the FDs during VM/vCPU creation)
- Relax an assertion on the number of HLT exits in the xAPIC IPI test
when running on a CPU that supports AMD's Idle HLT (which elides
interception of HLT if a virtual IRQ is pending and unmasked)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (216 commits)
RISC-V: KVM: Optimize comments in kvm_riscv_vcpu_isa_disable_allowed
RISC-V: KVM: Teardown riscv specific bits after kvm_exit
LoongArch: KVM: Register perf callbacks for guest
LoongArch: KVM: Implement arch-specific functions for guest perf
LoongArch: KVM: Add stub for kvm_arch_vcpu_preempted_in_kernel()
LoongArch: KVM: Remove PGD saving during VM context switch
LoongArch: KVM: Remove unnecessary header include path
KVM: arm64: Tear down vGIC on failed vCPU creation
KVM: arm64: PMU: Reload when resetting
KVM: arm64: PMU: Reload when user modifies registers
KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs
KVM: arm64: PMU: Assume PMU presence in pmu-emul.c
KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu
KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
KVM: arm64: Initialize HCRX_EL2 traps in pKVM
KVM: arm64: Factor out setting HCRX_EL2 traps into separate function
KVM: x86: block KVM_CAP_SYNC_REGS if guest state is protected
KVM: x86: Add infrastructure for secure TSC
KVM: x86: Push down setting vcpu.arch.user_set_tsc
...
- Consolidate the VDSO storage
The VDSO data storage and data layout has been largely architecture
specific for historical reasons. That increases the maintenance effort
and causes inconsistencies over and over.
There is no real technical reason for architecture specific layouts and
implementations. The architecture specific details can easily be
integrated into a generic layout, which also reduces the amount of
duplicated code for managing the mappings.
Convert all architectures over to a unified layout and common mapping
infrastructure. This splits the VDSO data layout into subsystem
specific blocks, timekeeping, random and architecture parts, which
provides a better structure and allows to improve and update the
functionalities without conflict and interaction.
- Rework the timekeeping data storage
The current implementation is designed for exposing system timekeeping
accessors, which was good enough at the time when it was designed.
PTP and Time Sensitive Networking (TSN) change that as there are
requirements to expose independent PTP clocks, which are not related to
system timekeeping.
Replace the monolithic data storage by a structured layout, which
allows to add support for independent PTP clocks on top while reusing
both the data structures and the time accessor implementations.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmfgSWUTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoYGED/0f/M8YyacAyErDYW4ufW+zh2sUidSf
GVlK0Jn5BMljOoye+y2XfTxuvvXxEDjJNYiJm2uKGPdV29tjNXreGK39XyNqXPu5
jwR4f/IN/QVSM2nCO6jyydMz8ympJ2k6M4RewwmxXBL2KsUzzJWSKTgRNqM5Tdjs
1RhJMjkQVTiiSYerBpHXYCeZLM7/VEfZ120uuzVAYPXo0/R6zuyF7IBgIao9hbfO
IQeCMLLfpDQHQhwquTA8ZbWqQusiEoSYHT+kTDa3eXDDbE/2UklAUs9gaatI979x
73zs0Yqxyx2iIGaghACWOAbKdcBWBeCYDw5fFwYVKn4VMQi1+wcxbtOYL767jp9o
vfkLXGilXcVkvDjv4fH+e1NoJXXBxq1Ug1silKdOeJzenQF8Q1i3tavkWUVCNfwH
qyOIM72NiCEWbYBDcz0lwBxEAyO4o0E6NP1bDc4y50VedEYIbXwSh0QGrdev1abn
rjY9vsuUR9oznmZ6BRPPxMTY87gOSHoKvqydgSZUACEgLV9346f5qZf341OReYai
MXUmXOM4+LdyaM1+Mec8ppvjMbLw+736NZyZtT2InusEBE+Ddp25L3hYiWnklJu8
2uwv0AoyrwaJ8y6ADOX4thcLZq0gND0Z/Ayz/XvpeI30eftsGUCt5KOVlqwfwOkI
4EQKvk2fAixPxg==
=rwei
-----END PGP SIGNATURE-----
Merge tag 'timers-vdso-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull VDSO infrastructure updates from Thomas Gleixner:
- Consolidate the VDSO storage
The VDSO data storage and data layout has been largely architecture
specific for historical reasons. That increases the maintenance
effort and causes inconsistencies over and over.
There is no real technical reason for architecture specific layouts
and implementations. The architecture specific details can easily be
integrated into a generic layout, which also reduces the amount of
duplicated code for managing the mappings.
Convert all architectures over to a unified layout and common mapping
infrastructure. This splits the VDSO data layout into subsystem
specific blocks, timekeeping, random and architecture parts, which
provides a better structure and allows to improve and update the
functionalities without conflict and interaction.
- Rework the timekeeping data storage
The current implementation is designed for exposing system
timekeeping accessors, which was good enough at the time when it was
designed.
PTP and Time Sensitive Networking (TSN) change that as there are
requirements to expose independent PTP clocks, which are not related
to system timekeeping.
Replace the monolithic data storage by a structured layout, which
allows to add support for independent PTP clocks on top while reusing
both the data structures and the time accessor implementations.
* tag 'timers-vdso-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
sparc/vdso: Always reject undefined references during linking
x86/vdso: Always reject undefined references during linking
vdso: Rework struct vdso_time_data and introduce struct vdso_clock
vdso: Move architecture related data before basetime data
powerpc/vdso: Prepare introduction of struct vdso_clock
arm64/vdso: Prepare introduction of struct vdso_clock
x86/vdso: Prepare introduction of struct vdso_clock
time/namespace: Prepare introduction of struct vdso_clock
vdso/namespace: Rename timens_setup_vdso_data() to reflect new vdso_clock struct
vdso/vsyscall: Prepare introduction of struct vdso_clock
vdso/gettimeofday: Prepare helper functions for introduction of struct vdso_clock
vdso/gettimeofday: Prepare do_coarse_timens() for introduction of struct vdso_clock
vdso/gettimeofday: Prepare do_coarse() for introduction of struct vdso_clock
vdso/gettimeofday: Prepare do_hres_timens() for introduction of struct vdso_clock
vdso/gettimeofday: Prepare do_hres() for introduction of struct vdso_clock
vdso/gettimeofday: Prepare introduction of struct vdso_clock
vdso/helpers: Prepare introduction of struct vdso_clock
vdso/datapage: Define vdso_clock to prepare for multiple PTP clocks
vdso: Make vdso_time_data cacheline aligned
arm64: Make asm/cache.h compatible with vDSO
...
hrtimers are initialized with hrtimer_init() and a subsequent store to
the callback pointer. This turned out to be suboptimal for the upcoming
Rust integration and is obviously a silly implementation to begin with.
This cleanup replaces the hrtimer_init(T); T->function = cb; sequence
with hrtimer_setup(T, cb);
The conversion was done with Coccinelle and a few manual fixups.
Once the conversion has completely landed in mainline, hrtimer_init()
will be removed and the hrtimer::function becomes a private member.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmff5jQTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoVvRD/wKtuwmiA66NJFgXC0qVq82A6fO3bY8
GBdbfysDJIbqGu5PTcULTbJ8qkqv3jeLUv6CcXvS4sZ7y/uJQl2lzf8yrD/0bbwc
rLI6sHiPSZmK93kNVN4X5H7kvt7cE/DYC9nnEOgK3BY5FgKc4n9887d4aVBhL8Lv
ODwVXvZ+xi351YCj7qRyPU24zt/p4tkkT1o2k4a0HBluqLI0D+V20fke9IERUL8r
d1uWKlcn0TqYDesE8HXKIhbst3gx52rMJrXBJDHwFmG6v8Pj1fkTXCVpPo8QcBz8
OTVkpomN9f/Tx4+GZwhZOF86LhLL3OhxD6pT7JhFCXdmSGv+Ez8uyk1YZysM/XpV
Juy/1yAcBpDIDkmhMFGdAAn48Nn9Fotty0r4je60zSEp1d/4QMXcFme29qr2JTUE
iWnQ/HD6DxUjVHqy7CYvvo26Xegg1C7qgyOVt4PYZwAM1VKF5P3kzYTb4SAdxtop
Tpji1sfW9QV08jqMNo6XntD32DSP9S2HqjO9LwBw700jnx2jjJ35fcJs6iodMOUn
gckIZLMn3L0OoglPdyA5O7SNTbKE7aFiRKdnT/cJtR3Fa39Qu27CwC5gfiyuie9I
Q+LG8GLuYSBHXAR+PBK4GWlzJ7Dn8k3eqmbnLeKpRMsU6ZzcttgA64xhaviN2wN0
iJbvLJeisXr3GA==
=bYAX
-----END PGP SIGNATURE-----
Merge tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer cleanups from Thomas Gleixner:
"A treewide hrtimer timer cleanup
hrtimers are initialized with hrtimer_init() and a subsequent store to
the callback pointer. This turned out to be suboptimal for the
upcoming Rust integration and is obviously a silly implementation to
begin with.
This cleanup replaces the hrtimer_init(T); T->function = cb; sequence
with hrtimer_setup(T, cb);
The conversion was done with Coccinelle and a few manual fixups.
Once the conversion has completely landed in mainline, hrtimer_init()
will be removed and the hrtimer::function becomes a private member"
* tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits)
wifi: rt2x00: Switch to use hrtimer_update_function()
io_uring: Use helper function hrtimer_update_function()
serial: xilinx_uartps: Use helper function hrtimer_update_function()
ASoC: fsl: imx-pcm-fiq: Switch to use hrtimer_setup()
RDMA: Switch to use hrtimer_setup()
virtio: mem: Switch to use hrtimer_setup()
drm/vmwgfx: Switch to use hrtimer_setup()
drm/xe/oa: Switch to use hrtimer_setup()
drm/vkms: Switch to use hrtimer_setup()
drm/msm: Switch to use hrtimer_setup()
drm/i915/request: Switch to use hrtimer_setup()
drm/i915/uncore: Switch to use hrtimer_setup()
drm/i915/pmu: Switch to use hrtimer_setup()
drm/i915/perf: Switch to use hrtimer_setup()
drm/i915/gvt: Switch to use hrtimer_setup()
drm/i915/huc: Switch to use hrtimer_setup()
drm/amdgpu: Switch to use hrtimer_setup()
stm class: heartbeat: Switch to use hrtimer_setup()
i2c: Switch to use hrtimer_setup()
iio: Switch to use hrtimer_setup()
...
- Support for hard indices on RISC-V. The hart index identifies a hart
(core) within a specific interrupt domain in RISC-V's Priviledged
Architecture.
- Rework of the RISC-V MSI driver.
This moves the driver over to the generic MSI library and solves the
affinity problem of unmaskable PCI/MSI controllers. Unmaskable PCI/MSI
controllers are prone to lose interrupts when the MSI message is
updated to change the affinity because the message write consists of
three 32-bit subsequent writes, which update address and data. As these
writes are non-atomic versus the device raising an interrupt, the
device can observe a half written update and issue an interrupt on the
wrong vector. This is mitiated by a carefully orchestrated step by step
update and the observation of an eventually pending interrupt on the
CPU which issues the update. The algorithm follows the well established
method of the X86 MSI driver.
- A new driver for the RISC-V Sophgo SG2042 MSI controller
- Overhaul of the Renesas RZQ2L driver.
Simplification of the probe function by using devm_*() mechanisms,
which avoid the endless list of error prone gotos in the failure paths.
- Expand the Renesas RZV2H driver to support RZ/G3E SoCs
- A workaround for Rockchip 3568002 erratum in the GIC-V3 driver to
ensure that the addressing is limited to the lower 32-bit of the
physical address space.
- Add support for the Allwinner AS23 NMI controller
- Expand the IMX irqsteer driver to handle up to 960 input interrupts
- The usual small updates, cleanups and device tree changes.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmff454THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoZqoD/4kdHzbxfLpf7vC3NnG8NWwTq5FpbSx
6grQC9hWNMAs4n2IFjJRFLrjeX3AcdAQXL/BWuM0LfW9tQDQaVmqlSIlB/bn69KB
7HyAR6ozbOgnHKGAqFUXSLf+4pq+6q3mOgGKIF289dy14HFu4ta0DqKgkPZeQnVs
R/J8i7REUnn+YuxzSt5eOqyDPyt2EHJosSUABSWQZBlrM9jy1W7f6NqDFwawiVsa
+tv4U/bz91vjzVxwTIgt7nJK+b2HVYdxoZYuKJwPaTsj26ANPp6ltjRTeOmZhb5h
uKgw+OyzDnk6q+tjGcRqrqwl291VKxCvnRiqHFfu3CERdmI9qvpN9IRcEJqIbkcN
cakekhAyt7OO7sEPcql5vBL97e9hpb7EcH78gYxwHf8Dy0rFZUvSC5v+L6VRFnJS
XcKA1L+f9B6u5qxnBtLan9IW08HYNdvmPq6AuVjk+ndKioPUFqB2q6AtXpuA3Rmu
Y3XH/wh/q5wk0pgeByxQW6swsfpMN3OYK3mpLx475wFh2NKzcdGlwGhDFhiw8DKX
m1AESy3UZatj1a0qGaFS/M+mm9KGrDYIMrje832Wf4Yf1LGmTsDkd3/V99oazSsq
Jm4qhDASXChJXd0imQICX9hPw0aHTlLYNs54obUXVULH4HivQKIgWhUXrjG0dBDL
+tttjuv5FJxr3A==
=jPHa
-----END PGP SIGNATURE-----
Merge tag 'irq-drivers-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq driver updates from Thomas Gleixner:
- Support for hard indices on RISC-V. The hart index identifies a hart
(core) within a specific interrupt domain in RISC-V's Priviledged
Architecture.
- Rework of the RISC-V MSI driver
This moves the driver over to the generic MSI library and solves the
affinity problem of unmaskable PCI/MSI controllers. Unmaskable
PCI/MSI controllers are prone to lose interrupts when the MSI message
is updated to change the affinity because the message write consists
of three 32-bit subsequent writes, which update address and data. As
these writes are non-atomic versus the device raising an interrupt,
the device can observe a half written update and issue an interrupt
on the wrong vector. This is mitiated by a carefully orchestrated
step by step update and the observation of an eventually pending
interrupt on the CPU which issues the update. The algorithm follows
the well established method of the X86 MSI driver.
- A new driver for the RISC-V Sophgo SG2042 MSI controller
- Overhaul of the Renesas RZQ2L driver
Simplification of the probe function by using devm_*() mechanisms,
which avoid the endless list of error prone gotos in the failure
paths.
- Expand the Renesas RZV2H driver to support RZ/G3E SoCs
- A workaround for Rockchip 3568002 erratum in the GIC-V3 driver to
ensure that the addressing is limited to the lower 32-bit of the
physical address space.
- Add support for the Allwinner AS23 NMI controller
- Expand the IMX irqsteer driver to handle up to 960 input interrupts
- The usual small updates, cleanups and device tree changes
* tag 'irq-drivers-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
irqchip/imx-irqsteer: Support up to 960 input interrupts
irqchip/sunxi-nmi: Support Allwinner A523 NMI controller
dt-bindings: irq: sun7i-nmi: Document the Allwinner A523 NMI controller
irqchip/davinci-cp-intc: Remove public header
irqchip/renesas-rzv2h: Add RZ/G3E support
irqchip/renesas-rzv2h: Update macros ICU_TSSR_TSSEL_{MASK,PREP}
irqchip/renesas-rzv2h: Update TSSR_TIEN macro
irqchip/renesas-rzv2h: Add field_width to struct rzv2h_hw_info
irqchip/renesas-rzv2h: Add max_tssel to struct rzv2h_hw_info
irqchip/renesas-rzv2h: Add struct rzv2h_hw_info with t_offs variable
irqchip/renesas-rzv2h: Use devm_pm_runtime_enable()
irqchip/renesas-rzv2h: Use devm_reset_control_get_exclusive_deasserted()
irqchip/renesas-rzv2h: Simplify rzv2h_icu_init()
irqchip/renesas-rzv2h: Drop irqchip from struct rzv2h_icu_priv
irqchip/renesas-rzv2h: Fix wrong variable usage in rzv2h_tint_set_type()
dt-bindings: interrupt-controller: renesas,rzv2h-icu: Document RZ/G3E SoC
riscv: sophgo: dts: Add msi controller for SG2042
irqchip: Add the Sophgo SG2042 MSI interrupt controller
dt-bindings: interrupt-controller: Add Sophgo SG2042 MSI
arm64: dts: rockchip: rk356x: Move PCIe MSI to use GIC ITS instead of MBI
...
Using Clement's new validation callbacks, support checking that
dependencies have been satisfied for the floating point extensions.
The check for "d" might be slightly confusingly shorter than that of "f",
despite "d" depending on "f". This is because the requirement that a
hart supporting double precision must also support single precision,
should be validated by dt-bindings etc, not the kernel but lack of
support for single precision only is a limitation of the kernel.
Tested-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Clément Léger <cleger@rivosinc.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250312-reptile-platinum-62ee0f444a32@spud
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Using Clement's new validation callbacks, support checking that
dependencies have been satisfied for the vector crpyto extensions.
Currently riscv_isa_extension_available(<vector crypto>) will return
true on systems that support the extensions but vector itself has been
disabled by the kernel, adding validation callbacks will prevent such a
scenario from occuring and make the behaviour of the extension detection
functions more consistent with user expectations - it's not expected to
have to check for vector AND the specific crypto extension.
The Unpriv spec states:
| The Zvknhb and Zvbc Vector Crypto Extensions --and accordingly the
| composite extensions Zvkn, Zvknc, Zvkng, and Zvksc-- require a Zve64x
| base, or application ("V") base Vector Extension. All of the other
| Vector Crypto Extensions can be built on any embedded (Zve*) or
| application ("V") base Vector Extension.
While this could be used as the basis for checking that the correct base
for individual crypto extensions, but that's not really the kernel's job
in my opinion and it is sufficient to leave that sort of precision to
the dt-bindings. The kernel only needs to make sure that vector, in some
form, is available.
Link: https://github.com/riscv/riscv-isa-manual/blob/main/src/vector-crypto.adoc#extensions-overview
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20250312-entertain-shaking-b664142c2f99@spud
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Using Clement's new validation callbacks, support checking that
dependencies have been satisfied for the vector extensions. From the
kernel's perfective, it's not required to differentiate between the
conditions for all the various vector subsets - it's the firmware's job
to not report impossible combinations. Instead, the kernel only has to
check that the correct config options are enabled and to enforce its
requirement of the d extension being present for FPU support.
Since vector will now be disabled proactively, there's no need to clear
the bit in elf_hwcap in riscv_fill_hwcap() any longer.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250312-eclair-affluent-55b098c3602b@spud
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
two locking commits in the locking tree,
part of the locking-core-2025-03-22 pull request. ]
x86 CPU features support:
- Generate the <asm/cpufeaturemasks.h> header based on build config
(H. Peter Anvin, Xin Li)
- x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
- Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
- Enable modifying CPU bug flags with '{clear,set}puid='
(Brendan Jackman)
- Utilize CPU-type for CPU matching (Pawan Gupta)
- Warn about unmet CPU feature dependencies (Sohil Mehta)
- Prepare for new Intel Family numbers (Sohil Mehta)
Percpu code:
- Standardize & reorganize the x86 percpu layout and
related cleanups (Brian Gerst)
- Convert the stackprotector canary to a regular percpu
variable (Brian Gerst)
- Add a percpu subsection for cache hot data (Brian Gerst)
- Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
- Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)
MM:
- Add support for broadcast TLB invalidation using AMD's INVLPGB instruction
(Rik van Riel)
- Rework ROX cache to avoid writable copy (Mike Rapoport)
- PAT: restore large ROX pages after fragmentation
(Kirill A. Shutemov, Mike Rapoport)
- Make memremap(MEMREMAP_WB) map memory as encrypted by default
(Kirill A. Shutemov)
- Robustify page table initialization (Kirill A. Shutemov)
- Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
- Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
(Matthew Wilcox)
KASLR:
- x86/kaslr: Reduce KASLR entropy on most x86 systems,
to support PCI BAR space beyond the 10TiB region
(CONFIG_PCI_P2PDMA=y) (Balbir Singh)
CPU bugs:
- Implement FineIBT-BHI mitigation (Peter Zijlstra)
- speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
- speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan Gupta)
- RFDS: Exclude P-only parts from the RFDS affected list (Pawan Gupta)
System calls:
- Break up entry/common.c (Brian Gerst)
- Move sysctls into arch/x86 (Joel Granados)
Intel LAM support updates: (Maciej Wieczor-Retman)
- selftests/lam: Move cpu_has_la57() to use cpuinfo flag
- selftests/lam: Skip test if LAM is disabled
- selftests/lam: Test get_user() LAM pointer handling
AMD SMN access updates:
- Add SMN offsets to exclusive region access (Mario Limonciello)
- Add support for debugfs access to SMN registers (Mario Limonciello)
- Have HSMP use SMN through AMD_NODE (Yazen Ghannam)
Power management updates: (Patryk Wlazlyn)
- Allow calling mwait_play_dead with an arbitrary hint
- ACPI/processor_idle: Add FFH state handling
- intel_idle: Provide the default enter_dead() handler
- Eliminate mwait_play_dead_cpuid_hint()
Bootup:
Build system:
- Raise the minimum GCC version to 8.1 (Brian Gerst)
- Raise the minimum LLVM version to 15.0.0
(Nathan Chancellor)
Kconfig: (Arnd Bergmann)
- Add cmpxchg8b support back to Geode CPUs
- Drop 32-bit "bigsmp" machine support
- Rework CONFIG_GENERIC_CPU compiler flags
- Drop configuration options for early 64-bit CPUs
- Remove CONFIG_HIGHMEM64G support
- Drop CONFIG_SWIOTLB for PAE
- Drop support for CONFIG_HIGHPTE
- Document CONFIG_X86_INTEL_MID as 64-bit-only
- Remove old STA2x11 support
- Only allow CONFIG_EISA for 32-bit
Headers:
- Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI headers
(Thomas Huth)
Assembly code & machine code patching:
- x86/alternatives: Simplify alternative_call() interface (Josh Poimboeuf)
- x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
- KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
- x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
- x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
- x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
(Uros Bizjak)
- Use named operands in inline asm (Uros Bizjak)
- Improve performance by using asm_inline() for atomic locking instructions
(Uros Bizjak)
Earlyprintk:
- Harden early_serial (Peter Zijlstra)
NMI handler:
- Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()
(Waiman Long)
Miscellaneous fixes and cleanups:
- by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel,
Artem Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst,
Dan Carpenter, Dr. David Alan Gilbert, H. Peter Anvin,
Ingo Molnar, Josh Poimboeuf, Kevin Brodsky, Mike Rapoport,
Lukas Bulwahn, Maciej Wieczor-Retman, Max Grobecker,
Patryk Wlazlyn, Pawan Gupta, Peter Zijlstra,
Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner,
Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak,
Vitaly Kuznetsov, Xin Li, liuye.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfenkQRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1g1FRAAi6OFTSn/5aeLMI0IMNBxJ6ddQiFc3imd
7+C/vU5nul4CyDs8mKyj/+f/DDrbkG9lKz3VG631Yl237lXHjD8XWcVMeC/1z/q0
3zInDIloE9/nBHRPkF6F7fARBLBZ0LFgaBsGrCo7mwpGybiQdqGcqcxllvTbtXaw
OHta4q6ok+lBDNlfc0v6H4cRnzhmmlKu6Ng0j6UI3V7uFhi3vtxas32ltDQtzorq
2+jbV6/+kbrrv+xPC+jlzOFhTEKRupNPQXmvyQteoQg6G3kqAKMDvBthGXd1rHuX
Qa+BoDIifE/2NiVeRwNrhoqYH/pHCzUzDREW5IW8+ca+4XNKuzAC6EuC8CeCzyK1
q8ZjZjooQW4zEeVFeJYllHONzJYfxfSH5CLsnbcuhq99yfGlrQhF1qL72/Omn1w/
DfPJM8Zt5zyKvLqUg3Md+fkVCO2wyDNhB61QPzRgHF+yD+rvuDpoqvUWir+w7cSn
fwEDVZGXlFx6dumtSrqRaTd1nvFt80s8yP2ll09DMvGQ8D/yruS7hndGAmmJVCSW
NAfd8pSjq5v2+ux2UR92/Cc3VF3SjaUqHBOp/Nq9rESya18ZVa3cJpHhVYYtPIVf
THW0h07RIkGVKs1uq+5ekLCr/8uAZg58UPIqmhTuW0ttymRHCNfohR45FQZzy+0M
tJj1oc2TIZw=
=Dcb3
-----END PGP SIGNATURE-----
Merge tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core x86 updates from Ingo Molnar:
"x86 CPU features support:
- Generate the <asm/cpufeaturemasks.h> header based on build config
(H. Peter Anvin, Xin Li)
- x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
- Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
- Enable modifying CPU bug flags with '{clear,set}puid=' (Brendan
Jackman)
- Utilize CPU-type for CPU matching (Pawan Gupta)
- Warn about unmet CPU feature dependencies (Sohil Mehta)
- Prepare for new Intel Family numbers (Sohil Mehta)
Percpu code:
- Standardize & reorganize the x86 percpu layout and related cleanups
(Brian Gerst)
- Convert the stackprotector canary to a regular percpu variable
(Brian Gerst)
- Add a percpu subsection for cache hot data (Brian Gerst)
- Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
- Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)
MM:
- Add support for broadcast TLB invalidation using AMD's INVLPGB
instruction (Rik van Riel)
- Rework ROX cache to avoid writable copy (Mike Rapoport)
- PAT: restore large ROX pages after fragmentation (Kirill A.
Shutemov, Mike Rapoport)
- Make memremap(MEMREMAP_WB) map memory as encrypted by default
(Kirill A. Shutemov)
- Robustify page table initialization (Kirill A. Shutemov)
- Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
- Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
(Matthew Wilcox)
KASLR:
- x86/kaslr: Reduce KASLR entropy on most x86 systems, to support PCI
BAR space beyond the 10TiB region (CONFIG_PCI_P2PDMA=y) (Balbir
Singh)
CPU bugs:
- Implement FineIBT-BHI mitigation (Peter Zijlstra)
- speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
- speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan
Gupta)
- RFDS: Exclude P-only parts from the RFDS affected list (Pawan
Gupta)
System calls:
- Break up entry/common.c (Brian Gerst)
- Move sysctls into arch/x86 (Joel Granados)
Intel LAM support updates: (Maciej Wieczor-Retman)
- selftests/lam: Move cpu_has_la57() to use cpuinfo flag
- selftests/lam: Skip test if LAM is disabled
- selftests/lam: Test get_user() LAM pointer handling
AMD SMN access updates:
- Add SMN offsets to exclusive region access (Mario Limonciello)
- Add support for debugfs access to SMN registers (Mario Limonciello)
- Have HSMP use SMN through AMD_NODE (Yazen Ghannam)
Power management updates: (Patryk Wlazlyn)
- Allow calling mwait_play_dead with an arbitrary hint
- ACPI/processor_idle: Add FFH state handling
- intel_idle: Provide the default enter_dead() handler
- Eliminate mwait_play_dead_cpuid_hint()
Build system:
- Raise the minimum GCC version to 8.1 (Brian Gerst)
- Raise the minimum LLVM version to 15.0.0 (Nathan Chancellor)
Kconfig: (Arnd Bergmann)
- Add cmpxchg8b support back to Geode CPUs
- Drop 32-bit "bigsmp" machine support
- Rework CONFIG_GENERIC_CPU compiler flags
- Drop configuration options for early 64-bit CPUs
- Remove CONFIG_HIGHMEM64G support
- Drop CONFIG_SWIOTLB for PAE
- Drop support for CONFIG_HIGHPTE
- Document CONFIG_X86_INTEL_MID as 64-bit-only
- Remove old STA2x11 support
- Only allow CONFIG_EISA for 32-bit
Headers:
- Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI
headers (Thomas Huth)
Assembly code & machine code patching:
- x86/alternatives: Simplify alternative_call() interface (Josh
Poimboeuf)
- x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
- KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
- x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
- x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
- x86/kexec: Merge x86_32 and x86_64 code using macros from
<asm/asm.h> (Uros Bizjak)
- Use named operands in inline asm (Uros Bizjak)
- Improve performance by using asm_inline() for atomic locking
instructions (Uros Bizjak)
Earlyprintk:
- Harden early_serial (Peter Zijlstra)
NMI handler:
- Add an emergency handler in nmi_desc & use it in
nmi_shootdown_cpus() (Waiman Long)
Miscellaneous fixes and cleanups:
- by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel, Artem
Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst, Dan
Carpenter, Dr. David Alan Gilbert, H. Peter Anvin, Ingo Molnar,
Josh Poimboeuf, Kevin Brodsky, Mike Rapoport, Lukas Bulwahn, Maciej
Wieczor-Retman, Max Grobecker, Patryk Wlazlyn, Pawan Gupta, Peter
Zijlstra, Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner,
Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak, Vitaly
Kuznetsov, Xin Li, liuye"
* tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (211 commits)
zstd: Increase DYNAMIC_BMI2 GCC version cutoff from 4.8 to 11.0 to work around compiler segfault
x86/asm: Make asm export of __ref_stack_chk_guard unconditional
x86/mm: Only do broadcast flush from reclaim if pages were unmapped
perf/x86/intel, x86/cpu: Replace Pentium 4 model checks with VFM ones
perf/x86/intel, x86/cpu: Simplify Intel PMU initialization
x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-UAPI headers
x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI headers
x86/locking/atomic: Improve performance by using asm_inline() for atomic locking instructions
x86/asm: Use asm_inline() instead of asm() in clwb()
x86/asm: Use CLFLUSHOPT and CLWB mnemonics in <asm/special_insns.h>
x86/hweight: Use asm_inline() instead of asm()
x86/hweight: Use ASM_CALL_CONSTRAINT in inline asm()
x86/hweight: Use named operands in inline asm()
x86/stackprotector/64: Only export __ref_stack_chk_guard on CONFIG_SMP
x86/head/64: Avoid Clang < 17 stack protector in startup code
x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
x86/runtime-const: Add the RUNTIME_CONST_PTR assembly macro
x86/cpu/intel: Limit the non-architectural constant_tsc model checks
x86/mm/pat: Replace Intel x86_model checks with VFM ones
x86/cpu/intel: Fix fast string initialization for extended Families
...
Cross-merge networking fixes after downstream PR (net-6.14-rc8).
Conflict:
tools/testing/selftests/net/Makefile
03544faad7 ("selftest: net: add proc_net_pktgen")
3ed61b8938 ("selftests: net: test for lwtunnel dst ref loops")
tools/testing/selftests/net/config:
85cb3711ac ("selftests: net: Add test cases for link and peer netns")
3ed61b8938 ("selftests: net: test for lwtunnel dst ref loops")
Adjacent commits:
tools/testing/selftests/net/Makefile
c935af429e ("selftests: net: add support for testing SO_RCVMARK and SO_RCVPRIORITY")
355d940f4d ("Revert "selftests: Add IPv6 link-local address generation tests for GRE devices."")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The immediate issue being fixed here is a nVMX bug where KVM fails to
detect that, after nested VM-Exit, L1 has a pending IRQ (or NMI).
However, checking for a pending interrupt accesses the legacy PIC, and
x86's kvm_arch_destroy_vm() currently frees the PIC before destroying
vCPUs, i.e. checking for IRQs during the forced nested VM-Exit results
in a NULL pointer deref; that's a prerequisite for the nVMX fix.
The remaining patches attempt to bring a bit of sanity to x86's VM
teardown code, which has accumulated a lot of cruft over the years. E.g.
KVM currently unloads each vCPU's MMUs in a separate operation from
destroying vCPUs, all because when guest SMP support was added, KVM had a
kludgy MMU teardown flow that broke when a VM had more than one 1 vCPU.
And that oddity lived on, for 18 years...
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Disable the kernel perf counter during configure
- KVM selftests improvements for PMU
- Fix warning at the time of KVM module removal
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmfbsiQACgkQrUjsVaLH
LAfGAxAAlODgZMDt+f/YlOOmQPsLohGrWsZjvheRfrf3X5PFNuXd1frpZSbF1PCs
lk4WZrekJo2VT2tWMEBjFrH6A9slHUzjZs+xIxig1N/Gmxy1clKp/svJVG20WwQI
a5/5jPH3TNDL3KUphwBF0QIi/Tlp6+SZ5RAag6QsBXKnFy5+o9lioR2T6tFzMoxy
ry8VD8e395bQXn5tYkk4vsr9vBRzpgq7ZBOMg6zyNTL86UBOVq3QXPNzSE5kenXW
4zqDmuxbk+GgpUKLXA9aXwLvasGiOr+59uToZd2lHf5ynSdSprQyW11I5ZOFs2DX
800mLeYtyHc4nGTi6OO3VN9toMhmgijtFLNpBDg4LRvwMfB+sp62Ixsmjs3CBCd+
GqcbJaMtdgM1SpZ3tlzEQoWpIBlNYs7BuG/6Jen9xKHHvOg7Ty9UimJeMikV+FUm
srq0+GlgjaniZ/d8arIIm7LVqZLAc6OSzn+A8IcE0KQ9fKBgB0CZrANW+N+b+7ub
Lkq9hA8tM5Ti0bNszqPyJu3bmhWAOEOwF0CjYL79vEhIYsegCKKtruKOppdM+sl2
dzk+DVV9Aar+bDu/v/bYV5TfIq39U86Tqxe+OU7EPzch5NdyDvKcpZ9/JIZnf/j5
tviM3awZfFmgTeBUTAYvaR34KExBOLQLkGb9BqAydYjg+pl9jVE=
=HP3N
-----END PGP SIGNATURE-----
Merge tag 'kvm-riscv-6.15-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv changes for 6.15
- Disable the kernel perf counter during configure
- KVM selftests improvements for PMU
- Fix warning at the time of KVM module removal
Implement the runtime constant infrastructure for riscv. Use this
infrastructure to generate constants to be used by the d_hash()
function.
This is the riscv variant of commit 94a2bc0f61 ("arm64: add 'runtime
constant' support") and commit e3c92e8171 ("runtime constants: add
x86 architecture support").
[ alex: Remove trailing whitespace ]
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250319-runtime_const_riscv-v10-2-745b31a11d65@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
We have duplicated the definition of the nop instruction in ftrace.h and
in jump_label.c. Move this definition into the generic file insn-def.h
so that they can share the definition with each other and with future
files.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250319-runtime_const_riscv-v10-1-745b31a11d65@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Andrew Jones <ajones@ventanamicro.com> says:
The first six patches of this series are fixes and cleanups of the
unaligned access speed probing code. The next patch introduces a
kernel command line option that allows the probing to be skipped.
This command line option is a different approach than Jesse's [1].
[1] takes a cpu-list for a particular speed, supporting heterogeneous
platforms. With this approach, the kernel command line should only
be used for homogeneous platforms. [1] also only allowed 'fast' and
'slow' to be selected. This parameter also supports 'unsupported',
which could be useful for testing code paths gated on that. The final
patch adds the documentation.
[1] https://lore.kernel.org/linux-riscv/20240805173816.3722002-1-jesse@rivosinc.com/
* patches from https://lore.kernel.org/r/20250304120014.143628-10-ajones@ventanamicro.com:
Documentation/kernel-parameters: Add riscv unaligned speed parameters
riscv: Add parameter for skipping access speed tests
riscv: Fix set up of vector cpu hotplug callback
riscv: Fix set up of cpu hotplug callbacks
riscv: Change check_unaligned_access_speed_all_cpus to void
riscv: Fix check_unaligned_access_all_cpus
riscv: Fix riscv_online_cpu_vec
riscv: Annotate unaligned access init functions
Link: https://lore.kernel.org/r/20250304120014.143628-10-ajones@ventanamicro.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Starfive:
All changes for jh7110-based boards including the removal of a dac
that does not exist and the addition of usb3 support on the star64 board
and pcie on the framework mainboard.
Microchip:
Update pcie reg properties to fix a mistake originally describing them.
Here rather than in fixes, since the driver maintains support for the
old format.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRh246EGq/8RLhDjO14tDGHoIJi0gUCZ9lfIwAKCRB4tDGHoIJi
0hoWAQD4bX80fEznTmpFKy5suiljz7v2ePTkOhRFqU9yu7RS9AD/WEZkw4tNU4Y9
IxS/Znk5i/ECLgQGsi5axHTwYkoMFwQ=
=V0XH
-----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmfbM+AACgkQYKtH/8kJ
UieHeBAA17sRn2qDBsOVjOY5SU7j1vyWwWUEt5qJh4Ft55wEt2gZ/ay1pWqMST3q
PpQ5cQQqf+lzhAweVVrmCsXSBwDe4AzArOdDpOxfsWCq8wkiq2/yV0220VD513j/
qH1BmzO5QNC1rw5a9wvmOzjc/YJwEbnrfAwscBiuePvA+ve/vafngTd0ElJgc6dh
5Si9OP2OvLZzNEiR3eKWLMd1PFCNwHUiAu8Ug50K0zYWt1p2jT8y+nFYI8bZIUjZ
pltj1XHZe6h0eItt9w1aM4PY6TygiB5bc8teHWcp2tnoBplj7C+QGVw86M4YFI5D
neKwiH1+ek8rVLORdAtH0eBG8BvLppZJ7enlY6aFgiYbIH3SYI9AGfM3ntnZhvRi
Z6VrfhCkB2ddq1uTBgBqRld9ZgmCPQcSoTMVwG3KXbXk01f7JBDwG6bE1OBOeJbS
cIUHwuxN+bwSPD/QOagUf+wsJt+XYTWv3iJIj46+mSD5qz6yKjmvRAMAotydG130
G3ftE6I79E3zDYtbNt17d2K3sTJnLjjRk4gHp6g9SLf6QhLNd8SCp1zIVfUk1WoL
CIUpPlgGgkbtLTmGAEXNW95zVRi+ZeFFDIOcCsahS33hmOkidthd5BzkSUXKa+Zh
lzMujCh/4JUXToHmugwpi1ZEVl84rk9RLCgPd9PhfFvhCzL1rSA=
=YiZd
-----END PGP SIGNATURE-----
Merge tag 'riscv-dt-for-v6.15' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into soc/dt
RISC-V Devicetrees for v6.15
Starfive:
All changes for jh7110-based boards including the removal of a dac
that does not exist and the addition of usb3 support on the star64 board
and pcie on the framework mainboard.
Microchip:
Update pcie reg properties to fix a mistake originally describing them.
Here rather than in fixes, since the driver maintains support for the
old format.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
* tag 'riscv-dt-for-v6.15' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
riscv: dts: starfive: jh7110-pine64-star64: enable USB 3.0 port
riscv: dts: starfive: jh7110: pciephy0 USB 3.0 configuration registers
riscv: dts: starfive: fml13v01: enable pcie1
riscv: dts: starfive: remove non-existent dac from jh7110
riscv: dts: starfive: Unify regulator naming scheme
riscv: dts: microchip: update pcie reg properties to new format
Link: https://lore.kernel.org/r/20250318-favorite-presuming-bf2fcf55bf6a@spud
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Whether the MII transmit clock can be stopped is primarily a property
of the PHY (there is a capability bit that should be checked first.)
Whether the MAC is capable of stopping the transmit clock is a separate
issue, but this is already handled by the core DesignWare MAC code.
As commit "net: stmmac: starfive: use PHY capability for TX clock stop"
adds the flag to use the PHY capability, remove the DT property that is
now unecessary.
Cc: Samin Guo <samin.guo@starfivetech.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tsIU5-005vGR-4c@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Allow skipping scalar and vector unaligned access speed tests. This
is useful for testing alternative code paths and to skip the tests in
environments where they run too slowly. All CPUs must have the same
unaligned access speed.
The code movement is because we now need the scalar cpu hotplug
callback to always run, so we need to bring it and its supporting
functions out of CONFIG_RISCV_PROBE_UNALIGNED_ACCESS.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250304120014.143628-17-ajones@ventanamicro.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Whether or not we have RISCV_PROBE_VECTOR_UNALIGNED_ACCESS we need to
set up a cpu hotplug callback to check if we have vector at all,
since, when we don't have vector, we need to set
vector_misaligned_access to unsupported rather than leave it the
default of unknown.
Fixes: e7c9d66e31 ("RISC-V: Report vector unaligned access speed hwprobe")
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250304120014.143628-16-ajones@ventanamicro.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
CPU hotplug callbacks should be set up even if we detected all
current cpus emulate misaligned accesses, since we want to
ensure our expectations of all cpus emulating is maintained.
Fixes: 6e5ce7f2ea ("riscv: Decouple emulated unaligned accesses from access speed")
Fixes: e7c9d66e31 ("RISC-V: Report vector unaligned access speed hwprobe")
Reviewed-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250304120014.143628-15-ajones@ventanamicro.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
The return value of check_unaligned_access_speed_all_cpus() is always
zero, so make the function void so we don't need to concern ourselves
with it. The change also allows us to tidy up
check_unaligned_access_all_cpus() a bit.
Reviewed-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250304120014.143628-14-ajones@ventanamicro.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
check_vector_unaligned_access_emulated_all_cpus(), like its name
suggests, will return true when all cpus emulate unaligned vector
accesses. If the function returned false it may have been because
vector isn't supported at all (!has_vector()) or because at least
one cpu doesn't emulate unaligned vector accesses. Since false may
be returned for two cases, checking for it isn't sufficient when
attempting to determine if we should proceed with the vector speed
check. Move the !has_vector() functionality to
check_unaligned_access_all_cpus() in order for
check_vector_unaligned_access_emulated_all_cpus() to return false
for a single case.
Fixes: e7c9d66e31 ("RISC-V: Report vector unaligned access speed hwprobe")
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250304120014.143628-13-ajones@ventanamicro.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
We shouldn't probe when we already know vector is unsupported and
we should probe when we see we don't yet know whether it's supported.
Furthermore, we should ensure we've set the access type to
unsupported when we don't have vector at all.
Fixes: e7c9d66e31 ("RISC-V: Report vector unaligned access speed hwprobe")
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250304120014.143628-12-ajones@ventanamicro.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Several functions used in unaligned access probing are only run at
init time. Annotate them appropriately.
Fixes: f413aae96c ("riscv: Set unaligned access speed at compile time")
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250304120014.143628-11-ajones@ventanamicro.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Extend the KVM ISA extension ONE_REG interface to allow KVM user space
to detect and enable Zaamo/Zalrsc extensions for Guest/VM.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20240619153913.867263-5-cleger@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Export the Zaamo and Zalrsc extensions to userspace using hwprobe.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20240619153913.867263-4-cleger@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
These 2 new extensions are actually a subset of the A extension which
provides atomic memory operations and load-reserved/store-conditional
instructions.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20240619153913.867263-3-cleger@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Currently, fgraph on riscv relies on the infrastructure of
DYNAMIC_FTRACE_WITH_ARGS. However, DYNAMIC_FTRACE_WITH_ARGS may be
turned off on riscv, which will cause the enabled fgraph to be abnormal.
Therefore, let's select HAVE_FUNCTION_GRAPH_TRACER depends on
HAVE_DYNAMIC_FTRACE_WITH_ARGS.
Fixes: a3ed4157b7 ("fgraph: Replace fgraph_ret_regs with ftrace_regs")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202503160820.dvqMpH0g-lkp@intel.com/
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250317031214.4138436-1-pulehui@huaweicloud.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Correct the VEC_S macro definition to fix the implementation
of vector words copy in the case of unalignment in RISC-V.
Fixes: e7c9d66e31 ("RISC-V: Report vector unaligned access speed hwprobe")
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Tingbo Liao <tingbo.liao@starfivetech.com>
Link: https://lore.kernel.org/r/20250228090801.8334-1-tingbo.liao@starfivetech.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
This patch adds parentheses to parameters caller and callee of macros
make_call_t0 and make_call_ra. Every existing invocation of these two
macros uses a single variable for each argument, so the absence of the
parentheses seems okay. However, future invocations might use more
complex expressions as arguments. For example, a future invocation might
look like this: make_call_t0(a - b, c, call). Without parentheses in the
macro definition, the macro invocation expands to:
...
unsigned int offset = (unsigned long) c - (unsigned long) a - b;
...
which is clearly wrong.
The use of parentheses ensures arguments are correctly evaluated and
potentially saves future users of make_call_t0 and make_call_ra debugging
trouble.
Fixes: 6724a76cff ("riscv: ftrace: Reduce the detour code size to half")
Signed-off-by: Juhan Jin <juhan.jin@foxmail.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/tencent_AE90AA59903A628E87E9F80E563DA5BA5508@qq.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Commit 654102df2a ("kbuild: add generic support for built-in boot
DTBs") introduced generic support for built-in DTBs.
Select GENERIC_BUILTIN_DTB when built-in DTB support is enabled.
To keep consistency across architectures, this commit also renames
CONFIG_BUILTIN_DTB_SOURCE to CONFIG_BUILTIN_DTB_NAME.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20241222000836.2578171-1-masahiroy@kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
The size of ®s->a0 is unknown, causing the error:
../include/linux/fortify-string.h:571:25: warning: call to
'__write_overflow_field' declared with attribute warning: detected write
beyond size of field (1st parameter); maybe use struct_group()?
[-Wattribute-warning]
Fix this by wrapping the required registers in pt_regs with
struct_group() and reference the group when doing the offending
memcpy().
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250224-fix_ftrace_partial_regs-v1-1-54b906417e86@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Since commit f862bbf4cd ("riscv: Allow NOMMU kernels to run in
S-mode") in v6.10, CLINT_TIMER is selected by the main RISCV symbol when
RISCV_M_MODE is enabled.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/ce55529a42fa232cacd580e38866c60701f91095.1738764474.git.geert+renesas@glider.be
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Follow-up to commit e36ddf3226 ("riscv: defconfig: Disable RZ/Five
peripheral support") in v6.12-rc1:
- Disable ARCH_RENESAS, too, as currently RZ/Five is the sole Renesas
RISC-V SoC,
- Drop no longer needed explicit disable of USB_XHCI_RCAR, which
depends on ARCH_RENESAS.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://lore.kernel.org/r/e8a2fb273c8c68bd6d526b924b4212f397195b28.1738764211.git.geert+renesas@glider.be
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Nick Hu <nick.hu@sifive.com> says:
When the cpu is going to be hotplug, stop the stimecmp to prevent pending
interrupt.
When the cpu is going to be suspended, save the stimecmp before entering
the suspend state and restore it in the resume path.
* patches from https://lore.kernel.org/r/20250219114135.27764-1-nick.hu@sifive.com:
clocksource/drivers/timer-riscv: Stop stimecmp when cpu hotplug
riscv: Add stimecmp save and restore
Link: https://lore.kernel.org/r/20250219114135.27764-1-nick.hu@sifive.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
If the HW support the SSTC extension, we should save and restore the
stimecmp register while cpu non retention suspend.
Signed-off-by: Nick Hu <nick.hu@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20250219114135.27764-2-nick.hu@sifive.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reduce three lines checking to single line using a ternary conditional
expression for getting the base extension word. In addition, the
test_bit macro function already return a boolean which matches the
return type of the caller, so directly return the result of the test_bit
macro function.
Signed-off-by: Chin Yik Ming <yikming2222@gmail.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250129203843.1136838-1-yikming2222@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Since commit f0bddf5058 ("riscv: entry: Convert to generic
entry"), TASK_TI_FLAGS is not used any more, so remove it.
Fixes: f0bddf5058 ("riscv: entry: Convert to generic entry")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20241109014605.2801492-1-ruanjinjie@huawei.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Expose Zicbom through hwprobe and also provide a key to extract its
respective block size.
[ alex: Fix merge conflicts and hwprobe numbering ]
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
Link: https://lore.kernel.org/r/20250226063206.71216-3-cuiyunhui@bytedance.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Enabling cbo.clean and cbo.flush in user mode makes it more
convenient to manage the cache state and achieve better performance.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
Link: https://lore.kernel.org/r/20250226063206.71216-2-cuiyunhui@bytedance.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Inochi Amaoto <inochiama@gmail.com> says:
Add description for the BFloat16 precision Floating-Point ISA extension,
(Zfbfmin, Zvfbfmin, Zvfbfwma). which was ratified in commit 4dc23d62
("Added Chapter title to BF16") of the riscv-isa-manual.
* patches from https://lore.kernel.org/r/20250213003849.147358-1-inochiama@gmail.com:
riscv: hwprobe: export bfloat16 ISA extension
riscv: add ISA extension parsing for bfloat16 ISA extension
dt-bindings: riscv: add bfloat16 ISA extension description
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250213003849.147358-1-inochiama@gmail.com
Add parsing for Zfbmin, Zvfbfmin, Zvfbfwma ISA extension which
were ratified in 4dc23d62 ("Added Chapter title to BF16") of
the riscv-isa-manual.
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Reviewed-by: Clément Léger <cleger@rivosinc.com>
Link: https://lore.kernel.org/r/20250213003849.147358-3-inochiama@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
RISC-V code uses the queued spinlock implementation, which calls
the macros smp_cond_load_acquire for one byte. So, complement the
implementation of byte and halfword versions.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20241217013910.1039923-1-guoren@kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
This is required to allow the IOMMU driver to correctly flush its own
TLB.
Reviewed-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20250113142424.30487-1-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Checking for pc to be a kernel text address at this location is useless
since pc == handle_exception. Remove this check.
[ alex: Fix merge conflict ]
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240830084934.3690037-1-cleger@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Use RSW0 as the special bit for pmds and puds, just like for ptes.
Also define the {pte,pmd,pud}_pgprot helpers which were previously
missing and are needed for the follow_pfnmap APIs.
Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250108135700.2614848-1-abrestic@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Conor Dooley <conor@kernel.org> says:
Since one depends on the other, albeit trivially, here's a v4 of the Zbb
toolchain dep removal alongside the rewording of Kconfig options I'd
sent out before the merge window. I think I like this implementation
better than v1, but I couldn't think of a good name for a "public"
version of __ALTERNATIVE(), so I used it here directly.
Unfortunately "ALTERNATIVE_2_CFG" already exists and I couldn't think of
a good way to name an alternative macro that allows for several config
options that didn't make the distinction sufficiently clear.. Yell
if you have better suggestions than I did.
I am a wee bit "worried" that this makes the Kconfig option confusing as
it isn't immediately obvious if someone is or is not going to get the
toolchain based optimisations.
Cheers,
Conor.
* patches from https://lore.kernel.org/r/20241024-aspire-rectify-9982da6943e5@spud:
RISC-V: separate Zbb optimisations requiring and not requiring toolchain support
RISC-V: clarify what some RISCV_ISA* config options do
Link: https://lore.kernel.org/r/20241024-aspire-rectify-9982da6943e5@spud
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
It seems a bit ridiculous to require toolchain support for BPF to
assemble Zbb instructions, so move the dependency on toolchain support
for Zbb optimisations out of the Kconfig option and to the callsites.
Zbb support has always depended on alternatives, so while adjusting the
config options guarding optimisations, remove any checks for
whether or not alternatives are enabled.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20241024-chump-freebase-d26b6d81af33@spud
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
During some discussion on IRC yesterday and on Pu's bpf patch [1]
I noticed that these RISCV_ISA* Kconfig options are not really clear
about their implications. Many of these options have no impact on what
userspace is allowed to do, for example an application can use Zbb
regardless of whether or not the kernel does. Change the help text to
try and clarify whether or not an option affects just the kernel, or
also userspace. None of these options actually control whether or not an
extension is detected dynamically as that's done regardless of Kconfig
options, so drop any text that implies the option is required for
dynamic detection, rewording them as "do x when y is detected".
Link: https://lore.kernel.org/linux-riscv/20240328-ferocity-repose-c554f75a676c@spud/ [1]
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20241024-overdue-slogan-0b0f69d3da91@spud
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
The point where the memory is released from memblock to the buddy
allocator is hidden inside arch-specific mem_init()s and the call to
memblock_free_all() is needlessly duplicated in every artiste cure and
after introduction of arch_mm_preinit() hook, mem_init() implementation on
many architecture only contains the call to memblock_free_all().
Pull memblock_free_all() call into mm_core_init() and drop mem_init() on
relevant architectures to make it more explicit where the free memory is
released from memblock to the buddy allocator and to reduce code
duplication in architecture specific code.
Link: https://lkml.kernel.org/r/20250313135003.836600-14-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com> [x86]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Tested-by: Mark Brown <broonie@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Guo Ren (csky) <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently, implementation of mem_init() in every architecture consists of
one or more of the following:
* initializations that must run before page allocator is active, for
instance swiotlb_init()
* a call to memblock_free_all() to release all the memory to the buddy
allocator
* initializations that must run after page allocator is ready and there is
no arch-specific hook other than mem_init() for that, like for example
register_page_bootmem_info() in x86 and sparc64 or simple setting of
mem_init_done = 1 in several architectures
* a bunch of semi-related stuff that apparently had no better place to
live, for example a ton of BUILD_BUG_ON()s in parisc.
Introduce arch_mm_preinit() that will be the first thing called from
mm_core_init(). On architectures that have initializations that must happen
before the page allocator is ready, move those into arch_mm_preinit() along
with the code that does not depend on ordering with page allocator setup.
On several architectures this results in reduction of mem_init() to a
single call to memblock_free_all() that allows its consolidation next.
Link: https://lkml.kernel.org/r/20250313135003.836600-13-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com> [x86]
Tested-by: Mark Brown <broonie@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Guo Ren (csky) <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
high_memory defines upper bound on the directly mapped memory. This bound
is defined by the beginning of ZONE_HIGHMEM when a system has high memory
and by the end of memory otherwise.
All this is known to generic memory management initialization code that
can set high_memory while initializing core mm structures.
Add a generic calculation of high_memory to free_area_init() and remove
per-architecture calculation except for the architectures that set and use
high_memory earlier than that.
Link: https://lkml.kernel.org/r/20250313135003.836600-11-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com> [x86]
Tested-by: Mark Brown <broonie@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Guo Ren (csky) <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
max_mapnr is essentially the size of the memory map for systems that use
FLATMEM. There is no reason to calculate it in each and every architecture
when it's anyway calculated in alloc_node_mem_map().
Drop setting of max_mapnr from architecture code and set it once in
alloc_node_mem_map().
While on it, move definition of mem_map and max_mapnr to mm/mm_init.c so
there won't be two copies for MMU and !MMU variants.
Link: https://lkml.kernel.org/r/20250313135003.836600-10-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com> [x86]
Tested-by: Mark Brown <broonie@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Guo Ren (csky) <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The majority of the last fixes are for devicetree files. This address two
important regressions for the Qualcomm SMMU and the Raspberry Pi 4 USB
controller, as well as a larger number of patches fixing minor mistakes
in board specific files for Rockchips, i.MX, starfive and broadcom.
The non-DT changes are
- A fix for an old boot regression on Renesas shmobile chips
- Another boot time regression for for the Qualcomm PDR SoC driver,
among a few other Qualcomm firmware driver fixes for efivars
and tzmem.
- Minor Kconfig fixes for davinci and OMAP1
- Minor code fixes for sparx5 reset controllers, OMAP memory controller,
i.MX SCU, cpufreq and SoC drivers and a Hisilicon SoC driver.
- One more update to the Asahi maintainers, adding Neal Gompa as a
reviewer
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmfYYesACgkQYKtH/8kJ
UicnLxAA2rhqxXizpfBS1JBi3V1koh2JCbbsFfWPlYbgs6vFKfodvzpbz8nggWwK
lM+nIeF7PgxwDzip9mu07EdJgFwrQN9SFHG1lkXKW+d+e5o8ZKJzVMF7Faxedqju
vIWjpIiF3BlbLYOX+HkEL+fCmqmD7ehAWE4GJbIde3MdjEbzjJoAZqElh5BbPesq
PMnH0eRQtHRKlLhrb5BB512G7QGwFPB2GXJkwfY1egUpFQqwKsY45F9e3XUzm8B0
JlO45qcLJclHTA4ravLrZdXG38nF6DvwaHdgfyymu2QO+Hr6FHoQcjvS1ZkxByP1
EXuPurnC17UhM7SNGCB7iqx4V9jsVyh20gR9G36vMxq3igSppShaniaADlfYBXOn
To5ojVNHFWUJIFbFURPsLDXdo8CHjZDu+rB9p8eOeNl+A65CbOpa87L0K94JxrON
Tm35vVLNKPpMuAcYiT77J2v0hKnHRsnyKj3ofA7mnZ2SoiMGhet+ahmlh2qvNfpY
v32SEF31kpMsUDcZrX9Ex/ncwqba1M1VnnqZ9KZEJqvG6qZUo1M6338U7BGADV1t
O7XVTgiOycJExxMVtwmLEoMriQRqjUiQ5iPDtnfkvWn1OsMzxXBC2XjF0zr2U7ot
KpjydXdtrv8HrknsyvKZnJpn+0opbHq7yutgzjx1YW8i+cJPq/w=
=9pDv
-----END PGP SIGNATURE-----
Merge tag 'soc-fixes-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC fixes from Arnd Bergmann:
"The majority of these last fixes are for devicetree files.
These address two important regressions for the Qualcomm SMMU and the
Raspberry Pi 4 USB controller, as well as a larger number of patches
fixing minor mistakes in board specific files for Rockchips, i.MX,
starfive and broadcom.
The non-DT changes are
- A fix for an old boot regression on Renesas shmobile chips
- Another boot time regression for for the Qualcomm PDR SoC driver,
among a few other Qualcomm firmware driver fixes for efivars and
tzmem
- Minor Kconfig fixes for davinci and OMAP1
- Minor code fixes for sparx5 reset controllers, OMAP memory
controller, i.MX SCU, cpufreq and SoC drivers and a Hisilicon SoC
driver
- One more update to the Asahi maintainers, adding Neal Gompa as a
reviewer"
* tag 'soc-fixes-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (35 commits)
ARM: davinci: da850: fix selecting ARCH_DAVINCI_DA8XX
soc: hisilicon: kunpeng_hccs: Fix incorrect string assembly
memory: omap-gpmc: drop no compatible check
reset: mchp: sparx5: Fix for lan966x
ARM: shmobile: smp: Enforce shmobile_smp_* alignment
MAINTAINERS: Add myself (Neal Gompa) as a reviewer for ARM Apple support
MAINTAINERS: Add apple-spi driver & binding files
arm64: dts: rockchip: slow down emmc freq for rock 5 itx
ARM: dts: BCM5301X: Fix switch port labels of ASUS RT-AC3200
ARM: dts: BCM5301X: Fix switch port labels of ASUS RT-AC5300
ARM: dts: bcm2711: Don't mark timer regs unconfigured
ARM: OMAP1: select CONFIG_GENERIC_IRQ_CHIP
arm64: dts: rockchip: Add missing PCIe supplies to RockPro64 board dtsi
arm64: dts: rockchip: Add avdd HDMI supplies to RockPro64 board dtsi
arm64: dts: rockchip: Remove undocumented sdmmc property from lubancat-1
arm64: dts: rockchip: fix pinmux of UART5 for PX30 Ringneck on Haikou
arm64: dts: rockchip: fix pinmux of UART0 for PX30 Ringneck on Haikou
arm64: dts: rockchip: fix u2phy1_host status for NanoPi R4S
arm64: dts: bcm2712: PL011 UARTs are actually r1p5
ARM: dts: bcm2711: PL011 UARTs are actually r1p5
...
Platforms subscribe into generic ptdump implementation via GENERIC_PTDUMP.
But generic ptdump gets enabled via PTDUMP_CORE. These configs
combination is confusing as they sound very similar and does not
differentiate between platform's feature subscription and feature
enablement for ptdump. Rename the configs as ARCH_HAS_PTDUMP and PTDUMP
making it more clear and improve readability.
Link: https://lkml.kernel.org/r/20250226122404.1927473-6-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> (powerpc)
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Will Deacon <will@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Steven Price <steven.price@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
cmdline argument is not used in reserve_crashkernel_generic() so remove
it. Correspondingly, all the callers have been updated as well.
No functional change intended.
Link: https://lkml.kernel.org/r/20250131113830.925179-3-sourabhjain@linux.ibm.com
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ioremap_prot() currently accepts pgprot_val parameter as an unsigned long,
thus implicitly assuming that pgprot_val and pgprot_t could never be
bigger than unsigned long. But this assumption soon will not be true on
arm64 when using D128 pgtables. In 128 bit page table configuration,
unsigned long is 64 bit, but pgprot_t is 128 bit.
Passing platform abstracted pgprot_t argument is better as compared to
size based data types. Let's change the parameter to directly pass
pgprot_t like another similar helper generic_ioremap_prot().
Without this change in place, D128 configuration does not work on arm64 as
the top 64 bits gets silently stripped when passing the protection value
to this function.
Link: https://lkml.kernel.org/r/20250218101954.415331-1-anshuman.khandual@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Co-developed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This patch lays the groundwork for supporting batch PTE unmapping in
try_to_unmap_one(). It introduces range handling for TLB batch flushing,
with the range currently set to the size of PAGE_SIZE.
The function __flush_tlb_range_nosync() is architecture-specific and is
only used within arch/arm64. This function requires the mm structure
instead of the vma structure. To allow its reuse by
arch_tlbbatch_add_pending(), which operates with mm but not vma, this
patch modifies the argument of __flush_tlb_range_nosync() to take mm as
its parameter.
Link: https://lkml.kernel.org/r/20250214093015.51024-3-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shaoqin Huang <shahuang@redhat.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chis Li <chrisl@kernel.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mauricio Faria de Oliveira <mfo@canonical.com>
Cc: Tangquan Zheng <zhengtangquan@oppo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The imperative paradigm used to build vmlinux, extract some info from it
or perform some checks on it, and subsequently modify it again goes
against the declarative paradigm that is usually employed for defining
make rules.
In particular, the Makefile.postlink files that consume their input via
an output rule result in some dodgy logic in the decompressor makefiles
for RISC-V and x86, given that the vmlinux.relocs input file needed to
generate the arch-specific relocation tables may not exist or be out of
date, but cannot be constructed using the ordinary Make dependency based
rules, because the info needs to be extracted while vmlinux is in its
ephemeral, non-stripped form.
So instead, for architectures that require the static relocations that
are emitted into vmlinux when passing --emit-relocs to the linker, and
are subsequently stripped out again, introduce an intermediate vmlinux
target called vmlinux.unstripped, and organize the reset of the build
logic accordingly:
- vmlinux.unstripped is created only once, and not updated again
- build rules under arch/*/boot can depend on vmlinux.unstripped without
running the risk of the data disappearing or being out of date
- the final vmlinux generated by the build is not bloated with static
relocations that are never needed again after the build completes.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Some architectures build vmlinux with static relocations preserved, but
strip them again from the final vmlinux image. Arch specific tools
consume these static relocations in order to construct relocation tables
for KASLR.
The fact that vmlinux is created, consumed and subsequently updated goes
against the typical, declarative paradigm used by Make, which is based
on rules and dependencies. So as a first step towards cleaning this up,
introduce a Kconfig symbol to declare that the arch wants to consume the
static relocations emitted into vmlinux. This will be wired up further
in subsequent patches.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
test_and_{set,clear}_bit are fully ordered as specified in
Documentation/atomic_bitops.txt. Fix incorrect comment stating otherwise.
Note that the implementation is correct since commit
9347ce54cd ("RISC-V: __test_and_op_bit_ord should be strongly ordered")
was introduced.
Signed-off-by: Ignacio Encinas <ignacio@iencinas.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
- Fix the regmap settings for bcm281xx, this was missing the
stride.
- NULL check for the Nuvoton npcm8xx devm_kasprintf()
- Enable the Spacemit pin controller by default in the
SoC config. The SoC will not boot without it so this one
is prett much required.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAmfQEsEACgkQQRCzN7AZ
XXNrMw//SAJpyYYdVcTlkjag1O0gklX/zwsm1QvdYFsysIivlp5aJxIonTgHKl/6
vj1rd2w4yvsxOWDb7wQJGzqu1ijaL/spC4UOsaeJfN9orMgn3DogYswUT22MxPeN
tYzpdrRRMDkNEcf8x+Y5ogANInbEkIjT2eGHGDschYUQw7l2rUrct4E8uelfWEW1
t6UPa69DbEdz3E77Zrx9z35R0a9lyBgh1gkd0schboEfIuX5iunf6WqRx8oSCSH/
+cS215SeGUO5SarWuw2X7bFvTEldm1wA6aQZloMfoae3Z49mXNIOI565rxVJFSSD
JGjagkZhTzT1yy1xFtptVwBHURsx4joHErhQ9MTaDEOUvwMY3h0pHSZdFy1aUi/T
PLkhfd4eMdXUqN739szG1zgm8esPNgmyqqXlnU6CY7lYqYQvjQescvIJ5brol4Bt
k3vIiouRItSp0C14ci9SG0vFHUbwjznYWk6396GQClc1y/xTX0p7xqLBhciCE0s4
Jdk3cFHe1QjF9T+Jn6UFv2YvTupj+X6bFWe+XOzUBaguDXI6AJgx2wzQiKWWOqRe
DmG1+zTCC59kbh0j7huQesvUquYx5qZbqLNIlBeY10wrGiHI7+a7+8Kzc9BDXtlq
DSkAQyFgctm7Bcm+dRH0qrkyKssTRb86ryIqk2GBPdokR2Vnmk8=
=r2/Q
-----END PGP SIGNATURE-----
Merge tag 'pinctrl-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
- Fix the regmap settings for bcm281xx, this was missing the stride
- NULL check for the Nuvoton npcm8xx devm_kasprintf()
- Enable the Spacemit pin controller by default in the SoC config. The
SoC will not boot without it so this one is pretty much required
* tag 'pinctrl-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: spacemit: enable config option
pinctrl: nuvoton: npcm8xx: Add NULL check in npcm8xx_gpio_fw
pinctrl: bcm281xx: Fix incorrect regmap max_registers value
Wire up crc64_be_arch() and crc64_nvme_arch() for 64-bit RISC-V using
crc-clmul-template.h. This greatly improves the performance of these
CRCs on Zbc-capable CPUs in 64-bit kernels.
These optimized CRC64 functions are not yet supported in 32-bit kernels,
since crc-clmul-template.h assumes that the CRC fits in an unsigned
long. That implementation limitation could be addressed, but it would
add a fair bit of complexity, so it has been omitted for now.
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250216225530.306980-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Wire up crc_t10dif_arch() for RISC-V using crc-clmul-template.h. This
greatly improves CRC-T10DIF performance on Zbc-capable CPUs.
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250216225530.306980-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Delete the previous Zbc optimized CRC32 code, and re-implement it using
the new template. The new implementation is more optimized and shares
more code among CRC variants.
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250216225530.306980-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Add a "template" crc-clmul-template.h that can generate RISC-V Zbc
optimized CRC functions. Each generated CRC function is parameterized
by CRC length and bit order, and it accepts a pointer to the constants
struct required for the specific CRC polynomial desired. Update
gen-crc-consts.py to support generating the needed constants structs.
This makes it possible to easily wire up a Zbc optimized implementation
of almost any CRC.
The design generally follows what I did for x86, but it is simplified by
using RISC-V's scalar carryless multiplication Zbc, which has no
equivalent on x86. RISC-V's clmulr instruction is also helpful. A
potential switch to Zvbc (or support for Zvbc alongside Zbc) is left for
future work. For long messages Zvbc should be fastest, but it would
need to be shown to be worthwhile over just using Zbc which is
significantly more convenient to use, especially in the kernel context.
Compared to the existing Zbc-optimized CRC32 code and the earlier
proposed Zbc-optimized CRC-T10DIF code
(https://lore.kernel.org/r/20250211071101.181652-1-zhihang.shao.iscas@gmail.com),
this submission deduplicates the code among CRC variants and is
significantly more optimized. It uses "folding" to take better
advantage of instruction-level parallelism (to a more limited extent
than x86 for now, but it could be extended to more), it reworks the
Barrett reduction to eliminate unnecessary instructions, and it
documents all the math used and makes all the constants reproducible.
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250216225530.306980-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
A single fix for an incorrect define in the jh7110 pinctrl header.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRh246EGq/8RLhDjO14tDGHoIJi0gUCZ8hOeAAKCRB4tDGHoIJi
0ti5AQD7CJrTmBBqP5nxQHvgvz1KSSqfzpOMrtA3YpOkDaqYigD/Ww6z12oSHFuK
lLFSnpVJLqFd01N0wP1e+TUrbZ++Hg4=
=WHwP
-----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmfJwJIACgkQYKtH/8kJ
Uidt6RAApTAvDdwAR4FAxHqSCK+mJkC7njGkGr5t1xNBFpWdKwsONdet84ErEEzl
ift1XFMWwZ98jIR4cHAnsPinHyFZUDH48rOJw8AmygSaikmaH27604lqLF+i55nV
JjqBrQDCA3T6haH/1uM6+Te8c7FH/klG6sAymZs+x+l8UER2l4U8eNThaU0Xk1/n
ZMXQK/R1buv+6oGeY5mXNkdIerXM9fz/BjwcfECwZHyTk5gMTrwzM+6emlo78Y/V
OvnX/FJOGZ3MNL/rj9LDFu8QDqYf7p7D7fkYpGqRgScFs6LMTjKb37uCVfU7i4i8
4zMwUZjwOkaJOxBgnDHmySCLmlBhcj0FN34kRsVgdqG5YAYzcTzTBdBPeOjNCDpa
7i8kg3mJQD5hVbZFHWIvTqpBXOxFJTKKbU+W8dmlTsB7DyK4uV3WPym+MnqIZWw/
+xcBi7dJZEyMaEThuWjCfTrJu0rWJg3WeZjFLgxoiz+yB0NzDivXMN7lrmoYe/1U
dIoP4yDCOzVmJIeW9CIXVd/sXTOHPVaWRSUV0dqjHYefUl/wz9Mo55DbTT+JK9Ng
0d1iSQBmz/M5ThjYGDBYW1Ltq7zwSDxBIHwVBkKpopRPanbXSYHsByTMB3LNGDn1
Uou9MCr65MqZNajbk+vE1kEvAke47+/SHau9arFVtkjcA0srdcE=
=qqLw
-----END PGP SIGNATURE-----
Merge tag 'riscv-dt-fixes-for-v6.14-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into arm/fixes
RISC-V Devicetree fix for v6.14-rc6
A single fix for an incorrect define in the jh7110 pinctrl header.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
* tag 'riscv-dt-fixes-for-v6.14-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
riscv: dts: starfive: Fix a typo in StarFive JH7110 pin function definitions
Link: https://lore.kernel.org/r/20250305-sip-unable-d56ef7dbf86b@spud
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The perf event should be marked disabled during the creation as
it is not ready to be scheduled until there is SBI PMU start call
or config matching is called with auto start. Otherwise, event add/start
gets called during perf_event_create_kernel_counter function.
It will be enabled and scheduled to run via perf_event_enable during
either the above mentioned scenario.
Fixes: 0cb74b65d2 ("RISC-V: KVM: Implement perf support without sampling")
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250303-kvm_pmu_improve-v2-1-41d177e45929@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
The ARCH_MAY_HAVE patch missed arm64, mips and s390. But it may
also lead to arch options being enabled but ineffective because
of modular/built-in conflicts.
As the primary user of all these options wireguard is selecting
the arch options anyway, make the same selections at the lib/crypto
option level and hide the arch options from the user.
Instead of selecting them centrally from lib/crypto, simply set
the default of each arch option as suggested by Eric Biggers.
Change the Crypto API generic algorithms to select the top-level
lib/crypto options instead of the generic one as otherwise there
is no way to enable the arch options (Eric Biggers). Introduce a
set of INTERNAL options to work around dependency cycles on the
CONFIG_CRYPTO symbol.
Fixes: 1047e21aec ("crypto: lib/Kconfig - Fix lib built-in failure when arch is modular")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Arnd Bergmann <arnd@kernel.org>
Closes: https://lore.kernel.org/oe-kbuild-all/202502232152.JC84YDLp-lkp@intel.com/
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
- Fix a sporadic boot failure due to incorrect randomization of the
linear map on systems that support it
- Fix the zapping (both clearing the entries *and* invalidating the TLB)
of hugetlb PTEs constructed using the contiguous bit
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmfDdBIQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNN0GB/9gmEOX1GwMU6wFjPYqvjWlkGCFDwrldO84
uF9jEUbPaw3P4xHTOFyPCfEWidktqa+yDVbe90mB7GVOM+1eEZ81em1k1hYBEXbz
Q73Nl5VrNzxX4BjOrdxxoTSaR/TKklUh5mqWfIzy1RxEnBfpr/GuDPtUn1GViCAs
sU16Ju12UdYXn3tyHFDHpjZS9WYZskfnrvS0QvXinz0LahZrCkeaH+ptYHrTjMFx
hxyrRQwOlqLnZWvjLOegH9AC6uyRkKDinXKhXqHYvUfcfEkQsKwM7Fpc6cviUD0Q
X2npLNegnYxPniwmLpXfNXazPDnKVMzxb9lpqw1fZS3nAuh8XOde
=RqDZ
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Ryan's been hard at work finding and fixing mm bugs in the arm64 code,
so here's a small crop of fixes for -rc5.
The main changes are to fix our zapping of non-present PTEs for
hugetlb entries created using the contiguous bit in the page-table
rather than a block entry at the level above. Prior to these fixes, we
were pulling the contiguous bit back out of the PTE in order to
determine the size of the hugetlb page but this is clearly bogus if
the thing isn't present and consequently both the clearing of the
PTE(s) and the TLB invalidation were unreliable.
Although the problem was found by code inspection, we really don't
want this sitting around waiting to trigger and the changes are CC'd
to stable accordingly.
Note that the diffstat looks a lot worse than it really is;
huge_ptep_get_and_clear() now takes a size argument from the core code
and so all the arch implementations of that have been updated in a
pretty mechanical fashion.
- Fix a sporadic boot failure due to incorrect randomization of the
linear map on systems that support it
- Fix the zapping (both clearing the entries *and* invalidating the
TLB) of hugetlb PTEs constructed using the contiguous bit"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: hugetlb: Fix flush_hugetlb_tlb_range() invalidation level
arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes
mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear()
arm64/mm: Fix Boot panic on Ampere Altra
* Fix TCR_EL2 configuration to not use the ASID in TTBR1_EL2
and not mess-up T1SZ/PS by using the HCR_EL2.E2H==0 layout.
* Bring back the VMID allocation to the vcpu_load phase, ensuring
that we only setup VTTBR_EL2 once on VHE. This cures an ugly
race that would lead to running with an unallocated VMID.
RISC-V:
* Fix hart status check in SBI HSM extension
* Fix hart suspend_type usage in SBI HSM extension
* Fix error returned by SBI IPI and TIME extensions for
unsupported function IDs
* Fix suspend_type usage in SBI SUSP extension
* Remove unnecessary vcpu kick after injecting interrupt
via IMSIC guest file
x86:
* Fix an nVMX bug where KVM fails to detect that, after nested
VM-Exit, L1 has a pending IRQ (or NMI).
* To avoid freeing the PIC while vCPUs are still around, which
would cause a NULL pointer access with the previous patch,
destroy vCPUs before any VM-level destruction.
* Handle failures to create vhost_tasks
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmfCvVsUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPqGwf9FOWQRd/yCKHiufjPDefD1Og0DmgB
Dgk0nmHxaxbyPw+5vYlhn/J3vZ54sNngBpmUekE5OuBMZ9EsxXAK/myByHkzNnV9
cyLm4vYwpb9OQmbQ5MMdDlptYsjV40EmSfwwIJpBxjdkwAI3f7NgeHvG8EwkJgch
C+X4JMrLu2+BGo7BUhuE/xrB8h0CBRnhalB5aK1wuF+ey8v06zcU0zdQCRLUpOsx
mW9S0OpSpSlecvcblr0AhuajjHjwFaTFOQofaXaQFBW6kv3dXmSq/JRABEfx0TBb
MTUDQtnnaYvPy/RWwZIzBpgfASLQNQNxSJ7DIw9C8IG7k6rK25BSRwTmSw==
=afMB
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix TCR_EL2 configuration to not use the ASID in TTBR1_EL2 and not
mess-up T1SZ/PS by using the HCR_EL2.E2H==0 layout.
- Bring back the VMID allocation to the vcpu_load phase, ensuring
that we only setup VTTBR_EL2 once on VHE. This cures an ugly race
that would lead to running with an unallocated VMID.
RISC-V:
- Fix hart status check in SBI HSM extension
- Fix hart suspend_type usage in SBI HSM extension
- Fix error returned by SBI IPI and TIME extensions for unsupported
function IDs
- Fix suspend_type usage in SBI SUSP extension
- Remove unnecessary vcpu kick after injecting interrupt via IMSIC
guest file
x86:
- Fix an nVMX bug where KVM fails to detect that, after nested
VM-Exit, L1 has a pending IRQ (or NMI).
- To avoid freeing the PIC while vCPUs are still around, which would
cause a NULL pointer access with the previous patch, destroy vCPUs
before any VM-level destruction.
- Handle failures to create vhost_tasks"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: retry nx_huge_page_recovery_thread creation
vhost: return task creation error instead of NULL
KVM: nVMX: Process events on nested VM-Exit if injectable IRQ or NMI is pending
KVM: x86: Free vCPUs before freeing VM state
riscv: KVM: Remove unnecessary vcpu kick
KVM: arm64: Ensure a VMID is allocated before programming VTTBR_EL2
KVM: arm64: Fix tcr_el2 initialisation in hVHE mode
riscv: KVM: Fix SBI sleep_type use
riscv: KVM: Fix SBI TIME error generation
riscv: KVM: Fix SBI IPI error generation
riscv: KVM: Fix hart suspend_type use
riscv: KVM: Fix hart suspend status check
In order to fix a bug, arm64 needs to be told the size of the huge page
for which the huge_pte is being cleared in huge_ptep_get_and_clear().
Provide for this by adding an `unsigned long sz` parameter to the
function. This follows the same pattern as huge_pte_clear() and
set_huge_pte_at().
This commit makes the required interface modifications to the core mm as
well as all arches that implement this function (arm64, loongarch, mips,
parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed
in a separate commit.
Cc: stable@vger.kernel.org
Fixes: 66b3923a1a ("arm64: hugetlb: add support for PTE contiguous bit")
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390
Link: https://lore.kernel.org/r/20250226120656.2400136-2-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Remove kvm_arch_sync_events() now that x86 no longer uses it (no other
arch has ever used it).
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Message-ID: <20250224235542.2562848-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pinctrl is an essential driver for SpacemiT's SoC,
The uart driver requires it, same as sd card driver,
so let's enable it by default for this SoC.
The CONFIG_PINCTRL_SPACEMIT_K1 isn't enabled when using
'make defconfig' to select kernel configuration options.
This result in a broken uart driver where fail at probe()
stage due to no pins found.
Fixes: a83c29e1d1 ("pinctrl: spacemit: add support for SpacemiT K1 SoC")
Reported-by: Alex Elder <elder@kernel.org>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Alex Elder <elder@riscstar.com>
Signed-off-by: Yixun Lan <dlan@gentoo.org>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Tested-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://lore.kernel.org/20250218-k1-pinctrl-option-v3-1-36e031e0da1b@gentoo.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
x86 version of arch_memremap_wb() needs the flags to decide if the mapping
has to be encrypted or decrypted.
Pass down the flag to arch_memremap_wb(). All current implementations
ignore the argument.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20250217163822.343400-2-kirill.shutemov@linux.intel.com
Remove the unnecessary kick to the vCPU after writing to the vs_file
of IMSIC in kvm_riscv_vcpu_aia_imsic_inject.
For vCPUs that are running, writing to the vs_file directly forwards
the interrupt as an MSI to them and does not need an extra kick.
For vCPUs that are descheduled after emulating WFI, KVM will enable
the guest external interrupt for that vCPU in
kvm_riscv_aia_wakeon_hgei. This means that writing to the vs_file
will cause a guest external interrupt, which will cause KVM to wake
up the vCPU in hgei_interrupt to handle the interrupt properly.
Signed-off-by: BillXiang <xiangwencheng@lanxincomputing.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Radim Krčmář <rkrcmar@ventanamicro.com>
Link: https://lore.kernel.org/r/20250221104538.2147-1-xiangwencheng@lanxincomputing.com
Signed-off-by: Anup Patel <anup@brainfault.org>
The generic storage implementation provides the same features as the
custom one. However it can be shared between architectures, making
maintenance easier.
Co-developed-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-9-13a4669dfc8c@linutronix.de
As the Makefile is included into other Makefiles it can not be used to
define objects to be built from the current source directory.
However the generic datastore will introduce such a local source file.
Rename the included Makefile so it is clear how it is to be used and to
make room for a regular Makefile in lib/vdso/.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-4-13a4669dfc8c@linutronix.de
Enable CONFIG_GENERIC_PENDING_IRQ for RISC-V so that RISC-V interrupt chips
can support delayed interrupt mirgration in interrupt context.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250217085657.789309-7-apatel@ventanamicro.com
One of four USB-A ports on the Pine64 Star64 is USB 3.0 which requires to
disable PCIE0 and change the mode of PCIE0 PHY to USB3.0 operation. The
remaining three USB-A ports are USB 2.0 with the USB0 PHY and do not
conflict with any of PCIE0 or PCIE1. PCIE1 (1-lane) routes to a PCIe X4
connector.
Signed-off-by: E Shattow <e@freeshell.de>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
StarFive JH7110 contains a Cadence USB2.0+USB3.0 controller IP block that
may exclusively use pciephy0 for USB3.0 connectivity. Add the register
offsets for the driver to enable/disable USB3.0 on pciephy0.
Signed-off-by: E Shattow <e@freeshell.de>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Starfive Soc common defines GPIO28 as pcie1 reset, GPIO21 as pcie1 wakeup;
But the FML13V01 board uses GPIO21 as pcie1 reset, GPIO28 as pcie1 wakeup;
redefine pcie1 gpio and enable pcie1 for pcie based Wi-Fi.
Signed-off-by: Sandie Cao <sandie.cao@deepcomputing.io>
Tested-by: Maud Spierings <maud_spierings@hotmail.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
The jh7110 boards do not have a Rohm DAC on them as far as I
can tell, and they certainly do not have a dh2228fv, as this device does
not actually exist! Remove the dac nodes from the devicetrees as it is
not acceptable to pretend to have a device on a board in order to bind
the spidev driver in Linux.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
hrtimer_setup() takes the callback function pointer as argument and
initializes the timer completely.
Replace hrtimer_init() and the open coded initialization of
hrtimer::function with the new setup mechanism.
Patch was created by using Coccinelle.
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/d5ededf778f59f2fc38ff4276fb7f4c893e4142c.1738746821.git.namcao@linutronix.de
Add initial support for the Milk-V Jupiter board [1], which is a Mini ITX
computer based on the SpacemiT K1/M1 Octa-Core X60 64-bit RISC-V SoC [2].
There are two variant for this board, one using the K1 chip and another
using the M1 chip. The main difference is that the M1 can run at a higher
frequency than the K1, thanks to its packaging.
For now, only a DTS for the K1 variant is added since there isn't support
yet for the X60 cores operating performance and thermal trip points.
The support is minimal, but at least allows to boot into a serial console.
Link: https://milkv.io/jupiter [1]
Link: https://www.spacemit.com/en/key-stone-k1 [2]
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Yixun Lan <dlan@gentoo.org>
Link: https://lore.kernel.org/r/20250214151700.666544-3-javierm@redhat.com
Signed-off-by: Yixun Lan <dlan@gentoo.org>
The spec says sleep_type is 32 bits wide and "In case the data is
defined as 32bit wide, higher privilege software must ensure that it
only uses 32 bit data." Mask off upper bits of sleep_type before
using it.
Fixes: 023c15151f ("RISC-V: KVM: Add SBI system suspend support")
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20250217084506.18763-12-ajones@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
When an invalid function ID of an SBI extension is used we should
return not-supported, not invalid-param.
Fixes: 5f862df558 ("RISC-V: KVM: Add v0.1 replacement SBI extensions defined in v0.2")
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20250217084506.18763-11-ajones@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
When an invalid function ID of an SBI extension is used we should
return not-supported, not invalid-param. Also, when we see that at
least one hartid constructed from the base and mask parameters is
invalid, then we should return invalid-param. Finally, rather than
relying on overflowing a left shift to result in zero and then using
that zero in a condition which [correctly] skips sending an IPI (but
loops unnecessarily), explicitly check for overflow and exit the loop
immediately.
Fixes: 5f862df558 ("RISC-V: KVM: Add v0.1 replacement SBI extensions defined in v0.2")
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20250217084506.18763-10-ajones@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
The spec says suspend_type is 32 bits wide and "In case the data is
defined as 32bit wide, higher privilege software must ensure that it
only uses 32 bit data." Mask off upper bits of suspend_type before
using it.
Fixes: 763c8bed8c ("RISC-V: KVM: Implement SBI HSM suspend call")
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20250217084506.18763-9-ajones@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
"Not stopped" means started or suspended so we need to check for
a single state in order to have a chance to check for each state.
Also, we need to use target_vcpu when checking for the suspend
state.
Fixes: 763c8bed8c ("RISC-V: KVM: Implement SBI HSM suspend call")
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20250217084506.18763-8-ajones@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
The init_rt_signal_env() funciton is called before the alternative patch
is applied, so using the alternative-related API to check the availability
of an extension within this function doesn't have the intended effect.
This patch reorders the init_rt_signal_env() and apply_boot_alternatives()
to get the correct signal_minsigstksz.
Fixes: e92f469b07 ("riscv: signal: Report signal frame size to userspace via auxv")
Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Reviewed-by: Zong Li <zong.li@sifive.com>
Reviewed-by: Andy Chiu <andybnac@gmail.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241220083926.19453-3-yongxuan.wang@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The signal context of certain RISC-V extensions will be appended after
struct __riscv_extra_ext_header, which already includes an empty context
header. Therefore, there is no need to preserve a separate hdr for the
END of signal context.
Fixes: 8ee0b41898 ("riscv: signal: Add sigcontext save/restore for vector")
Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Reviewed-by: Zong Li <zong.li@sifive.com>
Reviewed-by: Andy Chiu <AndybnAC@gmail.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241220083926.19453-2-yongxuan.wang@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Make sure the compare value in the lr/sc loop is sign extended to match
what lr.w does. Fortunately, due to the compiler keeping the register
contents sign extended anyway the lack of the explicit extension didn't
result in wrong code so far, but this cannot be relied upon.
Fixes: b90edb3301 ("RISC-V: Add futex support.")
Signed-off-by: Andreas Schwab <schwab@suse.de>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/mvmfrkv2vhz.fsf@suse.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Sign extend also an unsigned compare value to match what lr.w is doing.
Otherwise try_cmpxchg may spuriously return true when used on a u32 value
that has the sign bit set, as it happens often in inode_set_ctime_current.
Do this in three conversion steps. The first conversion to long is needed
to avoid a -Wpointer-to-int-cast warning when arch_cmpxchg is used with a
pointer type. Then convert to int and back to long to always sign extend
the 32-bit value to 64-bit.
Fixes: 6c58f25e69 ("riscv/atomic: Fix sign extension for RV64I")
Signed-off-by: Andreas Schwab <schwab@suse.de>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Tested-by: Xi Ruoyao <xry111@xry111.site>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/mvmed0k4prh.fsf@suse.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Comparison of bitmaps should be done using bitmap_equal(), not memcmp(),
use the former one to compare isa bitmaps.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Fixes: 625034abd5 ("riscv: add ISA extensions validation callback")
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250210155615.1545738-1-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The use of of_property_read_bool() for non-boolean properties is
deprecated in favor of of_property_present() when testing for property
presence.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Clément Léger <cleger@rivosinc.com>
Cc: stable@vger.kernel.org
Fixes: 76d2a0493a ("RISC-V: Init and Halt Code")
Link: https://lore.kernel.org/r/20241104190314.270095-1-robh@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Currently, there are 3 regulators defined in JH7110's common device tree,
but regulator names are mixed with "-" and "_". So unify them to "_",
which is more often to be seen in other dts files.
Signed-off-by: Shengyu Qu <wiagn233@outlook.com>
Acked-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Following the standardization on crc32c() as the lib entry point for the
Castagnoli CRC32 instead of the previous mix of crc32c(), crc32c_le(),
and __crc32c_le(), make the same change to the underlying base and arch
functions that implement it.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250208024911.14936-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Drop the use of __pure and __attribute_const__ from the CRC32 library
functions that had them. Both of these are unusual optimizations that
don't help properly written code. They seem more likely to cause
problems than have any real benefit.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250208024911.14936-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
The existing PolarFire SoC devicetrees all use root port instance 1,
update the reg properties in PCIe nodes to use the new format that
specifies the instance in use. Failing to do so would still work but
produces warnings:
mpfs-icicle-kit.dtb: pcie@3000000000: reg: [[48, 0, 0, 134217728], [0, 1124073472, 0, 65536]] is too short
mpfs-icicle-kit.dtb: pcie@3000000000: reg-names: ['cfg', 'apb'] is too short
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
---
CC: Conor Dooley <conor@kernel.org>
CC: Daire McNamara <daire.mcnamara@microchip.com>
CC: valentina.fernandezalanis@microchip.com
CC: Rob Herring <robh@kernel.org>
CC: Krzysztof Kozlowski <krzk+dt@kernel.org>
CC: linux-riscv@lists.infradead.org
CC: devicetree@vger.kernel.org
CC: linux-kernel@vger.kernel.org
* The PH1520 pinctrl and dwmac drivers are enabeled in defconfig.
* A redundant AQRL barrier has been removed from the futex cmpxchg
implementation.
* Support for the T-Head vector extensions, which includes exposing
these extensions to userspace on systems that implement them.
* Some more page table information is now printed on die() and systems
that cause PA overflows.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmedHIoTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYievXD/4hdt8h+fMM0I9mmJS096YevRJONdfe
Wk7D5q4PBwSHISHahuzfphieBhqPVnYkkEd7Vw6xRrLbUnhA41Fe0uvR52dx5UZd
3LwrDV/kjGTD59x6A2Zo9bSs/qPKJ2WHmHwHM21jY5tvcIB2Lo4dF8HT63OrwVNW
DxsujLO0jUw+HEwXPsfmUAZJWOPZuUnatl/9CaLMLwQv5N7yiMuz5oYDzJXTLnNh
m3Hv3CCtj1EeQPqDoWzz9nZvmAKOwcblSzz6OAy+xrRk1N0N3QFQPbIaRvkI9OVz
+wPHQiyx4KZNeAe0csV0uLQRIiXZV8rkCz5UT65s3Bfy3vukvzz+1VBdNnCqiP8Q
RpCTcYw62Cr6BWnvyTh+s9bhHb1ijG043nXd/Ty7ZRPCNLKHY6oL1CZ0pgqbTwPs
D2U2ZTZFTc35mPrU6QMfbTiUVWCU2XagFhI27Dgj3xh9mkBOQCHwk2Mrzn7uS4iz
xGNnrjRnKtuwBrvD68JzxCkEi8INFn2ifbVr44VZrOdTM7XtODGAYrBohQtV62kU
2L+q8DoHYis+0xFbR1wdrY1mRZoe45boUFgwnOpmoBr9ULe584sL+526y7IkkEHu
/9hmLPtLg7nyoR/rO1j1Sfg4Eqdwg5HY1TKNfagJZAdu23EDRwrcW1PD0P6vtDv8
j4og8MmL7dTt3A==
=HbAQ
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.14-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- The PH1520 pinctrl and dwmac drivers are enabeled in defconfig
- A redundant AQRL barrier has been removed from the futex cmpxchg
implementation
- Support for the T-Head vector extensions, which includes exposing
these extensions to userspace on systems that implement them
- Some more page table information is now printed on die() and systems
that cause PA overflows
* tag 'riscv-for-linus-6.14-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: add a warning when physical memory address overflows
riscv/mm/fault: add show_pte() before die()
riscv: Add ghostwrite vulnerability
selftests: riscv: Support xtheadvector in vector tests
selftests: riscv: Fix vector tests
riscv: hwprobe: Document thead vendor extensions and xtheadvector extension
riscv: hwprobe: Add thead vendor extension probing
riscv: vector: Support xtheadvector save/restore
riscv: Add xtheadvector instruction definitions
riscv: csr: Add CSR encodings for CSR_VXRM/CSR_VXSAT
RISC-V: define the elements of the VCSR vector CSR
riscv: vector: Use vlenb from DT for thead
riscv: Add thead and xtheadvector as a vendor extension
riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree
dt-bindings: cpus: add a thead vlen register length property
dt-bindings: riscv: Add xtheadvector ISA extension description
RISC-V: Mark riscv_v_init() as __init
riscv: defconfig: drop RT_GROUP_SCHED=y
riscv/futex: Optimize atomic cmpxchg
riscv: defconfig: enable pinctrl and dwmac support for TH1520
- Support multiple hook locations for maint scripts of Debian package
- Remove 'cpio' from the build tool requirement
- Introduce gendwarfksyms tool, which computes CRCs for export symbols
based on the DWARF information
- Support CONFIG_MODVERSIONS for Rust
- Resolve all conflicts in the genksyms parser
- Fix several syntax errors in genksyms
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmedJ7AVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGBs0P/1svrWxVRdD7XFtM+UykQqf2R/be
SnZDeeviqdbGr1J0X5FM/pjaeOb1BPnXsr6O67QaWhyVnpWUviBwwYG2NrCDsx9k
AaMVsROzpDIeoMhMNeYVs+/SYhA8Jndnd4BOH5CBbo52k4/BLqhU9LXLncLgjZbF
nys30TilwOaylKq2FHFVI1GhTCQtKKbq+tIxE1SKkHZ1LsSaFphe/eCHMpcOvQb+
zIFHMm2eIcSlS8TCONFeTnw7NdeCVldYbPCyNAV+nC7Ow7VM8Ws1fq5vsKeH3TGE
+3qtgQS41KpccNRdp4cTvy6p9iBEvpAvk1BAAOZ347EtLXrNNhngg65CbjyCUt7H
yBpgWZ6GAGq2yExX5bHbbFHb/n4I3HhkZKeaFDZ3VnMPnni4zdbBqD+sBSI3yHLC
LUh1NI8gIHjD4bIazbjxWMAQlamQNVNMaHkHqGjro2yIbLL1i2mqMdXuzYhAVwrx
al7hv357fVPwQ1Gfin7R23T4/NqBTB+1IJnYTBYnnAFnUIAdhsuu9YkX8I97i6yu
mdTrGROpEYL0GZTE+1LGz6V9DZfQHP8GZ5fDAU1X/f8Js9xkn1lHFhFCg/xM5f9j
RnwHdiRbvq6uL40/zkinlcpPnGWJkAXWgZqlc0ZbwW8v/vKI51Uo0qUxlVSeu2uH
mQG30SibArJFPpUN
=D553
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Support multiple hook locations for maint scripts of Debian package
- Remove 'cpio' from the build tool requirement
- Introduce gendwarfksyms tool, which computes CRCs for export symbols
based on the DWARF information
- Support CONFIG_MODVERSIONS for Rust
- Resolve all conflicts in the genksyms parser
- Fix several syntax errors in genksyms
* tag 'kbuild-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (64 commits)
kbuild: fix Clang LTO with CONFIG_OBJTOOL=n
kbuild: Strip runtime const RELA sections correctly
kconfig: fix memory leak in sym_warn_unmet_dep()
kconfig: fix file name in warnings when loading KCONFIG_DEFCONFIG_LIST
genksyms: fix syntax error for attribute before init-declarator
genksyms: fix syntax error for builtin (u)int*x*_t types
genksyms: fix syntax error for attribute after 'union'
genksyms: fix syntax error for attribute after 'struct'
genksyms: fix syntax error for attribute after abstact_declarator
genksyms: fix syntax error for attribute before nested_declarator
genksyms: fix syntax error for attribute before abstract_declarator
genksyms: decouple ATTRIBUTE_PHRASE from type-qualifier
genksyms: record attributes consistently for init-declarator
genksyms: restrict direct-declarator to take one parameter-type-list
genksyms: restrict direct-abstract-declarator to take one parameter-type-list
genksyms: remove Makefile hack
genksyms: fix last 3 shift/reduce conflicts
genksyms: fix 6 shift/reduce conflicts and 5 reduce/reduce conflicts
genksyms: reduce type_qualifier directly to decl_specifier
genksyms: rename cvar_qualifier to type_qualifier
...
Due to the fact that runtime const ELF sections are named without a
leading period or double underscore, the RSTRIP logic that removes the
static RELA sections from vmlinux fails to identify them. This results
in a situation like below, where some sections that were supposed to get
removed are left behind.
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[58] runtime_shift_d_hash_shift PROGBITS ffffffff83500f50 2900f50 000014 00 A 0 0 1
[59] .relaruntime_shift_d_hash_shift RELA 0000000000000000 55b6f00 000078 18 I 70 58 8
[60] runtime_ptr_dentry_hashtable PROGBITS ffffffff83500f68 2900f68 000014 00 A 0 0 1
[61] .relaruntime_ptr_dentry_hashtable RELA 0000000000000000 55b6f78 000078 18 I 70 60 8
[62] runtime_ptr_USER_PTR_MAX PROGBITS ffffffff83500f80 2900f80 000238 00 A 0 0 1
[63] .relaruntime_ptr_USER_PTR_MAX RELA 0000000000000000 55b6ff0 000d50 18 I 70 62 8
So tweak the match expression to strip all sections starting with .rel.
While at it, consolidate the logic used by RISC-V, s390 and x86 into a
single shared Makefile library command.
Link: https://lore.kernel.org/all/CAHk-=wjk3ynjomNvFN8jf9A1k=qSc=JFF591W00uXj-qqNUxPQ@mail.gmail.com/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
The part of physical memory that exceeds the size of the linear mapping
will be discarded. When the system starts up normally, a warning message
will be printed to prevent confusion caused by the mismatch between the
system memory and the actual physical memory.
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240814062625.19794-1-cuiyunhui@bytedance.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.
Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25c ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.
Created this by running an spatch followed by a sed command:
Spatch:
virtual patch
@
depends on !(file in "net")
disable optional_qualifier
@
identifier table_name != {
watchdog_hardlockup_sysctl,
iwcm_ctl_table,
ucma_ctl_table,
memory_allocation_profiling_sysctls,
loadpin_sysctl_table
};
@@
+ const
struct ctl_table table_name [] = { ... };
sed:
sed --in-place \
-e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \
kernel/utsname_sysctl.c
Reviewed-by: Song Liu <song@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI
Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
indivudual patches which are described in their changelogs.
- "Allocate and free frozen pages" from Matthew Wilcox reorganizes the
page allocator so we end up with the ability to allocate and free
zero-refcount pages. So that callers (ie, slab) can avoid a refcount
inc & dec.
- "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to use
large folios other than PMD-sized ones.
- "Fix mm/rodata_test" from Petr Tesarik performs some maintenance and
fixes for this small built-in kernel selftest.
- "mas_anode_descend() related cleanup" from Wei Yang tidies up part of
the mapletree code.
- "mm: fix format issues and param types" from Keren Sun implements a
few minor code cleanups.
- "simplify split calculation" from Wei Yang provides a few fixes and a
test for the mapletree code.
- "mm/vma: make more mmap logic userland testable" from Lorenzo Stoakes
continues the work of moving vma-related code into the (relatively) new
mm/vma.c.
- "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David
Hildenbrand cleans up and rationalizes handling of gfp flags in the page
allocator.
- "readahead: Reintroduce fix for improper RA window sizing" from Jan
Kara is a second attempt at fixing a readahead window sizing issue. It
should reduce the amount of unnecessary reading.
- "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
addresses an issue where "huge" amounts of pte pagetables are
accumulated
(https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/).
Qi's series addresses this windup by synchronously freeing PTE memory
within the context of madvise(MADV_DONTNEED).
- "selftest/mm: Remove warnings found by adding compiler flags" from
Muhammad Usama Anjum fixes some build warnings in the selftests code
when optional compiler warnings are enabled.
- "mm: don't use __GFP_HARDWALL when migrating remote pages" from David
Hildenbrand tightens the allocator's observance of __GFP_HARDWALL.
- "pkeys kselftests improvements" from Kevin Brodsky implements various
fixes and cleanups in the MM selftests code, mainly pertaining to the
pkeys tests.
- "mm/damon: add sample modules" from SeongJae Park enhances DAMON to
estimate application working set size.
- "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn
provides some cleanups to memcg's hugetlb charging logic.
- "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song
removes the global swap cgroup lock. A speedup of 10% for a tmpfs-based
kernel build was demonstrated.
- "zram: split page type read/write handling" from Sergey Senozhatsky
has several fixes and cleaups for zram in the area of zram_write_page().
A watchdog softlockup warning was eliminated.
- "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin Brodsky
cleans up the pagetable destructor implementations. A rare
use-after-free race is fixed.
- "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes
simplifies and cleans up the debugging code in the VMA merging logic.
- "Account page tables at all levels" from Kevin Brodsky cleans up and
regularizes the pagetable ctor/dtor handling. This results in
improvements in accounting accuracy.
- "mm/damon: replace most damon_callback usages in sysfs with new core
functions" from SeongJae Park cleans up and generalizes DAMON's sysfs
file interface logic.
- "mm/damon: enable page level properties based monitoring" from
SeongJae Park increases the amount of information which is presented in
response to DAMOS actions.
- "mm/damon: remove DAMON debugfs interface" from SeongJae Park removes
DAMON's long-deprecated debugfs interfaces. Thus the migration to sysfs
is completed.
- "mm/hugetlb: Refactor hugetlb allocation resv accounting" from Peter
Xu cleans up and generalizes the hugetlb reservation accounting.
- "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino
removes a never-used feature of the alloc_pages_bulk() interface.
- "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park
extends DAMOS filters to support not only exclusion (rejecting), but
also inclusion (allowing) behavior.
- "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi
"introduces a new memory descriptor for zswap.zpool that currently
overlaps with struct page for now. This is part of the effort to reduce
the size of struct page and to enable dynamic allocation of memory
descriptors."
- "mm, swap: rework of swap allocator locks" from Kairui Song redoes and
simplifies the swap allocator locking. A speedup of 400% was
demonstrated for one workload. As was a 35% reduction for kernel build
time with swap-on-zram.
- "mm: update mips to use do_mmap(), make mmap_region() internal" from
Lorenzo Stoakes reworks MIPS's use of mmap_region() so that
mmap_region() can be made MM-internal.
- "mm/mglru: performance optimizations" from Yu Zhao fixes a few MGLRU
regressions and otherwise improves MGLRU performance.
- "Docs/mm/damon: add tuning guide and misc updates" from SeongJae Park
updates DAMON documentation.
- "Cleanup for memfd_create()" from Isaac Manjarres does that thing.
- "mm: hugetlb+THP folio and migration cleanups" from David Hildenbrand
provides various cleanups in the areas of hugetlb folios, THP folios and
migration.
- "Uncached buffered IO" from Jens Axboe implements the new
RWF_DONTCACHE flag which provides synchronous dropbehind for pagecache
reading and writing. To permite userspace to address issues with
massive buildup of useless pagecache when reading/writing fast devices.
- "selftests/mm: virtual_address_range: Reduce memory" from Thomas
Weißschuh fixes and optimizes some of the MM selftests.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ5a+cwAKCRDdBJ7gKXxA
jtoyAP9R58oaOKPJuTizEKKXvh/RpMyD6sYcz/uPpnf+cKTZxQEAqfVznfWlw/Lz
uC3KRZYhmd5YrxU4o+qjbzp9XWX/xAE=
=Ib2s
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"The various patchsets are summarized below. Plus of course many
indivudual patches which are described in their changelogs.
- "Allocate and free frozen pages" from Matthew Wilcox reorganizes
the page allocator so we end up with the ability to allocate and
free zero-refcount pages. So that callers (ie, slab) can avoid a
refcount inc & dec
- "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to
use large folios other than PMD-sized ones
- "Fix mm/rodata_test" from Petr Tesarik performs some maintenance
and fixes for this small built-in kernel selftest
- "mas_anode_descend() related cleanup" from Wei Yang tidies up part
of the mapletree code
- "mm: fix format issues and param types" from Keren Sun implements a
few minor code cleanups
- "simplify split calculation" from Wei Yang provides a few fixes and
a test for the mapletree code
- "mm/vma: make more mmap logic userland testable" from Lorenzo
Stoakes continues the work of moving vma-related code into the
(relatively) new mm/vma.c
- "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David
Hildenbrand cleans up and rationalizes handling of gfp flags in the
page allocator
- "readahead: Reintroduce fix for improper RA window sizing" from Jan
Kara is a second attempt at fixing a readahead window sizing issue.
It should reduce the amount of unnecessary reading
- "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
addresses an issue where "huge" amounts of pte pagetables are
accumulated:
https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/
Qi's series addresses this windup by synchronously freeing PTE
memory within the context of madvise(MADV_DONTNEED)
- "selftest/mm: Remove warnings found by adding compiler flags" from
Muhammad Usama Anjum fixes some build warnings in the selftests
code when optional compiler warnings are enabled
- "mm: don't use __GFP_HARDWALL when migrating remote pages" from
David Hildenbrand tightens the allocator's observance of
__GFP_HARDWALL
- "pkeys kselftests improvements" from Kevin Brodsky implements
various fixes and cleanups in the MM selftests code, mainly
pertaining to the pkeys tests
- "mm/damon: add sample modules" from SeongJae Park enhances DAMON to
estimate application working set size
- "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn
provides some cleanups to memcg's hugetlb charging logic
- "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song
removes the global swap cgroup lock. A speedup of 10% for a
tmpfs-based kernel build was demonstrated
- "zram: split page type read/write handling" from Sergey Senozhatsky
has several fixes and cleaups for zram in the area of
zram_write_page(). A watchdog softlockup warning was eliminated
- "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin
Brodsky cleans up the pagetable destructor implementations. A rare
use-after-free race is fixed
- "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes
simplifies and cleans up the debugging code in the VMA merging
logic
- "Account page tables at all levels" from Kevin Brodsky cleans up
and regularizes the pagetable ctor/dtor handling. This results in
improvements in accounting accuracy
- "mm/damon: replace most damon_callback usages in sysfs with new
core functions" from SeongJae Park cleans up and generalizes
DAMON's sysfs file interface logic
- "mm/damon: enable page level properties based monitoring" from
SeongJae Park increases the amount of information which is
presented in response to DAMOS actions
- "mm/damon: remove DAMON debugfs interface" from SeongJae Park
removes DAMON's long-deprecated debugfs interfaces. Thus the
migration to sysfs is completed
- "mm/hugetlb: Refactor hugetlb allocation resv accounting" from
Peter Xu cleans up and generalizes the hugetlb reservation
accounting
- "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino
removes a never-used feature of the alloc_pages_bulk() interface
- "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park
extends DAMOS filters to support not only exclusion (rejecting),
but also inclusion (allowing) behavior
- "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi
introduces a new memory descriptor for zswap.zpool that currently
overlaps with struct page for now. This is part of the effort to
reduce the size of struct page and to enable dynamic allocation of
memory descriptors
- "mm, swap: rework of swap allocator locks" from Kairui Song redoes
and simplifies the swap allocator locking. A speedup of 400% was
demonstrated for one workload. As was a 35% reduction for kernel
build time with swap-on-zram
- "mm: update mips to use do_mmap(), make mmap_region() internal"
from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that
mmap_region() can be made MM-internal
- "mm/mglru: performance optimizations" from Yu Zhao fixes a few
MGLRU regressions and otherwise improves MGLRU performance
- "Docs/mm/damon: add tuning guide and misc updates" from SeongJae
Park updates DAMON documentation
- "Cleanup for memfd_create()" from Isaac Manjarres does that thing
- "mm: hugetlb+THP folio and migration cleanups" from David
Hildenbrand provides various cleanups in the areas of hugetlb
folios, THP folios and migration
- "Uncached buffered IO" from Jens Axboe implements the new
RWF_DONTCACHE flag which provides synchronous dropbehind for
pagecache reading and writing. To permite userspace to address
issues with massive buildup of useless pagecache when
reading/writing fast devices
- "selftests/mm: virtual_address_range: Reduce memory" from Thomas
Weißschuh fixes and optimizes some of the MM selftests"
* tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
mm/compaction: fix UBSAN shift-out-of-bounds warning
s390/mm: add missing ctor/dtor on page table upgrade
kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags()
tools: add VM_WARN_ON_VMG definition
mm/damon/core: use str_high_low() helper in damos_wmark_wait_us()
seqlock: add missing parameter documentation for raw_seqcount_try_begin()
mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh
mm/page_alloc: remove the incorrect and misleading comment
zram: remove zcomp_stream_put() from write_incompressible_page()
mm: separate move/undo parts from migrate_pages_batch()
mm/kfence: use str_write_read() helper in get_access_type()
selftests/mm/mkdirty: fix memory leak in test_uffdio_copy()
kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags()
selftests/mm: virtual_address_range: avoid reading from VM_IO mappings
selftests/mm: vm_util: split up /proc/self/smaps parsing
selftests/mm: virtual_address_range: unmap chunks after validation
selftests/mm: virtual_address_range: mmap() without PROT_WRITE
selftests/memfd/memfd_test: fix possible NULL pointer dereference
mm: add FGP_DONTCACHE folio creation flag
mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue
...
Hi Linus,
Please pull bitmap patches for v6.14.
This includes const_true() series from Vincent Mailhol, another
__always_inline rework from Nathan Chancellor for RISCV, and a
couple random fixes from Dr. David Alan Gilbert and I Hsin Cheng.
Thanks,
Yury
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmeUMpEACgkQsUSA/Tof
vsiqEAv/VvtTD6I3Ms3kIl2G1pBP/EFcYQGbwS1PCsX3RX16rinZ2XUDRtjvRy1Y
FA+OsZ2yPtH8G1WRM8YauZsh2cZSCA4xTFadLSZkT8leSWERKaTyJI+PXe2A43IU
d+FV4zYH5JYqV2u9aLHWMO8Voq9nNHZXOYHRu0q53TBFn7V294Lma9oDlK3Wjfur
vSZZU9SKKlMV8Oy6/hZ3tDemUDM1jAGlqrxFb8aXRsTsCpsmlqE1bQdZ+AadjevZ
cVeplB8OCCnqcYV28szIwsJpSzmd5/WBP6jLpeMgBYFGS0JT2USdZ8gsw+Yq/On5
hjxek3cHBKdv0CINk0Ejf4aV0IvoX5S/VRlTjhttzyX68no1DoibDuWJWB42PRWS
frllVOmdkm2DqA0G9mgxtwzBl5UqMFVe5LuVU9E9BZZeDmRZmS3obrUiMzpiqUOs
zkdDvA0uaKgjx2qZADDEFqg1+XdX0A0iPebEv9vLaULXv0+D/PbkClNqIf8p7778
2GWuBLJe
=1hW6
-----END PGP SIGNATURE-----
Merge tag 'bitmap-for-6.14' of https://github.com:/norov/linux
Pull bitmap updates from Yury Norov:
"This includes const_true() series from Vincent Mailhol, another
__always_inline rework from Nathan Chancellor for RISCV, and a couple
of random fixes from Dr. David Alan Gilbert and I Hsin Cheng"
* tag 'bitmap-for-6.14' of https://github.com:/norov/linux:
cpumask: Rephrase comments for cpumask_any*() APIs
cpu: Remove unused init_cpu_online
riscv: Always inline bitops
linux/bits.h: simplify GENMASK_INPUT_CHECK()
compiler.h: add const_true()
Before SLUB initialization, various subsystems used memblock_alloc to
allocate memory. In most cases, when memory allocation fails, an
immediate panic is required. To simplify this behavior and reduce
repetitive checks, introduce `memblock_alloc_or_panic`. This function
ensures that memory allocation failures result in a panic automatically,
improving code readability and consistency across subsystems that require
this behavior.
[guoweikang.kernel@gmail.com: arch/s390: save_area_alloc default failure behavior changed to panic]
Link: https://lkml.kernel.org/r/20250109033136.2845676-1-guoweikang.kernel@gmail.com
Link: https://lore.kernel.org/lkml/Z2fknmnNtiZbCc7x@kernel.org/
Link: https://lkml.kernel.org/r/20250102072528.650926-1-guoweikang.kernel@gmail.com
Signed-off-by: Guo Weikang <guoweikang.kernel@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390]
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We already have a generic implementation of alloc/free up to P4D level, as
well as pgd_free(). Let's finish the work and add a generic PGD-level
alloc helper as well.
Unlike at lower levels, almost all architectures need some specific magic
at PGD level (typically initialising PGD entries), so introducing a
generic pgd_alloc() isn't worth it. Instead we introduce two new helpers,
__pgd_alloc() and __pgd_free(), and make use of them in the arch-specific
pgd_alloc() and pgd_free() wherever possible. To accommodate as many arch
as possible, __pgd_alloc() takes a page allocation order.
Because pagetable_alloc() allocates zeroed pages, explicit zeroing in
pgd_alloc() becomes redundant and we can get rid of it. Some trivial
implementations of pgd_free() also become unnecessary once __pgd_alloc()
is used; remove them.
Another small improvement is consistent accounting of PGD pages by using
GFP_PGTABLE_{USER,KERNEL} as appropriate.
Not all PGD allocations can be handled by the generic helpers. In
particular, multiple architectures allocate PGDs from a kmem_cache, and
those PGDs may not be page-sized.
Link: https://lkml.kernel.org/r/20250103184415.2744423-6-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Several architectures (arm, arm64, riscv and x86) define exactly the same
__tlb_remove_table(), just introduce generic __tlb_remove_table() to
eliminate these duplications.
The s390 __tlb_remove_table() is nearly the same, so also make s390
__tlb_remove_table() version generic.
Link: https://lkml.kernel.org/r/ea372633d94f4d3f9f56a7ec5994bf050bf77e39.1736317725.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: Andreas Larsson <andreas@gaisler.com> [sparc]
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390]
Acked-by: Arnd Bergmann <arnd@arndb.de> [asm-generic]
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Move pagetable_dtor() to __tlb_remove_table(), so that ptlock and page
table pages can be freed together (regardless of whether RCU is used).
This prevents the use-after-free problem where the ptlock is freed
immediately but the page table pages is freed later via RCU.
Page tables shouldn't have swap cache, so use pagetable_free() instead of
free_page_and_swap_cache() to free page table pages.
By the way, move the comment above __tlb_remove_table() to
riscv_tlb_remove_ptdesc(), it will be more appropriate.
Link: https://lkml.kernel.org/r/b89d77c965507b1b102cbabe988e69365cb288b6.1736317725.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The pagetable_p*_dtor() are exactly the same except for the handling of
ptlock. If we make ptlock_free() handle the case where ptdesc->ptl is
NULL and remove VM_BUG_ON_PAGE() from pmd_ptlock_free(), we can unify
pagetable_p*_dtor() into one function. Let's introduce pagetable_dtor()
to do this.
Later, pagetable_dtor() will be moved to tlb_remove_ptdesc(), so that
ptlock and page table pages can be freed together (regardless of whether
RCU is used). This prevents the use-after-free problem where the ptlock
is freed immediately but the page table pages is freed later via RCU.
Link: https://lkml.kernel.org/r/47f44fff9dc68d9d9e9a0d6c036df275f820598a.1736317725.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390]
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Four architectures currently implement 5-level pgtables: arm64, riscv, x86
and s390. The first three have essentially the same implementation for
p4d_alloc_one() and p4d_free(), so we've got an opportunity to reduce
duplication like at the lower levels.
Provide a generic version of p4d_alloc_one() and p4d_free(), and make use
of it on those architectures.
Their implementation is the same as at PUD level, except that p4d_free()
performs a runtime check by calling mm_p4d_folded(). 5-level pgtables
depend on a runtime-detected hardware feature on all supported
architectures, so we might as well include this check in the generic
implementation. No runtime check is required in p4d_alloc_one() as the
top-level p4d_alloc() already does the required check.
Link: https://lkml.kernel.org/r/26d69c74a29183ecc335b9b407040d8e4cd70c6a.1736317725.git.zhengqi.arch@bytedance.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de> [asm-generic]
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "move pagetable_*_dtor() to __tlb_remove_table()", v5.
As proposed [1] by Peter Zijlstra below, this patch series aims to move
pagetable_*_dtor() into __tlb_remove_table(). This will cleanup
pagetable_*_dtor() a bit and more gracefully fix the UAF issue [2]
reported by syzbot.
: Notably:
:
: - s390 pud isn't calling the existing pagetable_pud_[cd]tor()
: - none of the p4d things have pagetable_p4d_[cd]tor() (x86,arm64,s390,riscv)
: and they have inconsistent accounting
: - while much of the _ctor calls are in generic code, many of the _dtor
: calls are in arch code for hysterial raisins, this could easily be
: fixed
: - if we fix ptlock_free() to handle NULL, then all the _dtor()
: functions can use it, and we can observe they're all identical
: and can be folded
:
: after all that cleanup, you can move the _dtor from *_free_tlb() into
: tlb_remove_table() -- which for the above case, would then have it called
: from __tlb_remove_table_free().
This patch (of 16):
{pmd,pud,p4d}_alloc_one() is never called if the corresponding page table
level is folded, as {pmd,pud,p4d}_alloc() already does the required check.
We can therefore remove the runtime page table level checks in
{pud,p4d}_alloc_one. The PUD helper becomes equivalent to the generic
version, so we remove it altogether.
This is consistent with the way arm64 and x86 handle this situation
(runtime check in p4d_free() only).
Link: https://lkml.kernel.org/r/cover.1736317725.git.zhengqi.arch@bytedance.com
Link: https://lkml.kernel.org/r/93a1c6bddc0ded9f1a9f15658c1e4af5c93d1194.1736317725.git.zhengqi.arch@bytedance.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* Clear LLBCTL if secondary mmu mapping changes.
* Add hypercall service support for usermode VMM.
x86:
* Add a comment to kvm_mmu_do_page_fault() to explain why KVM performs a
direct call to kvm_tdp_page_fault() when RETPOLINE is enabled.
* Ensure that all SEV code is compiled out when disabled in Kconfig, even
if building with less brilliant compilers.
* Remove a redundant TLB flush on AMD processors when guest CR4.PGE changes.
* Use str_enabled_disabled() to replace open coded strings.
* Drop kvm_x86_ops.hwapic_irr_update() as KVM updates hardware's APICv cache
prior to every VM-Enter.
* Overhaul KVM's CPUID feature infrastructure to track all vCPU capabilities
instead of just those where KVM needs to manage state and/or explicitly
enable the feature in hardware. Along the way, refactor the code to make
it easier to add features, and to make it more self-documenting how KVM
is handling each feature.
* Rework KVM's handling of VM-Exits during event vectoring; this plugs holes
where KVM unintentionally puts the vCPU into infinite loops in some scenarios
(e.g. if emulation is triggered by the exit), and brings parity between VMX
and SVM.
* Add pending request and interrupt injection information to the kvm_exit and
kvm_entry tracepoints respectively.
* Fix a relatively benign flaw where KVM would end up redoing RDPKRU when
loading guest/host PKRU, due to a refactoring of the kernel helpers that
didn't account for KVM's pre-checking of the need to do WRPKRU.
* Make the completion of hypercalls go through the complete_hypercall
function pointer argument, no matter if the hypercall exits to
userspace or not. Previously, the code assumed that KVM_HC_MAP_GPA_RANGE
specifically went to userspace, and all the others did not; the new code
need not special case KVM_HC_MAP_GPA_RANGE and in fact does not care at
all whether there was an exit to userspace or not.
* As part of enabling TDX virtual machines, support support separation of
private/shared EPT into separate roots. When TDX will be enabled, operations
on private pages will need to go through the privileged TDX Module via SEAMCALLs;
as a result, they are limited and relatively slow compared to reading a PTE.
The patches included in 6.14 allow KVM to keep a mirror of the private EPT in
host memory, and define entries in kvm_x86_ops to operate on external page
tables such as the TDX private EPT.
* The recently introduced conversion of the NX-page reclamation kthread to
vhost_task moved the task under the main process. The task is created as
soon as KVM_CREATE_VM was invoked and this, of course, broke userspace that
didn't expect to see any child task of the VM process until it started
creating its own userspace threads. In particular crosvm refuses to fork()
if procfs shows any child task, so unbreak it by creating the task lazily.
This is arguably a userspace bug, as there can be other kinds of legitimate
worker tasks and they wouldn't impede fork(); but it's not like userspace
has a way to distinguish kernel worker tasks right now. Should they show
as "Kthread: 1" in proc/.../status?
x86 - Intel:
* Fix a bug where KVM updates hardware's APICv cache of the highest ISR bit
while L2 is active, while ultimately results in a hardware-accelerated L1
EOI effectively being lost.
* Honor event priority when emulating Posted Interrupt delivery during nested
VM-Enter by queueing KVM_REQ_EVENT instead of immediately handling the
interrupt.
* Rework KVM's processing of the Page-Modification Logging buffer to reap
entries in the same order they were created, i.e. to mark gfns dirty in the
same order that hardware marked the page/PTE dirty.
* Misc cleanups.
Generic:
* Cleanup and harden kvm_set_memory_region(); add proper lockdep assertions when
setting memory regions and add a dedicated API for setting KVM-internal
memory regions. The API can then explicitly disallow all flags for
KVM-internal memory regions.
* Explicitly verify the target vCPU is online in kvm_get_vcpu() to fix a bug
where KVM would return a pointer to a vCPU prior to it being fully online,
and give kvm_for_each_vcpu() similar treatment to fix a similar flaw.
* Wait for a vCPU to come online prior to executing a vCPU ioctl, to fix a
bug where userspace could coerce KVM into handling the ioctl on a vCPU that
isn't yet onlined.
* Gracefully handle xarray insertion failures; even though such failures are
impossible in practice after xa_reserve(), reserving an entry is always followed
by xa_store() which does not know (or differentiate) whether there was an
xa_reserve() before or not.
RISC-V:
* Zabha, Svvptc, and Ziccrse extension support for guests. None of them
require anything in KVM except for detecting them and marking them
as supported; Zabha adds byte and halfword atomic operations, while the
others are markers for specific operation of the TLB and of LL/SC
instructions respectively.
* Virtualize SBI system suspend extension for Guest/VM
* Support firmware counters which can be used by the guests to collect
statistics about traps that occur in the host.
Selftests:
* Rework vcpu_get_reg() to return a value instead of using an out-param, and
update all affected arch code accordingly.
* Convert the max_guest_memory_test into a more generic mmu_stress_test.
The basic gist of the "conversion" is to have the test do mprotect() on
guest memory while vCPUs are accessing said memory, e.g. to verify KVM
and mmu_notifiers are working as intended.
* Play nice with treewrite builds of unsupported architectures, e.g. arm
(32-bit), as KVM selftests' Makefile doesn't do anything to ensure the
target architecture is actually one KVM selftests supports.
* Use the kernel's $(ARCH) definition instead of the target triple for arch
specific directories, e.g. arm64 instead of aarch64, mainly so as not to
be different from the rest of the kernel.
* Ensure that format strings for logging statements are checked by the
compiler even when the logging statement itself is disabled.
* Attempt to whack the last LLC references/misses mole in the Intel PMU
counters test by adding a data load and doing CLFLUSH{OPT} on the data
instead of the code being executed. It seems that modern Intel CPUs
have learned new code prefetching tricks that bypass the PMU counters.
* Fix a flaw in the Intel PMU counters test where it asserts that events
are counting correctly without actually knowing what the events count
given the underlying hardware; this can happen if Intel reuses a
formerly microarchitecture-specific event encoding as an architectural
event, as was the case for Top-Down Slots.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmeTuzoUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOkBwf8CRNExYaM3j9y2E7mmo6AiL2ug6+J
Uy5Hai1poY48pPwKC6ke3EWT8WVsgj/Py5pCeHvLojQchWNjCCYNfSQluJdkRxwG
DgP3QUljSxEJWBeSwyTRcKM+IySi5hZd1IFo3gePFRB829Jpnj05vjbvCyv8gIwU
y3HXxSYDsViaaFoNg4OlZFsIGis7mtknsZzk++QjuCXmxNa6UCbv3qvE/UkVLhVg
WH65RTRdjk+EsdwaOMHKuUvQoGa+iM4o39b6bqmw8+ZMK39+y33WeTX/y5RXsp1N
tUUBRfS+MuuYgC/6LmTr66EkMzoChxk3Dp3kKUaCBcfqRC8PxQag5reZhw==
=NEaO
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"Loongarch:
- Clear LLBCTL if secondary mmu mapping changes
- Add hypercall service support for usermode VMM
x86:
- Add a comment to kvm_mmu_do_page_fault() to explain why KVM
performs a direct call to kvm_tdp_page_fault() when RETPOLINE is
enabled
- Ensure that all SEV code is compiled out when disabled in Kconfig,
even if building with less brilliant compilers
- Remove a redundant TLB flush on AMD processors when guest CR4.PGE
changes
- Use str_enabled_disabled() to replace open coded strings
- Drop kvm_x86_ops.hwapic_irr_update() as KVM updates hardware's
APICv cache prior to every VM-Enter
- Overhaul KVM's CPUID feature infrastructure to track all vCPU
capabilities instead of just those where KVM needs to manage state
and/or explicitly enable the feature in hardware. Along the way,
refactor the code to make it easier to add features, and to make it
more self-documenting how KVM is handling each feature
- Rework KVM's handling of VM-Exits during event vectoring; this
plugs holes where KVM unintentionally puts the vCPU into infinite
loops in some scenarios (e.g. if emulation is triggered by the
exit), and brings parity between VMX and SVM
- Add pending request and interrupt injection information to the
kvm_exit and kvm_entry tracepoints respectively
- Fix a relatively benign flaw where KVM would end up redoing RDPKRU
when loading guest/host PKRU, due to a refactoring of the kernel
helpers that didn't account for KVM's pre-checking of the need to
do WRPKRU
- Make the completion of hypercalls go through the complete_hypercall
function pointer argument, no matter if the hypercall exits to
userspace or not.
Previously, the code assumed that KVM_HC_MAP_GPA_RANGE specifically
went to userspace, and all the others did not; the new code need
not special case KVM_HC_MAP_GPA_RANGE and in fact does not care at
all whether there was an exit to userspace or not
- As part of enabling TDX virtual machines, support support
separation of private/shared EPT into separate roots.
When TDX will be enabled, operations on private pages will need to
go through the privileged TDX Module via SEAMCALLs; as a result,
they are limited and relatively slow compared to reading a PTE.
The patches included in 6.14 allow KVM to keep a mirror of the
private EPT in host memory, and define entries in kvm_x86_ops to
operate on external page tables such as the TDX private EPT
- The recently introduced conversion of the NX-page reclamation
kthread to vhost_task moved the task under the main process. The
task is created as soon as KVM_CREATE_VM was invoked and this, of
course, broke userspace that didn't expect to see any child task of
the VM process until it started creating its own userspace threads.
In particular crosvm refuses to fork() if procfs shows any child
task, so unbreak it by creating the task lazily. This is arguably a
userspace bug, as there can be other kinds of legitimate worker
tasks and they wouldn't impede fork(); but it's not like userspace
has a way to distinguish kernel worker tasks right now. Should they
show as "Kthread: 1" in proc/.../status?
x86 - Intel:
- Fix a bug where KVM updates hardware's APICv cache of the highest
ISR bit while L2 is active, while ultimately results in a
hardware-accelerated L1 EOI effectively being lost
- Honor event priority when emulating Posted Interrupt delivery
during nested VM-Enter by queueing KVM_REQ_EVENT instead of
immediately handling the interrupt
- Rework KVM's processing of the Page-Modification Logging buffer to
reap entries in the same order they were created, i.e. to mark gfns
dirty in the same order that hardware marked the page/PTE dirty
- Misc cleanups
Generic:
- Cleanup and harden kvm_set_memory_region(); add proper lockdep
assertions when setting memory regions and add a dedicated API for
setting KVM-internal memory regions. The API can then explicitly
disallow all flags for KVM-internal memory regions
- Explicitly verify the target vCPU is online in kvm_get_vcpu() to
fix a bug where KVM would return a pointer to a vCPU prior to it
being fully online, and give kvm_for_each_vcpu() similar treatment
to fix a similar flaw
- Wait for a vCPU to come online prior to executing a vCPU ioctl, to
fix a bug where userspace could coerce KVM into handling the ioctl
on a vCPU that isn't yet onlined
- Gracefully handle xarray insertion failures; even though such
failures are impossible in practice after xa_reserve(), reserving
an entry is always followed by xa_store() which does not know (or
differentiate) whether there was an xa_reserve() before or not
RISC-V:
- Zabha, Svvptc, and Ziccrse extension support for guests. None of
them require anything in KVM except for detecting them and marking
them as supported; Zabha adds byte and halfword atomic operations,
while the others are markers for specific operation of the TLB and
of LL/SC instructions respectively
- Virtualize SBI system suspend extension for Guest/VM
- Support firmware counters which can be used by the guests to
collect statistics about traps that occur in the host
Selftests:
- Rework vcpu_get_reg() to return a value instead of using an
out-param, and update all affected arch code accordingly
- Convert the max_guest_memory_test into a more generic
mmu_stress_test. The basic gist of the "conversion" is to have the
test do mprotect() on guest memory while vCPUs are accessing said
memory, e.g. to verify KVM and mmu_notifiers are working as
intended
- Play nice with treewrite builds of unsupported architectures, e.g.
arm (32-bit), as KVM selftests' Makefile doesn't do anything to
ensure the target architecture is actually one KVM selftests
supports
- Use the kernel's $(ARCH) definition instead of the target triple
for arch specific directories, e.g. arm64 instead of aarch64,
mainly so as not to be different from the rest of the kernel
- Ensure that format strings for logging statements are checked by
the compiler even when the logging statement itself is disabled
- Attempt to whack the last LLC references/misses mole in the Intel
PMU counters test by adding a data load and doing CLFLUSH{OPT} on
the data instead of the code being executed. It seems that modern
Intel CPUs have learned new code prefetching tricks that bypass the
PMU counters
- Fix a flaw in the Intel PMU counters test where it asserts that
events are counting correctly without actually knowing what the
events count given the underlying hardware; this can happen if
Intel reuses a formerly microarchitecture-specific event encoding
as an architectural event, as was the case for Top-Down Slots"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (151 commits)
kvm: defer huge page recovery vhost task to later
KVM: x86/mmu: Return RET_PF* instead of 1 in kvm_mmu_page_fault()
KVM: Disallow all flags for KVM-internal memslots
KVM: x86: Drop double-underscores from __kvm_set_memory_region()
KVM: Add a dedicated API for setting KVM-internal memslots
KVM: Assert slots_lock is held when setting memory regions
KVM: Open code kvm_set_memory_region() into its sole caller (ioctl() API)
LoongArch: KVM: Add hypercall service support for usermode VMM
LoongArch: KVM: Clear LLBCTL if secondary mmu mapping is changed
KVM: SVM: Use str_enabled_disabled() helper in svm_hardware_setup()
KVM: VMX: read the PML log in the same order as it was written
KVM: VMX: refactor PML terminology
KVM: VMX: Fix comment of handle_vmx_instruction()
KVM: VMX: Reinstate __exit attribute for vmx_exit()
KVM: SVM: Use str_enabled_disabled() helper in sev_hardware_setup()
KVM: x86: Avoid double RDPKRU when loading host/guest PKRU
KVM: x86: Use LVT_TIMER instead of an open coded literal
RISC-V: KVM: Add new exit statstics for redirected traps
RISC-V: KVM: Update firmware counters for various events
RISC-V: KVM: Redirect instruction access fault trap to guest
...
As usual, a number of new drivers get added to the defconfig to
support additional hardware. The stm32 defconfig also turns off
a few options to optimize for size.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmeTguUACgkQYKtH/8kJ
Uifr2xAAtxKljNKki7bb9aeTcvZ98g34dSBCB76Oc80tYTM1Q9zZO+96ZBv/9W/F
fhvAG8cSjfrXfUB8nwiQBzbV8Kl7jaxomb3h3z8snji6Kp4+IZtrYUPfS+QG5GMB
o5U5eBTeipbsZRO3EwHdPvdCf39oeAvb+6IsESnbzAbZkhMOaUhzHok1G+fxekvv
pF4mgFCYmnjq4TLMEikVlKF5qBQ2kScbECLDVx9/xvz8p6RotL0WOKGvfwJf4LFx
nP1D1bBB01g/yilVZ6PAyfhnpN8KULdENllOBxnladPkt+Qb4zEHHuAChaSE8gQS
ulZAq83OGyY85xTAUUPgFTpmLrIVeOPtjHuJQ2RScO+pB/MIf/fL1ozRvxD2jiKa
KYFTlVFUX0xIeSAJ/xCp3/i2052AQ6ac5mp9P3ti77XSbSfCarWx882BfELZrbsw
gKabUr3QJxJmxio8UL5HUPlxKAsBczBEa/jC/XqtSbQmcnfL3FxpNh77KgLx5ICx
n3CGio4YmE/cVgWVjjJzq974MPpdvzRWN7cnLpWEQnOv3iuk2tomdGDqkUlDzuan
7U2ylFa2q0Li7eKqplX6gqny4QKCNH1aMQlucmxifvAG0X9jYlq9sLVVeJM8LiCi
C0cCVUx6KtN0ZjRawDSY0qZ8Yeq/NyIpU4/Ywc0aAZ5JfNqsA4Y=
=ZFLM
-----END PGP SIGNATURE-----
Merge tag 'soc-defconfig-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC defconfig updates from Arnd Bergmann:
"As usual, a number of new drivers get added to the defconfig to
support additional hardware.
The stm32 defconfig also turns off a few options to optimize for size"
* tag 'soc-defconfig-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (27 commits)
dt-bindings: soc: samsung: exynos-pmu: Add exynos990-pmu compatible
arm64: defconfig: enable Maxim TCPCI driver
ARM: configs: stm32: Remove useless flags in STM32 defconfig
ARM: configs: stm32: Remove CRYPTO in STM32 defconfig
ARM: configs: stm32: Clean STM32 defconfig
ARM: configs: stm32: Remove FLASH_MEM_BASE and FLASH_SIZE in STM32 defconfig
arm64: defconfig: Enable pinctrl-based I2C mux
arm64: defconfig: Enable Rockchip extensions for Synopsys DW HDMI QP
arm64: defconfig: Enable RFKILL GPIO
arm64: defconfig: Enable TI K3 M4 remoteproc driver
arm64: defconfig: Enable Qualcomm IPQ CMN PLL clock controller
arm64: defconfig: Enable basic Qualcomm SM8750 SoC drivers
arm64: defconfig: remove obsolete CONFIG_SM_DISPCC_8650
arm64: defconfig: enable clock controller, interconnect and pinctrl for QCS8300
arm64: defconfig: Enable sa8775p clock controllers
arm64: defconfig: Enable MediaTek DWMAC
arm64: defconfig: Enable sound for MT8188
arm64: defconfig: Enable MediaTek STAR Ethernet MAC
riscv: defconfig: enable pinctrl and dwmac support for TH1520
arm64: defconfig: Enable Amazon Elastic Network Adaptor
...
We see the addition of eleven new SoCs, including a total of sixx arm64
chips from Qualcomm alone. Overall, the Qualcomm platforms once again
make up the majority of all changes, after a couple of quieter releases.
The new SoCs in this branch are:
- Microchip sama7d65 is a new 32-bit embedded chip with a single
Cortex-A7 and the current high end of the old Atmel SoC line.
- Samsung Exynos 9810 is a mobile phone chip used in some older
phones like the Samsung Galaxy S9
- Renesas R-Car V4H ES3.0 (R8A779G3) is an updated version of
the V4H (R8A779G0) low-power automotive SoC
- Renesas RZ/G3E (R0A09G047) is a family of embedded chips
using Cortex-A55 cores
- Qualcomm Snapdragon 8 Elite (SM8750) is a new phone chip based on
Qualcomm's Oryon CPU cores.
- Qualcomm Snapdragon AR2 (SAR2130P) is a SoC for augmented reality
glasses.
- Qualcomm IQ6 (QCS610) and IQ8 (QCS8300) are two industrial
IOT platforms.
- Snapdragon 425 (MSM8917) is a mobile phone SoC from 2016
- Qualcomm IPQ5424 is a Wi-Fi 7 networking chip
All of the above are part of already supported SoC families that
only need new devicetree files. Two additional SoCs in new
families are part of a separate branch.
There are 48 new machines in total, including six arm32 ones based
on aspeed. broadcom, microchip and st SoCs all using Cortex-A7 cores,
and a single risc-v board, the Banana Pi R3.
The remaining ones use arm64 chips from Broadcom, Samsung, NXP, Mediatek,
Qualcomm, Renesas and Rockchips and cover development boards, phones,
laptops, industrial machines routers.
A lot of ongoing work is for cleaning up build time warnings and other
issues, in addition to the new machines and added features.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmeSdLQACgkQYKtH/8kJ
UicovRAA0fABVQ8Fl45/NNaBGfXYagXptCSTGOFsdKJ49LVF4uLfWtL+0ENx5Ck5
PJjr0n9kMNWqeJDiaaQtW21HhYxGxcz3MJEj60/C+D0QNQExPVROUHNy1aggxjNI
qHf0DnTLAWzjtD0YdCmiI6JCDRdPIRQi2IJymAu7tlooc809PG15bbo6PpIYginC
1U6cYtuyBuE/9ku2FgWX6E4T0aRjPyaR8thg9VAIsKsugdH3v9EdtLC/MUqOBHMt
30PyghR9+r1LxQzOC/q7TFcPmnUb74fSPW85X7a5KXv53K6MeRXtRhnetts08R7Z
iZCJi2ORO100RX7plAzxtF+CWI8eO3bVzibTcZmgxP/Is6CmrlnTcPzOFvqfyx1E
AfeyEGA7XofjFwPJcc9bCQc3r2w90FpsKqtlaBAn2Od+1EUuuAAgUcjrNyNJqlkp
8Vos0FxNOOnYULjndYZqa6MslBuxNXYtNj0Ph1/fpzUWKwo+x8LWy8Xb9a5Sdz0H
OsPVWbumrXlG1rcNMFu8yPzKOBgO0t8on5MRwW+1Xmf1lcQNzJWeGqTzsFPObREV
Ar7evGEgSb8qladOtzbg645wIezWIXpSJUICQhilxV8DUO+IYuMz668QoZZP40V5
uHdWxFGdNe1cm5JAsjjwCeFNk/Pbro1+ojc4E6//MRp+WCgdPQ0=
=vdmR
-----END PGP SIGNATURE-----
Merge tag 'soc-dt-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC devicetree updates from Arnd Bergmann:
"We see the addition of eleven new SoCs, including a total of sixx
arm64 chips from Qualcomm alone. Overall, the Qualcomm platforms once
again make up the majority of all changes, after a couple of quieter
releases.
The new SoCs in this branch are:
- Microchip sama7d65 is a new 32-bit embedded chip with a single
Cortex-A7 and the current high end of the old Atmel SoC line.
- Samsung Exynos 9810 is a mobile phone chip used in some older
phones like the Samsung Galaxy S9
- Renesas R-Car V4H ES3.0 (R8A779G3) is an updated version of the V4H
(R8A779G0) low-power automotive SoC
- Renesas RZ/G3E (R0A09G047) is a family of embedded chips using
Cortex-A55 cores
- Qualcomm Snapdragon 8 Elite (SM8750) is a new phone chip based on
Qualcomm's Oryon CPU cores.
- Qualcomm Snapdragon AR2 (SAR2130P) is a SoC for augmented reality
glasses.
- Qualcomm IQ6 (QCS610) and IQ8 (QCS8300) are two industrial IOT
platforms.
- Snapdragon 425 (MSM8917) is a mobile phone SoC from 2016
- Qualcomm IPQ5424 is a Wi-Fi 7 networking chip
All of the above are part of already supported SoC families that only
need new devicetree files. Two additional SoCs in new families are
part of a separate branch.
There are 48 new machines in total, including six arm32 ones based on
aspeed. broadcom, microchip and st SoCs all using Cortex-A7 cores, and
a single risc-v board, the Banana Pi R3.
The remaining ones use arm64 chips from Broadcom, Samsung, NXP,
Mediatek, Qualcomm, Renesas and Rockchips and cover development
boards, phones, laptops, industrial machines routers.
A lot of ongoing work is for cleaning up build time warnings and other
issues, in addition to the new machines and added features"
* tag 'soc-dt-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (619 commits)
arm64: tegra: Fix Tegra234 PCIe interrupt-map
arm64: dts: qcom: x1e80100-romulus: Update firmware nodes
arm64: dts: rockchip: add DTs for Firefly ITX-3588J and its Core-3588J SoM
dt-bindings: arm: rockchip: Add Firefly ITX-3588J board
arm64: dts: rockchip: Add Orange Pi 5 Max board
dt-bindings: arm: rockchip: Add Xunlong Orange Pi 5 Max
arm64: dts: rockchip: refactor common rk3588-orangepi-5.dtsi
arm64: dts: rockchip: add WLAN to rk3588-evb1 controller
arm64: dts: rockchip: increase gmac rx_delay on rk3399-puma
arm64: dts: rockchip: Delete redundant RK3328 GMAC stability fixes
arm64: tegra: Disable Tegra234 sce-fabric node
arm64: tegra: Fix typo in Tegra234 dce-fabric compatible
arm64: tegra: Fix DMA ID for SPI2
arm64: dts: qcom: msm8916-samsung-serranove: Add display panel
arm64: dts: qcom: sm8650: Add 'global' interrupt to the PCIe RC nodes
arm64: dts: qcom: sm8550: Add 'global' interrupt to the PCIe RC nodes
arm64: dts: qcom: Remove unused and undocumented properties
arm64: dts: qcom: sdm450-lenovo-tbx605f: add DSI panel nodes
arm64: dts: qcom: pmi8950: add LAB-IBB nodes
arm64: dts: qcom: ipq5424: enable the download mode support
...
Two new SoC families are added here, with devicetree files and
a little bit of infrastructure to allow booting:
- Blaize BLZP1600 is an AI chip using custom GSP (Graph Streaming
Processor) cores for computation, and two small Cortex-A53 cores
that run the operating system.
- SpacemiT K1 is a 64-bit RISC-V chip, using eight custom RVA22
compatible CPU cores with vector support.
Also marketed at AI applications, it has a much slower NPU compared
to BLZP1600, but in turn focuses on the CPU performance
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmeSd7cACgkQYKtH/8kJ
UienxxAAoUMAWex/k5gWfg3DdvDrleXydjYzIm6DBIad1kxHYy/gvEuQAgV9vxqb
tQ763NKRLekmwv+CX+S47PjrMHgqqwtch5If8ixd50gYSZPukJmuCTkthMu1qBb9
/zi0LXg3cViPQBAY0MwCQMzukRHsqmBzaUSle6bM1yDHhHAmcz88Iqs/LttJOnYk
M/+ybxYoUdVmPI+Nt3e8hxlecfU2oM0S1hoStvdsA0fpuLfEBF8WpF7DnkSphT10
JqVysB3FTF0PYXyBodcuh37iCCObAFVAUT/ZvKO3+Qo/rI2IseP8lSeAV84JP/5S
w4MqpGhFecICBEWJ9h0FnbSI2NyON8uz23R+va6YAtTxuPL2Loq4HwvxXh5H5imC
sxZ0SEnkwFQ7DJ10nV00FXlmKZ7+Ax0WLoboNHWYxdAfqhgcZ8q8gi/+WTETTBog
jnTnWjscuEA5QTNfOP0AqUtVbWQOj09wLsiceJc8G5jpbe9WaZJy1yzXaCNu2hZV
2x+2UT31Y6G2Poxee8Ti4u1ljieJh5BmD1Ts81P8EpAC0JsVyJ3HtouYeBHcBPoj
HXmdGDPN8bS0Ugu7iFu3GO+TPcBzUw1ZND8L3QbBesb/oFK0455ARkVXO2nazP4p
4s/iGWbGAKIyKPOdYGaiSXmbXurx99pOVY6G8ccqkNf+17MfTgE=
=clzA
-----END PGP SIGNATURE-----
Merge tag 'soc-new-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull new SoC support from Arnd Bergmann:
"Two new SoC families are added here, with devicetree files and a
little bit of infrastructure to allow booting:
- Blaize BLZP1600 is an AI chip using custom GSP (Graph Streaming
Processor) cores for computation, and two small Cortex-A53 cores
that run the operating system.
- SpacemiT K1 is a 64-bit RISC-V chip, using eight custom RVA22
compatible CPU cores with vector support.
Also marketed at AI applications, it has a much slower NPU compared
to BLZP1600, but in turn focuses on the CPU performance"
* tag 'soc-new-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
riscv: dts: spacemit: move aliases to board dts
riscv: dts: spacemit: add pinctrl property to uart0 in BPI-F3
riscv: defconfig: enable SpacemiT SoC
riscv: dts: spacemit: add Banana Pi BPI-F3 board device tree
riscv: dts: add initial SpacemiT K1 SoC device tree
riscv: add SpacemiT SoC family Kconfig support
dt-bindings: serial: 8250: Add SpacemiT K1 uart compatible
dt-bindings: interrupt-controller: Add SpacemiT K1 PLIC
dt-bindings: timer: Add SpacemiT K1 CLINT
dt-bindings: riscv: add SpacemiT K1 bindings
dt-bindings: riscv: Add SpacemiT X60 compatibles
MAINTAINERS: setup support for SpacemiT SoC tree
MAINTAINER: Add entry for Blaize SoC
arm64: defconfig: Enable Blaize BLZP1600 platform
arm64: dts: Add initial support for Blaize BLZP1600 CB2
arm64: Add Blaize BLZP1600 SoC family
dt-bindings: arm: blaize: Add Blaize BLZP1600 SoC
dt-bindings: Add Blaize vendor prefix
- Reorganize the architecture-optimized CRC32 and CRC-T10DIF code to be
directly accessible via the library API, instead of requiring the
crypto API. This is much simpler and more efficient.
- Convert some users such as ext4 to use the CRC32 library API instead
of the crypto API. More conversions like this will come later.
- Add a KUnit test that tests and benchmarks multiple CRC variants.
Remove older, less-comprehensive tests that are made redundant by
this.
- Add an entry to MAINTAINERS for the kernel's CRC library code. I'm
volunteering to maintain it. I have additional cleanups and
optimizations planned for future cycles.
These patches have been in linux-next since -rc1.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZ418ZRQcZWJpZ2dlcnNA
Z29vZ2xlLmNvbQAKCRDzXCl4vpKOKyJYAP9kBlpm8W9/XY6N8SpjKaXE/vKQYHQl
Nobhak06Us8uJwEAkcUTymWP4IwQj5A9jgBAPRw53FQcNVKIc+01C7gRHw0=
=mqSH
-----END PGP SIGNATURE-----
Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC updates from Eric Biggers:
- Reorganize the architecture-optimized CRC32 and CRC-T10DIF code to be
directly accessible via the library API, instead of requiring the
crypto API. This is much simpler and more efficient.
- Convert some users such as ext4 to use the CRC32 library API instead
of the crypto API. More conversions like this will come later.
- Add a KUnit test that tests and benchmarks multiple CRC variants.
Remove older, less-comprehensive tests that are made redundant by
this.
- Add an entry to MAINTAINERS for the kernel's CRC library code. I'm
volunteering to maintain it. I have additional cleanups and
optimizations planned for future cycles.
* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (31 commits)
MAINTAINERS: add entry for CRC library
powerpc/crc: delete obsolete crc-vpmsum_test.c
lib/crc32test: delete obsolete crc32test.c
lib/crc16_kunit: delete obsolete crc16_kunit.c
lib/crc_kunit.c: add KUnit test suite for CRC library functions
powerpc/crc-t10dif: expose CRC-T10DIF function through lib
arm64/crc-t10dif: expose CRC-T10DIF function through lib
arm/crc-t10dif: expose CRC-T10DIF function through lib
x86/crc-t10dif: expose CRC-T10DIF function through lib
crypto: crct10dif - expose arch-optimized lib function
lib/crc-t10dif: add support for arch overrides
lib/crc-t10dif: stop wrapping the crypto API
scsi: target: iscsi: switch to using the crc32c library
f2fs: switch to using the crc32 library
jbd2: switch to using the crc32c library
ext4: switch to using the crc32c library
lib/crc32: make crc32c() go directly to lib
bcachefs: Explicitly select CRYPTO from BCACHEFS_FS
x86/crc32: expose CRC32 functions through lib
x86/crc32: update prototype for crc32_pclmul_le_16()
...
- Have fprobes built on top of function graph infrastructure
The fprobe logic is an optimized kprobe that uses ftrace to attach to
functions when a probe is needed at the start or end of the function. The
fprobe and kretprobe logic implements a similar method as the function
graph tracer to trace the end of the function. That is to hijack the
return address and jump to a trampoline to do the trace when the function
exits. To do this, a shadow stack needs to be created to store the
original return address. Fprobes and function graph do this slightly
differently. Fprobes (and kretprobes) has slots per callsite that are
reserved to save the return address. This is fine when just a few points
are traced. But users of fprobes, such as BPF programs, are starting to add
many more locations, and this method does not scale.
The function graph tracer was created to trace all functions in the
kernel. In order to do this, when function graph tracing is started, every
task gets its own shadow stack to hold the return address that is going to
be traced. The function graph tracer has been updated to allow multiple
users to use its infrastructure. Now have fprobes be one of those users.
This will also allow for the fprobe and kretprobe methods to trace the
return address to become obsolete. With new technologies like CFI that
need to know about these methods of hijacking the return address, going
toward a solution that has only one method of doing this will make the
kernel less complex.
- Cleanup with guard() and free() helpers
There were several places in the code that had a lot of "goto out" in the
error paths to either unlock a lock or free some memory that was
allocated. But this is error prone. Convert the code over to use the
guard() and free() helpers that let the compiler unlock locks or free
memory when the function exits.
- Remove disabling of interrupts in the function graph tracer
When function graph tracer was first introduced, it could race with
interrupts and NMIs. To prevent that race, it would disable interrupts and
not trace NMIs. But the code has changed to allow NMIs and also
interrupts. This change was done a long time ago, but the disabling of
interrupts was never removed. Remove the disabling of interrupts in the
function graph tracer is it is not needed. This greatly improves its
performance.
- Allow the :mod: command to enable tracing module functions on the kernel
command line.
The function tracer already has a way to enable functions to be traced in
modules by writing ":mod:<module>" into set_ftrace_filter. That will
enable either all the functions for the module if it is loaded, or if it
is not, it will cache that command, and when the module is loaded that
matches <module>, its functions will be enabled. This also allows init
functions to be traced. But currently events do not have that feature.
Because enabling function tracing can be done very early at boot up
(before scheduling is enabled), the commands that can be done when
function tracing is started is limited. Having the ":mod:" command to
trace module functions as they are loaded is very useful. Update the
kernel command line function filtering to allow it.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ42E2RQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qqXSAPwOMxuhye8tb1GYG62QD9+w7e6nOmlC
2GCPj4detnEM2QD/ciivkhespVKhHpZHRewAuSnJgHPSM45NQ3EVESzjWQ4=
=snbx
-----END PGP SIGNATURE-----
Merge tag 'ftrace-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace updates from Steven Rostedt:
- Have fprobes built on top of function graph infrastructure
The fprobe logic is an optimized kprobe that uses ftrace to attach to
functions when a probe is needed at the start or end of the function.
The fprobe and kretprobe logic implements a similar method as the
function graph tracer to trace the end of the function. That is to
hijack the return address and jump to a trampoline to do the trace
when the function exits. To do this, a shadow stack needs to be
created to store the original return address. Fprobes and function
graph do this slightly differently. Fprobes (and kretprobes) has
slots per callsite that are reserved to save the return address. This
is fine when just a few points are traced. But users of fprobes, such
as BPF programs, are starting to add many more locations, and this
method does not scale.
The function graph tracer was created to trace all functions in the
kernel. In order to do this, when function graph tracing is started,
every task gets its own shadow stack to hold the return address that
is going to be traced. The function graph tracer has been updated to
allow multiple users to use its infrastructure. Now have fprobes be
one of those users. This will also allow for the fprobe and kretprobe
methods to trace the return address to become obsolete. With new
technologies like CFI that need to know about these methods of
hijacking the return address, going toward a solution that has only
one method of doing this will make the kernel less complex.
- Cleanup with guard() and free() helpers
There were several places in the code that had a lot of "goto out" in
the error paths to either unlock a lock or free some memory that was
allocated. But this is error prone. Convert the code over to use the
guard() and free() helpers that let the compiler unlock locks or free
memory when the function exits.
- Remove disabling of interrupts in the function graph tracer
When function graph tracer was first introduced, it could race with
interrupts and NMIs. To prevent that race, it would disable
interrupts and not trace NMIs. But the code has changed to allow NMIs
and also interrupts. This change was done a long time ago, but the
disabling of interrupts was never removed. Remove the disabling of
interrupts in the function graph tracer is it is not needed. This
greatly improves its performance.
- Allow the :mod: command to enable tracing module functions on the
kernel command line.
The function tracer already has a way to enable functions to be
traced in modules by writing ":mod:<module>" into set_ftrace_filter.
That will enable either all the functions for the module if it is
loaded, or if it is not, it will cache that command, and when the
module is loaded that matches <module>, its functions will be
enabled. This also allows init functions to be traced. But currently
events do not have that feature.
Because enabling function tracing can be done very early at boot up
(before scheduling is enabled), the commands that can be done when
function tracing is started is limited. Having the ":mod:" command to
trace module functions as they are loaded is very useful. Update the
kernel command line function filtering to allow it.
* tag 'ftrace-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (26 commits)
ftrace: Implement :mod: cache filtering on kernel command line
tracing: Adopt __free() and guard() for trace_fprobe.c
bpf: Use ftrace_get_symaddr() for kprobe_multi probes
ftrace: Add ftrace_get_symaddr to convert fentry_ip to symaddr
Documentation: probes: Update fprobe on function-graph tracer
selftests/ftrace: Add a test case for repeating register/unregister fprobe
selftests: ftrace: Remove obsolate maxactive syntax check
tracing/fprobe: Remove nr_maxactive from fprobe
fprobe: Add fprobe_header encoding feature
fprobe: Rewrite fprobe on function-graph tracer
s390/tracing: Enable HAVE_FTRACE_GRAPH_FUNC
ftrace: Add CONFIG_HAVE_FTRACE_GRAPH_FUNC
bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled
tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
tracing: Add ftrace_fill_perf_regs() for perf event
tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs
fprobe: Use ftrace_regs in fprobe exit handler
fprobe: Use ftrace_regs in fprobe entry handler
fgraph: Pass ftrace_regs to retfunc
fgraph: Replace fgraph_ret_regs with ftrace_regs
...
- Consolidation of the machine_kexec_mask_interrupts() by providing a
generic implementation and replacing the copy & pasta orgy in the
relevant architectures.
- Prevent unconditional operations on interrupt chips during kexec
shutdown, which can trigger warnings in certain cases when the
underlying interrupt has been shut down before.
- Make the enforcement of interrupt handling in interrupt context
unconditionally available, so that it actually works for non x86
related interrupt chips. The earlier enablement for ARM GIC chips set
the required chip flag, but did not notice that the check was hidden
behind a config switch which is not selected by ARM[64].
- Decrapify the handling of deferred interrupt affinity setting. Some
interrupt chips require that affinity changes are made from the context
of handling an interrupt to avoid certain race conditions. For x86 this
was the default, but with interrupt remapping this requirement was
lifted and a flag was introduced which tells the core code that
affinity changes can be done in any context. Unrestricted affinity
changes are the default for the majority of interrupt chips. RISCV has
the requirement to add the deferred mode to one of it's interrupt
controllers, but with the original implementation this would require to
add the any context flag to all other RISC-V interrupt chips. That's
backwards, so reverse the logic and require that chips, which need the
deferred mode have to be marked accordingly. That avoids chasing the
'sane' chips and marking them.
- Add multi-node support to the Loongarch AVEC interrupt controller
driver.
- The usual tiny cleanups, fixes and improvements all over the place.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmePkVITHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoRbQD/9bHVph/V9Ekl7JAX3aY4gG4JbRhOc7
dp1VAcHRhktRfoTztYRbjsbMu2nvZ58GKA8bkOS2jHSF/m3PbkIJfOhwk0YdIAoa
+kdy5yDgqCGfkqW43DN4Cr+CnzGjWMitw67tFp3fhwehMDpDjdt2L28IjtanSS0f
hO6FV7o65MWeJwxk4Isb2/nvkO+X23Lrp6RrWS8SXBnF9FFXxiPIg/fiOPTizhCh
1W/bSGxLLb9WwsVzmlGAKVFlXDij0QGaIUug2fdVZ63OsELXD7tJrLSPG133yk92
ppIa0s6BT4IBsfM00us4hG15PkLuJmP3yWWcoquG0rP8Wq58VOXiN6+rcJIyvB+5
mWceTH6IKfZGoRQKwXC7BxeBAIb147reiJtb06meq1/8ADIvzafiNy0c8x9i/UaV
QiyhPVENjaGCGDomZmJQqN7Yb02Wge1k8InQnodDrHxZNl/bX/B1Z8Bxd0n6hPHg
NSJXYif2AxgaddpohsdygqRDbT6SNyQdj7YjJFY5qAGJ3yFyJ4JB6WTqkWW4o1vH
3FVqdAnJmejAmmYSkah0Hkem2T5QASQmTWb93PLxiV6q+d0NM8stWAujjyVdIV/B
W4Uj9mQ20cz54TjLtxqX+A1k6KcqOWRgh1l2QbUlFsgsOP3V8yz47yqYdR9qMWlO
9kNEjI3sw+G/IQ==
=q4rj
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt subsystem updates from Thomas Gleixner:
- Consolidate the machine_kexec_mask_interrupts() by providing a
generic implementation and replacing the copy & pasta orgy in the
relevant architectures.
- Prevent unconditional operations on interrupt chips during kexec
shutdown, which can trigger warnings in certain cases when the
underlying interrupt has been shut down before.
- Make the enforcement of interrupt handling in interrupt context
unconditionally available, so that it actually works for non x86
related interrupt chips. The earlier enablement for ARM GIC chips set
the required chip flag, but did not notice that the check was hidden
behind a config switch which is not selected by ARM[64].
- Decrapify the handling of deferred interrupt affinity setting.
Some interrupt chips require that affinity changes are made from the
context of handling an interrupt to avoid certain race conditions.
For x86 this was the default, but with interrupt remapping this
requirement was lifted and a flag was introduced which tells the core
code that affinity changes can be done in any context. Unrestricted
affinity changes are the default for the majority of interrupt chips.
RISCV has the requirement to add the deferred mode to one of it's
interrupt controllers, but with the original implementation this
would require to add the any context flag to all other RISC-V
interrupt chips. That's backwards, so reverse the logic and require
that chips, which need the deferred mode have to be marked
accordingly. That avoids chasing the 'sane' chips and marking them.
- Add multi-node support to the Loongarch AVEC interrupt controller
driver.
- The usual tiny cleanups, fixes and improvements all over the place.
* tag 'irq-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/generic_chip: Export irq_gc_mask_disable_and_ack_set()
genirq/timings: Add kernel-doc for a function parameter
genirq: Remove IRQ_MOVE_PCNTXT and related code
x86/apic: Convert to IRQCHIP_MOVE_DEFERRED
genirq: Provide IRQCHIP_MOVE_DEFERRED
hexagon: Remove GENERIC_PENDING_IRQ leftover
ARC: Remove GENERIC_PENDING_IRQ
genirq: Remove handle_enforce_irqctx() wrapper
genirq: Make handle_enforce_irqctx() unconditionally available
irqchip/loongarch-avec: Add multi-nodes topology support
irqchip/ts4800: Replace seq_printf() by seq_puts()
irqchip/ti-sci-inta : Add module build support
irqchip/ti-sci-intr: Add module build support
irqchip/irq-brcmstb-l2: Replace brcmstb_l2_mask_and_ack() by generic function
irqchip: keystone: Use syscon_regmap_lookup_by_phandle_args
genirq/kexec: Prevent redundant IRQ masking by checking state before shutdown
kexec: Consolidate machine_kexec_mask_interrupts() implementation
genirq: Reuse irq_thread_fn() for forced thread case
genirq: Move irq_thread_fn() further up in the code
EXPORT_SYMBOL_GPL() is missing for __cpuid_to_hartid_map array.
Export this symbol to allow drivers compiled as modules to use
cpuid_to_hartid_map().
Signed-off-by: Valentina Fernandez <valentina.fernandezalanis@microchip.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
- Svvptc, Zabha, and Ziccrse extension support for Guest/VM
- Virtualize SBI system suspend extension for Guest/VM
- Trap related exit statstics as SBI PMU firmware counters for Guest/VM
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmeJ3zYACgkQrUjsVaLH
LAdd4hAAj7MDKBXFfkgrg6M5b3CgjFp/1FbiQQe+gJNEOrDdWJy115imk2IKbEZq
MAHLF1nlavmi3ZxTCGRPamyyVej5A0+oNujl+ktltHXIKPqXVcJvz5Iu15uDJNZf
Ow6Vz74HONl3UKH1H029hFA8oXBTzlxyLv8EXfUJ/HZ/bUeQAfgX52pdvTZ54o4Z
gJy+dODD7LOIh9WSlqsrqRcaZkQ4uWZNzQVuXQyubon/AhyqLZyoX/Kx+eLg0QSG
LwwXQA5ew4fG3heOWGSSVhiBeqKj7Kmk7l3tyWe6f/YUw7BP1EEwY6ufdH2jy6Z3
mEE/3+mYhkBvr6sWCMPgGt8IMGqe6vRPE1SjPRk3xjzQN4a5aSGXA3J2bBHZhdgt
ksGKD0CNhFv/E9LyeQnIylqQqNL9IIb35hrRmVCGzeOJB5PhqDRZiuyyz4LMOgY9
1uY6c5fuSxV0IIZV2Y4oTxZ26dCBkGzR5rrVBSakSF1xWfU+0rzX91FslEUzPyDO
W+RcG9ziFTdCbYkTGlt6sSiXRwkYe/TD6VpDgsxbQECgW/9Itg2BSCZojuy7Lfo9
idhjHLIouruGyQKrJmadUdOuHzLOCX8XMo1oTjlrPudNoILC6GmZs+X7xUUJ6Fzi
mOgxcUBsByzZLhyhPnbwS0o0D7La7HJbuNne8VSHfEjPhr8/yl0=
=EQ0o
-----END PGP SIGNATURE-----
Merge tag 'kvm-riscv-6.14-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv changes for 6.14
- Svvptc, Zabha, and Ziccrse extension support for Guest/VM
- Virtualize SBI system suspend extension for Guest/VM
- Trap related exit statstics as SBI PMU firmware counters for Guest/VM
When the kernel displays "Unable to handle kernel paging request at
virtual address", we would like to confirm the status of the virtual
address in the page table. So add show_pte() before die().
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20240723021820.87718-1-cuiyunhui@bytedance.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Charlie Jenkins <charlie@rivosinc.com> says:
xtheadvector is a custom extension that is based upon riscv vector
version 0.7.1 [1]. All of the vector routines have been modified to
support this alternative vector version based upon whether xtheadvector
was determined to be supported at boot.
vlenb is not supported on the existing xtheadvector hardware, so a
devicetree property thead,vlenb is added to provide the vlenb to Linux.
There is a new hwprobe key RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 that is
used to request which thead vendor extensions are supported on the
current platform. This allows future vendors to allocate hwprobe keys
for their vendor.
Support for xtheadvector is also added to the vector kselftests.
[1] 95358cb2cc/xtheadvector.adoc
* b4-shazam-merge:
riscv: Add ghostwrite vulnerability
selftests: riscv: Support xtheadvector in vector tests
selftests: riscv: Fix vector tests
riscv: hwprobe: Document thead vendor extensions and xtheadvector extension
riscv: hwprobe: Add thead vendor extension probing
riscv: vector: Support xtheadvector save/restore
riscv: Add xtheadvector instruction definitions
riscv: csr: Add CSR encodings for CSR_VXRM/CSR_VXSAT
RISC-V: define the elements of the VCSR vector CSR
riscv: vector: Use vlenb from DT for thead
riscv: Add thead and xtheadvector as a vendor extension
riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree
dt-bindings: cpus: add a thead vlen register length property
dt-bindings: riscv: Add xtheadvector ISA extension description
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20241113-xtheadvector-v11-0-236c22791ef9@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Follow the patterns of the other architectures that use
GENERIC_CPU_VULNERABILITIES for riscv to introduce the ghostwrite
vulnerability and mitigation. The mitigation is to disable all vector
which is accomplished by clearing the bit from the cpufeature field.
Ghostwrite only affects thead c9xx CPUs that impelment xtheadvector, so
the vulerability will only be mitigated on these CPUs.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Yangyu Chen <cyy@cyyself.name>
Link: https://lore.kernel.org/r/20241113-xtheadvector-v11-14-236c22791ef9@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add a new hwprobe key "RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0" which
allows userspace to probe for the new RISCV_ISA_VENDOR_EXT_XTHEADVECTOR
vendor extension.
This new key will allow userspace code to probe for which thead vendor
extensions are supported. This API is modeled to be consistent with
RISCV_HWPROBE_KEY_IMA_EXT_0. The bitmask returned will have each bit
corresponding to a supported thead vendor extension of the cpumask set.
Just like RISCV_HWPROBE_KEY_IMA_EXT_0, this allows a userspace program
to determine all of the supported thead vendor extensions in one call.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
Tested-by: Yangyu Chen <cyy@cyyself.name>
Link: https://lore.kernel.org/r/20241113-xtheadvector-v11-10-236c22791ef9@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
xtheadvector uses different encodings than standard vector for
vsetvli and vector loads/stores. Write the instruction formats to be
used in assembly code.
Co-developed-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Yangyu Chen <cyy@cyyself.name>
Link: https://lore.kernel.org/r/20241113-xtheadvector-v11-8-236c22791ef9@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The VXRM vector csr for xtheadvector has an encoding of 0xa and VXSAT
has an encoding of 0x9.
Co-developed-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Yangyu Chen <cyy@cyyself.name>
Link: https://lore.kernel.org/r/20241113-xtheadvector-v11-7-236c22791ef9@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The VCSR CSR contains two elements VXRM[2:1] and VXSAT[0].
Define constants for those to access the elements in a readable way.
Acked-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Yangyu Chen <cyy@cyyself.name>
Link: https://lore.kernel.org/r/20241113-xtheadvector-v11-6-236c22791ef9@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
If thead,vlenb is provided in the device tree, prefer that over reading
the vlenb csr.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Yangyu Chen <cyy@cyyself.name>
Link: https://lore.kernel.org/r/20241113-xtheadvector-v11-5-236c22791ef9@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add support to the kernel for THead vendor extensions with the target of
the new extension xtheadvector.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Yangyu Chen <cyy@cyyself.name>
Link: https://lore.kernel.org/r/20241113-xtheadvector-v11-4-236c22791ef9@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The D1/D1s SoCs support xtheadvector so it can be included in the
devicetree. Also include vlenb for the cpu.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Yangyu Chen <cyy@cyyself.name>
Link: https://lore.kernel.org/r/20241113-xtheadvector-v11-3-236c22791ef9@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This trips up with Xtheadvector enabled, but as far as I can tell it's
just been an issue since the original patchset.
Fixes: 7ca7a7b9b6 ("riscv: Add sysctl to set the default vector rule for new processes")
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20250115180251.31444-1-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Before pinctrl driver implemented, the uart0 controller reply on
bootloader for setting correct pin mux and configurations.
Now, let's add pinctrl property to uart0 of Bananapi-F3 board.
Signed-off-by: Yixun Lan <dlan@gentoo.org>
Banana Pi BPI-F3 [1] is a industrial grade RISC-V development board, it
design with SpacemiT K1 8 core RISC-V chip [2].
Currently only support booting into console with only uart enabled,
other features will be added soon later.
Link: https://docs.banana-pi.org/en/BPI-F3/BananaPi_BPI-F3 [1]
Link: https://www.spacemit.com/en/spacemit-key-stone-2/ [2]
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
Acked-by: Jesse Taube <jesse@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Yixun Lan <dlan@gentoo.org>
Banana Pi BPI-F3 motherboard is powered by SpacemiT K1[1].
Key features:
- 4 cores per cluster, 2 clusters on chip
- UART IP is Intel XScale UART
Some key considerations:
- ISA string is inferred from vendor documentation[2]
- Cluster topology is inferred from datasheet[1] and L2 in vendor dts[3]
- No coherent DMA on this board
Inferred by taking vendor ethernet and MMC drivers to the mainline
kernel. Without dma-noncoherent in soc node, the driver fails.
- Add cache nodes
K1 SoC has 128 sets of 32KiB L1 I/D Cache for each hart, and 512 sets
of 512KiB L2 Cache for each cluster.
Currently only support booting into console with only uart, other
features will be added soon later.
Link: https://docs.banana-pi.org/en/BPI-F3/SpacemiT_K1_datasheet [1]
Link: https://developer.spacemit.com/#/documentation?token=BWbGwbx7liGW21kq9lucSA6Vnpb [2]
Link: https://gitee.com/bianbu-linux/linux-6.1/blob/bl-v1.0.y/arch/riscv/boot/dts/spacemit/k1-x.dtsi [3]
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
Acked-by: Jesse Taube <jesse@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Yixun Lan <dlan@gentoo.org>
The first SoC in the SpacemiT series is K1, which contains 8 RISC-V
cores with RISC-V Vector v1.0 support.
Link: https://www.spacemit.com/en/spacemit-key-stone-2/
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Yixun Lan <dlan@gentoo.org>
Not so much RISC-V, but rather StarFive, this time around as there are
only two changes: the Milk-V Mars and Pine64 Star64 boards get their usb0
interfaces moved from peripheral to host mode.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRh246EGq/8RLhDjO14tDGHoIJi0gUCZ4VhEwAKCRB4tDGHoIJi
0t1FAQCSFrIxz189bY1TjiyspNuggR2oNuCfg1j3X/piLDbDnQEAlY8+/v53mxUi
fqn+HjkDWqdR9EnNR1s/sQYTEN3QSwk=
=zXqg
-----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmeJKAIACgkQYKtH/8kJ
UidepxAA4ypf7/f5DTRh7ZlWIzafjQzsryuAMuk25L8A/M7T2mIGfyCnnWpQPjDs
GvrYXtKo3jsAXRSWQqShDF2zZrRua/UGhh5dlwq6yaX7XL8dUwQm2TxB732gBSa/
CtI90BA/RyKbXyxeuFURAPla2P68d8sd0vdufp2FuPU9fXDfAOLlpujL7fusxiV4
f/jJ5LGtf+Q9S2Ba67O+MVORIAfK3zCIL8GavwCturpS2duR9kt0mAPrLCEeP456
1WnnDFgXDkMSHXsnmrvfWlvhuY8u3papqDzHOB+VhOY39zzycG5BukqPE/gQpl37
8WNQtpvr/asGXPOBjjM7aBPGQV7181gm68v49QVFITKiBYyY9UBqALGcr+PN8BJH
Yc6aoe9q7KAQe3nxZJ526WuhJ11I5Ngi7Wkp4U6ku0aHjypp8TSBd95h7wBKhMs0
jeJSPC1eCNwuJvuvmLU2XW0MJStIPSwcXyym+Xbc83E3NbiXRQqFKC0b6PTg6RPC
cKEfdXyDDz2MavW5QWXfszObCGUQunne5vcc0sadb3XW2dbZA8W3Sz9s1G6e1h9z
4IQmyz1KD1oT1DEkZezubn/Hbk0rVgytWo+Z/MNoucZDo6caxyycldb6dcqDy81w
irVdrYeXyLnj4JV13h14pW6zTNLiamCqahcv9YdU5XOEY4cS2js=
=hu/V
-----END PGP SIGNATURE-----
Merge tag 'riscv-dt-for-v6.14' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into soc/dt
~RISC-V~ StarFive Devicetrees for v6.14
Not so much RISC-V, but rather StarFive, this time around as there are
only two changes: the Milk-V Mars and Pine64 Star64 boards get their usb0
interfaces moved from peripheral to host mode.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
* tag 'riscv-dt-for-v6.14' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
riscv: dts: starfive: jh7110-milkv-mars: enable usb0 host function
riscv: dts: starfive: jh7110-pine64-star64: enable usb0 host function
Link: https://lore.kernel.org/r/20250113-kennel-outplayed-21a52a654c36@spud
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
1. Clear LLBCTL if secondary mmu mapping changed.
2. Add hypercall service support for usermode VMM.
This is a really small changeset, because the Chinese New Year
(Spring Festival) is coming. Happy New Year!
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmeFGcIWHGNoZW5odWFj
YWlAa2VybmVsLm9yZwAKCRAChivD8uImern1D/9AZ8M+0nBAaONZaq2qKLC+RaW6
KqvFsR1PUUFzVcQZaHh9OZcx5s4EAH12EaxBH68W0o0ejbTUJp8QXT6cmO9bNFj1
tVaczGACss34kDerrddHisOpimFdaP+ECX4Q43oTc5N7vG6zUu3ijOISnIIxkhHP
RlX/+5Djw0NoaVAkhEj4v+LkY33z5QnDFNI0OjJiHpDepP2vLQ1FD573pLMeqcGs
BVYwZv7DP3SnVajSjRhT/r5qjy9EMjrXRLkIIwyjOUArRaPq/Lfg4CTK85e5MZsR
2GTkdjvh/YArpluRki4FX1cVOwpBbEtC+24/NWB+MPijtnYqMyAoIraZGqJMAzhw
P6W70A15GvBhlhQmvKNai1oXkdZaaT7XDcbFT706Cwhu7LvcNM8kK7VrPc59WLTR
uHO+ehJh0DCpBMC2BKH/8sztGx80u7SB4Ph0ytZCK+uYznTMEiBqRup7E/QLLG+1
EotXv8U4+Bwx/inzMxwi6vR1ZXo0dIDsnvFdSZeA6PC/cSoPzdqCdrXjQT/7HUIu
DNgcsRVL3LFE+A/sDVGb5/w9UPdQfCdO10bu97FkY37ftqp7LvPTlWJvDZJx+Wle
KfErCOM1/ZRQ2knzE7fst58auA3ZFNn3jWRkD/0gJ4X1Fgu63VrYeuc4FL7r8ken
HxKLYOLtD6dOzR5DeA==
=z2VU
-----END PGP SIGNATURE-----
Merge tag 'loongarch-kvm-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.14
1. Clear LLBCTL if secondary mmu mapping changed.
2. Add hypercall service support for usermode VMM.
This is a really small changeset, because the Chinese New Year
(Spring Festival) is coming. Happy New Year!
Atish Patra <atishp@rivosinc.com> says:
Here are two minor improvement/fixes in the PMU event path. The first patch
was part of the series[1]. The 2nd patch was suggested during the series
review.
While the series can only be merged once SBI v3.0 is frozen, these two
patches can be independent of SBI v3.0 and can be merged sooner. Hence, these
two patches are sent as a separate series.
* b4-shazam-merge:
drivers/perf: riscv: Do not allow invalid raw event config
drivers/perf: riscv: Return error for default case
drivers/perf: riscv: Fix Platform firmware event data
Link: https://lore.kernel.org/r/20241212-pmu_event_fixes_v2-v2-0-813e8a4f5962@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Platform firmware event data field is allowed to be 62 bits for
Linux as uppper most two bits are reserved to indicate SBI fw or
platform specific firmware events.
However, the event data field is masked as per the hardware raw
event mask which is not correct.
Fix the platform firmware event data field with proper mask.
Fixes: f0c9363db2 ("perf/riscv-sbi: Add platform specific firmware event handling")
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20241212-pmu_event_fixes_v2-v2-1-813e8a4f5962@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Prior to commit 5d5fc33ce5 ("riscv: Improve exception and system call
latency"), backtrace through exception worked since ra was filled with
ret_from_exception symbol address and the stacktrace code checked 'pc' to
be equal to that symbol. Now that handle_exception uses regular 'call'
instructions, this isn't working anymore and backtrace stops at
handle_exception(). Since there are multiple call site to C code in the
exception handling path, rather than checking multiple potential return
addresses, add a new symbol at the end of exception handling and check pc
to be in that range.
Fixes: 5d5fc33ce5 ("riscv: Improve exception and system call latency")
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20241209155714.1239665-1-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
In sparse vmemmap model, the virtual address of vmemmap is calculated as:
((struct page *)VMEMMAP_START - (phys_ram_base >> PAGE_SHIFT)).
And the struct page's va can be calculated with an offset:
(vmemmap + (pfn)).
However, when initializing struct pages, kernel actually starts from the
first page from the same section that phys_ram_base belongs to. If the
first page's physical address is not (phys_ram_base >> PAGE_SHIFT), then
we get an va below VMEMMAP_START when calculating va for it's struct page.
For example, if phys_ram_base starts from 0x82000000 with pfn 0x82000, the
first page in the same section is actually pfn 0x80000. During
init_unavailable_range(), we will initialize struct page for pfn 0x80000
with virtual address ((struct page *)VMEMMAP_START - 0x2000), which is
below VMEMMAP_START as well as PCI_IO_END.
This commit fixes this bug by introducing a new variable
'vmemmap_start_pfn' which is aligned with memory section size and using
it to calculate vmemmap address instead of phys_ram_base.
Fixes: a11dd49dcb ("riscv: Sparse-Memory/vmemmap out-of-bounds fix")
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20241209122617.53341-1-luxu.kernel@bytedance.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
p->ainsn.api.insn is a pointer to u32, therefore arithmetic operations are
multiplied by four. This is clearly undesirable for this case.
Cast it to (void *) first before any calculation.
Below is a sample before/after. The dumped memory is two kprobe slots, the
first slot has
- c.addiw a0, 0x1c (0x7125)
- ebreak (0x00100073)
and the second slot has:
- c.addiw a0, -4 (0x7135)
- ebreak (0x00100073)
Before this patch:
(gdb) x/16xh 0xff20000000135000
0xff20000000135000: 0x7125 0x0000 0x0000 0x0000 0x7135 0x0010 0x0000 0x0000
0xff20000000135010: 0x0073 0x0010 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
After this patch:
(gdb) x/16xh 0xff20000000125000
0xff20000000125000: 0x7125 0x0073 0x0010 0x0000 0x7135 0x0073 0x0010 0x0000
0xff20000000125010: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
Fixes: b1756750a3 ("riscv: kprobes: Use patch_text_nosync() for insn slots")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20241119111056.2554419-1-namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
die() can be called in exception handler, and therefore cannot sleep.
However, die() takes spinlock_t which can sleep with PREEMPT_RT enabled.
That causes the following warning:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 285, name: mutex
preempt_count: 110001, expected: 0
RCU nest depth: 0, expected: 0
CPU: 0 UID: 0 PID: 285 Comm: mutex Not tainted 6.12.0-rc7-00022-ge19049cf7d56-dirty #234
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
dump_backtrace+0x1c/0x24
show_stack+0x2c/0x38
dump_stack_lvl+0x5a/0x72
dump_stack+0x14/0x1c
__might_resched+0x130/0x13a
rt_spin_lock+0x2a/0x5c
die+0x24/0x112
do_trap_insn_illegal+0xa0/0xea
_new_vmalloc_restore_context_a0+0xcc/0xd8
Oops - illegal instruction [#1]
Switch to use raw_spinlock_t, which does not sleep even with PREEMPT_RT
enabled.
Fixes: 76d2a0493a ("RISC-V: Init and Halt Code")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20241118091333.1185288-1-namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
relocation_head's list_head member, rel_entry, doesn't need to be
allocated, its storage can just be part of the allocated relocation_head.
Remove the pointer which allows to get rid of the allocation as well as
an existing memory leak found by Kai Zhang using kmemleak.
Fixes: 8fd6c51423 ("riscv: Add remaining module relocations")
Reported-by: Kai Zhang <zhangkai@iscas.ac.cn>
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20241128081636.3620468-1-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When building allmodconfig + ThinLTO with certain versions of clang,
arch_set_bit() may not be inlined, resulting in a modpost warning:
WARNING: modpost: vmlinux: section mismatch in reference: arch_set_bit+0x58 (section: .text.arch_set_bit) -> numa_nodes_parsed (section: .init.data)
acpi_numa_rintc_affinity_init() calls arch_set_bit() via __node_set()
with numa_nodes_parsed, which is marked as __initdata. If arch_set_bit()
is not inlined, modpost will flag that it is being called with data that
will be freed after init.
As acpi_numa_rintc_affinity_init() is marked as __init, there is not
actually a functional issue here. However, the bitop functions should be
marked as __always_inline, so that they work consistently for init and
non-init code, which the comment in include/linux/nodemask.h alludes to.
This matches s390 and x86's implementations.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Currently, kvm doesn't delegate the few traps such as misaligned
load/store, illegal instruction and load/store access faults because it
is not expected to occur in the guest very frequently. Thus, kvm gets a
chance to act upon it or collect statistics about it before redirecting
the traps to the guest.
Collect both guest and host visible statistics during the traps.
Enable them so that both guest and host can collect the stats about
them if required.
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20241224-kvm_guest_stat-v2-3-08a77ac36b02@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
SBI PMU specification defines few firmware counters which can be
used by the guests to collect the statstics about various traps
occurred in the host.
Update these counters whenever a corresponding trap is taken
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20241224-kvm_guest_stat-v2-2-08a77ac36b02@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
The M-mode redirects an unhandled instruction access
fault trap back to S-mode when not delegating it to
VS-mode(hedeleg). However, KVM running in HS-mode
terminates the VS-mode software when back from M-mode.
The KVM should redirect the trap back to VS-mode, and
let VS-mode trap handler decide the next step.
Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20241224-kvm_guest_stat-v2-1-08a77ac36b02@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
Implement a KVM SBI SUSP extension handler. The handler only
validates the system suspend entry criteria and prepares for resuming
in the appropriate state at the resume_addr (as specified by the SBI
spec), but then it forwards the call to the VMM where any system
suspend behavior may be implemented. Since VMM support is needed, KVM
disables the extension by default.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20241017074538.18867-5-ajones@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
Commit ba6cfef057 ("riscv: enable Docker requirements in defconfig")
introduced it because of Docker, but Docker has removed this requirement
since [1] (2023-04-19).
For cgroup v1, if turned on, and there's any cgroup in the "cpu" hierarchy it
needs an RT budget assigned, otherwise the processes in it will not be able to
get RT at all. The problem with RT group scheduling is that it requires the
budget assigned but there's no way we could assign a default budget, since the
values to assign are both upper and lower time limits, are absolute, and need to
be sum up to < 1 for each individal cgroup. That means we cannot really come up
with values that would work by default in the general case.[2]
For cgroup v2, it's almost unusable as well. If it turned on, the cpu controller
can only be enabled when all RT processes are in the root cgroup. But it will
lose the benefits of cgroup v2 if all RT process were placed in the same cgroup.
Red Hat, Gentoo, Arch Linux and Debian all disable it. systemd also doesn't
support it.[3]
[1]: 005150ed69
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1229700
[3]: https://github.com/systemd/systemd/issues/13781#issuecomment-549164383
Acked-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
Signed-off-by: Celeste Liu <CoelacanthusHex@gmail.com>
Acked-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20240910-fix-riscv-rt_group_sched-v3-1-486e75e5ae6d@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Fprobe store its data structure address and size on the fgraph return stack
by __fprobe_header. But most 64bit architecture can combine those to
one unsigned long value because 4 MSB in the kernel address are the same.
With this encoding, fprobe can consume less space on ret_stack.
This introduces asm/fprobe.h to define arch dependent encode/decode
macros. Note that since fprobe depends on CONFIG_HAVE_FUNCTION_GRAPH_FREGS,
currently only arm64, loongarch, riscv, s390 and x86 are supported.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/173519005783.391279.5307910947400277525.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Rewrite fprobe implementation on function-graph tracer.
Major API changes are:
- 'nr_maxactive' field is deprecated.
- This depends on CONFIG_DYNAMIC_FTRACE_WITH_ARGS or
!CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS, and
CONFIG_HAVE_FUNCTION_GRAPH_FREGS. So currently works only
on x86_64.
- Currently the entry size is limited in 15 * sizeof(long).
- If there is too many fprobe exit handler set on the same
function, it will fail to probe.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/173519003970.391279.14406792285453830996.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Add CONFIG_HAVE_FTRACE_GRAPH_FUNC kconfig in addition to ftrace_graph_func
macro check. This is for the other feature (e.g. FPROBE) which requires to
access ftrace_regs from fgraph_ops::entryfunc() can avoid compiling if
the fgraph can not pass the valid ftrace_regs.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/173519001472.391279.1174901685282588467.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Add ftrace_partial_regs() which converts the ftrace_regs to pt_regs.
This is for the eBPF which needs this to keep the same pt_regs interface
to access registers.
Thus when replacing the pt_regs with ftrace_regs in fprobes (which is
used by kprobe_multi eBPF event), this will be used.
If the architecture defines its own ftrace_regs, this copies partial
registers to pt_regs and returns it. If not, ftrace_regs is the same as
pt_regs and ftrace_partial_regs() will return ftrace_regs::regs.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Florent Revest <revest@chromium.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Link: https://lore.kernel.org/173518996761.391279.4987911298206448122.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pass ftrace_regs to the fgraph_ops::entryfunc(). If ftrace_regs is not
available, it passes a NULL instead. User callback function can access
some registers (including return address) via this ftrace_regs.
Note that the ftrace_regs can be NULL when the arch does NOT define:
HAVE_DYNAMIC_FTRACE_WITH_ARGS or HAVE_DYNAMIC_FTRACE_WITH_REGS.
More specifically, if HAVE_DYNAMIC_FTRACE_WITH_REGS is defined but
not the HAVE_DYNAMIC_FTRACE_WITH_ARGS, and the ftrace ops used to
register the function callback does not set FTRACE_OPS_FL_SAVE_REGS.
In this case, ftrace_regs can be NULL in user callback.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/173518990044.391279.17406984900626078579.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
- Rework vcpu_get_reg() to return a value instead of using an out-param, and
update all affected arch code accordingly.
- Convert the max_guest_memory_test into a more generic mmu_stress_test.
The basic gist of the "conversion" is to have the test do mprotect() on
guest memory while vCPUs are accessing said memory, e.g. to verify KVM
and mmu_notifiers are working as intended.
- Play nice with treewrite builds of unsupported architectures, e.g. arm
(32-bit), as KVM selftests' Makefile doesn't do anything to ensure the
target architecture is actually one KVM selftests supports.
- Use the kernel's $(ARCH) definition instead of the target triple for arch
specific directories, e.g. arm64 instead of aarch64, mainly so as not to
be different from the rest of the kernel.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmdjehwACgkQOlYIJqCj
N/0+/RAAh3M2IsNCtaiJ7n3COe9DHUuqxherS625J9YEOTCcGrZUd1WoSBvdDCIW
46YKEYdHIpFHOYMEDPg5ODd20/y6lLw1yDKW7xj22cC8Np1TrPbt0q+PqaDdb4RR
UlyObB6OsI/wxUHjsrvg2ZmDwAH9hIzh0kTUKPv7NZdaB+kT49eBV+tILDHtUGTx
m0LUcNUIZBUhUE2YjnGNIoPQg4w+H1bYFK8lM62Rx09HtXbJ93VlMSxBq4ModMDq
v74F6263NpOiou87bXperWT7iAWAWSqjGR+K/cwtnJpg2sosLlgpXYTnuUEiQMOL
DU2c2u3lPRZMAhUVzj6qRr9nVBICfgk/lS5nkCb6khNZ296VK2UzimQF429a8Dhy
ZkgOduNNuGVMBW6/Mc96L9ygo2yXNpGcT72RZ/2C8u/Zt0RwtYMYkzrBZQeqvgXp
wMPxgkwatHLm7VSXzSLVd1c28GXTS8Fdg77HbUbOJNjvigaOeUR8ti82LRIWPn0Y
XRyMpU1fK1wtuSkIEdxF8KWqr9B4ZPMxqFNE+ntyIhfUKvNdz6mRpQj/eYVbVdfn
poXco5iq02aCDo7q1Zqx1rxeNrYUvTdq9UmllilRbYLVOC0M+Rf+jBvqM+VBhUCu
hFSd80Pkh2iR4JuRDCtXQmk/cNfjk1CTgkeDmpm00Hdk98brmlc=
=mFjt
-----END PGP SIGNATURE-----
Merge tag 'kvm-selftests-treewide-6.14' of https://github.com/kvm-x86/linux into HEAD
KVM selftests "tree"-wide changes for 6.14:
- Rework vcpu_get_reg() to return a value instead of using an out-param, and
update all affected arch code accordingly.
- Convert the max_guest_memory_test into a more generic mmu_stress_test.
The basic gist of the "conversion" is to have the test do mprotect() on
guest memory while vCPUs are accessing said memory, e.g. to verify KVM
and mmu_notifiers are working as intended.
- Play nice with treewrite builds of unsupported architectures, e.g. arm
(32-bit), as KVM selftests' Makefile doesn't do anything to ensure the
target architecture is actually one KVM selftests supports.
- Use the kernel's $(ARCH) definition instead of the target triple for arch
specific directories, e.g. arm64 instead of aarch64, mainly so as not to
be different from the rest of the kernel.
Define KVM_REG_SIZE() in the common kvm.h header, and delete the arm64 and
RISC-V versions. As evidenced by the surrounding definitions, all aspects
of the register size encoding are generic, i.e. RISC-V should have moved
arm64's definition to common code instead of copy+pasting.
Acked-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Link: https://lore.kernel.org/r/20241128005547.4077116-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Enable pinctrl and ethernet dwmac driver for the TH1520 SoC boards like
the BeagleV Ahead and the Sipeed LicheePi 4A.
Reviewed-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Drew Fustini <drew@pdp7.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
* Fix confusion with implicitly-shifted MDCR_EL2 masks breaking
SPE/TRBE initialization.
* Align nested page table walker with the intended memory attribute
combining rules of the architecture.
* Prevent userspace from constraining the advertised ASID width,
avoiding horrors of guest TLBIs not matching the intended context in
hardware.
* Don't leak references on LPIs when insertion into the translation
cache fails.
RISC-V:
* Replace csr_write() with csr_set() for HVIEN PMU overflow bit.
x86:
* Cache CPUID.0xD XSTATE offsets+sizes during module init - On Intel's
Emerald Rapids CPUID costs hundreds of cycles and there are a lot of
leaves under 0xD. Getting rid of the CPUIDs during nested VM-Enter and
VM-Exit is planned for the next release, for now just cache them: even
on Skylake that is 40% faster.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmdcibgUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOQsgf+NwNdfNQ0V5vU7YNeVxyhkCyYvNiA
njvBTd1Lwh7EDtJ2NLKzwHktH2ymQI8qykxKr/qY3Jxkow+vcvsK0LacAaJdIzGo
jnMGxXxRCFpxdkNb1kDJk4Cd6GSSAxYwgPj3wj7whsMcVRjPlFcjuHf02bRUU0Gt
yulzBOZJ/7QTquKSnwt1kZQ1i/mJ8wCh4vJArZqtcImrDSK7oh+BaQ44h+lNe8qa
Xiw6Fw3tYXgHy5WlnUU/OyFs+bZbcVzPM75qYgdGIWSo0TdL69BeIw8S4K2Ri4eL
EoEBigwAd8PiF16Q1wO4gXWcNwinMTs3LIftxYpENTHA5gnrS5hgWWDqHw==
=4v2y
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM64:
- Fix confusion with implicitly-shifted MDCR_EL2 masks breaking
SPE/TRBE initialization
- Align nested page table walker with the intended memory attribute
combining rules of the architecture
- Prevent userspace from constraining the advertised ASID width,
avoiding horrors of guest TLBIs not matching the intended context
in hardware
- Don't leak references on LPIs when insertion into the translation
cache fails
RISC-V:
- Replace csr_write() with csr_set() for HVIEN PMU overflow bit
x86:
- Cache CPUID.0xD XSTATE offsets+sizes during module init
On Intel's Emerald Rapids CPUID costs hundreds of cycles and there
are a lot of leaves under 0xD. Getting rid of the CPUIDs during
nested VM-Enter and VM-Exit is planned for the next release, for
now just cache them: even on Skylake that is 40% faster"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Cache CPUID.0xD XSTATE offsets+sizes during module init
RISC-V: KVM: Fix csr_write -> csr_set for HVIEN PMU overflow bit
KVM: arm64: vgic-its: Add error handling in vgic_its_cache_translation
KVM: arm64: Do not allow ID_AA64MMFR0_EL1.ASIDbits to be overridden
KVM: arm64: Fix S1/S2 combination when FWB==1 and S2 has Device memory type
arm64: Fix usage of new shifted MDCR_EL2 values
Add mailbox device tree node. This work is based on the vendor kernel [1].
Link: https://github.com/revyos/thead-kernel.git [1]
Signed-off-by: Michal Wilczynski <m.wilczynski@samsung.com>
Reviewed-by: Drew Fustini <dfustini@tenstorrent.com>
Signed-off-by: Drew Fustini <dfustini@tenstorrent.com>
riscv uses fixmap addresses to map the dtb so we can't use __pa() which
is reserved for linear mapping addresses.
Fixes: b2473a3597 ("of/fdt: add dt_phys arg to early_init_dt_scan and early_init_dt_verify")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20241209074508.53037-1-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When CONFIG_DEBUG_RT_MUTEXES=y, mutex_lock->rt_mutex_try_acquire
would change from rt_mutex_cmpxchg_acquire to
rt_mutex_slowtrylock():
raw_spin_lock_irqsave(&lock->wait_lock, flags);
ret = __rt_mutex_slowtrylock(lock);
raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
Because queued_spin_#ops to ticket_#ops is changed one by one by
jump_label, raw_spin_lock/unlock would cause a deadlock during the
changing.
That means in arch/riscv/kernel/jump_label.c:
1.
arch_jump_label_transform_queue() ->
mutex_lock(&text_mutex); +-> raw_spin_lock -> queued_spin_lock
|-> raw_spin_unlock -> queued_spin_unlock
patch_insn_write -> change the raw_spin_lock to ticket_lock
mutex_unlock(&text_mutex);
...
2. /* Dirty the lock value */
arch_jump_label_transform_queue() ->
mutex_lock(&text_mutex); +-> raw_spin_lock -> *ticket_lock*
|-> raw_spin_unlock -> *queued_spin_unlock*
/* BUG: ticket_lock with queued_spin_unlock */
patch_insn_write -> change the raw_spin_unlock to ticket_unlock
mutex_unlock(&text_mutex);
...
3. /* Dead lock */
arch_jump_label_transform_queue() ->
mutex_lock(&text_mutex); +-> raw_spin_lock -> ticket_lock /* deadlock! */
|-> raw_spin_unlock -> ticket_unlock
patch_insn_write -> change other raw_spin_#op -> ticket_#op
mutex_unlock(&text_mutex);
So, the solution is to disable mutex usage of
arch_jump_label_transform_queue() during early_boot_irqs_disabled, just
like we have done for stop_machine.
Reported-by: Conor Dooley <conor@kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Fixes: ab83647fad ("riscv: Add qspinlock support")
Link: https://lore.kernel.org/linux-riscv/CAJF2gTQwYTGinBmCSgVUoPv0_q4EPt_+WiyfUA1HViAKgUzxAg@mail.gmail.com/T/#mf488e6347817fca03bb93a7d34df33d8615b3775
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241130153310.3349484-1-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Consolidate the machine_kexec_mask_interrupts implementation into a common
function located in a new file: kernel/irq/kexec.c. This removes duplicate
implementations from architecture-specific files in arch/arm, arch/arm64,
arch/powerpc, and arch/riscv, reducing code duplication and improving
maintainability.
The new implementation retains architecture-specific behavior for
CONFIG_GENERIC_IRQ_KEXEC_CLEAR_VM_FORWARD, which was previously implemented
for ARM64. When enabled (currently for ARM64), it clears the active state
of interrupts forwarded to virtual machines (VMs) before handling other
interrupt masking operations.
Signed-off-by: Eliav Farber <farbere@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20241204142003.32859-2-farbere@amazon.com
Remove redundant release/acquire barriers, optimizing the lr/sc sequence
to provide conditional RCsc synchronization, per the RVWMO.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20241113183321.491113-1-dave@stgolabs.net
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Enable pinctrl and ethernet dwmac driver for the TH1520 SoC boards like
the BeagleV Ahead and the Sipeed LicheePi 4A.
Signed-off-by: Drew Fustini <drew@pdp7.com>
Reviewed-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Link: https://lore.kernel.org/r/20241113184333.829716-1-drew@pdp7.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This doesn't cause a problem currently as HVIEN isn't used elsewhere
yet. Found by inspection.
Signed-off-by: Michael Neuling <michaelneuling@tenstorrent.com>
Fixes: 16b0bde9a3 ("RISC-V: KVM: Add perf sampling support for guests")
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20241127041840.419940-1-michaelneuling@tenstorrent.com
Signed-off-by: Anup Patel <anup@brainfault.org>
Milk-V Mars board routes one of four USB-A ports to USB0 on the SoC
rather than to the VL805 USB 3.0 <-> PCIe chip.
Set JH7110 on-chip USB host mode and vbus pin assignment accordingly.
Reviewed-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: E Shattow <e@freeshell.de>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Pine64 Star64 board routes all four USB-A ports to USB0 on the SoC.
Set JH7110 on-chip USB host mode and vbus pin assignment accordingly.
Signed-off-by: E Shattow <e@freeshell.de>
Reviewed-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Make the CRC32 library export a function crc32_optimizations() which
returns flags that indicate which CRC32 functions are actually executing
optimized code at runtime.
This will be used to determine whether the crc32[c]-$arch shash
algorithms should be registered in the crypto API. btrfs could also
start using these flags instead of the hack that it currently uses where
it parses the crypto_shash_driver_name.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20241202010844.144356-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Currently the CRC32 library functions are defined as weak symbols, and
the arm64 and riscv architectures override them.
This method of arch-specific overrides has the limitation that it only
works when both the base and arch code is built-in. Also, it makes the
arch-specific code be silently not used if it is accidentally built with
lib-y instead of obj-y; unfortunately the RISC-V code does this.
This commit reorganizes the code to have explicit *_arch() functions
that are called when they are enabled, similar to how some of the crypto
library code works (e.g. chacha_crypt() calls chacha_crypt_arch()).
Make the existing kconfig choice for the CRC32 implementation also
control whether the arch-optimized implementation (if one is available)
is enabled or not. Make it enabled by default if CRC32 is also enabled.
The result is that arch-optimized CRC32 library functions will be
included automatically when appropriate, but it is now possible to
disable them. They can also now be built as a loadable module if the
CRC32 library functions happen to be used only by loadable modules, in
which case the arch and base CRC32 modules will be automatically loaded
via direct symbol dependency when appropriate.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20241202010844.144356-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>