The DMA map functions can fail and should be tested for errors.
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250620075602.12575-1-fourier.thomas@gmail.com
If we always align the vmemmap start to PAGE_SIZE, there is a
chance that we may end up allocating page-sized vmemmap backing
pages in RAM in the altmap not present case, because a PAGE_SIZE
aligned address is not PMD_SIZE-aligned.
In this patch, we are aligning the vmemmap start address to
PMD_SIZE if altmap is not present. This ensures that a PMD_SIZE
page is always allocated for the vmemmap mapping if altmap is
not present.
If altmap is present, Make sure we align the start vmemmap addr to
PAGE_SIZE so that we calculate the correct start_pfn in altmap
boundary check to decide whether we should use altmap or RAM based
backing memory allocation. Also the address need to be aligned for
set_pte operation. If the start addr is already PMD_SIZE aligned
and with in the altmap boundary then we will try to use a pmd size
altmap mapping else we go for page size mapping.
So if altmap is present, we try to use the maximum number of
altmap pages; otherwise, we allocate a PMD_SIZE RAM page.
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/895c4afd912c85d344a2065e348fac90529ed48f.1750593372.git.donettom@linux.ibm.com
Error conditions are not handled properly if altmap is not present
and PMD_SIZE vmemmap_alloc_block_buf fails.
In this patch, if vmemmap_alloc_block_buf fails in the non-altmap
case, we will fall back to the base mapping.
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/7f95fe91c827a2fb76367a58dbea724e811fb152.1750593372.git.donettom@linux.ibm.com
IO hotplug add event is handled in the user space with drmgr tool.
After the device is enabled, the user space uses /sys/kernel/dlpar
interface with “dt add index <drc_index>” to update the device tree.
The kernel interface (dlpar_hp_dt_add()) finds the parent node for
the specified ‘drc_index’ from ibm,drc-info property. The recent FW
provides this property from 2017 onwards. But KVM guest code in
some releases is still using the older SLOF firmware which has
ibm,drc-indexes property instead of ibm,drc-info.
If the ibm,drc-info is not available, this patch adds changes to
search ‘drc_index’ from the indexes array in ibm,drc-indexes
property to support old FW.
Fixes: 02b98ff44a ("powerpc/pseries/dlpar: Add device tree nodes for DLPAR IO add")
Reported-by: Kowshik Jois <kowsjois@linux.ibm.com>
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Amit Machhiwal <amachhiw@linux.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250531235002.239213-1-haren@linux.ibm.com
The macro kvm_trace_symbol_exit is used for providing the mappings
for the trap vectors and their names. Add mapping for H_VIRT so that
trap reason is displayed as string instead of a vector number when using
the kvm_guest_exit tracepoint.
Signed-off-by: Gautam Menghani <gautam@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250516121225.276466-1-gautam@linux.ibm.com
use guard(mutex) for scope based resource management of mutex.
This would make the code simpler and easier to maintain.
More details on lock guards can be found at
https://lore.kernel.org/all/20230612093537.614161713@infradead.org/T/#u
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250505075333.184463-6-sshegde@linux.ibm.com
use lock guards for scope based resource management of mutex.
This would make the code simpler and easier to maintain.
More details on lock guards can be found at
https://lore.kernel.org/all/20230612093537.614161713@infradead.org/T/#u
This shows the use of both guard and scoped_guard
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250505075333.184463-5-sshegde@linux.ibm.com
use scoped_guard for scope based resource management of mutex.
This would make the code simpler and easier to maintain.
More details on lock guards can be found at
https://lore.kernel.org/all/20230612093537.614161713@infradead.org/T/#u
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250505075333.184463-4-sshegde@linux.ibm.com
The kernel uses 3100 to indicate ISA version 3.1, not 3010, so
fix the Microwatt device tree to use 3100.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/aB6taMDWvJwOl9xj@bruin
Commit 030bdc3fd0 ("powerpc/defconfigs: Set HZ=100 on pseries and ppc64
defconfigs") lowered CONFIG_HZ from 250 to 100, citing reduced need for a
higher tick rate due to high-resolution timers and concerns about timer
interrupt overhead and cascading effects in the timer wheel.
However, improvements have been made to the timer wheel algorithm since
then, particularly in eliminating cascading effects at the cost of minor
timekeeping inaccuracies. More details are available here
https://lwn.net/Articles/646950/. This removes the original concern about
cascading, and the reliance on high-resolution timers is not applicable
to the scheduler, which still depends on periodic ticks set by CONFIG_HZ.
With the introduction of the EEVDF scheduler, users can specify custom
slices for workloads. The default base_slice is 3ms, but with CONFIG_HZ=100
(10ms tick interval), base_slice is ineffective. Workloads like stress-ng
that do not voluntarily yield the CPU run for ~10ms before switching out.
Additionally, setting a custom slice below 3ms (e.g., 2ms) should lower
task latency, but this effect is lost due to the coarse 10ms tick.
By increasing CONFIG_HZ to 1000 (1ms tick), base_slice is properly honored,
and user-defined slices work as expected. Benchmark results support this
change:
Latency improvements in schbench with EEVDF under stress-ng-induced noise:
Scheduler CONFIG_HZ Custom Slice 99th Percentile Latency (µs)
--------------------------------------------------------------------
EEVDF 1000 No 0.30x
EEVDF 1000 2 ms 0.29x
EEVDF (default) 100 No 1.00x
Switching to HZ=1000 reduces the 99th percentile latency in schbench by
~70%. This improvement occurs because, with HZ=1000, stress-ng tasks run
for ~3ms before yielding, compared to ~10ms with HZ=100. As a result,
schbench gets CPU time sooner, reducing its latency.
Daytrader Performance:
Daytrader results show minor variation within standard deviation,
indicating no significant regression.
Workload (Users/Instances) Throughput 1000HZ vs 100HZ (Std Dev%)
--------------------------------------------------------------------------
30 u, 1 i +3.01% (1.62%)
60 u, 1 i +1.46% (2.69%)
90 u, 1 i –1.33% (3.09%)
30 u, 2 i -1.20% (1.71%)
30 u, 3 i –0.07% (1.33%)
Avg. Response Time: No Change (=)
pgbench select queries:
Metric 1000HZ vs 100HZ (Std Dev%)
------------------------------------------------------------------
Average TPS Change +2.16% (1.27%)
Average Latency Change –2.21% (1.21%)
Average TPS: Higher the better
Average Latency: Lower the better
pgbench shows both throughput and latency improvements beyond standard
deviation.
Given these results and the improvements in timer wheel implementation,
increasing CONFIG_HZ to 1000 ensures that powerpc benefits from EEVDF’s
base_slice and allows fine-tuned scheduling for latency-sensitive
workloads.
Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
Reviewed-by: Mukesh Kumar Chaurasiya <mchauras@linux.ibm.com>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250330074734.16679-1-vineethr@linux.ibm.com
This adds all symbols required for use case like
livepatching. Distros already enable this config
and enabling this increases build time by 3%
(in a power9 128 cpu setup) and almost no size
changes for vmlinux.
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250116073419.344453-1-maddy@linux.ibm.com
Without this, the compiler (clang21) might emit a warning under W=1
because the variable ori31_emitted is set but never used if
CONFIG_PPC_BOOK3S_64=n.
Without this patch:
$ make -j $(nproc) W=1 ARCH=powerpc SHELL=/bin/bash arch/powerpc/net
[...]
CC arch/powerpc/net/bpf_jit_comp.o
CC arch/powerpc/net/bpf_jit_comp64.o
../arch/powerpc/net/bpf_jit_comp64.c: In function 'bpf_jit_build_body':
../arch/powerpc/net/bpf_jit_comp64.c:417:28: warning: variable 'ori31_emitted' set but not used [-Wunused-but-set-variable]
417 | bool sync_emitted, ori31_emitted;
| ^~~~~~~~~~~~~
AR arch/powerpc/net/built-in.a
With this patch:
[...]
CC arch/powerpc/net/bpf_jit_comp.o
CC arch/powerpc/net/bpf_jit_comp64.o
AR arch/powerpc/net/built-in.a
Fixes: dff883d9e9 ("bpf, arm64, powerpc: Change nospec to include v1 barrier")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506180402.uUXwVoSH-lkp@intel.com/
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/20250619142647.2157017-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
uart_port::{serial_in,serial_out} (and plat_serial8250_port::* likewise)
historically use:
* 'unsigned int' for 32-bit register values in reads and writes, and
* 'int' for offsets.
Make them sane such that:
* 'u32' is used for register values, and
* 'unsigned int' is used for offsets.
While at it, name hooks' parameters, so it is clear what is what.
Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Paul Cercueil <paul@crapouillou.net>
Cc: Vladimir Zapolskiy <vz@mleia.com>
Cc: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20250611100319.186924-9-jirislaby@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
All these includes must have been cut & pasted. The code does not use
any tty or vt functionality at all.
Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20250611100319.186924-6-jirislaby@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It makes the code easier to read as casts are not needed.
Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/20250611100319.186924-4-jirislaby@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Caching the port and info in local variables makes the code more compact
and easier to understand.
Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20250611100319.186924-3-jirislaby@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
- Fix to handle VDSO32 with pcrel
- Couple of dts fixes in microwatt and mpc8315erdb
- Fix to handle PE bridge reconfiguration in VFIO EEH recovery path
- Fix ioctl macros related to struct termio
Thanks to: Christophe Leroy, Ganesh Goudar, J. Neuschäfer, Justin M. Forbes, Michael Ellerman, Narayana Murty N, Tulio Magno, Vaibhav Jain
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEqX2DNAOgU8sBX3pRpnEsdPSHZJQFAmhP5NwACgkQpnEsdPSH
ZJTOmxAAnDlz5yRK9IsFBeLBaQisuDJ0SuVa6QwzjtPcR7a5MqrezEz6Pple/S5i
cH+L2/N18CJ1IgG3QM4Grx0I6AeatVKaEhGeSvbX9xESpoY6x28RO09d8hHckOwl
4WvYE9QfQASJL1doatZI/1hVOV/yY/BwqYqQvucW5wcyR4RQzg7wT4Uz6AA71awW
kN9hNYrWbmG/CiLzF213n82n7wAOEB7ZMhZLveSKIl8TfkAuJT2TT8hOTez4cRmZ
bn/sM07Liey0spgYycBUx2640sglIiXYxaeRHirK+4xc0XRo3LGsBpeprMNaPTAR
g4kOCBOj7Bo8KSxZNwRmP8oPC3Lwh+50Hx7zLM9qGxguIa/pcXUZ2cTj66wmnD/7
4+/OFGBT2uirfkjJicuWxJTDHsokEiwapfIK1LH0gW/HZCAAmTGU5u7lvwknWsD6
ZPp9tcTu8bcwtUIrQEzMR+UF5IKpyhVMkQBS29DgK43PcEmneMRBfBOrmPmeUIT3
U+BJDM/0cnk7KBlQrB6EeZXB1hU2d1Lgylu/v82H8ZwpB/BoKY6AhhZCvKuJHKtJ
8D2P11XKOD8VWzgddUANI/AvE5eKLsljWvJdge7oaMJ5D70aSAkwnd24CNRFdArC
/uBoTvpTr4RFLGK7icvTXdQGs7UHNB/oR4MO33iBzGYY8QfpOWE=
=Tpj+
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Madhavan Srinivasan:
- Fix to handle VDSO32 with pcrel
- Couple of dts fixes in microwatt and mpc8315erdb
- Fix to handle PE bridge reconfiguration in VFIO EEH recovery path
- Fix ioctl macros related to struct termio
Thanks to Christophe Leroy, Ganesh Goudar, J. Neuschäfer, Justin M.
Forbes, Michael Ellerman, Narayana Murty N, Tulio Magno, and Vaibhav
Jain
* tag 'powerpc-6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Fix struct termio related ioctl macros
powerpc: dts: mpc8315erdb: Add GPIO controller node
powerpc/microwatt: Fix model property in device tree
powerpc/eeh: Fix missing PE bridge reconfiguration during VFIO EEH recovery
powerpc/vdso: Fix build of VDSO32 with pcrel
Since termio interface is now obsolete, include/uapi/asm/ioctls.h
has some constant macros referring to "struct termio", this caused
build failure at userspace.
In file included from /usr/include/asm/ioctl.h:12,
from /usr/include/asm/ioctls.h:5,
from tst-ioctls.c:3:
tst-ioctls.c: In function 'get_TCGETA':
tst-ioctls.c:12:10: error: invalid application of 'sizeof' to incomplete type 'struct termio'
12 | return TCGETA;
| ^~~~~~
Even though termios.h provides "struct termio", trying to juggle definitions around to
make it compile could introduce regressions. So better to open code it.
Reported-by: Tulio Magno <tuliom@ascii.art.br>
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Justin M. Forbes <jforbes@fedoraproject.org>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au>
Closes: https://lore.kernel.org/linuxppc-dev/8734dji5wl.fsf@ascii.art.br/
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250517142237.156665-1-maddy@linux.ibm.com
The MPC8315E SoC and variants have a GPIO controller at IMMR + 0xc00.
This node was previously missing from the device tree.
Signed-off-by: J. Neuschäfer <j.ne@posteo.net>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250611-mpc-gpio-v1-1-02d1f75336e2@posteo.net
VFIO EEH recovery for PCI passthrough devices fails on PowerNV and pseries
platforms due to missing host-side PE bridge reconfiguration. In the
current implementation, eeh_pe_configure() only performs RTAS or OPAL-based
bridge reconfiguration for native host devices, but skips it entirely for
PEs managed through VFIO in guest passthrough scenarios.
This leads to incomplete EEH recovery when a PCI error affects a
passthrough device assigned to a QEMU/KVM guest. Although VFIO triggers the
EEH recovery flow through VFIO_EEH_PE_ENABLE ioctl, the platform-specific
bridge reconfiguration step is silently bypassed. As a result, the PE's
config space is not fully restored, causing subsequent config space access
failures or EEH freeze-on-access errors inside the guest.
This patch fixes the issue by ensuring that eeh_pe_configure() always
invokes the platform's configure_bridge() callback (e.g.,
pseries_eeh_phb_configure_bridge) even for VFIO-managed PEs. This ensures
that RTAS or OPAL calls to reconfigure the PE bridge are correctly issued
on the host side, restoring the PE's configuration space after an EEH
event.
This fix is essential for reliable EEH recovery in QEMU/KVM guests using
VFIO PCI passthrough on PowerNV and pseries systems.
Tested with:
- QEMU/KVM guest using VFIO passthrough (IBM Power9,(lpar)Power11 host)
- Injected EEH errors with pseries EEH errinjct tool on host, recovery
verified on qemu guest.
- Verified successful config space access and CAP_EXP DevCtl restoration
after recovery
Fixes: 212d16cdca ("powerpc/eeh: EEH support for VFIO PCI device")
Signed-off-by: Narayana Murty N <nnmlinux@linux.ibm.com>
Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250508062928.146043-1-nnmlinux@linux.ibm.com
Building vdso32 on power10 with pcrel leads to following errors:
VDSO32A arch/powerpc/kernel/vdso/gettimeofday-32.o
arch/powerpc/kernel/vdso/gettimeofday.S: Assembler messages:
arch/powerpc/kernel/vdso/gettimeofday.S:40: Error: syntax error; found `@', expected `,'
arch/powerpc/kernel/vdso/gettimeofday.S:71: Info: macro invoked from here
arch/powerpc/kernel/vdso/gettimeofday.S:40: Error: junk at end of line: `@notoc'
arch/powerpc/kernel/vdso/gettimeofday.S:71: Info: macro invoked from here
...
make[2]: *** [arch/powerpc/kernel/vdso/Makefile:85: arch/powerpc/kernel/vdso/gettimeofday-32.o] Error 1
make[1]: *** [arch/powerpc/Makefile:388: vdso_prepare] Error 2
Once the above is fixed, the following happens:
VDSO32C arch/powerpc/kernel/vdso/vgettimeofday-32.o
cc1: error: '-mpcrel' requires '-mcmodel=medium'
make[2]: *** [arch/powerpc/kernel/vdso/Makefile:89: arch/powerpc/kernel/vdso/vgettimeofday-32.o] Error 1
make[1]: *** [arch/powerpc/Makefile:388: vdso_prepare] Error 2
make: *** [Makefile:251: __sub-make] Error 2
Make sure pcrel version of CFUNC() macro is used only for powerpc64
builds and remove -mpcrel for powerpc32 builds.
Fixes: 7e3a68be42 ("powerpc/64: vmlinux support building with PCREL addresing")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/1fa3453f07d42a50a70114da9905bf7b73304fca.1747073669.git.christophe.leroy@csgroup.eu
Make pte_swp_exclusive return bool instead of int. This will better
reflect how pte_swp_exclusive is actually used in the code.
This fixes swap/swapoff problems on Alpha due pte_swp_exclusive not
returning correct values when _PAGE_SWP_EXCLUSIVE bit resides in upper
32-bits of PTE (like on alpha).
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Magnus Lindholm <linmag7@gmail.com>
Cc: Sam James <sam@gentoo.org>
Link: https://lore.kernel.org/lkml/20250218175735.19882-2-linmag7@gmail.com/
Link: https://lore.kernel.org/lkml/20250602041118.GA2675383@ZenIV/
[ Applied as the 'sed' script Al suggested - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This changes the semantics of BPF_NOSPEC (previously a v4-only barrier)
to always emit a speculation barrier that works against both Spectre v1
AND v4. If mitigation is not needed on an architecture, the backend
should set bpf_jit_bypass_spec_v4/v1().
As of now, this commit only has the user-visible implication that unpriv
BPF's performance on PowerPC is reduced. This is the case because we
have to emit additional v1 barrier instructions for BPF_NOSPEC now.
This commit is required for a future commit to allow us to rely on
BPF_NOSPEC for Spectre v1 mitigation. As of this commit, the feature
that nospec acts as a v1 barrier is unused.
Commit f5e81d1117 ("bpf: Introduce BPF nospec instruction for
mitigating Spectre v4") noted that mitigation instructions for v1 and v4
might be different on some archs. While this would potentially offer
improved performance on PowerPC, it was dismissed after the following
considerations:
* Only having one barrier simplifies the verifier and allows us to
easily rely on v4-induced barriers for reducing the complexity of
v1-induced speculative path verification.
* For the architectures that implemented BPF_NOSPEC, only PowerPC has
distinct instructions for v1 and v4. Even there, some insns may be
shared between the barriers for v1 and v4 (e.g., 'ori 31,31,0' and
'sync'). If this is still found to impact performance in an
unacceptable way, BPF_NOSPEC can be split into BPF_NOSPEC_V1 and
BPF_NOSPEC_V4 later. As an optimization, we can already skip v1/v4
insns from being emitted for PowerPC with this setup if
bypass_spec_v1/v4 is set.
Vulnerability-status for BPF_NOSPEC-based Spectre mitigations (v4 as of
this commit, v1 in the future) is therefore:
* x86 (32-bit and 64-bit), ARM64, and PowerPC (64-bit): Mitigated - This
patch implements BPF_NOSPEC for these architectures. The previous
v4-only version was supported since commit f5e81d1117 ("bpf:
Introduce BPF nospec instruction for mitigating Spectre v4") and
commit b7540d6250 ("powerpc/bpf: Emit stf barrier instruction
sequences for BPF_NOSPEC").
* LoongArch: Not Vulnerable - Commit a6f6a95f25 ("LoongArch, bpf: Fix
jit to skip speculation barrier opcode") is the only other past commit
related to BPF_NOSPEC and indicates that the insn is not required
there.
* MIPS: Vulnerable (if unprivileged BPF is enabled) -
Commit a6f6a95f2580 ("LoongArch, bpf: Fix jit to skip speculation
barrier opcode") indicates that it is not vulnerable, but this
contradicts the kernel and Debian documentation. Therefore, I assume
that there exist vulnerable MIPS CPUs (but maybe not from Loongson?).
In the future, BPF_NOSPEC could be implemented for MIPS based on the
GCC speculation_barrier [1]. For now, we rely on unprivileged BPF
being disabled by default.
* Other: Unknown - To the best of my knowledge there is no definitive
information available that indicates that any other arch is
vulnerable. They are therefore left untouched (BPF_NOSPEC is not
implemented, but bypass_spec_v1/v4 is also not set).
I did the following testing to ensure the insn encoding is correct:
* ARM64:
* 'dsb nsh; isb' was successfully tested with the BPF CI in [2]
* 'sb' locally using QEMU v7.2.15 -cpu max (emitted sb insn is
executed for example with './test_progs -t verifier_array_access')
* PowerPC: The following configs were tested locally with ppc64le QEMU
v8.2 '-machine pseries -cpu POWER9':
* STF_BARRIER_EIEIO + CONFIG_PPC_BOOK32_64
* STF_BARRIER_SYNC_ORI (forced on) + CONFIG_PPC_BOOK32_64
* STF_BARRIER_FALLBACK (forced on) + CONFIG_PPC_BOOK32_64
* CONFIG_PPC_E500 (forced on) + STF_BARRIER_EIEIO
* CONFIG_PPC_E500 (forced on) + STF_BARRIER_SYNC_ORI (forced on)
* CONFIG_PPC_E500 (forced on) + STF_BARRIER_FALLBACK (forced on)
* CONFIG_PPC_E500 (forced on) + STF_BARRIER_NONE (forced on)
Most of those cobinations should not occur in practice, but I was not
able to get an PPC e6500 rootfs (for testing PPC_E500 without forcing
it on). In any case, this should ensure that there are no unexpected
conflicts between the insns when combined like this. Individual v1/v4
barriers were already emitted elsewhere.
Hari's ack is for the PowerPC changes only.
[1] https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=29b74545531f6afbee9fc38c267524326dbfbedf
("MIPS: Add speculation_barrier support")
[2] https://github.com/kernel-patches/bpf/pull/8576
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Cc: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603211703.337860-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
JITs can set bpf_jit_bypass_spec_v1/v4() if they want the verifier to
skip analysis/patching for the respective vulnerability. For v4, this
will reduce the number of barriers the verifier inserts. For v1, it
allows more programs to be accepted.
The primary motivation for this is to not regress unpriv BPF's
performance on ARM64 in a future commit where BPF_NOSPEC is also used
against Spectre v1.
This has the user-visible change that v1-induced rejections on
non-vulnerable PowerPC CPUs are avoided.
For now, this does not change the semantics of BPF_NOSPEC. It is still a
v4-only barrier and must not be implemented if bypass_spec_v4 is always
true for the arch. Changing it to a v1 AND v4-barrier is done in a
future commit.
As an alternative to bypass_spec_v1/v4, one could introduce NOSPEC_V1
AND NOSPEC_V4 instructions and allow backends to skip their lowering as
suggested by commit f5e81d1117 ("bpf: Introduce BPF nospec instruction
for mitigating Spectre v4"). Adding bpf_jit_bypass_spec_v1/v4() was
found to be preferable for the following reason:
* bypass_spec_v1/v4 benefits non-vulnerable CPUs: Always performing the
same analysis (not taking into account whether the current CPU is
vulnerable), needlessly restricts users of CPUs that are not
vulnerable. The only use case for this would be portability-testing,
but this can later be added easily when needed by allowing users to
force bypass_spec_v1/v4 to false.
* Portability is still acceptable: Directly disabling the analysis
instead of skipping the lowering of BPF_NOSPEC(_V1/V4) might allow
programs on non-vulnerable CPUs to be accepted while the program will
be rejected on vulnerable CPUs. With the fallback to speculation
barriers for Spectre v1 implemented in a future commit, this will only
affect programs that do variable stack-accesses or are very complex.
For PowerPC, the SEC_FTR checking in bpf_jit_bypass_spec_v4() is based
on the check that was previously located in the BPF_NOSPEC case.
For LoongArch, it would likely be safe to set both
bpf_jit_bypass_spec_v1() and _v4() according to
commit a6f6a95f2580 ("LoongArch, bpf: Fix jit to skip speculation
barrier opcode"). This is omitted here as I am unable to do any testing
for LoongArch.
Hari's ack concerns the PowerPC part only.
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Cc: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603211318.337474-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The user space calls mmap() to map VAS window paste address
and the kernel returns the complete mapped page for each
window. So return -EINVAL if non-zero is passed for offset
parameter to mmap().
See Documentation/arch/powerpc/vas-api.rst for mmap()
restrictions.
Co-developed-by: Jonathan Greental <yonatan02greental@gmail.com>
Signed-off-by: Jonathan Greental <yonatan02greental@gmail.com>
Reported-by: Jonathan Greental <yonatan02greental@gmail.com>
Fixes: dda44eb29c ("powerpc/vas: Add VAS user space API")
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250610021227.361980-2-maddy@linux.ibm.com
memtrace mmap issue has an out of bounds issue. This patch fixes the by
checking that the requested mapping region size should stay within the
allocated region size.
Reported-by: Jonathan Greental <yonatan02greental@gmail.com>
Fixes: 08a022ad3d ("powerpc/powernv/memtrace: Allow mmaping trace buffers")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250610021227.361980-1-maddy@linux.ibm.com
Move this API to the canonical timer_*() namespace.
[ tglx: Redone against pre rc1 ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
- Add support for the EXPORT_SYMBOL_GPL_FOR_MODULES() macro, which exports a
symbol only to specified modules
- Improve ABI handling in gendwarfksyms
- Forcibly link lib-y objects to vmlinux even if CONFIG_MODULES=n
- Add checkers for redundant or missing <linux/export.h> inclusion
- Deprecate the extra-y syntax
- Fix a genksyms bug when including enum constants from *.symref files
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmhEZc4VHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGVAgQAKLRdBGga1kBJJFIkUOHWC5+g/je
U/dO5rGnuOLviWDexC6QT8AQV2N+dQXhB11x+KacSu1bwowsEvwuegtA6VqwbETs
tyWmB0PftEzVyPfc+Rjfy0LDfKkiKkm4RhXiMwcem/rlw45gvJXrVU7jJin9fI3A
So8glpOAX+mEizUHkjZkS51nkYCZFDsn7hVo0X43vqjeFrrFGLEQ5xas4Ci+dkY3
9g8Q5bFL8CC5PHjSO8wFftCcAWwTukAht6CSSb522MKGnCVZ9RxTmRwEPXrBmXtS
5eWa8yg6y0tFVmot8iwZGBYleAWDNsj0a2j2oVjUN+EF91sk3WQApJVNBok/nQFb
4MgO3N3UXZdy4tYkBX8tMgOcGkfjZAFoNxSUm5oVouh9NyT0dpqYHhJHBNVbVJoF
igQWeVOYcioDjeU1iXnP2cw64q44ROfxmOpDxOSRz9PTM6CCya1R0m/zzBLV6Lwk
rzlXk1LLf+jIfgmS5RLlkCgrXS1U0vNGXxQH9Ui9dZSEtzdU7qt5WQ/Rz44bEBhS
OeIlJfMMx6QYJztJc/BaUjkKsutTkII52QctRbRCj/nKswHd8SnHV+xk1c2WPxrg
yKq10rPpdg1BcvmODY6cmcndt7ogDRfkogm2gvGQIBZEglRimpmpg51sZQRD0ueE
0rt12TmktsLbglB4
=Dy49
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Add support for the EXPORT_SYMBOL_GPL_FOR_MODULES() macro, which
exports a symbol only to specified modules
- Improve ABI handling in gendwarfksyms
- Forcibly link lib-y objects to vmlinux even if CONFIG_MODULES=n
- Add checkers for redundant or missing <linux/export.h> inclusion
- Deprecate the extra-y syntax
- Fix a genksyms bug when including enum constants from *.symref files
* tag 'kbuild-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (28 commits)
genksyms: Fix enum consts from a reference affecting new values
arch: use always-$(KBUILD_BUILTIN) for vmlinux.lds
kbuild: set y instead of 1 to KBUILD_{BUILTIN,MODULES}
efi/libstub: use 'targets' instead of extra-y in Makefile
module: make __mod_device_table__* symbols static
scripts/misc-check: check unnecessary #include <linux/export.h> when W=1
scripts/misc-check: check missing #include <linux/export.h> when W=1
scripts/misc-check: add double-quotes to satisfy shellcheck
kbuild: move W=1 check for scripts/misc-check to top-level Makefile
scripts/tags.sh: allow to use alternative ctags implementation
kconfig: introduce menu type enum
docs: symbol-namespaces: fix reST warning with literal block
kbuild: link lib-y objects to vmlinux forcibly even when CONFIG_MODULES=n
tinyconfig: enable CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
docs/core-api/symbol-namespaces: drop table of contents and section numbering
modpost: check forbidden MODULE_IMPORT_NS("module:") at compile time
kbuild: move kbuild syntax processing to scripts/Makefile.build
Makefile: remove dependency on archscripts for header installation
Documentation/kbuild: Add new gendwarfksyms kABI rules
Documentation/kbuild: Drop section numbers
...
The extra-y syntax is deprecated. Instead, use always-$(KBUILD_BUILTIN),
which behaves equivalently.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
simplifies the act of creating a pte which addresses the first page in a
folio and reduces the amount of plumbing which architecture must
implement to provide this.
- The 8 patch series "Misc folio patches for 6.16" from Matthew Wilcox
is a shower of largely unrelated folio infrastructure changes which
clean things up and better prepare us for future work.
- The 3 patch series "memory,x86,acpi: hotplug memory alignment
advisement" from Gregory Price adds early-init code to prevent x86 from
leaving physical memory unused when physical address regions are not
aligned to memory block size.
- The 2 patch series "mm/compaction: allow more aggressive proactive
compaction" from Michal Clapinski provides some tuning of the (sadly,
hard-coded (more sadly, not auto-tuned)) thresholds for our invokation
of proactive compaction. In a simple test case, the reduction of a guest
VM's memory consumption was dramatic.
- The 8 patch series "Minor cleanups and improvements to swap freeing
code" from Kemeng Shi provides some code cleaups and a small efficiency
improvement to this part of our swap handling code.
- The 6 patch series "ptrace: introduce PTRACE_SET_SYSCALL_INFO API"
from Dmitry Levin adds the ability for a ptracer to modify syscalls
arguments. At this time we can alter only "system call information that
are used by strace system call tampering, namely, syscall number,
syscall arguments, and syscall return value.
This series should have been incorporated into mm.git's "non-MM"
branch, but I goofed.
- The 3 patch series "fs/proc: extend the PAGEMAP_SCAN ioctl to report
guard regions" from Andrei Vagin extends the info returned by the
PAGEMAP_SCAN ioctl against /proc/pid/pagemap. This permits CRIU to more
efficiently get at the info about guard regions.
- The 2 patch series "Fix parameter passed to page_mapcount_is_type()"
from Gavin Shan implements that fix. No runtime effect is expected
because validate_page_before_insert() happens to fix up this error.
- The 3 patch series "kernel/events/uprobes: uprobe_write_opcode()
rewrite" from David Hildenbrand basically brings uprobe text poking into
the current decade. Remove a bunch of hand-rolled implementation in
favor of using more current facilities.
- The 3 patch series "mm/ptdump: Drop assumption that pxd_val() is u64"
from Anshuman Khandual provides enhancements and generalizations to the
pte dumping code. This might be needed when 128-bit Page Table
Descriptors are enabled for ARM.
- The 12 patch series "Always call constructor for kernel page tables"
from Kevin Brodsky "ensures that the ctor/dtor is always called for
kernel pgtables, as it already is for user pgtables". This permits the
addition of more functionality such as "insert hooks to protect page
tables". This change does result in various architectures performing
unnecesary work, but this is fixed up where it is anticipated to occur.
- The 9 patch series "Rust support for mm_struct, vm_area_struct, and
mmap" from Alice Ryhl adds plumbing to permit Rust access to core MM
structures.
- The 3 patch series "fix incorrectly disallowed anonymous VMA merges"
from Lorenzo Stoakes takes advantage of some VMA merging opportunities
which we've been missing for 15 years.
- The 4 patch series "mm/madvise: batch tlb flushes for MADV_DONTNEED
and MADV_FREE" from SeongJae Park optimizes process_madvise()'s TLB
flushing. Instead of flushing each address range in the provided iovec,
we batch the flushing across all the iovec entries. The syscall's cost
was approximately halved with a microbenchmark which was designed to
load this particular operation.
- The 6 patch series "Track node vacancy to reduce worst case allocation
counts" from Sidhartha Kumar makes the maple tree smarter about its node
preallocation. stress-ng mmap performance increased by single-digit
percentages and the amount of unnecessarily preallocated memory was
dramaticelly reduced.
- The 3 patch series "mm/gup: Minor fix, cleanup and improvements" from
Baoquan He removes a few unnecessary things which Baoquan noted when
reading the code.
- The 3 patch series ""Enhance sysfs handling for memory hotplug in
weighted interleave" from Rakie Kim "enhances the weighted interleave
policy in the memory management subsystem by improving sysfs handling,
fixing memory leaks, and introducing dynamic sysfs updates for memory
hotplug support". Fixes things on error paths which we are unlikely to
hit.
- The 7 patch series "mm/damon: auto-tune DAMOS for NUMA setups
including tiered memory" from SeongJae Park introduces new DAMOS quota
goal metrics which eliminate the manual tuning which is required when
utilizing DAMON for memory tiering.
- The 5 patch series "mm/vmalloc.c: code cleanup and improvements" from
Baoquan He provides cleanups and small efficiency improvements which
Baoquan found via code inspection.
- The 2 patch series "vmscan: enforce mems_effective during demotion"
from Gregory Price "changes reclaim to respect cpuset.mems_effective
during demotion when possible". because "presently, reclaim explicitly
ignores cpuset.mems_effective when demoting, which may cause the cpuset
settings to violated." "This is useful for isolating workloads on a
multi-tenant system from certain classes of memory more consistently."
- The 2 patch series ""Clean up split_huge_pmd_locked() and remove
unnecessary folio pointers" from Gavin Guo provides minor cleanups and
efficiency gains in in the huge page splitting and migrating code.
- The 3 patch series "Use kmem_cache for memcg alloc" from Huan Yang
creates a slab cache for `struct mem_cgroup', yielding improved memory
utilization.
- The 4 patch series "add max arg to swappiness in memory.reclaim and
lru_gen" from Zhongkun He adds a new "max" argument to the "swappiness="
argument for memory.reclaim MGLRU's lru_gen. This directs proactive
reclaim to reclaim from only anon folios rather than file-backed folios.
- The 17 patch series "kexec: introduce Kexec HandOver (KHO)" from Mike
Rapoport is the first step on the path to permitting the kernel to
maintain existing VMs while replacing the host kernel via file-based
kexec. At this time only memblock's reserve_mem is preserved.
- The 7 patch series "mm: Introduce for_each_valid_pfn()" from David
Woodhouse provides and uses a smarter way of looping over a pfn range.
By skipping ranges of invalid pfns.
- The 2 patch series "sched/numa: Skip VMA scanning on memory pinned to
one NUMA node via cpuset.mems" from Libo Chen removes a lot of pointless
VMA scanning when a task is pinned a single NUMA mode. Dramatic
performance benefits were seen in some real world cases.
- The 2 patch series "JFS: Implement migrate_folio for
jfs_metapage_aops" from Shivank Garg addresses a warning which occurs
during memory compaction when using JFS.
- The 4 patch series "move all VMA allocation, freeing and duplication
logic to mm" from Lorenzo Stoakes moves some VMA code from kernel/fork.c
into the more appropriate mm/vma.c.
- The 6 patch series "mm, swap: clean up swap cache mapping helper" from
Kairui Song provides code consolidation and cleanups related to the
folio_index() function.
- The 2 patch series "mm/gup: Cleanup memfd_pin_folios()" from Vishal
Moola does that.
- The 8 patch series "memcg: Fix test_memcg_min/low test failures" from
Waiman Long addresses some bogus failures which are being reported by
the test_memcontrol selftest.
- The 3 patch series "eliminate mmap() retry merge, add .mmap_prepare
hook" from Lorenzo Stoakes commences the deprecation of
file_operations.mmap() in favor of the new
file_operations.mmap_prepare(). The latter is more restrictive and
prevents drivers from messing with things in ways which, amongst other
problems, may defeat VMA merging.
- The 4 patch series "memcg: decouple memcg and objcg stocks"" from
Shakeel Butt decouples the per-cpu memcg charge cache from the objcg's
one. This is a step along the way to making memcg and objcg charging
NMI-safe, which is a BPF requirement.
- The 6 patch series "mm/damon: minor fixups and improvements for code,
tests, and documents" from SeongJae Park is "yet another batch of
miscellaneous DAMON changes. Fix and improve minor problems in code,
tests and documents."
- The 7 patch series "memcg: make memcg stats irq safe" from Shakeel
Butt converts memcg stats to be irq safe. Another step along the way to
making memcg charging and stats updates NMI-safe, a BPF requirement.
- The 4 patch series "Let unmap_hugepage_range() and several related
functions take folio instead of page" from Fan Ni provides folio
conversions in the hugetlb code.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaDt5qgAKCRDdBJ7gKXxA
ju6XAP9nTiSfRz8Cz1n5LJZpFKEGzLpSihCYyR6P3o1L9oe3mwEAlZ5+XAwk2I5x
Qqb/UGMEpilyre1PayQqOnct3aSL9Ao=
=tYYm
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of
creating a pte which addresses the first page in a folio and reduces
the amount of plumbing which architecture must implement to provide
this.
- "Misc folio patches for 6.16" from Matthew Wilcox is a shower of
largely unrelated folio infrastructure changes which clean things up
and better prepare us for future work.
- "memory,x86,acpi: hotplug memory alignment advisement" from Gregory
Price adds early-init code to prevent x86 from leaving physical
memory unused when physical address regions are not aligned to memory
block size.
- "mm/compaction: allow more aggressive proactive compaction" from
Michal Clapinski provides some tuning of the (sadly, hard-coded (more
sadly, not auto-tuned)) thresholds for our invokation of proactive
compaction. In a simple test case, the reduction of a guest VM's
memory consumption was dramatic.
- "Minor cleanups and improvements to swap freeing code" from Kemeng
Shi provides some code cleaups and a small efficiency improvement to
this part of our swap handling code.
- "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin
adds the ability for a ptracer to modify syscalls arguments. At this
time we can alter only "system call information that are used by
strace system call tampering, namely, syscall number, syscall
arguments, and syscall return value.
This series should have been incorporated into mm.git's "non-MM"
branch, but I goofed.
- "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from
Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl
against /proc/pid/pagemap. This permits CRIU to more efficiently get
at the info about guard regions.
- "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan
implements that fix. No runtime effect is expected because
validate_page_before_insert() happens to fix up this error.
- "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David
Hildenbrand basically brings uprobe text poking into the current
decade. Remove a bunch of hand-rolled implementation in favor of
using more current facilities.
- "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman
Khandual provides enhancements and generalizations to the pte dumping
code. This might be needed when 128-bit Page Table Descriptors are
enabled for ARM.
- "Always call constructor for kernel page tables" from Kevin Brodsky
ensures that the ctor/dtor is always called for kernel pgtables, as
it already is for user pgtables.
This permits the addition of more functionality such as "insert hooks
to protect page tables". This change does result in various
architectures performing unnecesary work, but this is fixed up where
it is anticipated to occur.
- "Rust support for mm_struct, vm_area_struct, and mmap" from Alice
Ryhl adds plumbing to permit Rust access to core MM structures.
- "fix incorrectly disallowed anonymous VMA merges" from Lorenzo
Stoakes takes advantage of some VMA merging opportunities which we've
been missing for 15 years.
- "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from
SeongJae Park optimizes process_madvise()'s TLB flushing.
Instead of flushing each address range in the provided iovec, we
batch the flushing across all the iovec entries. The syscall's cost
was approximately halved with a microbenchmark which was designed to
load this particular operation.
- "Track node vacancy to reduce worst case allocation counts" from
Sidhartha Kumar makes the maple tree smarter about its node
preallocation.
stress-ng mmap performance increased by single-digit percentages and
the amount of unnecessarily preallocated memory was dramaticelly
reduced.
- "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes
a few unnecessary things which Baoquan noted when reading the code.
- ""Enhance sysfs handling for memory hotplug in weighted interleave"
from Rakie Kim "enhances the weighted interleave policy in the memory
management subsystem by improving sysfs handling, fixing memory
leaks, and introducing dynamic sysfs updates for memory hotplug
support". Fixes things on error paths which we are unlikely to hit.
- "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory"
from SeongJae Park introduces new DAMOS quota goal metrics which
eliminate the manual tuning which is required when utilizing DAMON
for memory tiering.
- "mm/vmalloc.c: code cleanup and improvements" from Baoquan He
provides cleanups and small efficiency improvements which Baoquan
found via code inspection.
- "vmscan: enforce mems_effective during demotion" from Gregory Price
changes reclaim to respect cpuset.mems_effective during demotion when
possible. because presently, reclaim explicitly ignores
cpuset.mems_effective when demoting, which may cause the cpuset
settings to violated.
This is useful for isolating workloads on a multi-tenant system from
certain classes of memory more consistently.
- "Clean up split_huge_pmd_locked() and remove unnecessary folio
pointers" from Gavin Guo provides minor cleanups and efficiency gains
in in the huge page splitting and migrating code.
- "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache
for `struct mem_cgroup', yielding improved memory utilization.
- "add max arg to swappiness in memory.reclaim and lru_gen" from
Zhongkun He adds a new "max" argument to the "swappiness=" argument
for memory.reclaim MGLRU's lru_gen.
This directs proactive reclaim to reclaim from only anon folios
rather than file-backed folios.
- "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the
first step on the path to permitting the kernel to maintain existing
VMs while replacing the host kernel via file-based kexec. At this
time only memblock's reserve_mem is preserved.
- "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides
and uses a smarter way of looping over a pfn range. By skipping
ranges of invalid pfns.
- "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via
cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning
when a task is pinned a single NUMA mode.
Dramatic performance benefits were seen in some real world cases.
- "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank
Garg addresses a warning which occurs during memory compaction when
using JFS.
- "move all VMA allocation, freeing and duplication logic to mm" from
Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more
appropriate mm/vma.c.
- "mm, swap: clean up swap cache mapping helper" from Kairui Song
provides code consolidation and cleanups related to the folio_index()
function.
- "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that.
- "memcg: Fix test_memcg_min/low test failures" from Waiman Long
addresses some bogus failures which are being reported by the
test_memcontrol selftest.
- "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo
Stoakes commences the deprecation of file_operations.mmap() in favor
of the new file_operations.mmap_prepare().
The latter is more restrictive and prevents drivers from messing with
things in ways which, amongst other problems, may defeat VMA merging.
- "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples
the per-cpu memcg charge cache from the objcg's one.
This is a step along the way to making memcg and objcg charging
NMI-safe, which is a BPF requirement.
- "mm/damon: minor fixups and improvements for code, tests, and
documents" from SeongJae Park is yet another batch of miscellaneous
DAMON changes. Fix and improve minor problems in code, tests and
documents.
- "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg
stats to be irq safe. Another step along the way to making memcg
charging and stats updates NMI-safe, a BPF requirement.
- "Let unmap_hugepage_range() and several related functions take folio
instead of page" from Fan Ni provides folio conversions in the
hugetlb code.
* tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits)
mm: pcp: increase pcp->free_count threshold to trigger free_high
mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range()
mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page
mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page
mm/hugetlb: pass folio instead of page to unmap_ref_private()
memcg: objcg stock trylock without irq disabling
memcg: no stock lock for cpu hot-unplug
memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs
memcg: make count_memcg_events re-entrant safe against irqs
memcg: make mod_memcg_state re-entrant safe against irqs
memcg: move preempt disable to callers of memcg_rstat_updated
memcg: memcg_rstat_updated re-entrant safe against irqs
mm: khugepaged: decouple SHMEM and file folios' collapse
selftests/eventfd: correct test name and improve messages
alloc_tag: check mem_profiling_support in alloc_tag_init
Docs/damon: update titles and brief introductions to explain DAMOS
selftests/damon/_damon_sysfs: read tried regions directories in order
mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject()
mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat()
mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs
...
Core
----
- Implement the Device Memory TCP transmit path, allowing zero-copy
data transmission on top of TCP from e.g. GPU memory to the wire.
- Move all the IPv6 routing tables management outside the RTNL scope,
under its own lock and RCU. The route control path is now 3x times
faster.
- Convert queue related netlink ops to instance lock, reducing
again the scope of the RTNL lock. This improves the control plane
scalability.
- Refactor the software crc32c implementation, removing unneeded
abstraction layers and improving significantly the related
micro-benchmarks.
- Optimize the GRO engine for UDP-tunneled traffic, for a 10%
performance improvement in related stream tests.
- Cover more per-CPU storage with local nested BH locking; this is a
prep work to remove the current per-CPU lock in local_bh_disable()
on PREMPT_RT.
- Introduce and use nlmsg_payload helper, combining buffer bounds
verification with accessing payload carried by netlink messages.
Netfilter
---------
- Rewrite the procfs conntrack table implementation, improving
considerably the dump performance. A lot of user-space tools
still use this interface.
- Implement support for wildcard netdevice in netdev basechain
and flowtables.
- Integrate conntrack information into nft trace infrastructure.
- Export set count and backend name to userspace, for better
introspection.
BPF
---
- BPF qdisc support: BPF-qdisc can be implemented with BPF struct_ops
programs and can be controlled in similar way to traditional qdiscs
using the "tc qdisc" command.
- Refactor the UDP socket iterator, addressing long standing issues
WRT duplicate hits or missed sockets.
Protocols
---------
- Improve TCP receive buffer auto-tuning and increase the default
upper bound for the receive buffer; overall this improves the single
flow maximum thoughput on 200Gbs link by over 60%.
- Add AFS GSSAPI security class to AF_RXRPC; it provides transport
security for connections to the AFS fileserver and VL server.
- Improve TCP multipath routing, so that the sources address always
matches the nexthop device.
- Introduce SO_PASSRIGHTS for AF_UNIX, to allow disabling SCM_RIGHTS,
and thus preventing DoS caused by passing around problematic FDs.
- Retire DCCP socket. DCCP only receives updates for bugs, and major
distros disable it by default. Its removal allows for better
organisation of TCP fields to reduce the number of cache lines hit
in the fast path.
- Extend TCP drop-reason support to cover PAWS checks.
Driver API
----------
- Reorganize PTP ioctl flag support to require an explicit opt-in for
the drivers, avoiding the problem of drivers not rejecting new
unsupported flags.
- Converted several device drivers to timestamping APIs.
- Introduce per-PHY ethtool dump helpers, improving the support for
dump operations targeting PHYs.
Tests and tooling
-----------------
- Add support for classic netlink in user space C codegen, so that
ynl-c can now read, create and modify links, routes addresses and
qdisc layer configuration.
- Add ynl sub-types for binary attributes, allowing ynl-c to output
known struct instead of raw binary data, clarifying the classic
netlink output.
- Extend MPTCP selftests to improve the code-coverage.
- Add tests for XDP tail adjustment in AF_XDP.
New hardware / drivers
----------------------
- OpenVPN virtual driver: offload OpenVPN data channels processing
to the kernel-space, increasing the data transfer throughput WRT
the user-space implementation.
- Renesas glue driver for the gigabit ethernet RZ/V2H(P) SoC.
- Broadcom asp-v3.0 ethernet driver.
- AMD Renoir ethernet device.
- ReakTek MT9888 2.5G ethernet PHY driver.
- Aeonsemi 10G C45 PHYs driver.
Drivers
-------
- Ethernet high-speed NICs:
- nVidia/Mellanox (mlx5):
- refactor the stearing table handling to reduce significantly
the amount of memory used
- add support for complex matches in H/W flow steering
- improve flow streeing error handling
- convert to netdev instance locking
- Intel (100G, ice, igb, ixgbe, idpf):
- ice: add switchdev support for LLDP traffic over VF
- ixgbe: add firmware manipulation and regions devlink support
- igb: introduce support for frame transmission premption
- igb: adds persistent NAPI configuration
- idpf: introduce RDMA support
- idpf: add initial PTP support
- Meta (fbnic):
- extend hardware stats coverage
- add devlink dev flash support
- Broadcom (bnxt):
- add support for RX-side device memory TCP
- Wangxun (txgbe):
- implement support for udp tunnel offload
- complete PTP and SRIOV support for AML 25G/10G devices
- Ethernet NICs embedded and virtual:
- Google (gve):
- add device memory TCP TX support
- Amazon (ena):
- support persistent per-NAPI config
- Airoha:
- add H/W support for L2 traffic offload
- add per flow stats for flow offloading
- RealTek (rtl8211): add support for WoL magic packet
- Synopsys (stmmac):
- dwmac-socfpga 1000BaseX support
- add Loongson-2K3000 support
- introduce support for hardware-accelerated VLAN stripping
- Broadcom (bcmgenet):
- expose more H/W stats
- Freescale (enetc, dpaa2-eth):
- enetc: add MAC filter, VLAN filter RSS and loopback support
- dpaa2-eth: convert to H/W timestamping APIs
- vxlan: convert FDB table to rhashtable, for better scalabilty
- veth: apply qdisc backpressure on full ring to reduce TX drops
- Ethernet switches:
- Microchip (kzZ88x3): add ETS scheduler support
- Ethernet PHYs:
- RealTek (rtl8211):
- add support for WoL magic packet
- add support for PHY LEDs
- CAN:
- Adds RZ/G3E CANFD support to the rcar_canfd driver.
- Preparatory work for CAN-XL support.
- Add self-tests framework with support for CAN physical interfaces.
- WiFi:
- mac80211:
- scan improvements with multi-link operation (MLO)
- Qualcomm (ath12k):
- enable AHB support for IPQ5332
- add monitor interface support to QCN9274
- add multi-link operation support to WCN7850
- add 802.11d scan offload support to WCN7850
- monitor mode for WCN7850, better 6 GHz regulatory
- Qualcomm (ath11k):
- restore hibernation support
- MediaTek (mt76):
- WiFi-7 improvements
- implement support for mt7990
- Intel (iwlwifi):
- enhanced multi-link single-radio (EMLSR) support on 5 GHz links
- rework device configuration
- RealTek (rtw88):
- improve throughput for RTL8814AU
- RealTek (rtw89):
- add multi-link operation support
- STA/P2P concurrency improvements
- support different SAR configs by antenna
- Bluetooth:
- introduce HCI Driver protocol
- btintel_pcie: do not generate coredump for diagnostic events
- btusb: add HCI Drv commands for configuring altsetting
- btusb: add RTL8851BE device 0x0bda:0xb850
- btusb: add new VID/PID 13d3/3584 for MT7922
- btusb: add new VID/PID 13d3/3630 and 13d3/3613 for MT7925
- btnxpuart: implement host-wakeup feature
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmg3D64SHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkcIsQAK2eEc+BxQer975wzvtMg6gF9eoex4a+
rZ7jxfDzDtNvTauoQsrpehDZp0FnySaVGCU36lHGB2OvDnhCpPc5hXzKDWQpOuqQ
SHrGG3/6FTbdTG/HfHUcbNyrUzIf53SADSObiQ3qg4gyEQ3sCpcOKtVtMcU8rvsY
/HqMnsJWFaROUMjMtCcnUSgjmeY9kBvha3sTXUqgeRugEOCvZD7z4rpqFIcQqHw7
e2Fi8dwIXEYNxqPp6MRq2qdyUTewCRruE8ZIMAFuhtfYeMElUZMPlqlMENX3AzTQ
cr0EgwcFOUxRA7oZRxhoBNBsVXavtSpQr4ZDoWplxP4aQ37n5tc1E9Q72axpB/Og
FbJRl6GvWYnCd8071BczgmfHlKaTAigPvt2Z4r6JjM5I/Bij/IZ3k+On1OTuOAj/
EqfFkdZ0a5cfKrwUMP+oSGtSAywkMVUtnIKJlZeRbjSj2432sCfe2jVAlS8ELM43
3LUgXYrAKtA87g171LlsRu5EEpI5QmqPb+i5LpPlEXe2TJEgPisyfecJ3NafF/2+
j575lm+TFNm9NTNhGGjDPEvw0djI5wSGGMe9J4gC74eWi6s5t6C4cuUf84TKWdwR
x+9H0IB7rfFncAwXHJuUUtzd+fPHaYzs5dDGbSgMQOXr1cr1wlubCK8mQ1r/Wt/a
3GjFIOQKW2Q5
=t/Tz
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Implement the Device Memory TCP transmit path, allowing zero-copy
data transmission on top of TCP from e.g. GPU memory to the wire.
- Move all the IPv6 routing tables management outside the RTNL scope,
under its own lock and RCU. The route control path is now 3x times
faster.
- Convert queue related netlink ops to instance lock, reducing again
the scope of the RTNL lock. This improves the control plane
scalability.
- Refactor the software crc32c implementation, removing unneeded
abstraction layers and improving significantly the related
micro-benchmarks.
- Optimize the GRO engine for UDP-tunneled traffic, for a 10%
performance improvement in related stream tests.
- Cover more per-CPU storage with local nested BH locking; this is a
prep work to remove the current per-CPU lock in local_bh_disable()
on PREMPT_RT.
- Introduce and use nlmsg_payload helper, combining buffer bounds
verification with accessing payload carried by netlink messages.
Netfilter:
- Rewrite the procfs conntrack table implementation, improving
considerably the dump performance. A lot of user-space tools still
use this interface.
- Implement support for wildcard netdevice in netdev basechain and
flowtables.
- Integrate conntrack information into nft trace infrastructure.
- Export set count and backend name to userspace, for better
introspection.
BPF:
- BPF qdisc support: BPF-qdisc can be implemented with BPF struct_ops
programs and can be controlled in similar way to traditional qdiscs
using the "tc qdisc" command.
- Refactor the UDP socket iterator, addressing long standing issues
WRT duplicate hits or missed sockets.
Protocols:
- Improve TCP receive buffer auto-tuning and increase the default
upper bound for the receive buffer; overall this improves the
single flow maximum thoughput on 200Gbs link by over 60%.
- Add AFS GSSAPI security class to AF_RXRPC; it provides transport
security for connections to the AFS fileserver and VL server.
- Improve TCP multipath routing, so that the sources address always
matches the nexthop device.
- Introduce SO_PASSRIGHTS for AF_UNIX, to allow disabling SCM_RIGHTS,
and thus preventing DoS caused by passing around problematic FDs.
- Retire DCCP socket. DCCP only receives updates for bugs, and major
distros disable it by default. Its removal allows for better
organisation of TCP fields to reduce the number of cache lines hit
in the fast path.
- Extend TCP drop-reason support to cover PAWS checks.
Driver API:
- Reorganize PTP ioctl flag support to require an explicit opt-in for
the drivers, avoiding the problem of drivers not rejecting new
unsupported flags.
- Converted several device drivers to timestamping APIs.
- Introduce per-PHY ethtool dump helpers, improving the support for
dump operations targeting PHYs.
Tests and tooling:
- Add support for classic netlink in user space C codegen, so that
ynl-c can now read, create and modify links, routes addresses and
qdisc layer configuration.
- Add ynl sub-types for binary attributes, allowing ynl-c to output
known struct instead of raw binary data, clarifying the classic
netlink output.
- Extend MPTCP selftests to improve the code-coverage.
- Add tests for XDP tail adjustment in AF_XDP.
New hardware / drivers:
- OpenVPN virtual driver: offload OpenVPN data channels processing to
the kernel-space, increasing the data transfer throughput WRT the
user-space implementation.
- Renesas glue driver for the gigabit ethernet RZ/V2H(P) SoC.
- Broadcom asp-v3.0 ethernet driver.
- AMD Renoir ethernet device.
- ReakTek MT9888 2.5G ethernet PHY driver.
- Aeonsemi 10G C45 PHYs driver.
Drivers:
- Ethernet high-speed NICs:
- nVidia/Mellanox (mlx5):
- refactor the steering table handling to significantly
reduce the amount of memory used
- add support for complex matches in H/W flow steering
- improve flow streeing error handling
- convert to netdev instance locking
- Intel (100G, ice, igb, ixgbe, idpf):
- ice: add switchdev support for LLDP traffic over VF
- ixgbe: add firmware manipulation and regions devlink support
- igb: introduce support for frame transmission premption
- igb: adds persistent NAPI configuration
- idpf: introduce RDMA support
- idpf: add initial PTP support
- Meta (fbnic):
- extend hardware stats coverage
- add devlink dev flash support
- Broadcom (bnxt):
- add support for RX-side device memory TCP
- Wangxun (txgbe):
- implement support for udp tunnel offload
- complete PTP and SRIOV support for AML 25G/10G devices
- Ethernet NICs embedded and virtual:
- Google (gve):
- add device memory TCP TX support
- Amazon (ena):
- support persistent per-NAPI config
- Airoha:
- add H/W support for L2 traffic offload
- add per flow stats for flow offloading
- RealTek (rtl8211): add support for WoL magic packet
- Synopsys (stmmac):
- dwmac-socfpga 1000BaseX support
- add Loongson-2K3000 support
- introduce support for hardware-accelerated VLAN stripping
- Broadcom (bcmgenet):
- expose more H/W stats
- Freescale (enetc, dpaa2-eth):
- enetc: add MAC filter, VLAN filter RSS and loopback support
- dpaa2-eth: convert to H/W timestamping APIs
- vxlan: convert FDB table to rhashtable, for better scalabilty
- veth: apply qdisc backpressure on full ring to reduce TX drops
- Ethernet switches:
- Microchip (kzZ88x3): add ETS scheduler support
- Ethernet PHYs:
- RealTek (rtl8211):
- add support for WoL magic packet
- add support for PHY LEDs
- CAN:
- Adds RZ/G3E CANFD support to the rcar_canfd driver.
- Preparatory work for CAN-XL support.
- Add self-tests framework with support for CAN physical interfaces.
- WiFi:
- mac80211:
- scan improvements with multi-link operation (MLO)
- Qualcomm (ath12k):
- enable AHB support for IPQ5332
- add monitor interface support to QCN9274
- add multi-link operation support to WCN7850
- add 802.11d scan offload support to WCN7850
- monitor mode for WCN7850, better 6 GHz regulatory
- Qualcomm (ath11k):
- restore hibernation support
- MediaTek (mt76):
- WiFi-7 improvements
- implement support for mt7990
- Intel (iwlwifi):
- enhanced multi-link single-radio (EMLSR) support on 5 GHz links
- rework device configuration
- RealTek (rtw88):
- improve throughput for RTL8814AU
- RealTek (rtw89):
- add multi-link operation support
- STA/P2P concurrency improvements
- support different SAR configs by antenna
- Bluetooth:
- introduce HCI Driver protocol
- btintel_pcie: do not generate coredump for diagnostic events
- btusb: add HCI Drv commands for configuring altsetting
- btusb: add RTL8851BE device 0x0bda:0xb850
- btusb: add new VID/PID 13d3/3584 for MT7922
- btusb: add new VID/PID 13d3/3630 and 13d3/3613 for MT7925
- btnxpuart: implement host-wakeup feature"
* tag 'net-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1611 commits)
selftests/bpf: Fix bpf selftest build warning
selftests: netfilter: Fix skip of wildcard interface test
net: phy: mscc: Stop clearing the the UDPv4 checksum for L2 frames
net: openvswitch: Fix the dead loop of MPLS parse
calipso: Don't call calipso functions for AF_INET sk.
selftests/tc-testing: Add a test for HFSC eltree double add with reentrant enqueue behaviour on netem
net_sched: hfsc: Address reentrant enqueue adding class to eltree twice
octeontx2-pf: QOS: Refactor TC_HTB_LEAF_DEL_LAST callback
octeontx2-pf: QOS: Perform cache sync on send queue teardown
net: mana: Add support for Multi Vports on Bare metal
net: devmem: ncdevmem: remove unused variable
net: devmem: ksft: upgrade rx test to send 1K data
net: devmem: ksft: add 5 tuple FS support
net: devmem: ksft: add exit_wait to make rx test pass
net: devmem: ksft: add ipv4 support
net: devmem: preserve sockc_err
page_pool: fix ugly page_pool formatting
net: devmem: move list_add to net_devmem_bind_dmabuf.
selftests: netfilter: nft_queue.sh: include file transfer duration in log message
net: phy: mscc: Fix memory leak when using one step timestamping
...
- Convert init_timer*(), try_to_del_timer_sync() and
destroy_timer_on_stack() over to the canonical timer_*() namespace
convention.
There are is another large converstion pending, which has not been included
because it would have caused a gazillion of merge conflicts in next. The
conversion scripts will be run towards the end of the merge window and a
pull request sent once all conflict dependencies have been merged.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmgzgTkTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYodwVD/97rF1Juqm1JZNIZPN/vMqwCxRoUkc6
tsK0+UC7UXusuJadxJ+Bsv25iPF+qejnThMU+SQ5yTVj/PNfxOe0WPdCEGGiL8Ye
2JCk6GqSOB/360SlLmtR1B1xHDwsuuUcQTz0w57CH66HRV5vpoWSMSwj/ypy+8nU
PlgjItaxdCKa9NJ+SUJZPWIxRkt/PsA1kwlV1OcxkgB++IiIHQEbPxECq9mlzWXF
b4Sq/Sdf2OmEePN+DYoey4fneRwJnkjkeX/o+CqosCPHRIiWUlSu5W/lU5IYojM3
s3XpMNNg/z8PMXR4JA2VaPYWLUZyBOs+3dM7Y6Am+z55EoxMxfzg6pGx2tfM4ftl
vF8wG3Z1c9MmpLk+P9LatNvfHeVLNve8KgOLa5phMDQ/El/a8KqLu6HmRDPONvKp
d6iXdPq1CP8P6jOtlFfzLmKPShgEcp+Zz9W3CaQR/0ZJEsEqrpKOLzdT86hJhBV0
mBCdzixmGtKAh0BdPdmg2FCLScqER3HKIJhZSdV8I+jSETIHCuMiIfbMXR7iwm/H
R1/ayvxrbc1mPseo28scqvo7m6cn5BFBxIUf4Sokp52ZCapz1v2aWzo4vHI0cTgT
ZOjlTrf+fgYLn1dqdD45TJiQPnmRrw4dU+WWSFRFJY2qjfyucj80vdqdkE5zkp5b
UPomlVimG4ccPg==
=FHGU
-----END PGP SIGNATURE-----
Merge tag 'timers-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer cleanups from Thomas Gleixner:
"Another set of timer API cleanups:
- Convert init_timer*(), try_to_del_timer_sync() and
destroy_timer_on_stack() over to the canonical timer_*()
namespace convention.
There is another large conversion pending, which has not been included
because it would have caused a gazillion of merge conflicts in next.
The conversion scripts will be run towards the end of the merge window
and a pull request sent once all conflict dependencies have been
merged"
* tag 'timers-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
treewide, timers: Rename destroy_timer_on_stack() as timer_destroy_on_stack()
treewide, timers: Rename try_to_del_timer_sync() as timer_delete_sync_try()
timers: Rename init_timers() as timers_init()
timers: Rename NEXT_TIMER_MAX_DELTA as TIMER_NEXT_MAX_DELTA
timers: Rename __init_timer_on_stack() as __timer_init_on_stack()
timers: Rename __init_timer() as __timer_init()
timers: Rename init_timer_on_stack_key() as timer_init_key_on_stack()
timers: Rename init_timer_key() as timer_init_key()
- Consolidate on one set of functions for the interrupt domain code to
get rid of pointlessly duplicated code with only marginal different
semantics.
- Update the documentation accordingly and consolidate the coding style
of the irqdomain header.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmgzd+MTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYodTRD/0RmG5tngCbEJmTw6lPDQzRZH4OO3ja
yRYlyBipemoRmvJRGjV4uHqN2QPrdOuoqMuyBO1aWcMdkpww5bAHcbgSFrlGM1lW
kqtaxVMbufPiLQSGYe7OQf478CE1ykoBd5Va8whFKrtA73qEUdEMfWT0stspg780
7BlmQOemL91p7Ytf03FbDdo8tZ5Xu9uXGAulwY9FZsFtsCNyvhl7nOv5Sk8ZQtGO
xHRCeunjZLWR+IaK59hdakvQybXwSnjT6jODp96nlyKABEKSPShGSPFDWd3g9px7
4911QwgnvTbcrsk6YmQEmPIOgXZzypjbnjpJr8tFpTbkVIy+6chi5cBJzXoqsUaM
ylTwFcUQNvcP8yF447qb+nyPFKM5xsC07W0UpZMuJUDmhhPRtDm5pK0jpsif96GP
l4aMsWe65PUmXHQqLdE89RJXAa8XQ2qspKVtNKq9DmEVgTviQ09Z9SSQIx4U0yIx
w+YPde8kH2+O+YtMUn/MmfHhUP4MKya7j5zd8Bnv8wLBi7XGPPA5EKKh9I0dz9m+
X94lweNXyH+Q8U9mt2cQf8VG8Yzgk0eeC0sliJIlybwRgEgRcQbVWw0VvZUA1ySa
VBlaj3SinO90FEQ0CctT51ss2mUJ/XsGCnxpiGZXfqIZzFbyD1YfZQnXJH0H67DI
CqdHw22I27Mu/A==
=9nLp
-----END PGP SIGNATURE-----
Merge tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq cleanups from Thomas Gleixner:
"A set of cleanups for the generic interrupt subsystem:
- Consolidate on one set of functions for the interrupt domain code
to get rid of pointlessly duplicated code with only marginal
different semantics.
- Update the documentation accordingly and consolidate the coding
style of the irqdomain header"
* tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
irqdomain: Consolidate coding style
irqdomain: Fix kernel-doc and add it to Documentation
Documentation: irqdomain: Update it
Documentation: irq-domain.rst: Simple improvements
Documentation: irq/concepts: Minor improvements
Documentation: irq/concepts: Add commas and reflow
irqdomain: Improve kernel-docs of functions
irqdomain: Make struct irq_domain_info variables const
irqdomain: Use irq_domain_instantiate()'s return value as initializers
irqdomain: Drop irq_linear_revmap()
pinctrl: keembay: Switch to irq_find_mapping()
irqchip/armada-370-xp: Switch to irq_find_mapping()
gpu: ipu-v3: Switch to irq_find_mapping()
gpio: idt3243x: Switch to irq_find_mapping()
sh: Switch to irq_find_mapping()
powerpc: Switch to irq_find_mapping()
irqdomain: Drop irq_domain_add_*() functions
powerpc: Switch irq_domain_add_nomap() to use fwnode
thermal: Switch to irq_domain_create_linear()
soc: Switch to irq_domain_create_*()
...
Core & generic-arch updates:
- Add support for dynamic constraints and propagate it to
the Intel driver (Kan Liang)
- Fix & enhance driver-specific throttling support (Kan Liang)
- Record sample last_period before updating on the
x86 and PowerPC platforms (Mark Barnett)
- Make perf_pmu_unregister() usable (Peter Zijlstra)
- Unify perf_event_free_task() / perf_event_exit_task_context()
(Peter Zijlstra)
- Simplify perf_event_release_kernel() and perf_event_free_task()
(Peter Zijlstra)
- Allocate non-contiguous AUX pages by default (Yabin Cui)
Uprobes updates:
- Add support to emulate NOP instructions (Jiri Olsa)
- selftests/bpf: Add 5-byte NOP uprobe trigger benchmark (Jiri Olsa)
x86 Intel PMU enhancements:
- Support Intel Auto Counter Reload [ACR] (Kan Liang)
- Add PMU support for Clearwater Forest (Dapeng Mi)
- Arch-PEBS preparatory changes: (Dapeng Mi)
- Parse CPUID archPerfmonExt leaves for non-hybrid CPUs
- Decouple BTS initialization from PEBS initialization
- Introduce pairs of PEBS static calls
x86 AMD PMU enhancements:
- Use hrtimer for handling overflows in the AMD uncore driver
(Sandipan Das)
- Prevent UMC counters from saturating (Sandipan Das)
Fixes and cleanups:
- Fix put_ctx() ordering (Frederic Weisbecker)
- Fix irq work dereferencing garbage (Frederic Weisbecker)
- Misc fixes and cleanups (Changbin Du, Frederic Weisbecker,
Ian Rogers, Ingo Molnar, Kan Liang, Peter Zijlstra, Qing Wang,
Sandipan Das, Thorsten Blum)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmgy4zoRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1j6QRAAvQ4GBPrdJLb8oXkLjCmWSp9PfM1h2IW0
reUrcV0BPRAwz4T60QEU2KyiEjvKxNghR6bNw4i3slAZ8EFwP9eWE/0ZYOo5+W/N
wv8vsopv/oZd2L2G5TgxDJf+tLPkqnTvp651LmGAbquPFONN1lsya9UHVPnt2qtv
fvFhjW6D828VoevRcUCsdoEUNlFDkUYQ2c3M1y5H2AI6ILDVxLsp5uYtuVUP+2lQ
7UI/elqRIIblTGT7G9LvTGiXZMm8T58fe1OOLekT6NdweJ3XEt1kMdFo/SCRYfzU
eDVVVLSextZfzBXNPtAEAlM3aSgd8+4m5sACiD1EeOUNjo5J9Sj1OOCa+bZGF/Rl
XNv5Kcp6Kh1T4N5lio8DE/NabmHDqDMbUGfud+VTS8uLLku4kuOWNMxJTD1nQ2Zz
BMfJhP89G9Vk07F9fOGuG1N6mKhIKNOgXh0S92tB7XDHcdJegueu2xh4ZszBL1QK
JVXa4DbnDj+y0LvnV+A5Z6VILr5RiCAipDb9ascByPja6BbN10Nf9Aj4nWwRTwbO
ut5OK/fDKmSjEHn1+a42d4iRxdIXIWhXCyxEhH+hJXEFx9htbQ3oAbXAEedeJTlT
g9QYGAjL96QEd0CqviorV8KyU59nVkEPoLVCumXBZ0WWhNwU6GdAmsW1hLfxQdLN
sp+XHhfxf8M=
=tPRs
-----END PGP SIGNATURE-----
Merge tag 'perf-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:
"Core & generic-arch updates:
- Add support for dynamic constraints and propagate it to the Intel
driver (Kan Liang)
- Fix & enhance driver-specific throttling support (Kan Liang)
- Record sample last_period before updating on the x86 and PowerPC
platforms (Mark Barnett)
- Make perf_pmu_unregister() usable (Peter Zijlstra)
- Unify perf_event_free_task() / perf_event_exit_task_context()
(Peter Zijlstra)
- Simplify perf_event_release_kernel() and perf_event_free_task()
(Peter Zijlstra)
- Allocate non-contiguous AUX pages by default (Yabin Cui)
Uprobes updates:
- Add support to emulate NOP instructions (Jiri Olsa)
- selftests/bpf: Add 5-byte NOP uprobe trigger benchmark (Jiri Olsa)
x86 Intel PMU enhancements:
- Support Intel Auto Counter Reload [ACR] (Kan Liang)
- Add PMU support for Clearwater Forest (Dapeng Mi)
- Arch-PEBS preparatory changes: (Dapeng Mi)
- Parse CPUID archPerfmonExt leaves for non-hybrid CPUs
- Decouple BTS initialization from PEBS initialization
- Introduce pairs of PEBS static calls
x86 AMD PMU enhancements:
- Use hrtimer for handling overflows in the AMD uncore driver
(Sandipan Das)
- Prevent UMC counters from saturating (Sandipan Das)
Fixes and cleanups:
- Fix put_ctx() ordering (Frederic Weisbecker)
- Fix irq work dereferencing garbage (Frederic Weisbecker)
- Misc fixes and cleanups (Changbin Du, Frederic Weisbecker, Ian
Rogers, Ingo Molnar, Kan Liang, Peter Zijlstra, Qing Wang, Sandipan
Das, Thorsten Blum)"
* tag 'perf-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
perf/headers: Clean up <linux/perf_event.h> a bit
perf/uapi: Clean up <uapi/linux/perf_event.h> a bit
perf/uapi: Fix PERF_RECORD_SAMPLE comments in <uapi/linux/perf_event.h>
mips/perf: Remove driver-specific throttle support
xtensa/perf: Remove driver-specific throttle support
sparc/perf: Remove driver-specific throttle support
loongarch/perf: Remove driver-specific throttle support
csky/perf: Remove driver-specific throttle support
arc/perf: Remove driver-specific throttle support
alpha/perf: Remove driver-specific throttle support
perf/apple_m1: Remove driver-specific throttle support
perf/arm: Remove driver-specific throttle support
s390/perf: Remove driver-specific throttle support
powerpc/perf: Remove driver-specific throttle support
perf/x86/zhaoxin: Remove driver-specific throttle support
perf/x86/amd: Remove driver-specific throttle support
perf/x86/intel: Remove driver-specific throttle support
perf: Only dump the throttle log for the leader
perf: Fix the throttle logic for a group
perf/core: Add the is_event_in_freq_mode() helper to simplify the code
...
- Support for dynamic preemption
- Migrate powerpc boards GPIO driver to new setter API
- Added new PMU for KVM host-wide measurement
- Enhancement to htmdump driver to support more functions
- Added character device for couple RTAS supported APIs
- Minor fixes and cleanup
Thanks to: Amit Machhiwal, Athira Rajeev, Bagas Sanjaya, Bartosz Golaszewski, Christophe Leroy, Eddie James, Gaurav Batra, Gautam Menghani, Geert Uytterhoeven, Haren Myneni, Hari Bathini, Jiri Slaby (SUSE), Linus Walleij, Michal Suchanek, Naveen N Rao (AMD), Nilay Shroff, Ricardo B. Marlière, Ritesh Harjani (IBM), Sathvika Vasireddy, Shrikanth Hegde, Stephen Rothwell, Sourabh Jain, Thorsten Blum, Vaibhav Jain, Venkat Rao Bagalkote, Viktor Malik
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEqX2DNAOgU8sBX3pRpnEsdPSHZJQFAmgzykEACgkQpnEsdPSH
ZJTEIg//a+m4lxMOO5rm6ND2nmp0/fDtC9aBpX/GNH6ZkOJX+aFH1eC0vcXX3guh
N5dMc9bbOomhDMABWH7fZcs/VCMaSCd+vrg/IulpZmqrLnd+w/qlD9IKbAcQrE7y
8m/zQ+bwsq7MRVDBHRqEmvqpUVAvAqTtF7iVA5k/YptNHCDjqI8y0/YcDBOmJrsd
GhbFVWFS95M5TeC4SFRfzr9Tb6GffNMqhZROeJfmzJlwigAztcfw7oNUvFbnexLI
Vz3Xoflbv6oY1azdq+W9XbUcH+QCG0Ua6nVpRqutzYYFCCnUysv1EujeTGVdoR7v
KDuzKSPLk6LuJRi8T/nOl2ggUD6zREJ9CE6/uRBntGBt+bl2Zb/yDr8xJ1ELHkdB
TzPGy6PoJ1lxyvV9clO1TdSBuGYMV/21XK7O5MG7tCSFUCTdAOVp9wbAHOUAU7+Q
5rvW9yrNhMPcAfWV3uFY3DPVYWnDFxVmsRVL87S3vPsvQm8v63AJYWctmjvqnuQE
619F13HiS3M4HqcPupKsxV5w/1Yj2+Wz8QKg2umtdBdlSA9fmdtDpsEXh7kJJrah
mVR9m28ffbEmd1pBaHoUVBT0j1pEQtXXIiKUhEdANfkBfoRIpeROq1VqLLeKos3B
Dj9KgR+EOqwR0syqkV6AHAaSwqCur1r6ZSkGTA8urC/2HJ/YZC0=
=dsj5
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Madhavan Srinivasan:
- Support for dynamic preemption
- Migrate powerpc boards GPIO driver to new setter API
- Added new PMU for KVM host-wide measurement
- Enhancement to htmdump driver to support more functions
- Added character device for couple RTAS supported APIs
- Minor fixes and cleanup
Thanks to Amit Machhiwal, Athira Rajeev, Bagas Sanjaya, Bartosz
Golaszewski, Christophe Leroy, Eddie James, Gaurav Batra, Gautam
Menghani, Geert Uytterhoeven, Haren Myneni, Hari Bathini, Jiri Slaby
(SUSE), Linus Walleij, Michal Suchanek, Naveen N Rao (AMD), Nilay
Shroff, Ricardo B. Marlière, Ritesh Harjani (IBM), Sathvika Vasireddy,
Shrikanth Hegde, Stephen Rothwell, Sourabh Jain, Thorsten Blum, Vaibhav
Jain, Venkat Rao Bagalkote, and Viktor Malik.
* tag 'powerpc-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (52 commits)
MAINTAINERS: powerpc: Remove myself as a reviewer
powerpc/iommu: Use str_disabled_enabled() helper
powerpc/powermac: Use str_enabled_disabled() and str_on_off() helpers
powerpc/mm/fault: Use str_write_read() helper function
powerpc: Replace strcpy() with strscpy() in proc_ppc64_init()
powerpc/pseries/iommu: Fix kmemleak in TCE table userspace view
powerpc/kernel: Fix ppc_save_regs inclusion in build
powerpc: Transliterate author name and remove FIXME
powerpc/pseries/htmdump: Include header file to get is_kvm_guest() definition
KVM: PPC: Book3S HV: Fix IRQ map warnings with XICS on pSeries KVM Guest
powerpc/8xx: Reduce alignment constraint for kernel memory
powerpc/boot: Fix build with gcc 15
powerpc/pseries/htmdump: Add documentation for H_HTM debugfs interface
powerpc/pseries/htmdump: Add htm capabilities support to htmdump module
powerpc/pseries/htmdump: Add htm flags support to htmdump module
powerpc/pseries/htmdump: Add htm setup support to htmdump module
powerpc/pseries/htmdump: Add htm info support to htmdump module
powerpc/pseries/htmdump: Add htm status support to htmdump module
powerpc/pseries/htmdump: Add htm start support to htmdump module
powerpc/pseries/htmdump: Add htm configure support to htmdump module
...
API:
- Fix memcpy_sglist to handle partially overlapping SG lists.
- Use memcpy_sglist to replace null skcipher.
- Rename CRYPTO_TESTS to CRYPTO_BENCHMARK.
- Flip CRYPTO_MANAGER_DISABLE_TEST into CRYPTO_SELFTESTS.
- Hide CRYPTO_MANAGER.
- Add delayed freeing of driver crypto_alg structures.
Compression:
- Allocate large buffers on first use instead of initialisation in scomp.
- Drop destination linearisation buffer in scomp.
- Move scomp stream allocation into acomp.
- Add acomp scatter-gather walker.
- Remove request chaining.
- Add optional async request allocation.
Hashing:
- Remove request chaining.
- Add optional async request allocation.
- Move partial block handling into API.
- Add ahash support to hmac.
- Fix shash documentation to disallow usage in hard IRQs.
Algorithms:
- Remove unnecessary SIMD fallback code on x86 and arm/arm64.
- Drop avx10_256 xts(aes)/ctr(aes) on x86.
- Improve avx-512 optimisations for xts(aes).
- Move chacha arch implementations into lib/crypto.
- Move poly1305 into lib/crypto and drop unused Crypto API algorithm.
- Disable powerpc/poly1305 as it has no SIMD fallback.
- Move sha256 arch implementations into lib/crypto.
- Convert deflate to acomp.
- Set block size correctly in cbcmac.
Drivers:
- Do not use sg_dma_len before mapping in sun8i-ss.
- Fix warm-reboot failure by making shutdown do more work in qat.
- Add locking in zynqmp-sha.
- Remove cavium/zip.
- Add support for PCI device 0x17D8 to ccp.
- Add qat_6xxx support in qat.
- Add support for RK3576 in rockchip-rng.
- Add support for i.MX8QM in caam.
Others:
- Fix irq_fpu_usable/kernel_fpu_begin inconsistency during CPU bring-up.
- Add new SEV/SNP platform shutdown API in ccp.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmgz47AACgkQxycdCkmx
i6fvKRAAr4Xa903L0r1Q1P1alQqoFFCqimUWeH72m68LiWynHWi0lUo0z/+tKweg
mnPStz7/Ha9HRHJjdNCMPnlJqXQDkuH3bIOuBJCwduDuhHo9VGOd46XGzmGMv3gb
HKuZhI0lk7pznK3CSyD/2nHmbDCHD+7feTZSBMoN9mm875+aSoM6fdxgak8uPFcq
KbB1L+hObTn2kAPSqRrNOR8/xG2N7hdH8eax7Li+LAtqYNVT5HvWVECsB/CKRPfB
sgAv3UTzcIFapSSHUHaONppSeoqPAIAeV7SdQhJvlT+EUUR/h/B6+D9OUQQqbphQ
LBalgTnqMKl0ymDEQFQ6QyYCat9ZfNmDft2WcXEsxc8PxImkgJI1W3B8O51sOjbG
78D8JqVQ96dleo4FsBhM2wfG0b41JM6zU4raC4vS7a3qsUS+Q1MpehvcS1iORicy
SpGdE8e7DLlxKhzWyW1xJnbrtMZDC7Sa2hUnxrvP0/xOvRhChKscRVtWcf0a5q7X
8JmuvwVSOJuSbQ3MeFbQvpo5lR9+0WsNjM6e9miiH6Y7vZUKmWcq2yDp377qVzeh
7NK6+OwGIQZZExrmtPw2BXwssT9Eg+ks6Y7g2Ne7yzvrjVNfEPY7Cws/5w7p8mRS
qhrcpbJNFlWgD7YYkmGZFTQ8DCN25ipP8lklO/hbcfchqLE/o1o=
=O8L5
-----END PGP SIGNATURE-----
Merge tag 'v6.16-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Fix memcpy_sglist to handle partially overlapping SG lists
- Use memcpy_sglist to replace null skcipher
- Rename CRYPTO_TESTS to CRYPTO_BENCHMARK
- Flip CRYPTO_MANAGER_DISABLE_TEST into CRYPTO_SELFTESTS
- Hide CRYPTO_MANAGER
- Add delayed freeing of driver crypto_alg structures
Compression:
- Allocate large buffers on first use instead of initialisation in scomp
- Drop destination linearisation buffer in scomp
- Move scomp stream allocation into acomp
- Add acomp scatter-gather walker
- Remove request chaining
- Add optional async request allocation
Hashing:
- Remove request chaining
- Add optional async request allocation
- Move partial block handling into API
- Add ahash support to hmac
- Fix shash documentation to disallow usage in hard IRQs
Algorithms:
- Remove unnecessary SIMD fallback code on x86 and arm/arm64
- Drop avx10_256 xts(aes)/ctr(aes) on x86
- Improve avx-512 optimisations for xts(aes)
- Move chacha arch implementations into lib/crypto
- Move poly1305 into lib/crypto and drop unused Crypto API algorithm
- Disable powerpc/poly1305 as it has no SIMD fallback
- Move sha256 arch implementations into lib/crypto
- Convert deflate to acomp
- Set block size correctly in cbcmac
Drivers:
- Do not use sg_dma_len before mapping in sun8i-ss
- Fix warm-reboot failure by making shutdown do more work in qat
- Add locking in zynqmp-sha
- Remove cavium/zip
- Add support for PCI device 0x17D8 to ccp
- Add qat_6xxx support in qat
- Add support for RK3576 in rockchip-rng
- Add support for i.MX8QM in caam
Others:
- Fix irq_fpu_usable/kernel_fpu_begin inconsistency during CPU bring-up
- Add new SEV/SNP platform shutdown API in ccp"
* tag 'v6.16-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (382 commits)
x86/fpu: Fix irq_fpu_usable() to return false during CPU onlining
crypto: qat - add missing header inclusion
crypto: api - Redo lookup on EEXIST
Revert "crypto: testmgr - Add hash export format testing"
crypto: marvell/cesa - Do not chain submitted requests
crypto: powerpc/poly1305 - add depends on BROKEN for now
Revert "crypto: powerpc/poly1305 - Add SIMD fallback"
crypto: ccp - Add missing tee info reg for teev2
crypto: ccp - Add missing bootloader info reg for pspv5
crypto: sun8i-ce - move fallback ahash_request to the end of the struct
crypto: octeontx2 - Use dynamic allocated memory region for lmtst
crypto: octeontx2 - Initialize cptlfs device info once
crypto: xts - Only add ecb if it is not already there
crypto: lrw - Only add ecb if it is not already there
crypto: testmgr - Add hash export format testing
crypto: testmgr - Use ahash for generic tfm
crypto: hmac - Add ahash support
crypto: testmgr - Ignore EEXIST on shash allocation
crypto: algapi - Add driver template support to crypto_inst_setname
crypto: shash - Set reqsize in shash_alg
...
Cleanups for the kernel's CRC (cyclic redundancy check) code:
- Use __ro_after_init where appropriate
- Remove unnecessary static_key on s390
- Rename some source code files
- Rename the crc32 and crc32c crypto API modules
- Use subsys_initcall instead of arch_initcall
- Restore maintainers for crc_kunit.c
- Fold crc16_byte() into crc16.c
- Add some SPDX license identifiers
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCaDNd3xQcZWJpZ2dlcnNA
Z29vZ2xlLmNvbQAKCRDzXCl4vpKOKz0tAQCDqDA4Jd/54nnKpChMlKH8MTQDuwfz
8GHZi50mn4Rw5gD/f+hOGItPfswBId/+MZy+rKWL7bE2e9DdJdtoqRRtwA4=
=RWFl
-----END PGP SIGNATURE-----
Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC updates from Eric Biggers:
"Cleanups for the kernel's CRC (cyclic redundancy check) code:
- Use __ro_after_init where appropriate
- Remove unnecessary static_key on s390
- Rename some source code files
- Rename the crc32 and crc32c crypto API modules
- Use subsys_initcall instead of arch_initcall
- Restore maintainers for crc_kunit.c
- Fold crc16_byte() into crc16.c
- Add some SPDX license identifiers"
* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crc32: add SPDX license identifier
lib/crc16: unexport crc16_table and crc16_byte()
w1: ds2406: use crc16() instead of crc16_byte() loop
MAINTAINERS: add crc_kunit.c back to CRC LIBRARY
lib/crc: make arch-optimized code use subsys_initcall
crypto: crc32 - remove "generic" from file and module names
x86/crc: drop "glue" from filenames
sparc/crc: drop "glue" from filenames
s390/crc: drop "glue" from filenames
powerpc/crc: rename crc32-vpmsum_core.S to crc-vpmsum-template.S
powerpc/crc: drop "glue" from filenames
arm64/crc: drop "glue" from filenames
arm/crc: drop "glue" from filenames
s390/crc32: Remove no-op module init and exit functions
s390/crc32: Remove have_vxrs static key
lib/crc: make the CPU feature static keys __ro_after_init
The throttle support has been added in the generic code. Remove
the driver-specific throttle support.
Besides the throttle, perf_event_overflow may return true because of
event_limit. It already does an inatomic event disable. The pmu->stop
is not required either.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250520181644.2673067-7-kan.liang@linux.intel.com
As discussed in the thread containing
https://lore.kernel.org/linux-crypto/20250510053308.GB505731@sol/, the
Power10-optimized Poly1305 code is currently not safe to call in softirq
context. Disable it for now. It can be re-enabled once it is fixed.
Fixes: ba8f8624fd ("crypto: poly1305-p10 - Glue code for optmized Poly1305 implementation for ppc64le")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
irq_linear_revmap() is deprecated, so remove all its uses and supersede
them by an identical call to irq_find_mapping().
[ tglx: Fix up subject prefix ]
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> # for 8xx
Link: https://lore.kernel.org/all/20250319092951.37667-42-jirislaby@kernel.org
All irq_domain_add_*() functions are going away. PowerPC is the only
user of irq_domain_add_nomap() and there is no irq_domain_create_nomap()
complement.
Therefore, to align with the rest of the kernel, rename
irq_domain_add_nomap() to irq_domain_create_nomap() and accept a
fwnode_handle instead of a device_node.
[ tglx: Fix up subject prefix ]
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-40-jirislaby@kernel.org
irq_domain_add_*() interfaces are going away as being obsolete now.
Switch to the preferred irq_domain_create_*() ones. Those differ in the
node parameter: They take more generic struct fwnode_handle instead of
struct device_node. Therefore, of_fwnode_handle() is added around the
original parameter.
Note some of the users can likely use dev->fwnode directly instead of
indirect of_fwnode_handle(dev->of_node). But dev->fwnode is not
guaranteed to be set for all, so this has to be investigated on case to
case basis (by people who can actually test with the HW).
[ tglx: Fix up subject prefix ]
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> # For 8xx
Link: https://lore.kernel.org/all/20250319092951.37667-33-jirislaby@kernel.org
of_node_to_fwnode() is irqdomain's reimplementation of the "officially"
defined of_fwnode_handle(). The former is in the process of being
removed, so use the latter instead.
[ tglx: Fix up subject prefix ]
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-9-jirislaby@kernel.org
Add a SIMD fallback path for poly1305-p10 by converting the 2^64
based hash state into a 2^44 base. In order to ensure that the
generic fallback is actually 2^44, add ARCH_SUPPORTS_INT128 to
powerpc and make poly1305-p10 depend on it.
Fixes: ba8f8624fd ("crypto: poly1305-p10 - Glue code for optmized Poly1305 implementation for ppc64le")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Recent patch fixed an old commit
'fc2a5a6161a2 ("powerpc/64s: ppc_save_regs is now needed for all 64s builds")'
which is to include building of ppc_save_reg.c only when XMON
and KEXEC_CORE and PPC_BOOK3S are enabled. This was valid, since
ppc_save_regs was called only in replay_system_reset() of old
irq.c which was under BOOK3S.
But there has been multiple refactoring of irq.c and have
added call to ppc_save_regs() from __replay_soft_interrupts
-> replay_soft_interrupts which is part of irq_64.c included
under CONFIG_PPC64. And since ppc_save_regs is called in
CRASH_DUMP path as part of crash_setup_regs in kexec.h,
CONFIG_PPC32 also needs it.
So with this recent patch which enabled the building of
ppc_save_regs.c caused a build break when none of these
(XMON, KEXEC_CORE, BOOK3S) where enabled as part of config.
Patch to enable building of ppc_save_regs.c by defaults.
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250511041111.841158-1-maddy@linux.ibm.com
tcrypt is actually a benchmarking module and not the actual tests. This
regularly causes confusion. Update the kconfig option name and help
text accordingly.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add explicit array bounds to the function prototypes for the parameters
that didn't already get handled by the conversion to use chacha_state:
- chacha_block_*():
Change 'u8 *out' or 'u8 *stream' to u8 out[CHACHA_BLOCK_SIZE].
- hchacha_block_*():
Change 'u32 *out' or 'u32 *stream' to u32 out[HCHACHA_OUT_WORDS].
- chacha_init():
Change 'const u32 *key' to 'const u32 key[CHACHA_KEY_WORDS]'.
Change 'const u8 *iv' to 'const u8 iv[CHACHA_IV_SIZE]'.
No functional changes. This just makes it clear when fixed-size arrays
are expected.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The ChaCha state matrix is 16 32-bit words. Currently it is represented
in the code as a raw u32 array, or even just a pointer to u32. This
weak typing is error-prone. Instead, introduce struct chacha_state:
struct chacha_state {
u32 x[16];
};
Convert all ChaCha and HChaCha functions to use struct chacha_state.
No functional changes.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
htmdump_init calls is_kvm_guest() to check for guest environment.
is_kvm_guest() is defined in kvm_guest.h header file. Without including
the header file, build hits error failing to find the function.
arch/powerpc/platforms/pseries/htmdump.c: In function 'htmdump_init':
arch/powerpc/platforms/pseries/htmdump.c:469:6: error: implicit declaration of function 'is_kvm_guest';
did you mean '__key_get'? [-Werror=implicit-function-declaration]
if (is_kvm_guest()) {
^~~~~~~~~~~~
__key_get
This is observed in configs where CONFIG_KVM_GUEST is disabled.
Include header file explicitly to avoid the build error
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202505061324.elUl4njU-lkp@intel.com/
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250506135232.69014-1-atrajeev@linux.ibm.com
The commit 9576730d0e ("KVM: PPC: select IRQ_BYPASS_MANAGER") enabled
IRQ_BYPASS_MANAGER when CONFIG_KVM was set. Subsequently, commit
c57875f5f9 ("KVM: PPC: Book3S HV: Enable IRQ bypass") enabled IRQ
bypass and added the necessary callbacks to create/remove the mappings
between host real IRQ and the guest GSI.
The availability of IRQ bypass is determined by the arch-specific
function kvm_arch_has_irq_bypass(), which invokes
kvmppc_irq_bypass_add_producer_hv(). This function, in turn, calls
kvmppc_set_passthru_irq_hv() to create a mapping in the passthrough IRQ
map, associating a host IRQ to a guest GSI.
However, when a pSeries KVM guest (L2) is booted within an LPAR (L1)
with the kernel boot parameter `xive=off`, it defaults to using emulated
XICS controller. As an attempt to establish host IRQ to guest GSI
mappings via kvmppc_set_passthru_irq() on a PCI device hotplug
(passhthrough) operation fail, returning -ENOENT. This failure occurs
because only interrupts with EOI operations handled through OPAL calls
(verified via is_pnv_opal_msi()) are currently supported.
These mapping failures lead to below repeated warnings in the L1 host:
[ 509.220349] kvmppc_set_passthru_irq_hv: Could not assign IRQ map for (58,4970)
[ 509.220368] kvmppc_set_passthru_irq (irq 58, gsi 4970) fails: -2
[ 509.220376] vfio-pci 0015:01:00.0: irq bypass producer (token 0000000090bc635b) registration fails: -2
...
[ 509.291781] vfio-pci 0015:01:00.0: irq bypass producer (token 000000003822eed8) registration fails: -2
Fix this by restricting IRQ bypass enablement on pSeries systems by
making the IRQ bypass callbacks unavailable when running on pSeries
platform.
Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250425185641.1611857-1-amachhiw@linux.ibm.com
8xx has three large page sizes: 8M, 512k and 16k.
A too big alignment can lead to wasting memory. On a board which has
only 32 MBytes of RAM, every single byte is worth it and a 512k
alignment is sometimes too much.
Allow mapping kernel memory with 16k pages and reduce the constraint
on kernel memory alignment. 512k and 16k pages are handled the same
way so reverse tests in order to make 8M pages the special case and
other ones (512k and 16k) the alternative.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/fa9927b70df13627cdf10b992ea71d6562c7760e.1746191262.git.christophe.leroy@csgroup.eu
The generic implementation of pte_{alloc_one,free}_kernel now calls the
[cd]tor, without initialising the ptlock needlessly as
pagetable_pte_ctor() skips it for init_mm.
On powerpc, all functions related to PTE allocation are implemented by
common helpers, which are passed a boolean to differentiate user from
kernel pgtables. This patch aligns the powerpc implementation with the
generic one by calling pagetable_pte_[cd]tor() unconditionally in those
helpers.
Link: https://lkml.kernel.org/r/20250408095222.860601-6-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: <x86@kernel.org>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Always call constructor for kernel page tables", v2.
There has been much confusion around exactly when page table
constructors/destructors (pagetable_*_[cd]tor) are supposed to be called.
They were initially introduced for user PTEs only (to support split page
table locks), then at the PMD level for the same purpose. Accounting was
added later on, starting at the PTE level and then moving to higher levels
(PMD, PUD). Finally, with my earlier series "Account page tables at all
levels" [1], the ctor/dtor is run for all levels, all the way to PGD.
I thought this was the end of the story, and it hopefully is for user
pgtables, but I was wrong for what concerns kernel pgtables. The current
situation there makes very little sense:
* At the PTE level, the ctor/dtor is not called (at least in the generic
implementation). Specific helpers are used for kernel pgtables at this
level (pte_{alloc,free}_kernel()) and those have never called the
ctor/dtor, most likely because they were initially irrelevant in the
kernel case.
* At all other levels, the ctor/dtor is normally called. This is
potentially wasteful at the PMD level (more on that later).
This series aims to ensure that the ctor/dtor is always called for kernel
pgtables, as it already is for user pgtables. Besides consistency, the
main motivation is to guarantee that ctor/dtor hooks are systematically
called; this makes it possible to insert hooks to protect page tables [2],
for instance. There is however an extra challenge: split locks are not
used for kernel pgtables, and it would therefore be wasteful to initialise
them (ptlock_init()).
It is worth clarifying exactly when split locks are used. They clearly
are for user pgtables, but as illustrated in commit 61444cde91 ("ARM:
8591/1: mm: use fully constructed struct pages for EFI pgd allocations"),
they also are for special page tables like efi_mm. The one case where
split locks are definitely unused is pgtables owned by init_mm; this is
consistent with the behaviour of apply_to_pte_range().
The approach chosen in this series is therefore to pass the mm associated
to the pgtables being constructed to pagetable_{pte,pmd}_ctor() (patch 1),
and skip ptlock_init() if mm == &init_mm (patch 3 and 7). This makes it
possible to call the PTE ctor/dtor from pte_{alloc,free}_kernel() without
unintended consequences (patch 3). As a result the accounting functions
are now called at all levels for kernel pgtables, and split locks are
never initialised.
In configurations where ptlocks are dynamically allocated (32-bit,
PREEMPT_RT, etc.) and ARCH_ENABLE_SPLIT_PMD_PTLOCK is selected, this
series results in the removal of a kmem_cache allocation for every kernel
PMD. Additionally, for certain architectures that do not use
<asm-generic/pgalloc.h> such as s390, the same optimisation occurs at the
PTE level.
===
Things get more complicated when it comes to special pgtable allocators
(patch 8-12). All architectures need such allocators to create initial
kernel pgtables; we are not concerned with those as the ctor cannot be
called so early in the boot sequence. However, those allocators may also
be used later in the boot sequence or during normal operations. There are
two main use-cases:
1. Mapping EFI memory: efi_mm (arm, arm64, riscv)
2. arch_add_memory(): init_mm
The ctor is already explicitly run (at the PTE/PMD level) in the first
case, as required for pgtables that are not associated with init_mm.
However the same allocators may also be used for the second use-case (or
others), and this is where it gets messy. Patch 1 calls the ctor with
NULL as mm in those situations, as the actual mm isn't available.
Practically this means that ptlocks will be unconditionally initialised.
This is fine on arm - create_mapping_late() is only used for the EFI
mapping. On arm64, __create_pgd_mapping() is also used by
arch_add_memory(); patch 8/9/11 ensure that ctors are called at all levels
with the appropriate mm. The situation is similar on riscv, but
propagating the mm down to the ctor would require significant refactoring.
Since they are already called unconditionally, this series leaves riscv
no worse off - patch 10 adds comments to clarify the situation.
From a cursory look at other architectures implementing arch_add_memory(),
s390 and x86 may also need a similar treatment to add constructor calls.
This is to be taken care of in a future version or as a follow-up.
===
The complications in those special pgtable allocators beg the question:
does it really make sense to treat efi_mm and init_mm differently in e.g.
apply_to_pte_range()? Maybe what we really need is a way to tell if an mm
corresponds to user memory or not, and never use split locks for non-user
mm's. Feedback and suggestions welcome!
This patch (of 12):
In preparation for calling constructors for all kernel page tables while
eliding unnecessary ptlock initialisation, let's pass down the associated
mm to the PTE/PMD level ctors. (These are the two levels where ptlocks
are used.)
In most cases the mm is already around at the point of calling the ctor so
we simply pass it down. This is however not the case for special page
table allocators:
* arch/arm/mm/mmu.c
* arch/arm64/mm/mmu.c
* arch/riscv/mm/init.c
In those cases, the page tables being allocated are either for standard
kernel memory (init_mm) or special page directories, which may not be
associated to any mm. For now let's pass NULL as mm; this will be refined
where possible in future patches.
No functional change in this patch.
Link: https://lore.kernel.org/linux-mm/20250103184415.2744423-1-kevin.brodsky@arm.com/ [1]
Link: https://lore.kernel.org/linux-hardening/20250203101839.1223008-1-kevin.brodsky@arm.com/ [2]
Link: https://lkml.kernel.org/r/20250408095222.860601-1-kevin.brodsky@arm.com
Link: https://lkml.kernel.org/r/20250408095222.860601-2-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390]
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: <x86@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm/ptdump: Drop assumption that pxd_val() is u64", v2.
Last argument passed down in note_page() is u64 assuming pxd_val()
returned value (all page table levels) is 64 bit - which might not be the
case going ahead when D128 page tables is enabled on arm64 platform.
Besides pxd_val() is very platform specific and its type should not be
assumed in generic MM. A similar problem exists for effective_prot(),
although it is restricted to x86 platform.
This series splits note_page() and effective_prot() into individual page
table level specific callbacks which accepts corresponding pxd_t page
table entry as an argument instead and later on all subscribing platforms
could derive pxd_val() from the table entries as required and proceed as
before.
Define ptdesc_t type which describes the basic page table descriptor
layout on arm64 platform. Subsequently all level specific pxxval_t
descriptors are derived from ptdesc_t thus establishing a common original
format, which can also be appropriate for page table entries, masks and
protection values etc which are used at all page table levels.
This patch (of 3):
Last argument passed down in note_page() is u64 assuming pxd_val()
returned value (all page table levels) is 64 bit - which might not be the
case going ahead when D128 page tables is enabled on arm64 platform.
Besides pxd_val() is very platform specific and its type should not be
assumed in generic MM.
Split note_page() into individual page table level specific callbacks
which accepts corresponding pxd_t argument instead and then subscribing
platforms just derive pxd_val() from the entries as required and proceed
as earlier.
Also add a note_page_flush() callback for flushing the last page table
page that was being handled earlier via level = -1.
Link: https://lkml.kernel.org/r/20250407053113.746295-1-anshuman.khandual@arm.com
Link: https://lkml.kernel.org/r/20250407053113.746295-2-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
There are now no callers of mk_huge_pmd() and mk_pmd(). Remove them.
Link: https://lkml.kernel.org/r/20250402181709.2386022-12-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Richard Weinberger <richard@nod.at>
Cc: <x86@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Most architectures simply call pfn_pte(). Centralise that as the normal
definition and remove the definition of mk_pte() from the architectures
which have either that exact definition or something similar.
Link: https://lkml.kernel.org/r/20250402181709.2386022-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390
Cc: Zi Yan <ziy@nvidia.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Richard Weinberger <richard@nod.at>
Cc: <x86@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add poly1305_emit_arch with fallback instead of calling assembly
directly. This is because the state format differs between p10
and that of the generic implementation.
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Reported-by: Eric Biggers <ebiggers@google.com>
Fixes: 14d3197914 ("crypto: powerpc/poly1305 - Add block-only interface")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Make the architecture-optimized CRC code do its CPU feature checks in
subsys_initcalls instead of arch_initcalls. This makes it consistent
with arch/*/lib/crypto/ and ensures that it runs after initcalls that
possibly could be a prerequisite for kernel-mode FPU, such as x86's
xfd_update_static_branch() and loongarch's init_euen_mask().
Note: as far as I can tell, x86's xfd_update_static_branch() isn't
*actually* needed for kernel-mode FPU. loongarch's init_euen_mask() is
needed to enable save/restore of the vector registers, but loongarch
doesn't yet have any CRC or crypto code that uses vector registers
anyway. Regardless, let's be consistent with arch/*/lib/crypto/ and
robust against any potential future dependency on an arch_initcall.
Link: https://lore.kernel.org/r/20250510035959.87995-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Rename poly1305_emit_64 to poly1305_emit_arch to conform with
the expectation of the poly1305 library.
Reported-by: Thorsten Leemhuis <linux@leemhuis.info>
Fixes: 14d3197914 ("crypto: powerpc/poly1305 - Add block-only interface")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Similar to x86 the ppc boot code does not build with GCC 15.
Copy the fix from
commit ee2ab467bd ("x86/boot: Use '-std=gnu11' to fix build with GCC 15")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Tested-by: Amit Machhiwal <amachhiw@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250331105722.19709-1-msuchanek@suse.de
Export the block functions as GPL only, there is no reason
to let arbitrary modules use these internal functions.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This reverts commit c4741b2305.
Crypto API self-tests no longer run at registration time and now
occur either at late_initcall or upon the first use.
Therefore the premise of the above commit no longer exists. Revert
it and subsequent additions of subsys_initcall and arch_initcall.
Note that lib/crypto calls will stay at subsys_initcall (or rather
downgraded from arch_initcall) because they may need to occur
before Crypto API registration.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Instead of providing crypto_shash algorithms for the arch-optimized
SHA-256 code, instead implement the SHA-256 library. This is much
simpler, it makes the SHA-256 library functions be arch-optimized, and
it fixes the longstanding issue where the arch-optimized SHA-256 was
disabled by default. SHA-256 still remains available through
crypto_shash, but individual architectures no longer need to handle it.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Now that every architecture provides a block function, use that
to implement the lib/poly1305 and remove the old per-arch code.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Merge mainline to pick up bcachefs poly1305 patch 4bf4b5046d
("bcachefs: use library APIs for ChaCha20 and Poly1305"). This
is a prerequisite for removing the poly1305 shash algorithm.
Support dumping HTM capabilities information from Hardware
Trace Macro (HTM) function via debugfs interface. Under
debugfs folder "/sys/kernel/debug/powerpc/htmdump", add
file "htmcaps".
The interface allows only read of this file which will present the
content of HTM buffer from the hcall.
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250420180844.53128-9-atrajeev@linux.ibm.com
Under debugfs folder, "/sys/kernel/debug/powerpc/htmdump", add file
"htmflags". Currently supported flag value is to enable/disable
HTM buffer wrap. wrap is used along with "configure" to prevent
HTM buffer from wrapping. Writing 1 will set noWrap while
configuring HTM
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250420180844.53128-8-atrajeev@linux.ibm.com
Add htm setup support to htmdump module. To use the
HTM (Hardware Trace Macro), HTM buffer has to be allocated.
Support setup of HTM buffers via debugfs interface. Under
debugfs folder, "/sys/kernel/debug/powerpc/htmdump", add file
"htmsetup". The interface allows setup of HTM buffer by writing
size of HTM buffer in power of 2 to the "htmsetup" file
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250420180844.53128-7-atrajeev@linux.ibm.com
Support dumping system processor configuration from Hardware
Trace Macro (HTM) function via debugfs interface. Under
debugfs folder "/sys/kernel/debug/powerpc/htmdump", add
file "htminfo".
The interface allows only read of this file which will present the
content of HTM buffer from the hcall. The 16th offset of HTM
buffer has value for the number of entries for array of processors.
Use this information to copy data to the debugfs file
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250420180844.53128-6-atrajeev@linux.ibm.com
Support dumping status of Hardware Trace Macro (HTM) function
via debugfs interface. Under debugfs folder
"/sys/kernel/debug/powerpc/htmdump", add file "htmstatus".
The interface allows only read of this file which will present the
content of HTM status buffer from the hcall. The 16th offset of HTM
status buffer has value for the number of HTM entries in the status
buffer. Each nest htm status entry is 0x6 bytes, where as core HTM
status entry is 0x8 bytes. Calculate the number of bytes to read
based on this detail.
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250420180844.53128-5-atrajeev@linux.ibm.com
Support starting of Hardware Trace Macro (HTM) function
via debugfs interface. Under debugfs folder
"/sys/kernel/debug/powerpc/htmdump", add file "htmstart".
The interface allows starting of htm via this file by
writing value "1". Also allows stopping of htm tracing by
writing value "0" to this file. Any other value returns
-EINVAL.
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250420180844.53128-4-atrajeev@linux.ibm.com
Support configuring of Hardware Trace Macro (HTM) function
via debugfs interface. Under debugfs folder
"/sys/kernel/debug/powerpc/htmdump", add file "htmconfigure".
The interface allows configuring of htm via this file
by writing value "1". Allow deconfiguring of htm via this file
by writing value "0". Any other value returns -EINVAL.
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250420180844.53128-3-atrajeev@linux.ibm.com
H_HTM (Hardware Trace Macro) hypervisor call is an HCALL to export data
from Hardware Trace Macro (HTM) function. The debugfs interface to
export the HTM function data in an lpar currently supports only dumping
of HTM data in an lpar. To add support for setup, configuration and
control of HTM function via debugfs interface, update the hcall wrapper
function. Rename and update htm_get_dump_hardware to htm_hcall_wrapper()
so that it can be used for other HTM operations as well. Additionally
include parameter "htm_op". Update htmdump module to check the return
code of hcall in a separate function so that it can be reused for other
option too. Add check to disable the interface in guest environment.
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250420180844.53128-2-atrajeev@linux.ibm.com
struct gpio_chip now has callbacks for setting line values that return
an integer, allowing to indicate failures. Convert the driver to using
them.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> # powerpc 8xx
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250502-gpiochip-set-rv-powerpc-v2-5-488e43e325bf@linaro.org
struct gpio_chip now has callbacks for setting line values that return
an integer, allowing to indicate failures. Convert the driver to using
them.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250502-gpiochip-set-rv-powerpc-v2-4-488e43e325bf@linaro.org
struct gpio_chip now has callbacks for setting line values that return
an integer, allowing to indicate failures. Convert the driver to using
them.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250502-gpiochip-set-rv-powerpc-v2-3-488e43e325bf@linaro.org
struct gpio_chip now has callbacks for setting line values that return
an integer, allowing to indicate failures. Convert the driver to using
them.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250502-gpiochip-set-rv-powerpc-v2-2-488e43e325bf@linaro.org
struct gpio_chip now has callbacks for setting line values that return
an integer, allowing to indicate failures. Convert the driver to using
them.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250502-gpiochip-set-rv-powerpc-v2-1-488e43e325bf@linaro.org
Fix the following build warning:
usr/include/asm/papr-platform-dump.h:12: found __[us]{8,16,32,64} type without #include <linux/types.h>
Fixes: 8aa9efc0be ("powerpc/pseries: Add papr-platform-dump character driver for dump retrieval")
Closes: https://lore.kernel.org/linux-next/20250429185735.034ba678@canb.auug.org.au/
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
[Maddy: fixed the commit to combine tags together]
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250429211419.1081354-1-haren@linux.ibm.com
In non-smp configurations, crash_kexec_prepare is never called in
the crash shutdown path. One result of this is that the crashing_cpu
variable is never set, preventing crash_save_cpu from storing the
NT_PRSTATUS elf note in the core dump.
Fixes: c7255058b5 ("powerpc/crash: save cpu register data in crash_smp_send_stop()")
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250211162054.857762-1-eajames@linux.ibm.com
The Fixes commit below tried to add CONFIG_PPC_BOOK3S to one of the
conditions to enable the build of ppc_save_regs.o. But it failed to do
so, in fact. The commit omitted to add a dollar sign.
Therefore, ppc_save_regs.o is built always these days (as
"(CONFIG_PPC_BOOK3S)" is never an empty string).
Fix this by adding the missing dollar sign.
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Fixes: fc2a5a6161 ("powerpc/64s: ppc_save_regs is now needed for all 64s builds")
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250417105305.397128-1-jirislaby@kernel.org