- The 4 patch series "mseal cleanups" from Lorenzo Stoakes erforms some
mseal cleaning with no intended functional change.
- The 3 patch series "Optimizations for khugepaged" from David
Hildenbrand improves khugepaged throughput by batching PTE operations
for large folios. This gain is mainly for arm64.
- The 8 patch series "x86: enable EXECMEM_ROX_CACHE for ftrace and
kprobes" from Mike Rapoport provides a bugfix, additional debug code and
cleanups to the execmem code.
- The 7 patch series "mm/shmem, swap: bugfix and improvement of mTHP
swap in" from Kairui Song provides bugfixes, cleanups and performance
improvememnts to the mTHP swapin code.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaI+6HQAKCRDdBJ7gKXxA
jv7lAQCAKE5dUhdZ0pOYbhBKTlDapQh2KqHrlV3QFcxXgknEoQD/c3gG01rY3fLh
Cnf5l9+cdyfKxFniO48sUPx6IpriRg8=
=HT5/
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2025-08-03-12-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
"Significant patch series in this pull request:
- "mseal cleanups" (Lorenzo Stoakes)
Some mseal cleaning with no intended functional change.
- "Optimizations for khugepaged" (David Hildenbrand)
Improve khugepaged throughput by batching PTE operations for large
folios. This gain is mainly for arm64.
- "x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes" (Mike Rapoport)
A bugfix, additional debug code and cleanups to the execmem code.
- "mm/shmem, swap: bugfix and improvement of mTHP swap in" (Kairui Song)
Bugfixes, cleanups and performance improvememnts to the mTHP swapin
code"
* tag 'mm-stable-2025-08-03-12-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (38 commits)
mm: mempool: fix crash in mempool_free() for zero-minimum pools
mm: correct type for vmalloc vm_flags fields
mm/shmem, swap: fix major fault counting
mm/shmem, swap: rework swap entry and index calculation for large swapin
mm/shmem, swap: simplify swapin path and result handling
mm/shmem, swap: never use swap cache and readahead for SWP_SYNCHRONOUS_IO
mm/shmem, swap: tidy up swap entry splitting
mm/shmem, swap: tidy up THP swapin checks
mm/shmem, swap: avoid redundant Xarray lookup during swapin
x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations
x86/kprobes: enable EXECMEM_ROX_CACHE for kprobes allocations
execmem: drop writable parameter from execmem_fill_trapping_insns()
execmem: add fallback for failures in vmalloc(VM_ALLOW_HUGE_VMAP)
execmem: move execmem_force_rw() and execmem_restore_rox() before use
execmem: rework execmem_cache_free()
execmem: introduce execmem_alloc_rw()
execmem: drop unused execmem_update_copy()
mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped
mm/rmap: add anon_vma lifetime debug check
mm: remove mm/io-mapping.c
...
Some callers of execmem_alloc() require the memory to be temporarily
writable even when it is allocated from ROX cache. These callers use
execemem_make_temp_rw() right after the call to execmem_alloc().
Wrap this sequence in execmem_alloc_rw() API.
Link: https://lkml.kernel.org/r/20250713071730.4117334-3-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Daniel Gomez <da.gomez@samsung.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The variable last_unloaded_module::name tracks the name of the last
unloaded module. It is a string copy of module::name, which is
MODULE_NAME_LEN bytes in size and includes the NUL terminator. Therefore,
the size of last_unloaded_module::name can also be just MODULE_NAME_LEN,
without the need for an extra byte.
Fixes: e14af7eeb4 ("debug: track and print last unloaded module in the oops trace")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: Daniel Gomez <da.gomez@samsung.com>
Link: https://lore.kernel.org/r/20250630143535.267745-3-petr.pavlu@suse.com
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Passing a module name longer than MODULE_NAME_LEN to the delete_module
syscall results in its silent truncation. This really isn't much of
a problem in practice, but it could theoretically lead to the removal of an
incorrect module. It is more sensible to return ENAMETOOLONG or ENOENT in
such a case.
Update the syscall to return ENOENT, as documented in the delete_module(2)
man page to mean "No module by that name exists." This is appropriate
because a module with a name longer than MODULE_NAME_LEN cannot be loaded
in the first place.
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: Daniel Gomez <da.gomez@samsung.com>
Link: https://lore.kernel.org/r/20250630143535.267745-2-petr.pavlu@suse.com
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
The struct was moved to the public header file in commit c8e21ced08
("module: fix kdb's illicit use of struct module_use.").
Back then the structure was used outside of the module core.
Nowadays this is not true anymore, so the structure can be made internal.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Daniel Gomez <da.gomez@samsung.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Link: https://lore.kernel.org/r/20250711-kunit-ifdef-modules-v2-1-39443decb1f8@linutronix.de
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
- Keep track of when fgraph_ops are registered or not
Keep accounting of when fgraph_ops are registered as if a fgraph_ops is
registered twice it can mess up the accounting and it will not work as
expected later. Trigger a warning if something registers it twice as to
catch bugs before they are found by things just not working as expected.
- Make DYNAMIC_FTRACE always enabled for architectures that support it
As static ftrace (where all functions are always traced) is very expensive
and only exists to help architectures support ftrace, do not make it an
option. As soon as an architecture supports DYNAMIC_FTRACE make it use it.
This simplifies the code.
- Remove redundant config HAVE_FTRACE_MCOUNT_RECORD
The CONFIG_HAVE_FTRACE_MCOUNT was added to help simplify the
DYNAMIC_FTRACE work, but now every architecture that implements
DYNAMIC_FTRACE also has HAVE_FTRACE_MCOUNT set too, making it redundant
with the HAVE_DYNAMIC_FTRACE.
- Make pid_ptr string size match the comment
In print_graph_proc() the pid_ptr string is of size 11, but the comment says
/* sign + log10(MAX_INT) + '\0' */ which is actually 12.
-----BEGIN PGP SIGNATURE-----
iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaIkVkRQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qmdxAPsGcyT/gnyX/wf70cI63QoODrlRAd7M
tg3R0J0H41U05QD/apttbA9GSdZ8bDLLSFAXTJgr8f4GvYvbUsmu2sMBBA8=
=gd9V
-----END PGP SIGNATURE-----
Merge tag 'ftrace-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace updates from Steven Rostedt:
- Keep track of when fgraph_ops are registered or not
Keep accounting of when fgraph_ops are registered as if a fgraph_ops
is registered twice it can mess up the accounting and it will not
work as expected later. Trigger a warning if something registers it
twice as to catch bugs before they are found by things just not
working as expected.
- Make DYNAMIC_FTRACE always enabled for architectures that support it
As static ftrace (where all functions are always traced) is very
expensive and only exists to help architectures support ftrace, do
not make it an option. As soon as an architecture supports
DYNAMIC_FTRACE make it use it. This simplifies the code.
- Remove redundant config HAVE_FTRACE_MCOUNT_RECORD
The CONFIG_HAVE_FTRACE_MCOUNT was added to help simplify the
DYNAMIC_FTRACE work, but now every architecture that implements
DYNAMIC_FTRACE also has HAVE_FTRACE_MCOUNT set too, making it
redundant with the HAVE_DYNAMIC_FTRACE.
- Make pid_ptr string size match the comment
In print_graph_proc() the pid_ptr string is of size 11, but the
comment says /* sign + log10(MAX_INT) + '\0' */ which is actually 12.
* tag 'ftrace-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Remove redundant config HAVE_FTRACE_MCOUNT_RECORD
ftrace: Make DYNAMIC_FTRACE always enabled for architectures that support it
fgraph: Keep track of when fgraph_ops are registered or not
fgraph: Make pid_str size match the comment
* Move sysctls out of the kern_table array
This is the final move of ctl_tables into their respective subsystems. Only 5
(out of the original 50) will remain in kernel/sysctl.c file; these handle
either sysctl or common arch variables.
By decentralizing sysctl registrations, subsystem maintainers regain control
over their sysctl interfaces, improving maintainability and reducing the
likelihood of merge conflicts.
* docs: Remove false positives from check-sysctl-docs
Stopped falsely identifying sysctls as undocumented or unimplemented in the
check-sysctl-docs script. This script can now be used to automatically
identify if documentation is missing.
* Testing
All these have been in linux-next since rc3, giving them a solid 3 to 4 weeks
worth of testing. Additionally, sysctl selftests and kunit were also run
locally on my x86_64
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmiAvd8ACgkQupfNUreW
QU+9nAv/dtxaKoL4BXJSzsA2+49bbo9QfiK5Vjz1wSRYRQTb+jhGr9QdS5hG+NeX
uN2ilvcNQqW7ENdiblU10lvcbPjIn2hw4lbMcpv/+QXnrudtGYlBFXlkWqW5nv7X
AVvHU8y3uzfs6JbRIpROUA7Cn2cDOlfP2mMtwxCXR3iP+orS1ziuVEi1JRoirIyG
iq5I/1rJMJBU3FjqqDTq6yljspLx8AlXO1yc5xUxAM67IcY4ew3ZTxqiZr6M9AhV
DUbR2lu/88wcFNERt8DJmuQ50dSGGqOEpK3FURTmkwtMFxzNLmenFDQeBKKahz3Q
2ntXSDfp2y+ppZNmcOP8tZZkra03Xpy1DQyoOgQ2r9uGekPxyr+wmKXwYPOeJIPO
YWTNBm8omX9qr49zVzaZ1f2foRGfgStHL6aa6xLIf34zzScSDEPtO3og2+5Hw/30
gnp+7v9E19uKpoE6oiGE0PtiFzAi/I6nFxzG2RRqrlMLFXyKVccTKygzY6tCnI3P
6144s/Bt
=R369
-----END PGP SIGNATURE-----
Merge tag 'sysctl-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl updates from Joel Granados:
- Move sysctls out of the kern_table array
This is the final move of ctl_tables into their respective
subsystems. Only 5 (out of the original 50) will remain in
kernel/sysctl.c file; these handle either sysctl or common arch
variables.
By decentralizing sysctl registrations, subsystem maintainers regain
control over their sysctl interfaces, improving maintainability and
reducing the likelihood of merge conflicts.
- docs: Remove false positives from check-sysctl-docs
Stopped falsely identifying sysctls as undocumented or unimplemented
in the check-sysctl-docs script. This script can now be used to
automatically identify if documentation is missing.
* tag 'sysctl-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (23 commits)
docs: Downgrade arm64 & riscv from titles to comment
docs: Replace spaces with tabs in check-sysctl-docs
docs: Remove colon from ctltable title in vm.rst
docs: Add awk section for ucount sysctl entries
docs: Use skiplist when checking sysctl admin-guide
docs: nixify check-sysctl-docs
sysctl: rename kern_table -> sysctl_subsys_table
kernel/sys.c: Move overflow{uid,gid} sysctl into kernel/sys.c
uevent: mv uevent_helper into kobject_uevent.c
sysctl: Removed unused variable
sysctl: Nixify sysctl.sh
sysctl: Remove superfluous includes from kernel/sysctl.c
sysctl: Remove (very) old file changelog
sysctl: Move sysctl_panic_on_stackoverflow to kernel/panic.c
sysctl: move cad_pid into kernel/pid.c
sysctl: Move tainted ctl_table into kernel/panic.c
Input: sysrq: mv sysrq into drivers/tty/sysrq.c
fork: mv threads-max into kernel/fork.c
parisc/power: Move soft-power into power.c
mm: move randomize_va_space into memory.c
...
- DEBUGFS
- Remove unneeded debugfs_file_{get,put}() instances
- Remove last remnants of debugfs_real_fops()
- Allow storing non-const void * in struct debugfs_inode_info::aux
- SYSFS
- Switch back to attribute_group::bin_attrs (treewide)
- Switch back to bin_attribute::read()/write() (treewide)
- Constify internal references to 'struct bin_attribute'
- Support cache-ids for device-tree systems
- Add arch hook arch_compact_of_hwid()
- Use arch_compact_of_hwid() to compact MPIDR values on arm64
- Rust
- Device
- Introduce CoreInternal device context (for bus internal methods)
- Provide generic drvdata accessors for bus devices
- Provide Driver::unbind() callbacks
- Use the infrastructure above for auxiliary, PCI and platform
- Implement Device::as_bound()
- Rename Device::as_ref() to Device::from_raw() (treewide)
- Implement fwnode and device property abstractions
- Implement example usage in the Rust platform sample driver
- Devres
- Remove the inner reference count (Arc) and use pin-init instead
- Replace Devres::new_foreign_owned() with devres::register()
- Require T to be Send in Devres<T>
- Initialize the data kept inside a Devres last
- Provide an accessor for the Devres associated Device
- Device ID
- Add support for ACPI device IDs and driver match tables
- Split up generic device ID infrastructure
- Use generic device ID infrastructure in net::phy
- DMA
- Implement the dma::Device trait
- Add DMA mask accessors to dma::Device
- Implement dma::Device for PCI and platform devices
- Use DMA masks from the DMA sample module
- I/O
- Implement abstraction for resource regions (struct resource)
- Implement resource-based ioremap() abstractions
- Provide platform device accessors for I/O (remap) requests
- Misc
- Support fallible PinInit types in Revocable
- Implement Wrapper<T> for Opaque<T>
- Merge pin-init blanket dependencies (for Devres)
- Misc
- Fix OF node leak in auxiliary_device_create()
- Use util macros in device property iterators
- Improve kobject sample code
- Add device_link_test() for testing device link flags
- Fix typo in Documentation/ABI/testing/sysfs-kernel-address_bits
- Hint to prefer container_of_const() over container_of()
-----BEGIN PGP SIGNATURE-----
iHQEABYKAB0WIQS2q/xV6QjXAdC7k+1FlHeO1qrKLgUCaIjkhwAKCRBFlHeO1qrK
LpXuAP9RWwfD9ZGgQZ9OsMk/0pZ2mDclaK97jcmI9TAeSxeZMgD1FHnOMTY7oSIi
iG7Muq0yLD+A5gk9HUnMUnFNrngWCg==
=jgRj
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core updates from Danilo Krummrich:
"debugfs:
- Remove unneeded debugfs_file_{get,put}() instances
- Remove last remnants of debugfs_real_fops()
- Allow storing non-const void * in struct debugfs_inode_info::aux
sysfs:
- Switch back to attribute_group::bin_attrs (treewide)
- Switch back to bin_attribute::read()/write() (treewide)
- Constify internal references to 'struct bin_attribute'
Support cache-ids for device-tree systems:
- Add arch hook arch_compact_of_hwid()
- Use arch_compact_of_hwid() to compact MPIDR values on arm64
Rust:
- Device:
- Introduce CoreInternal device context (for bus internal methods)
- Provide generic drvdata accessors for bus devices
- Provide Driver::unbind() callbacks
- Use the infrastructure above for auxiliary, PCI and platform
- Implement Device::as_bound()
- Rename Device::as_ref() to Device::from_raw() (treewide)
- Implement fwnode and device property abstractions
- Implement example usage in the Rust platform sample driver
- Devres:
- Remove the inner reference count (Arc) and use pin-init instead
- Replace Devres::new_foreign_owned() with devres::register()
- Require T to be Send in Devres<T>
- Initialize the data kept inside a Devres last
- Provide an accessor for the Devres associated Device
- Device ID:
- Add support for ACPI device IDs and driver match tables
- Split up generic device ID infrastructure
- Use generic device ID infrastructure in net::phy
- DMA:
- Implement the dma::Device trait
- Add DMA mask accessors to dma::Device
- Implement dma::Device for PCI and platform devices
- Use DMA masks from the DMA sample module
- I/O:
- Implement abstraction for resource regions (struct resource)
- Implement resource-based ioremap() abstractions
- Provide platform device accessors for I/O (remap) requests
- Misc:
- Support fallible PinInit types in Revocable
- Implement Wrapper<T> for Opaque<T>
- Merge pin-init blanket dependencies (for Devres)
Misc:
- Fix OF node leak in auxiliary_device_create()
- Use util macros in device property iterators
- Improve kobject sample code
- Add device_link_test() for testing device link flags
- Fix typo in Documentation/ABI/testing/sysfs-kernel-address_bits
- Hint to prefer container_of_const() over container_of()"
* tag 'driver-core-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (84 commits)
rust: io: fix broken intra-doc links to `platform::Device`
rust: io: fix broken intra-doc link to missing `flags` module
rust: io: mem: enable IoRequest doc-tests
rust: platform: add resource accessors
rust: io: mem: add a generic iomem abstraction
rust: io: add resource abstraction
rust: samples: dma: set DMA mask
rust: platform: implement the `dma::Device` trait
rust: pci: implement the `dma::Device` trait
rust: dma: add DMA addressing capabilities
rust: dma: implement `dma::Device` trait
rust: net::phy Change module_phy_driver macro to use module_device_table macro
rust: net::phy represent DeviceId as transparent wrapper over mdio_device_id
rust: device_id: split out index support into a separate trait
device: rust: rename Device::as_ref() to Device::from_raw()
arm64: cacheinfo: Provide helper to compress MPIDR value into u32
cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id
cacheinfo: Set cache 'id' based on DT data
container_of: Document container_of() is not to be used in new code
driver core: auxiliary bus: fix OF node leak
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmiD0Y4UHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNc3RAAnLjNithviO7pal68D+4oZDvAhh7S
B1zH3hS9mcRZtDa0ZwdzxlzZm0qacpZZgqha3vAPCnke/nYlnCJa8Iz6XzRYw1Dy
dHVa9W/F53KZgUIaI3FIuMN9aaO8bjZFJLHMtjZyL+DjmIoOfDBE1b1wdgqDFOI6
bx2czQaRP/3dP9xFRlTxDviMJXBajonvUZaOOTd/2kHaHRNSSjEja1zE0E4CVKTd
6eMT3Oj1wjVJWIVgGd5BkuXTa1aSWNu2qYrigL8572VeIllv9yZQvJEaypmmicTD
XL7aAdYYitTPOL9KGxANmCOfKfan2JYDPqWBhm+UjlKXq4PYbw0zgEYniJ6wmYkq
LDKOc+Is3QbkPCxbIxu11pugcIUlI8Kzy43bd2SwrpVqXAB378eFqmUkLEohSxaj
TVnW/r+GMh3dx8Rsig0XenhlKvJLKJrPHHRgJ2Kob4aPf2917VX/HRngqwJ6V8C4
p4jhcky8f5/XwcBmZBt8xYuHDY7QCe1U9dwYjjrm6szzDnPrMv1lupxn55C/Bton
wdlNHk/2199aguWkSAchJlPXE64eam+G88KGjxxCW5jQmbQU8myOrqipf1lxavyp
sDlve5f1mhgeV6ZoZzJkiRUKSz5VvMKoiook8hfyj3rRtzh1Kz5J0ojFaPH46keA
ixmmkpn6Ydw7yV8=
=fLqm
-----END PGP SIGNATURE-----
Merge tag 'audit-pr-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit update from Paul Moore:
"A single audit patch that restores logging of an audit event in the
module load failure case"
* tag 'audit-pr-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit,module: restore audit logging in load failure case
Move module sysctl (modprobe_path and modules_disabled) out of sysctl.c
and into the modules subsystem. Make modules_disabled static as it no
longer needs to be exported. Remove module.h from the includes in sysctl
as it no longer uses any module exported variables.
This is part of a greater effort to move ctl tables into their
respective subsystems which will reduce the merge conflicts in
kernel/sysctl.c.
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
Ftrace is tightly coupled with architecture specific code because it
requires the use of trampolines written in assembly. This means that when
a new feature or optimization is made, it must be done for all
architectures. To simplify the approach, CONFIG_HAVE_FTRACE_* configs are
added to denote which architecture has the new enhancement so that other
architectures can still function until they too have been updated.
The CONFIG_HAVE_FTRACE_MCOUNT was added to help simplify the
DYNAMIC_FTRACE work, but now every architecture that implements
DYNAMIC_FTRACE also has HAVE_FTRACE_MCOUNT set too, making it redundant
with the HAVE_DYNAMIC_FTRACE.
Remove the HAVE_FTRACE_MCOUNT config and use DYNAMIC_FTRACE directly where
applicable.
Link: https://lore.kernel.org/all/20250703154916.48e3ada7@gandalf.local.home/
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20250704104838.27a18690@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The per-CPU data section is handled differently than the other sections.
The memory allocations requires a special __percpu pointer and then the
section is copied into the view of each CPU. Therefore the SHF_ALLOC
flag is removed to ensure move_module() skips it.
Later, relocations are applied and apply_relocations() skips sections
without SHF_ALLOC because they have not been copied. This also skips the
per-CPU data section.
The missing relocations result in a NULL pointer on x86-64 and very
small values on x86-32. This results in a crash because it is not
skipped like NULL pointer would and can't be dereferenced.
Such an assignment happens during static per-CPU lock initialisation
with lockdep enabled.
Allow relocation processing for the per-CPU section even if SHF_ALLOC is
missing.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202506041623.e45e4f7d-lkp@intel.com
Fixes: 1a6100caae425 ("Don't relocate non-allocated regions in modules.") #v2.6.1-rc3
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Link: https://lore.kernel.org/r/20250610163328.URcsSUC1@linutronix.de
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Message-ID: <20250610163328.URcsSUC1@linutronix.de>
All error conditions in move_module() set the return value by updating the
ret variable. Therefore, it is not necessary to the initialize the variable
when declaring it.
Remove the unnecessary initialization.
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Daniel Gomez <da.gomez@samsung.com>
Link: https://lore.kernel.org/r/20250618122730.51324-3-petr.pavlu@suse.com
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Message-ID: <20250618122730.51324-3-petr.pavlu@suse.com>
The function move_module() uses the variable t to track how many memory
types it has allocated and consequently how many should be freed if an
error occurs.
The variable is initially set to 0 and is updated when a call to
module_memory_alloc() fails. However, move_module() can fail for other
reasons as well, in which case t remains set to 0 and no memory is freed.
Fix the problem by initializing t to MOD_MEM_NUM_TYPES. Additionally, make
the deallocation loop more robust by not relying on the mod_mem_type_t enum
having a signed integer as its underlying type.
Fixes: c7ee8aebf6 ("module: add stop-grap sanity check on module memcpy()")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Daniel Gomez <da.gomez@samsung.com>
Link: https://lore.kernel.org/r/20250618122730.51324-2-petr.pavlu@suse.com
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Message-ID: <20250618122730.51324-2-petr.pavlu@suse.com>
The move of the module sanity check to earlier skipped the audit logging
call in the case of failure and to a place where the previously used
context is unavailable.
Add an audit logging call for the module loading failure case and get
the module name when possible.
Link: https://issues.redhat.com/browse/RHEL-52839
Fixes: 02da2cbab4 ("module: move check_modinfo() early to early_mod_check()")
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
- Add support for the EXPORT_SYMBOL_GPL_FOR_MODULES() macro, which exports a
symbol only to specified modules
- Improve ABI handling in gendwarfksyms
- Forcibly link lib-y objects to vmlinux even if CONFIG_MODULES=n
- Add checkers for redundant or missing <linux/export.h> inclusion
- Deprecate the extra-y syntax
- Fix a genksyms bug when including enum constants from *.symref files
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmhEZc4VHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGVAgQAKLRdBGga1kBJJFIkUOHWC5+g/je
U/dO5rGnuOLviWDexC6QT8AQV2N+dQXhB11x+KacSu1bwowsEvwuegtA6VqwbETs
tyWmB0PftEzVyPfc+Rjfy0LDfKkiKkm4RhXiMwcem/rlw45gvJXrVU7jJin9fI3A
So8glpOAX+mEizUHkjZkS51nkYCZFDsn7hVo0X43vqjeFrrFGLEQ5xas4Ci+dkY3
9g8Q5bFL8CC5PHjSO8wFftCcAWwTukAht6CSSb522MKGnCVZ9RxTmRwEPXrBmXtS
5eWa8yg6y0tFVmot8iwZGBYleAWDNsj0a2j2oVjUN+EF91sk3WQApJVNBok/nQFb
4MgO3N3UXZdy4tYkBX8tMgOcGkfjZAFoNxSUm5oVouh9NyT0dpqYHhJHBNVbVJoF
igQWeVOYcioDjeU1iXnP2cw64q44ROfxmOpDxOSRz9PTM6CCya1R0m/zzBLV6Lwk
rzlXk1LLf+jIfgmS5RLlkCgrXS1U0vNGXxQH9Ui9dZSEtzdU7qt5WQ/Rz44bEBhS
OeIlJfMMx6QYJztJc/BaUjkKsutTkII52QctRbRCj/nKswHd8SnHV+xk1c2WPxrg
yKq10rPpdg1BcvmODY6cmcndt7ogDRfkogm2gvGQIBZEglRimpmpg51sZQRD0ueE
0rt12TmktsLbglB4
=Dy49
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Add support for the EXPORT_SYMBOL_GPL_FOR_MODULES() macro, which
exports a symbol only to specified modules
- Improve ABI handling in gendwarfksyms
- Forcibly link lib-y objects to vmlinux even if CONFIG_MODULES=n
- Add checkers for redundant or missing <linux/export.h> inclusion
- Deprecate the extra-y syntax
- Fix a genksyms bug when including enum constants from *.symref files
* tag 'kbuild-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (28 commits)
genksyms: Fix enum consts from a reference affecting new values
arch: use always-$(KBUILD_BUILTIN) for vmlinux.lds
kbuild: set y instead of 1 to KBUILD_{BUILTIN,MODULES}
efi/libstub: use 'targets' instead of extra-y in Makefile
module: make __mod_device_table__* symbols static
scripts/misc-check: check unnecessary #include <linux/export.h> when W=1
scripts/misc-check: check missing #include <linux/export.h> when W=1
scripts/misc-check: add double-quotes to satisfy shellcheck
kbuild: move W=1 check for scripts/misc-check to top-level Makefile
scripts/tags.sh: allow to use alternative ctags implementation
kconfig: introduce menu type enum
docs: symbol-namespaces: fix reST warning with literal block
kbuild: link lib-y objects to vmlinux forcibly even when CONFIG_MODULES=n
tinyconfig: enable CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
docs/core-api/symbol-namespaces: drop table of contents and section numbering
modpost: check forbidden MODULE_IMPORT_NS("module:") at compile time
kbuild: move kbuild syntax processing to scripts/Makefile.build
Makefile: remove dependency on archscripts for header installation
Documentation/kbuild: Add new gendwarfksyms kABI rules
Documentation/kbuild: Drop section numbers
...
Failures inside codetag_load_module() are currently ignored. As a result
an error there would not cause a module load failure and freeing of the
associated resources. Correct this behavior by propagating the error code
to the caller and handling possible errors. With this change, error to
allocate percpu counters, which happens at this stage, will not be ignored
and will cause a module load failure and freeing of resources. With this
change we also do not need to disable memory allocation profiling when
this error happens, instead we fail to load the module.
Link: https://lkml.kernel.org/r/20250521160602.1940771-1-surenb@google.com
Fixes: 10075262888b ("alloc_tag: allocate percpu counters for module tags dynamically")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: Casey Chen <cachen@purestorage.com>
Closes: https://lore.kernel.org/all/20250520231620.15259-1-cachen@purestorage.com/
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: David Wang <00107082@163.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Luis Chamberalin <mcgrof@kernel.org>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- Make .static_call_sites in modules read-only after init
The .static_call_sites sections in modules have been made read-only after
init to avoid any (non-)accidental modifications, similarly to how they
are read-only after init in vmlinux.
- The rest are minor cleanups.
The changes have been on linux-next for 2 months, with the exception of the
last comment-only cleanup.
As discussed previously, we rotate module maintainership among its
co-maintainers every 6 months. Daniel Gomez is next in line and he will
send the next pull request for the modules.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEEIduBR9MnFA82q/jtumpXJwqY6poFAmg9tSoUHHBldHIucGF2
bHVAc3VzZS5jb20ACgkQumpXJwqY6poY/gf9GgexhRSNp10orTltOFD9+c+g6ekn
q4UiLK7ooz4mFqPN4ljrSxvF1IYkjgrkkPDDEPZEZl2bQ7yWFb1xO9csihZeSI0l
Bf5m+Pxo+U2ylNr6wEsOS7P5GntraTkLJ4NHje/bYmIQMAjiowniTF3693FYs51d
bCD4Xn0zzZi+eekoAi/34GL5Zzkk5jhr2nBBvUZ3mXrRlR/iNpwUd4IWxb1Ujbfh
9YjAAGbeBowHzsyfPKoccJ/ybosrUKGqpVrp4KPe0JVoCvw8nwkxAfHGhsjjVk10
G0l4eCZ8KW87rPoJw5HvmTTNIxM10xKuZ4fa67/9ou+kHWRzMSa6ub4Jiw==
=JU3a
-----END PGP SIGNATURE-----
Merge tag 'modules-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux
Pull module updates from Petr Pavlu:
- Make .static_call_sites in modules read-only after init
The .static_call_sites sections in modules have been made read-only
after init to avoid any (non-)accidental modifications, similarly to
how they are read-only after init in vmlinux
- The rest are minor cleanups
* tag 'modules-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux:
module: Remove outdated comment about text_size
module: Make .static_call_sites read-only after init
module: Add a separate function to mark sections as read-only after init
module: Constify parameters of module_enforce_rwx_sections()
or aren't considered necessary for -stable kernels. 19 are for MM.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaDLNqwAKCRDdBJ7gKXxA
juanAQD4aZn7ACTpbIgDIlLVJouq6OOHEYye9hhxz19UN2mAUgEAn8jPqvBDav3S
HxjMFSdgLUQVO03FCs9tpNJchi69nw0=
=R3UI
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2025-05-25-00-58' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"22 hotfixes.
13 are cc:stable and the remainder address post-6.14 issues or aren't
considered necessary for -stable kernels. 19 are for MM"
* tag 'mm-hotfixes-stable-2025-05-25-00-58' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits)
mailmap: add Jarkko's employer email address
mm: fix copy_vma() error handling for hugetlb mappings
memcg: always call cond_resched() after fn()
mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
mm: vmalloc: only zero-init on vrealloc shrink
mm: vmalloc: actually use the in-place vrealloc region
alloc_tag: allocate percpu counters for module tags dynamically
module: release codetag section when module load fails
mm/cma: make detection of highmem_start more robust
MAINTAINERS: add mm memory policy section
MAINTAINERS: add mm ksm section
kasan: avoid sleepable page allocation from atomic context
highmem: add folio_test_partial_kmap()
MAINTAINERS: add hung-task detector section
taskstats: fix struct taskstats breaks backward compatibility since version 15
mm/truncate: fix out-of-bounds when doing a right-aligned split
MAINTAINERS: add mm reclaim section
MAINTAINERS: update page allocator section
mm: fix VM_UFFD_MINOR == VM_SHADOW_STACK on USERFAULTFD=y && ARM64_GCS=y
mm: mmap: map MAP_STACK to VM_NOHUGEPAGE only if THP is enabled
...
Sean noted that scripts/Makefile.lib:name-fix-token rule will mangle
the module name with s/-/_/g.
Since this happens late in the build, only the kernel needs to bother
with this, the modpost tool still sees the original name.
Reported-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Instead of only accepting "module:${name}", extend it with a comma
separated list of module names and add tail glob support.
That is, something like: "module:foo-*,bar" is now possible.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Designate the "module:${modname}" symbol namespace to mean: 'only
export to the named module'.
Notably, explicit imports of anything in the "module:" space is
forbidden.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
When module load fails after memory for codetag section is ready, codetag
section memory will not be properly released. This causes memory leak,
and if next module load happens to get the same module address, codetag
may pick the uninitialized section when manipulating tags during module
unload, and leads to "unable to handle page fault" BUG.
Link: https://lkml.kernel.org/r/20250519163823.7540-1-00107082@163.com
Fixes: 0db6f8d782 ("alloc_tag: load module tags into separate contiguous memory")
Closes: https://lore.kernel.org/all/20250516131246.6244-1-00107082@163.com/
Signed-off-by: David Wang <00107082@163.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The text_size bit referred to by the comment has been removed as of commit
ac3b432839 ("module: replace module_layout with module_memory")
and is thus no longer relevant. Remove it and comment about the contents of
the masks array instead.
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Link: https://lore.kernel.org/r/20250429113242.998312-23-vschneid@redhat.com
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Section .static_call_sites holds data structures that need to be sorted and
processed only at module load time. This initial processing happens in
static_call_add_module(), which is invoked as a callback to the
MODULE_STATE_COMING notification from prepare_coming_module().
The section is never modified afterwards. Make it therefore read-only after
module initialization to avoid any (non-)accidental modifications.
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20250306131430.7016-4-petr.pavlu@suse.com
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Move the logic to mark special sections as read-only after module
initialization into a separate function, along other related code in
strict_rwx.c. Use a table with names of such sections to make it easier to
add more.
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20250306131430.7016-3-petr.pavlu@suse.com
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Minor cleanup, this is a non-functional change.
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20250306131430.7016-2-petr.pavlu@suse.com
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
With CONFIG_GENDWARFKSYMS, __gendwarfksyms_ptr variables are
added to the kernel in EXPORT_SYMBOL() to ensure DWARF type
information is available for exported symbols in the TUs where
they're actually exported. These symbols are dropped when linking
vmlinux, but dangling references to them remain in DWARF.
With CONFIG_DEBUG_INFO_BTF enabled on X86, pahole versions after
commit 47dcb534e253 ("btf_encoder: Stop indexing symbols for
VARs") and before commit 9810758003ce ("btf_encoder: Verify 0
address DWARF variables are in ELF section") place these symbols
in the .data..percpu section, which results in an "Invalid
offset" error in btf_datasec_check_meta() during boot, as all
the variables are at zero offset and have non-zero size. If
CONFIG_DEBUG_INFO_BTF_MODULES is enabled, this also results in a
failure to load modules with:
failed to validate module [$module] BTF: -22
As the issue occurs in pahole v1.28 and the fix was merged
after v1.29 was released, require pahole <v1.28 or >v1.29 when
GENDWARFKSYMS is enabled with DEBUG_INFO_BTF on X86.
Reported-by: Paolo Pisati <paolo.pisati@canonical.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
- Restructure the persistent memory to have a "scratch" area
Instead of hard coding the KASLR offset in the persistent memory
by the ring buffer, push that work up to the callers of the persistent
memory as they are the ones that need this information. The offsets
and such is not important to the ring buffer logic and it should
not be part of that.
A scratch pad is now created when the caller allocates a ring buffer
from persistent memory by stating how much memory it needs to save.
- Allow where modules are loaded to be saved in the new scratch pad
Save the addresses of modules when they are loaded into the persistent
memory scratch pad.
- A new module_for_each_mod() helper function was created
With the acknowledgement of the module maintainers a new module helper
function was created to iterate over all the currently loaded modules.
This has a callback to be called for each module. This is needed for
when tracing is started in the persistent buffer and the currently loaded
modules need to be saved in the scratch area.
- Expose the last boot information where the kernel and modules were loaded
The last_boot_info file is updated to print out the addresses of where
the kernel "_text" location was loaded from a previous boot, as well
as where the modules are loaded. If the buffer is recording the current
boot, it only prints "# Current" so that it does not expose the KASLR
offset of the currently running kernel.
- Allow the persistent ring buffer to be released (freed)
To have this in production environments, where the kernel command line can
not be changed easily, the ring buffer needs to be freed when it is not
going to be used. The memory for the buffer will always be allocated at
boot up, but if the system isn't going to enable tracing, the memory needs
to be freed. Allow it to be freed and added back to the kernel memory
pool.
- Allow stack traces to print the function names in the persistent buffer
Now that the modules are saved in the persistent ring buffer, if the same
modules are loaded, the printing of the function names will examine the
saved modules. If the module is found in the scratch area and is also
loaded, then it will do the offset shift and use kallsyms to display the
function name. If the address is not found, it simply displays the address
from the previous boot in hex.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+cUERQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qrAsAQCFt2nfzxoe3wtF5EqIT1VHp/8bQVjG
gBe8B6ouboreogD/dS7yK8MRy24ZAmObGwYG0RbVicd50S7P8Rf7+823ng8=
=OJKk
-----END PGP SIGNATURE-----
Merge tag 'trace-ringbuffer-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ring-buffer updates from Steven Rostedt:
- Restructure the persistent memory to have a "scratch" area
Instead of hard coding the KASLR offset in the persistent memory by
the ring buffer, push that work up to the callers of the persistent
memory as they are the ones that need this information. The offsets
and such is not important to the ring buffer logic and it should not
be part of that.
A scratch pad is now created when the caller allocates a ring buffer
from persistent memory by stating how much memory it needs to save.
- Allow where modules are loaded to be saved in the new scratch pad
Save the addresses of modules when they are loaded into the
persistent memory scratch pad.
- A new module_for_each_mod() helper function was created
With the acknowledgement of the module maintainers a new module
helper function was created to iterate over all the currently loaded
modules. This has a callback to be called for each module. This is
needed for when tracing is started in the persistent buffer and the
currently loaded modules need to be saved in the scratch area.
- Expose the last boot information where the kernel and modules were
loaded
The last_boot_info file is updated to print out the addresses of
where the kernel "_text" location was loaded from a previous boot, as
well as where the modules are loaded. If the buffer is recording the
current boot, it only prints "# Current" so that it does not expose
the KASLR offset of the currently running kernel.
- Allow the persistent ring buffer to be released (freed)
To have this in production environments, where the kernel command
line can not be changed easily, the ring buffer needs to be freed
when it is not going to be used. The memory for the buffer will
always be allocated at boot up, but if the system isn't going to
enable tracing, the memory needs to be freed. Allow it to be freed
and added back to the kernel memory pool.
- Allow stack traces to print the function names in the persistent
buffer
Now that the modules are saved in the persistent ring buffer, if the
same modules are loaded, the printing of the function names will
examine the saved modules. If the module is found in the scratch area
and is also loaded, then it will do the offset shift and use kallsyms
to display the function name. If the address is not found, it simply
displays the address from the previous boot in hex.
* tag 'trace-ringbuffer-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Use _text and the kernel offset in last_boot_info
tracing: Show last module text symbols in the stacktrace
ring-buffer: Remove the unused variable bmeta
tracing: Skip update_last_data() if cleared and remove active check for save_mod()
tracing: Initialize scratch_size to zero to prevent UB
tracing: Fix a compilation error without CONFIG_MODULES
tracing: Freeable reserved ring buffer
mm/memblock: Add reserved memory release function
tracing: Update modules to persistent instances when loaded
tracing: Show module names and addresses of last boot
tracing: Have persistent trace instances save module addresses
module: Add module_for_each_mod() function
tracing: Have persistent trace instances save KASLR offset
ring-buffer: Add ring_buffer_meta_scratch()
ring-buffer: Add buffer meta data for persistent ring buffer
ring-buffer: Use kaslr address instead of text delta
ring-buffer: Fix bytes_dropped calculation issue
- Use RCU instead of RCU-sched
The mix of rcu_read_lock(), rcu_read_lock_sched() and preempt_disable()
in the module code and its users has been replaced with just
rcu_read_lock().
- The rest of changes are smaller fixes and updates.
The changes have been on linux-next for at least 2 weeks, with the RCU
cleanup present for 2 months. One performance problem was reported with the
RCU change when KASAN + lockdep were enabled, but it was effectively
addressed by the already merged ee57ab5a32 ("locking/lockdep: Disable
KASAN instrumentation of lockdep.c").
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEEIduBR9MnFA82q/jtumpXJwqY6poFAmfmwrsUHHBldHIucGF2
bHVAc3VzZS5jb20ACgkQumpXJwqY6prWxgf/S7Pvdywm10vJ6fooYa+GxXNMwhyh
XRjZ4m9gjeTNf2KLwX0XHv0XZeFHOmHfjd3iI+pS6CXZnCFTN9J3XPLYsrTxXUb6
U6zzLf8Zsz8TzeI4dgvSBsZln7oICSACkAgdJCq23hpNKeaeRo91dgiZaIwyZJG3
FekqSFtP7pYhfFoNkrFKysqbgl1+RWWZ79L2qRJA0bPzVFlvRUuh6cOHQw+8RMqf
BYLwnArjTkW8AcXpxIXSiwphDHVZ81B96xoplavyoprA5FDpv1W+8y4DtxdWFn+1
QVWCs/ZV3KrwXWpZev625w3fIOOIXILqRINOzLfvXTw+1xFS3TzSQEpVeg==
=4OKc
-----END PGP SIGNATURE-----
Merge tag 'modules-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux
Pull modules updates from Petr Pavlu:
- Use RCU instead of RCU-sched
The mix of rcu_read_lock(), rcu_read_lock_sched() and
preempt_disable() in the module code and its users has
been replaced with just rcu_read_lock()
- The rest of changes are smaller fixes and updates
* tag 'modules-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux: (32 commits)
MAINTAINERS: Update the MODULE SUPPORT section
module: Remove unnecessary size argument when calling strscpy()
module: Replace deprecated strncpy() with strscpy()
params: Annotate struct module_param_attrs with __counted_by()
bug: Use RCU instead RCU-sched to protect module_bug_list.
static_call: Use RCU in all users of __module_text_address().
kprobes: Use RCU in all users of __module_text_address().
bpf: Use RCU in all users of __module_text_address().
jump_label: Use RCU in all users of __module_text_address().
jump_label: Use RCU in all users of __module_address().
x86: Use RCU in all users of __module_address().
cfi: Use RCU while invoking __module_address().
powerpc/ftrace: Use RCU in all users of __module_text_address().
LoongArch: ftrace: Use RCU in all users of __module_text_address().
LoongArch/orc: Use RCU in all users of __module_address().
arm64: module: Use RCU in all users of __module_text_address().
ARM: module: Use RCU in all users of __module_text_address().
module: Use RCU in all users of __module_text_address().
module: Use RCU in all users of __module_address().
module: Use RCU in search_module_extables().
...
The tracing system needs a way to save all the currently loaded modules
and their addresses into persistent memory so that it can evaluate the
addresses on a reboot from a crash. When the persistent memory trace
starts, it will load the module addresses and names into the persistent
memory. To do so, it will call the module_for_each_mod() function and pass
it a function and data structure to get called on each loaded module. Then
it can record the memory.
This only implements that function.
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: linux-modules@vger.kernel.org
Link: https://lore.kernel.org/20250305164608.962615966@goodmis.org
Acked-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The size parameter is optional and strscpy() automatically determines
the length of the destination buffer using sizeof() if the argument is
omitted. This makes the explicit sizeof() unnecessary. Remove it to
shorten and simplify the code.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://lore.kernel.org/r/20250308194631.191670-2-thorsten.blum@linux.dev
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
__module_text_address() can be invoked within a RCU section, there is no
requirement to have preemption disabled.
Replace the preempt_disable() section around __module_text_address()
with RCU.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250108090457.512198-16-bigeasy@linutronix.de
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
__module_address() can be invoked within a RCU section, there is no
requirement to have preemption disabled.
Replace the preempt_disable() section around __module_address() with
RCU.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250108090457.512198-15-bigeasy@linutronix.de
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
search_module_extables() returns an exception_table_entry belonging to a
module. The lookup via __module_address() can be performed with RCU
protection.
The returned exception_table_entry remains valid because the passed
address usually belongs to a module that is currently executed. So the
module can not be removed because "something else" holds a reference to
it, ensuring that it can not be removed.
Exceptions here are:
- kprobe, acquires a reference on the module beforehand
- MCE, invokes the function from within a timer and the RCU lifetime
guarantees (of the timer) are sufficient.
Therefore it is safe to return the exception_table_entry outside the RCU
section which provided the module.
Use RCU for the lookup in search_module_extables() and update the
comment.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250108090457.512198-14-bigeasy@linutronix.de
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
mod_find() uses either the modules list to find a module or a tree
lookup (CONFIG_MODULES_TREE_LOOKUP). The list and the tree can both be
iterated under RCU assumption (as well as RCU-sched).
Remove module_assert_mutex_or_preempt() from __module_address() and
entirely since __module_address() is the last user.
Update comments.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250108090457.512198-13-bigeasy@linutronix.de
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
The modules list can be accessed under RCU assumption.
Use RCU protection instead preempt_disable().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250108090457.512198-12-bigeasy@linutronix.de
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
module_assert_mutex_or_preempt() is not needed in find_symbol(). The
function checks for RCU-sched or the module_mutex to be acquired. The
list_for_each_entry_rcu() below does the same check.
Remove module_assert_mutex_or_preempt() from try_add_tainted_module().
Use RCU protection to invoke find_symbol() and update callers.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250108090457.512198-11-bigeasy@linutronix.de
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
module_assert_mutex_or_preempt() is not needed in
try_add_tainted_module(). The function checks for RCU-sched or the
module_mutex to be acquired. The list_for_each_entry_rcu() below does
the same check.
Remove module_assert_mutex_or_preempt() from try_add_tainted_module().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250108090457.512198-10-bigeasy@linutronix.de
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
module::kallsyms can be accessed under RCU assumption.
Use rcu_dereference() to access module::kallsyms.
Update callers.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250108090457.512198-9-bigeasy@linutronix.de
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
module::kallsyms can be accessed under RCU assumption.
Use rcu_dereference() to access module::kallsyms.
Update callers.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250108090457.512198-8-bigeasy@linutronix.de
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
The modules list and module::kallsyms can be accessed under RCU
assumption.
Remove module_assert_mutex_or_preempt() from find_module_all() so it can
be used under RCU protection without warnings. Update its callers to use
RCU protection instead of preempt_disable().
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-trace-kernel@vger.kernel.org
Cc: live-patching@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20250108090457.512198-7-bigeasy@linutronix.de
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
The modules list and module::kallsyms can be accessed under RCU
assumption.
Iterate the modules with RCU protection, use rcu_dereference() to access
the kallsyms pointer.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250108090457.512198-6-bigeasy@linutronix.de
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
The modules list and module::kallsyms can be accessed under RCU
assumption.
Use rcu_dereference() to reference the kallsyms pointer in
find_kallsyms_symbol(). Use a RCU section instead of preempt_disable in
callers of find_kallsyms_symbol(). Keep the preempt-disable in
module_address_lookup() due to __module_address().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250108090457.512198-5-bigeasy@linutronix.de
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
add_kallsyms() assigns the RCU pointer module::kallsyms and setups the
structures behind it which point to init-data. The module was not
published yet, nothing can see the kallsyms pointer and the data behind
it. Also module's init function was not yet invoked.
There is no need to use rcu_dereference() here, it is just to keep
checkers quiet. The whole RCU read section is also not needed.
Use a local kallsyms pointer and setup the data structures. Assign that
pointer to the data structure at the end via rcu_assign_pointer().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250108090457.512198-4-bigeasy@linutronix.de
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
The RCU usage in module was introduced in commit d72b37513c ("Remove
stop_machine during module load v2") and it claimed not to be RCU but
similar. Then there was another improvement in commit e91defa26c
("module: don't use stop_machine on module load"). It become a mix of
RCU and RCU-sched and was eventually fixed 0be964be0d ("module:
Sanitize RCU usage and locking"). Later RCU & RCU-sched was merged in
commit cb2f55369d ("modules: Replace synchronize_sched() and
call_rcu_sched()") so that was aligned.
Looking at it today, there is still leftovers. The preempt_disable() was
used instead rcu_read_lock_sched(). The RCU & RCU-sched merge was not
complete as there is still rcu_dereference_sched() for module::kallsyms.
The RCU-list modules and unloaded_tainted_modules are always accessed
under RCU protection or the module_mutex. The modules list iteration can
always happen safely because the module will not disappear.
Once the module is removed (free_module()) then after removing the
module from the list, there is a synchronize_rcu() which waits until
every RCU reader left the section. That means iterating over the list
within a RCU-read section is enough, there is no need to disable
preemption. module::kallsyms is first assigned in add_kallsyms() before
the module is added to the list. At this point, it points to init data.
This pointer is later updated and before the init code is removed there
is also synchronize_rcu() in do_free_init(). That means A RCU read lock
is enough for protection and rcu_dereference() can be safely used.
Convert module code and its users step by step. Update comments and
convert print_modules() to use RCU.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250108090457.512198-3-bigeasy@linutronix.de
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
The ROX memory allocations are part of a larger vmalloc allocation and
annotating them with kmemleak_not_leak() confuses kmemleak.
Skip kmemleak_not_leak() annotations for the ROX areas.
Fixes: c287c07233 ("module: switch to execmem API for remapping as RW and restoring ROX")
Fixes: 64f6a4e10c ("x86: re-enable EXECMEM_ROX support")
Reported-by: "Borah, Chaitanya Kumar" <chaitanya.kumar.borah@intel.com>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250214084531.3299390-1-rppt@kernel.org