Commit Graph

100712 Commits

Author SHA1 Message Date
Stefan Metzmacher
b126645b79 smb: client: remove unused enum smbd_connection_status
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06 15:18:58 -05:00
Stefan Metzmacher
61b4918e4e smb: client: make use of smbdirect_socket.recv_io.reassembly.*
This will be used by the server too and will allow us to
create common helper functions.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06 15:04:14 -05:00
Stefan Metzmacher
b7ffb4d2a0 smb: smbdirect: introduce smbdirect_socket.recv_io.reassembly.*
This will be used in common between client and server soon.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06 15:04:13 -05:00
Stefan Metzmacher
5950045084 smb: client: make use of smb: smbdirect_socket.recv_io.free.{list,lock}
This will be used by the server too in order to have common
helper functions in future.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06 15:04:13 -05:00
Stefan Metzmacher
d0df32a302 smb: smbdirect: introduce smbdirect_socket.recv_io.free.{list,lock}
This will allow the list of free smbdirect_recv_io messages including
the spinlock to be in common between client and server in order
to split out common helper functions in future.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06 15:04:13 -05:00
Stefan Metzmacher
5dddf04974 smb: client: make use of struct smbdirect_recv_io
This is the shared structure that will be used in
the server too and will allow us to move helper functions
into common code soon.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06 15:04:13 -05:00
Stefan Metzmacher
60812d20da smb: smbdirect: introduce struct smbdirect_recv_io
This will be used in client and server soon
in order to replace smbd_response/smb_direct_recvmsg.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06 15:04:13 -05:00
Stefan Metzmacher
bbdbd9ae47 smb: client: make use of smbdirect_socket->recv_io.expected
The expected incoming message type can be per connection.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06 15:04:13 -05:00
Stefan Metzmacher
33dd53a90e smb: smbdirect: introduce smbdirect_socket.recv_io.expected
The expected message type can be global as they never change
during the after negotiation process.

This will replace smbd_response->type and smb_direct_recvmsg->type
in future.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06 15:04:13 -05:00
Stefan Metzmacher
0edf9fc0a3 smb: client: remove unused smbd_connection->fragment_reassembly_remaining
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06 15:04:13 -05:00
Stefan Metzmacher
24eff17887 smb: client: let recv_done() avoid touching data_transfer after cleanup/move
Calling enqueue_reassembly() and wake_up_interruptible(&info->wait_reassembly_queue)
or put_receive_buffer() means the response/data_transfer pointer might
get re-used by another thread, which means these should be
the last operations before calling return.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: f198186aa9 ("CIFS: SMBD: Establish SMB Direct connection")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06 15:04:13 -05:00
Stefan Metzmacher
bdd7afc6dc smb: client: let recv_done() cleanup before notifying the callers.
We should call put_receive_buffer() before waking up the callers.

For the internal error case of response->type being unexpected,
we now also call smbd_disconnect_rdma_connection() instead
of not waking up the callers at all.

Note that the SMBD_TRANSFER_DATA case still has problems,
which will be addressed in the next commit in order to make
it easier to review this one.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: f198186aa9 ("CIFS: SMBD: Establish SMB Direct connection")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06 15:04:13 -05:00
Stefan Metzmacher
047682c370 smb: client: make sure we call ib_dma_unmap_single() only if we called ib_dma_map_single already
In case of failures either ib_dma_map_single() might not be called yet
or ib_dma_unmap_single() was already called.

We should make sure put_receive_buffer() only calls
ib_dma_unmap_single() if needed.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: f198186aa9 ("CIFS: SMBD: Establish SMB Direct connection")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06 15:04:13 -05:00
Stefan Metzmacher
24b6afc36d smb: client: remove separate empty_packet_queue
There's no need to maintain two lists, we can just
have a single list of receive buffers, which are free to use.

It just added unneeded complexity and resulted in
ib_dma_unmap_single() not being called from recv_done()
for empty keepalive packets.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: f198186aa9 ("CIFS: SMBD: Establish SMB Direct connection")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06 15:04:13 -05:00
Stefan Metzmacher
5349ae5e05 smb: client: let send_done() cleanup before calling smbd_disconnect_rdma_connection()
We should call ib_dma_unmap_single() and mempool_free() before calling
smbd_disconnect_rdma_connection().

And smbd_disconnect_rdma_connection() needs to be the last function to
call as all other state might already be gone after it returns.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: f198186aa9 ("CIFS: SMBD: Establish SMB Direct connection")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06 15:04:13 -05:00
Yunseong Kim
b0b73329eb cifs: Fix null-ptr-deref by static initializing global lock
A kernel panic can be triggered by reading /proc/fs/cifs/debug_dirs.
The crash is a null-ptr-deref inside spin_lock(), caused by the use of the
uninitialized global spinlock cifs_tcp_ses_lock.

init_cifs()
 └── cifs_proc_init()
      └── // User can access /proc/fs/cifs/debug_dirs here
           └── cifs_debug_dirs_proc_show()
                └── spin_lock(&cifs_tcp_ses_lock); // Uninitialized!

KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
Mem abort info:
ESR = 0x0000000096000005
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x05: level 1 translation fault
Data abort info:
ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[dfff800000000000] address between user and kernel address ranges
Internal error: Oops: 0000000096000005 [#1] SMP
Modules linked in:
CPU: 3 UID: 0 PID: 16435 Comm: stress-ng-procf Not tainted 6.16.0-10385-g79f14b5d84c6 #37 PREEMPT
Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8ubuntu1 06/11/2025
pstate: 23400005 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : do_raw_spin_lock+0x84/0x2cc
lr : _raw_spin_lock+0x24/0x34
sp : ffff8000966477e0
x29: ffff800096647860 x28: ffff800096647b88 x27: ffff0001c0c22070
x26: ffff0003eb2b60c8 x25: ffff0001c0c22018 x24: dfff800000000000
x23: ffff0000f624e000 x22: ffff0003eb2b6020 x21: ffff0000f624e768
x20: 0000000000000004 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000000 x16: ffff8000804b9600 x15: ffff700012cc8f04
x14: 1ffff00012cc8f04 x13: 0000000000000004 x12: ffffffffffffffff
x11: 1ffff00012cc8f00 x10: ffff80008d9af0d2 x9 : f3f3f304f1f1f1f1
x8 : 0000000000000000 x7 : 7365733c203e6469 x6 : 20656572743c2023
x5 : ffff0000e0ce0044 x4 : ffff80008a4deb6e x3 : ffff8000804b9718
x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
do_raw_spin_lock+0x84/0x2cc (P)
_raw_spin_lock+0x24/0x34
cifs_debug_dirs_proc_show+0x1ac/0x4c0
seq_read_iter+0x3b0/0xc28
proc_reg_read_iter+0x178/0x2a8
vfs_read+0x5f8/0x88c
ksys_read+0x120/0x210
__arm64_sys_read+0x7c/0x90
invoke_syscall+0x98/0x2b8
el0_svc_common+0x130/0x23c
do_el0_svc+0x48/0x58
el0_svc+0x40/0x140
el0t_64_sync_handler+0x84/0x12c
el0t_64_sync+0x1ac/0x1b0
Code: aa0003f3 f9000feb f2fe7e69 f8386969 (38f86908)
---[ end trace 0000000000000000 ]---

The root cause is an initialization order problem. The lock is declared
as a global variable and intended to be initialized during module startup.
However, the procfs entry that uses this lock can be accessed by userspace
before the spin_lock_init() call has run. This creates a race window where
reading the proc file will attempt to use the lock before it is
initialized, leading to the crash.

For a global lock with a static lifetime, the correct and robust approach
is to use compile-time initialization.

Fixes: 844e5c0eb1 ("smb3 client: add way to show directory leases for improved debugging")
Signed-off-by: Yunseong Kim <ysk@kzalloc.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06 15:04:00 -05:00
Stefan Metzmacher
a6c015b7ac smb: server: let recv_done() avoid touching data_transfer after cleanup/move
Calling enqueue_reassembly() and wake_up_interruptible(&t->wait_reassembly_queue)
or put_receive_buffer() means the recvmsg/data_transfer pointer might
get re-used by another thread, which means these should be
the last operations before calling return.

Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: 0626e6641f ("cifsd: add server handler for central processing and tranport layers")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06 14:09:57 -05:00
Stefan Metzmacher
cfe76fdbb9 smb: server: let recv_done() consistently call put_recvmsg/smb_direct_disconnect_rdma_connection
We should call put_recvmsg() before smb_direct_disconnect_rdma_connection()
in order to call it before waking up the callers.

In all error cases we should call smb_direct_disconnect_rdma_connection()
in order to avoid stale connections.

Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: 0626e6641f ("cifsd: add server handler for central processing and tranport layers")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06 14:09:57 -05:00
Stefan Metzmacher
afb4108c92 smb: server: make sure we call ib_dma_unmap_single() only if we called ib_dma_map_single already
In case of failures either ib_dma_map_single() might not be called yet
or ib_dma_unmap_single() was already called.

We should make sure put_recvmsg() only calls ib_dma_unmap_single() if needed.

Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: 0626e6641f ("cifsd: add server handler for central processing and tranport layers")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06 14:09:57 -05:00
Stefan Metzmacher
01027a62b5 smb: server: remove separate empty_recvmsg_queue
There's no need to maintain two lists, we can just
have a single list of receive buffers, which are free to use.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: 0626e6641f ("cifsd: add server handler for central processing and tranport layers")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06 14:09:57 -05:00
Steve French
1cdd5a2626 cifs: Move the SMB1 transport code out of transport.c
Shrink the size of cifs.ko when SMB1 is not enabled in the config
by moving the SMB1 transport code to different file.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06 12:01:54 -05:00
Linus Torvalds
cca7a0aae8 for-6.17-fix-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmiTNoEACgkQxWXV+ddt
 WDuo3Q/+PPl1TH5+8A9zp+Aq0teDiF4I4pfpFRcoy+18lvNiPj8zFMOelOoOXczb
 rpYegaAjTrtZFlk9lVChTI4fSlAUkhPjgzELMWmpQsTSiYqGJxLO7B3BJEBqvuEt
 HDR0l1yBOW30EFszc8LKDXFj6IPI78xtQ2QyzFOcs0Xk9RVO2e8NpTFeroSr/bou
 uORWjZKOpQwPMLHjhZmhg/iAO+BM4hdV0MP9mQL4p4fAEfm0IKkRvQC/IQitSmHD
 e2WB9GsnFn6DxkEy3AkqZyyKwEfWNTIo+af4xSkbfOIn1KRtvABCXRtvb7Iuc31z
 TjPTubCsVIa48MeYkK+Qj7roMbjpcIIPciSgQWnEF/OTaiEZtPgbJJUYtQbtg/6o
 IglLquGn96k10sWIBEaiqBbmTwDOEaPLrrq75PeN5F6BENGAGdNr19B+5UXPvGpp
 f5XBoSKrgvFmtEzhSzkOk9obJvUaY5060JMVoymUopOXQdi243gAzGJaPLKugzDY
 qwRHe594DR438FFfDtqCs9s8+F23SL+mRWoD5H26H0fTgOe8hMc1sSELD99yZrRm
 4SGum5/7VjF422Q3lacHrdCaOWCFr3tbbK+6CYB975v0XwGIZj0Lh1Y7joDx0faW
 YHee1AcPw3zAqXXUcKv5aU3KcdsCE9xXlLAUDxVGxQzebh46heU=
 =tSdx
 -----END PGP SIGNATURE-----

Merge tag 'for-6.17-fix-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
 "A single btrfs commit. It fixes a problem that people started to hit
  since 6.15.3 during log replay (e.g. after a crash).

  The bug is old but got more likely to happen since commit
  5e85262e54 ("btrfs: fix fsync of files with no hard links not
  persisting deletion") got backported to stable (6.15 only)"

* tag 'for-6.17-fix-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix log tree replay failure due to file with 0 links and extents
2025-08-06 15:52:56 +03:00
Filipe Manana
0a32e4f002 btrfs: fix log tree replay failure due to file with 0 links and extents
If we log a new inode (not persisted in a past transaction) that has 0
links and extents, then log another inode with an higher inode number, we
end up with failing to replay the log tree with -EINVAL. The steps for
this are:

1) create new file A
2) write some data to file A
3) open an fd on file A
4) unlink file A
5) fsync file A using the previously open fd
6) create file B (has higher inode number than file A)
7) fsync file B
8) power fail before current transaction commits

Now when attempting to mount the fs, the log replay will fail with
-ENOENT at replay_one_extent() when attempting to replay the first
extent of file A. The failure comes when trying to open the inode for
file A in the subvolume tree, since it doesn't exist.

Before commit 5f61b96159 ("btrfs: fix inode lookup error handling
during log replay"), the returned error was -EIO instead of -ENOENT,
since we converted any errors when attempting to read an inode during
log replay to -EIO.

The reason for this is that the log replay procedure fails to ignore
the current inode when we are at the stage LOG_WALK_REPLAY_ALL, our
current inode has 0 links and last inode we processed in the previous
stage has a non 0 link count. In other words, the issue is that at
replay_one_extent() we only update wc->ignore_cur_inode if the current
replay stage is LOG_WALK_REPLAY_INODES.

Fix this by updating wc->ignore_cur_inode whenever we find an inode item
regardless of the current replay stage. This is a simple solution and easy
to backport, but later we can do other alternatives like avoid logging
extents or inode items other than the inode item for inodes with a link
count of 0.

The problem with the wc->ignore_cur_inode logic has been around since
commit f2d72f42d5 ("Btrfs: fix warning when replaying log after fsync
of a tmpfile") but it only became frequent to hit since the more recent
commit 5e85262e54 ("btrfs: fix fsync of files with no hard links not
persisting deletion"), because we stopped skipping inodes with a link
count of 0 when logging, while before the problem would only be triggered
if trying to replay a log tree created with an older kernel which has a
logged inode with 0 links.

A test case for fstests will be submitted soon.

Reported-by: Peter Jung <ptr1337@cachyos.org>
Link: https://lore.kernel.org/linux-btrfs/fce139db-4458-4788-bb97-c29acf6cb1df@cachyos.org/
Reported-by: burneddi <burneddi@protonmail.com>
Link: https://lore.kernel.org/linux-btrfs/lh4W-Lwc0Mbk-QvBhhQyZxf6VbM3E8VtIvU3fPIQgweP_Q1n7wtlUZQc33sYlCKYd-o6rryJQfhHaNAOWWRKxpAXhM8NZPojzsJPyHMf2qY=@protonmail.com/#t
Reported-by: Russell Haley <yumpusamongus@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/598ecc75-eb80-41b3-83c2-f2317fbb9864@gmail.com/
Fixes: f2d72f42d5 ("Btrfs: fix warning when replaying log after fsync of a tmpfile")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-08-06 13:01:38 +02:00
Trond Myklebust
4ec752ce6d NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file
Use store_release_wake_up() instead of wake_up_var_locked(), because the
waiter cannot retake the nfs_uuid->lock.

Acked-by: Mike Snitzer <snitzer@kernel.org>
Tested-by: Mike Snitzer <snitzer@kernel.org>
Suggested-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/all/175262948827.2234665.1891349021754495573@noble.neil.brown.name/
Fixes: 21fb440346 ("nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-08-05 16:45:40 -07:00
Trond Myklebust
fdd015de76 NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh()
In order for the wait in nfs_uuid_put() to be safe, it is necessary to
ensure that nfs_uuid_add_file() doesn't add a new entry once the
nfs_uuid->net has been NULLed out.

Also fix up the wake_up_var_locked() / wait_var_event_spinlock() to both
use the nfs_uuid address, since nfl, and &nfl->uuid could be used elsewhere.

Acked-by: Mike Snitzer <snitzer@kernel.org>
Tested-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/all/175262893035.2234665.1735173020338594784@noble.neil.brown.name/
Fixes: 21fb440346 ("nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-08-05 16:45:40 -07:00
Trond Myklebust
e144d53cf2 NFS/localio: nfs_close_local_fh() fix check for file closed
If the struct nfs_file_localio is closed, its list entry will be empty,
but the nfs_uuid->files list might still contain other entries.

Acked-by: Mike Snitzer <snitzer@kernel.org>
Tested-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Fixes: 21fb440346 ("nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-08-05 16:45:39 -07:00
Jinjiang Tu
aa5a10b070 fs/proc/task_mmu: hold PTL in pagemap_hugetlb_range and gather_hugetlb_stats
Hold PTL in pagemap_hugetlb_range() and gather_hugetlb_stats() to avoid
operating on stale page, as pagemap_pmd_range() and gather_pte_stats()
have done.

Link: https://lkml.kernel.org/r/20250724090958.455887-3-tujinjiang@huawei.com
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Brahmajit Das <brahmajit.xyz@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Rientjes <rientjes@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joern Engel <joern@logfs.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-05 13:38:39 -07:00
Jinjiang Tu
45d19b4b6c mm/smaps: fix race between smaps_hugetlb_range and migration
smaps_hugetlb_range() handles the pte without holdling ptl, and may be
concurrenct with migration, leaing to BUG_ON in pfn_swap_entry_to_page(). 
The race is as follows.

smaps_hugetlb_range              migrate_pages
  huge_ptep_get
                                   remove_migration_ptes
				   folio_unlock
  pfn_swap_entry_folio
    BUG_ON

To fix it, hold ptl lock in smaps_hugetlb_range().

Link: https://lkml.kernel.org/r/20250724090958.455887-1-tujinjiang@huawei.com
Link: https://lkml.kernel.org/r/20250724090958.455887-2-tujinjiang@huawei.com
Fixes: 25ee01a2fc ("mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Brahmajit Das <brahmajit.xyz@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Rientjes <rientjes@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joern Engel <joern@logfs.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-05 13:38:39 -07:00
Wang Zhaolong
3fd8ec2fc9 smb: client: smb: client: eliminate mid_flags field
This is step 3/4 of a patch series to fix mid_q_entry memory leaks
caused by race conditions in callback execution.

Replace the mid_flags bitmask with dedicated boolean fields to
simplify locking logic and improve code readability:

- Replace MID_DELETED with bool deleted_from_q
- Replace MID_WAIT_CANCELLED with bool wait_cancelled
- Remove mid_flags field entirely

The new boolean fields have clearer semantics:
- deleted_from_q: whether mid has been removed from pending_mid_q
- wait_cancelled: whether request was cancelled during wait

This change reduces memory usage (from 4-byte bitmask to 2 boolean
flags) and eliminates confusion about which lock protects which
flag bits, preparing for per-mid locking in the next patch.

Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
Acked-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-05 11:29:31 -05:00
Wang Zhaolong
9bd42798d5 smb: client: add mid_counter_lock to protect the mid counter counter
This is step 2/4 of a patch series to fix mid_q_entry memory leaks
caused by race conditions in callback execution.

Add a dedicated mid_counter_lock to protect current_mid counter,
separating it from mid_queue_lock which protects pending_mid_q
operations. This reduces lock contention and prepares for finer-
grained locking in subsequent patches.

Changes:
- Add TCP_Server_Info->mid_counter_lock spinlock
- Rename CurrentMid to current_mid for consistency
- Use mid_counter_lock to protect current_mid access
- Update locking documentation in cifsglob.h

This separation allows mid allocation to proceed without blocking
queue operations, improving performance under heavy load.

Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
Acked-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-05 11:29:00 -05:00
Wang Zhaolong
f3ba7c9b04 smb: client: rename server mid_lock to mid_queue_lock
This is step 1/4 of a patch series to fix mid_q_entry memory leaks
caused by race conditions in callback execution.

The current mid_lock name is somewhat ambiguous about what it protects.
To prepare for splitting this lock into separate, more granular locks,
this patch renames mid_lock to mid_queue_lock to clearly indicate its
specific responsibility for protecting the pending_mid_q list and
related queue operations.

No functional changes are made in this patch - it only prepares the
codebase for the lock splitting that follows.

- mid_queue_lock for queue operations
- mid_counter_lock for mid counter operations
- per-mid locks for individual mid state management

Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
Acked-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-05 11:25:48 -05:00
NeilBrown
e5a7315077 nfsd: avoid ref leak in nfsd_open_local_fh()
If two calls to nfsd_open_local_fh() race and both successfully call
nfsd_file_acquire_local(), they will both get an extra reference to the
net to accompany the file reference stored in *pnf.

One of them will fail to store (using xchg()) the file reference in
*pnf and will drop that reference but WON'T drop the accompanying
reference to the net.  This leak means that when the nfs server is shut
down it will hang in nfsd_shutdown_net() waiting for
&nn->nfsd_net_free_done.

This patch adds the missing nfsd_net_put().

Reported-by: Mike Snitzer <snitzer@kernel.org>
Fixes: e6f7e1487a ("nfs_localio: simplify interface to nfsd for getting nfsd_file")
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neil@brown.name>
Tested-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-08-05 10:31:33 -04:00
Jeff Layton
f9a348e0de nfsd: don't set the ctime on delegated atime updates
Clients will typically precede a DELEGRETURN for a delegation with
delegated timestamp with a SETATTR to set the timestamps on the server
to match what the client has.

knfsd implements this by using the nfsd_setattr() infrastructure, which
will set ATTR_CTIME on any update that goes to notify_change(). This is
problematic as it means that the client will get a spurious ctime
update when updating the atime.

POSIX unfortunately doesn't phrase it succinctly, but updating the atime
due to reads should not update the ctime. In this case, the client is
sending a SETATTR to update the atime on the server to match its latest
value. The ctime should not be advanced in this case as that would
incorrectly indicate a change to the inode.

Fix this by not implicitly setting ATTR_CTIME when ATTR_DELEG is set in
__nfsd_setattr(). The decoder for FATTR4_WORD2_TIME_DELEG_MODIFY already
sets ATTR_CTIME, so this is sufficient to make it skip setting the ctime
on atime-only updates.

Fixes: 7e13f4f8d2 ("nfsd: handle delegated timestamps in SETATTR")
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-08-05 10:31:33 -04:00
Linus Torvalds
5998f2bca4 Description for this pull request:
- Use generic_write_sync instead of vfs_fsync_range in exfat_file_write_iter.
    It will fix an issue where fdatasync would be set incorrectly.
  - Fix potential infinite loop by the self-linked chain.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmiRpO8WHGxpbmtpbmpl
 b25Aa2VybmVsLm9yZwAKCRBnC/sDUUQhCFP7EACKh6foC0CCO03dZv9aniy/qmRt
 4bOgJ4DihV9JncwIk+aFLdTEqZFOrR03DdmU5LeIBuHuycRhQ78X8NqHboN/yLS0
 2fR45B2pp6hToYMSlsw1++/k9pj2kCIEgGSBNUp+5esqHs+nkRKN/zFvOY6WU4VB
 whojNDdyXWo7nmVU1VhOpzVHp84CcY+68qscF/vxVuhWbAt4F8K/F9yLdjhXrU0F
 7gmOGUlfuXXk6ZbuSfrTJGVpsbfETUYmaRLfzsuV/O7p8+n3jaoaNDACQpZpYdZt
 PT7cZzIHr8NuWuUzD3DTDc2C6k3/SQA+C3fRmNCFgGtL34yBd/gU6yr+ob8YoMA1
 Vgvckzt+9ddA/vDnekAlMyqbCuXfzzIo2g8n9uU08VA5bEOvCE87dSKfJDL949Ru
 +8PEPwdlApIuYuNxm0X8cCK5EWHCJZFDxL/NEPjNNAQp5WGqUmMv+jPZANlT3Hob
 eCOy174y2ctqDWkkBZCs0+W5V87rfUZq737kooTwckPRZVZX2OhrnLItYZ13o+wE
 jJ8V05m6Tn7AEgFkcKaby+bFqG9WiFNL1+QsPIDehwAtB3EobkgZNZPjkgeO58z0
 PSa4TFhhZCArg4wwmWDBf3X7sHeHAYazSUwTa7DH7BzLRZ0KMQnbHzZTnkH4BBdx
 K8FERKYEPrDMHm8vfA==
 =wECt
 -----END PGP SIGNATURE-----

Merge tag 'exfat-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat

Pull exfat updates from Namjae Jeon:

 - Use generic_write_sync instead of vfs_fsync_range in exfat_file_write_iter.
   It will fix an issue where fdatasync would be set incorrectly.

 - Fix potential infinite loop by the self-linked chain.

* tag 'exfat-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: add cluster chain loop check for dir
  exfat: fdatasync flag should be same like generic_write_sync()
2025-08-05 16:37:05 +03:00
Paulo Alcantara
5b432ae5df smb: client: fix creating symlinks under POSIX mounts
SMB3.1.1 POSIX mounts support native symlinks that are created with
IO_REPARSE_TAG_SYMLINK reparse points, so skip the checking of
FILE_SUPPORTS_REPARSE_POINTS as some servers might not have it set.

Cc: linux-cifs@vger.kernel.org
Cc: Ralph Boehme <slow@samba.org>
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org>
Reported-by: Matthew Richardson <m.richardson@ed.ac.uk>
Closes: https://marc.info/?i=1124e7cd-6a46-40a6-9f44-b7664a66654b@ed.ac.uk
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-04 19:29:14 -05:00
Paulo Alcantara
6b445309ee smb: client: default to nonativesocket under POSIX mounts
SMB3.1.1 POSIX mounts require sockets to be created with NFS reparse
points.

Cc: linux-cifs@vger.kernel.org
Cc: Ralph Boehme <slow@samba.org>
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org>
Reported-by: Matthew Richardson <m.richardson@ed.ac.uk>
Closes: https://marc.info/?i=1124e7cd-6a46-40a6-9f44-b7664a66654b@ed.ac.uk
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-04 19:29:10 -05:00
Linus Torvalds
0974f486f3 f2fs-for-6.17-rc1
In this round, we've mainly updated three parts: 1) folio conversion by Matthew,
 2) switch to a new mount API by Hongbo and Eric, and 3) several sysfs entries
 to tune GCs for ZUFS with finer granularity by Daeho. There are also patches
 to address bugs and issues in the existing features such as GCs, file pinning,
 write-while-dio-read, contingous block allocation, and memory access violations.
 
 Enhancement:
  - switch to new mount API and folio conversion
  - add sysfs nodes to controle F2FS GCs for ZUFS
  - improve performance on the nat entry cache
  - drop inode from the donation list when the last file is closed
  - avoid splitting bio when reading multiple pages
 
 Bug fix:
  - fix to trigger foreground gc during f2fs_map_blocks() in lfs mode
  - make sure zoned device GC to use FG_GC in shortage of free section
  - fix to calculate dirty data during has_not_enough_free_secs()
  - fix to update upper_p in __get_secs_required() correctly
  - wait for inflight dio completion, excluding pinned files read using dio
  - don't break allocation when crossing contiguous sections
  - vm_unmap_ram() may be called from an invalid context
  - fix to avoid out-of-boundary access in dnode page
  - fix to avoid panic in f2fs_evict_inode
  - fix to avoid UAF in f2fs_sync_inode_meta()
  - fix to use f2fs_is_valid_blkaddr_raw() in do_write_page()
  - fix UAF of f2fs_inode_info in f2fs_free_dic
  - fix to avoid invalid wait context issue
  - fix bio memleak when committing super block
  - handle nat.blkaddr corruption in f2fs_get_node_info()
 
 In addition, there are also clean-ups and minor bug fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmiRJ+oACgkQQBSofoJI
 UNInMA//ekJJCf/0UyMYiPA9ag4KBb/VA0VaVJbw6BA/DoT5ZII6+lCIfllyELbk
 78+ZppTrKq5OyImwiajcNijEwyDbh/asfUu+uNVsC85fjoboiBgDGVHbUEtSQ20Q
 5JVXIL5PhDDVGdVNPh57ijYK/PxhzBPaFNuaGECYrqnWhkQEb//HmN20KRfzcOjZ
 19QnOyEh0HED/izMjLhtZaCBQP53kfB7VjhTxMdY86l6IZ22gJHPRrnqBQHRTfyb
 iHcMJj4WRd7SpvbD/6bSdnUfpxOYPIm3GwQHdG46cHBEH1scnyQxx2OULlSLUbz6
 yeiG36jcuQQWOev8ikBjNzfAozD0VvUAulPpfIbAoHc5jBYkA1sP3N7JOiao1H4Z
 FnPgw/FyIQE+d9NkbyeVW+6f9WfmKlJlIJ4zKoURbZvARYCZKmiPiI9vPWWe18qV
 nchWniQMJ45TYsABUGmGJwTEe/SFaOkgLpLjAlzCy7ZY9/6LKVUlnxR0E1ZDcjSp
 5/E5fXQhds0Nn7F1jQXV3afxkECW+MNOLS/31ggL+ym6Pce3HPJCxBeRU4XaKrvA
 O0wP7n3g5jhVVWce0PBghF0mwTVVBwohTaUhL7lIIJMxKGkr4A8kH1j8tLLBdD3b
 hqcesDCtqqOZhogbwHXEgUDSikak4/1R1gDXnK0KhL1gg0Z6wR4=
 =XIPU
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "Three main updates: folio conversion by Matthew, switch to a new mount
  API by Hongbo and Eric, and several sysfs entries to tune GCs for ZUFS
  with finer granularity by Daeho.

  There are also patches to address bugs and issues in the existing
  features such as GCs, file pinning, write-while-dio-read, contingous
  block allocation, and memory access violations.

  Enhancements:
   - switch to new mount API and folio conversion
   - add sysfs nodes to controle F2FS GCs for ZUFS
   - improve performance on the nat entry cache
   - drop inode from the donation list when the last file is closed
   - avoid splitting bio when reading multiple pages

  Bug fixes:
   - fix to trigger foreground gc during f2fs_map_blocks() in lfs mode
   - make sure zoned device GC to use FG_GC in shortage of free section
   - fix to calculate dirty data during has_not_enough_free_secs()
   - fix to update upper_p in __get_secs_required() correctly
   - wait for inflight dio completion, excluding pinned files read using dio
   - don't break allocation when crossing contiguous sections
   - vm_unmap_ram() may be called from an invalid context
   - fix to avoid out-of-boundary access in dnode page
   - fix to avoid panic in f2fs_evict_inode
   - fix to avoid UAF in f2fs_sync_inode_meta()
   - fix to use f2fs_is_valid_blkaddr_raw() in do_write_page()
   - fix UAF of f2fs_inode_info in f2fs_free_dic
   - fix to avoid invalid wait context issue
   - fix bio memleak when committing super block
   - handle nat.blkaddr corruption in f2fs_get_node_info()

  In addition, there are also clean-ups and minor bug fixes"

* tag 'f2fs-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (109 commits)
  f2fs: drop inode from the donation list when the last file is closed
  f2fs: add gc_boost_gc_greedy sysfs node
  f2fs: add gc_boost_gc_multiple sysfs node
  f2fs: fix to trigger foreground gc during f2fs_map_blocks() in lfs mode
  f2fs: fix to calculate dirty data during has_not_enough_free_secs()
  f2fs: fix to update upper_p in __get_secs_required() correctly
  f2fs: directly add newly allocated pre-dirty nat entry to dirty set list
  f2fs: avoid redundant clean nat entry move in lru list
  f2fs: zone: wait for inflight dio completion, excluding pinned files read using dio
  f2fs: ignore valid ratio when free section count is low
  f2fs: don't break allocation when crossing contiguous sections
  f2fs: remove unnecessary tracepoint enabled check
  f2fs: merge the two conditions to avoid code duplication
  f2fs: vm_unmap_ram() may be called from an invalid context
  f2fs: fix to avoid out-of-boundary access in dnode page
  f2fs: switch to the new mount api
  f2fs: introduce fs_context_operation structure
  f2fs: separate the options parsing and options checking
  f2fs: Add f2fs_fs_context to record the mount options
  f2fs: Allow sbi to be NULL in f2fs_printk
  ...
2025-08-04 16:27:21 -07:00
Trond Myklebust
b9defd611a NFSv4: Remove duplicate lookups, capability probes and fsinfo calls
When crossing into a new filesystem, the NFSv4 client will look up the
new directory, and then call nfs4_server_capabilities() as well as
nfs4_do_fsinfo() at least twice.

This patch removes the duplicate calls, and reduces the initial lookup
to retrieve just a minimal set of attributes.

Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-08-04 09:48:16 -07:00
Trond Myklebust
b01f21cacd NFS: Fix the setting of capabilities when automounting a new filesystem
Capabilities cannot be inherited when we cross into a new filesystem.
They need to be reset to the minimal defaults, and then probed for
again.

Fixes: 54ceac4515 ("NFS: Share NFS superblocks per-protocol per-server per-FSID")
Cc: stable@vger.kernel.org
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-08-04 09:16:45 -07:00
Linus Torvalds
e991acf1bc Significant patch series in this pull request:
- The 2 patch series "squashfs: Remove page->mapping references" from
   Matthew Wilcox gets us closer to being able to remove page->mapping.
 
 - The 5 patch series "relayfs: misc changes" from Jason Xing does some
   maintenance and minor feature addition work in relayfs.
 
 - The 5 patch series "kdump: crashkernel reservation from CMA" from Jiri
   Bohac switches us from static preallocation of the kdump crashkernel's
   working memory over to dynamic allocation.  So the difficulty of
   a-priori estimation of the second kernel's needs is removed and the
   first kernel obtains extra memory.
 
 - The 5 patch series "generalize panic_print's dump function to be used
   by other kernel parts" from Feng Tang implements some consolidation and
   rationalizatio of the various ways in which a faiing kernel splats
   information at the operator.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaI+82gAKCRDdBJ7gKXxA
 jj4JAP9xb+w9DrBY6sa+7KTPIb+aTqQ7Zw3o9O2m+riKQJv6jAEA6aEwRnDA0451
 fDT5IqVlCWGvnVikdZHSnvhdD7TGsQ0=
 =rT71
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:
 "Significant patch series in this pull request:

   - "squashfs: Remove page->mapping references" (Matthew Wilcox) gets
     us closer to being able to remove page->mapping

   - "relayfs: misc changes" (Jason Xing) does some maintenance and
     minor feature addition work in relayfs

   - "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches
     us from static preallocation of the kdump crashkernel's working
     memory over to dynamic allocation. So the difficulty of a-priori
     estimation of the second kernel's needs is removed and the first
     kernel obtains extra memory

   - "generalize panic_print's dump function to be used by other
     kernel parts" (Feng Tang) implements some consolidation and
     rationalization of the various ways in which a failing kernel
     splats information at the operator

* tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits)
  tools/getdelays: add backward compatibility for taskstats version
  kho: add test for kexec handover
  delaytop: enhance error logging and add PSI feature description
  samples: Kconfig: fix spelling mistake "instancess" -> "instances"
  fat: fix too many log in fat_chain_add()
  scripts/spelling.txt: add notifer||notifier to spelling.txt
  xen/xenbus: fix typo "notifer"
  net: mvneta: fix typo "notifer"
  drm/xe: fix typo "notifer"
  cxl: mce: fix typo "notifer"
  KVM: x86: fix typo "notifer"
  MAINTAINERS: add maintainers for delaytop
  ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below()
  ucount: fix atomic_long_inc_below() argument type
  kexec: enable CMA based contiguous allocation
  stackdepot: make max number of pools boot-time configurable
  lib/xxhash: remove unused functions
  init/Kconfig: restore CONFIG_BROKEN help text
  lib/raid6: update recov_rvv.c zero page usage
  docs: update docs after introducing delaytop
  ...
2025-08-03 16:23:09 -07:00
Li RongQing
533210f239 nfs/localio: use read_seqbegin() rather than read_seqbegin_or_lock()
The usage of read_seqbegin_or_lock() in nfs_copy_boot_verifier()
is wrong. "seq" is always even and thus "or_lock" has no effect.

nfs_copy_boot_verifier() just copies 8 bytes and is supposed to be
very rare operation, so we do not need the adaptive locking in this case.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Link: https://lore.kernel.org/r/20250731081038.3478-1-lirongqing@baidu.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-08-03 12:45:47 -07:00
OGAWA Hirofumi
fb0e9db99e fat: fix too many log in fat_chain_add()
This log was excessive for a serial console.  So use the ratelimited
version instead.

Link: https://lkml.kernel.org/r/87qzy611d9.fsf@mail.parknet.co.jp
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reported-by: syzbot+fa7ef54f66c189c04b73@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=fa7ef54f66c189c04b73
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-02 12:01:40 -07:00
Paulo Alcantara
a967e758f8 smb: client: set symlink type as native for POSIX mounts
SMB3.1.1 POSIX mounts require symlinks to be created natively with
IO_REPARSE_TAG_SYMLINK reparse point.

Cc: linux-cifs@vger.kernel.org
Cc: Ralph Boehme <slow@samba.org>
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org>
Reported-by: Matthew Richardson <m.richardson@ed.ac.uk>
Closes: https://marc.info/?i=1124e7cd-6a46-40a6-9f44-b7664a66654b@ed.ac.uk
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-01 00:16:19 -05:00
Linus Torvalds
db68e4c80d 16 smb3/cifs client fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmiLj1MACgkQiiy9cAdy
 T1H2UQv9ErlPpB5zI5nmwCqKg3G2uGQ9yXk1iwIPKzwqA3ypPGRz1KpC80TN2SBA
 XdjtDyTlLVasxCMomzodHCloEQSJ4HYWcOL4uzVhhbiX/KqVoWofb5D5QKzkiWvK
 HkIgFCOipXxpg4871+O2fmxxpm+D6dJHz8ffARwb6VTmp1bSBEXR8UTEVXo7LJAK
 jFZv9BPEkKkdnIZw/otbV1eyp8xxtEPZRqMgCsMgdjp1L03ujnFe0COxHnQjYC5o
 MygdVslmuuLswsDoEgZmzVLH7ubAUUhbYzkL6xd0c6zFf1jE67uKDQqOu0kYZCjr
 /dTh33WM4vDjjOGBm6XMFBEKp8+CLHKEMl8qj48MKhGpeGaCSoFP5/5cl+BRZV6D
 sZqJy3cZw3hApQeJHwBjWPdujczmPQ1/BtWkyDIccC//v9rlovmVhd3t2VM7BylS
 5rGgpa9OcFR1Dg+U63ONggTPgdFqEcmuLRXFnEMi+n8TwahIaVkrPz6AmG+8gLyH
 u9tw3DLo
 =fjzp
 -----END PGP SIGNATURE-----

Merge tag 'v6.17-rc-part1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client updates from Steve French:

 - Fix network namespace refcount leak

 - Multichannel reconnect fix

 - Perf improvement to not do unneeded EA query on native symlinks

 - Performance improvement for directory leases to allow extending lease
   for actively queried directories

 - Improve debugging of directory leases by adding pseudofile to show
   them

 - Five minor mount cleanup patches

 - Minor directory lease cleanup patch

 - Allow creating special files via reparse points over SMB1

 - Two minor improvements to FindFirst over SMB1

 - Two NTLMSSP session setup fixes

* tag 'v6.17-rc-part1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3 client: add way to show directory leases for improved debugging
  smb: client: get rid of kstrdup() when parsing iocharset mount option
  smb: client: get rid of kstrdup() when parsing domain mount option
  smb: client: get rid of kstrdup() when parsing pass2 mount option
  smb: client: get rid of kstrdup() when parsing pass mount option
  smb: client: get rid of kstrdup() when parsing user mount option
  cifs: Add support for creating reparse points over SMB1
  cifs: Do not query WSL EAs for native SMB symlink
  cifs: Optimize CIFSFindFirst() response when not searching
  cifs: Fix calling CIFSFindFirst() for root path without msearch
  smb: client: fix session setup against servers that require SPN
  smb: client: allow parsing zero-length AV pairs
  cifs: add new field to track the last access time of cfid
  smb: change return type of cached_dir_lease_break() to bool
  cifs: reset iface weights when we cannot find a candidate
  smb: client: fix netns refcount leak after net_passive changes
2025-07-31 21:22:04 -07:00
Yuezhang Mo
99f9a97dce exfat: add cluster chain loop check for dir
An infinite loop may occur if the following conditions occur due to
file system corruption.

(1) Condition for exfat_count_dir_entries() to loop infinitely.
    - The cluster chain includes a loop.
    - There is no UNUSED entry in the cluster chain.

(2) Condition for exfat_create_upcase_table() to loop infinitely.
    - The cluster chain of the root directory includes a loop.
    - There are no UNUSED entry and up-case table entry in the cluster
      chain of the root directory.

(3) Condition for exfat_load_bitmap() to loop infinitely.
    - The cluster chain of the root directory includes a loop.
    - There are no UNUSED entry and bitmap entry in the cluster chain
      of the root directory.

(4) Condition for exfat_find_dir_entry() to loop infinitely.
    - The cluster chain includes a loop.
    - The unused directory entries were exhausted by some operation.

(5) Condition for exfat_check_dir_empty() to loop infinitely.
    - The cluster chain includes a loop.
    - The unused directory entries were exhausted by some operation.
    - All files and sub-directories under the directory are deleted.

This commit adds checks to break the above infinite loop.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-08-01 08:34:23 +09:00
Zhengxu Zhang
2f2d42a17b exfat: fdatasync flag should be same like generic_write_sync()
Test: androbench by default setting, use 64GB sdcard.
 the random write speed:
	without this patch 3.5MB/s
	with this patch 7MB/s

After patch "11a347fb6cef", the random write speed decreased significantly.
the .write_iter() interface had been modified, and check the differences
with generic_file_write_iter(), when calling generic_write_sync() and
exfat_file_write_iter() to call vfs_fsync_range(), the fdatasync flag is
wrong, and make not use the fdatasync mode, and make random write speed
decreased. So use generic_write_sync() instead of vfs_fsync_range().

Fixes: 11a347fb6c ("exfat: change to get file size from DataLength")
Signed-off-by: Zhengxu Zhang <zhengxu.zhang@unisoc.com>
Acked-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-08-01 08:34:17 +09:00
Linus Torvalds
beace86e61 Summary of significant series in this pull request:
- The 4 patch series "mm: ksm: prevent KSM from breaking merging of new
   VMAs" from Lorenzo Stoakes addresses an issue with KSM's
   PR_SET_MEMORY_MERGE mode: newly mapped VMAs were not eligible for
   merging with existing adjacent VMAs.
 
 - The 4 patch series "mm/damon: introduce DAMON_STAT for simple and
   practical access monitoring" from SeongJae Park adds a new kernel module
   which simplifies the setup and usage of DAMON in production
   environments.
 
 - The 6 patch series "stop passing a writeback_control to swap/shmem
   writeout" from Christoph Hellwig is a cleanup to the writeback code
   which removes a couple of pointers from struct writeback_control.
 
 - The 7 patch series "drivers/base/node.c: optimization and cleanups"
   from Donet Tom contains largely uncorrelated cleanups to the NUMA node
   setup and management code.
 
 - The 4 patch series "mm: userfaultfd: assorted fixes and cleanups" from
   Tal Zussman does some maintenance work on the userfaultfd code.
 
 - The 5 patch series "Readahead tweaks for larger folios" from Ryan
   Roberts implements some tuneups for pagecache readahead when it is
   reading into order>0 folios.
 
 - The 4 patch series "selftests/mm: Tweaks to the cow test" from Mark
   Brown provides some cleanups and consistency improvements to the
   selftests code.
 
 - The 4 patch series "Optimize mremap() for large folios" from Dev Jain
   does that.  A 37% reduction in execution time was measured in a
   memset+mremap+munmap microbenchmark.
 
 - The 5 patch series "Remove zero_user()" from Matthew Wilcox expunges
   zero_user() in favor of the more modern memzero_page().
 
 - The 3 patch series "mm/huge_memory: vmf_insert_folio_*() and
   vmf_insert_pfn_pud() fixes" from David Hildenbrand addresses some warts
   which David noticed in the huge page code.  These were not known to be
   causing any issues at this time.
 
 - The 3 patch series "mm/damon: use alloc_migrate_target() for
   DAMOS_MIGRATE_{HOT,COLD" from SeongJae Park provides some cleanup and
   consolidation work in DAMON.
 
 - The 3 patch series "use vm_flags_t consistently" from Lorenzo Stoakes
   uses vm_flags_t in places where we were inappropriately using other
   types.
 
 - The 3 patch series "mm/memfd: Reserve hugetlb folios before
   allocation" from Vivek Kasireddy increases the reliability of large page
   allocation in the memfd code.
 
 - The 14 patch series "mm: Remove pXX_devmap page table bit and pfn_t
   type" from Alistair Popple removes several now-unneeded PFN_* flags.
 
 - The 5 patch series "mm/damon: decouple sysfs from core" from SeongJae
   Park implememnts some cleanup and maintainability work in the DAMON
   sysfs layer.
 
 - The 5 patch series "madvise cleanup" from Lorenzo Stoakes does quite a
   lot of cleanup/maintenance work in the madvise() code.
 
 - The 4 patch series "madvise anon_name cleanups" from Vlastimil Babka
   provides additional cleanups on top or Lorenzo's effort.
 
 - The 11 patch series "Implement numa node notifier" from Oscar Salvador
   creates a standalone notifier for NUMA node memory state changes.
   Previously these were lumped under the more general memory on/offline
   notifier.
 
 - The 6 patch series "Make MIGRATE_ISOLATE a standalone bit" from Zi Yan
   cleans up the pageblock isolation code and fixes a potential issue which
   doesn't seem to cause any problems in practice.
 
 - The 5 patch series "selftests/damon: add python and drgn based DAMON
   sysfs functionality tests" from SeongJae Park adds additional drgn- and
   python-based DAMON selftests which are more comprehensive than the
   existing selftest suite.
 
 - The 5 patch series "Misc rework on hugetlb faulting path" from Oscar
   Salvador fixes a rather obscure deadlock in the hugetlb fault code and
   follows that fix with a series of cleanups.
 
 - The 3 patch series "cma: factor out allocation logic from
   __cma_declare_contiguous_nid" from Mike Rapoport rationalizes and cleans
   up the highmem-specific code in the CMA allocator.
 
 - The 28 patch series "mm/migration: rework movable_ops page migration
   (part 1)" from David Hildenbrand provides cleanups and
   future-preparedness to the migration code.
 
 - The 2 patch series "mm/damon: add trace events for auto-tuned
   monitoring intervals and DAMOS quota" from SeongJae Park adds some
   tracepoints to some DAMON auto-tuning code.
 
 - The 6 patch series "mm/damon: fix misc bugs in DAMON modules" from
   SeongJae Park does that.
 
 - The 6 patch series "mm/damon: misc cleanups" from SeongJae Park also
   does what it claims.
 
 - The 4 patch series "mm: folio_pte_batch() improvements" from David
   Hildenbrand cleans up the large folio PTE batching code.
 
 - The 13 patch series "mm/damon/vaddr: Allow interleaving in
   migrate_{hot,cold} actions" from SeongJae Park facilitates dynamic
   alteration of DAMON's inter-node allocation policy.
 
 - The 3 patch series "Remove unmap_and_put_page()" from Vishal Moola
   provides a couple of page->folio conversions.
 
 - The 4 patch series "mm: per-node proactive reclaim" from Davidlohr
   Bueso implements a per-node control of proactive reclaim - beyond the
   current memcg-based implementation.
 
 - The 14 patch series "mm/damon: remove damon_callback" from SeongJae
   Park replaces the damon_callback interface with a more general and
   powerful damon_call()+damos_walk() interface.
 
 - The 10 patch series "mm/mremap: permit mremap() move of multiple VMAs"
   from Lorenzo Stoakes implements a number of mremap cleanups (of course)
   in preparation for adding new mremap() functionality: newly permit the
   remapping of multiple VMAs when the user is specifying MREMAP_FIXED.  It
   still excludes some specialized situations where this cannot be
   performed reliably.
 
 - The 3 patch series "drop hugetlb_free_pgd_range()" from Anthony Yznaga
   switches some sparc hugetlb code over to the generic version and removes
   the thus-unneeded hugetlb_free_pgd_range().
 
 - The 4 patch series "mm/damon/sysfs: support periodic and automated
   stats update" from SeongJae Park augments the present
   userspace-requested update of DAMON sysfs monitoring files.  Automatic
   update is now provided, along with a tunable to control the update
   interval.
 
 - The 4 patch series "Some randome fixes and cleanups to swapfile" from
   Kemeng Shi does what is claims.
 
 - The 4 patch series "mm: introduce snapshot_page" from Luiz Capitulino
   and David Hildenbrand provides (and uses) a means by which debug-style
   functions can grab a copy of a pageframe and inspect it locklessly
   without tripping over the races inherent in operating on the live
   pageframe directly.
 
 - The 6 patch series "use per-vma locks for /proc/pid/maps reads" from
   Suren Baghdasaryan addresses the large contention issues which can be
   triggered by reads from that procfs file.  Latencies are reduced by more
   than half in some situations.  The series also introduces several new
   selftests for the /proc/pid/maps interface.
 
 - The 6 patch series "__folio_split() clean up" from Zi Yan cleans up
   __folio_split()!
 
 - The 7 patch series "Optimize mprotect() for large folios" from Dev
   Jain provides some quite large (>3x) speedups to mprotect() when dealing
   with large folios.
 
 - The 2 patch series "selftests/mm: reuse FORCE_READ to replace "asm
   volatile("" : "+r" (XXX));" and some cleanup" from wang lian does some
   cleanup work in the selftests code.
 
 - The 3 patch series "tools/testing: expand mremap testing" from Lorenzo
   Stoakes extends the mremap() selftest in several ways, including adding
   more checking of Lorenzo's recently added "permit mremap() move of
   multiple VMAs" feature.
 
 - The 22 patch series "selftests/damon/sysfs.py: test all parameters"
   from SeongJae Park extends the DAMON sysfs interface selftest so that it
   tests all possible user-requested parameters.  Rather than the present
   minimal subset.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaIqcCgAKCRDdBJ7gKXxA
 jkVBAQCCn9DR1QP0CRk961ot0cKzOgioSc0aA03DPb2KXRt2kQEAzDAz0ARurFhL
 8BzbvI0c+4tntHLXvIlrC33n9KWAOQM=
 =XsFy
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "As usual, many cleanups. The below blurbiage describes 42 patchsets.
  21 of those are partially or fully cleanup work. "cleans up",
  "cleanup", "maintainability", "rationalizes", etc.

  I never knew the MM code was so dirty.

  "mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes)
     addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly
     mapped VMAs were not eligible for merging with existing adjacent
     VMAs.

  "mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park)
     adds a new kernel module which simplifies the setup and usage of
     DAMON in production environments.

  "stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig)
     is a cleanup to the writeback code which removes a couple of
     pointers from struct writeback_control.

  "drivers/base/node.c: optimization and cleanups" (Donet Tom)
     contains largely uncorrelated cleanups to the NUMA node setup and
     management code.

  "mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman)
     does some maintenance work on the userfaultfd code.

  "Readahead tweaks for larger folios" (Ryan Roberts)
     implements some tuneups for pagecache readahead when it is reading
     into order>0 folios.

  "selftests/mm: Tweaks to the cow test" (Mark Brown)
     provides some cleanups and consistency improvements to the
     selftests code.

  "Optimize mremap() for large folios" (Dev Jain)
     does that. A 37% reduction in execution time was measured in a
     memset+mremap+munmap microbenchmark.

  "Remove zero_user()" (Matthew Wilcox)
     expunges zero_user() in favor of the more modern memzero_page().

  "mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes" (David Hildenbrand)
     addresses some warts which David noticed in the huge page code.
     These were not known to be causing any issues at this time.

  "mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park)
     provides some cleanup and consolidation work in DAMON.

  "use vm_flags_t consistently" (Lorenzo Stoakes)
     uses vm_flags_t in places where we were inappropriately using other
     types.

  "mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy)
     increases the reliability of large page allocation in the memfd
     code.

  "mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple)
     removes several now-unneeded PFN_* flags.

  "mm/damon: decouple sysfs from core" (SeongJae Park)
     implememnts some cleanup and maintainability work in the DAMON
     sysfs layer.

  "madvise cleanup" (Lorenzo Stoakes)
     does quite a lot of cleanup/maintenance work in the madvise() code.

  "madvise anon_name cleanups" (Vlastimil Babka)
     provides additional cleanups on top or Lorenzo's effort.

  "Implement numa node notifier" (Oscar Salvador)
     creates a standalone notifier for NUMA node memory state changes.
     Previously these were lumped under the more general memory
     on/offline notifier.

  "Make MIGRATE_ISOLATE a standalone bit" (Zi Yan)
     cleans up the pageblock isolation code and fixes a potential issue
     which doesn't seem to cause any problems in practice.

  "selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park)
     adds additional drgn- and python-based DAMON selftests which are
     more comprehensive than the existing selftest suite.

  "Misc rework on hugetlb faulting path" (Oscar Salvador)
     fixes a rather obscure deadlock in the hugetlb fault code and
     follows that fix with a series of cleanups.

  "cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport)
     rationalizes and cleans up the highmem-specific code in the CMA
     allocator.

  "mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand)
     provides cleanups and future-preparedness to the migration code.

  "mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park)
     adds some tracepoints to some DAMON auto-tuning code.

  "mm/damon: fix misc bugs in DAMON modules" (SeongJae Park)
     does that.

  "mm/damon: misc cleanups" (SeongJae Park)
     also does what it claims.

  "mm: folio_pte_batch() improvements" (David Hildenbrand)
     cleans up the large folio PTE batching code.

  "mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park)
     facilitates dynamic alteration of DAMON's inter-node allocation
     policy.

  "Remove unmap_and_put_page()" (Vishal Moola)
     provides a couple of page->folio conversions.

  "mm: per-node proactive reclaim" (Davidlohr Bueso)
     implements a per-node control of proactive reclaim - beyond the
     current memcg-based implementation.

  "mm/damon: remove damon_callback" (SeongJae Park)
     replaces the damon_callback interface with a more general and
     powerful damon_call()+damos_walk() interface.

  "mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes)
     implements a number of mremap cleanups (of course) in preparation
     for adding new mremap() functionality: newly permit the remapping
     of multiple VMAs when the user is specifying MREMAP_FIXED. It still
     excludes some specialized situations where this cannot be performed
     reliably.

  "drop hugetlb_free_pgd_range()" (Anthony Yznaga)
     switches some sparc hugetlb code over to the generic version and
     removes the thus-unneeded hugetlb_free_pgd_range().

  "mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park)
     augments the present userspace-requested update of DAMON sysfs
     monitoring files. Automatic update is now provided, along with a
     tunable to control the update interval.

  "Some randome fixes and cleanups to swapfile" (Kemeng Shi)
     does what is claims.

  "mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand)
     provides (and uses) a means by which debug-style functions can grab
     a copy of a pageframe and inspect it locklessly without tripping
     over the races inherent in operating on the live pageframe
     directly.

  "use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan)
     addresses the large contention issues which can be triggered by
     reads from that procfs file. Latencies are reduced by more than
     half in some situations. The series also introduces several new
     selftests for the /proc/pid/maps interface.

  "__folio_split() clean up" (Zi Yan)
     cleans up __folio_split()!

  "Optimize mprotect() for large folios" (Dev Jain)
     provides some quite large (>3x) speedups to mprotect() when dealing
     with large folios.

  "selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian)
     does some cleanup work in the selftests code.

  "tools/testing: expand mremap testing" (Lorenzo Stoakes)
     extends the mremap() selftest in several ways, including adding
     more checking of Lorenzo's recently added "permit mremap() move of
     multiple VMAs" feature.

  "selftests/damon/sysfs.py: test all parameters" (SeongJae Park)
     extends the DAMON sysfs interface selftest so that it tests all
     possible user-requested parameters. Rather than the present minimal
     subset"

* tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits)
  MAINTAINERS: add missing headers to mempory policy & migration section
  MAINTAINERS: add missing file to cgroup section
  MAINTAINERS: add MM MISC section, add missing files to MISC and CORE
  MAINTAINERS: add missing zsmalloc file
  MAINTAINERS: add missing files to page alloc section
  MAINTAINERS: add missing shrinker files
  MAINTAINERS: move memremap.[ch] to hotplug section
  MAINTAINERS: add missing mm_slot.h file THP section
  MAINTAINERS: add missing interval_tree.c to memory mapping section
  MAINTAINERS: add missing percpu-internal.h file to per-cpu section
  mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info()
  selftests/damon: introduce _common.sh to host shared function
  selftests/damon/sysfs.py: test runtime reduction of DAMON parameters
  selftests/damon/sysfs.py: test non-default parameters runtime commit
  selftests/damon/sysfs.py: generalize DAMON context commit assertion
  selftests/damon/sysfs.py: generalize monitoring attributes commit assertion
  selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion
  selftests/damon/sysfs.py: test DAMOS filters commitment
  selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion
  selftests/damon/sysfs.py: test DAMOS destinations commitment
  ...
2025-07-31 14:57:54 -07:00
Linus Torvalds
d6084bb815 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmiHoygACgkQnJ2qBz9k
 QNnQCwf9Et0BcGgtcE9bUPJMlK5AEJ2nsEGT24ajRPcJ9DtcpO5KY8htshPwvRBc
 KeSW3bjYaBluKK6LmbvDbaDi4lBwFNKh+dfi8PeVwBECNn2DLNv+EOB6VPgxREWw
 zmC3SL2oh7eGTtJq0h1XeI/s/AFYukOsjKcjoKmT3NyxkCwxYUoY3E1mePutBsW6
 /2CjVdQoMUgw1kLcezdccOJi6IkAk5Vfr8gIRFucQiR49YIICYk17bhYBH/Gx5PJ
 eJkqncq8FLv8gLaaHq5W8lC05nVnpNDFxUxKiN6YcuYUrUC7jFi7ad1EriZP3Zsr
 y8pDAmVUHgzpiMYiPHhP2zG8xK06TQ==
 =ZyXo
 -----END PGP SIGNATURE-----

Merge tag 'fsnotify_for_v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fsnotify updates from Jan Kara:
 "A couple of small improvements for fsnotify subsystem.

  The most interesting is probably Amir's change modifying the meaning
  of fsnotify fmode bits (and I spell it out specifically because I know
  you care about those). There's no change for the common cases of no
  fsnotify watches or no permission event watches. But when there are
  permission watches (either for open or for pre-content events) but no
  FAN_ACCESS_PERM watch (which nobody uses in practice) we are now able
  optimize away unnecessary cache loads from the read path"

* tag 'fsnotify_for_v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fsnotify: optimize FMODE_NONOTIFY_PERM for the common cases
  fsnotify: merge file_set_fsnotify_mode_from_watchers() with open perm hook
  samples: fix building fs-monitor on musl systems
  fanotify: sanitize handle_type values when reporting fid
2025-07-31 10:31:00 -07:00
Linus Torvalds
440e6d7e14 Fixes and cleanups for JFS filesystem
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEIodevzQLVs53l6BhNqiEXrVAjGQFAmiLd+AACgkQNqiEXrVA
 jGTZjxAAjJGErKMS4XwC10Cpyy7en97xQF8qHnGWr0nIss7dvdN0zw8oUBCZFuso
 uhtWddAcEwuWK+VcNj1TdEf3tSyxbjGkMA9Gh7amv7eI7YVNrFz7OZBSWMsd2DPP
 ZJzPfExDIGEygWP5KDPd3qrCIA+hMwwJmpCY0sO7zZbrH8aftOUJQe5Gi/a+43f4
 n+vTE1d/0h0PK/Vvy1Ceiql7V0HdJ7SWOfBfTTTQivdzdk8UFqgDoC27WJ1DpfQR
 XTTEIoqizSeSV5YXDqwF6d7QQ3gROvzQlk0J7r3CUJ4gYHF0hDEFmHStYnkz8ZKB
 Z+KMMEnd2Mt/NUFBI6+tYnMzAVj5fq4XLTfe7GqM7ZL+z2JRH+kAT7SGQJ0fabPA
 FlNIA7fpLd1c0uuZghgS1rdY2uLYJEY2gFumcn9T0fbMsE10hSbfrkbldXCvkMpN
 z1pLHj0yvG1yXUOXjgvdkpJEGpdp63Dq7qNq5yAbXCURzH08nAQQ2Fx9vX86WNta
 8vkBrx/6X2uIE7Cd4WOSj3IYb4LY9BSBc9fScVd7D5kkwN0qAHTJNSdMUO0vpvjx
 +wwYMhfIVAreacrmSVsCAh+Zaf7zg/VwS+5GZ+trBx5/KNeLVI5f/QbOXcDhHWm3
 Y/2oQ8TAZgcPoyIROmZ/bQD4T+v48IESNrE54yt5WUazAeIxqYo=
 =jTbL
 -----END PGP SIGNATURE-----

Merge tag 'jfs-6.17' of github.com:kleikamp/linux-shaggy

Pull jfs updates from Dave Kleikamp:
 "Fixes and cleanups for JFS filesystem"

* tag 'jfs-6.17' of github.com:kleikamp/linux-shaggy:
  jfs: fix metapage reference count leak in dbAllocCtl
  jfs: stop using write_cache_pages
  jfs: truncate good inode pages when hard link is 0
  jfs: jfs_xtree: replace XT_GETPAGE macro with xt_getpage()
  jfs: Regular file corruption check
  jfs: upper bound check of tree index in dbAllocAG
2025-07-31 10:27:11 -07:00
Linus Torvalds
ac46ff0f77 orangefs: fixes for string handling in debugfs and sysfs
Change scnprintf to sysfs_emit in sysfs code.
 
 Change sprintf to scnprintf in debugfs code.
 
 Refactor debugfs mask-to-string code for readability and slightly
 improved functionality.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEWLOQAZkv9m2xtUCf/Vh1C1u0YnAFAmiJMgUACgkQ/Vh1C1u0
 YnB27xAAgnJMTPfbrRF2El/WwpceZQzMtXyuzlBNgr5r3P04afEl38ztBMrwGKr1
 mZlPciTQiCZElAhUHUOCG/pswMnALjBobS0rUGITKwWAv+Ttzr6InhwHc9w7aYm/
 RBfVGaRMObAIiKpH5YSMwqVhkXsMk+G9Ekx/snx31LgBN6JiCgPHBAAAo5fhKixa
 ifyLuxsRJZAfgMzhMxuPVkjuNWrE8ee0e9bSkrNY+hMMKNtaLuhIAyuo7XZcAwCx
 1qysJiTaxrHhn5A2Fc/s6GzhWMlXoNu76OniBpKDbHUGNKhFRXgb9yrb5k1ezgwv
 CMsVuPPi3p7gyUFPKGtUGZARqeKSoq2pvg5WR50DvgRFx4PyyPC5o/98sBo+ASJS
 Tvh5Kcuze1RqoyZIvj1/1eLYxxwywRVYjlurE1C8qEPTa+v6RkkCRk7V6vN++U2u
 cO4/2MtDLphsuGDJdC8/bzfhaBlUGQ2FuK6nVvLhNAvSeAA8qEPOxgQQZqDMc710
 9KdbeL2kuisUZggplB7vmlmJfT/Qd6tTK1vvIkHNIgczLQTwIjVYtCblJvdbidBl
 5GbLWFLq2rifR3U9s3exny9dpuFEF1Xm6YgVdCf0nSCg60F9UM2Og4Jt0enPzOTz
 Js+GCMOQsGDa2E/eZf0yr782KxTdP4uuhECP+Z/Yk1VSEdzEBow=
 =GCSM
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.17-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs updates from Mike Marshall:
 "Fixes for string handling in debugfs and sysfs:

   - change scnprintf to sysfs_emit in sysfs code.

   - change sprintf to scnprintf in debugfs code.

   - refactor debugfs mask-to-string code for readability and slightly
     improved functionality"

* tag 'for-linus-6.17-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  fs/orangefs: Allow 2 more characters in do_c_string()
  fs: orangefs: replace scnprintf() with sysfs_emit()
  fs/orangefs: use snprintf() instead of sprintf()
2025-07-31 10:22:48 -07:00
Linus Torvalds
4522ae2def This pull request contains the following changes for UBI and UBIFS:
UBIFS:
 	- No longer use write_cache_pages()
 
 UBI:
 	- Removal of an unused function
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmiJJ9UWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wf4gEACYUeCKRPJOc6qoCnhsqONKymE/
 jVXxCWOBqpoyF0wtt0BXcS46FpXzKufb2U2BHL6j2/v77qXBYv/KenT1PJe9llXH
 CNoGQb/zgr7cGFULetE12iaTJYxcIyaWfCk1pH7+n8JWeLwTQE66P1PdK5uEzHEp
 5gtDyOic8hSIShLBC01/Td9FVRD3A79DhZUP2WDYxsiZGbA93IvE0RpV2x6iBnOG
 uOzcHAZfnnlO22GoCqcQe5xaqOo3f4mJch6StEDLAs7EN1jvZMSH4wjM6Sbl+R4Z
 3rzLE0LwqItVZCA9yBsZWZLvaGWhS44UH6HXiluTh0a6NzSoyOAg2U+4vU3frfbY
 Kbnum9Rujc3NaVyHHHKmpDWEglj1bk8oN0EXVIOx887JkLjJF0FkSlo+gXp+cA+f
 C0DpsSdJp6XhJsD60dp4gnas8stISmEN0xs/qgICXMo2WByIZlLtsOgBNwfocD8t
 PtckkemupulEnSb9GJ/niAJttACSQFI9DCfnZibj3PYu3OeubSD30mJs/SY6rXOR
 XS9jejSKxdyhyboL9PwuSPVbD2rj/uD/uL8R2ypGbrwNyQGvqUgBQUQjnhNDiOvt
 CHwCtkmoupWxJscmu34K91zrN19i/9Nw4g+/2SOBz6NvVmIknTtuBQfBL57f2ykq
 tZlNwdV/e/a2WCx7dg==
 =/AwZ
 -----END PGP SIGNATURE-----

Merge tag 'ubifs-for-linus-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull UBI and UBIFS updates from Richard Weinberger:
 "UBIFS:
   - No longer use write_cache_pages()

  UBI:
   - Remove an unused function"

* tag 'ubifs-for-linus-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  ubifs: stop using write_cache_pages
  mtd: ubi: Remove unused ubi_flush
2025-07-31 10:08:44 -07:00
Linus Torvalds
ff7dcfedf9 Major ext4 changes for 6.17:
- Better scalability for ext4 block allocation
   - Fix insufficient credits when writing back large folios
 
 Miscellaneous bug fixes, especially when handling exteded attriutes,
 inline data, and fast commit.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmiIQEoACgkQ8vlZVpUN
 gaPB9wf/QursT7eLjx9Gz+4PYNWPKptBERQtmmDAnNYxDlEQ28+CHdMdEeiIPPoP
 IW1DIHfR7VaTI2K7gy6D5632VAhDDKiXBpIYu1yh3KPClAxjTZbhrif8J5UBXj1K
 ZwmCeLDF40jijua4rVKq3Fqf4iTJUyU2NqLpvcze7BZg7FwstXiNJrZ3DjAwi1BW
 j/5veWwh/KrNMzT5u0+RpMs4FBrdXQXvwSe/4pSx6d75r6WAdzhgUMy09os1wAWU
 3N0JU+R5hAG6iFfbWQRURB6oLMmmxl4x2F7r5BvM27uQtELNLNcxBKZhMW97HpiE
 uSwKgo/59DKpWX0xQ2x/yugQIzd62w==
 =oPHD
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Major ext4 changes for 6.17:

   - Better scalability for ext4 block allocation

   - Fix insufficient credits when writing back large folios

  Miscellaneous bug fixes, especially when handling exteded attriutes,
  inline data, and fast commit"

* tag 'ext4_for_linus_6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (39 commits)
  ext4: do not BUG when INLINE_DATA_FL lacks system.data xattr
  ext4: implement linear-like traversal across order xarrays
  ext4: refactor choose group to scan group
  ext4: convert free groups order lists to xarrays
  ext4: factor out ext4_mb_scan_group()
  ext4: factor out ext4_mb_might_prefetch()
  ext4: factor out __ext4_mb_scan_group()
  ext4: fix largest free orders lists corruption on mb_optimize_scan switch
  ext4: fix zombie groups in average fragment size lists
  ext4: merge freed extent with existing extents before insertion
  ext4: convert sbi->s_mb_free_pending to atomic_t
  ext4: fix typo in CR_GOAL_LEN_SLOW comment
  ext4: get rid of some obsolete EXT4_MB_HINT flags
  ext4: utilize multiple global goals to reduce contention
  ext4: remove unnecessary s_md_lock on update s_mb_last_group
  ext4: remove unnecessary s_mb_last_start
  ext4: separate stream goal hits from s_bal_goals for better tracking
  ext4: add ext4_try_lock_group() to skip busy groups
  ext4: initialize superblock fields in the kballoc-test.c kunit tests
  ext4: refactor the inline directory conversion and new directory codepaths
  ...
2025-07-31 10:02:44 -07:00
Steve French
844e5c0eb1 smb3 client: add way to show directory leases for improved debugging
When looking at performance issues around directory caching, or debugging
directory lease issues, it is helpful to be able to display the current
directory leases (as we can e.g. or open files).  Create pseudo-file
/proc/fs/cifs/open_dirs that displays current directory leases.  Here
is sample output:

cat /proc/fs/cifs/open_dirs
 Version:1
 Format:
 <tree id> <sess id> <persistent fid> <path>
Num entries: 3
0xce4c1c68 0x7176aa54 0xd95ef58e     \dira      valid file info, valid dirents
0xce4c1c68 0x7176aa54 0xd031e211     \dir5      valid file info, valid dirents
0xce4c1c68 0x7176aa54 0x96533a90     \dir1      valid file info

Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-07-31 09:42:54 -05:00
Jaegeuk Kim
078cad8212 f2fs: drop inode from the donation list when the last file is closed
Let's drop the inode from the donation list when there is no other
open file.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-30 17:13:12 +00:00
Linus Torvalds
d9104cec3e bpf-next-6.17
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmiINnEACgkQ6rmadz2v
 bToBnA/9F+A3R6rTwGk4HK3xpfc/nm2Tanl3oRN7S2ub/mskDOtWSIyG6cVFZ0UG
 1fK6IkByyRIpAF/5qhdlw8drRXHkQtGLA0lP2L9llm4X1mHLofB18y9OeLrDE1WN
 KwNP06+IGX9W802lCGSIXOY+VmRscVfXSMokyQt2ilHplKjOnDqJcYkWupi3T2rC
 mz79FY9aEl2YrIcpj9RXz+8nwP49pZBuW2P0IM5PAIj4BJBXShrUp8T1nz94okNe
 NFsnAyRxjWpUT0McEgtA9WvpD9lZqujYD8Qp0KlGZWmI3vNpV5d9S1+dBcEb1n7q
 dyNMkTF3oRrJhhg4VqoHc6fVpzSEoZ9ZxV5Hx4cs+ganH75D4YbdGqx/7mR3DUgH
 MZh6rHF1pGnK7TAm7h5gl3ZRAOkZOaahbe1i01NKo9CEe5fSh3AqMyzJYoyGHRKi
 xDN39eQdWBNA+hm1VkbK2Bv93Rbjrka2Kj+D3sSSO9Bo/u3ntcknr7LW39idKz62
 Q8dkKHcCEtun7gjk0YXPF013y81nEohj1C+52BmJ2l5JitM57xfr6YOaQpu7DPDE
 AJbHx6ASxKdyEETecd0b+cXUPQ349zmRXy0+CDMAGKpBicC0H0mHhL14cwOY1Hfu
 EIpIjmIJGI3JNF6T5kybcQGSBOYebdV0FFgwSllzPvuYt7YsHCs=
 =/O3j
 -----END PGP SIGNATURE-----

Merge tag 'bpf-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf updates from Alexei Starovoitov:

 - Remove usermode driver (UMD) framework (Thomas Weißschuh)

 - Introduce Strongly Connected Component (SCC) in the verifier to
   detect loops and refine register liveness (Eduard Zingerman)

 - Allow 'void *' cast using bpf_rdonly_cast() and corresponding
   '__arg_untrusted' for global function parameters (Eduard Zingerman)

 - Improve precision for BPF_ADD and BPF_SUB operations in the verifier
   (Harishankar Vishwanathan)

 - Teach the verifier that constant pointer to a map cannot be NULL
   (Ihor Solodrai)

 - Introduce BPF streams for error reporting of various conditions
   detected by BPF runtime (Kumar Kartikeya Dwivedi)

 - Teach the verifier to insert runtime speculation barrier (lfence on
   x86) to mitigate speculative execution instead of rejecting the
   programs (Luis Gerhorst)

 - Various improvements for 'veristat' (Mykyta Yatsenko)

 - For CONFIG_DEBUG_KERNEL config warn on internal verifier errors to
   improve bug detection by syzbot (Paul Chaignon)

 - Support BPF private stack on arm64 (Puranjay Mohan)

 - Introduce bpf_cgroup_read_xattr() kfunc to read xattr of cgroup's
   node (Song Liu)

 - Introduce kfuncs for read-only string opreations (Viktor Malik)

 - Implement show_fdinfo() for bpf_links (Tao Chen)

 - Reduce verifier's stack consumption (Yonghong Song)

 - Implement mprog API for cgroup-bpf programs (Yonghong Song)

* tag 'bpf-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (192 commits)
  selftests/bpf: Migrate fexit_noreturns case into tracing_failure test suite
  selftests/bpf: Add selftest for attaching tracing programs to functions in deny list
  bpf: Add log for attaching tracing programs to functions in deny list
  bpf: Show precise rejected function when attaching fexit/fmod_ret to __noreturn functions
  bpf: Fix various typos in verifier.c comments
  bpf: Add third round of bounds deduction
  selftests/bpf: Test invariants on JSLT crossing sign
  selftests/bpf: Test cross-sign 64bits range refinement
  selftests/bpf: Update reg_bound range refinement logic
  bpf: Improve bounds when s64 crosses sign boundary
  bpf: Simplify bounds refinement from s32
  selftests/bpf: Enable private stack tests for arm64
  bpf, arm64: JIT support for private stack
  bpf: Move bpf_jit_get_prog_name() to core.c
  bpf, arm64: Fix fp initialization for exception boundary
  umd: Remove usermode driver framework
  bpf/preload: Don't select USERMODE_DRIVER
  selftests/bpf: Fix test dynptr/test_dynptr_memset_xdp_chunks failure
  selftests/bpf: Fix test dynptr/test_dynptr_copy_xdp failure
  selftests/bpf: Increase xdp data size for arm64 64K page size
  ...
2025-07-30 09:58:50 -07:00
Linus Torvalds
8be4d31cb8 Networking changes for 6.17.
Core & protocols
 ----------------
 
  - Wrap datapath globals into net_aligned_data, to avoid false sharing.
 
  - Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container).
 
  - Add SO_INQ and SCM_INQ support to AF_UNIX.
 
  - Add SIOCINQ support to AF_VSOCK.
 
  - Add TCP_MAXSEG sockopt to MPTCP.
 
  - Add IPv6 force_forwarding sysctl to enable forwarding per interface.
 
  - Make TCP validation of whether packet fully fits in the receive
    window and the rcv_buf more strict. With increased use of HW
    aggregation a single "packet" can be multiple 100s of kB.
 
  - Add MSG_MORE flag to optimize large TCP transmissions via sockmap,
    improves latency up to 33% for sockmap users.
 
  - Convert TCP send queue handling from tasklet to BH workque.
 
  - Improve BPF iteration over TCP sockets to see each socket exactly once.
 
  - Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code.
 
  - Support enabling kernel threads for NAPI processing on per-NAPI
    instance basis rather than a whole device. Fully stop the kernel NAPI
    thread when threaded NAPI gets disabled. Previously thread would stick
    around until ifdown due to tricky synchronization.
 
  - Allow multicast routing to take effect on locally-generated packets.
 
  - Add output interface argument for End.X in segment routing.
 
  - MCTP: add support for gateway routing, improve bind() handling.
 
  - Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink.
 
  - Add a new neighbor flag ("extern_valid"), which cedes refresh
    responsibilities to userspace. This is needed for EVPN multi-homing
    where a neighbor entry for a multi-homed host needs to be synced
    across all the VTEPs among which the host is multi-homed.
 
  - Support NUD_PERMANENT for proxy neighbor entries.
 
  - Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM.
 
  - Add sequence numbers to netconsole messages. Unregister netconsole's
    console when all net targets are removed. Code refactoring.
    Add a number of selftests.
 
  - Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol
    should be used for an inbound SA lookup.
 
  - Support inspecting ref_tracker state via DebugFS.
 
  - Don't force bonding advertisement frames tx to ~333 ms boundaries.
    Add broadcast_neighbor option to send ARP/ND on all bonded links.
 
  - Allow providing upcall pid for the 'execute' command in openvswitch.
 
  - Remove DCCP support from Netfilter's conntrack.
 
  - Disallow multiple packet duplications in the queuing layer.
 
  - Prevent use of deprecated iptables code on PREEMPT_RT.
 
 Driver API
 ----------
 
  - Support RSS and hashing configuration over ethtool Netlink.
 
  - Add dedicated ethtool callbacks for getting and setting hashing fields.
 
  - Add support for power budget evaluation strategy in PSE /
    Power-over-Ethernet. Generate Netlink events for overcurrent etc.
 
  - Support DPLL phase offset monitoring across all device inputs.
    Support providing clock reference and SYNC over separate DPLL
    inputs.
 
  - Support traffic classes in devlink rate API for bandwidth management.
 
  - Remove rtnl_lock dependency from UDP tunnel port configuration.
 
 Device drivers
 --------------
 
  - Add a new Broadcom driver for 800G Ethernet (bnge).
 
  - Add a standalone driver for Microchip ZL3073x DPLL.
 
  - Remove IBM's NETIUCV device driver.
 
  - Ethernet high-speed NICs:
    - Broadcom (bnxt):
     - support zero-copy Tx of DMABUF memory
     - take page size into account for page pool recycling rings
    - Intel (100G, ice, idpf):
      - idpf: XDP and AF_XDP support preparations
      - idpf: add flow steering
      - add link_down_events statistic
      - clean up the TSPLL code
      - preparations for live VM migration
    - nVidia/Mellanox:
     - support zero-copy Rx/Tx interfaces (DMABUF and io_uring)
     - optimize context memory usage for matchers
     - expose serial numbers in devlink info
     - support PCIe congestion metrics
    - Meta (fbnic):
      - add 25G, 50G, and 100G link modes to phylink
      - support dumping FW logs
    - Marvell/Cavium:
      - support for CN20K generation of the Octeon chips
    - Amazon:
      - add HW clock (without timestamping, just hypervisor time access)
 
  - Ethernet virtual:
    - VirtIO net:
      - support segmentation of UDP-tunnel-encapsulated packets
    - Google (gve):
      - support packet timestamping and clock synchronization
    - Microsoft vNIC:
      - add handler for device-originated servicing events
      - allow dynamic MSI-X vector allocation
      - support Tx bandwidth clamping
 
  - Ethernet NICs consumer, and embedded:
    - AMD:
      - amd-xgbe: hardware timestamping and PTP clock support
    - Broadcom integrated MACs (bcmgenet, bcmasp):
      - use napi_complete_done() return value to support NAPI polling
      - add support for re-starting auto-negotiation
    - Broadcom switches (b53):
      - support BCM5325 switches
      - add bcm63xx EPHY power control
    - Synopsys (stmmac):
      - lots of code refactoring and cleanups
    - TI:
      - icssg-prueth: read firmware-names from device tree
      - icssg: PRP offload support
    - Microchip:
      - lan78xx: convert to PHYLINK for improved PHY and MAC management
      - ksz: add KSZ8463 switch support
    - Intel:
      - support similar queue priority scheme in multi-queue and
        time-sensitive networking (taprio)
      - support packet pre-emption in both
    - RealTek (r8169):
      - enable EEE at 5Gbps on RTL8126
    - Airoha:
      - add PPPoE offload support
      - MDIO bus controller for Airoha AN7583
 
  - Ethernet PHYs:
    - support for the IPQ5018 internal GE PHY
    - micrel KSZ9477 switch-integrated PHYs:
      - add MDI/MDI-X control support
      - add RX error counters
      - add cable test support
      - add Signal Quality Indicator (SQI) reporting
    - dp83tg720: improve reset handling and reduce link recovery time
    - support bcm54811 (and its MII-Lite interface type)
    - air_en8811h: support resume/suspend
    - support PHY counters for QCA807x and QCA808x
    - support WoL for QCA807x
 
  - CAN drivers:
    - rcar_canfd: support for Transceiver Delay Compensation
    - kvaser: report FW versions via devlink dev info
 
  - WiFi:
    - extended regulatory info support (6 GHz)
    - add statistics and beacon monitor for Multi-Link Operation (MLO)
    - support S1G aggregation, improve S1G support
    - add Radio Measurement action fields
    - support per-radio RTS threshold
    - some work around how FIPS affects wifi, which was wrong (RC4 is used
      by TKIP, not only WEP)
    - improvements for unsolicited probe response handling
 
  - WiFi drivers:
    - RealTek (rtw88):
      - IBSS mode for SDIO devices
    - RealTek (rtw89):
      - BT coexistence for MLO/WiFi7
      - concurrent station + P2P support
      - support for USB devices RTL8851BU/RTL8852BU
    - Intel (iwlwifi):
      - use embedded PNVM in (to be released) FW images to fix
        compatibility issues
      - many cleanups (unused FW APIs, PCIe code, WoWLAN)
      - some FIPS interoperability
    - MediaTek (mt76):
      - firmware recovery improvements
      - more MLO work
    - Qualcomm/Atheros (ath12k):
      - fix scan on multi-radio devices
      - more EHT/Wi-Fi 7 features
      - encapsulation/decapsulation offload
    - Broadcom (brcm80211):
      - support SDIO 43751 device
 
  - Bluetooth:
    - hci_event: add support for handling LE BIG Sync Lost event
    - ISO: add socket option to report packet seqnum via CMSG
    - ISO: support SCM_TIMESTAMPING for ISO TS
 
  - Bluetooth drivers:
    - intel_pcie: support Function Level Reset
    - nxpuart: add support for 4M baudrate
    - nxpuart: implement powerup sequence, reset, FW dump, and FW loading
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmiFgLgACgkQMUZtbf5S
 IrvafxAAnQRwYBoIG+piCILx6z5pRvBGHkmEQ4AQgSCFuq2eO3ubwMFIqEybfma1
 5+QFjUZAV3OgGgKRBS2KGWxtSzdiF+/JGV1VOIN67sX3Mm0a2QgjA4n5CgKL0FPr
 o6BEzjX5XwG1zvGcBNQ5BZ19xUUKjoZQgTtnea8sZ57Fsp5RtRgmYRqoewNvNk/n
 uImh0NFsDVb0UeOpSzC34VD9l1dJvLGdui4zJAjno/vpvmT1DkXjoK419J/r52SS
 X+5WgsfJ6DkjHqVN1tIhhK34yWqBOcwGFZJgEnWHMkFIl2FqRfFKMHyqtfLlVnLA
 mnIpSyz8Sq2AHtx0TlgZ3At/Ri8p5+yYJgHOXcDKyABa8y8Zf4wrycmr6cV9JLuL
 z54nLEVnJuvfDVDVJjsLYdJXyhMpZFq6+uAItdxKaw8Ugp/QqG4QtoRj+XIHz4ZW
 z6OohkCiCzTwEISFK+pSTxPS30eOxq43kCspcvuLiwCCStJBRkRb5GdZA4dm7LA+
 1Od4ADAkHjyrFtBqTyyC2scX8UJ33DlAIpAYyIeS6w9Cj9EXxtp1z33IAAAZ03MW
 jJwIaJuc8bK2fWKMmiG7ucIXjPo4t//KiWlpkwwqLhPbjZgfDAcxq1AC2TLoqHBL
 y4EOgKpHDCMAghSyiFIAn2JprGcEt8dp+11B0JRXIn4Pm/eYDH8=
 =lqbe
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Wrap datapath globals into net_aligned_data, to avoid false sharing

   - Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container)

   - Add SO_INQ and SCM_INQ support to AF_UNIX

   - Add SIOCINQ support to AF_VSOCK

   - Add TCP_MAXSEG sockopt to MPTCP

   - Add IPv6 force_forwarding sysctl to enable forwarding per interface

   - Make TCP validation of whether packet fully fits in the receive
     window and the rcv_buf more strict. With increased use of HW
     aggregation a single "packet" can be multiple 100s of kB

   - Add MSG_MORE flag to optimize large TCP transmissions via sockmap,
     improves latency up to 33% for sockmap users

   - Convert TCP send queue handling from tasklet to BH workque

   - Improve BPF iteration over TCP sockets to see each socket exactly
     once

   - Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code

   - Support enabling kernel threads for NAPI processing on per-NAPI
     instance basis rather than a whole device. Fully stop the kernel
     NAPI thread when threaded NAPI gets disabled. Previously thread
     would stick around until ifdown due to tricky synchronization

   - Allow multicast routing to take effect on locally-generated packets

   - Add output interface argument for End.X in segment routing

   - MCTP: add support for gateway routing, improve bind() handling

   - Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink

   - Add a new neighbor flag ("extern_valid"), which cedes refresh
     responsibilities to userspace. This is needed for EVPN multi-homing
     where a neighbor entry for a multi-homed host needs to be synced
     across all the VTEPs among which the host is multi-homed

   - Support NUD_PERMANENT for proxy neighbor entries

   - Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM

   - Add sequence numbers to netconsole messages. Unregister
     netconsole's console when all net targets are removed. Code
     refactoring. Add a number of selftests

   - Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol
     should be used for an inbound SA lookup

   - Support inspecting ref_tracker state via DebugFS

   - Don't force bonding advertisement frames tx to ~333 ms boundaries.
     Add broadcast_neighbor option to send ARP/ND on all bonded links

   - Allow providing upcall pid for the 'execute' command in openvswitch

   - Remove DCCP support from Netfilter's conntrack

   - Disallow multiple packet duplications in the queuing layer

   - Prevent use of deprecated iptables code on PREEMPT_RT

  Driver API:

   - Support RSS and hashing configuration over ethtool Netlink

   - Add dedicated ethtool callbacks for getting and setting hashing
     fields

   - Add support for power budget evaluation strategy in PSE /
     Power-over-Ethernet. Generate Netlink events for overcurrent etc

   - Support DPLL phase offset monitoring across all device inputs.
     Support providing clock reference and SYNC over separate DPLL
     inputs

   - Support traffic classes in devlink rate API for bandwidth
     management

   - Remove rtnl_lock dependency from UDP tunnel port configuration

  Device drivers:

   - Add a new Broadcom driver for 800G Ethernet (bnge)

   - Add a standalone driver for Microchip ZL3073x DPLL

   - Remove IBM's NETIUCV device driver

   - Ethernet high-speed NICs:
      - Broadcom (bnxt):
         - support zero-copy Tx of DMABUF memory
         - take page size into account for page pool recycling rings
      - Intel (100G, ice, idpf):
         - idpf: XDP and AF_XDP support preparations
         - idpf: add flow steering
         - add link_down_events statistic
         - clean up the TSPLL code
         - preparations for live VM migration
      - nVidia/Mellanox:
         - support zero-copy Rx/Tx interfaces (DMABUF and io_uring)
         - optimize context memory usage for matchers
         - expose serial numbers in devlink info
         - support PCIe congestion metrics
      - Meta (fbnic):
         - add 25G, 50G, and 100G link modes to phylink
         - support dumping FW logs
      - Marvell/Cavium:
         - support for CN20K generation of the Octeon chips
      - Amazon:
         - add HW clock (without timestamping, just hypervisor time access)

   - Ethernet virtual:
      - VirtIO net:
         - support segmentation of UDP-tunnel-encapsulated packets
      - Google (gve):
         - support packet timestamping and clock synchronization
      - Microsoft vNIC:
         - add handler for device-originated servicing events
         - allow dynamic MSI-X vector allocation
         - support Tx bandwidth clamping

   - Ethernet NICs consumer, and embedded:
      - AMD:
         - amd-xgbe: hardware timestamping and PTP clock support
      - Broadcom integrated MACs (bcmgenet, bcmasp):
         - use napi_complete_done() return value to support NAPI polling
         - add support for re-starting auto-negotiation
      - Broadcom switches (b53):
         - support BCM5325 switches
         - add bcm63xx EPHY power control
      - Synopsys (stmmac):
         - lots of code refactoring and cleanups
      - TI:
         - icssg-prueth: read firmware-names from device tree
         - icssg: PRP offload support
      - Microchip:
         - lan78xx: convert to PHYLINK for improved PHY and MAC management
         - ksz: add KSZ8463 switch support
      - Intel:
         - support similar queue priority scheme in multi-queue and
           time-sensitive networking (taprio)
         - support packet pre-emption in both
      - RealTek (r8169):
         - enable EEE at 5Gbps on RTL8126
      - Airoha:
         - add PPPoE offload support
         - MDIO bus controller for Airoha AN7583

   - Ethernet PHYs:
      - support for the IPQ5018 internal GE PHY
      - micrel KSZ9477 switch-integrated PHYs:
         - add MDI/MDI-X control support
         - add RX error counters
         - add cable test support
         - add Signal Quality Indicator (SQI) reporting
      - dp83tg720: improve reset handling and reduce link recovery time
      - support bcm54811 (and its MII-Lite interface type)
      - air_en8811h: support resume/suspend
      - support PHY counters for QCA807x and QCA808x
      - support WoL for QCA807x

   - CAN drivers:
      - rcar_canfd: support for Transceiver Delay Compensation
      - kvaser: report FW versions via devlink dev info

   - WiFi:
      - extended regulatory info support (6 GHz)
      - add statistics and beacon monitor for Multi-Link Operation (MLO)
      - support S1G aggregation, improve S1G support
      - add Radio Measurement action fields
      - support per-radio RTS threshold
      - some work around how FIPS affects wifi, which was wrong (RC4 is
        used by TKIP, not only WEP)
      - improvements for unsolicited probe response handling

   - WiFi drivers:
      - RealTek (rtw88):
         - IBSS mode for SDIO devices
      - RealTek (rtw89):
         - BT coexistence for MLO/WiFi7
         - concurrent station + P2P support
         - support for USB devices RTL8851BU/RTL8852BU
      - Intel (iwlwifi):
         - use embedded PNVM in (to be released) FW images to fix
           compatibility issues
         - many cleanups (unused FW APIs, PCIe code, WoWLAN)
         - some FIPS interoperability
      - MediaTek (mt76):
         - firmware recovery improvements
         - more MLO work
      - Qualcomm/Atheros (ath12k):
         - fix scan on multi-radio devices
         - more EHT/Wi-Fi 7 features
         - encapsulation/decapsulation offload
      - Broadcom (brcm80211):
         - support SDIO 43751 device

   - Bluetooth:
      - hci_event: add support for handling LE BIG Sync Lost event
      - ISO: add socket option to report packet seqnum via CMSG
      - ISO: support SCM_TIMESTAMPING for ISO TS

   - Bluetooth drivers:
      - intel_pcie: support Function Level Reset
      - nxpuart: add support for 4M baudrate
      - nxpuart: implement powerup sequence, reset, FW dump, and FW loading"

* tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1742 commits)
  dpll: zl3073x: Fix build failure
  selftests: bpf: fix legacy netfilter options
  ipv6: annotate data-races around rt->fib6_nsiblings
  ipv6: fix possible infinite loop in fib6_info_uses_dev()
  ipv6: prevent infinite loop in rt6_nlmsg_size()
  ipv6: add a retry logic in net6_rt_notify()
  vrf: Drop existing dst reference in vrf_ip6_input_dst
  net/sched: taprio: align entry index attr validation with mqprio
  net: fsl_pq_mdio: use dev_err_probe
  selftests: rtnetlink.sh: remove esp4_offload after test
  vsock: remove unnecessary null check in vsock_getname()
  igb: xsk: solve negative overflow of nb_pkts in zerocopy mode
  stmmac: xsk: fix negative overflow of budget in zerocopy mode
  dt-bindings: ieee802154: Convert at86rf230.txt yaml format
  net: dsa: microchip: Disable PTP function of KSZ8463
  net: dsa: microchip: Setup fiber ports for KSZ8463
  net: dsa: microchip: Write switch MAC address differently for KSZ8463
  net: dsa: microchip: Use different registers for KSZ8463
  net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver
  dt-bindings: net: dsa: microchip: Add KSZ8463 switch support
  ...
2025-07-30 08:58:55 -07:00
Linus Torvalds
22c5696e3f Driver core changes for 6.17-rc1
- DEBUGFS
 
   - Remove unneeded debugfs_file_{get,put}() instances
 
   - Remove last remnants of debugfs_real_fops()
 
   - Allow storing non-const void * in struct debugfs_inode_info::aux
 
 - SYSFS
 
   - Switch back to attribute_group::bin_attrs (treewide)
 
   - Switch back to bin_attribute::read()/write() (treewide)
 
   - Constify internal references to 'struct bin_attribute'
 
 - Support cache-ids for device-tree systems
 
   - Add arch hook arch_compact_of_hwid()
 
   - Use arch_compact_of_hwid() to compact MPIDR values on arm64
 
 - Rust
 
   - Device
 
     - Introduce CoreInternal device context (for bus internal methods)
 
     - Provide generic drvdata accessors for bus devices
 
     - Provide Driver::unbind() callbacks
 
     - Use the infrastructure above for auxiliary, PCI and platform
 
     - Implement Device::as_bound()
 
     - Rename Device::as_ref() to Device::from_raw() (treewide)
 
     - Implement fwnode and device property abstractions
 
       - Implement example usage in the Rust platform sample driver
 
   - Devres
 
     - Remove the inner reference count (Arc) and use pin-init instead
 
     - Replace Devres::new_foreign_owned() with devres::register()
 
     - Require T to be Send in Devres<T>
 
     - Initialize the data kept inside a Devres last
 
     - Provide an accessor for the Devres associated Device
 
   - Device ID
 
     - Add support for ACPI device IDs and driver match tables
 
     - Split up generic device ID infrastructure
 
     - Use generic device ID infrastructure in net::phy
 
   - DMA
 
     - Implement the dma::Device trait
 
     - Add DMA mask accessors to dma::Device
 
     - Implement dma::Device for PCI and platform devices
 
     - Use DMA masks from the DMA sample module
 
   - I/O
 
     - Implement abstraction for resource regions (struct resource)
 
     - Implement resource-based ioremap() abstractions
 
     - Provide platform device accessors for I/O (remap) requests
 
   - Misc
 
     - Support fallible PinInit types in Revocable
 
     - Implement Wrapper<T> for Opaque<T>
 
     - Merge pin-init blanket dependencies (for Devres)
 
 - Misc
 
   - Fix OF node leak in auxiliary_device_create()
 
   - Use util macros in device property iterators
 
   - Improve kobject sample code
 
   - Add device_link_test() for testing device link flags
 
   - Fix typo in Documentation/ABI/testing/sysfs-kernel-address_bits
 
   - Hint to prefer container_of_const() over container_of()
 -----BEGIN PGP SIGNATURE-----
 
 iHQEABYKAB0WIQS2q/xV6QjXAdC7k+1FlHeO1qrKLgUCaIjkhwAKCRBFlHeO1qrK
 LpXuAP9RWwfD9ZGgQZ9OsMk/0pZ2mDclaK97jcmI9TAeSxeZMgD1FHnOMTY7oSIi
 iG7Muq0yLD+A5gk9HUnMUnFNrngWCg==
 =jgRj
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core

Pull driver core updates from Danilo Krummrich:
 "debugfs:
   - Remove unneeded debugfs_file_{get,put}() instances
   - Remove last remnants of debugfs_real_fops()
   - Allow storing non-const void * in struct debugfs_inode_info::aux

  sysfs:
   - Switch back to attribute_group::bin_attrs (treewide)
   - Switch back to bin_attribute::read()/write() (treewide)
   - Constify internal references to 'struct bin_attribute'

  Support cache-ids for device-tree systems:
   - Add arch hook arch_compact_of_hwid()
   - Use arch_compact_of_hwid() to compact MPIDR values on arm64

  Rust:
   - Device:
       - Introduce CoreInternal device context (for bus internal methods)
       - Provide generic drvdata accessors for bus devices
       - Provide Driver::unbind() callbacks
       - Use the infrastructure above for auxiliary, PCI and platform
       - Implement Device::as_bound()
       - Rename Device::as_ref() to Device::from_raw() (treewide)
       - Implement fwnode and device property abstractions
       - Implement example usage in the Rust platform sample driver
   - Devres:
       - Remove the inner reference count (Arc) and use pin-init instead
       - Replace Devres::new_foreign_owned() with devres::register()
       - Require T to be Send in Devres<T>
       - Initialize the data kept inside a Devres last
       - Provide an accessor for the Devres associated Device
   - Device ID:
       - Add support for ACPI device IDs and driver match tables
       - Split up generic device ID infrastructure
       - Use generic device ID infrastructure in net::phy
   - DMA:
       - Implement the dma::Device trait
       - Add DMA mask accessors to dma::Device
       - Implement dma::Device for PCI and platform devices
       - Use DMA masks from the DMA sample module
   - I/O:
       - Implement abstraction for resource regions (struct resource)
       - Implement resource-based ioremap() abstractions
       - Provide platform device accessors for I/O (remap) requests
   - Misc:
       - Support fallible PinInit types in Revocable
       - Implement Wrapper<T> for Opaque<T>
       - Merge pin-init blanket dependencies (for Devres)

  Misc:
   - Fix OF node leak in auxiliary_device_create()
   - Use util macros in device property iterators
   - Improve kobject sample code
   - Add device_link_test() for testing device link flags
   - Fix typo in Documentation/ABI/testing/sysfs-kernel-address_bits
   - Hint to prefer container_of_const() over container_of()"

* tag 'driver-core-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (84 commits)
  rust: io: fix broken intra-doc links to `platform::Device`
  rust: io: fix broken intra-doc link to missing `flags` module
  rust: io: mem: enable IoRequest doc-tests
  rust: platform: add resource accessors
  rust: io: mem: add a generic iomem abstraction
  rust: io: add resource abstraction
  rust: samples: dma: set DMA mask
  rust: platform: implement the `dma::Device` trait
  rust: pci: implement the `dma::Device` trait
  rust: dma: add DMA addressing capabilities
  rust: dma: implement `dma::Device` trait
  rust: net::phy Change module_phy_driver macro to use module_device_table macro
  rust: net::phy represent DeviceId as transparent wrapper over mdio_device_id
  rust: device_id: split out index support into a separate trait
  device: rust: rename Device::as_ref() to Device::from_raw()
  arm64: cacheinfo: Provide helper to compress MPIDR value into u32
  cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id
  cacheinfo: Set cache 'id' based on DT data
  container_of: Document container_of() is not to be used in new code
  driver core: auxiliary bus: fix OF node leak
  ...
2025-07-29 12:15:39 -07:00
Daeho Jeong
c8705cefce f2fs: add gc_boost_gc_greedy sysfs node
Add this to control GC algorithm for boost GC.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-29 15:02:36 +00:00
Daeho Jeong
1d4c5dbba1 f2fs: add gc_boost_gc_multiple sysfs node
Add a sysfs knob to set a multiplier for the background GC migration
window when F2FS Garbage Collection is boosted.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-29 15:02:33 +00:00
Zheng Yu
856db37592 jfs: fix metapage reference count leak in dbAllocCtl
In dbAllocCtl(), read_metapage() increases the reference count of the
metapage. However, when dp->tree.budmin < 0, the function returns -EIO
without calling release_metapage() to decrease the reference count,
leading to a memory leak.

Add release_metapage(mp) before the error return to properly manage
the metapage reference count and prevent the leak.

Fixes: a5f5e4698f ("jfs: fix shift-out-of-bounds in dbSplit")

Signed-off-by: Zheng Yu <zheng.yu@northwestern.edu>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-07-29 08:34:57 -05:00
Linus Torvalds
283564a433 fscrypt updates for 6.17
Simplify how fscrypt uses the crypto API, resulting in some
 significant performance improvements:
 
  - Drop the incomplete and problematic support for asynchronous
    algorithms. These drivers are bug-prone, and it turns out they are
    actually much slower than the CPU-based code as well.
 
  - Allocate crypto requests on the stack instead of the heap. This
    improves encryption and decryption performance, especially for
    filenames. It also eliminates a point of failure during I/O.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCaIZ+fhQcZWJpZ2dlcnNA
 a2VybmVsLm9yZwAKCRDzXCl4vpKOK+dkAQDrHUTj9dGZI/cQ/TjP0kmOv9XfYAfj
 HOQDRikTX+Ip4QEA6L8FS8lJYf9EMznTvTPOkP7hXpwqzuf00vJWr+ySmQs=
 =N9vo
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux

Pull fscrypt updates from Eric Biggers:
 "Simplify how fscrypt uses the crypto API, resulting in some
  significant performance improvements:

   - Drop the incomplete and problematic support for asynchronous
     algorithms. These drivers are bug-prone, and it turns out they are
     actually much slower than the CPU-based code as well.

   - Allocate crypto requests on the stack instead of the heap. This
     improves encryption and decryption performance, especially for
     filenames. This also eliminates a point of failure during I/O"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
  ceph: Remove gfp_t argument from ceph_fscrypt_encrypt_*()
  fscrypt: Remove gfp_t argument from fscrypt_encrypt_block_inplace()
  fscrypt: Remove gfp_t argument from fscrypt_crypt_data_unit()
  fscrypt: Switch to sync_skcipher and on-stack requests
  fscrypt: Drop FORBID_WEAK_KEYS flag for AES-ECB
  fscrypt: Don't use asynchronous CryptoAPI algorithms
  fscrypt: Don't use problematic non-inline crypto engines
  fscrypt: Drop obsolete recommendation to enable optimized SHA-512
  fscrypt: Explicitly include <linux/export.h>
2025-07-28 18:07:38 -07:00
Linus Torvalds
4b65b859f5 Crypto library conversions for 6.17
Convert fsverity and apparmor to use the SHA-2 library functions
 instead of crypto_shash. This is simpler and also slightly faster.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCaIZ+MhQcZWJpZ2dlcnNA
 a2VybmVsLm9yZwAKCRDzXCl4vpKOK6LlAP42XrHo9hMqCnVs7uCtDVOk2kBUB7ml
 m7rkVqQLiEbgAAEAiFvQYPjiaUedTi+bH45+qP/O/LxY7y5laIby7QKa1wc=
 =X4Xs
 -----END PGP SIGNATURE-----

Merge tag 'libcrypto-conversions-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull crypto library conversions from Eric Biggers:
 "Convert fsverity and apparmor to use the SHA-2 library functions
  instead of crypto_shash. This is simpler and also slightly faster"

* tag 'libcrypto-conversions-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  fsverity: Switch from crypto_shash to SHA-2 library
  fsverity: Explicitly include <linux/export.h>
  apparmor: use SHA-256 library API instead of crypto_shash API
2025-07-28 18:05:46 -07:00
Linus Torvalds
a578dd095d CRC updates for 6.17
Updates for the kernel's CRC (cyclic redundancy check) code:
 
  - Reorganize the architecture-optimized CRC code. It now lives in
    lib/crc/$(SRCARCH)/ rather than arch/$(SRCARCH)/lib/, and it is no
    longer artificially split into separate generic and arch modules.
    This allows better inlining and dead code elimination. The generic
    CRC code is also no longer exported, simplifying the API. (This
    mirrors the similar changes to SHA-1 and SHA-2 in lib/crypto/,
    which can be found in the "Crypto library updates" pull request.)
 
  - Improve crc32c() performance on newer x86_64 CPUs on long messages
    by enabling the VPCLMULQDQ optimized code.
 
  - Simplify the crypto_shash wrappers for crc32_le() and crc32c().
    Register just one shash algorithm for each that uses the (fully
    optimized) library functions, instead of unnecessarily providing
    direct access to the generic CRC code.
 
  - Remove unused and obsolete drivers for hardware CRC engines.
 
  - Remove CRC-32 combination functions that are no longer used.
 
  - Add kerneldoc for crc32_le(), crc32_be(), and crc32c().
 
  - Convert the crc32() macro to an inline function.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCaIZ8rRQcZWJpZ2dlcnNA
 a2VybmVsLm9yZwAKCRDzXCl4vpKOK3yOAP9OuoCirD42ZHNSgQeGTzhhZ2jCHiPN
 BPvHChwtE2MSRwEA0ddNX36aOiEKmpjog3TMllOIBz7wBrwZV7KgoX75+AU=
 =uAY8
 -----END PGP SIGNATURE-----

Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull CRC updates from Eric Biggers:

 - Reorganize the architecture-optimized CRC code

   It now lives in lib/crc/$(SRCARCH)/ rather than arch/$(SRCARCH)/lib/,
   and it is no longer artificially split into separate generic and arch
   modules. This allows better inlining and dead code elimination

   The generic CRC code is also no longer exported, simplifying the API.
   (This mirrors the similar changes to SHA-1 and SHA-2 in lib/crypto/,
   which can be found in the "Crypto library updates" pull request)

 - Improve crc32c() performance on newer x86_64 CPUs on long messages by
   enabling the VPCLMULQDQ optimized code

 - Simplify the crypto_shash wrappers for crc32_le() and crc32c()

   Register just one shash algorithm for each that uses the (fully
   optimized) library functions, instead of unnecessarily providing
   direct access to the generic CRC code

 - Remove unused and obsolete drivers for hardware CRC engines

 - Remove CRC-32 combination functions that are no longer used

 - Add kerneldoc for crc32_le(), crc32_be(), and crc32c()

 - Convert the crc32() macro to an inline function

* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (26 commits)
  lib/crc: x86/crc32c: Enable VPCLMULQDQ optimization where beneficial
  lib/crc: x86: Reorganize crc-pclmul static_call initialization
  lib/crc: crc64: Add include/linux/crc64.h to kernel-api.rst
  lib/crc: crc32: Change crc32() from macro to inline function and remove cast
  nvmem: layouts: Switch from crc32() to crc32_le()
  lib/crc: crc32: Document crc32_le(), crc32_be(), and crc32c()
  lib/crc: Explicitly include <linux/export.h>
  lib/crc: Remove ARCH_HAS_* kconfig symbols
  lib/crc: x86: Migrate optimized CRC code into lib/crc/
  lib/crc: sparc: Migrate optimized CRC code into lib/crc/
  lib/crc: s390: Migrate optimized CRC code into lib/crc/
  lib/crc: riscv: Migrate optimized CRC code into lib/crc/
  lib/crc: powerpc: Migrate optimized CRC code into lib/crc/
  lib/crc: mips: Migrate optimized CRC code into lib/crc/
  lib/crc: loongarch: Migrate optimized CRC code into lib/crc/
  lib/crc: arm64: Migrate optimized CRC code into lib/crc/
  lib/crc: arm: Migrate optimized CRC code into lib/crc/
  lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/
  lib/crc: Move files into lib/crc/
  lib/crc32: Remove unused combination support
  ...
2025-07-28 17:43:29 -07:00
Linus Torvalds
8e736a2eea hardening updates for v6.17-rc1
- Introduce and start using TRAILING_OVERLAP() helper for fixing
   embedded flex array instances (Gustavo A. R. Silva)
 
 - mux: Convert mux_control_ops to a flex array member in mux_chip
   (Thorsten Blum)
 
 - string: Group str_has_prefix() and strstarts() (Andy Shevchenko)
 
 - Remove KCOV instrumentation from __init and __head (Ritesh Harjani,
   Kees Cook)
 
 - Refactor and rename stackleak feature to support Clang
 
 - Add KUnit test for seq_buf API
 
 - Fix KUnit fortify test under LTO
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaIfUkgAKCRA2KwveOeQk
 uypLAP92r6f47sWcOw/5B9aVffX6Bypsb7dqBJQpCNxI5U1xcAEAiCrZ98UJyOeQ
 JQgnXd4N67K4EsS2JDc+FutRn3Yi+A8=
 =+5Bq
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:

 - Introduce and start using TRAILING_OVERLAP() helper for fixing
   embedded flex array instances (Gustavo A. R. Silva)

 - mux: Convert mux_control_ops to a flex array member in mux_chip
   (Thorsten Blum)

 - string: Group str_has_prefix() and strstarts() (Andy Shevchenko)

 - Remove KCOV instrumentation from __init and __head (Ritesh Harjani,
   Kees Cook)

 - Refactor and rename stackleak feature to support Clang

 - Add KUnit test for seq_buf API

 - Fix KUnit fortify test under LTO

* tag 'hardening-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (22 commits)
  sched/task_stack: Add missing const qualifier to end_of_stack()
  kstack_erase: Support Clang stack depth tracking
  kstack_erase: Add -mgeneral-regs-only to silence Clang warnings
  init.h: Disable sanitizer coverage for __init and __head
  kstack_erase: Disable kstack_erase for all of arm compressed boot code
  x86: Handle KCOV __init vs inline mismatches
  arm64: Handle KCOV __init vs inline mismatches
  s390: Handle KCOV __init vs inline mismatches
  arm: Handle KCOV __init vs inline mismatches
  mips: Handle KCOV __init vs inline mismatch
  powerpc/mm/book3s64: Move kfence and debug_pagealloc related calls to __init section
  configs/hardening: Enable CONFIG_INIT_ON_FREE_DEFAULT_ON
  configs/hardening: Enable CONFIG_KSTACK_ERASE
  stackleak: Split KSTACK_ERASE_CFLAGS from GCC_PLUGINS_CFLAGS
  stackleak: Rename stackleak_track_stack to __sanitizer_cov_stack_depth
  stackleak: Rename STACKLEAK to KSTACK_ERASE
  seq_buf: Introduce KUnit tests
  string: Group str_has_prefix() and strstarts()
  kunit/fortify: Add back "volatile" for sizeof() constants
  acpi: nfit: intel: avoid multiple -Wflex-array-member-not-at-end warnings
  ...
2025-07-28 17:16:12 -07:00
Linus Torvalds
d900c4ce63 execve updates for v6.17
- Introduce regular REGSET note macros arch-wide (Dave Martin)
 
 - Remove arbitrary 4K limitation of program header size (Yin Fengwei)
 
 - Reorder function qualifiers for copy_clone_args_from_user() (Dishank Jogi)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaIVKiAAKCRA2KwveOeQk
 u4zBAP4zUNj2+XyixVPXCzv+Hkle6zWs7yrzdA2yLxe8Qtwj5AD+N2I6MUGcCFGW
 W+uWxlWTtGLDqh1CplIUqTlxMi39Og4=
 =vYnE
 -----END PGP SIGNATURE-----

Merge tag 'execve-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull execve updates from Kees Cook:

 - Introduce regular REGSET note macros arch-wide (Dave Martin)

 - Remove arbitrary 4K limitation of program header size (Yin Fengwei)

 - Reorder function qualifiers for copy_clone_args_from_user() (Dishank Jogi)

* tag 'execve-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (25 commits)
  fork: reorder function qualifiers for copy_clone_args_from_user
  binfmt_elf: remove the 4k limitation of program header size
  binfmt_elf: Warn on missing or suspicious regset note names
  xtensa: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  um: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  x86/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  sparc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  sh: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  s390/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  riscv: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  powerpc/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  parisc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  openrisc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  nios2: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  MIPS: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  m68k: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  LoongArch: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  hexagon: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  csky: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  arm64: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  ...
2025-07-28 17:11:40 -07:00
Linus Torvalds
e268c230c0 zonefs changes for 6.17-rc1
- Use ZONEFS_SUPER_SIZE instead of PAGE_SIZE to read from disk the
    super block (Johannes).
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCaIcLzAAKCRDdoc3SxdoY
 du1PAQDZoyBI+zwM73r8vh26n1ftOEDbXc2RnfLbj+x9D+C52wD9GQY4VYP5DZ6u
 K04UYehzCpAHRA2FEtfgEQCGr5mfsA0=
 =Veb6
 -----END PGP SIGNATURE-----

Merge tag 'zonefs-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

Pull zonefs update from Damien Le Moal:

 - Use ZONEFS_SUPER_SIZE instead of PAGE_SIZE to read from disk the
   super block (Johannes).

* tag 'zonefs-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: use ZONEFS_SUPER_SIZE instead of PAGE_SIZE
2025-07-28 17:06:51 -07:00
Linus Torvalds
6e11664f14 for-6.17/block-20250728
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmiHdZ8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgptRED/9o3dQ1QHL5yNM/AyCCGox0V4zra8qGS/Vc
 cBWpAVrmPGRw0IYlLZENtN9PdwKcbMzJq3l6cxeC7dBnAZP0AxTzP4YYJYUNVsqo
 WtJ3d/k5+cVp0OyOp4uabaqNeMeLoPk9/JXe1Ml2KxtDmHtj5yee0JRh7zlPZmZj
 tsrpIUTeHgAPn6yR1EI+0ybx/mjCb05Mv2Y8gF5hkUPA2PuON+MTFixJmqoy2ySh
 n+22mz/prqlyOSYh/VVv1+9jcQ94wMjcW0JIpg9lM3Kg8BCPU4IetvO1UiX6X33v
 154zEh2aJJDBx+yORS4BM4JMXjRZI7lYea2dkHM8Cajctu1Wpja9bNwnK9ibXvEc
 WtyBwztleLbAZef25fA/W87JE23fGa/r3nwIb2cF4QqkAFslCvhjA93WkOzNJCgQ
 qsWOrlCh3IK2NUu4b1Ncs3ZHOPvc51+zzjMzC6SUr54xhrxDK+gngDPhRy7XDqWJ
 DTMpIlr366o8GdJqnib0/e/CPBrThS6Vl6u0tgLnNbwdpK1svgo/uHW5ksKvDqHX
 kGEIhyRRJJC+4wyl4dsYKXa2twcyFrlWdAE+pZguEC2nZRYqYl9uXftOtvfp1x0y
 /skDX0FIDjvyjRqCLcqF03FSGqwCGS8WuWXZjPhVhcfz47NvbHeFDh1G/jMzsbpj
 S9zrPve/DQ==
 =e86T
 -----END PGP SIGNATURE-----

Merge tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - MD pull request via Yu:
      - call del_gendisk synchronously (Xiao)
      - cleanup unused variable (John)
      - cleanup workqueue flags (Ryo)
      - fix faulty rdev can't be removed during resync (Qixing)

 - NVMe pull request via Christoph:
      - try PCIe function level reset on init failure (Keith Busch)
      - log TLS handshake failures at error level (Maurizio Lombardi)
      - pci-epf: do not complete commands twice if nvmet_req_init()
        fails (Rick Wertenbroek)
      - misc cleanups (Alok Tiwari)

 - Removal of the pktcdvd driver

   This has been more than a decade coming at this point, and some
   recently revealed breakages that had it causing issues even for cases
   where it isn't required made me re-pull the trigger on this one. It's
   known broken and nobody has stepped up to maintain the code

 - Series for ublk supporting batch commands, enabling the use of
   multishot where appropriate

 - Speed up ublk exit handling

 - Fix for the two-stage elevator fixing which could leak data

 - Convert NVMe to use the new IOVA based API

 - Increase default max transfer size to something more reasonable

 - Series fixing write operations on zoned DM devices

 - Add tracepoints for zoned block device operations

 - Prep series working towards improving blk-mq queue management in the
   presence of isolated CPUs

 - Don't allow updating of the block size of a loop device that is
   currently under exclusively ownership/open

 - Set chunk sectors from stacked device stripe size and use it for the
   atomic write size limit

 - Switch to folios in bcache read_super()

 - Fix for CD-ROM MRW exit flush handling

 - Various tweaks, fixes, and cleanups

* tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux: (94 commits)
  block: restore two stage elevator switch while running nr_hw_queue update
  cdrom: Call cdrom_mrw_exit from cdrom_release function
  sunvdc: Balance device refcount in vdc_port_mpgroup_check
  nvme-pci: try function level reset on init failure
  dm: split write BIOs on zone boundaries when zone append is not emulated
  block: use chunk_sectors when evaluating stacked atomic write limits
  dm-stripe: limit chunk_sectors to the stripe size
  md/raid10: set chunk_sectors limit
  md/raid0: set chunk_sectors limit
  block: sanitize chunk_sectors for atomic write limits
  ilog2: add max_pow_of_two_factor()
  nvmet: pci-epf: Do not complete commands twice if nvmet_req_init() fails
  nvme-tcp: log TLS handshake failures at error level
  docs: nvme: fix grammar in nvme-pci-endpoint-target.rst
  nvme: fix typo in status code constant for self-test in progress
  nvmet: remove redundant assignment of error code in nvmet_ns_enable()
  nvme: fix incorrect variable in io cqes error message
  nvme: fix multiple spelling and grammar issues in host drivers
  block: fix blk_zone_append_update_request_bio() kernel-doc
  md/raid10: fix set but not used variable in sync_request_write()
  ...
2025-07-28 16:43:54 -07:00
Linus Torvalds
c3018a2c6a for-6.17/io_uring-20250728
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmiHcP8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgphu+EACp+MNKBIx7iIM1arbmafEmroaR9Whi5CJu
 8BczAKRVPNhICPXAjr6UOmRuD4nBzPysPNi73xNRdXdnd/dnOL27zMqF/0JCJsd6
 V4Xt7pXcanJ4BoeRDv9P332h6/IVo1ulxqEzhWJt2KKoq4cWbSwlvk18NPOdGIkp
 PcMPARf15d4KbLprdST5Dzrg2uqZ3jZ4Y4zvIMKery56AZCkp5ZrtPFttw4jhTff
 FQzIjY/hXXDFURubv75EHFg7KC9rcZa4CuU40duTHmsXD/EsSsa0eAJMmiVAGjhM
 PXK/70ZOoxLK1zU7zN+iAMUnVL7onoNvo8g9e7/FJNuHV25am8aUx4DzzP3ci+He
 +2C4PpOKx9Egz0oKmC6hl9eMg/muPd/3j8uxQlGOvz38icgeirQB/ebr1KLtBkM9
 pKQmCsGaIzncXWddcTMCayEj3wbvn2lgWSqSgZuwTt1AXqSTxLOmTywCYcXDBdli
 2ejh/Hk45TALnBhiiDWdOT2euh2EEP1ADOZzVsZzeR29OzLqJDRjRnitb40NUU60
 +ny2HOcplIN6aah+QdFwK31FieVKDY/ufjt1yGvwCP+kIxUEDSYi+NRRldmwznxy
 8UZwFyvd8wOxUc+iG/wCK7ccY+MtYyaZAE4ok2QEvQvMUTBJE/LvI4bkd77fx7Sy
 2cMeDukZ/w==
 =EGH5
 -----END PGP SIGNATURE-----

Merge tag 'for-6.17/io_uring-20250728' of git://git.kernel.dk/linux

Pull io_uring updates from Jens Axboe:

 - Optimization to avoid reference counts on non-cloned registered
   buffers. This is how these buffers were handled prior to having
   cloning support, and we can still use that approach as long as the
   buffers haven't been cloned to another ring.

 - Cleanup and improvement for uring_cmd, where btrfs was the only user
   of storing allocated data for the lifetime of the uring_cmd. Clean
   that up so we can get rid of the need to do that.

 - Avoid unnecessary memory copies in uring_cmd usage. This is
   particularly important as a lot of uring_cmd usage necessitates the
   use of 128b SQEs.

 - A few updates for recv multishot, where it's now possible to add
   fairness limits for limiting how much is transferred for each retry
   loop. Additionally, recv multishot now supports an overall cap as
   well, where once reached the multishot recv will terminate. The
   latter is useful for buffer management and juggling many recv streams
   at the same time.

 - Add support for returning the TX timestamps via a new socket command.
   This feature can work in either singleshot or multishot mode, where
   the latter triggers a completion whenever new timestamps are
   available. This is an alternative to using the existing error queue.

 - Add support for an io_uring "mock" file, which is the start of being
   able to do 100% targeted testing in terms of exercising io_uring
   request handling. The idea is to have a file type that can be
   anything the tester would like, and behave exactly how you want it to
   behave in terms of hitting the code paths you want.

 - Improve zcrx by using sgtables to de-duplicate and improve dma
   address handling.

 - Prep work for supporting larger pages for zcrx.

 - Various little improvements and fixes.

* tag 'for-6.17/io_uring-20250728' of git://git.kernel.dk/linux: (42 commits)
  io_uring/zcrx: fix leaking pages on sg init fail
  io_uring/zcrx: don't leak pages on account failure
  io_uring/zcrx: fix null ifq on area destruction
  io_uring: fix breakage in EXPERT menu
  io_uring/cmd: remove struct io_uring_cmd_data
  btrfs/ioctl: store btrfs_uring_encoded_data in io_btrfs_cmd
  io_uring/cmd: introduce IORING_URING_CMD_REISSUE flag
  io_uring/zcrx: account area memory
  io_uring: export io_[un]account_mem
  io_uring/net: Support multishot receive len cap
  io_uring: deduplicate wakeup handling
  io_uring/net: cast min_not_zero() type
  io_uring/poll: cleanup apoll freeing
  io_uring/net: allow multishot receive per-invocation cap
  io_uring/net: move io_sr_msg->retry_flags to io_sr_msg->flags
  io_uring/net: use passed in 'len' in io_recv_buf_select()
  io_uring/zcrx: prepare fallback for larger pages
  io_uring/zcrx: assert area type in io_zcrx_iov_page
  io_uring/zcrx: allocate sgtable for umem areas
  io_uring/zcrx: introduce io_populate_area_dma
  ...
2025-07-28 16:30:12 -07:00
Linus Torvalds
e5cf61fa6e eight SMB3 server fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmiHqYkACgkQiiy9cAdy
 T1FfyQwAi+luMNvR8p9rH3zNVk+WN/a4vV3suJFuc6Bl6kXFfeNU0DFIB2ZCqAmz
 lk0N5B/dAd71IpNkTtIX2nZiJpRIBzTy5xAnwARq2LdrDi6ipNSKOv4qfOEMeufl
 ehfkqJL/9YQWJv/78Tntfuub2V3t479lpVRDp5hGAwZB4i9KZcVew7ifJJhCoUSh
 9NxcRD6AefeZTZaoTyKXayDCTjEn9N7Q6sXOTsSnLSX+e2nIDobFh+Sm1sDcSHMh
 2BH9a2Eg9CAZkLj6/53sIYUR4LLJjFMeDRpsGa4RZW0QA23WYJqR+R2Ouc5eCkBD
 OS79J5GAbag0RqujLqbl4QXFJRJgW6epB6cCOCjqqo5fQPQa9TIPem2iB5T1NHTg
 yMiuJk+qRX1vqyt8QI05yromSYKY7QBqgKzfkjv7054N1XfyUK/99Y+lX5t2nMZ7
 Fodxdaxrmk78cUpyaCbTLlW7zrkvMK6hQRQ2h++Q2RrYZ8mo+ciR13B6nftR3sdA
 mewSF+yO
 =DRH3
 -----END PGP SIGNATURE-----

Merge tag 'v6.17-rc-smb3-server-fixes' of git://git.samba.org/ksmbd

Pull smb server updates from Steve French:

 - Fix mtime/ctime reporting issue

 - Auth fixes, including two session setup race bugs reported by ZDI

 - Locking improvement in query directory

 - Fix for potential deadlock in creating hardlinks

 - Improvements to path name processing

* tag 'v6.17-rc-smb3-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix corrupted mtime and ctime in smb2_open
  ksmbd: fix Preauh_HashValue race condition
  ksmbd: check return value of xa_store() in krb5_authenticate
  ksmbd: fix null pointer dereference error in generate_encryptionkey
  smb/server: add ksmbd_vfs_kern_path()
  smb/server: avoid deadlock when linking with ReplaceIfExists
  smb/server: simplify ksmbd_vfs_kern_path_locked()
  smb/server: use lookup_one_unlocked()
2025-07-28 16:25:24 -07:00
Linus Torvalds
cb6bbff7e6 hfs/hfsplus updates for v6.17
- hfs: fix general protection fault in hfs_find_init()
 - hfs: fix slab-out-of-bounds in hfs_bnode_read()
 - hfsplus: fix slab-out-of-bounds in hfsplus_bnode_read()
 - hfsplus: fix slab-out-of-bounds read in hfsplus_uni2asc()
 - hfsplus: don't use BUG_ON() in hfsplus_create_attributes_file()
 - hfsplus: don't set REQ_SYNC for hfsplus_submit_bio()
 - hfsplus: remove mutex_lock check in hfsplus_free_extents
 - hfs: make splice write available again
 - hfsplus: make splice write available again
 - hfs: fix not erasing deleted b-tree node issue
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQT4wVoLCG92poNnMFAhI4xTh21NnQUCaIQQ0wAKCRAhI4xTh21N
 nW3yAQDMhJcNyjP1j2dhNRq8l2PO6jDJqLhxAYGKwWMwv1GTvQD5AaOUSeMQbmcs
 hNkMtjzb7OlfBLUthvrWlaCfLKWCmAk=
 =dI94
 -----END PGP SIGNATURE-----

Merge tag 'hfs-v6.17-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs

Pull hfs/hfsplus updates from Viacheslav Dubeyko:
 "Johannes Thumshirn has made nice cleanup in hfsplus_submit_bio().

  Tetsuo Handa has fixed the syzbot reported issue in
  hfsplus_create_attributes_file() for the case of corruption the
  Attributes File's metadata.

  Yangtao Li has fixed the syzbot reported issue by removing the
  uneccessary WARN_ON() in hfsplus_free_extents().

  Other fixes:

   - restore generic/001 successful execution by erasing deleted b-tree
     nodes

   - eliminate slab-out-of-bounds issue in hfs_bnode_read() and
     hfsplus_bnode_read() by checking correctness of offset and length
     when accessing b-tree node contents

   - eliminate slab-out-of-bounds read in hfsplus_uni2asc() if the
     b-tree node record has corrupted length of a name that could be
     bigger than HFSPLUS_MAX_STRLEN

   - eliminate general protection fault in hfs_find_init() for the case
     of initial b-tree object creation"

* tag 'hfs-v6.17-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs:
  hfs: fix general protection fault in hfs_find_init()
  hfs: fix slab-out-of-bounds in hfs_bnode_read()
  hfsplus: fix slab-out-of-bounds in hfsplus_bnode_read()
  hfsplus: fix slab-out-of-bounds read in hfsplus_uni2asc()
  hfsplus: don't use BUG_ON() in hfsplus_create_attributes_file()
  hfsplus: don't set REQ_SYNC for hfsplus_submit_bio()
  hfsplus: remove mutex_lock check in hfsplus_free_extents
  hfs: make splice write available again
  hfsplus: make splice write available again
  hfs: fix not erasing deleted b-tree node issue
2025-07-28 16:17:44 -07:00
Linus Torvalds
c7bfaff47a \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmiHny8ACgkQnJ2qBz9k
 QNl/dQf/Wh/wsQwnYbZ1P1Mk1aGq+5xLAZJrt8dY8umHfCklWzjrmrpbxV11KbSX
 sxnAzRRGP/GlP9Atb6J4oBH2odIY57aKZcfeA64FYYM7yDo3ZvNQvNe+Il3Wr5Zn
 UBCGxr6mbeGt1GjBiP77kZzgLeHNnDKBR8Eu9i6zqYBsPk6wBM83oC2g+Ala++vM
 leb+uph2fL0rGqXu07LUpQDeLadBjhqRkzdgKOYJ6OXmckWASpYBkQlnCNTnAPYv
 AvDod+Mh5UNR0Sq+zB/EYEQn6OTFq+IB6MpGypQjVKxACUt3pTlRyraKWpvakR6r
 tih1TyOS5z1wZXBcoU2EtA9N/xYnjw==
 =Kj02
 -----END PGP SIGNATURE-----

Merge tag 'fs_for_v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull udf and ext2 updates from Jan Kara:
 "A few udf and ext2 fixes and cleanups"

* tag 'fs_for_v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Verify partition map count
  udf: stop using write_cache_pages
  ext2: Handle fiemap on empty files to prevent EINVAL
2025-07-28 16:16:09 -07:00
Joanne Koong
595d7ebeaf fuse: remove page alignment check for writeback len
Remove incorrect page alignment check for the writeback len arg in
fuse_iomap_writeback_range().  len will always be block-aligned as
passed in by iomap.

On regular fuse filesystems, i_blkbits is set to PAGE_SHIFT so this is
not a problem but for fuseblk filesystems, the block size is set to a
default of 512 bytes or a block size passed in at mount time.

Please note that non-page-aligned lengths are fine for the logic in
fuse_iomap_writeback_range().  The check was originally added as a
safeguard to detect conspicuously wrong ranges.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Fixes: ef7e7cbb32 ("fuse: use iomap for writeback")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Link: https://lore.kernel.org/linux-fsdevel/CA+G9fYs5AdVM-T2Tf3LciNCwLZEHetcnSkHsjZajVwwpM2HmJw@mail.gmail.com/
Reported-by: Sasha Levin <sashal@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-07-28 16:14:18 -07:00
Linus Torvalds
b5d760d53a vfs-6.17-rc1.iomap
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCtwAKCRCRxhvAZXjc
 ogPuAQChc4tCjlNp+yAwbSmuzWooKTN8PHI6v+3ftjdaKSy9AgD/Yya1i8aBYBA8
 9HBtIKGAqvcgNB3por7yN+GJ8fxb/Ag=
 =YmLL
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs iomap updates from Christian Brauner:

 - Refactor the iomap writeback code and split the generic and ioend/bio
   based writeback code.

   There are two methods that define the split between the generic
   writeback code, and the implemementation of it, and all knowledge of
   ioends and bios now sits below that layer.

 - Add fuse iomap support for buffered writes and dirty folio writeback.

   This is needed so that granular uptodate and dirty tracking can be
   used in fuse when large folios are enabled. This has two big
   advantages. For writes, instead of the entire folio needing to be
   read into the page cache, only the relevant portions need to be. For
   writeback, only the dirty portions need to be written back instead of
   the entire folio.

* tag 'vfs-6.17-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fuse: refactor writeback to use iomap_writepage_ctx inode
  fuse: hook into iomap for invalidating and checking partial uptodateness
  fuse: use iomap for folio laundering
  fuse: use iomap for writeback
  fuse: use iomap for buffered writes
  iomap: build the writeback code without CONFIG_BLOCK
  iomap: add read_folio_range() handler for buffered writes
  iomap: improve argument passing to iomap_read_folio_sync
  iomap: replace iomap_folio_ops with iomap_write_ops
  iomap: export iomap_writeback_folio
  iomap: move folio_unlock out of iomap_writeback_folio
  iomap: rename iomap_writepage_map to iomap_writeback_folio
  iomap: move all ioend handling to ioend.c
  iomap: add public helpers for uptodate state manipulation
  iomap: hide ioends from the generic writeback code
  iomap: refactor the writeback interface
  iomap: cleanup the pending writeback tracking in iomap_writepage_map_blocks
  iomap: pass more arguments using the iomap writeback context
  iomap: header diet
2025-07-28 16:09:03 -07:00
Linus Torvalds
0965549d6f vfs-6.17-rc1.super
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCsAAKCRCRxhvAZXjc
 op1/AQCYRmE6MsFclZ/6Qhpd8Xxl6jYaw0VuSIGneh/HA5EmqQEAiE3/Q0paC1HB
 PHryCsVau1yOfJtE1P05/3JLA73hWA4=
 =MBdP
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull superblock callback update from Christian Brauner:
 "Currently all filesystems which implement super_operations::shutdown()
  can not afford losing a device.

  Thus fs_bdev_mark_dead() will just call the ->shutdown() callback for
  the involved filesystem.

  But it will no longer be the case, as multi-device filesystems like
  btrfs can handle certain device loss without the need to shutdown the
  whole filesystem.

  To allow those multi-device filesystems to be integrated to use
  fs_holder_ops:

   - Add a new super_operations::remove_bdev() callback

   - Try ->remove_bdev() callback first inside fs_bdev_mark_dead().

     If the callback returned 0, meaning the fs can handling the device
     loss, then exit without doing anything else.

     If there is no such callback or the callback returned non-zero
     value, continue to shutdown the filesystem as usual.

  This means the new remove_bdev() should only do the check on whether
  the operation can continue, and if so do the fs specific handlings.
  The shutdown handling should still be handled by the existing
  ->shutdown() callback.

  For all existing filesystems with shutdown callback, there is no
  change to the code nor behavior.

  Btrfs is going to implement both the ->remove_bdev() and ->shutdown()
  callbacks soon"

* tag 'vfs-6.17-rc1.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: add a new remove_bdev() callback
2025-07-28 15:50:15 -07:00
Linus Torvalds
57fcb7d930 vfs-6.17-rc1.fileattr
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCpgAKCRCRxhvAZXjc
 oqfFAQDcy3rROUF3W34KcSi7rDmaKVSX53d1tUoqH+1zDRpSlwEAriKDNC1ybudp
 YAnxVzkRHjHs1296WIuwKq5lfhJ60Q4=
 =geAl
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull fileattr updates from Christian Brauner:
 "This introduces the new file_getattr() and file_setattr() system calls
  after lengthy discussions.

  Both system calls serve as successors and extensible companions to
  the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR system calls which have
  started to show their age in addition to being named in a way that
  makes it easy to conflate them with extended attribute related
  operations.

  These syscalls allow userspace to set filesystem inode attributes on
  special files. One of the usage examples is the XFS quota projects.

  XFS has project quotas which could be attached to a directory. All new
  inodes in these directories inherit project ID set on parent
  directory.

  The project is created from userspace by opening and calling
  FS_IOC_FSSETXATTR on each inode. This is not possible for special
  files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left
  with empty project ID. Those inodes then are not shown in the quota
  accounting but still exist in the directory. This is not critical but
  in the case when special files are created in the directory with
  already existing project quota, these new inodes inherit extended
  attributes. This creates a mix of special files with and without
  attributes. Moreover, special files with attributes don't have a
  possibility to become clear or change the attributes. This, in turn,
  prevents userspace from re-creating quota project on these existing
  files.

  In addition, these new system calls allow the implementation of
  additional attributes that we couldn't or didn't want to fit into the
  legacy ioctls anymore"

* tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: tighten a sanity check in file_attr_to_fileattr()
  tree-wide: s/struct fileattr/struct file_kattr/g
  fs: introduce file_getattr and file_setattr syscalls
  fs: prepare for extending file_get/setattr()
  fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP
  selinux: implement inode_file_[g|s]etattr hooks
  lsm: introduce new hooks for setting/getting inode fsxattr
  fs: split fileattr related helpers into separate file
2025-07-28 15:24:14 -07:00
Linus Torvalds
7e7bc8335b vfs-6.17-rc1.bpf
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCjwAKCRCRxhvAZXjc
 osnVAQCv4rM7sF4yJvGlm1myIJcJy5Sabk2q31qMdI1VHmkcOwD+Mxs7d1aByTS8
 /6djhVleq6lcT2LpP9j8YI3Rb+x30QY=
 =PF3o
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.bpf' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs bpf updates from Christian Brauner:
 "These changes allow bpf to read extended attributes from cgroupfs.

  This is useful in redirecting AF_UNIX socket connections based on
  cgroup membership of the socket. One use-case is the ability to
  implement log namespaces in systemd so services and containers are
  redirected to different journals"

* tag 'vfs-6.17-rc1.bpf' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  selftests/kernfs: test xattr retrieval
  selftests/bpf: Add tests for bpf_cgroup_read_xattr
  bpf: Mark cgroup_subsys_state->cgroup RCU safe
  bpf: Introduce bpf_cgroup_read_xattr to read xattr of cgroup's node
  kernfs: remove iattr_mutex
2025-07-28 14:42:31 -07:00
Linus Torvalds
672dcda246 vfs-6.17-rc1.pidfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCiQAKCRCRxhvAZXjc
 orltAQDq3y1anYETz5/FD6P2gXY1W5hXdSm3EHHeacQ1JjTXvgEA2g1lWO7J4anf
 oOVE8aSvMow/FOjivLZBYmI65pkYJAE=
 =oDKB
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull pidfs updates from Christian Brauner:

 - persistent info

   Persist exit and coredump information independent of whether anyone
   currently holds a pidfd for the struct pid.

   The current scheme allocated pidfs dentries on-demand repeatedly.
   This scheme is reaching it's limits as it makes it impossible to pin
   information that needs to be available after the task has exited or
   coredumped and that should not be lost simply because the pidfd got
   closed temporarily. The next opener should still see the stashed
   information.

   This is also a prerequisite for supporting extended attributes on
   pidfds to allow attaching meta information to them.

   If someone opens a pidfd for a struct pid a pidfs dentry is allocated
   and stashed in pid->stashed. Once the last pidfd for the struct pid
   is closed the pidfs dentry is released and removed from pid->stashed.

   So if 10 callers create a pidfs dentry for the same struct pid
   sequentially, i.e., each closing the pidfd before the other creates a
   new one then a new pidfs dentry is allocated every time.

   Because multiple tasks acquiring and releasing a pidfd for the same
   struct pid can race with each another a task may still find a valid
   pidfs entry from the previous task in pid->stashed and reuse it. Or
   it might find a dead dentry in there and fail to reuse it and so
   stashes a new pidfs dentry. Multiple tasks may race to stash a new
   pidfs dentry but only one will succeed, the other ones will put their
   dentry.

   The current scheme aims to ensure that a pidfs dentry for a struct
   pid can only be created if the task is still alive or if a pidfs
   dentry already existed before the task was reaped and so exit
   information has been was stashed in the pidfs inode.

   That's great except that it's buggy. If a pidfs dentry is stashed in
   pid->stashed after pidfs_exit() but before __unhash_process() is
   called we will return a pidfd for a reaped task without exit
   information being available.

   The pidfds_pid_valid() check does not guard against this race as it
   doens't sync at all with pidfs_exit(). The pid_has_task() check might
   be successful simply because we're before __unhash_process() but
   after pidfs_exit().

   Introduce a new scheme where the lifetime of information associated
   with a pidfs entry (coredump and exit information) isn't bound to the
   lifetime of the pidfs inode but the struct pid itself.

   The first time a pidfs dentry is allocated for a struct pid a struct
   pidfs_attr will be allocated which will be used to store exit and
   coredump information.

   If all pidfs for the pidfs dentry are closed the dentry and inode can
   be cleaned up but the struct pidfs_attr will stick until the struct
   pid itself is freed. This will ensure minimal memory usage while
   persisting relevant information.

   The new scheme has various advantages. First, it allows to close the
   race where we end up handing out a pidfd for a reaped task for which
   no exit information is available. Second, it minimizes memory usage.
   Third, it allows to remove complex lifetime tracking via dentries
   when registering a struct pid with pidfs. There's no need to get or
   put a reference. Instead, the lifetime of exit and coredump
   information associated with a struct pid is bound to the lifetime of
   struct pid itself.

 - extended attributes

   Now that we have a way to persist information for pidfs dentries we
   can start supporting extended attributes on pidfds. This will allow
   userspace to attach meta information to tasks.

   One natural extension would be to introduce a custom pidfs.* extended
   attribute space and allow for the inheritance of extended attributes
   across fork() and exec().

   The first simple scheme will allow privileged userspace to set
   trusted extended attributes on pidfs inodes.

 - Allow autonomous pidfs file handles

   Various filesystems such as pidfs and drm support opening file
   handles without having to require a file descriptor to identify the
   filesystem. The filesystem are global single instances and can be
   trivially identified solely on the information encoded in the file
   handle.

   This makes it possible to not have to keep or acquire a sentinal file
   descriptor just to pass it to open_by_handle_at() to identify the
   filesystem. That's especially useful when such sentinel file
   descriptor cannot or should not be acquired.

   For pidfs this means a file handle can function as full replacement
   for storing a pid in a file. Instead a file handle can be stored and
   reopened purely based on the file handle.

   Such autonomous file handles can be opened with or without specifying
   a a file descriptor. If no proper file descriptor is used the
   FD_PIDFS_ROOT sentinel must be passed. This allows us to define
   further special negative fd sentinels in the future.

   Userspace can trivially test for support by trying to open the file
   handle with an invalid file descriptor.

 - Allow pidfds for reaped tasks with SCM_PIDFD messages

   This is a logical continuation of the earlier work to create pidfds
   for reaped tasks through the SO_PEERPIDFD socket option merged in
   923ea4d448 ("Merge patch series "net, pidfs: enable handing out
   pidfds for reaped sk->sk_peer_pid"").

 - Two minor fixes:

    * Fold fs_struct->{lock,seq} into a seqlock

    * Don't bother with path_{get,put}() in unix_open_file()

* tag 'vfs-6.17-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (37 commits)
  don't bother with path_get()/path_put() in unix_open_file()
  fold fs_struct->{lock,seq} into a seqlock
  selftests: net: extend SCM_PIDFD test to cover stale pidfds
  af_unix: enable handing out pidfds for reaped tasks in SCM_PIDFD
  af_unix: stash pidfs dentry when needed
  af_unix/scm: fix whitespace errors
  af_unix: introduce and use scm_replace_pid() helper
  af_unix: introduce unix_skb_to_scm helper
  af_unix: rework unix_maybe_add_creds() to allow sleep
  selftests/pidfd: decode pidfd file handles withou having to specify an fd
  fhandle, pidfs: support open_by_handle_at() purely based on file handle
  uapi/fcntl: add FD_PIDFS_ROOT
  uapi/fcntl: add FD_INVALID
  fcntl/pidfd: redefine PIDFD_SELF_THREAD_GROUP
  uapi/fcntl: mark range as reserved
  fhandle: reflow get_path_anchor()
  pidfs: add pidfs_root_path() helper
  fhandle: rename to get_path_anchor()
  fhandle: hoist copy_from_user() above get_path_from_fd()
  fhandle: raise FILEID_IS_DIR in handle_type
  ...
2025-07-28 14:10:15 -07:00
Linus Torvalds
7031769e10 vfs-6.17-rc1.mmap_prepare
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCgQAKCRCRxhvAZXjc
 os+nAP9LFHUwWO6EBzHJJGEVjJvvzsbzqeYrRFamYiMc5ulPJwD+KW4RIgJa/MWO
 pcYE40CacaekD8rFWwYUyszpgmv6ewc=
 =wCwp
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull mmap_prepare updates from Christian Brauner:
 "Last cycle we introduce f_op->mmap_prepare() in c84bf6dd2b ("mm:
  introduce new .mmap_prepare() file callback").

  This is preferred to the existing f_op->mmap() hook as it does require
  a VMA to be established yet, thus allowing the mmap logic to invoke
  this hook far, far earlier, prior to inserting a VMA into the virtual
  address space, or performing any other heavy handed operations.

  This allows for much simpler unwinding on error, and for there to be a
  single attempt at merging a VMA rather than having to possibly
  reattempt a merge based on potentially altered VMA state.

  Far more importantly, it prevents inappropriate manipulation of
  incompletely initialised VMA state, which is something that has been
  the cause of bugs and complexity in the past.

  The intent is to gradually deprecate f_op->mmap, and in that vein this
  series coverts the majority of file systems to using f_op->mmap_prepare.

  Prerequisite steps are taken - firstly ensuring all checks for mmap
  capabilities use the file_has_valid_mmap_hooks() helper rather than
  directly checking for f_op->mmap (which is now not a valid check) and
  secondly updating daxdev_mapping_supported() to not require a VMA
  parameter to allow ext4 and xfs to be converted.

  Commit bb666b7c27 ("mm: add mmap_prepare() compatibility layer for
  nested file systems") handles the nasty edge-case of nested file
  systems like overlayfs, which introduces a compatibility shim to allow
  f_op->mmap_prepare() to be invoked from an f_op->mmap() callback.

  This allows for nested filesystems to continue to function correctly
  with all file systems regardless of which callback is used. Once we
  finally convert all file systems, this shim can be removed.

  As a result, ecryptfs, fuse, and overlayfs remain unaltered so they
  can nest all other file systems.

  We additionally do not update resctl - as this requires an update to
  remap_pfn_range() (or an alternative to it) which we defer to a later
  series, equally we do not update cramfs which needs a mixed mapping
  insertion with the same issue, nor do we update procfs, hugetlbfs,
  syfs or kernfs all of which require VMAs for internal state and hooks.
  We shall return to all of these later"

* tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  doc: update porting, vfs documentation to describe mmap_prepare()
  fs: replace mmap hook with .mmap_prepare for simple mappings
  fs: convert most other generic_file_*mmap() users to .mmap_prepare()
  fs: convert simple use of generic_file_*_mmap() to .mmap_prepare()
  mm/filemap: introduce generic_file_*_mmap_prepare() helpers
  fs/xfs: transition from deprecated .mmap hook to .mmap_prepare
  fs/ext4: transition from deprecated .mmap hook to .mmap_prepare
  fs/dax: make it possible to check dev dax support without a VMA
  fs: consistently use can_mmap_file() helper
  mm/nommu: use file_has_valid_mmap_hooks() helper
  mm: rename call_mmap/mmap_prepare to vfs_mmap/mmap_prepare
2025-07-28 13:43:25 -07:00
Linus Torvalds
278c7d9b5e vfs-6.17-rc1.fallocate
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCeQAKCRCRxhvAZXjc
 otqEAP9bWFExQtnzrNR+1s4UBfPVDAaTJzDnBWj6z0+Idw9oegEAoxF2ifdCPnR4
 t/xWiM4FmSA+9pwvP3U5z3sOReDDsgo=
 =WMMB
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull fallocate updates from Christian Brauner:
 "fallocate() currently supports creating preallocated files
  efficiently. However, on most filesystems fallocate() will preallocate
  blocks in an unwriten state even if FALLOC_FL_ZERO_RANGE is specified.

  The extent state must later be converted to a written state when the
  user writes data into this range, which can trigger numerous metadata
  changes and journal I/O. This may leads to significant write
  amplification and performance degradation in synchronous write mode.

  At the moment, the only method to avoid this is to create an empty
  file and write zero data into it (for example, using 'dd' with a large
  block size). However, this method is slow and consumes a considerable
  amount of disk bandwidth.

  Now that more and more flash-based storage devices are available it is
  possible to efficiently write zeros to SSDs using the unmap write
  zeroes command if the devices do not write physical zeroes to the
  media.

  For example, if SCSI SSDs support the UMMAP bit or NVMe SSDs support
  the DEAC bit[1], the write zeroes command does not write actual data
  to the device, instead, NVMe converts the zeroed range to a
  deallocated state, which works fast and consumes almost no disk write
  bandwidth.

  This series implements the BLK_FEAT_WRITE_ZEROES_UNMAP feature and
  BLK_FLAG_WRITE_ZEROES_UNMAP_DISABLED flag for SCSI, NVMe and
  device-mapper drivers, and add the FALLOC_FL_WRITE_ZEROES and
  STATX_ATTR_WRITE_ZEROES_UNMAP support for ext4 and raw bdev devices.

  fallocate() is subsequently extended with the FALLOC_FL_WRITE_ZEROES
  flag. FALLOC_FL_WRITE_ZEROES zeroes a specified file range in such a
  way that subsequent writes to that range do not require further
  changes to the file mapping metadata. This flag is beneficial for
  subsequent pure overwriting within this range, as it can save on block
  allocation and, consequently, significant metadata changes"

* tag 'vfs-6.17-rc1.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  ext4: add FALLOC_FL_WRITE_ZEROES support
  block: add FALLOC_FL_WRITE_ZEROES support
  block: factor out common part in blkdev_fallocate()
  fs: introduce FALLOC_FL_WRITE_ZEROES to fallocate
  dm: clear unmap write zeroes limits when disabling write zeroes
  scsi: sd: set max_hw_wzeroes_unmap_sectors if device supports SD_ZERO_*_UNMAP
  nvmet: set WZDS and DRB if device enables unmap write zeroes operation
  nvme: set max_hw_wzeroes_unmap_sectors if device supports DEAC bit
  block: introduce max_{hw|user}_wzeroes_unmap_sectors to queue limits
2025-07-28 13:36:49 -07:00
Linus Torvalds
0c4ec4a339 vfs-6.17-rc1.async.dir
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCcgAKCRCRxhvAZXjc
 ose6AQDUhwws5T7FYbqQRZC7tc19xJ4CJN2MH6WQsRJ8PrXMtQD/dY/KVPGtOZgb
 +fFGcOPkO9c+D9WUNXjcGtGMv+fsegc=
 =iV8T
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull async directory updates from Christian Brauner:
 "This contains preparatory changes for the asynchronous directory
  locking scheme.

  While the locking scheme is still very much controversial and we're
  still far away from landing any actual changes in that area the
  preparatory work that we've been upstreaming for a while now has been
  very useful. This is another set of minor changes and cleanups"

* tag 'vfs-6.17-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  exportfs: use lookup_one_unlocked()
  coda: use iterate_dir() in coda_readdir()
  VFS: Minor fixes for porting.rst
  VFS: merge lookup_one_qstr_excl_raw() back into lookup_one_qstr_excl()
2025-07-28 13:31:32 -07:00
Linus Torvalds
f70d24c230 vfs-6.17-rc1.nsfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCaAAKCRCRxhvAZXjc
 ouTCAQCrNCM3h9MpgcMDDUbi9b+lIR11JlvtWNGRUACv3RSNLgEA1vm30+u+JM87
 KpVYg1RrkXDyMFXXuzy7UWpzLcBRywI=
 =L0eW
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull namespace updates from Christian Brauner:
 "This contains namespace updates. This time specifically for nsfs:

   - Userspace heavily relies on the root inode numbers for namespaces
     to identify the initial namespaces. That's already a hard
     dependency. So we cannot change that anymore. Move the initial
     inode numbers to a public header and align the only two namespaces
     that currently don't do that with all the other namespaces.

   - The root inode of /proc having a fixed inode number has been part
     of the core kernel ABI since its inception, and recently some
     userspace programs (mainly container runtimes) have started to
     explicitly depend on this behaviour.

     The main reason this is useful to userspace is that by checking
     that a suspect /proc handle has fstype PROC_SUPER_MAGIC and is
     PROCFS_ROOT_INO, they can then use openat2() together with
     RESOLVE_{NO_{XDEV,MAGICLINK},BENEATH} to ensure that there isn't a
     bind-mount that replaces some procfs file with a different one.

     This kind of attack has lead to security issues in container
     runtimes in the past (such as CVE-2019-19921) and libraries like
     libpathrs[1] use this feature of procfs to provide safe procfs
     handling functions"

* tag 'vfs-6.17-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  uapi: export PROCFS_ROOT_INO
  mntns: use stable inode number for initial mount ns
  netns: use stable inode number for initial mount ns
  nsfs: move root inode number to uapi
2025-07-28 12:50:56 -07:00
Linus Torvalds
934600daa7 vfs-6.17-rc1.ovl
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINakQAKCRCRxhvAZXjc
 okGZAP9CUQfiiT3DUq0pAeuXR2BjpjM8hnNTlO7REC/AmoDWcQD/SDZWfjP2uhtk
 TgGlT1fS5cVcRf72+8JBtT7LGmDB7wA=
 =5vdH
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull overlayfs updates from Christian Brauner:
 "This contains overlayfs updates for this cycle.

  The changes for overlayfs in here are primarily focussed on preparing
  for some proposed changes to directory locking.

  Overlayfs currently will sometimes lock a directory on the upper
  filesystem and do a few different things while holding the lock. This
  is incompatible with the new potential scheme.

  This series narrows the region of code protected by the directory
  lock, taking it multiple times when necessary. This theoretically
  opens up the possibilty of other changes happening on the upper
  filesytem between the unlock and the lock. To some extent the patches
  guard against that by checking the dentries still have the expect
  parent after retaking the lock. In general, concurrent changes to the
  upper and lower filesystems aren't supported properly anyway"

* tag 'vfs-6.17-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
  ovl: properly print correct variable
  ovl: rename ovl_cleanup_unlocked() to ovl_cleanup()
  ovl: change ovl_create_real() to receive dentry parent
  ovl: narrow locking in ovl_check_rename_whiteout()
  ovl: narrow locking in ovl_whiteout()
  ovl: change ovl_cleanup_and_whiteout() to take rename lock as needed
  ovl: narrow locking on ovl_remove_and_whiteout()
  ovl: change ovl_workdir_cleanup() to take dir lock as needed.
  ovl: narrow locking in ovl_workdir_cleanup_recurse()
  ovl: narrow locking in ovl_indexdir_cleanup()
  ovl: narrow locking in ovl_workdir_create()
  ovl: narrow locking in ovl_cleanup_index()
  ovl: narrow locking in ovl_cleanup_whiteouts()
  ovl: narrow locking in ovl_rename()
  ovl: simplify gotos in ovl_rename()
  ovl: narrow locking in ovl_create_over_whiteout()
  ovl: narrow locking in ovl_clear_empty()
  ovl: narrow locking in ovl_create_upper()
  ovl: narrow the locked region in ovl_copy_up_workdir()
  ovl: Call ovl_create_temp() without lock held.
  ...
2025-07-28 12:20:06 -07:00
Linus Torvalds
117eab5c6e vfs-6.17-rc1.coredump
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINAYAAKCRCRxhvAZXjc
 opJiAQDXGs+gQcxJ+4BpV4QszT2OJC19oI/f5AQ4PWMJdHgr4AEA7fc6NbBrpmW7
 L/tbdAwIiWp8bL1Q8Wy7Q2qldHtcggM=
 =KbD9
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull coredump updates from Christian Brauner:
 "This contains an extension to the coredump socket and a proper rework
  of the coredump code.

   - This extends the coredump socket to allow the coredump server to
     tell the kernel how to process individual coredumps. This allows
     for fine-grained coredump management. Userspace can decide to just
     let the kernel write out the coredump, or generate the coredump
     itself, or just reject it.

     * COREDUMP_KERNEL
       The kernel will write the coredump data to the socket.

     * COREDUMP_USERSPACE
       The kernel will not write coredump data but will indicate to the
       parent that a coredump has been generated. This is used when
       userspace generates its own coredumps.

     * COREDUMP_REJECT
       The kernel will skip generating a coredump for this task.

     * COREDUMP_WAIT
       The kernel will prevent the task from exiting until the coredump
       server has shutdown the socket connection.

     The flexible coredump socket can be enabled by using the "@@"
     prefix instead of the single "@" prefix for the regular coredump
     socket:

       @@/run/systemd/coredump.socket

   - Cleanup the coredump code properly while we have to touch it
     anyway.

     Split out each coredump mode in a separate helper so it's easy to
     grasp what is going on and make the code easier to follow. The core
     coredump function should now be very trivial to follow"

* tag 'vfs-6.17-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits)
  cleanup: add a scoped version of CLASS()
  coredump: add coredump_skip() helper
  coredump: avoid pointless variable
  coredump: order auto cleanup variables at the top
  coredump: add coredump_cleanup()
  coredump: auto cleanup prepare_creds()
  cred: add auto cleanup method
  coredump: directly return
  coredump: auto cleanup argv
  coredump: add coredump_write()
  coredump: use a single helper for the socket
  coredump: move pipe specific file check into coredump_pipe()
  coredump: split pipe coredumping into coredump_pipe()
  coredump: move core_pipe_count to global variable
  coredump: prepare to simplify exit paths
  coredump: split file coredumping into coredump_file()
  coredump: rename do_coredump() to vfs_coredump()
  selftests/coredump: make sure invalid paths are rejected
  coredump: validate socket path in coredump_parse()
  coredump: don't allow ".." in coredump socket path
  ...
2025-07-28 11:50:36 -07:00
Linus Torvalds
7879d7aff0 vfs-6.17-rc1.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaIM/KwAKCRCRxhvAZXjc
 opT+AP407JwhRSBjUEmHg5JzUyDoivkOySdnthunRjaBKD8rlgEApM6SOIZYucU7
 cPC3ZY6ORFM6Mwaw+iDW9lasM5ucHQ8=
 =CHha
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc VFS updates from Christian Brauner:
 "This contains the usual selections of misc updates for this cycle.

  Features:

   - Add ext4 IOCB_DONTCACHE support

     This refactors the address_space_operations write_begin() and
     write_end() callbacks to take const struct kiocb * as their first
     argument, allowing IOCB flags such as IOCB_DONTCACHE to propagate
     to the filesystem's buffered I/O path.

     Ext4 is updated to implement handling of the IOCB_DONTCACHE flag
     and advertises support via the FOP_DONTCACHE file operation flag.

     Additionally, the i915 driver's shmem write paths are updated to
     bypass the legacy write_begin/write_end interface in favor of
     directly calling write_iter() with a constructed synchronous kiocb.
     Another i915 change replaces a manual write loop with
     kernel_write() during GEM shmem object creation.

  Cleanups:

   - don't duplicate vfs_open() in kernel_file_open()

   - proc_fd_getattr(): don't bother with S_ISDIR() check

   - fs/ecryptfs: replace snprintf with sysfs_emit in show function

   - vfs: Remove unnecessary list_for_each_entry_safe() from
     evict_inodes()

   - filelock: add new locks_wake_up_waiter() helper

   - fs: Remove three arguments from block_write_end()

   - VFS: change old_dir and new_dir in struct renamedata to dentrys

   - netfs: Remove unused declaration netfs_queue_write_request()

  Fixes:

   - eventpoll: Fix semi-unbounded recursion

   - eventpoll: fix sphinx documentation build warning

   - fs/read_write: Fix spelling typo

   - fs: annotate data race between poll_schedule_timeout() and
     pollwake()

   - fs/pipe: set FMODE_NOWAIT in create_pipe_files()

   - docs/vfs: update references to i_mutex to i_rwsem

   - fs/buffer: remove comment about hard sectorsize

   - fs/buffer: remove the min and max limit checks in __getblk_slow()

   - fs/libfs: don't assume blocksize <= PAGE_SIZE in
     generic_check_addressable

   - fs_context: fix parameter name in infofc() macro

   - fs: Prevent file descriptor table allocations exceeding INT_MAX"

* tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (24 commits)
  netfs: Remove unused declaration netfs_queue_write_request()
  eventpoll: fix sphinx documentation build warning
  ext4: support uncached buffered I/O
  mm/pagemap: add write_begin_get_folio() helper function
  fs: change write_begin/write_end interface to take struct kiocb *
  drm/i915: Refactor shmem_pwrite() to use kiocb and write_iter
  drm/i915: Use kernel_write() in shmem object create
  eventpoll: Fix semi-unbounded recursion
  vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes()
  fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable
  fs/buffer: remove the min and max limit checks in __getblk_slow()
  fs: Prevent file descriptor table allocations exceeding INT_MAX
  fs: Remove three arguments from block_write_end()
  fs/ecryptfs: replace snprintf with sysfs_emit in show function
  fs: annotate suspected data race between poll_schedule_timeout() and pollwake()
  docs/vfs: update references to i_mutex to i_rwsem
  fs/buffer: remove comment about hard sectorsize
  fs_context: fix parameter name in infofc() macro
  VFS: change old_dir and new_dir in struct renamedata to dentrys
  proc_fd_getattr(): don't bother with S_ISDIR() check
  ...
2025-07-28 11:22:56 -07:00
Linus Torvalds
794cbac9c0 mount changes. I've got more stuff in the local tree, but
this is getting too much for one merge window as it is.
 
 * mount hash conflicts rudiments are gone now - we do not allow
 	multiple mounts with the same parent/mountpoint to be
 	hashed at the same time.
 * struct mount changes
 	mnt_umounting is gone;
 	mnt_slave_list/mnt_slave is an hlist now;
 	overmounts are kept track of by explicit pointer in mount;
 	a bunch of flags moved out of mnt_flags to a new field,
 	with only namespace_sem for protection;
 	mnt_expiry is protected by mount_lock now (instead of
 	namespace_sem);
 	MNT_LOCKED is used only for mounts that need to remain
 	attached to their parents to prevent mountpoint exposure -
 	no more overloading it for absolute roots;
 	all mnt_list uses are transient now - it's used only to
 	represent temporary sets during umount_tree().
 * mount refcounting change
 	children no longer pin parents for any mounts, whether they'd
 	passed through umount_tree() or not.
 * struct mountpoint changes
 	refcount is no more; what matters is ->m_list emptiness;
 	instead of temporary bumping the refcount, we insert a new object
 	(pinned_mountpoint) into ->m_list;
 	new calling conventions for lock_mount() and friends.
 * do_move_mount()/attach_recursive_mnt() seriously cleaned up.
 * globals in fs/pnode.c are gone.
 * propagate_mnt(), change_mnt_propagation() and propagate_umount() cleaned up
 	(in the last case - pretty much completely rewritten).
 * freeing of emptied mnt_namespace is done in namespace_unlock()
 	for one thing, there are subtle ordering requirements there;
 	for another it simplifies cleanups.
 * assorted cleanups.
 * restore the machinery for long-term mounts from accumulated bitrot.
 	This is going to get a followup come next cycle, when #work.fs_context
 	with its change of vfs_fs_parse_string() calling conventions goes
 	into -next.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaIR2dQAKCRBZ7Krx/gZQ
 6/SzAP4x+Fjjc5Tm2UNgGW5dptDY5s9O5RuFauo1MM6rcrekagEApTarcMlPnZvC
 mj1TVJFNfdVhZyTXnz5ocHhGX1udmgU=
 =qT69
 -----END PGP SIGNATURE-----

Merge tag 'pull-mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs mount updates from Al Viro:

 - mount hash conflicts rudiments are gone now - we do not allow
     multiple mounts with the same parent/mountpoint to be hashed at the
     same time.

 - 'struct mount' changes:
      - mnt_umounting is gone
      - mnt_slave_list/mnt_slave is an hlist now
      - overmounts are kept track of by explicit pointer in mount
      - a bunch of flags moved out of mnt_flags to a new field, with
        only namespace_sem for protection
      - mnt_expiry is protected by mount_lock now (instead of
        namespace_sem)
      - MNT_LOCKED is used only for mounts that need to remain attached
        to their parents to prevent mountpoint exposure - no more
        overloading it for absolute roots
      - all mnt_list uses are transient now - it's used only to
        represent temporary sets during umount_tree()

 - mount refcounting change: children no longer pin parents for any
   mounts, whether they'd passed through umount_tree() or not

 - 'struct mountpoint' changes:
      - refcount is no more; what matters is ->m_list emptiness
      - instead of temporary bumping the refcount, we insert a new
        object (pinned_mountpoint) into ->m_list
      - new calling conventions for lock_mount() and friends

 - do_move_mount()/attach_recursive_mnt() seriously cleaned up

 - globals in fs/pnode.c are gone

 - propagate_mnt(), change_mnt_propagation() and propagate_umount()
   cleaned up (in the last case - pretty much completely rewritten).

 - freeing of emptied mnt_namespace is done in namespace_unlock(). For
   one thing, there are subtle ordering requirements there; for another
   it simplifies cleanups.

 - assorted cleanups

 - restore the machinery for long-term mounts from accumulated bitrot.

   This is going to get a followup come next cycle, when the change of
   vfs_fs_parse_string() calling conventions goes into -next

* tag 'pull-mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (48 commits)
  statmount_mnt_basic(): simplify the logics for group id
  invent_group_ids(): zero ->mnt_group_id always implies !IS_MNT_SHARED()
  get rid of CL_SHARE_TO_SLAVE
  take freeing of emptied mnt_namespace to namespace_unlock()
  copy_tree(): don't link the mounts via mnt_list
  change_mnt_propagation(): move ->mnt_master assignment into MS_SLAVE case
  mnt_slave_list/mnt_slave: turn into hlist_head/hlist_node
  turn do_make_slave() into transfer_propagation()
  do_make_slave(): choose new master sanely
  change_mnt_propagation(): do_make_slave() is a no-op unless IS_MNT_SHARED()
  change_mnt_propagation() cleanups, step 1
  propagate_mnt(): fix comment and convert to kernel-doc, while we are at it
  propagate_mnt(): get rid of last_dest
  fs/pnode.c: get rid of globals
  propagate_one(): fold into the sole caller
  propagate_one(): separate the "what should be the master for this copy" part
  propagate_one(): separate the "do we need secondary here?" logics
  propagate_mnt(): handle all peer groups in the same loop
  propagate_one(): get rid of dest_master
  mount: separate the flags accessed only under namespace_sem
  ...
2025-07-28 10:49:38 -07:00
Linus Torvalds
815d3c1628 ceph ->d_name race fixes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaIRLdAAKCRBZ7Krx/gZQ
 6ysfAQDtJazbwSdu5MFLZs0YBv757xiWvsYBWWmgEPedtTRYnAEAnawe/IkQ4IAd
 9c5h4IPLOK9wwwFWgHY60L1pwlwrSAY=
 =B0Yx
 -----END PGP SIGNATURE-----

Merge tag 'pull-ceph-d_name-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull ceph dentry->d_name fixes from Al Viro:
 "Stuff that had fallen through the cracks back in February; ceph folks
  tested that pile and said they prefer to have it go through my tree..."

* tag 'pull-ceph-d_name-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ceph: fix a race with rename() in ceph_mdsc_build_path()
  prep for ceph_encode_encrypted_fname() fixes
  [ceph] parse_longname(): strrchr() expects NUL-terminated string
2025-07-28 10:35:13 -07:00
Linus Torvalds
ddf52f12ef Massage rpc_pipefs to use saner primitives and clean up the
APIs provided to the rest of the kernel.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaIRDbQAKCRBZ7Krx/gZQ
 63n6APwNnJXwgtSDi9N0FfHOlYqYSCaCjezVLbq+GR8K+r4wowD/TX/A4Qbyjjic
 /VG8VbYe6fRaD53vp1giGI/dJiTI2Qg=
 =Ta4H
 -----END PGP SIGNATURE-----

Merge tag 'pull-rpc_pipefs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull rpc_pipefs updates from Al Viro:
 "Massage rpc_pipefs to use saner primitives and clean up the APIs
  provided to the rest of the kernel"

* tag 'pull-rpc_pipefs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  rpc_create_client_dir(): return 0 or -E...
  rpc_create_client_dir(): don't bother with rpc_populate()
  rpc_new_dir(): the last argument is always NULL
  rpc_pipe: expand the calls of rpc_mkdir_populate()
  rpc_gssd_dummy_populate(): don't bother with rpc_populate()
  rpc_mkpipe_dentry(): switch to simple_start_creating()
  rpc_pipe: saner primitive for creating regular files
  rpc_pipe: saner primitive for creating subdirectories
  rpc_pipe: don't overdo directory locking
  rpc_mkpipe_dentry(): saner calling conventions
  rpc_unlink(): saner calling conventions
  rpc_populate(): lift cleanup into callers
  rpc_unlink(): use simple_recursive_removal()
  rpc_{rmdir_,}depopulate(): use simple_recursive_removal() instead
  rpc_pipe: clean failure exits in fill_super
  new helper: simple_start_creating()
2025-07-28 09:56:09 -07:00
Linus Torvalds
1959e18cc0 Removing subtrees of kernel filesystems is done in quite a few
places; unfortunately, it's easy to get wrong.  A number of open-coded
 attempts are out there, with varying amount of bogosities.
 
 	simple_recursive_removal() had been introduced for doing that with
 all precautions needed; it does an equivalent of rm -rf, with sufficient
 locking, eviction of anything mounted on top of the subtree, etc.
 
 	This series converts a bunch of open-coded instances to using that.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaIRCUwAKCRBZ7Krx/gZQ
 66XWAP9BNyHcvl9uV/ku/mswYiRBxYoVogciIKeugwYTVLuTJgEA7jdh1eyLkvbS
 rwbL7XD+Q35/vXZHEet+RLCGH3ae6wc=
 =yaKF
 -----END PGP SIGNATURE-----

Merge tag 'pull-simple_recursive_removal' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull simple_recursive_removal() update from Al Viro:
 "Removing subtrees of kernel filesystems is done in quite a few places;
  unfortunately, it's easy to get wrong. A number of open-coded attempts
  are out there, with varying amount of bogosities.

  simple_recursive_removal() had been introduced for doing that with all
  precautions needed; it does an equivalent of rm -rf, with sufficient
  locking, eviction of anything mounted on top of the subtree, etc.

  This series converts a bunch of open-coded instances to using that"

* tag 'pull-simple_recursive_removal' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  functionfs, gadgetfs: use simple_recursive_removal()
  kill binderfs_remove_file()
  fuse_ctl: use simple_recursive_removal()
  pstore: switch to locked_recursive_removal()
  binfmt_misc: switch to locked_recursive_removal()
  spufs: switch to locked_recursive_removal()
  add locked_recursive_removal()
  better lockdep annotations for simple_recursive_removal()
  simple_recursive_removal(): saner interaction with fsnotify
2025-07-28 09:43:51 -07:00
Chao Yu
1005a3ca28 f2fs: fix to trigger foreground gc during f2fs_map_blocks() in lfs mode
w/ "mode=lfs" mount option, generic/299 will cause system panic as below:

------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.c:2835!
Call Trace:
 <TASK>
 f2fs_allocate_data_block+0x6f4/0xc50
 f2fs_map_blocks+0x970/0x1550
 f2fs_iomap_begin+0xb2/0x1e0
 iomap_iter+0x1d6/0x430
 __iomap_dio_rw+0x208/0x9a0
 f2fs_file_write_iter+0x6b3/0xfa0
 aio_write+0x15d/0x2e0
 io_submit_one+0x55e/0xab0
 __x64_sys_io_submit+0xa5/0x230
 do_syscall_64+0x84/0x2f0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0010:new_curseg+0x70f/0x720

The root cause of we run out-of-space is: in f2fs_map_blocks(), f2fs may
trigger foreground gc only if it allocates any physical block, it will be
a little bit later when there is multiple threads writing data w/
aio/dio/bufio method in parallel, since we always use OPU in lfs mode, so
f2fs_map_blocks() does block allocations aggressively.

In order to fix this issue, let's give a chance to trigger foreground
gc in prior to block allocation in f2fs_map_blocks().

Fixes: 36abef4e79 ("f2fs: introduce mode=lfs mount option")
Cc: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-28 16:36:54 +00:00
Chao Yu
e194e140ab f2fs: fix to calculate dirty data during has_not_enough_free_secs()
In lfs mode, dirty data needs OPU, we'd better calculate lower_p and
upper_p w/ them during has_not_enough_free_secs(), otherwise we may
encounter out-of-space issue due to we missed to reclaim enough
free section w/ foreground gc.

Fixes: 36abef4e79 ("f2fs: introduce mode=lfs mount option")
Cc: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-28 16:36:42 +00:00
Chao Yu
6840faddb6 f2fs: fix to update upper_p in __get_secs_required() correctly
Commit 1acd73edbb ("f2fs: fix to account dirty data in __get_secs_required()")
missed to calculate upper_p w/ data_secs, fix it.

Fixes: 1acd73edbb ("f2fs: fix to account dirty data in __get_secs_required()")
Cc: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-28 16:36:20 +00:00
wangzijie
40aa9e1223 f2fs: directly add newly allocated pre-dirty nat entry to dirty set list
When we need to alloc nat entry and set it dirty, we can directly add it to
dirty set list(or initialize its list_head for new_ne) instead of adding it
to clean list and make a move. Introduce init_dirty flag to do it.

Signed-off-by: wangzijie <wangzijie1@honor.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-28 16:30:02 +00:00
wangzijie
0349b7f95c f2fs: avoid redundant clean nat entry move in lru list
__lookup_nat_cache follows LRU manner to move clean nat entry, when nat
entries are going to be dirty, no need to move them to tail of lru list.
Introduce a parameter 'for_dirty' to avoid it.

Signed-off-by: wangzijie <wangzijie1@honor.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-28 16:29:59 +00:00
Linus Torvalds
11fe69fbd5 Current exclusion rules for ->d_flags stores are rather unpleasant.
The basic rules are simple:
 	* stores to dentry->d_flags are OK under dentry->d_lock.
 	* stores to dentry->d_flags are OK in the dentry constructor, before
 becomes potentially visible to other threads.
 Unfortunately, there's a couple of exceptions to that, and that's where the
 headache comes from.
 
 	Main PITA comes from d_set_d_op(); that primitive sets ->d_op
 of dentry and adjusts the flags that correspond to presence of individual
 methods.  It's very easy to misuse; existing uses _are_ safe, but proof
 of correctness is brittle.
 
 	Use in __d_alloc() is safe (we are within a constructor), but we
 might as well precalculate the initial value of ->d_flags when we set
 the default ->d_op for given superblock and set ->d_flags directly
 instead of messing with that helper.
 
 	The reasons why other uses are safe are bloody convoluted; I'm not going
 to reproduce it here.  See https://lore.kernel.org/all/20250224010624.GT1977892@ZenIV/
 for gory details, if you care.  The critical part is using d_set_d_op() only
 just prior to d_splice_alias(), which makes a combination of d_splice_alias()
 with setting ->d_op, etc. a natural replacement primitive.  Better yet, if
 we go that way, it's easy to take setting ->d_op and modifying ->d_flags
 under ->d_lock, which eliminates the headache as far as ->d_flags exclusion
 rules are concerned.  Other exceptions are minor and easy to deal with.
 
 	What this series does:
 * d_set_d_op() is no longer available; new primitive (d_splice_alias_ops())
 is provided, equivalent to combination of d_set_d_op() and d_splice_alias().
 * new field of struct super_block - ->s_d_flags.  Default value of ->d_flags
 to be used when allocating dentries on this filesystem.
 * new primitive for setting ->s_d_op: set_default_d_op().  Replaces stores
 to ->s_d_op at mount time.  All in-tree filesystems converted; out-of-tree
 ones will get caught by compiler (->s_d_op is renamed, so stores to it will
 be caught).  ->s_d_flags is set by the same primitive to match the ->s_d_op.
 * a lot of filesystems had ->s_d_op->d_delete equal to always_delete_dentry;
 that is equivalent to setting DCACHE_DONTCACHE in ->d_flags, so such filesystems
 can bloody well set that bit in ->s_d_flags and drop ->d_delete() from
 dentry_operations.  In quite a few cases that results in empty dentry_operations,
 which means that we can get rid of those.
 * kill simple_dentry_operations - not needed anymore.
 * massage d_alloc_parallel() to get rid of the other exception wrt ->d_flags
 stores - we can set DCACHE_PAR_LOOKUP as soon as we allocate the new dentry;
 no need to delay that until we commit to using the sucker.
 
 As the result, ->d_flags stores are all either under ->d_lock or done before
 the dentry becomes visible in any shared data structures.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaIQ/tQAKCRBZ7Krx/gZQ
 66AhAQDgQ+S224x5YevNXc9mDoGUBMF4OG0n0fIla9rfdL4I6wEAqpOWMNDcVPCZ
 GwYOvJ9YuqNdz+MyprAI18Yza4GOmgs=
 =rTYB
 -----END PGP SIGNATURE-----

Merge tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull dentry d_flags updates from Al Viro:
 "The current exclusion rules for dentry->d_flags stores are rather
  unpleasant. The basic rules are simple:

   - stores to dentry->d_flags are OK under dentry->d_lock

   - stores to dentry->d_flags are OK in the dentry constructor, before
     becomes potentially visible to other threads

  Unfortunately, there's a couple of exceptions to that, and that's
  where the headache comes from.

  The main PITA comes from d_set_d_op(); that primitive sets ->d_op of
  dentry and adjusts the flags that correspond to presence of individual
  methods. It's very easy to misuse; existing uses _are_ safe, but proof
  of correctness is brittle.

  Use in __d_alloc() is safe (we are within a constructor), but we might
  as well precalculate the initial value of 'd_flags' when we set the
  default ->d_op for given superblock and set 'd_flags' directly instead
  of messing with that helper.

  The reasons why other uses are safe are bloody convoluted; I'm not
  going to reproduce it here. See [1] for gory details, if you care. The
  critical part is using d_set_d_op() only just prior to
  d_splice_alias(), which makes a combination of d_splice_alias() with
  setting ->d_op, etc a natural replacement primitive.

  Better yet, if we go that way, it's easy to take setting ->d_op and
  modifying 'd_flags' under ->d_lock, which eliminates the headache as
  far as 'd_flags' exclusion rules are concerned. Other exceptions are
  minor and easy to deal with.

  What this series does:

   - d_set_d_op() is no longer available; instead a new primitive
     (d_splice_alias_ops()) is provided, equivalent to combination of
     d_set_d_op() and d_splice_alias().

   - new field of struct super_block - 's_d_flags'. This sets the
     default value of 'd_flags' to be used when allocating dentries on
     this filesystem.

   - new primitive for setting 's_d_op': set_default_d_op(). This
     replaces stores to 's_d_op' at mount time.

     All in-tree filesystems converted; out-of-tree ones will get caught
     by the compiler ('s_d_op' is renamed, so stores to it will be
     caught). 's_d_flags' is set by the same primitive to match the
     's_d_op'.

   - a lot of filesystems had sb->s_d_op->d_delete equal to
     always_delete_dentry; that is equivalent to setting
     DCACHE_DONTCACHE in 'd_flags', so such filesystems can bloody well
     set that bit in 's_d_flags' and drop 'd_delete()' from
     dentry_operations.

     In quite a few cases that results in empty dentry_operations, which
     means that we can get rid of those.

   - kill simple_dentry_operations - not needed anymore

   - massage d_alloc_parallel() to get rid of the other exception wrt
     'd_flags' stores - we can set DCACHE_PAR_LOOKUP as soon as we
     allocate the new dentry; no need to delay that until we commit to
     using the sucker.

  As the result, 'd_flags' stores are all either under ->d_lock or done
  before the dentry becomes visible in any shared data structures"

Link: https://lore.kernel.org/all/20250224010624.GT1977892@ZenIV/ [1]

* tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (21 commits)
  configfs: use DCACHE_DONTCACHE
  debugfs: use DCACHE_DONTCACHE
  efivarfs: use DCACHE_DONTCACHE instead of always_delete_dentry()
  9p: don't bother with always_delete_dentry
  ramfs, hugetlbfs, mqueue: set DCACHE_DONTCACHE
  kill simple_dentry_operations
  devpts, sunrpc, hostfs: don't bother with ->d_op
  shmem: no dentry retention past the refcount reaching zero
  d_alloc_parallel(): set DCACHE_PAR_LOOKUP earlier
  make d_set_d_op() static
  simple_lookup(): just set DCACHE_DONTCACHE
  tracefs: Add d_delete to remove negative dentries
  set_default_d_op(): calculate the matching value for ->d_flags
  correct the set of flags forbidden at d_set_d_op() time
  split d_flags calculation out of d_set_d_op()
  new helper: set_default_d_op()
  fuse: no need for special dentry_operations for root dentry
  switch procfs from d_set_d_op() to d_splice_alias_ops()
  new helper: d_splice_alias_ops()
  procfs: kill ->proc_dops
  ...
2025-07-28 09:17:57 -07:00
Amir Goldstein
0d4c4d4ea4 fsnotify: optimize FMODE_NONOTIFY_PERM for the common cases
The most unlikely watched permission event is FAN_ACCESS_PERM, because
at the time that it was introduced there were no evictable ignore mark,
so subscribing to FAN_ACCESS_PERM would have incured a very high
overhead.

Yet, when we set the fmode to FMODE_NOTIFY_HSM(), we never skip trying
to send FAN_ACCESS_PERM, which is almost always a waste of cycles.

We got to this logic because of bundling FAN_OPEN*_PERM and
FAN_ACCESS_PERM in the same category and because FAN_OPEN_PERM is a
commonly used event.

By open coding fsnotify_open_perm() in fsnotify_open_perm_and_set_mode(),
we no longer need to regard FAN_OPEN*_PERM when calculating fmode.

This leaves the case of having pre-content events and not having any
other permission event in the object masks a more likely case than the
other way around.

Rework the fmode macros and code so that their meaning now refers only
to hooks on an already open file:

- FMODE_NOTIFY_NONE()		skip all events
- FMODE_NOTIFY_ACCESS_PERM()	send all permission events including
  				FAN_ACCESS_PERM
- FMODE_NOTIFY_HSM()		send pre-content permission events

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250708143641.418603-3-amir73il@gmail.com
2025-07-28 18:14:38 +02:00
Amir Goldstein
08da98e1b2 fsnotify: merge file_set_fsnotify_mode_from_watchers() with open perm hook
Create helper fsnotify_open_perm_and_set_mode() that moves the
fsnotify_open_perm() hook into file_set_fsnotify_mode_from_watchers().

This will allow some more optimizations.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250708143641.418603-2-amir73il@gmail.com
2025-07-28 18:14:38 +02:00
Benjamin Coddington
99765233ab NFS: Fixup allocation flags for nfsiod's __GFP_NORETRY
If the NFS client is doing writeback from a workqueue context, avoid using
__GFP_NORETRY for allocations if the task has set PF_MEMALLOC_NOIO or
PF_MEMALLOC_NOFS.  The combination of these flags makes memory allocation
failures much more likely.

We've seen those allocation failures show up when the loopback driver is
doing writeback from a workqueue to a file on NFS, where memory allocation
failure results in errors or corruption within the loopback device's
filesystem.

Suggested-by: Trond Myklebust <trondmy@kernel.org>
Fixes: 0bae835b63 ("NFS: Avoid writeback threads getting stuck in mempool_alloc()")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/f83ac1155a4bc670f2663959a7a068571e06afd9.1752111622.git.bcodding@redhat.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-28 12:01:21 -04:00
Linus Torvalds
ce3f5bb750 NFSD 6.17 Release Notes
NFSD is finally able to offer write delegations to clients that open
 files with O_WRONLY, thanks to patches from Dai Ngo. We're expecting
 this to accelerate a few interesting corner cases.
 
 The cap on the number of operations per NFSv4 COMPOUND has been
 lifted. Now, clients that send COMPOUNDs containing dozens of
 operations (for example, a long stream of LOOKUP operations to walk
 a pathname in a single round trip) will no longer be rejected.
 
 This release re-enables the ability for NFSD to perform NFSv4.2 COPY
 operations asynchronously. This feature has been disabled to
 mitigate the risk of denial-of-service when too many such requests
 arrive.
 
 Many thanks to the contributors, reviewers, testers, and bug
 reporters who participated during the v6.17 development cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmiFJAcACgkQM2qzM29m
 f5fOvA/+I1W3iAXMeuS4MdBD+20976XZNazXKXXfJE9ay/0I7rXka0uD9HH+cTnU
 3wY1p+jjTs+Tatc5A39MjuS9a6o23FnHZB7IOimL+9ASRjBgjXisOyb7yEnfcA4s
 9NjM5sMHskmrNpLX5kDPNHzTMdaozGl/uSDKg5WSAU/NMrtAT9c9snx4bO5A6mdk
 48XPkP5++aBKGehsPqI0WGOeSzGKI7dc/kJS9F8kIbBCAMJSbIY7PKly+y+fbJkk
 eMapUX257DCRQejA6hnFff0/x1NnR2tC8lQAZE1c7P5D9CV+1UEAQWK4/OOD2aeQ
 hY9Ieb7CFZRot3VDGnnrYjLbApiZCY9m10ukDTykPErJ4ZEWEjUtMN7oAhRN3/Ie
 O2NKvyVo4bOI5zHf4iCIVNp/hDHs01FoMfJfQYRACpBtsIKm+1pn4uTJtrezhJvn
 qsvctMEMtXwZDlntwhQwU54XJyyGq7gJwuRAZ5xgW6WWQQI+NNKUm2XZu3YwJZF+
 4Ji2vj6kRpS46HWG0VRUX12hXdDZdwFjcZ7eXZiSL3gZJ3xuEJDQ3jyyRwfe5t+8
 W6eQRW9Sq1gN4OLwWjfltqs9l52XYfw0jitmX8Y98l1K05a4X74iIPRC5s97HV0E
 XfvW+jRS4+x7tRp4wwcI2cGTPRTdK8xjmRWM3l2PQzgeG3AHUs8=
 =rC5G
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "NFSD is finally able to offer write delegations to clients that open
  files with O_WRONLY, thanks to patches from Dai Ngo. We're expecting
  this to accelerate a few interesting corner cases.

  The cap on the number of operations per NFSv4 COMPOUND has been
  lifted. Now, clients that send COMPOUNDs containing dozens of
  operations (for example, a long stream of LOOKUP operations to walk a
  pathname in a single round trip) will no longer be rejected.

  This release re-enables the ability for NFSD to perform NFSv4.2 COPY
  operations asynchronously. This feature has been disabled to mitigate
  the risk of denial-of-service when too many such requests arrive.

  Many thanks to the contributors, reviewers, testers, and bug reporters
  who participated during the v6.17 development cycle"

* tag 'nfsd-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (32 commits)
  nfsd: Drop dprintk in blocklayout xdr functions
  sunrpc: make svc_tcp_sendmsg() take a signed sentp pointer
  sunrpc: rearrange struct svc_rqst for fewer cachelines
  sunrpc: return better error in svcauth_gss_accept() on alloc failure
  sunrpc: reset rq_accept_statp when starting a new RPC
  sunrpc: remove SVC_SYSERR
  sunrpc: fix handling of unknown auth status codes
  NFSD: Simplify struct knfsd_fh
  NFSD: Access a knfsd_fh's fsid by pointer
  Revert "NFSD: Force all NFSv4.2 COPY requests to be synchronous"
  NFSD: Avoid multiple -Wflex-array-member-not-at-end warnings
  NFSD: Use vfs_iocb_iter_write()
  NFSD: Use vfs_iocb_iter_read()
  NFSD: Clean up kdoc for nfsd_open_local_fh()
  NFSD: Clean up kdoc for nfsd_file_put_local()
  NFSD: Remove definition for trace_nfsd_ctl_maxconn
  NFSD: Remove definition for trace_nfsd_file_gc_recent
  NFSD: Remove definitions for unused trace_nfsd_file_lru trace points
  NFSD: Remove definition for trace_nfsd_file_unhash_and_queue
  nfsd: Use correct error code when decoding extents
  ...
2025-07-28 09:01:09 -07:00
Linus Torvalds
a90f1b6ad6 gfs2 changes
- Prevent cluster nodes from trying to recover their own filesystems
   during a withdraw.
 
 - Add two missing migrate_folio aops and an additional exhash directory
   consistency check (both triggered by syzbot bug reports).
 
 - Sanitize how dlm results are processed and clean up a few quirks in
   the glock code.
 
 - Minor stuff: Get rid of the GIF_ALLOC_FAILED flag; use SECTOR_SIZE and
   SECTOR_SHIFT.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmiHiqQUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTrK9w/7BU//ySUwhfK2bXVxhXNgkMACv1UG
 MXt08FgmVAg6CLVQzsCxT2J/vBnauY9KaEzNZTspMMZtHJLpIoy095bcPLO1SHFh
 AbnLIcfX4wbdPJumeI6DJXSdeInVTBCjaPanHdZ9BvCcl9vSJ5WXUzHVfos2dcSl
 P6xKhAOICE7T/ML03o81KtHEypblj3E34i7y1fIBJN2F2/eKCywQOmtYPa6ertIA
 WNKiH6OR1YloQF4ddYkU7vpeLgQFItkHIDqGrr2R8FEByprBrs1FbkJKZGQHRZuq
 0g5rRdA7KOjc/pJPPl0aiaSlPm6DQx6TJD+suDwerKpu0HZo65QuWCvVq6Dipkp5
 Y0dgP1aPZw/LAcJ86BDI/OhBCPEQI4xa92mCPU9OofvgbjhtJi3a5l2GVZtmbbBW
 Yp6o+rUCYYlYSzP0FDM9hIdYcpNdDC3v7u8IZ1bd9bgflIgoGb7TEqm8geZ4XcyC
 chqjkRcffNRoxuon+Csfv/P/fSvaNxCqaCKTa+4+iewIFTYwpTj8pQqjw/cI6w7Y
 8q6nLBFPvPU/nms6kQfOvG0YN1NpemFHp7uvdqwmkqDXMBw7x2mR5+J2bvl6XtCG
 yuliqWNYcIvxRG4zlbK+nw4j6CQ71V4aTQVt0XJ9wKYhHLZTowOEm/nkat4O285G
 BAwyQjpCULSAQFk=
 =Sfn8
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

 - Prevent cluster nodes from trying to recover their own filesystems
   during a withdraw

 - Add two missing migrate_folio aops and an additional exhash directory
   consistency check (both triggered by syzbot bug reports)

 - Sanitize how dlm results are processed and clean up a few quirks in
   the glock code

 - Minor stuff: Get rid of the GIF_ALLOC_FAILED flag; use SECTOR_SIZE
   and SECTOR_SHIFT

* tag 'gfs2-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: No more self recovery
  gfs2: Validate i_depth for exhash directories
  gfs2: Set .migrate_folio in gfs2_{rgrp,meta}_aops
  gfs2: a minor finish_xmote cleanup
  gfs2: simplify finish_xmote
  gfs2: sanitize the gdlm_ast -> finish_xmote interface
  gfs2: Minor do_xmote cancelation fix
  gfs2: Remove GIF_ALLOC_FAILED flag
  gfs2: Use SECTOR_SIZE and SECTOR_SHIFT
2025-07-28 08:58:58 -07:00
Linus Torvalds
f3f5edc5e4 xfs: New code for 6.17
Signed-off-by: Carlos Maiolino <cem@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iJUEABMJAB0WIQSmtYVZ/MfVMGUq1GNcsMJ8RxYuYwUCaIcswwAKCRBcsMJ8RxYu
 Y5UqAYCWhEmZNTd0aN4kR3xqGMV+/3bubml4TXB8HvIv35gaOrFCgQkytVJLD74e
 A0WtR+4BgJOIKmYfz4AAx6vbsoMWevya1f879i7NI7SYuI0/oB1klRtApZF0Slwv
 OKJsN/s6Sg==
 =/Q43
 -----END PGP SIGNATURE-----

Merge tag 'xfs-merge-6.17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Carlos Maiolino:
 "This doesn't contain any new features. It mostly is a collection of
  clean ups and code refactoring that I preferred to postpone to the
  merge window.

  It includes removal of several unused tracepoints, refactoring key
  comparing routines under the B-Trees management and cleanup of xfs
  journaling code"

* tag 'xfs-merge-6.17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (44 commits)
  xfs: don't use a xfs_log_iovec for ri_buf in log recovery
  xfs: don't use a xfs_log_iovec for attr_item names and values
  xfs: use better names for size members in xfs_log_vec
  xfs: cleanup the ordered item logic in xlog_cil_insert_format_items
  xfs: don't pass the old lv to xfs_cil_prepare_item
  xfs: remove unused trace event xfs_reflink_cow_enospc
  xfs: remove unused trace event xfs_discard_rtrelax
  xfs: remove unused trace event xfs_log_cil_return
  xfs: remove unused trace event xfs_dqreclaim_dirty
  fs/xfs: replace strncpy with memtostr_pad()
  xfs: Remove unused label in xfs_dax_notify_dev_failure
  xfs: improve the comments in xfs_select_zone_nowait
  xfs: improve the comments in xfs_max_open_zones
  xfs: stop passing an inode to the zone space reservation helpers
  xfs: rename oz_write_pointer to oz_allocated
  xfs: use a uint32_t to cache i_used_blocks in xfs_init_zone
  xfs: improve the xg_active_ref check in xfs_group_free
  xfs: remove the xlog_ticket_t typedef
  xfs: remove xrep_trans_{alloc,cancel}_hook_dummy
  xfs: return the allocated transaction from xchk_trans_alloc_empty
  ...
2025-07-28 08:55:53 -07:00