Commit Graph

4329 Commits

Author SHA1 Message Date
Jeff Layton
fba13c06f0 nfsd: remove dprintks for v2/3 RENAME events
Observability here is now covered by static tracepoints.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:36 -04:00
Jeff Layton
6b6943fe27 nfsd: remove REMOVE/RMDIR dprintks
Observability here is now covered by static tracepoints.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:36 -04:00
Jeff Layton
0dbe0171fa nfsd: remove old LINK dprintks
Observability here is now covered by static tracepoints.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:36 -04:00
Jeff Layton
0e159f8ce9 nfsd: remove old v2/3 SYMLINK dprintks
Observability here is now covered by static tracepoints.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:35 -04:00
Jeff Layton
c242efc788 nfsd: remove old v2/3 create path dprintks
Observability here is now covered by static tracepoints.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:35 -04:00
Jeff Layton
0d885f7e79 nfsd: add tracepoint for getattr and statfs events
There isn't a common helper for getattrs, so add these into the
protocol-specific helpers.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:34 -04:00
Jeff Layton
a91bfc4571 nfsd: add tracepoint to nfsd_readdir
Observe the start of NFS READDIR operations.

The NFS READDIR's count argument can be interesting when tuning a
client's readdir behavior.

However, the count argument is not passed to nfsd_readdir(). To
properly capture the count argument, this tracepoint must appear in
each proc function before the nfsd_readdir() call.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:34 -04:00
Jeff Layton
b52f2a79fb nfsd: add tracepoint to nfsd_rename
Observe the start of RENAME operations for all NFS versions.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:33 -04:00
Jeff Layton
51195263cd nfsd: add tracepoints for unlink events
Observe the start of UNLINK, REMOVE, and RMDIR operations for all
NFS versions.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:33 -04:00
Jeff Layton
272e2b89b8 nfsd: add tracepoint to nfsd_link()
Observe the start of NFS LINK operations.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:32 -04:00
Jeff Layton
2344b72302 nfsd: add tracepoint to nfsd_symlink
Observe the start of SYMLINK operations for all NFS versions.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:32 -04:00
Jeff Layton
85b0b1f217 nfsd: add nfsd_vfs_create tracepoints
Observe the start of file and directory creation for all NFS
versions.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:31 -04:00
Jeff Layton
a5256e687f nfsd: add a tracepoint to nfsd_lookup_dentry
Replace the dprintk in nfsd_lookup_dentry() with a trace point.
nfsd_lookup_dentry() is called frequently enough that enabling this
dprintk call site would result in log floods and performance issues.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:31 -04:00
Jeff Layton
adbdd746e8 nfsd: add a tracepoint for nfsd_setattr
Turn Sargun's internal kprobe based implementation of this into a normal
static tracepoint. Also, remove the dprintk's that got added recently
with the fix for zero-length ACLs.

Cc: Sargun Dillon <sargun@sargun.me>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:30 -04:00
Chuck Lever
218927aa47 NFSD: Add a Call equivalent to the NFSD_TRACE_PROC_RES macros
Introduce tracing helpers that can be used before the procedure
status code is known. These macros are similar to the
SVC_RQST_ENDPOINT helpers, but they can be modified to include
NFS-specific fields if that is needed later.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:30 -04:00
Chuck Lever
45e3eda46d NFSD: Use sockaddr instead of a generic array
Record and emit presentation addresses using tracing helpers
designed for the task.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:29 -04:00
Chuck Lever
d6ca7d2643 NFSD: Implement FATTR4_CLONE_BLKSIZE attribute
RFC 7862 states that if an NFS server implements a CLONE operation,
it MUST also implement FATTR4_CLONE_BLKSIZE. NFSD implements CLONE,
but does not implement FATTR4_CLONE_BLKSIZE.

Note that in Section 12.2, RFC 7862 claims that
FATTR4_CLONE_BLKSIZE is RECOMMENDED, not REQUIRED. Likely this is
because a minor version is not permitted to add a REQUIRED
attribute. Confusing.

We assume this attribute reports a block size as a count of bytes,
as RFC 7862 does not specify a unit.

Reported-by: Roland Mainz <roland.mainz@nrubsig.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Cc: stable@vger.kernel.org # v6.7+
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:29 -04:00
Eric Biggers
c2c90a8b26 nfsd: use SHA-256 library API instead of crypto_shash API
This user of SHA-256 does not support any other algorithm, so the
crypto_shash abstraction provides no value.  Just use the SHA-256
library API instead, which is much simpler and easier to use.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:29 -04:00
Li Lingfeng
b31da62889 nfsd: Initialize ssc before laundromat_work to prevent NULL dereference
In nfs4_state_start_net(), laundromat_work may access nfsd_ssc through
nfs4_laundromat -> nfsd4_ssc_expire_umount. If nfsd_ssc isn't initialized,
this can cause NULL pointer dereference.

Normally the delayed start of laundromat_work allows sufficient time for
nfsd_ssc initialization to complete. However, when the kernel waits too
long for userspace responses (e.g. in nfs4_state_start_net ->
nfsd4_end_grace -> nfsd4_record_grace_done -> nfsd4_cld_grace_done ->
cld_pipe_upcall -> __cld_pipe_upcall -> wait_for_completion path), the
delayed work may start before nfsd_ssc initialization finishes.

Fix this by moving nfsd_ssc initialization before starting laundromat_work.

Fixes: f4e44b3933 ("NFSD: delay unmount source's export after inter-server copy completed.")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:27 -04:00
Jeff Layton
8c4aae5582 nfsd: add commit start/done tracepoints around nfsd_commit()
Very useful for gauging how long the vfs_fsync_range() takes.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:26 -04:00
NeilBrown
1244f0b2c3 nfsd: nfsd4_spo_must_allow() must check this is a v4 compound request
If the request being processed is not a v4 compound request, then
examining the cstate can have undefined results.

This patch adds a check that the rpc procedure being executed
(rq_procinfo) is the NFSPROC4_COMPOUND procedure.

Reported-by: Olga Kornievskaia <okorniev@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:25 -04:00
Olga Kornievskaia
0813c5f012 nfsd: fix access checking for NLM under XPRTSEC policies
When an export policy with xprtsec policy is set with "tls"
and/or "mtls", but an NFS client is doing a v3 xprtsec=tls
mount, then NLM locking calls fail with an error because
there is currently no support for NLM with TLS.

Until such support is added, allow NLM calls under TLS-secured
policy.

Fixes: 4cc9b9f2bf ("nfsd: refine and rename NFSD_MAY_LOCK")
Cc: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:25 -04:00
Guoqing Jiang
c447d2ac98 nfsd: remove redundant WARN_ON_ONCE in nfsd4_write
It can be removed since svc_fill_write_vector already has the
same WARN_ON_ONCE.

Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:24 -04:00
Chuck Lever
1218149037 NFSD: Add experimental setting to disable the use of splice read
NFSD currently has two separate code paths for handling read
requests. One uses page splicing; the other is a traditional read
based on an iov iterator.

Because most Linux file systems support splice read, the latter
does not get nearly the same test experience as splice reads.

To force the use of vectored reads for testing and benchmarking,
introduce the ability to disable splice reads for all NFS READ
operations.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:24 -04:00
Chuck Lever
9fe5ea760e NFSD: Add /sys/kernel/debug/nfsd
Create a small sandbox under /sys/kernel/debug for experimental NFS
server feature settings. There is no API/ABI compatibility guarantee
for these settings.

The only documentation for such settings, if any documentation exists,
is in the kernel source code.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:23 -04:00
Maninder Singh
f7fb730cac NFSD: fix race between nfsd registration and exports_proc
As of now nfsd calls create_proc_exports_entry() at start of init_nfsd
and cleanup by remove_proc_entry() at last of exit_nfsd.

Which causes kernel OOPs if there is race between below 2 operations:
(i) exportfs -r
(ii) mount -t nfsd none /proc/fs/nfsd

for 5.4 kernel ARM64:

CPU 1:
el1_irq+0xbc/0x180
arch_counter_get_cntvct+0x14/0x18
running_clock+0xc/0x18
preempt_count_add+0x88/0x110
prep_new_page+0xb0/0x220
get_page_from_freelist+0x2d8/0x1778
__alloc_pages_nodemask+0x15c/0xef0
__vmalloc_node_range+0x28c/0x478
__vmalloc_node_flags_caller+0x8c/0xb0
kvmalloc_node+0x88/0xe0
nfsd_init_net+0x6c/0x108 [nfsd]
ops_init+0x44/0x170
register_pernet_operations+0x114/0x270
register_pernet_subsys+0x34/0x50
init_nfsd+0xa8/0x718 [nfsd]
do_one_initcall+0x54/0x2e0

CPU 2 :
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010

PC is at : exports_net_open+0x50/0x68 [nfsd]

Call trace:
exports_net_open+0x50/0x68 [nfsd]
exports_proc_open+0x2c/0x38 [nfsd]
proc_reg_open+0xb8/0x198
do_dentry_open+0x1c4/0x418
vfs_open+0x38/0x48
path_openat+0x28c/0xf18
do_filp_open+0x70/0xe8
do_sys_open+0x154/0x248

Sometimes it crashes at exports_net_open() and sometimes cache_seq_next_rcu().

and same is happening on latest 6.14 kernel as well:

[    0.000000] Linux version 6.14.0-rc5-next-20250304-dirty
...
[  285.455918] Unable to handle kernel paging request at virtual address 00001f4800001f48
...
[  285.464902] pc : cache_seq_next_rcu+0x78/0xa4
...
[  285.469695] Call trace:
[  285.470083]  cache_seq_next_rcu+0x78/0xa4 (P)
[  285.470488]  seq_read+0xe0/0x11c
[  285.470675]  proc_reg_read+0x9c/0xf0
[  285.470874]  vfs_read+0xc4/0x2fc
[  285.471057]  ksys_read+0x6c/0xf4
[  285.471231]  __arm64_sys_read+0x1c/0x28
[  285.471428]  invoke_syscall+0x44/0x100
[  285.471633]  el0_svc_common.constprop.0+0x40/0xe0
[  285.471870]  do_el0_svc_compat+0x1c/0x34
[  285.472073]  el0_svc_compat+0x2c/0x80
[  285.472265]  el0t_32_sync_handler+0x90/0x140
[  285.472473]  el0t_32_sync+0x19c/0x1a0
[  285.472887] Code: f9400885 93407c23 937d7c27 11000421 (f86378a3)
[  285.473422] ---[ end trace 0000000000000000 ]---

It reproduced simply with below script:
while [ 1 ]
do
/exportfs -r
done &

while [ 1 ]
do
insmod /nfsd.ko
mount -t nfsd none /proc/fs/nfsd
umount /proc/fs/nfsd
rmmod nfsd
done &

So exporting interfaces to user space shall be done at last and
cleanup at first place.

With change there is no Kernel OOPs.

Co-developed-by: Shubham Rana <s9.rana@samsung.com>
Signed-off-by: Shubham Rana <s9.rana@samsung.com>
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:23 -04:00
Maninder Singh
ff12eb3795 NFSD: unregister filesystem in case genl_register_family() fails
With rpc_status netlink support, unregister of register_filesystem()
was missed in case of genl_register_family() fails.

Correcting it by making new label.

Fixes: bd9d6a3efa ("NFSD: add rpc_status netlink support")
Cc: stable@vger.kernel.org
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:22 -04:00
Chuck Lever
b53a42970f NFSD: Record each NFSv4 call's session slot index
Help the client resolve the race between the reply to an
asynchronous COPY reply and the associated CB_OFFLOAD callback by
planting the session, slot, and sequence number of the COPY in the
CB_SEQUENCE contained in the CB_OFFLOAD COMPOUND.

Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:21 -04:00
Chuck Lever
281ae67c94 NFSD: Implement CB_SEQUENCE referring call lists
The slot index number of the current COMPOUND has, until now, not
been needed outside of nfsd4_sequence(). But to record the tuple
that represents a referring call, the slot number will be needed
when processing subsequent operations in the COMPOUND.

Refactor the code that allocates a new struct nfsd4_slot to ensure
that the new sl_index field is always correctly initialized.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:21 -04:00
Chuck Lever
4f3c8d8c9e NFSD: Implement CB_SEQUENCE referring call lists
We have yet to implement a mechanism in NFSD for resolving races
between a server's reply and a related callback operation. For
example, a CB_OFFLOAD callback can race with the matching COPY
response. The client will not recognize the copy state ID in the
CB_OFFLOAD callback until the COPY response arrives.

Trond adds:
> It is also needed for the same kind of race with delegation
> recalls, layout recalls, CB_NOTIFY_DEVICEID and would also be
> helpful (although not as strongly required) for CB_NOTIFY_LOCK.

RFC 8881 Section 20.9.3 describes referring call lists this way:
> The csa_referring_call_lists array is the list of COMPOUND
> requests, identified by session ID, slot ID, and sequence ID.
> These are requests that the client previously sent to the server.
> These previous requests created state that some operation(s) in
> the same CB_COMPOUND as the csa_referring_call_lists are
> identifying. A session ID is included because leased state is tied
> to a client ID, and a client ID can have multiple sessions. See
> Section 2.10.6.3.

Introduce the XDR infrastructure for populating the
csa_referring_call_lists argument of CB_SEQUENCE. Subsequent patches
will put the referring call list to use.

Note that cb_sequence_enc_sz estimates that only zero or one rcl is
included in each CB_SEQUENCE, but the new infrastructure can
manage any number of referring calls.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:20 -04:00
Chuck Lever
71aeab7bd9 NFSD: Shorten CB_OFFLOAD response to NFS4ERR_DELAY
Try not to prolong the wait for completion of a COPY or COPY_NOTIFY
operation.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:20 -04:00
Chuck Lever
2f2b6d0b9b NFSD: OFFLOAD_CANCEL should mark an async COPY as completed
Update the status of an async COPY operation when it has been
stopped. OFFLOAD_STATUS needs to indicate that the COPY is no longer
running.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:19 -04:00
Linus Torvalds
1ca0f935a1 nfsd-6.15 fixes:
- v6.15 libcrc clean-up makes invalid configurations possible
 - Fix a potential deadlock introduced during the v6.15 merge window
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmgD2mYACgkQM2qzM29m
 f5fn3xAAlPZP7MInbeszGMeRw9zdWaqbg4C90Gehdvi7sMBIYRo9E36NZuT4vl4D
 dfpHhoMIPMO5gsGrOeJZAT27La1F8M+O4yZOiuheTfIikaBrbZcZZtpwdRQHayGw
 qhHz8XoVNPsqvnoG+95u15XWG4IbI4kme2gNbYxEgF8zGERLv1aKPQTh3FqYG7Bk
 +9ANjoPNCAlnUAYdVHJZ5kwfTR10pDwx+jC45pZA4+hAusLx1Ek43pkcBgLMvoh2
 ylbWa6SKceVdh/kb3rWaKXtVptNjcaMWLUZNSpd6Bu9e3zDCfjJN0hKUamHD+hSn
 HjYKbNU1W/l0wJsq9KKDwrOgE7LDZPP/mQ3MkFtMR/ccyTCAGe9Wy6nv7PnwEAu2
 GqRh34VbNDGMPp4bpuTlp2FAL91BtGyYCkuwDZ1lqUDqnCmg4gRjyzFpkFF4lXPJ
 hq4J3NgA9BjO9wlQ4P3KNKFgE34s/8R0ey7euyrI5hxFoVFKxr5DNgKXVeW0FsY/
 zMaDLzVN2Uv37IxjKhc1XvE2Ml6+J3tRv9awv+Yp05tG8/3vAhogs7So3TtwfWdU
 iWA3G0+zB30ATqdeXwpmJ8XxIqBnixDKhJcnLNOwrT+vAeqqqa0C39M0sOTQTMg3
 VyFYkgt7v8jljp34DRfc4DAhTCsHzBzgAh9H/jrTP1xDlp7i7AU=
 =vCFi
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - v6.15 libcrc clean-up makes invalid configurations possible

 - Fix a potential deadlock introduced during the v6.15 merge window

* tag 'nfsd-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: decrease sc_count directly if fail to queue dl_recall
  nfs: add missing selections of CONFIG_CRC32
2025-04-19 10:38:03 -07:00
Li Lingfeng
a1d14d931b nfsd: decrease sc_count directly if fail to queue dl_recall
A deadlock warning occurred when invoking nfs4_put_stid following a failed
dl_recall queue operation:
            T1                            T2
                                nfs4_laundromat
                                 nfs4_get_client_reaplist
                                  nfs4_anylock_blockers
__break_lease
 spin_lock // ctx->flc_lock
                                   spin_lock // clp->cl_lock
                                   nfs4_lockowner_has_blockers
                                    locks_owner_has_blockers
                                     spin_lock // flctx->flc_lock
 nfsd_break_deleg_cb
  nfsd_break_one_deleg
   nfs4_put_stid
    refcount_dec_and_lock
     spin_lock // clp->cl_lock

When a file is opened, an nfs4_delegation is allocated with sc_count
initialized to 1, and the file_lease holds a reference to the delegation.
The file_lease is then associated with the file through kernel_setlease.

The disassociation is performed in nfsd4_delegreturn via the following
call chain:
nfsd4_delegreturn --> destroy_delegation --> destroy_unhashed_deleg -->
nfs4_unlock_deleg_lease --> kernel_setlease --> generic_delete_lease
The corresponding sc_count reference will be released after this
disassociation.

Since nfsd_break_one_deleg executes while holding the flc_lock, the
disassociation process becomes blocked when attempting to acquire flc_lock
in generic_delete_lease. This means:
1) sc_count in nfsd_break_one_deleg will not be decremented to 0;
2) The nfs4_put_stid called by nfsd_break_one_deleg will not attempt to
acquire cl_lock;
3) Consequently, no deadlock condition is created.

Given that sc_count in nfsd_break_one_deleg remains non-zero, we can
safely perform refcount_dec on sc_count directly. This approach
effectively avoids triggering deadlock warnings.

Fixes: 230ca75845 ("nfsd: put dl_stid if fail to queue dl_recall")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-04-13 16:39:42 -04:00
Eric Biggers
cd35b6cb46 nfs: add missing selections of CONFIG_CRC32
nfs.ko, nfsd.ko, and lockd.ko all use crc32_le(), which is available
only when CONFIG_CRC32 is enabled.  But the only NFS kconfig option that
selected CONFIG_CRC32 was CONFIG_NFS_DEBUG, which is client-specific and
did not actually guard the use of crc32_le() even on the client.

The code worked around this bug by only actually calling crc32_le() when
CONFIG_CRC32 is built-in, instead hard-coding '0' in other cases.  This
avoided randconfig build errors, and in real kernels the fallback code
was unlikely to be reached since CONFIG_CRC32 is 'default y'.  But, this
really needs to just be done properly, especially now that I'm planning
to update CONFIG_CRC32 to not be 'default y'.

Therefore, make CONFIG_NFS_FS, CONFIG_NFSD, and CONFIG_LOCKD select
CONFIG_CRC32.  Then remove the fallback code that becomes unnecessary,
as well as the selection of CONFIG_CRC32 from CONFIG_NFS_DEBUG.

Fixes: 1264a2f053 ("NFS: refactor code for calculating the crc32 hash of a filehandle")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-04-13 16:39:42 -04:00
NeilBrown
8ad9248471
nfsd: Use lookup_one() rather than lookup_one_len()
nfsd uses some VFS interfaces (such as vfs_mkdir) which take an explicit
mnt_idmap, and it passes &nop_mnt_idmap as nfsd doesn't yet support
idmapped mounts.

It also uses the lookup_one_len() family of functions which implicitly
use &nop_mnt_idmap.  This mixture of implicit and explicit could be
confusing.  When we eventually update nfsd to support idmap mounts it
would be best if all places which need an idmap determined from the
mount point were similar and easily found.

So this patch changes nfsd to use lookup_one(), lookup_one_unlocked(),
and lookup_one_positive_unlocked(), passing &nop_mnt_idmap.

This has the benefit of removing some uses of the lookup_one_len
functions where permission checking is actually needed.  Many callers
don't care about permission checking and using these function only where
permission checking is needed is a valuable simplification.

This change requires passing the name in a qstr.  Currently this is a
little clumsy, but if nfsd is changed to use qstr more broadly it will
result in a net improvement.

Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/r/20250319031545.2999807-3-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-07 09:25:32 +02:00
Linus Torvalds
b6dde1e527 NFSD 6.15 Release Notes
Neil Brown contributed more scalability improvements to NFSD's
 open file cache, and Jeff Layton contributed a menagerie of
 repairs to NFSD's NFSv4 callback / backchannel implementation.
 
 Mike Snitzer contributed a change to NFS re-export support that
 disables support for file locking on a re-exported NFSv4 mount.
 This is because NFSv4 state recovery is currently difficult if
 not impossible for re-exported NFS mounts. The change aims to
 prevent data integrity exposures after the re-export server
 crashes.
 
 Work continues on the evolving NFSD netlink administrative API.
 
 Many thanks to the contributors, reviewers, testers, and bug
 reporters who participated during the v6.15 development cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmfmpMIACgkQM2qzM29m
 f5f6DA/+P0YqoRg3Zk/4oWwXZWbfEOMhWFltT+D1PE2QjUfOZpiwUSFQfsfYgXO6
 OFu0iDQ4g8BxBeP6Umv61qy7Cv6n4fVzIHqzymXQvymh9JzoQiXlE9/fA8nAHuiH
 u7kkNPRi7faBz1sMg/WpN9CHctg7STPOhhG/JrZcSFZnh87mU1i4i4bZBNz8tVnK
 ZWf483OUuSmJY2/bUTkwvr4GbceTKBlLWFFjiRhfAKvJBWvu4myfC0DI5QzxmsgI
 MJ62do7AFJP1ww2Ih9LLi2kFIt/yyInSVAgyts1CPhlJ4BfPnTSOw/i2+CuF3D/M
 bZYEAOjH3AqjBZmq58sIQezpD5f9/TOrTSwYwS31zl/THYE413WiW80/MDoWqo0y
 9cSNkD3nJlPVLLCfF58vXLoe7wpLoN/ZbTdxoozzUWEFR5A4Jz3XP8F/Cws0cjem
 uWWAQMItiQpg1+RYJYfu4dg5+iN6dbgYbvzlr7buISwFNXi3Zo99MkJ4wHj9TJbL
 Tpjth1rWGPwwSOMT6ojKiYMq1oUzx5PuAm9Saq9oIzQAbBySmxHF/LSDz3wEuBoO
 MK1jzKroEmMk3fJOOAajSDLOdAbL3vfj6H/xi2IHvKnaz9yHCZNu2YGV05BBMprd
 hWePf69AO5Ky5Q9KuGClEtwvJ9ZR5pb4DO2dqaYu8ximu3O4vPo=
 =e2E2
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "Neil Brown contributed more scalability improvements to NFSD's open
  file cache, and Jeff Layton contributed a menagerie of repairs to
  NFSD's NFSv4 callback / backchannel implementation.

  Mike Snitzer contributed a change to NFS re-export support that
  disables support for file locking on a re-exported NFSv4 mount. This
  is because NFSv4 state recovery is currently difficult if not
  impossible for re-exported NFS mounts. The change aims to prevent data
  integrity exposures after the re-export server crashes.

  Work continues on the evolving NFSD netlink administrative API.

  Many thanks to the contributors, reviewers, testers, and bug reporters
  who participated during the v6.15 development cycle"

* tag 'nfsd-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (45 commits)
  NFSD: Add a Kconfig setting to enable delegated timestamps
  sysctl: Fixes nsm_local_state bounds
  nfsd: use a long for the count in nfsd4_state_shrinker_count()
  nfsd: remove obsolete comment from nfs4_alloc_stid
  nfsd: remove unneeded forward declaration of nfsd4_mark_cb_fault()
  nfsd: reorganize struct nfs4_delegation for better packing
  nfsd: handle errors from rpc_call_async()
  nfsd: move cb_need_restart flag into cb_flags
  nfsd: replace CB_GETATTR_BUSY with NFSD4_CALLBACK_RUNNING
  nfsd: eliminate cl_ra_cblist and NFSD4_CLIENT_CB_RECALL_ANY
  nfsd: prevent callback tasks running concurrently
  nfsd: disallow file locking and delegations for NFSv4 reexport
  nfsd: filecache: drop the list_lru lock during lock gc scans
  nfsd: filecache: don't repeatedly add/remove files on the lru list
  nfsd: filecache: introduce NFSD_FILE_RECENT
  nfsd: filecache: use list_lru_walk_node() in nfsd_file_gc()
  nfsd: filecache: use nfsd_file_dispose_list() in nfsd_file_close_inode_sync()
  NFSD: Re-organize nfsd_file_gc_worker()
  nfsd: filecache: remove race handling.
  fs: nfs: acl: Avoid -Wflex-array-member-not-at-end warning
  ...
2025-03-31 17:28:17 -07:00
Chuck Lever
26a8076215 NFSD: Add a Kconfig setting to enable delegated timestamps
After three tries, we still see test failures with delegated
timestamps. Disable them by default, but leave the implementation
intact so that development can continue.

Cc: stable@vger.kernel.org # v6.14
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-14 10:49:47 -04:00
Jeff Layton
261e3bbf97 nfsd: use a long for the count in nfsd4_state_shrinker_count()
If there are no courtesy clients then the return value from the
atomic_long_read() could overflow an int. Use a long to store the value
instead.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:12 -04:00
Jeff Layton
387625808c nfsd: remove obsolete comment from nfs4_alloc_stid
idr_alloc_cyclic() is what guarantees that now, not this long-gone trick.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:12 -04:00
Jeff Layton
d917d78311 nfsd: remove unneeded forward declaration of nfsd4_mark_cb_fault()
This isn't needed.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:11 -04:00
Jeff Layton
87055f8aea nfsd: reorganize struct nfs4_delegation for better packing
Move dl_type field above dl_time, which shaves 8 bytes off this struct.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:11 -04:00
Jeff Layton
ff383e8f94 nfsd: handle errors from rpc_call_async()
It's possible for rpc_call_async() to fail (mainly due to memory
allocation failure). If it does, there isn't much recourse other than to
requeue the callback and try again later.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:10 -04:00
Jeff Layton
32ce62c0f0 nfsd: move cb_need_restart flag into cb_flags
Since there is now a cb_flags word, use a new NFSD4_CALLBACK_REQUEUE
flag in that instead of a separate boolean.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:10 -04:00
Jeff Layton
49bdbdb11f nfsd: replace CB_GETATTR_BUSY with NFSD4_CALLBACK_RUNNING
These flags serve essentially the same purpose and get set and cleared
at the same time. Drop CB_GETATTR_BUSY and just use
NFSD4_CALLBACK_RUNNING instead.

For this to work, we must use clear_and_wake_up_bit(), but doing that on
for other types of callbacks is wasteful. Declare a new NFSD4_CALLBACK_WAKE
flag in cb_flags to indicate that wake_up is needed, and only set that
for CB_GETATTRs.

Also, make the wait use a TASK_UNINTERRUPTIBLE sleep. This is done in
the context of an nfsd thread, and it should never need to deal with
signals.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:10 -04:00
Jeff Layton
424dd3df1f nfsd: eliminate cl_ra_cblist and NFSD4_CLIENT_CB_RECALL_ANY
deleg_reaper() will walk the client_lru list and put any suitable
entries onto "cblist" using the cl_ra_cblist pointer. It then walks the
objects outside the spinlock and queues callbacks for them.

None of the operations that deleg_reaper() does outside the
nn->client_lock are blocking operations. Just queue their workqueue jobs
under the nn->client_lock instead.

Also, the NFSD4_CLIENT_CB_RECALL_ANY and NFSD4_CALLBACK_RUNNING flags
serve an identical purpose now. Drop the NFSD4_CLIENT_CB_RECALL_ANY flag
and just use the one in the callback.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:09 -04:00
Jeff Layton
1054e8ffc5 nfsd: prevent callback tasks running concurrently
The nfsd4_callback workqueue jobs exist to queue backchannel RPCs to
rpciod. Because they run in different workqueue contexts, the rpc_task
can run concurrently with the workqueue job itself, should it become
requeued. This is problematic as there is no locking when accessing the
fields in the nfsd4_callback.

Add a new unsigned long to nfsd4_callback and declare a new
NFSD4_CALLBACK_RUNNING flag to be set in it. When attempting to run a
workqueue job, do a test_and_set_bit() on that flag first, and don't
queue the workqueue job if it returns true. Clear NFSD4_CALLBACK_RUNNING
in nfsd41_destroy_cb().

This also gives us a more reliable mechanism for handling queueing
failures in codepaths where we have to take references under spinlocks.
We can now do the test_and_set_bit on NFSD4_CALLBACK_RUNNING first, and
only take references to the objects if that returns false.

Most of the nfsd4_run_cb() callers are converted to use this new flag or
the nfsd4_try_run_cb() wrapper. The main exception is the callback
channel probe, which has its own synchronization.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:09 -04:00
Mike Snitzer
9254c8ae9b nfsd: disallow file locking and delegations for NFSv4 reexport
We do not and cannot support file locking with NFS reexport over
NFSv4.x for the same reason we don't do it for NFSv3: NFS reexport
server reboot cannot allow clients to recover locks because the source
NFS server has not rebooted, and so it is not in grace.  Since the
source NFS server is not in grace, it cannot offer any guarantees that
the file won't have been changed between the locks getting lost and
any attempt to recover/reclaim them.  The same applies to delegations
and any associated locks, so disallow them too.

Clients are no longer allowed to get file locks or delegations from a
reexport server, any attempts will fail with operation not supported.

Update the "Reboot recovery" section accordingly in
Documentation/filesystems/nfs/reexport.rst

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:08 -04:00
NeilBrown
fbfdc9fc0f nfsd: filecache: drop the list_lru lock during lock gc scans
Under a high NFSv3 load with lots of different files being accessed,
the LRU list of garbage-collectable files can become quite long.

Asking list_lru_scan_node() to scan the whole list can result in a long
period during which a spinlock is held, blocking the addition of new LRU
items.

So ask list_lru_scan_node() to scan only a few entries at a time, and
repeat until the scan is complete.

If the shrinker runs between two consecutive calls of
list_lru_scan_node() it could invalidate the "remaining" counter which
could lead to premature freeing.  So add a spinlock to avoid that.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:08 -04:00
NeilBrown
56221b42d7 nfsd: filecache: don't repeatedly add/remove files on the lru list
There is no need to remove a file from the lru every time we access it,
and then add it back.  It is sufficient to set the REFERENCED flag every
time we put the file.  The order in the lru of REFERENCED files is
largely irrelevant as they will all be moved to the end.

With this patch, files are added only when they are allocated (if
want_gc) and they are removed only by the list_lru_(shrink_)walk
callback or when forcibly removing a file.

This should reduce contention on the list_lru spinlock(s) and reduce
memory traffic a little.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:07 -04:00
NeilBrown
64912122a4 nfsd: filecache: introduce NFSD_FILE_RECENT
The filecache lru is walked in 2 circumstances for 2 different reasons.

1/ When called from the shrinker we want to discard the first few
   entries on the list, ignoring any with NFSD_FILE_REFERENCED set
   because they should really be at the end of the LRU as they have been
   referenced recently.  So those ones are ROTATED.

2/ When called from the nfsd_file_gc() timer function we want to discard
   anything that hasn't been used since before the previous call, and
   mark everything else as unused at this point in time.

Using the same flag for both of these can result in some unexpected
outcomes.  If the shrinker callback clears NFSD_FILE_REFERENCED then
nfsd_file_gc() will think the file hasn't been used in a while, while
really it has.

I think it is easier to reason about the behaviour if we instead have
two flags.

 NFSD_FILE_REFERENCED means "this should be at the end of the LRU, please
     put it there when convenient"
 NFSD_FILE_RECENT means "this has been used recently - since the last
     run of nfsd_file_gc()

When either caller finds an NFSD_FILE_REFERENCED entry, that entry
should be moved to the end of the LRU and the flag cleared.  This can
safely happen at any time.  The actual order on the lru might not be
strictly least-recently-used, but that is normal for linux lrus.

The shrinker callback can ignore the "recent" flag.  If it ends up
freeing something that is "recent" that simply means that memory
pressure is sufficient to limit the acceptable cache age to less than
the nfsd_file_gc frequency.

The gc callback should primarily focus on NFSD_FILE_RECENT.  It should
free everything that doesn't have this flag set, and should clear the
flag on everything else.  When it clears the flag it is convenient to
clear the "REFERENCED" flag and move to the end of the LRU too.

With this, calls from the shrinker do not prematurely age files.  It
will focus only on freeing those that are least recently used.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:07 -04:00
NeilBrown
8017afd66c nfsd: filecache: use list_lru_walk_node() in nfsd_file_gc()
list_lru_walk() is only useful when the aim is to remove all elements
from the list_lru.  It will repeatedly visit rotated elements of the
first per-node sublist before proceeding to subsequent sublists.

This patch changes nfsd_file_gc() to use list_lru_walk_node() and
list_lru_count_node() on each NUMA node.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:06 -04:00
NeilBrown
e8e6f5cdbc nfsd: filecache: use nfsd_file_dispose_list() in nfsd_file_close_inode_sync()
nfsd_file_close_inode_sync() contains an exact copy of
nfsd_file_dispose_list().

This patch removes that copy and calls nfsd_file_dispose_list()
instead.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:06 -04:00
Chuck Lever
1601e2fde9 NFSD: Re-organize nfsd_file_gc_worker()
Dave opines:

IMO, there is no need to do this unnecessary work on every object
that is added to the LRU.  Changing the gc worker to always run
every 2s and check if it has work to do like so:

 static void
 nfsd_file_gc_worker(struct work_struct *work)
 {
-	nfsd_file_gc();
-	if (list_lru_count(&nfsd_file_lru))
-		nfsd_file_schedule_laundrette();
+	if (list_lru_count(&nfsd_file_lru))
+		nfsd_file_gc();
+	nfsd_file_schedule_laundrette();
 }

means that nfsd_file_gc() will be run the same way and have the same
behaviour as the current code. When the system it idle, it does a
list_lru_count() check every 2 seconds and goes back to sleep.
That's going to be pretty much unnoticable on most machines that run
NFS servers.

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:05 -04:00
NeilBrown
f77ce2e570 nfsd: filecache: remove race handling.
The race that this code tries to protect against is not interesting.
The code is problematic as we access the "nf" after we have given our
reference to the lru system.  While that takes 2+ seconds to free
things, it is still poor form.

The only interesting race I can find would be with
nfsd_file_close_inode_sync();
This is the only place that really doesn't want the file to stay on the
LRU when unhashed (which is the direct consequence of the race).

However for the race to happen, some other thread must own a reference
to a file and be putting it while nfsd_file_close_inode_sync() is trying
to close all files for an inode.  If this is possible, that other thread
could simply call nfsd_file_put() a little bit later and the result
would be the same: not all files are closed when
nfsd_file_close_inode_sync() completes.

If this was really a problem, we would need to wait in close_inode_sync
for the other references to be dropped.  We probably don't want to do
that.

So it is best to simply remove this code.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:05 -04:00
Chuck Lever
8ce35dcaf3 NFSD: Fix callback decoder status codes
fs/nfsd/nfs4callback.c implements a callback client. Thus its XDR
decoders are decoding replies, not calls.

NFS4ERR_BAD_XDR is an on-the-wire status code that reports that the
client sent a corrupted RPC /call/. It's not used as the internal
error code when a /reply/ can't be decoded, since that kind of
failure is never reported to the sender of that RPC message.

Instead, a reply decoder should return -EIO, as the reply decoders
in the NFS client do.

Fixes: 6487a13b5c ("NFSD: add support for CB_GETATTR callback")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:04 -04:00
Jeff Layton
4b54b85e38 nfsd: eliminate special handling of NFS4ERR_SEQ_MISORDERED
On a SEQ_MISORDERED error, the current code will reattempt the call, but
set the slot sequence ID to 1. I can find no mention of this remedy in
the spec, and it seems potentially dangerous. It's possible that the
last call was sent with seqid 1, and doing this will cause a
retransmission of the reply.

Drop this special handling, and always treat SEQ_MISORDERED like
BADSLOT. Retry the call, but leak the slot so that it is no longer used.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:03 -04:00
Jeff Layton
999595a651 nfsd: handle NFS4ERR_BADSLOT on CB_SEQUENCE better
Currently it just restarts the call, without getting a new slot.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:03 -04:00
Jeff Layton
bf36c14972 nfsd: when CB_SEQUENCE gets ESERVERFAULT don't increment seq_nr
ESERVERFAULT means that the server sent a successful and legitimate
reply, but the session info didn't match what was expected. Don't
increment the seq_nr in that case.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:02 -04:00
Jeff Layton
f049911b5b nfsd: only check RPC_SIGNALLED() when restarting rpc_task
nfsd4_cb_sequence_done() currently checks RPC_SIGNALLED() when
processing the compound and releasing the slot. If RPC_SIGNALLED()
returns true, then that means that the client is going to be torn down.

Don't check RPC_SIGNALLED() after processing a successful reply. Check
it only before restarting the rpc_task. If it returns true, then requeue
the callback instead of restarting the task.

Also, handle rpc_restart_call() and rpc_restart_call_prepare() failures
correctly, by requeueing the callback.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:02 -04:00
Jeff Layton
43fa8905db nfsd: always release slot when requeueing callback
If the callback is going to be requeued to the workqueue, then release
the slot. The callback client and session could change and the slot may
no longer be valid after that point.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:02 -04:00
Jeff Layton
6c1cefb84b nfsd: lift NFSv4.0 handling out of nfsd4_cb_sequence_done()
It's a bit strange to call nfsd4_cb_sequence_done() on a callback with no
CB_SEQUENCE. Lift the handling of restarting a call into a new helper,
and move the handling of NFSv4.0 into nfsd4_cb_done().

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:01 -04:00
Jeff Layton
1c2d0957dc nfsd: prepare nfsd4_cb_sequence_done() for error handling rework
There is only one case where we want to proceed with processing the rest
of the CB_COMPOUND, and that's when the cb_seq_status is 0. Make the
default return value be false, and only set it to true in that case.

Rename the "need_restart" label to "requeue", to better indicate that
it's being requeued to the workqueue.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:01 -04:00
Li Lingfeng
230ca75845 nfsd: put dl_stid if fail to queue dl_recall
Before calling nfsd4_run_cb to queue dl_recall to the callback_wq, we
increment the reference count of dl_stid.
We expect that after the corresponding work_struct is processed, the
reference count of dl_stid will be decremented through the callback
function nfsd4_cb_recall_release.
However, if the call to nfsd4_run_cb fails, the incremented reference
count of dl_stid will not be decremented correspondingly, leading to the
following nfs4_stid leak:
unreferenced object 0xffff88812067b578 (size 344):
  comm "nfsd", pid 2761, jiffies 4295044002 (age 5541.241s)
  hex dump (first 32 bytes):
    01 00 00 00 6b 6b 6b 6b b8 02 c0 e2 81 88 ff ff  ....kkkk........
    00 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 ad 4e ad de  .kkkkkkk.....N..
  backtrace:
    kmem_cache_alloc+0x4b9/0x700
    nfsd4_process_open1+0x34/0x300
    nfsd4_open+0x2d1/0x9d0
    nfsd4_proc_compound+0x7a2/0xe30
    nfsd_dispatch+0x241/0x3e0
    svc_process_common+0x5d3/0xcc0
    svc_process+0x2a3/0x320
    nfsd+0x180/0x2e0
    kthread+0x199/0x1d0
    ret_from_fork+0x30/0x50
    ret_from_fork_asm+0x1b/0x30
unreferenced object 0xffff8881499f4d28 (size 368):
  comm "nfsd", pid 2761, jiffies 4295044005 (age 5541.239s)
  hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 30 4d 9f 49 81 88 ff ff  ........0M.I....
    30 4d 9f 49 81 88 ff ff 20 00 00 00 01 00 00 00  0M.I.... .......
  backtrace:
    kmem_cache_alloc+0x4b9/0x700
    nfs4_alloc_stid+0x29/0x210
    alloc_init_deleg+0x92/0x2e0
    nfs4_set_delegation+0x284/0xc00
    nfs4_open_delegation+0x216/0x3f0
    nfsd4_process_open2+0x2b3/0xee0
    nfsd4_open+0x770/0x9d0
    nfsd4_proc_compound+0x7a2/0xe30
    nfsd_dispatch+0x241/0x3e0
    svc_process_common+0x5d3/0xcc0
    svc_process+0x2a3/0x320
    nfsd+0x180/0x2e0
    kthread+0x199/0x1d0
    ret_from_fork+0x30/0x50
    ret_from_fork_asm+0x1b/0x30
Fix it by checking the result of nfsd4_run_cb and call nfs4_put_stid if
fail to queue dl_recall.

Cc: stable@vger.kernel.org
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:00 -04:00
Jeff Layton
d1bc15b147 nfsd: allow SC_STATUS_FREEABLE when searching via nfs4_lookup_stateid()
The pynfs DELEG8 test fails when run against nfsd. It acquires a
delegation and then lets the lease time out. It then tries to use the
deleg stateid and expects to see NFS4ERR_DELEG_REVOKED, but it gets
bad NFS4ERR_BAD_STATEID instead.

When a delegation is revoked, it's initially marked with
SC_STATUS_REVOKED, or SC_STATUS_ADMIN_REVOKED and later, it's marked
with the SC_STATUS_FREEABLE flag, which denotes that it is waiting for
s FREE_STATEID call.

nfs4_lookup_stateid() accepts a statusmask that includes the status
flags that a found stateid is allowed to have. Currently, that mask
never includes SC_STATUS_FREEABLE, which means that revoked delegations
are (almost) never found.

Add SC_STATUS_FREEABLE to the always-allowed status flags, and remove it
from nfsd4_delegreturn() since it's now always implied.

Fixes: 8dd91e8d31 ("nfsd: fix race between laundromat and free_stateid")
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:00 -04:00
Jeff Layton
930b64ca0c nfsd: don't ignore the return code of svc_proc_register()
Currently, nfsd_proc_stat_init() ignores the return value of
svc_proc_register(). If the procfile creation fails, then the kernel
will WARN when it tries to remove the entry later.

Fix nfsd_proc_stat_init() to return the same type of pointer as
svc_proc_register(), and fix up nfsd_net_init() to check that and fail
the nfsd_net construction if it occurs.

svc_proc_register() can fail if the dentry can't be allocated, or if an
identical dentry already exists. The second case is pretty unlikely in
the nfsd_net construction codepath, so if this happens, return -ENOMEM.

Reported-by: syzbot+e34ad04f27991521104c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-nfs/67a47501.050a0220.19061f.05f9.GAE@google.com/
Cc: stable@vger.kernel.org # v6.9
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:10:59 -04:00
Chuck Lever
2ed4f6fe15 NFSD: Fix trace_nfsd_slot_seqid_sequence
While running down the problem triggered by disconnect injection,
I noticed the "in use" string was actually never hooked up in this
trace point, so it always showed the traced slot as not in use. But
what might be more useful is showing all the slot status flags.

Also, this trace point can record and report the slot's index
number, which among other things is useful for troubleshooting slot
table expansion and contraction.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:10:58 -04:00
Chuck Lever
6e45906a0b NFSD: Return NFS4ERR_FILE_OPEN only when linking an open file
RFC 8881 Section 18.9.4 paragraphs 1 - 2 tell us that RENAME should
return NFS4ERR_FILE_OPEN only when the target object is a file that
is currently open. If the target is a directory, some other status
must be returned.

The VFS is unlikely to return -EBUSY, but NFSD has to ensure that
errno does not leak to clients as a status code that is not
permitted by spec.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:10:57 -04:00
Chuck Lever
3b60984e79 NFSD: Return NFS4ERR_FILE_OPEN only when renaming over an open file
RFC 8881 Section 18.26.4 paragraphs 1 - 3 tell us that RENAME should
return NFS4ERR_FILE_OPEN only when the target object is a file that
is currently open. If the target is a directory, some other status
must be returned.

Generally I expect that a delegation recall will be triggered in
some of these circumstances. In other cases, the VFS might return
-EBUSY for other reasons, and NFSD has to ensure that errno does
not leak to clients as a status code that is not permitted by spec.

There are some error flows where the target dentry hasn't been
found yet. The default value for @type therefore is S_IFDIR to return
an alternate status code in those cases.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:10:57 -04:00
Chuck Lever
370345b4bd NFSD: Never return NFS4ERR_FILE_OPEN when removing a directory
RFC 8881 Section 18.25.4 paragraph 5 tells us that the server
should return NFS4ERR_FILE_OPEN only if the target object is an
opened file. This suggests that returning this status when removing
a directory will confuse NFS clients.

This is a version-specific issue; nfsd_proc_remove/rmdir() and
nfsd3_proc_remove/rmdir() already return nfserr_access as
appropriate.

Unfortunately there is no quick way for nfsd4_remove() to determine
whether the target object is a file or not, so the check is done in
in nfsd_unlink() for now.

Reported-by: Trond Myklebust <trondmy@hammerspace.com>
Fixes: 466e16f092 ("nfsd: check for EBUSY from vfs_rmdir/vfs_unink.")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:10:56 -04:00
Chuck Lever
d7d8e3169b NFSD: nfsd_unlink() clobbers non-zero status returned from fh_fill_pre_attrs()
If fh_fill_pre_attrs() returns a non-zero status, the error flow
takes it through out_unlock, which then overwrites the returned
status code with

	err = nfserrno(host_err);

Fixes: a332018a91 ("nfsd: handle failure to collect pre/post-op attrs more sanely")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:10:56 -04:00
Li Lingfeng
904201c7b5 nfsd: remove the redundant mapping of nfserr_mlink
There two mappings of nfserr_mlink in nfs_errtbl.
Remove one of them.

Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:10:56 -04:00
Chuck Lever
8a388c1fab NFSD: Skip sending CB_RECALL_ANY when the backchannel isn't up
NFSD sends CB_RECALL_ANY to clients when the server is low on
memory or that client has a large number of delegations outstanding.

We've seen cases where NFSD attempts to send CB_RECALL_ANY requests
to disconnected clients, and gets confused. These calls never go
anywhere if a backchannel transport to the target client isn't
available. Before the server can send any backchannel operation, the
client has to connect first and then do a BIND_CONN_TO_SESSION.

This patch doesn't address the root cause of the confusion, but
there's no need to queue up these optional operations if they can't
go anywhere.

Fixes: 44df6f439a ("NFSD: add delegation reaper to react to low memory condition")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:10:55 -04:00
Olga Kornievskaia
45de52d034 nfsd: adjust WARN_ON_ONCE in revoke_delegation
A WARN_ON_ONCE() is added to revoke delegations to make sure that the
state has been marked for revocation. However, that's only true for 4.1+
stateids. For 4.0 stateids, in unhash_delegation_locked() the sc_status
is set to SC_STATUS_CLOSED. Modify the check to reflect it, otherwise
a WARN_ON_ONCE is erronously triggered.

Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:10:55 -04:00
Olga Kornievskaia
d093c90892 nfsd: fix management of listener transports
Currently, when no active threads are running, a root user using nfsdctl
command can try to remove a particular listener from the list of previously
added ones, then start the server by increasing the number of threads,
it leads to the following problem:

[  158.835354] refcount_t: addition on 0; use-after-free.
[  158.835603] WARNING: CPU: 2 PID: 9145 at lib/refcount.c:25 refcount_warn_saturate+0x160/0x1a0
[  158.836017] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd auth_rpcgss nfs_acl lockd grace overlay isofs uinput snd_seq_dummy snd_hrtimer nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables qrtr sunrpc vfat fat uvcvideo videobuf2_vmalloc videobuf2_memops uvc videobuf2_v4l2 videodev videobuf2_common snd_hda_codec_generic mc e1000e snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore sg loop dm_multipath dm_mod nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci vsock xfs libcrc32c crct10dif_ce ghash_ce vmwgfx sha2_ce sha256_arm64 sr_mod sha1_ce cdrom nvme drm_client_lib drm_ttm_helper ttm nvme_core drm_kms_helper nvme_auth drm fuse
[  158.840093] CPU: 2 UID: 0 PID: 9145 Comm: nfsd Kdump: loaded Tainted: G    B   W          6.13.0-rc6+ #7
[  158.840624] Tainted: [B]=BAD_PAGE, [W]=WARN
[  158.840802] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.24006586.BA64.2406042154 06/04/2024
[  158.841220] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[  158.841563] pc : refcount_warn_saturate+0x160/0x1a0
[  158.841780] lr : refcount_warn_saturate+0x160/0x1a0
[  158.842000] sp : ffff800089be7d80
[  158.842147] x29: ffff800089be7d80 x28: ffff00008e68c148 x27: ffff00008e68c148
[  158.842492] x26: ffff0002e3b5c000 x25: ffff600011cd1829 x24: ffff00008653c010
[  158.842832] x23: ffff00008653c000 x22: 1fffe00011cd1829 x21: ffff00008653c028
[  158.843175] x20: 0000000000000002 x19: ffff00008653c010 x18: 0000000000000000
[  158.843505] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[  158.843836] x14: 0000000000000000 x13: 0000000000000001 x12: ffff600050a26493
[  158.844143] x11: 1fffe00050a26492 x10: ffff600050a26492 x9 : dfff800000000000
[  158.844475] x8 : 00009fffaf5d9b6e x7 : ffff000285132493 x6 : 0000000000000001
[  158.844823] x5 : ffff000285132490 x4 : ffff600050a26493 x3 : ffff8000805e72bc
[  158.845174] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000098588000
[  158.845528] Call trace:
[  158.845658]  refcount_warn_saturate+0x160/0x1a0 (P)
[  158.845894]  svc_recv+0x58c/0x680 [sunrpc]
[  158.846183]  nfsd+0x1fc/0x348 [nfsd]
[  158.846390]  kthread+0x274/0x2f8
[  158.846546]  ret_from_fork+0x10/0x20
[  158.846714] ---[ end trace 0000000000000000 ]---

nfsd_nl_listener_set_doit() would manipulate the list of transports of
server's sv_permsocks and close the specified listener but the other
list of transports (server's sp_xprts list) would not be changed leading
to the problem above.

Instead, determined if the nfsdctl is trying to remove a listener, in
which case, delete all the existing listener transports and re-create
all-but-the-removed ones.

Fixes: 16a4711774 ("NFSD: add listener-{set,get} netlink command")
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:10:54 -04:00
NeilBrown
c54b386969
VFS: Change vfs_mkdir() to return the dentry.
vfs_mkdir() does not guarantee to leave the child dentry hashed or make
it positive on success, and in many such cases the filesystem had to use
a different dentry which it can now return.

This patch changes vfs_mkdir() to return the dentry provided by the
filesystems which is hashed and positive when provided.  This reduces
the number of cases where the resulting dentry is not positive to a
handful which don't deserve extra efforts.

The only callers of vfs_mkdir() which are interested in the resulting
inode are in-kernel filesystem clients: cachefiles, nfsd, smb/server.
The only filesystems that don't reliably provide the inode are:
- kernfs, tracefs which these clients are unlikely to be interested in
- cifs in some configurations would need to do a lookup to find the
  created inode, but doesn't.  cifs cannot be exported via NFS, is
  unlikely to be used by cachefiles, and smb/server only has a soft
  requirement for the inode, so this is unlikely to be a problem in
  practice.
- hostfs, nfs, cifs may need to do a lookup (rarely for NFS) and it is
  possible for a race to make that lookup fail.  Actual failure
  is unlikely and providing callers handle negative dentries graceful
  they will fail-safe.

So this patch removes the lookup code in nfsd and smb/server and adjusts
them to fail safe if a negative dentry is provided:
- cache-files already fails safe by restarting the task from the
  top - it still does with this change, though it no longer calls
  cachefiles_put_directory() as that will crash if the dentry is
  negative.
- nfsd reports "Server-fault" which it what it used to do if the lookup
  failed. This will never happen on any file-systems that it can actually
  export, so this is of no consequence.  I removed the fh_update()
  call as that is not needed and out-of-place.  A subsequent
  nfsd_create_setattr() call will call fh_update() when needed.
- smb/server only wants the inode to call ksmbd_smb_inherit_owner()
  which updates ->i_uid (without calling notify_change() or similar)
  which can be safely skipping on cifs (I hope).

If a different dentry is returned, the first one is put.  If necessary
the fact that it is new can be determined by comparing pointers.  A new
dentry will certainly have a new pointer (as the old is put after the
new is obtained).
Similarly if an error is returned (via ERR_PTR()) the original dentry is
put.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250227013949.536172-7-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05 11:52:50 +01:00
NeilBrown
4cf006b739
nfsd: drop fh_update() from S_IFDIR branch of nfsd_create_locked()
nfsd_create_locked() doesn't need to explicitly call fh_update().
On success (which is the only time that fh_update() matters at all),
nfsd_create_setattr() will be called and it will call fh_update().

This extra call is not harmful, but is not necessary.

Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250226062135.2043651-3-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-26 09:55:18 +01:00
Linus Torvalds
febbc555cf nfsd-6.14 fixes:
- Introduced during the v6.14 merge window:
   - A fix for CB_GETATTR reply decoding was not quite correct
   - Fix the NFSD connection limiting logic
   - Fix a bug in the new session table resizing logic
 
 - Bugs that pre-date v6.14
   - Support for courteous clients (5.19) introduced a shutdown hang
   - Fix a crash in the filecache laundrette (6.9)
   - Fix a zero-day crash in NFSD's NFSv3 ACL implementation
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmeqSYIACgkQM2qzM29m
 f5cuTQ//crg4df/QhLAFNXdUaqQd12C5s9pcQuNsOK5JVrRCEQXchL48SCOd6/xn
 9SLbgSoq2kuE6ZeCTuE88U6fo3MqX6XpLZ7kPhFO9rmpULFxFvavT89iWFSpNO1p
 00Ges+Y1RA7+S9QgurYTgvcwhwlTbIzIMtGmqh4BawIG1VKfT9lxHeC2NXN8Fe3W
 63p3yd9+cOM2BaXm2GbFf24YKTCvecrMYK0Li2xBmRZn5bDvpmWCFiHxRcHXnDtk
 cbEUt+ZLl2IVqlgQPOlZly7/VOAVPEQfRKM/a9YLIiLxR0GqoLWjUQEprO6N8jz8
 6b4qiPHX5Mbh/zpgwyKFril7pfdtuT+KIvSbw70XDwMoS7voWpc4uGQfL3tZ4Znt
 S9wzTCJAcdZvz7PZ1LXkaAL8mbXY5ItIgfKsQCJ70RStRsqQ8tuyFEx7j6Rrp6Iy
 5KkRO3HAcBJnhL89NgZ2kYc/E8pvuW53LhYZcZbL7Vx4u0aVn/BbFUjVtmhp8/pm
 njj2RCYJ7AKZW2Wf5XLW3nIEke2lFRlwIlmOAdREYHTFFUT0v/TGsSMQe4Yx5FEF
 c+fkIO9lXNThqcibOis5sAKIRx5X/Y+lsqP7Z+eSpoIPEhheQo3HXdYeJ6Ewcws1
 xk288Lmx8RqIUoZU9tF2EujYkTOqyEaAKtQZ7aktQ1tOPKRdiCE=
 =bre8
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:
 "Fixes for new bugs:
   - A fix for CB_GETATTR reply decoding was not quite correct
   - Fix the NFSD connection limiting logic
   - Fix a bug in the new session table resizing logic

  Bugs that pre-date v6.14:
   - Support for courteous clients (5.19) introduced a shutdown hang
   - Fix a crash in the filecache laundrette (6.9)
   - Fix a zero-day crash in NFSD's NFSv3 ACL implementation"

* tag 'nfsd-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: Fix CB_GETATTR status fix
  NFSD: fix hang in nfsd4_shutdown_callback
  nfsd: fix __fh_verify for localio
  nfsd: fix uninitialised slot info when a request is retried
  nfsd: validate the nfsd_serv pointer before calling svc_wake_up
  nfsd: clear acl_access/acl_default after releasing them
2025-02-10 13:11:24 -08:00
Chuck Lever
4990d09843 NFSD: Fix CB_GETATTR status fix
Jeff says:

Now that I look, 1b3e26a5cc is wrong. The patch on the ml was correct, but
the one that got committed is different. It should be:

    status = decode_cb_op_status(xdr, OP_CB_GETATTR, &cb->cb_status);
    if (unlikely(status || cb->cb_status))

If "status" is non-zero, decoding failed (usu. BADXDR), but we also want to
bail out and not decode the rest of the call if the decoded cb_status is
non-zero. That's not happening here, cb_seq_status has already been checked and
is non-zero, so this ends up trying to decode the rest of the CB_GETATTR reply
when it doesn't exist.

Reported-by: Jeff Layton <jlayton@kernel.org>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219737
Fixes: 1b3e26a5cc ("NFSD: fix decoding in nfs4_xdr_dec_cb_getattr")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-02-10 13:31:28 -05:00
Dai Ngo
036ac2778f NFSD: fix hang in nfsd4_shutdown_callback
If nfs4_client is in courtesy state then there is no point to send
the callback. This causes nfsd4_shutdown_callback to hang since
cl_cb_inflight is not 0. This hang lasts about 15 minutes until TCP
notifies NFSD that the connection was dropped.

This patch modifies nfsd4_run_cb_work to skip the RPC call if
nfs4_client is in courtesy state.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Fixes: 66af257999 ("NFSD: add courteous server support for thread with only delegation")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-02-10 13:31:16 -05:00
Olga Kornievskaia
d9d6b74e4b nfsd: fix __fh_verify for localio
__fh_verify() added a call to svc_xprt_set_valid() to help do connection
management but during LOCALIO path rqstp argument is NULL, leading to
NULL pointer dereferencing and a crash.

Fixes: eccbbc7c00 ("nfsd: don't use sv_nrthreads in connection limiting calculations.")
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-02-10 13:31:11 -05:00
NeilBrown
5fb2516121 nfsd: fix uninitialised slot info when a request is retried
A recent patch moved the assignment of seq->maxslots from before the
test for a resent request (which ends with a goto) to after, resulting
in it not being run in that case.  This results in the server returning
bogus "high slot id" and "target high slot id" values.

The assignments to ->maxslots and ->target_maxslots need to be *after*
the out: label so that the correct values are returned in replies to
requests that are served from cache.

Fixes: 60aa656431 ("nfsd: allocate new session-based DRC slots on demand.")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-02-10 13:30:55 -05:00
Jeff Layton
b9382e29ca nfsd: validate the nfsd_serv pointer before calling svc_wake_up
nfsd_file_dispose_list_delayed can be called from the filecache
laundrette, which is shut down after the nfsd threads are shut down and
the nfsd_serv pointer is cleared. If nn->nfsd_serv is NULL then there
are no threads to wake.

Ensure that the nn->nfsd_serv pointer is non-NULL before calling
svc_wake_up in nfsd_file_dispose_list_delayed. This is safe since the
svc_serv is not freed until after the filecache laundrette is cancelled.

Reported-by: Salvatore Bonaccorso <carnil@debian.org>
Closes: https://bugs.debian.org/1093734
Fixes: ffb4025961 ("nfsd: Don't leave work of closing files to a work queue")
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-02-02 11:31:58 -05:00
Li Lingfeng
7faf14a7b0 nfsd: clear acl_access/acl_default after releasing them
If getting acl_default fails, acl_access and acl_default will be released
simultaneously. However, acl_access will still retain a pointer pointing
to the released posix_acl, which will trigger a WARNING in
nfs3svc_release_getacl like this:

------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 26 PID: 3199 at lib/refcount.c:28
refcount_warn_saturate+0xb5/0x170
Modules linked in:
CPU: 26 UID: 0 PID: 3199 Comm: nfsd Not tainted
6.12.0-rc6-00079-g04ae226af01f-dirty #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.16.1-2.fc37 04/01/2014
RIP: 0010:refcount_warn_saturate+0xb5/0x170
Code: cc cc 0f b6 1d b3 20 a5 03 80 fb 01 0f 87 65 48 d8 00 83 e3 01 75
e4 48 c7 c7 c0 3b 9b 85 c6 05 97 20 a5 03 01 e8 fb 3e 30 ff <0f> 0b eb
cd 0f b6 1d 8a3
RSP: 0018:ffffc90008637cd8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff83904fde
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88871ed36380
RBP: ffff888158beeb40 R08: 0000000000000001 R09: fffff520010c6f56
R10: ffffc90008637ab7 R11: 0000000000000001 R12: 0000000000000001
R13: ffff888140e77400 R14: ffff888140e77408 R15: ffffffff858b42c0
FS:  0000000000000000(0000) GS:ffff88871ed00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000562384d32158 CR3: 000000055cc6a000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? refcount_warn_saturate+0xb5/0x170
 ? __warn+0xa5/0x140
 ? refcount_warn_saturate+0xb5/0x170
 ? report_bug+0x1b1/0x1e0
 ? handle_bug+0x53/0xa0
 ? exc_invalid_op+0x17/0x40
 ? asm_exc_invalid_op+0x1a/0x20
 ? tick_nohz_tick_stopped+0x1e/0x40
 ? refcount_warn_saturate+0xb5/0x170
 ? refcount_warn_saturate+0xb5/0x170
 nfs3svc_release_getacl+0xc9/0xe0
 svc_process_common+0x5db/0xb60
 ? __pfx_svc_process_common+0x10/0x10
 ? __rcu_read_unlock+0x69/0xa0
 ? __pfx_nfsd_dispatch+0x10/0x10
 ? svc_xprt_received+0xa1/0x120
 ? xdr_init_decode+0x11d/0x190
 svc_process+0x2a7/0x330
 svc_handle_xprt+0x69d/0x940
 svc_recv+0x180/0x2d0
 nfsd+0x168/0x200
 ? __pfx_nfsd+0x10/0x10
 kthread+0x1a2/0x1e0
 ? kthread+0xf4/0x1e0
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x34/0x60
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>
Kernel panic - not syncing: kernel: panic_on_warn set ...

Clear acl_access/acl_default after posix_acl_release is called to prevent
UAF from being triggered.

Fixes: a257cdd0e2 ("[PATCH] NFSD: Add server support for NFSv3 ACLs.")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20241107014705.2509463-1-lilingfeng@huaweicloud.com/
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Rick Macklem <rmacklem@uoguelph.ca>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-02-02 11:31:45 -05:00
Linus Torvalds
b88fe2b5dd NFS Client Updates for Linux 6.14
New Features:
   * Enable using direct IO with localio
   * Added localio related tracepoints
 
 Bugfixes:
   * Sunrpc fixes for working with a very large cl_tasks list
   * Fix a possible buffer overflow in nfs_sysfs_link_rpc_client()
   * Fixes for handling reconnections with localio
   * Fix how the NFS_FSCACHE kconfig option interacts with NETFS_SUPPORT
   * Fix COPY_NOTIFY xdr_buf size calculations
   * pNFS/Flexfiles fix for retrying requesting a layout segment for reads
   * Sunrpc fix for retrying on EKEYEXPIRED error when the TGT is expired
 
 Cleanups:
   * Various other nfs & nfsd localio cleanups
   * Prepratory patches for async copy improvements that are under development
   * Make OFFLOAD_CANCEL, LAYOUTSTATS, and LAYOUTERR moveable to other xprts
   * Add netns inum and srcaddr to debugfs rpc_xprt info
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmeZUzUACgkQ18tUv7Cl
 QOvArw/9HltIlcJHbi7tApGJ4dFpuJCa/fHbA1n5bHvKrCR5aElmFZoiFDdsM1JX
 kFAlMED9n1dW9VmzLJcepxmrLo/t7KXueiZNharHynWTxcszSl6jS+tOFBW6OflG
 Rrrjq/SrsWI2Fu8X4e/7ZV7pqRLGGn5SSMwgbuMbcyzBvVgN8mZM/BneIp1J59AI
 5NOsif5KWetVhQc43zlRlbVWR5cvNGcUK4i58LIaPFzPMt0xq/XJI+QWffj6kv4g
 cHabCNYTdQYMkhiPQC+LLYkw6sMbw2NatajTTYNMWfR/I+7wz9k5ej6CHKPIFCSr
 xjmscypySTLfMFQjrDFZkpX2CwSp/VIbV6go36DJwAlcCRzqz+I7cajlrRK4zvyr
 DyrcaZHvClEczP9QqdPj2wqRXbmIOsDMksOu4ACTUImd4o3f2v1K6DcwRj9oUIhV
 AGR31OEMt2A+RaVvVZYR4PpixJ01vH9LcmsaOu5KkHX8X4q2osQ7eMy+FV4kV09S
 pMnxDMAyszJU8IuzUG1/HfkonNlDMivIbqpgG4ZaVW08Nq4mCxJll1vTAa9FTLz2
 z+9eocqKwf724q1RAgOB7vj4AwOwL4Ul6d18UBtyUitZz3ndLRZ8Yy6r/AhrpCsC
 3co0Y3znZbKeRjmReNl0GLG4qiKE+E7Xh23Lf3IqXg8GE2Mu+Ls=
 =srvH
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "New Features:
   - Enable using direct IO with localio
   - Added localio related tracepoints

  Bugfixes:
   - Sunrpc fixes for working with a very large cl_tasks list
   - Fix a possible buffer overflow in nfs_sysfs_link_rpc_client()
   - Fixes for handling reconnections with localio
   - Fix how the NFS_FSCACHE kconfig option interacts with NETFS_SUPPORT
   - Fix COPY_NOTIFY xdr_buf size calculations
   - pNFS/Flexfiles fix for retrying requesting a layout segment for
     reads
   - Sunrpc fix for retrying on EKEYEXPIRED error when the TGT is
     expired

  Cleanups:
   - Various other nfs & nfsd localio cleanups
   - Prepratory patches for async copy improvements that are under
     development
   - Make OFFLOAD_CANCEL, LAYOUTSTATS, and LAYOUTERR moveable to other
     xprts
   - Add netns inum and srcaddr to debugfs rpc_xprt info"

* tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (28 commits)
  SUNRPC: do not retry on EKEYEXPIRED when user TGT ticket expired
  sunrpc: add netns inum and srcaddr to debugfs rpc_xprt info
  pnfs/flexfiles: retry getting layout segment for reads
  NFSv4.2: make LAYOUTSTATS and LAYOUTERROR MOVEABLE
  NFSv4.2: mark OFFLOAD_CANCEL MOVEABLE
  NFSv4.2: fix COPY_NOTIFY xdr buf size calculation
  NFS: Rename struct nfs4_offloadcancel_data
  NFS: Fix typo in OFFLOAD_CANCEL comment
  NFS: CB_OFFLOAD can return NFS4ERR_DELAY
  nfs: Make NFS_FSCACHE select NETFS_SUPPORT instead of depending on it
  nfs: fix incorrect error handling in LOCALIO
  nfs: probe for LOCALIO when v3 client reconnects to server
  nfs: probe for LOCALIO when v4 client reconnects to server
  nfs/localio: remove redundant code and simplify LOCALIO enablement
  nfs_common: add nfs_localio trace events
  nfs_common: track all open nfsd_files per LOCALIO nfs_client
  nfs_common: rename nfslocalio nfs_uuid_lock to nfs_uuids_lock
  nfsd: nfsd_file_acquire_local no longer returns GC'd nfsd_file
  nfsd: rename nfsd_serv_ prefixed methods and variables with nfsd_net_
  nfsd: update percpu_ref to manage references on nfsd_net
  ...
2025-01-28 14:23:46 -08:00
Linus Torvalds
f34b580514 NFSD 6.14 Release Notes
Jeff Layton contributed an implementation of NFSv4.2+ attribute
 delegation, as described here:
 
 https://www.ietf.org/archive/id/draft-ietf-nfsv4-delstid-08.html
 
 This interoperates with similar functionality introduced into the
 Linux NFS client in v6.11. An attribute delegation permits an NFS
 client to manage a file's mtime, rather than flushing dirty data to
 the NFS server so that the file's mtime reflects the last write,
 which is considerably slower.
 
 Neil Brown contributed dynamic NFSv4.1 session slot table resizing.
 This facility enables NFSD to increase or decrease the number of
 slots per NFS session depending on server memory availability. More
 session slots means greater parallelism.
 
 Chuck Lever fixed a long-standing latent bug where NFSv4 COMPOUND
 encoding screws up when crossing a page boundary in the encoding
 buffer. This is a zero-day bug, but hitting it is rare and depends
 on the NFS client implementation. The Linux NFS client does not
 happen to trigger this issue.
 
 A variety of bug fixes and other incremental improvements fill out
 the list of commits in this release. Great thanks to all
 contributors, reviewers, testers, and bug reporters who participated
 during this development cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmeVPBUACgkQM2qzM29m
 f5dClxAAmW4O2bOJaR8neJ54fzeFYXtFYEXRF/XbIh2KdnCy2LoywT9ux8ndzE0k
 1tsjtv0g4y84IcYfrxXPRTuhh2GO2pHw5L1kGVRezJg2ODSFbpdzGtcVK7SIrs2S
 eijTViqdi9xRgfj1jPqRrvxC99RL3/fmCztqorAPLDFsqYAd/6ZRxZ7+IcZ2h4+J
 cJ0Z6Wx6eh10roacZPXweH13XJ7xWO/ublYZvQFQpK2BAKyO98aXGgLDraNt8k60
 X3DZLSKkGB/eBlNeAlTtcrXec/ot6XGJPKr3b/7zhwfMi8B13RGdSmCR8SxMdRQM
 vCQO4G2YadU4YFS6FFIw9Wc1XDYUuYh2YgcveafjzjXbgi7NnY7rYOxtnTgBi0Xv
 XGjtGqpvD676gPm+8b3DcwqmWI3c/WUdtQIZ1uYRCFZdFqkVP91bySPT2aHtx2GO
 4j3uEyTlypC00kyvu1oL3+tVUG/EFlJCpYvIbOwqDG2m7KWPStzpfJTD5Q9cdlEl
 fdgs6l82EVqe1YyjLTqajDuOcRrLYK2hlR/5STc03hQV+GpKSo5UypRejzE9WtRV
 zT/tyelqhj4+0EZJz4ay/8q9s2Jp+5JGVxoVvjujSuH7+Ulb3T+IDkldtMamO8Fm
 www2y0/fLfU2xIapMJdCoJ+ZKgel2i8RZMPIc0cIfO5ITXm+dOs=
 =hwXx
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "Jeff Layton contributed an implementation of NFSv4.2+ attribute
  delegation, as described here:

    https://www.ietf.org/archive/id/draft-ietf-nfsv4-delstid-08.html

  This interoperates with similar functionality introduced into the
  Linux NFS client in v6.11. An attribute delegation permits an NFS
  client to manage a file's mtime, rather than flushing dirty data to
  the NFS server so that the file's mtime reflects the last write, which
  is considerably slower.

  Neil Brown contributed dynamic NFSv4.1 session slot table resizing.
  This facility enables NFSD to increase or decrease the number of slots
  per NFS session depending on server memory availability. More session
  slots means greater parallelism.

  Chuck Lever fixed a long-standing latent bug where NFSv4 COMPOUND
  encoding screws up when crossing a page boundary in the encoding
  buffer. This is a zero-day bug, but hitting it is rare and depends on
  the NFS client implementation. The Linux NFS client does not happen to
  trigger this issue.

  A variety of bug fixes and other incremental improvements fill out the
  list of commits in this release. Great thanks to all contributors,
  reviewers, testers, and bug reporters who participated during this
  development cycle"

* tag 'nfsd-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (42 commits)
  sunrpc: Remove gss_{de,en}crypt_xdr_buf deadcode
  sunrpc: Remove gss_generic_token deadcode
  sunrpc: Remove unused xprt_iter_get_xprt
  Revert "SUNRPC: Reduce thread wake-up rate when receiving large RPC messages"
  nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION
  nfsd: handle delegated timestamps in SETATTR
  nfsd: add support for delegated timestamps
  nfsd: rework NFS4_SHARE_WANT_* flag handling
  nfsd: add support for FATTR4_OPEN_ARGUMENTS
  nfsd: prepare delegation code for handing out *_ATTRS_DELEG delegations
  nfsd: rename NFS4_SHARE_WANT_* constants to OPEN4_SHARE_ACCESS_WANT_*
  nfsd: switch to autogenerated definitions for open_delegation_type4
  nfs_common: make include/linux/nfs4.h include generated nfs4_1.h
  nfsd: fix handling of delegated change attr in CB_GETATTR
  SUNRPC: Document validity guarantees of the pointer returned by reserve_space
  NFSD: Insulate nfsd4_encode_fattr4() from page boundaries in the encode buffer
  NFSD: Insulate nfsd4_encode_secinfo() from page boundaries in the encode buffer
  NFSD: Refactor nfsd4_do_encode_secinfo() again
  NFSD: Insulate nfsd4_encode_readlink() from page boundaries in the encode buffer
  NFSD: Insulate nfsd4_encode_read_plus_data() from page boundaries in the encode buffer
  ...
2025-01-27 17:27:24 -08:00
Linus Torvalds
f96a974170 lsm/stable-6.14 PR 20250121
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmeQFBoUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPvcA//XCdwMz0bGtWKv58nuyP8vkQx08n6
 //olz/O8te3uWK5O3kRiarzFLwH8qsHQ6A7GYalwwix34hatR4ndJE0Y/guVRWa1
 +aBmJxJ7Jm/q3fvpAEfqiSgreuE6kBoztlDOWEq+hUQGu4qfnQGm2EnvbvfFrAmN
 VheOfIQSU2KCL/Scc3FGnF6uru4WrqN0JJ9RbvrEpfdQgmcyTGLnQsZLljutWSIq
 kDWkteIr7cj3O9J45zpxZsTftvYSgVn/y1iKeXbHI4DBA1eheK12vsHB9AADKI1J
 GwHxOrnLpZtv+ICUKqcfFTmWTl+NmfJJurAT5KXKdBjL3xM5MoJlBvK1A5qE9CMo
 LaHVG/TZR2MmBaoM3EN+gvWhDgWlvT02Q/0cYaafTlVLMez3HtfctxN6OnCvTXTB
 Y8dqYClhhlBm/mHQwYfMoeKw4MftUpzEqBd1Nj7Qe8dbP0f/62Ca3K2B3D6Rf8QV
 pj3ryMlSWYV9mdTerruLNQexTGoN7l66jPwzdWpTbFeL3WmNtfCako8OZGbXgPIu
 Iahm3P+jnSVx8ZQro2c9zwdKXI5xiI335pCBbDZ8aX+JAsfj0OofHsFx5Q5diber
 M7tAEhxDqRisbpz7Ei+/LOAEGg2Z619XKg8ks4z6Y4P5PF7zEgeWTkZJk2iLbxXe
 6LLOjmF7LLw+G4M=
 =fgyr
 -----END PGP SIGNATURE-----

Merge tag 'lsm-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull lsm updates from Paul Moore:

 - Improved handling of LSM "secctx" strings through lsm_context struct

   The LSM secctx string interface is from an older time when only one
   LSM was supported, migrate over to the lsm_context struct to better
   support the different LSMs we now have and make it easier to support
   new LSMs in the future.

   These changes explain the Rust, VFS, and networking changes in the
   diffstat.

 - Only build lsm_audit.c if CONFIG_SECURITY and CONFIG_AUDIT are
   enabled

   Small tweak to be a bit smarter about when we build the LSM's common
   audit helpers.

 - Check for absurdly large policies from userspace in SafeSetID

   SafeSetID policies rules are fairly small, basically just "UID:UID",
   it easy to impose a limit of KMALLOC_MAX_SIZE on policy writes which
   helps quiet a number of syzbot related issues. While work is being
   done to address the syzbot issues through other mechanisms, this is a
   trivial and relatively safe fix that we can do now.

 - Various minor improvements and cleanups

   A collection of improvements to the kernel selftests, constification
   of some function parameters, removing redundant assignments, and
   local variable renames to improve readability.

* tag 'lsm-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  lockdown: initialize local array before use to quiet static analysis
  safesetid: check size of policy writes
  net: corrections for security_secid_to_secctx returns
  lsm: rename variable to avoid shadowing
  lsm: constify function parameters
  security: remove redundant assignment to return variable
  lsm: Only build lsm_audit.c if CONFIG_SECURITY and CONFIG_AUDIT are set
  selftests: refactor the lsm `flags_overset_lsm_set_self_attr` test
  binder: initialize lsm_context structure
  rust: replace lsm context+len with lsm_context
  lsm: secctx provider check on release
  lsm: lsm_context in security_dentry_init_security
  lsm: use lsm_context in security_inode_getsecctx
  lsm: replace context+len with lsm_context
  lsm: ensure the correct LSM context releaser
2025-01-21 20:03:04 -08:00
Jeff Layton
d3edfd9ed1 nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION
Allow clients to request getting a delegation xor an open stateid if a
delegation isn't available. This allows the client to avoid sending a
final CLOSE for the (useless) open stateid, when it is granted a
delegation.

If this flag is requested by the client and there isn't already a new
open stateid, discard the new open stateid before replying.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21 15:30:01 -05:00
Jeff Layton
7e13f4f8d2 nfsd: handle delegated timestamps in SETATTR
Allow SETATTR to handle delegated timestamps. This patch assumes that
only the delegation holder has the ability to set the timestamps in this
way, so we allow this only if the SETATTR stateid refers to a
*_ATTRS_DELEG delegation.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21 15:30:01 -05:00
Jeff Layton
6ae30d6eb2 nfsd: add support for delegated timestamps
Add support for the delegated timestamps on write delegations. This
allows the server to proxy timestamps from the delegation holder to
other clients that are doing GETATTRs vs. the same inode.

When OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS bit is set in the OPEN
call, set the dl_type to the *_ATTRS_DELEG flavor of delegation.

Add timespec64 fields to nfs4_cb_fattr and decode the timestamps into
those. Vet those timestamps according to the delstid spec and update
the inode attrs if necessary.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21 15:30:01 -05:00
Jeff Layton
cee9b4ef42 nfsd: rework NFS4_SHARE_WANT_* flag handling
The delstid draft adds new NFS4_SHARE_WANT_TYPE_MASK values that don't
fit neatly into the existing WANT_MASK or WHEN_MASK. Add a new
NFS4_SHARE_WANT_MOD_MASK value and redefine NFS4_SHARE_WANT_MASK to
include it.

Also fix the checks in nfsd4_deleg_xgrade_none_ext() to check for the
flags instead of equality, since there may be modifier flags in the
value.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21 15:30:01 -05:00
Jeff Layton
51c0d4f7e3 nfsd: add support for FATTR4_OPEN_ARGUMENTS
Add support for FATTR4_OPEN_ARGUMENTS. This a new mechanism for the
client to discover what OPEN features the server supports.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21 15:30:01 -05:00
Jeff Layton
fbd5573d0d nfsd: prepare delegation code for handing out *_ATTRS_DELEG delegations
Add some preparatory code to various functions that handle delegation
types to allow them to handle the OPEN_DELEGATE_*_ATTRS_DELEG constants.
Add helpers for detecting whether it's a read or write deleg, and
whether the attributes are delegated.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21 15:30:01 -05:00
Jeff Layton
c9c99a33e2 nfsd: rename NFS4_SHARE_WANT_* constants to OPEN4_SHARE_ACCESS_WANT_*
Add the OPEN4_SHARE_ACCESS_WANT constants from the nfs4.1 and delstid
draft into the nfs4_1.x file, and regenerate the headers and source
files. Do a mass renaming of NFS4_SHARE_WANT_* to
OPEN4_SHARE_ACCESS_WANT_* in the nfsd directory.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21 15:30:01 -05:00
Jeff Layton
8dfbea8bde nfsd: switch to autogenerated definitions for open_delegation_type4
Rename the enum with the same name in include/linux/nfs4.h, add the
proper enum to nfs4_1.x and regenerate the headers and source files.  Do
a mass rename of all NFS4_OPEN_DELEGATE_* to OPEN_DELEGATE_* in the nfsd
directory.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21 15:30:01 -05:00
Jeff Layton
8e1d32273a nfs_common: make include/linux/nfs4.h include generated nfs4_1.h
In the long run, the NFS development community intends to autogenerate a
lot of the XDR handling code.  Both the NFS client and server include
"include/linux/nfs4.hi". That file was hand-rolled, and some of the symbols
in it conflict with the autogenerated symbols.

Add a small nfs4_1.x to Documentation that currently just has the
necessary definitions for the delstid draft, and generate the relevant
header and source files. Make include/linux/nfs4.h include the generated
include/linux/sunrpc/xdrgen/nfs4_1.h and remove the conflicting
definitions from it and nfs_xdr.h.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21 15:30:00 -05:00
Jeff Layton
531503054e nfsd: fix handling of delegated change attr in CB_GETATTR
RFC8881, section 10.4.3 has some specific guidance as to how the
delegated change attribute should be handled. We currently don't follow
that guidance properly.

In particular, when the file is modified, the server always reports the
initial change attribute + 1. Section 10.4.3 however indicates that it
should be incremented on every GETATTR request from other clients.

Only request the change attribute until the file has been modified. If
there is an outstanding delegation, then increment the cached change
attribute on every GETATTR.

Fixes: 6487a13b5c ("NFSD: add support for CB_GETATTR callback")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21 15:29:40 -05:00
Linus Torvalds
37c12fcb3c kernel-6.14-rc1.cred
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ4pRuAAKCRCRxhvAZXjc
 okEiAP4wZOkUGX+d3FUXxM1DJfCsBssoYh01S4LE+s+hkq81vgD8D7PRZk7d12Jw
 zaS6/cLt12UDz1v6Ez103S9AQ5E6ywg=
 =Sknj
 -----END PGP SIGNATURE-----

Merge tag 'kernel-6.14-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull cred refcount updates from Christian Brauner:
 "For the v6.13 cycle we switched overlayfs to a variant of
  override_creds() that doesn't take an extra reference. To this end the
  {override,revert}_creds_light() helpers were introduced.

  This generalizes the idea behind {override,revert}_creds_light() to
  the {override,revert}_creds() helpers. Afterwards overriding and
  reverting credentials is reference count free unless the caller
  explicitly takes a reference.

  All callers have been appropriately ported"

* tag 'kernel-6.14-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits)
  cred: fold get_new_cred_many() into get_cred_many()
  cred: remove unused get_new_cred()
  nfsd: avoid pointless cred reference count bump
  cachefiles: avoid pointless cred reference count bump
  dns_resolver: avoid pointless cred reference count bump
  trace: avoid pointless cred reference count bump
  cgroup: avoid pointless cred reference count bump
  acct: avoid pointless reference count bump
  io_uring: avoid pointless cred reference count bump
  smb: avoid pointless cred reference count bump
  cifs: avoid pointless cred reference count bump
  cifs: avoid pointless cred reference count bump
  ovl: avoid pointless cred reference count bump
  open: avoid pointless cred reference count bump
  nfsfh: avoid pointless cred reference count bump
  nfs/nfs4recover: avoid pointless cred reference count bump
  nfs/nfs4idmap: avoid pointless reference count bump
  nfs/localio: avoid pointless cred reference count bumps
  coredump: avoid pointless cred reference count bump
  binfmt_misc: avoid pointless cred reference count bump
  ...
2025-01-20 10:13:06 -08:00
Mike Snitzer
085804110a nfs_common: track all open nfsd_files per LOCALIO nfs_client
This tracking enables __nfsd_file_cache_purge() to call
nfs_localio_invalidate_clients(), upon shutdown or export change, to
nfs_close_local_fh() all open nfsd_files that are still cached by the
LOCALIO nfs clients associated with nfsd_net that is being shutdown.

Now that the client must track all open nfsd_files there was more work
than necessary being done with the global nfs_uuids_lock contended.
This manifested in various RCU issues, e.g.:
 hrtimer: interrupt took 47969440 ns
 rcu: INFO: rcu_sched detected stalls on CPUs/tasks:

Use nfs_uuid->lock to protect all nfs_uuid_t members, instead of
nfs_uuids_lock, once nfs_uuid_is_local() adds the client to
nn->local_clients.

Also add 'local_clients_lock' to 'struct nfsd_net' to protect
nn->local_clients.  And store a pointer to spinlock in the 'list_lock'
member of nfs_uuid_t so nfs_localio_disable_client() can use it to
avoid taking the global nfs_uuids_lock.

In combination, these split out locks eliminate the use of the single
nfslocalio.c global nfs_uuids_lock in the IO paths (open and close).

Also refactored associated fs/nfs_common/nfslocalio.c methods' locking
to reduce work performed with spinlocks held in general.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-01-14 17:05:10 -05:00
Mike Snitzer
e1943f4eb8 nfs_common: rename nfslocalio nfs_uuid_lock to nfs_uuids_lock
This global spinlock protects all nfs_uuid_t relative to the global
nfs_uuids list.  A later commit will split this global spinlock so
prepare by renaming this lock to reflect its intended narrow scope.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-01-14 17:05:10 -05:00