Commit Graph

5003 Commits

Author SHA1 Message Date
Linus Torvalds
53e760d894 nfsd-6.17 fixes:
- A correctness fix for delegated timestamps
 - Address an NFSD shutdown hang when LOCALIO is in use
 - Prevent a remotely exploitable crasher when TLS is in use
 
 These arrived too late to be included in the initial nfsd-6.17
 pull request.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmiZ+UMACgkQM2qzM29m
 f5cevBAAgvdeL/4VUue/p7vZEHBtHr3HlaoPpGi/mhFh/f9rrKKs/osSP45uV/we
 tDq8k8f37S/PPAKu5Ts0BmJUVeI16ZvqYw1tXcq6Xifl+qYtowP9re/Xf+6Uln/5
 ebVgqQDO8Zl6rEIZGen/iSp4oq/yk7g7n8XAlL2DzoMcfdju8q5mtyaqKiJtHhor
 lE69sI73v0lj1HLpy/NHdSOQQVAUmhBJQYSpDRGh6jlkWhm9T/U5CP79TBAJVLlx
 Jglhs7GQe0dlP6lLHD0tc7dZ/3LImICQBw2P7PdYaM3Dc1Y2y5uzSfKHnxZ4EHBr
 +uDOD8WFxzt/9WzIoXSCDeMe7KvA8lUnqzEV06Ov5H8h8fHQ1ClR7hhEom+32DKo
 7IC61/MNP+TcWrar+ObucjtuBsuFC65IkPdRAQHUyh0U9rOjFV0Riye9RCMRHZFy
 JPOlfPaUK8wP9AR4O3o6+Aeq4nx49RKd9su4YM/sAl+NdmCZjUnXbryvqymHp99d
 Lmxq9VIIoNyhX0tEbwNx8aop97yOb+76yFGFzLCPFWwV46x1Q49WsTL+fY9xN2uj
 6DAK6wJOMfQGmPFxHH1ttzryjBvCXcVS4SEgzR3UK6KMDYHjI6WE+y1PO/AV9Wae
 RJTHooz4Hsw3h80/yoleZ5YixEiXiQSUDuu7sUVAvksRpnbhFxs=
 =qWSE
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - A correctness fix for delegated timestamps

 - Address an NFSD shutdown hang when LOCALIO is in use

 - Prevent a remotely exploitable crasher when TLS is in use

* tag 'nfsd-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  sunrpc: fix handling of server side tls alerts
  nfsd: avoid ref leak in nfsd_open_local_fh()
  nfsd: don't set the ctime on delegated atime updates
2025-08-11 07:38:55 -07:00
Linus Torvalds
ccc1ead23c NFS client updates for Linux 6.17
Highlights include:
 
 Stable fixes:
 - NFS don't inherit NFS filesystem capabilities when crossing from one
   filesystem to another.
 
 Bugfixes:
 - NFS wakeup of __nfs_lookup_revalidate() needs memory barriers.
 - NFS improve bounds checking in nfs_fh_to_dentry().
 - NFS Fix allocation errors when writing to a NFS file backed loopback
   device.
 - NFSv4: More listxattr fixes
 - SUNRPC: fix client handling of TLS alerts.
 - pNFS block/scsi layout fix for an uninitialised pointer dereference.
 - pNFS block/scsi layout fixes for the extent encoding, stripe mapping,
   and disk offset overflows.
 - pNFS layoutcommit work around for RPC size limitations.
 - pNFS/flexfiles avoid looping when handling fatal errors after layoutget.
 - localio: fix various race conditions.
 
 Features and cleanups:
 - Add NFSv4 support for retrieving the btime.
 - NFS: Allow folio migration for the case of mode == MIGRATE_SYNC.
 - NFS: Support using a kernel keyring to store TLS certificates.
 - NFSv4: Speed up delegation lookup using a hash table.
 - Assorted cleanups to remove unused variables and struct fields.
 - Assorted new tracepoints to improve debugging.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmiWBC8ACgkQZwvnipYK
 APKTfRAAn3ETKN15+yR6wYr/wiibaL6sRzQVo8OzFSI9hEVxljX6kK2HEHXIV93T
 F8bjUMB24KK+Eim8zIeLf4Ke7ldbRqtbiYLJox/I12TtQ6yaiFF+xDm7Fyc7UwcT
 ZUl1UnGeNY30RfQ1n8O4O/suBOsTJy1rpWBWynGeQZLiNHFVDoxH4OgCXGZ5579p
 3GACtToDDH9lgCBbKLM3J0nrcW5Or6BidFxT+zN/FXqeroepuvEcloiwJY7N4f/o
 DW436v7ep92WlaJfuypeMmdusx6+vVaYJEKw+B+UjS3tRjbDmhj2FL3su4dQVCqU
 JVW7TwGFL2zwfjTZjfp43ACN16goqRhX7DnTQkgD1mDnnhENqOsa0rqyIS3Wla4d
 W9phfGmOo2FwuVOUXH2L4k7cfPIsktsZ0s/xg+5UfcsoG2yxUxnY9HaWQGFw6fnN
 Fr9B7gUmaO6o1qpZ3emjecoOdQM9IqxMo39P9/72J9pWFstO9Js/kmGXcfsDo2IE
 z2ZYd+roj2ylcLD9mCJ94tjNsAx0ytvUl1fxNFTRLToQ3Nti+jNy22i4jrCgWfyW
 5jwCjjO87CJlJC0kixEPtQXAB6HQSraZw3Qmok1Ywez3fp4+j+QDFYP7OPJPQ0LY
 C3R1uzvUQKqsbsE8cwYQBrG0VNzZrET6x1dxunVtcYGOkU8s4YA=
 =HCFp
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable fixes:
   - don't inherit NFS filesystem capabilities when crossing from one
     filesystem to another

  Bugfixes:
   - NFS wakeup of __nfs_lookup_revalidate() needs memory barriers
   - NFS improve bounds checking in nfs_fh_to_dentry()
   - NFS Fix allocation errors when writing to a NFS file backed
     loopback device
   - NFSv4: More listxattr fixes
   - SUNRPC: fix client handling of TLS alerts
   - pNFS block/scsi layout fix for an uninitialised pointer
     dereference
   - pNFS block/scsi layout fixes for the extent encoding, stripe
     mapping, and disk offset overflows
   - pNFS layoutcommit work around for RPC size limitations
   - pNFS/flexfiles avoid looping when handling fatal errors after
     layoutget
   - localio: fix various race conditions

  Features and cleanups:
   - Add NFSv4 support for retrieving the btime
   - NFS: Allow folio migration for the case of mode == MIGRATE_SYNC
   - NFS: Support using a kernel keyring to store TLS certificates
   - NFSv4: Speed up delegation lookup using a hash table
   - Assorted cleanups to remove unused variables and struct fields
   - Assorted new tracepoints to improve debugging"

* tag 'nfs-for-6.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (44 commits)
  NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file
  NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh()
  NFS/localio: nfs_close_local_fh() fix check for file closed
  NFSv4: Remove duplicate lookups, capability probes and fsinfo calls
  NFS: Fix the setting of capabilities when automounting a new filesystem
  sunrpc: fix client side handling of tls alerts
  nfs/localio: use read_seqbegin() rather than read_seqbegin_or_lock()
  NFS: Fixup allocation flags for nfsiod's __GFP_NORETRY
  NFSv4.2: another fix for listxattr
  NFS: Fix filehandle bounds checking in nfs_fh_to_dentry()
  SUNRPC: Silence warnings about parameters not being described
  NFS: Clean up pnfs_put_layout_hdr()/pnfs_destroy_layout_final()
  NFS: Fix wakeup of __nfs_lookup_revalidate() in unblock_revalidate()
  NFS: use a hash table for delegation lookup
  NFS: track active delegations per-server
  NFS: move the delegation_watermark module parameter
  NFS: cleanup nfs_inode_reclaim_delegation
  NFS: cleanup error handling in nfs4_server_common_setup
  pNFS/flexfiles: don't attempt pnfs on fatal DS errors
  NFS: drop __exit from nfs_exit_keyring
  ...
2025-08-09 07:20:44 +03:00
Olga Kornievskaia
bee47cb026 sunrpc: fix handling of server side tls alerts
Scott Mayhew discovered a security exploit in NFS over TLS in
tls_alert_recv() due to its assumption it can read data from
the msg iterator's kvec..

kTLS implementation splits TLS non-data record payload between
the control message buffer (which includes the type such as TLS
aler or TLS cipher change) and the rest of the payload (say TLS
alert's level/description) which goes into the msg payload buffer.

This patch proposes to rework how control messages are setup and
used by sock_recvmsg().

If no control message structure is setup, kTLS layer will read and
process TLS data record types. As soon as it encounters a TLS control
message, it would return an error. At that point, NFS can setup a
kvec backed msg buffer and read in the control message such as a
TLS alert. Msg iterator can advance the kvec pointer as a part of
the copy process thus we need to revert the iterator before calling
into the tls_alert_recv.

Reported-by: Scott Mayhew <smayhew@redhat.com>
Fixes: 5e052dda12 ("SUNRPC: Recognize control messages in server-side TCP socket code")
Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Cc: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-08-06 09:57:50 -04:00
Olga Kornievskaia
cc5d59081f sunrpc: fix client side handling of tls alerts
A security exploit was discovered in NFS over TLS in tls_alert_recv
due to its assumption that there is valid data in the msghdr's
iterator's kvec.

Instead, this patch proposes the rework how control messages are
setup and used by sock_recvmsg().

If no control message structure is setup, kTLS layer will read and
process TLS data record types. As soon as it encounters a TLS control
message, it would return an error. At that point, NFS can setup a kvec
backed control buffer and read in the control message such as a TLS
alert. Scott found that a msg iterator can advance the kvec pointer
as a part of the copy process thus we need to revert the iterator
before calling into the tls_alert_recv.

Fixes: dea034b963 ("SUNRPC: Capture CMSG metadata on client-side receive")
Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Suggested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Link: https://lore.kernel.org/r/20250731180058.4669-3-okorniev@redhat.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-08-03 12:45:47 -07:00
Linus Torvalds
ddf52f12ef Massage rpc_pipefs to use saner primitives and clean up the
APIs provided to the rest of the kernel.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaIRDbQAKCRBZ7Krx/gZQ
 63n6APwNnJXwgtSDi9N0FfHOlYqYSCaCjezVLbq+GR8K+r4wowD/TX/A4Qbyjjic
 /VG8VbYe6fRaD53vp1giGI/dJiTI2Qg=
 =Ta4H
 -----END PGP SIGNATURE-----

Merge tag 'pull-rpc_pipefs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull rpc_pipefs updates from Al Viro:
 "Massage rpc_pipefs to use saner primitives and clean up the APIs
  provided to the rest of the kernel"

* tag 'pull-rpc_pipefs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  rpc_create_client_dir(): return 0 or -E...
  rpc_create_client_dir(): don't bother with rpc_populate()
  rpc_new_dir(): the last argument is always NULL
  rpc_pipe: expand the calls of rpc_mkdir_populate()
  rpc_gssd_dummy_populate(): don't bother with rpc_populate()
  rpc_mkpipe_dentry(): switch to simple_start_creating()
  rpc_pipe: saner primitive for creating regular files
  rpc_pipe: saner primitive for creating subdirectories
  rpc_pipe: don't overdo directory locking
  rpc_mkpipe_dentry(): saner calling conventions
  rpc_unlink(): saner calling conventions
  rpc_populate(): lift cleanup into callers
  rpc_unlink(): use simple_recursive_removal()
  rpc_{rmdir_,}depopulate(): use simple_recursive_removal() instead
  rpc_pipe: clean failure exits in fill_super
  new helper: simple_start_creating()
2025-07-28 09:56:09 -07:00
Linus Torvalds
11fe69fbd5 Current exclusion rules for ->d_flags stores are rather unpleasant.
The basic rules are simple:
 	* stores to dentry->d_flags are OK under dentry->d_lock.
 	* stores to dentry->d_flags are OK in the dentry constructor, before
 becomes potentially visible to other threads.
 Unfortunately, there's a couple of exceptions to that, and that's where the
 headache comes from.
 
 	Main PITA comes from d_set_d_op(); that primitive sets ->d_op
 of dentry and adjusts the flags that correspond to presence of individual
 methods.  It's very easy to misuse; existing uses _are_ safe, but proof
 of correctness is brittle.
 
 	Use in __d_alloc() is safe (we are within a constructor), but we
 might as well precalculate the initial value of ->d_flags when we set
 the default ->d_op for given superblock and set ->d_flags directly
 instead of messing with that helper.
 
 	The reasons why other uses are safe are bloody convoluted; I'm not going
 to reproduce it here.  See https://lore.kernel.org/all/20250224010624.GT1977892@ZenIV/
 for gory details, if you care.  The critical part is using d_set_d_op() only
 just prior to d_splice_alias(), which makes a combination of d_splice_alias()
 with setting ->d_op, etc. a natural replacement primitive.  Better yet, if
 we go that way, it's easy to take setting ->d_op and modifying ->d_flags
 under ->d_lock, which eliminates the headache as far as ->d_flags exclusion
 rules are concerned.  Other exceptions are minor and easy to deal with.
 
 	What this series does:
 * d_set_d_op() is no longer available; new primitive (d_splice_alias_ops())
 is provided, equivalent to combination of d_set_d_op() and d_splice_alias().
 * new field of struct super_block - ->s_d_flags.  Default value of ->d_flags
 to be used when allocating dentries on this filesystem.
 * new primitive for setting ->s_d_op: set_default_d_op().  Replaces stores
 to ->s_d_op at mount time.  All in-tree filesystems converted; out-of-tree
 ones will get caught by compiler (->s_d_op is renamed, so stores to it will
 be caught).  ->s_d_flags is set by the same primitive to match the ->s_d_op.
 * a lot of filesystems had ->s_d_op->d_delete equal to always_delete_dentry;
 that is equivalent to setting DCACHE_DONTCACHE in ->d_flags, so such filesystems
 can bloody well set that bit in ->s_d_flags and drop ->d_delete() from
 dentry_operations.  In quite a few cases that results in empty dentry_operations,
 which means that we can get rid of those.
 * kill simple_dentry_operations - not needed anymore.
 * massage d_alloc_parallel() to get rid of the other exception wrt ->d_flags
 stores - we can set DCACHE_PAR_LOOKUP as soon as we allocate the new dentry;
 no need to delay that until we commit to using the sucker.
 
 As the result, ->d_flags stores are all either under ->d_lock or done before
 the dentry becomes visible in any shared data structures.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaIQ/tQAKCRBZ7Krx/gZQ
 66AhAQDgQ+S224x5YevNXc9mDoGUBMF4OG0n0fIla9rfdL4I6wEAqpOWMNDcVPCZ
 GwYOvJ9YuqNdz+MyprAI18Yza4GOmgs=
 =rTYB
 -----END PGP SIGNATURE-----

Merge tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull dentry d_flags updates from Al Viro:
 "The current exclusion rules for dentry->d_flags stores are rather
  unpleasant. The basic rules are simple:

   - stores to dentry->d_flags are OK under dentry->d_lock

   - stores to dentry->d_flags are OK in the dentry constructor, before
     becomes potentially visible to other threads

  Unfortunately, there's a couple of exceptions to that, and that's
  where the headache comes from.

  The main PITA comes from d_set_d_op(); that primitive sets ->d_op of
  dentry and adjusts the flags that correspond to presence of individual
  methods. It's very easy to misuse; existing uses _are_ safe, but proof
  of correctness is brittle.

  Use in __d_alloc() is safe (we are within a constructor), but we might
  as well precalculate the initial value of 'd_flags' when we set the
  default ->d_op for given superblock and set 'd_flags' directly instead
  of messing with that helper.

  The reasons why other uses are safe are bloody convoluted; I'm not
  going to reproduce it here. See [1] for gory details, if you care. The
  critical part is using d_set_d_op() only just prior to
  d_splice_alias(), which makes a combination of d_splice_alias() with
  setting ->d_op, etc a natural replacement primitive.

  Better yet, if we go that way, it's easy to take setting ->d_op and
  modifying 'd_flags' under ->d_lock, which eliminates the headache as
  far as 'd_flags' exclusion rules are concerned. Other exceptions are
  minor and easy to deal with.

  What this series does:

   - d_set_d_op() is no longer available; instead a new primitive
     (d_splice_alias_ops()) is provided, equivalent to combination of
     d_set_d_op() and d_splice_alias().

   - new field of struct super_block - 's_d_flags'. This sets the
     default value of 'd_flags' to be used when allocating dentries on
     this filesystem.

   - new primitive for setting 's_d_op': set_default_d_op(). This
     replaces stores to 's_d_op' at mount time.

     All in-tree filesystems converted; out-of-tree ones will get caught
     by the compiler ('s_d_op' is renamed, so stores to it will be
     caught). 's_d_flags' is set by the same primitive to match the
     's_d_op'.

   - a lot of filesystems had sb->s_d_op->d_delete equal to
     always_delete_dentry; that is equivalent to setting
     DCACHE_DONTCACHE in 'd_flags', so such filesystems can bloody well
     set that bit in 's_d_flags' and drop 'd_delete()' from
     dentry_operations.

     In quite a few cases that results in empty dentry_operations, which
     means that we can get rid of those.

   - kill simple_dentry_operations - not needed anymore

   - massage d_alloc_parallel() to get rid of the other exception wrt
     'd_flags' stores - we can set DCACHE_PAR_LOOKUP as soon as we
     allocate the new dentry; no need to delay that until we commit to
     using the sucker.

  As the result, 'd_flags' stores are all either under ->d_lock or done
  before the dentry becomes visible in any shared data structures"

Link: https://lore.kernel.org/all/20250224010624.GT1977892@ZenIV/ [1]

* tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (21 commits)
  configfs: use DCACHE_DONTCACHE
  debugfs: use DCACHE_DONTCACHE
  efivarfs: use DCACHE_DONTCACHE instead of always_delete_dentry()
  9p: don't bother with always_delete_dentry
  ramfs, hugetlbfs, mqueue: set DCACHE_DONTCACHE
  kill simple_dentry_operations
  devpts, sunrpc, hostfs: don't bother with ->d_op
  shmem: no dentry retention past the refcount reaching zero
  d_alloc_parallel(): set DCACHE_PAR_LOOKUP earlier
  make d_set_d_op() static
  simple_lookup(): just set DCACHE_DONTCACHE
  tracefs: Add d_delete to remove negative dentries
  set_default_d_op(): calculate the matching value for ->d_flags
  correct the set of flags forbidden at d_set_d_op() time
  split d_flags calculation out of d_set_d_op()
  new helper: set_default_d_op()
  fuse: no need for special dentry_operations for root dentry
  switch procfs from d_set_d_op() to d_splice_alias_ops()
  new helper: d_splice_alias_ops()
  procfs: kill ->proc_dops
  ...
2025-07-28 09:17:57 -07:00
Trond Myklebust
f66e6bffc5 SUNRPC: Silence warnings about parameters not being described
Warning: net/sunrpc/auth_gss/gss_krb5_crypto.c:902 function parameter
'len' not described in 'krb5_etm_decrypt'
Warning: net/sunrpc/auth_gss/gss_krb5_crypto.c:902 function parameter
'buf' not described in 'krb5_etm_decrypt'

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-22 08:10:41 -04:00
Dr. David Alan Gilbert
48693d119b SUNRPC: Remove unused xdr functions
Remove a bunch of unused xdr_*decode* functions:
  The last use of xdr_decode_netobj() was removed in 2021 by:
commit 7cf96b6d01 ("lockd: Update the NLMv4 SHARE arguments decoder to
use struct xdr_stream")
  The last use of xdr_decode_string_inplace() was removed in 2021 by:
commit 3049e974a7 ("lockd: Update the NLMv4 FREE_ALL arguments decoder
to use struct xdr_stream")
  The last use of xdr_stream_decode_opaque() was removed in 2024 by:
commit fed8a17c61 ("xdrgen: typedefs should use the built-in string and
opaque functions")

  The functions xdr_stream_decode_string() and
xdr_stream_decode_opaque_dup() were both added in 2018 by the
commit 0e779aa703 ("SUNRPC: Add helpers for decoding opaque and string
types")
but never used.

Remove them.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://lore.kernel.org/r/20250712233006.403226-1-linux@treblig.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14 15:20:28 -07:00
Jeff Layton
24569f0249 sunrpc: make svc_tcp_sendmsg() take a signed sentp pointer
The return value of sock_sendmsg() is signed, and svc_tcp_sendto() wants
a signed value to return.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14 12:46:49 -04:00
Jeff Layton
0f2b8ee630 sunrpc: return better error in svcauth_gss_accept() on alloc failure
This ends up returning AUTH_BADCRED when memory allocation fails today.
Fix it to return AUTH_FAILED, which better indicates a failure on the
server.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14 12:46:48 -04:00
Jeff Layton
c8af9d3d4b sunrpc: reset rq_accept_statp when starting a new RPC
rq_accept_statp should point to the location of the accept_status in the
reply. This field is not reset between RPCs so if svc_authenticate or
pg_authenticate return SVC_DENIED without setting the pointer, it could
result in the status being written to the wrong place.

This pointer starts its lifetime as NULL. Reset it on every iteration
so we get consistent behavior if this happens.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14 12:46:48 -04:00
Jeff Layton
6f0e26243b sunrpc: remove SVC_SYSERR
Nothing returns this error code.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14 12:46:48 -04:00
Jeff Layton
d49afc90a3 sunrpc: fix handling of unknown auth status codes
In the case of an unknown error code from svc_authenticate or
pg_authenticate, return AUTH_ERROR with a status of AUTH_FAILED. Also
add the other auth_stat value from RFC 5531, and document all the status
codes.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14 12:46:47 -04:00
Jeff Layton
f26c930530 sunrpc: new tracepoints around svc thread wakeups
Convert the svc_wake_up tracepoint into svc_pool_thread_event class.
Have it also record the pool id, and add new tracepoints for when the
thread is already running and for when there are no idle threads.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14 12:46:38 -04:00
Christoph Hellwig
1aa3f767e0 sunrpc: unexport csum_partial_copy_to_xdr
csum_partial_copy_to_xdr is only used inside the sunrpc module, so
remove the export.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14 12:46:38 -04:00
Christoph Hellwig
37149988ea sunrpc: simplify xdr_partial_copy_from_skb
csum_partial_copy_to_xdr can handle a checksumming and non-checksumming
case and implements this using a callback, which leads to a lot of
boilerplate code and indirect calls in the fast path.

Switch to storing a need_checksum flag in struct xdr_skb_reader instead
to remove the indirect call and simplify the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14 12:46:37 -04:00
Christoph Hellwig
8d43417e93 sunrpc: simplify xdr_init_encode_pages
The rqst argument to xdr_init_encode_pages is set to NULL by all callers,
and pages is always set to buf->pages.  Remove the two arguments and
hardcode the assignments.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14 12:46:37 -04:00
Al Viro
350db61fbe rpc_create_client_dir(): return 0 or -E...
Callers couldn't care less which dentry did we get - anything
valid is treated as success.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-07-02 22:44:55 -04:00
Al Viro
3ee735ef5a rpc_create_client_dir(): don't bother with rpc_populate()
not for a single file...

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-07-02 22:44:55 -04:00
Al Viro
db83fa912e rpc_new_dir(): the last argument is always NULL
All callers except the one in rpc_populate() pass explicit NULL
there; rpc_populate() passes its last argument ('private') instead,
but in the only call of rpc_populate() that creates any subdirectories
(when creating fixed subdirectories of root) private itself is NULL.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-07-02 22:44:55 -04:00
Al Viro
805060a69c rpc_pipe: expand the calls of rpc_mkdir_populate()
... and get rid of convoluted callbacks.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-07-02 22:44:55 -04:00
Al Viro
065e88fa33 rpc_gssd_dummy_populate(): don't bother with rpc_populate()
Just have it create gssd (in root), clntXX in gssd, then info and gssd in clntXX
- all with explicit rpc_new_dir()/rpc_new_file()/rpc_mkpipe_dentry().

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-07-02 22:44:55 -04:00
Al Viro
a117bf4caa rpc_mkpipe_dentry(): switch to simple_start_creating()
... and make sure we set the fs-private part of inode up before
attaching it to dentry.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-07-02 22:44:55 -04:00
Al Viro
5c1da75895 rpc_pipe: saner primitive for creating regular files
rpc_new_file(); similar to rpc_new_dir(), except that here we pass
file_operations as well.  Callers don't care about dentry, just
return 0 or -E...

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-07-02 22:44:55 -04:00
Al Viro
fc1abdca51 rpc_pipe: saner primitive for creating subdirectories
All users of __rpc_mkdir() have the same form - start_creating(),
followed, in case of success, by __rpc_mkdir() and unlocking parent.

Combine that into a single helper, expanding __rpc_mkdir() into it,
along with the call of __rpc_create_common() in it.

Don't mess with d_drop() + d_add() - just d_instantiate() and be
done with that.  The reason __rpc_create_common() goes for that
dance is that dentry it gets might or might not be hashed; here
we know it's hashed.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-07-02 22:44:55 -04:00
Al Viro
41a6b9e52b rpc_pipe: don't overdo directory locking
Don't try to hold directories locked more than VFS requires;
lock just before getting a child to be made positive (using
simple_start_creating()) and unlock as soon as the child is
created.  There's no benefit in keeping the parent locked
while populating the child - it won't stop dcache lookups anyway.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-07-02 22:44:55 -04:00
Al Viro
19a6314a99 rpc_mkpipe_dentry(): saner calling conventions
Instead of returning a dentry or ERR_PTR(-E...), return 0 and store
dentry into pipe->dentry on success and return -E... on failure.

Callers are happier that way...

NOTE: dummy rpc_pipe is getting ->dentry set; we never access that,
since we
	1) never call rpc_unlink() for it (dentry is taken out by
->kill_sb())
	2) never call rpc_queue_upcall() for it (writing to that
sucker fails; no downcalls are ever submitted, so no replies are
going to arrive)
IOW, having that ->dentry set (and left dangling) is harmless,
if ugly; cleaner solution will take more massage.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-07-02 22:44:55 -04:00
Al Viro
bccea4ed06 rpc_unlink(): saner calling conventions
1) pass it pipe instead of pipe->dentry
2) zero pipe->dentry afterwards
3) it always returns 0; why bother?

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-07-02 22:44:55 -04:00
Al Viro
8be22c4964 rpc_populate(): lift cleanup into callers
rpc_populate() is called either from fill_super (where we don't
need to remove any files on failure - rpc_kill_sb() will take
them all out anyway) or from rpc_mkdir_populate(), where we need
to remove the directory we'd been trying to populate along with
whatever we'd put into it before we failed.  Simpler to combine
that into simple_recursive_removal() there.

Note that rpc_pipe is overlocking directories quite a bit -
locked parent is no obstacle to finding a child in dcache, so
keeping it locked won't prevent userland observing a partially
built subtree.

All we need is to follow minimal VFS requirements; it's not
as if clients used directory locking for exclusion - tree
changes are serialized, but that's done on ->pipefs_sb_lock.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-07-02 22:44:55 -04:00
Al Viro
3829b30e77 rpc_unlink(): use simple_recursive_removal()
note that the callback of simple_recursive_removal() is called with
the parent locked; the victim isn't locked by the caller.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-07-02 22:44:55 -04:00
Al Viro
8e7490c40e rpc_{rmdir_,}depopulate(): use simple_recursive_removal() instead
no need to give an exact list of objects to be remove when it's
simply every child of the victim directory.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-07-02 22:44:54 -04:00
Al Viro
4b2f61af8a rpc_pipe: clean failure exits in fill_super
->kill_sb() will be called immediately after a failure return
anyway, so we don't need to bother with notifier chain and
rpc_gssd_dummy_depopulate().  What's more, rpc_gssd_dummy_populate()
failure exits do not need to bother with __rpc_depopulate() -
anything added to the tree will be taken out by ->kill_sb().

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-07-02 22:44:54 -04:00
Nikhil Jha
9c19b3315c sunrpc: fix loop in gss seqno cache
There was a silly bug in the initial implementation where a loop
variable was not incremented. This commit increments the loop variable.

This bug is somewhat tricky to catch because it can only happen on loops
of two or more. If it is hit, it locks up a kernel thread in an infinite
loop.

Signed-off-by: Nikhil Jha <njha@janestreet.com>
Tested-by: Nikhil Jha <njha@janestreet.com>
Fixes: 08d6ee6d8a ("sunrpc: implement rfc2203 rpcsec_gss seqnum cache")
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-06-23 11:01:15 -04:00
Linus Torvalds
739a6c93cc nfsd-6.16 fixes:
- Two fixes for commits in the nfsd-6.16 merge
 - One fix for the recently-added NFSD netlink facility
 - One fix for a remote SunRPC crasher
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmhWykQACgkQM2qzM29m
 f5cfIA/+MtqpF8iHCNWBk2mmgQHtfS78qhQexEsQ69F1k90+bzCJ9Mu7AZT67xRZ
 2E6Fm3gyJuFPaJ9YOGewyD6OV/eE0SioPQteZIIwUVclKcoyVtqJpc57A/aQfMJm
 Te1bPWYrc5jfxoF/QzkOjzd4KWXOej2nIqsybFIDK5uTe7gn9QpPiFL8ePRuf7bJ
 5oWBTZPpmuWpDcebQrz56OT0bvvrCNH9EV9TZNYwVa/B7yl3UNHwM2YB5O9m2Rl7
 GUeWytc2Y7WgdDJe9poPwNrYyXFb8JX4pCbS54q5wCRLYLDgo8PsgsFCAFfgjx0l
 ArATKDhzfYK0azASDtJma2eCJkrm4/nXRNFON7xucN79pYRh9GChcmhhfubb8Yo3
 HopUaCSLOgy7PNsbbz3lNREn5uBpomeK1Mb88/UkJ/d8fmKGIO1ITTlarazYv2w5
 GgmzZSBS43VdUmipqNyddIMg2uy7uaeAQZpr4jHVFVctPypjXCn/zIrhG2mmjRhW
 M7PCJqtFj0a7DM48LA6r9K8N4Yi7EoSQKNAKfwduE++XXYWgBk9blaF/7om/fIRJ
 VJm/RfiuFuyGaxCLd7WpPpASeS/nq410BMV5/kxol+UgiCxk8xjsvU7y8OWLOl7S
 rUend5Yei7/h0NGoSmQPnBoYnb1YuTeMMzjfaRWSsV5gbvVDejQ=
 =1C8u
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Two fixes for commits in the nfsd-6.16 merge

 - One fix for the recently-added NFSD netlink facility

 - One fix for a remote SunRPC crasher

* tag 'nfsd-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  sunrpc: handle SVC_GARBAGE during svc auth processing as auth error
  nfsd: use threads array as-is in netlink interface
  SUNRPC: Cleanup/fix initial rq_pages allocation
  NFSD: Avoid corruption of a referring call list
2025-06-21 09:20:15 -07:00
Jeff Layton
94d10a4dba sunrpc: handle SVC_GARBAGE during svc auth processing as auth error
tianshuo han reported a remotely-triggerable crash if the client sends a
kernel RPC server a specially crafted packet. If decoding the RPC reply
fails in such a way that SVC_GARBAGE is returned without setting the
rq_accept_statp pointer, then that pointer can be dereferenced and a
value stored there.

If it's the first time the thread has processed an RPC, then that
pointer will be set to NULL and the kernel will crash. In other cases,
it could create a memory scribble.

The server sunrpc code treats a SVC_GARBAGE return from svc_authenticate
or pg_authenticate as if it should send a GARBAGE_ARGS reply. RFC 5531
says that if authentication fails that the RPC should be rejected
instead with a status of AUTH_ERR.

Handle a SVC_GARBAGE return as an AUTH_ERROR, with a reason of
AUTH_BADCRED instead of returning GARBAGE_ARGS in that case. This
sidesteps the whole problem of touching the rpc_accept_statp pointer in
this situation and avoids the crash.

Cc: stable@kernel.org
Fixes: 29cd2927fb ("SUNRPC: Fix encoding of accepted but unsuccessful RPC replies")
Reported-by: tianshuo han <hantianshuo233@gmail.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-06-19 09:35:45 -04:00
Benjamin Coddington
8b3ac9faba SUNRPC: Cleanup/fix initial rq_pages allocation
While investigating some reports of memory-constrained NUMA machines
failing to mount v3 and v4.0 nfs mounts, we found that svc_init_buffer()
was not attempting to retry allocations from the bulk page allocator.
Typically, this results in a single page allocation being returned and
the mount attempt fails with -ENOMEM.  A retry would have allowed the mount
to succeed.

Additionally, it seems that the bulk allocation in svc_init_buffer() is
redundant because svc_alloc_arg() will perform the required allocation and
does the correct thing to retry the allocations.

The call to allocate memory in svc_alloc_arg() drops the preferred node
argument, but I expect we'll still allocate on the preferred node because
the allocation call happens within the svc thread context, which chooses
the node with memory closest to the current thread's execution.

This patch cleans out the bulk allocation in svc_init_buffer() to allow
svc_alloc_arg() to handle the allocation/retry logic for rq_pages.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Fixes: ed603bcf4f ("sunrpc: Replace the rq_pages array with dynamically-allocated memory")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-06-12 20:37:33 -04:00
Al Viro
fe3c5120d6 devpts, sunrpc, hostfs: don't bother with ->d_op
Default ->d_op being simple_dentry_operations is equivalent to leaving
it NULL and putting DCACHE_DONTCACHE into ->s_d_flags.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-11 13:40:04 -04:00
Al Viro
05fb0e6664 new helper: set_default_d_op()
... to be used instead of manually assigning to ->s_d_op.
All in-tree filesystem converted (and field itself is renamed,
so any out-of-tree ones in need of conversion will be caught
by compiler).

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-10 22:21:16 -04:00
Ingo Molnar
41cb08555c treewide, timers: Rename from_timer() to timer_container_of()
Move this API to the canonical timer_*() namespace.

[ tglx: Redone against pre rc1 ]

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
2025-06-08 09:07:37 +02:00
Linus Torvalds
5abc7438f1 NFS Clent Updates for Linux 6.16
New Features:
   * Implement the Sunrpc rfc2203 rpcsec_gss sequence number cache
   * Add support for FALLOC_FL_ZERO_RANGE on NFS v4.2
   * Add a localio sysfs attribute
 
 Stable Fixes:
   * Fix double-unlock bug in nfs_return_empty_folio()
   * Don't check for OPEN feature support in v4.1
   * Always probe for LOCALIO support asynchronously
   * Prevent hang on NFS mounts with xprtsec=[m]tls
 
 Other Bugfixes:
   * xattr handlers should check for absent nfs filehandles
   * Fix setattr caching of TIME_[MODIFY|ACCESS]_SET when timestamps are delegated
   * Fix listxattr to return selinux security labels
   * Connect to NFSv3 DS using TLS if MDS connection uses TLS
   * Clear SB_RDONLY before getting a superblock, and ignore when remounting
   * Fix incorrect handling of NFS error codes in nfs4_do_mkdir()
   * Various nfs_localio fixes from Neil Brown that include fixing an
       rcu compilation error found by older gcc versions.
   * Update stats on flexfiles pNFS DSes when receiving NFS4ERR_DELAY
 
 Cleanups:
   * Add a refcount tracker for struct net in the nfs_client
   * Allow FREE_STATEID to clean up delegations
   * Always set NLINK even if the server doesn't support it
   * Cleanups to the NFS folio writeback code
   * Remove dead code from xs_tcp_tls_setup_socket()
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmg/YkAACgkQ18tUv7Cl
 QOuGpQ/+OuG/xkVX6j7FerUcdbVhcZ+5jDUKC0cNe6EeFeFRjgqsdFB0uqH+AgJh
 DlxEJuXTMq+9mcptl0rjrOn0tj7dlTpgZowp3kWdK3bX1zSI2jBEJjnz3xVzjBQx
 3lbmF/UAIaHv5bPVc9aF8mioaj5DSRKWTBLTg7iOM1ol1DqgHK/M0q2D7d2n1yB4
 WYGI7LlAWSBGV4PvEkhHW6PwVPDSqECPBvIxd1obq8TSNl+YZlmVxCoJ99+zVqWf
 dvaDOwfs5x+YEQH/+N/XWdc38QiCGfu7H79qGHShWB8t/KT4axxmjVs2fT7xtUsv
 yN3fb77rlFOCJaPLRF549/4EJqHYMWmFDKIMUZ7YC1vEBCG4B1kQUqarA5eCbsAi
 s/rxBs1VNKeev/RecDaViAeH3XZoVU1rNyIBJjOuWgNlC5wnbF+An3zE0m8MAXxO
 Vh7wQSH3GZEY+VCR6ljwLhIv6+tvSVQxEZKUUjfVQXp5UuNwN3wKa+sW6li+FBl6
 uV6lJcmdUffrurNhvSghIiSQGDkerHUVhSltgtj5FnmRp/AM95Z850t5a7qqc7Cv
 duks9siLLaeC4K5W+AOcKLWXho1dJMIPWUej3ErCiHWnA20QiNXsQN4QoimkDKqf
 9SYdcl6UECqV5MzIa/L7cW96S3K0acrq+8ofJCjN3A8M0pcTGgU=
 =5DFQ
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.16-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS clent updates from Anna Schumaker:
 "New Features:

   - Implement the Sunrpc rfc2203 rpcsec_gss sequence number cache

   - Add support for FALLOC_FL_ZERO_RANGE on NFS v4.2

   - Add a localio sysfs attribute

  Stable Fixes:

   - Fix double-unlock bug in nfs_return_empty_folio()

   - Don't check for OPEN feature support in v4.1

   - Always probe for LOCALIO support asynchronously

   - Prevent hang on NFS mounts with xprtsec=[m]tls

  Other Bugfixes:

   - xattr handlers should check for absent nfs filehandles

   - Fix setattr caching of TIME_[MODIFY|ACCESS]_SET when timestamps are
     delegated

   - Fix listxattr to return selinux security labels

   - Connect to NFSv3 DS using TLS if MDS connection uses TLS

   - Clear SB_RDONLY before getting a superblock, and ignore when
     remounting

   - Fix incorrect handling of NFS error codes in nfs4_do_mkdir()

   - Various nfs_localio fixes from Neil Brown that include fixing an
     rcu compilation error found by older gcc versions.

   - Update stats on flexfiles pNFS DSes when receiving NFS4ERR_DELAY

  Cleanups:

   - Add a refcount tracker for struct net in the nfs_client

   - Allow FREE_STATEID to clean up delegations

   - Always set NLINK even if the server doesn't support it

   - Cleanups to the NFS folio writeback code

   - Remove dead code from xs_tcp_tls_setup_socket()"

* tag 'nfs-for-6.16-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (30 commits)
  flexfiles/pNFS: update stats on NFS4ERR_DELAY for v4.1 DSes
  nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer
  nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()
  nfs_localio: duplicate nfs_close_local_fh()
  nfs_localio: simplify interface to nfsd for getting nfsd_file
  nfs_localio: always hold nfsd net ref with nfsd_file ref
  nfs_localio: use cmpxchg() to install new nfs_file_localio
  SUNRPC: Remove dead code from xs_tcp_tls_setup_socket()
  SUNRPC: Prevent hang on NFS mount with xprtsec=[m]tls
  nfs: fix incorrect handling of large-number NFS errors in nfs4_do_mkdir()
  nfs: ignore SB_RDONLY when remounting nfs
  nfs: clear SB_RDONLY before getting superblock
  NFS: always probe for LOCALIO support asynchronously
  pnfs/flexfiles: connect to NFSv3 DS using TLS if MDS connection uses TLS
  NFS: add localio to sysfs
  nfs: use writeback_iter directly
  nfs: refactor nfs_do_writepage
  nfs: don't return AOP_WRITEPAGE_ACTIVATE from nfs_do_writepage
  nfs: fold nfs_page_async_flush into nfs_do_writepage
  NFSv4: Always set NLINK even if the server doesn't support it
  ...
2025-06-03 16:13:32 -07:00
Chuck Lever
111f9e4b0d SUNRPC: Remove dead code from xs_tcp_tls_setup_socket()
xs_tcp_tls_finish_connecting() already marks the upper xprt
connected, so the same code in xs_tcp_tls_setup_socket() is
never executed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-05-28 17:17:14 -04:00
Chuck Lever
0bd2f6b899 SUNRPC: Prevent hang on NFS mount with xprtsec=[m]tls
Engineers at Hammerspace noticed that sometimes mounting with
"xprtsec=tls" hangs for a minute or so, and then times out, even
when the NFS server is reachable and responsive.

kTLS shuts off data_ready callbacks if strp->msg_ready is set to
mitigate data_ready callbacks when a full TLS record is not yet
ready to be read from the socket.

Normally msg_ready is clear when the first TLS record arrives on
a socket. However, I observed that sometimes tls_setsockopt() sets
strp->msg_ready, and that prevents forward progress because
tls_data_ready() becomes a no-op.

Moreover, Jakub says: "If there's a full record queued at the time
when [tlshd] passes the socket back to the kernel, it's up to the
reader to read the already queued data out." So SunRPC cannot
expect a data_ready call when ingress data is already waiting.

Add an explicit poll after SunRPC's upper transport is set up to
pick up any data that arrived after the TLS handshake but before
transport set-up is complete.

Reported-by: Steve Sears <sjs@hammerspace.com>
Suggested-by: Jakub Kacinski <kuba@kernel.org>
Fixes: 75eb6af7ac ("SUNRPC: Add a TCP-with-TLS RPC transport class")
Tested-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-05-28 17:17:14 -04:00
Linus Torvalds
2c26b68cd5 NFSD 6.16 Release Notes
The marquee feature for this release is that the limit on the
 maximum rsize and wsize has been raised to 4MB. The default remains
 at 1MB, but risk-seeking administrators now have the ability to try
 larger I/O sizes with NFS clients that support them. Eventually the
 default setting will be increased when we have confidence that this
 change will not have negative impact.
 
 With v6.16, NFSD now has its own debugfs file system where we can
 add experimental features and make them available outside of our
 development community without impacting production deployments. The
 first experimental setting added is one that makes all NFS READ
 operations use vfs_iter_read() instead of the NFSD splice actor. The
 plan is to eventually retire the splice actor, as that will enable a
 number of new capabilities such as the use of struct bio_vec from the
 top to the bottom of the NFSD stack.
 
 Jeff Layton contributed a number of observability improvements. The
 use of dprintk() in a number of high-traffic code paths has been
 replaced with static trace points.
 
 This release sees the continuation of efforts to harden the NFSv4.2
 COPY operation. Soon, the restriction on async COPY operations can
 be lifted.
 
 Many thanks to the contributors, reviewers, testers, and bug
 reporters who participated during the v6.16 development cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmg1yGUACgkQM2qzM29m
 f5frMA//TJTbSWiM7qBX1GhVMNr1lxQcjU4BPKo0qZfEtwV06F2BB9mWgDU+BIQh
 AcGfMZUmNWAnhOTOYvwqyW6dnX+1yt8sBsCZ/1ctY30A4JH4AgG5sdZS7BUrlEEr
 bGDMUCaPnvQ3maeDjMlefe7Xv/rUhj9TVhXmDkt4vf/jCde2JODTB/z8n7WeAxYJ
 eOvmr/n5z6VI5Q67M7b5/xqofBEaEoq9P5UEgn61ThfeR0bMlrklm/avDCbbNIH8
 6n7Z3tjzllK1CAjEmwHalq4LRbMX5FHWzNkyJw+wtviXS18J5vCAvRe+JDoykusu
 L2bgXT8bBUqy46eO4WKEOJtEqVQhIsRFx/8ku1iTLrpDWlwrR4mHVyObEDkkdlMX
 EyBQ4svg2OxCXSyy5O8oggzU0TWVJStIjbIEHbJYusWLU7HxxFveBwqwzYHXLtip
 WKm6N2ANqQi1du+Pc6xmgXo9svA5Vk+DQjljm1Y5up9dhi2K9cvCIHjwFsZ+E0VL
 XqXJ2YgIQb3oXK7FttzLOiDrpX1OX82sTIbgdcPcfT7lP+ej7uiHMBPmdPwgaZIU
 EbIp0ThoTkh8/VRMDcWIt+B6SEhmb5vY3Zgz9Lcf2J0PM1fuYJ67L7xGTviFX7Ci
 DpohiCgceb6PHYeIuarayF86tPJGF8Vb7XvQZej2Ybv8QdxLFg8=
 =FbeG
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "The marquee feature for this release is that the limit on the maximum
  rsize and wsize has been raised to 4MB. The default remains at 1MB,
  but risk-seeking administrators now have the ability to try larger I/O
  sizes with NFS clients that support them. Eventually the default
  setting will be increased when we have confidence that this change
  will not have negative impact.

  With v6.16, NFSD now has its own debugfs file system where we can add
  experimental features and make them available outside of our
  development community without impacting production deployments. The
  first experimental setting added is one that makes all NFS READ
  operations use vfs_iter_read() instead of the NFSD splice actor. The
  plan is to eventually retire the splice actor, as that will enable a
  number of new capabilities such as the use of struct bio_vec from the
  top to the bottom of the NFSD stack.

  Jeff Layton contributed a number of observability improvements. The
  use of dprintk() in a number of high-traffic code paths has been
  replaced with static trace points.

  This release sees the continuation of efforts to harden the NFSv4.2
  COPY operation. Soon, the restriction on async COPY operations can be
  lifted.

  Many thanks to the contributors, reviewers, testers, and bug reporters
  who participated during the v6.16 development cycle"

* tag 'nfsd-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (60 commits)
  xdrgen: Fix code generated for counted arrays
  SUNRPC: Bump the maximum payload size for the server
  NFSD: Add a "default" block size
  NFSD: Remove NFSSVC_MAXBLKSIZE_V2 macro
  NFSD: Remove NFSD_BUFSIZE
  sunrpc: Remove the RPCSVC_MAXPAGES macro
  svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages
  svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages
  sunrpc: Adjust size of socket's receive page array dynamically
  SUNRPC: Remove svc_rqst :: rq_vec
  SUNRPC: Remove svc_fill_write_vector()
  NFSD: Use rqstp->rq_bvec in nfsd_iter_write()
  SUNRPC: Export xdr_buf_to_bvec()
  NFSD: De-duplicate the svc_fill_write_vector() call sites
  NFSD: Use rqstp->rq_bvec in nfsd_iter_read()
  sunrpc: Replace the rq_bvec array with dynamically-allocated memory
  sunrpc: Replace the rq_pages array with dynamically-allocated memory
  sunrpc: Remove backchannel check in svc_init_buffer()
  sunrpc: Add a helper to derive maxpages from sv_max_mesg
  svcrdma: Reduce the number of rdma_rw contexts per-QP
  ...
2025-05-28 12:21:12 -07:00
Linus Torvalds
6d5b940e1e vfs-6.16-rc1.async.dir
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaDBN6wAKCRCRxhvAZXjc
 ok32AQD9DTiSCAoVg+7s+gSBuLTi8drPTN++mCaxdTqRh5WpRAD9GVyrGQT0s6LH
 eo9bm8d1TAYjilEWM0c0K0TxyQ7KcAA=
 =IW7H
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.16-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs directory lookup updates from Christian Brauner:
 "This contains cleanups for the lookup_one*() family of helpers.

  We expose a set of functions with names containing "lookup_one_len"
  and others without the "_len". This difference has nothing to do with
  "len". It's rater a historical accident that can be confusing.

  The functions without "_len" take a "mnt_idmap" pointer. This is found
  in the "vfsmount" and that is an important question when choosing
  which to use: do you have a vfsmount, or are you "inside" the
  filesystem. A related question is "is permission checking relevant
  here?".

  nfsd and cachefiles *do* have a vfsmount but *don't* use the non-_len
  functions. They pass nop_mnt_idmap and refuse to work on filesystems
  which have any other idmap.

  This work changes nfsd and cachefile to use the lookup_one family of
  functions and to explictily pass &nop_mnt_idmap which is consistent
  with all other vfs interfaces used where &nop_mnt_idmap is explicitly
  passed.

  The remaining uses of the "_one" functions do not require permission
  checks so these are renamed to be "_noperm" and the permission
  checking is removed.

  This series also changes these lookup function to take a qstr instead
  of separate name and len. In many cases this simplifies the call"

* tag 'vfs-6.16-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  VFS: change lookup_one_common and lookup_noperm_common to take a qstr
  Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS
  VFS: rename lookup_one_len family to lookup_noperm and remove permission check
  cachefiles: Use lookup_one() rather than lookup_one_len()
  nfsd: Use lookup_one() rather than lookup_one_len()
  VFS: improve interface for lookup_one functions
2025-05-26 08:02:43 -07:00
Nikhil Jha
fadc0f3bb2 sunrpc: don't immediately retransmit on seqno miss
RFC2203 requires that retransmitted messages use a new gss sequence
number, but the same XID. This means that if the server is just slow
(e.x. overloaded), the client might receive a response using an older
seqno than the one it has recorded.

Currently, Linux's client immediately retransmits in this case. However,
this leads to a lot of wasted retransmits until the server eventually
responds faster than the client can resend.

Client -> SEQ 1 -> Server
Client -> SEQ 2 -> Server
Client <- SEQ 1 <- Server (misses, expecting seqno = 2)
Client -> SEQ 3 -> Server (immediate retransmission on miss)
Client <- SEQ 2 <- Server (misses, expecting seqno = 3)
Client -> SEQ 4 -> Server (immediate retransmission on miss)
... and so on ...

This commit makes it so that we ignore messages with bad checksums
due to seqnum mismatch, and rely on the usual timeout behavior for
retransmission instead of doing so immediately.

Signed-off-by: Nikhil Jha <njha@janestreet.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-05-19 10:14:29 -04:00
Nikhil Jha
08d6ee6d8a sunrpc: implement rfc2203 rpcsec_gss seqnum cache
This implements a sequence number cache of the last three (right now
hardcoded) sent sequence numbers for a given XID, as suggested by the
RFC.

From RFC2203 5.3.3.1:

"Note that the sequence number algorithm requires that the client
increment the sequence number even if it is retrying a request with
the same RPC transaction identifier.  It is not infrequent for
clients to get into a situation where they send two or more attempts
and a slow server sends the reply for the first attempt. With
RPCSEC_GSS, each request and reply will have a unique sequence
number. If the client wishes to improve turn around time on the RPC
call, it can cache the RPCSEC_GSS sequence number of each request it
sends. Then when it receives a response with a matching RPC
transaction identifier, it can compute the checksum of each sequence
number in the cache to try to match the checksum in the reply's
verifier."

Signed-off-by: Nikhil Jha <njha@janestreet.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-05-19 10:14:29 -04:00
Chuck Lever
56ab43f50d svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages
Allow allocation of more entries in the sc_pages[] array when the
maximum size of an RPC message is increased.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-15 16:16:26 -04:00
Chuck Lever
81381d1a90 svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages
Allow allocation of more entries in the rc_pages[] array when the
maximum size of an RPC message is increased.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-15 16:16:26 -04:00
Chuck Lever
f4126823c1 sunrpc: Adjust size of socket's receive page array dynamically
As a step towards making NFSD's maximum rsize and wsize variable at
run-time, make sk_pages a flexible array.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-15 16:16:25 -04:00
Chuck Lever
b406c6b781 SUNRPC: Remove svc_fill_write_vector()
Clean up: This API is no longer used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-15 16:16:25 -04:00