Commit Graph

7068 Commits

Author SHA1 Message Date
Linus Torvalds
ccc1ead23c NFS client updates for Linux 6.17
Highlights include:
 
 Stable fixes:
 - NFS don't inherit NFS filesystem capabilities when crossing from one
   filesystem to another.
 
 Bugfixes:
 - NFS wakeup of __nfs_lookup_revalidate() needs memory barriers.
 - NFS improve bounds checking in nfs_fh_to_dentry().
 - NFS Fix allocation errors when writing to a NFS file backed loopback
   device.
 - NFSv4: More listxattr fixes
 - SUNRPC: fix client handling of TLS alerts.
 - pNFS block/scsi layout fix for an uninitialised pointer dereference.
 - pNFS block/scsi layout fixes for the extent encoding, stripe mapping,
   and disk offset overflows.
 - pNFS layoutcommit work around for RPC size limitations.
 - pNFS/flexfiles avoid looping when handling fatal errors after layoutget.
 - localio: fix various race conditions.
 
 Features and cleanups:
 - Add NFSv4 support for retrieving the btime.
 - NFS: Allow folio migration for the case of mode == MIGRATE_SYNC.
 - NFS: Support using a kernel keyring to store TLS certificates.
 - NFSv4: Speed up delegation lookup using a hash table.
 - Assorted cleanups to remove unused variables and struct fields.
 - Assorted new tracepoints to improve debugging.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmiWBC8ACgkQZwvnipYK
 APKTfRAAn3ETKN15+yR6wYr/wiibaL6sRzQVo8OzFSI9hEVxljX6kK2HEHXIV93T
 F8bjUMB24KK+Eim8zIeLf4Ke7ldbRqtbiYLJox/I12TtQ6yaiFF+xDm7Fyc7UwcT
 ZUl1UnGeNY30RfQ1n8O4O/suBOsTJy1rpWBWynGeQZLiNHFVDoxH4OgCXGZ5579p
 3GACtToDDH9lgCBbKLM3J0nrcW5Or6BidFxT+zN/FXqeroepuvEcloiwJY7N4f/o
 DW436v7ep92WlaJfuypeMmdusx6+vVaYJEKw+B+UjS3tRjbDmhj2FL3su4dQVCqU
 JVW7TwGFL2zwfjTZjfp43ACN16goqRhX7DnTQkgD1mDnnhENqOsa0rqyIS3Wla4d
 W9phfGmOo2FwuVOUXH2L4k7cfPIsktsZ0s/xg+5UfcsoG2yxUxnY9HaWQGFw6fnN
 Fr9B7gUmaO6o1qpZ3emjecoOdQM9IqxMo39P9/72J9pWFstO9Js/kmGXcfsDo2IE
 z2ZYd+roj2ylcLD9mCJ94tjNsAx0ytvUl1fxNFTRLToQ3Nti+jNy22i4jrCgWfyW
 5jwCjjO87CJlJC0kixEPtQXAB6HQSraZw3Qmok1Ywez3fp4+j+QDFYP7OPJPQ0LY
 C3R1uzvUQKqsbsE8cwYQBrG0VNzZrET6x1dxunVtcYGOkU8s4YA=
 =HCFp
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable fixes:
   - don't inherit NFS filesystem capabilities when crossing from one
     filesystem to another

  Bugfixes:
   - NFS wakeup of __nfs_lookup_revalidate() needs memory barriers
   - NFS improve bounds checking in nfs_fh_to_dentry()
   - NFS Fix allocation errors when writing to a NFS file backed
     loopback device
   - NFSv4: More listxattr fixes
   - SUNRPC: fix client handling of TLS alerts
   - pNFS block/scsi layout fix for an uninitialised pointer
     dereference
   - pNFS block/scsi layout fixes for the extent encoding, stripe
     mapping, and disk offset overflows
   - pNFS layoutcommit work around for RPC size limitations
   - pNFS/flexfiles avoid looping when handling fatal errors after
     layoutget
   - localio: fix various race conditions

  Features and cleanups:
   - Add NFSv4 support for retrieving the btime
   - NFS: Allow folio migration for the case of mode == MIGRATE_SYNC
   - NFS: Support using a kernel keyring to store TLS certificates
   - NFSv4: Speed up delegation lookup using a hash table
   - Assorted cleanups to remove unused variables and struct fields
   - Assorted new tracepoints to improve debugging"

* tag 'nfs-for-6.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (44 commits)
  NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file
  NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh()
  NFS/localio: nfs_close_local_fh() fix check for file closed
  NFSv4: Remove duplicate lookups, capability probes and fsinfo calls
  NFS: Fix the setting of capabilities when automounting a new filesystem
  sunrpc: fix client side handling of tls alerts
  nfs/localio: use read_seqbegin() rather than read_seqbegin_or_lock()
  NFS: Fixup allocation flags for nfsiod's __GFP_NORETRY
  NFSv4.2: another fix for listxattr
  NFS: Fix filehandle bounds checking in nfs_fh_to_dentry()
  SUNRPC: Silence warnings about parameters not being described
  NFS: Clean up pnfs_put_layout_hdr()/pnfs_destroy_layout_final()
  NFS: Fix wakeup of __nfs_lookup_revalidate() in unblock_revalidate()
  NFS: use a hash table for delegation lookup
  NFS: track active delegations per-server
  NFS: move the delegation_watermark module parameter
  NFS: cleanup nfs_inode_reclaim_delegation
  NFS: cleanup error handling in nfs4_server_common_setup
  pNFS/flexfiles: don't attempt pnfs on fatal DS errors
  NFS: drop __exit from nfs_exit_keyring
  ...
2025-08-09 07:20:44 +03:00
Trond Myklebust
b9defd611a NFSv4: Remove duplicate lookups, capability probes and fsinfo calls
When crossing into a new filesystem, the NFSv4 client will look up the
new directory, and then call nfs4_server_capabilities() as well as
nfs4_do_fsinfo() at least twice.

This patch removes the duplicate calls, and reduces the initial lookup
to retrieve just a minimal set of attributes.

Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-08-04 09:48:16 -07:00
Trond Myklebust
b01f21cacd NFS: Fix the setting of capabilities when automounting a new filesystem
Capabilities cannot be inherited when we cross into a new filesystem.
They need to be reset to the minimal defaults, and then probed for
again.

Fixes: 54ceac4515 ("NFS: Share NFS superblocks per-protocol per-server per-FSID")
Cc: stable@vger.kernel.org
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-08-04 09:16:45 -07:00
Li RongQing
533210f239 nfs/localio: use read_seqbegin() rather than read_seqbegin_or_lock()
The usage of read_seqbegin_or_lock() in nfs_copy_boot_verifier()
is wrong. "seq" is always even and thus "or_lock" has no effect.

nfs_copy_boot_verifier() just copies 8 bytes and is supposed to be
very rare operation, so we do not need the adaptive locking in this case.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Link: https://lore.kernel.org/r/20250731081038.3478-1-lirongqing@baidu.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-08-03 12:45:47 -07:00
Linus Torvalds
beace86e61 Summary of significant series in this pull request:
- The 4 patch series "mm: ksm: prevent KSM from breaking merging of new
   VMAs" from Lorenzo Stoakes addresses an issue with KSM's
   PR_SET_MEMORY_MERGE mode: newly mapped VMAs were not eligible for
   merging with existing adjacent VMAs.
 
 - The 4 patch series "mm/damon: introduce DAMON_STAT for simple and
   practical access monitoring" from SeongJae Park adds a new kernel module
   which simplifies the setup and usage of DAMON in production
   environments.
 
 - The 6 patch series "stop passing a writeback_control to swap/shmem
   writeout" from Christoph Hellwig is a cleanup to the writeback code
   which removes a couple of pointers from struct writeback_control.
 
 - The 7 patch series "drivers/base/node.c: optimization and cleanups"
   from Donet Tom contains largely uncorrelated cleanups to the NUMA node
   setup and management code.
 
 - The 4 patch series "mm: userfaultfd: assorted fixes and cleanups" from
   Tal Zussman does some maintenance work on the userfaultfd code.
 
 - The 5 patch series "Readahead tweaks for larger folios" from Ryan
   Roberts implements some tuneups for pagecache readahead when it is
   reading into order>0 folios.
 
 - The 4 patch series "selftests/mm: Tweaks to the cow test" from Mark
   Brown provides some cleanups and consistency improvements to the
   selftests code.
 
 - The 4 patch series "Optimize mremap() for large folios" from Dev Jain
   does that.  A 37% reduction in execution time was measured in a
   memset+mremap+munmap microbenchmark.
 
 - The 5 patch series "Remove zero_user()" from Matthew Wilcox expunges
   zero_user() in favor of the more modern memzero_page().
 
 - The 3 patch series "mm/huge_memory: vmf_insert_folio_*() and
   vmf_insert_pfn_pud() fixes" from David Hildenbrand addresses some warts
   which David noticed in the huge page code.  These were not known to be
   causing any issues at this time.
 
 - The 3 patch series "mm/damon: use alloc_migrate_target() for
   DAMOS_MIGRATE_{HOT,COLD" from SeongJae Park provides some cleanup and
   consolidation work in DAMON.
 
 - The 3 patch series "use vm_flags_t consistently" from Lorenzo Stoakes
   uses vm_flags_t in places where we were inappropriately using other
   types.
 
 - The 3 patch series "mm/memfd: Reserve hugetlb folios before
   allocation" from Vivek Kasireddy increases the reliability of large page
   allocation in the memfd code.
 
 - The 14 patch series "mm: Remove pXX_devmap page table bit and pfn_t
   type" from Alistair Popple removes several now-unneeded PFN_* flags.
 
 - The 5 patch series "mm/damon: decouple sysfs from core" from SeongJae
   Park implememnts some cleanup and maintainability work in the DAMON
   sysfs layer.
 
 - The 5 patch series "madvise cleanup" from Lorenzo Stoakes does quite a
   lot of cleanup/maintenance work in the madvise() code.
 
 - The 4 patch series "madvise anon_name cleanups" from Vlastimil Babka
   provides additional cleanups on top or Lorenzo's effort.
 
 - The 11 patch series "Implement numa node notifier" from Oscar Salvador
   creates a standalone notifier for NUMA node memory state changes.
   Previously these were lumped under the more general memory on/offline
   notifier.
 
 - The 6 patch series "Make MIGRATE_ISOLATE a standalone bit" from Zi Yan
   cleans up the pageblock isolation code and fixes a potential issue which
   doesn't seem to cause any problems in practice.
 
 - The 5 patch series "selftests/damon: add python and drgn based DAMON
   sysfs functionality tests" from SeongJae Park adds additional drgn- and
   python-based DAMON selftests which are more comprehensive than the
   existing selftest suite.
 
 - The 5 patch series "Misc rework on hugetlb faulting path" from Oscar
   Salvador fixes a rather obscure deadlock in the hugetlb fault code and
   follows that fix with a series of cleanups.
 
 - The 3 patch series "cma: factor out allocation logic from
   __cma_declare_contiguous_nid" from Mike Rapoport rationalizes and cleans
   up the highmem-specific code in the CMA allocator.
 
 - The 28 patch series "mm/migration: rework movable_ops page migration
   (part 1)" from David Hildenbrand provides cleanups and
   future-preparedness to the migration code.
 
 - The 2 patch series "mm/damon: add trace events for auto-tuned
   monitoring intervals and DAMOS quota" from SeongJae Park adds some
   tracepoints to some DAMON auto-tuning code.
 
 - The 6 patch series "mm/damon: fix misc bugs in DAMON modules" from
   SeongJae Park does that.
 
 - The 6 patch series "mm/damon: misc cleanups" from SeongJae Park also
   does what it claims.
 
 - The 4 patch series "mm: folio_pte_batch() improvements" from David
   Hildenbrand cleans up the large folio PTE batching code.
 
 - The 13 patch series "mm/damon/vaddr: Allow interleaving in
   migrate_{hot,cold} actions" from SeongJae Park facilitates dynamic
   alteration of DAMON's inter-node allocation policy.
 
 - The 3 patch series "Remove unmap_and_put_page()" from Vishal Moola
   provides a couple of page->folio conversions.
 
 - The 4 patch series "mm: per-node proactive reclaim" from Davidlohr
   Bueso implements a per-node control of proactive reclaim - beyond the
   current memcg-based implementation.
 
 - The 14 patch series "mm/damon: remove damon_callback" from SeongJae
   Park replaces the damon_callback interface with a more general and
   powerful damon_call()+damos_walk() interface.
 
 - The 10 patch series "mm/mremap: permit mremap() move of multiple VMAs"
   from Lorenzo Stoakes implements a number of mremap cleanups (of course)
   in preparation for adding new mremap() functionality: newly permit the
   remapping of multiple VMAs when the user is specifying MREMAP_FIXED.  It
   still excludes some specialized situations where this cannot be
   performed reliably.
 
 - The 3 patch series "drop hugetlb_free_pgd_range()" from Anthony Yznaga
   switches some sparc hugetlb code over to the generic version and removes
   the thus-unneeded hugetlb_free_pgd_range().
 
 - The 4 patch series "mm/damon/sysfs: support periodic and automated
   stats update" from SeongJae Park augments the present
   userspace-requested update of DAMON sysfs monitoring files.  Automatic
   update is now provided, along with a tunable to control the update
   interval.
 
 - The 4 patch series "Some randome fixes and cleanups to swapfile" from
   Kemeng Shi does what is claims.
 
 - The 4 patch series "mm: introduce snapshot_page" from Luiz Capitulino
   and David Hildenbrand provides (and uses) a means by which debug-style
   functions can grab a copy of a pageframe and inspect it locklessly
   without tripping over the races inherent in operating on the live
   pageframe directly.
 
 - The 6 patch series "use per-vma locks for /proc/pid/maps reads" from
   Suren Baghdasaryan addresses the large contention issues which can be
   triggered by reads from that procfs file.  Latencies are reduced by more
   than half in some situations.  The series also introduces several new
   selftests for the /proc/pid/maps interface.
 
 - The 6 patch series "__folio_split() clean up" from Zi Yan cleans up
   __folio_split()!
 
 - The 7 patch series "Optimize mprotect() for large folios" from Dev
   Jain provides some quite large (>3x) speedups to mprotect() when dealing
   with large folios.
 
 - The 2 patch series "selftests/mm: reuse FORCE_READ to replace "asm
   volatile("" : "+r" (XXX));" and some cleanup" from wang lian does some
   cleanup work in the selftests code.
 
 - The 3 patch series "tools/testing: expand mremap testing" from Lorenzo
   Stoakes extends the mremap() selftest in several ways, including adding
   more checking of Lorenzo's recently added "permit mremap() move of
   multiple VMAs" feature.
 
 - The 22 patch series "selftests/damon/sysfs.py: test all parameters"
   from SeongJae Park extends the DAMON sysfs interface selftest so that it
   tests all possible user-requested parameters.  Rather than the present
   minimal subset.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaIqcCgAKCRDdBJ7gKXxA
 jkVBAQCCn9DR1QP0CRk961ot0cKzOgioSc0aA03DPb2KXRt2kQEAzDAz0ARurFhL
 8BzbvI0c+4tntHLXvIlrC33n9KWAOQM=
 =XsFy
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "As usual, many cleanups. The below blurbiage describes 42 patchsets.
  21 of those are partially or fully cleanup work. "cleans up",
  "cleanup", "maintainability", "rationalizes", etc.

  I never knew the MM code was so dirty.

  "mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes)
     addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly
     mapped VMAs were not eligible for merging with existing adjacent
     VMAs.

  "mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park)
     adds a new kernel module which simplifies the setup and usage of
     DAMON in production environments.

  "stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig)
     is a cleanup to the writeback code which removes a couple of
     pointers from struct writeback_control.

  "drivers/base/node.c: optimization and cleanups" (Donet Tom)
     contains largely uncorrelated cleanups to the NUMA node setup and
     management code.

  "mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman)
     does some maintenance work on the userfaultfd code.

  "Readahead tweaks for larger folios" (Ryan Roberts)
     implements some tuneups for pagecache readahead when it is reading
     into order>0 folios.

  "selftests/mm: Tweaks to the cow test" (Mark Brown)
     provides some cleanups and consistency improvements to the
     selftests code.

  "Optimize mremap() for large folios" (Dev Jain)
     does that. A 37% reduction in execution time was measured in a
     memset+mremap+munmap microbenchmark.

  "Remove zero_user()" (Matthew Wilcox)
     expunges zero_user() in favor of the more modern memzero_page().

  "mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes" (David Hildenbrand)
     addresses some warts which David noticed in the huge page code.
     These were not known to be causing any issues at this time.

  "mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park)
     provides some cleanup and consolidation work in DAMON.

  "use vm_flags_t consistently" (Lorenzo Stoakes)
     uses vm_flags_t in places where we were inappropriately using other
     types.

  "mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy)
     increases the reliability of large page allocation in the memfd
     code.

  "mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple)
     removes several now-unneeded PFN_* flags.

  "mm/damon: decouple sysfs from core" (SeongJae Park)
     implememnts some cleanup and maintainability work in the DAMON
     sysfs layer.

  "madvise cleanup" (Lorenzo Stoakes)
     does quite a lot of cleanup/maintenance work in the madvise() code.

  "madvise anon_name cleanups" (Vlastimil Babka)
     provides additional cleanups on top or Lorenzo's effort.

  "Implement numa node notifier" (Oscar Salvador)
     creates a standalone notifier for NUMA node memory state changes.
     Previously these were lumped under the more general memory
     on/offline notifier.

  "Make MIGRATE_ISOLATE a standalone bit" (Zi Yan)
     cleans up the pageblock isolation code and fixes a potential issue
     which doesn't seem to cause any problems in practice.

  "selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park)
     adds additional drgn- and python-based DAMON selftests which are
     more comprehensive than the existing selftest suite.

  "Misc rework on hugetlb faulting path" (Oscar Salvador)
     fixes a rather obscure deadlock in the hugetlb fault code and
     follows that fix with a series of cleanups.

  "cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport)
     rationalizes and cleans up the highmem-specific code in the CMA
     allocator.

  "mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand)
     provides cleanups and future-preparedness to the migration code.

  "mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park)
     adds some tracepoints to some DAMON auto-tuning code.

  "mm/damon: fix misc bugs in DAMON modules" (SeongJae Park)
     does that.

  "mm/damon: misc cleanups" (SeongJae Park)
     also does what it claims.

  "mm: folio_pte_batch() improvements" (David Hildenbrand)
     cleans up the large folio PTE batching code.

  "mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park)
     facilitates dynamic alteration of DAMON's inter-node allocation
     policy.

  "Remove unmap_and_put_page()" (Vishal Moola)
     provides a couple of page->folio conversions.

  "mm: per-node proactive reclaim" (Davidlohr Bueso)
     implements a per-node control of proactive reclaim - beyond the
     current memcg-based implementation.

  "mm/damon: remove damon_callback" (SeongJae Park)
     replaces the damon_callback interface with a more general and
     powerful damon_call()+damos_walk() interface.

  "mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes)
     implements a number of mremap cleanups (of course) in preparation
     for adding new mremap() functionality: newly permit the remapping
     of multiple VMAs when the user is specifying MREMAP_FIXED. It still
     excludes some specialized situations where this cannot be performed
     reliably.

  "drop hugetlb_free_pgd_range()" (Anthony Yznaga)
     switches some sparc hugetlb code over to the generic version and
     removes the thus-unneeded hugetlb_free_pgd_range().

  "mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park)
     augments the present userspace-requested update of DAMON sysfs
     monitoring files. Automatic update is now provided, along with a
     tunable to control the update interval.

  "Some randome fixes and cleanups to swapfile" (Kemeng Shi)
     does what is claims.

  "mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand)
     provides (and uses) a means by which debug-style functions can grab
     a copy of a pageframe and inspect it locklessly without tripping
     over the races inherent in operating on the live pageframe
     directly.

  "use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan)
     addresses the large contention issues which can be triggered by
     reads from that procfs file. Latencies are reduced by more than
     half in some situations. The series also introduces several new
     selftests for the /proc/pid/maps interface.

  "__folio_split() clean up" (Zi Yan)
     cleans up __folio_split()!

  "Optimize mprotect() for large folios" (Dev Jain)
     provides some quite large (>3x) speedups to mprotect() when dealing
     with large folios.

  "selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian)
     does some cleanup work in the selftests code.

  "tools/testing: expand mremap testing" (Lorenzo Stoakes)
     extends the mremap() selftest in several ways, including adding
     more checking of Lorenzo's recently added "permit mremap() move of
     multiple VMAs" feature.

  "selftests/damon/sysfs.py: test all parameters" (SeongJae Park)
     extends the DAMON sysfs interface selftest so that it tests all
     possible user-requested parameters. Rather than the present minimal
     subset"

* tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits)
  MAINTAINERS: add missing headers to mempory policy & migration section
  MAINTAINERS: add missing file to cgroup section
  MAINTAINERS: add MM MISC section, add missing files to MISC and CORE
  MAINTAINERS: add missing zsmalloc file
  MAINTAINERS: add missing files to page alloc section
  MAINTAINERS: add missing shrinker files
  MAINTAINERS: move memremap.[ch] to hotplug section
  MAINTAINERS: add missing mm_slot.h file THP section
  MAINTAINERS: add missing interval_tree.c to memory mapping section
  MAINTAINERS: add missing percpu-internal.h file to per-cpu section
  mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info()
  selftests/damon: introduce _common.sh to host shared function
  selftests/damon/sysfs.py: test runtime reduction of DAMON parameters
  selftests/damon/sysfs.py: test non-default parameters runtime commit
  selftests/damon/sysfs.py: generalize DAMON context commit assertion
  selftests/damon/sysfs.py: generalize monitoring attributes commit assertion
  selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion
  selftests/damon/sysfs.py: test DAMOS filters commitment
  selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion
  selftests/damon/sysfs.py: test DAMOS destinations commitment
  ...
2025-07-31 14:57:54 -07:00
Linus Torvalds
7031769e10 vfs-6.17-rc1.mmap_prepare
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCgQAKCRCRxhvAZXjc
 os+nAP9LFHUwWO6EBzHJJGEVjJvvzsbzqeYrRFamYiMc5ulPJwD+KW4RIgJa/MWO
 pcYE40CacaekD8rFWwYUyszpgmv6ewc=
 =wCwp
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull mmap_prepare updates from Christian Brauner:
 "Last cycle we introduce f_op->mmap_prepare() in c84bf6dd2b ("mm:
  introduce new .mmap_prepare() file callback").

  This is preferred to the existing f_op->mmap() hook as it does require
  a VMA to be established yet, thus allowing the mmap logic to invoke
  this hook far, far earlier, prior to inserting a VMA into the virtual
  address space, or performing any other heavy handed operations.

  This allows for much simpler unwinding on error, and for there to be a
  single attempt at merging a VMA rather than having to possibly
  reattempt a merge based on potentially altered VMA state.

  Far more importantly, it prevents inappropriate manipulation of
  incompletely initialised VMA state, which is something that has been
  the cause of bugs and complexity in the past.

  The intent is to gradually deprecate f_op->mmap, and in that vein this
  series coverts the majority of file systems to using f_op->mmap_prepare.

  Prerequisite steps are taken - firstly ensuring all checks for mmap
  capabilities use the file_has_valid_mmap_hooks() helper rather than
  directly checking for f_op->mmap (which is now not a valid check) and
  secondly updating daxdev_mapping_supported() to not require a VMA
  parameter to allow ext4 and xfs to be converted.

  Commit bb666b7c27 ("mm: add mmap_prepare() compatibility layer for
  nested file systems") handles the nasty edge-case of nested file
  systems like overlayfs, which introduces a compatibility shim to allow
  f_op->mmap_prepare() to be invoked from an f_op->mmap() callback.

  This allows for nested filesystems to continue to function correctly
  with all file systems regardless of which callback is used. Once we
  finally convert all file systems, this shim can be removed.

  As a result, ecryptfs, fuse, and overlayfs remain unaltered so they
  can nest all other file systems.

  We additionally do not update resctl - as this requires an update to
  remap_pfn_range() (or an alternative to it) which we defer to a later
  series, equally we do not update cramfs which needs a mixed mapping
  insertion with the same issue, nor do we update procfs, hugetlbfs,
  syfs or kernfs all of which require VMAs for internal state and hooks.
  We shall return to all of these later"

* tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  doc: update porting, vfs documentation to describe mmap_prepare()
  fs: replace mmap hook with .mmap_prepare for simple mappings
  fs: convert most other generic_file_*mmap() users to .mmap_prepare()
  fs: convert simple use of generic_file_*_mmap() to .mmap_prepare()
  mm/filemap: introduce generic_file_*_mmap_prepare() helpers
  fs/xfs: transition from deprecated .mmap hook to .mmap_prepare
  fs/ext4: transition from deprecated .mmap hook to .mmap_prepare
  fs/dax: make it possible to check dev dax support without a VMA
  fs: consistently use can_mmap_file() helper
  mm/nommu: use file_has_valid_mmap_hooks() helper
  mm: rename call_mmap/mmap_prepare to vfs_mmap/mmap_prepare
2025-07-28 13:43:25 -07:00
Linus Torvalds
7879d7aff0 vfs-6.17-rc1.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaIM/KwAKCRCRxhvAZXjc
 opT+AP407JwhRSBjUEmHg5JzUyDoivkOySdnthunRjaBKD8rlgEApM6SOIZYucU7
 cPC3ZY6ORFM6Mwaw+iDW9lasM5ucHQ8=
 =CHha
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc VFS updates from Christian Brauner:
 "This contains the usual selections of misc updates for this cycle.

  Features:

   - Add ext4 IOCB_DONTCACHE support

     This refactors the address_space_operations write_begin() and
     write_end() callbacks to take const struct kiocb * as their first
     argument, allowing IOCB flags such as IOCB_DONTCACHE to propagate
     to the filesystem's buffered I/O path.

     Ext4 is updated to implement handling of the IOCB_DONTCACHE flag
     and advertises support via the FOP_DONTCACHE file operation flag.

     Additionally, the i915 driver's shmem write paths are updated to
     bypass the legacy write_begin/write_end interface in favor of
     directly calling write_iter() with a constructed synchronous kiocb.
     Another i915 change replaces a manual write loop with
     kernel_write() during GEM shmem object creation.

  Cleanups:

   - don't duplicate vfs_open() in kernel_file_open()

   - proc_fd_getattr(): don't bother with S_ISDIR() check

   - fs/ecryptfs: replace snprintf with sysfs_emit in show function

   - vfs: Remove unnecessary list_for_each_entry_safe() from
     evict_inodes()

   - filelock: add new locks_wake_up_waiter() helper

   - fs: Remove three arguments from block_write_end()

   - VFS: change old_dir and new_dir in struct renamedata to dentrys

   - netfs: Remove unused declaration netfs_queue_write_request()

  Fixes:

   - eventpoll: Fix semi-unbounded recursion

   - eventpoll: fix sphinx documentation build warning

   - fs/read_write: Fix spelling typo

   - fs: annotate data race between poll_schedule_timeout() and
     pollwake()

   - fs/pipe: set FMODE_NOWAIT in create_pipe_files()

   - docs/vfs: update references to i_mutex to i_rwsem

   - fs/buffer: remove comment about hard sectorsize

   - fs/buffer: remove the min and max limit checks in __getblk_slow()

   - fs/libfs: don't assume blocksize <= PAGE_SIZE in
     generic_check_addressable

   - fs_context: fix parameter name in infofc() macro

   - fs: Prevent file descriptor table allocations exceeding INT_MAX"

* tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (24 commits)
  netfs: Remove unused declaration netfs_queue_write_request()
  eventpoll: fix sphinx documentation build warning
  ext4: support uncached buffered I/O
  mm/pagemap: add write_begin_get_folio() helper function
  fs: change write_begin/write_end interface to take struct kiocb *
  drm/i915: Refactor shmem_pwrite() to use kiocb and write_iter
  drm/i915: Use kernel_write() in shmem object create
  eventpoll: Fix semi-unbounded recursion
  vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes()
  fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable
  fs/buffer: remove the min and max limit checks in __getblk_slow()
  fs: Prevent file descriptor table allocations exceeding INT_MAX
  fs: Remove three arguments from block_write_end()
  fs/ecryptfs: replace snprintf with sysfs_emit in show function
  fs: annotate suspected data race between poll_schedule_timeout() and pollwake()
  docs/vfs: update references to i_mutex to i_rwsem
  fs/buffer: remove comment about hard sectorsize
  fs_context: fix parameter name in infofc() macro
  VFS: change old_dir and new_dir in struct renamedata to dentrys
  proc_fd_getattr(): don't bother with S_ISDIR() check
  ...
2025-07-28 11:22:56 -07:00
Linus Torvalds
ddf52f12ef Massage rpc_pipefs to use saner primitives and clean up the
APIs provided to the rest of the kernel.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaIRDbQAKCRBZ7Krx/gZQ
 63n6APwNnJXwgtSDi9N0FfHOlYqYSCaCjezVLbq+GR8K+r4wowD/TX/A4Qbyjjic
 /VG8VbYe6fRaD53vp1giGI/dJiTI2Qg=
 =Ta4H
 -----END PGP SIGNATURE-----

Merge tag 'pull-rpc_pipefs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull rpc_pipefs updates from Al Viro:
 "Massage rpc_pipefs to use saner primitives and clean up the APIs
  provided to the rest of the kernel"

* tag 'pull-rpc_pipefs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  rpc_create_client_dir(): return 0 or -E...
  rpc_create_client_dir(): don't bother with rpc_populate()
  rpc_new_dir(): the last argument is always NULL
  rpc_pipe: expand the calls of rpc_mkdir_populate()
  rpc_gssd_dummy_populate(): don't bother with rpc_populate()
  rpc_mkpipe_dentry(): switch to simple_start_creating()
  rpc_pipe: saner primitive for creating regular files
  rpc_pipe: saner primitive for creating subdirectories
  rpc_pipe: don't overdo directory locking
  rpc_mkpipe_dentry(): saner calling conventions
  rpc_unlink(): saner calling conventions
  rpc_populate(): lift cleanup into callers
  rpc_unlink(): use simple_recursive_removal()
  rpc_{rmdir_,}depopulate(): use simple_recursive_removal() instead
  rpc_pipe: clean failure exits in fill_super
  new helper: simple_start_creating()
2025-07-28 09:56:09 -07:00
Linus Torvalds
11fe69fbd5 Current exclusion rules for ->d_flags stores are rather unpleasant.
The basic rules are simple:
 	* stores to dentry->d_flags are OK under dentry->d_lock.
 	* stores to dentry->d_flags are OK in the dentry constructor, before
 becomes potentially visible to other threads.
 Unfortunately, there's a couple of exceptions to that, and that's where the
 headache comes from.
 
 	Main PITA comes from d_set_d_op(); that primitive sets ->d_op
 of dentry and adjusts the flags that correspond to presence of individual
 methods.  It's very easy to misuse; existing uses _are_ safe, but proof
 of correctness is brittle.
 
 	Use in __d_alloc() is safe (we are within a constructor), but we
 might as well precalculate the initial value of ->d_flags when we set
 the default ->d_op for given superblock and set ->d_flags directly
 instead of messing with that helper.
 
 	The reasons why other uses are safe are bloody convoluted; I'm not going
 to reproduce it here.  See https://lore.kernel.org/all/20250224010624.GT1977892@ZenIV/
 for gory details, if you care.  The critical part is using d_set_d_op() only
 just prior to d_splice_alias(), which makes a combination of d_splice_alias()
 with setting ->d_op, etc. a natural replacement primitive.  Better yet, if
 we go that way, it's easy to take setting ->d_op and modifying ->d_flags
 under ->d_lock, which eliminates the headache as far as ->d_flags exclusion
 rules are concerned.  Other exceptions are minor and easy to deal with.
 
 	What this series does:
 * d_set_d_op() is no longer available; new primitive (d_splice_alias_ops())
 is provided, equivalent to combination of d_set_d_op() and d_splice_alias().
 * new field of struct super_block - ->s_d_flags.  Default value of ->d_flags
 to be used when allocating dentries on this filesystem.
 * new primitive for setting ->s_d_op: set_default_d_op().  Replaces stores
 to ->s_d_op at mount time.  All in-tree filesystems converted; out-of-tree
 ones will get caught by compiler (->s_d_op is renamed, so stores to it will
 be caught).  ->s_d_flags is set by the same primitive to match the ->s_d_op.
 * a lot of filesystems had ->s_d_op->d_delete equal to always_delete_dentry;
 that is equivalent to setting DCACHE_DONTCACHE in ->d_flags, so such filesystems
 can bloody well set that bit in ->s_d_flags and drop ->d_delete() from
 dentry_operations.  In quite a few cases that results in empty dentry_operations,
 which means that we can get rid of those.
 * kill simple_dentry_operations - not needed anymore.
 * massage d_alloc_parallel() to get rid of the other exception wrt ->d_flags
 stores - we can set DCACHE_PAR_LOOKUP as soon as we allocate the new dentry;
 no need to delay that until we commit to using the sucker.
 
 As the result, ->d_flags stores are all either under ->d_lock or done before
 the dentry becomes visible in any shared data structures.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaIQ/tQAKCRBZ7Krx/gZQ
 66AhAQDgQ+S224x5YevNXc9mDoGUBMF4OG0n0fIla9rfdL4I6wEAqpOWMNDcVPCZ
 GwYOvJ9YuqNdz+MyprAI18Yza4GOmgs=
 =rTYB
 -----END PGP SIGNATURE-----

Merge tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull dentry d_flags updates from Al Viro:
 "The current exclusion rules for dentry->d_flags stores are rather
  unpleasant. The basic rules are simple:

   - stores to dentry->d_flags are OK under dentry->d_lock

   - stores to dentry->d_flags are OK in the dentry constructor, before
     becomes potentially visible to other threads

  Unfortunately, there's a couple of exceptions to that, and that's
  where the headache comes from.

  The main PITA comes from d_set_d_op(); that primitive sets ->d_op of
  dentry and adjusts the flags that correspond to presence of individual
  methods. It's very easy to misuse; existing uses _are_ safe, but proof
  of correctness is brittle.

  Use in __d_alloc() is safe (we are within a constructor), but we might
  as well precalculate the initial value of 'd_flags' when we set the
  default ->d_op for given superblock and set 'd_flags' directly instead
  of messing with that helper.

  The reasons why other uses are safe are bloody convoluted; I'm not
  going to reproduce it here. See [1] for gory details, if you care. The
  critical part is using d_set_d_op() only just prior to
  d_splice_alias(), which makes a combination of d_splice_alias() with
  setting ->d_op, etc a natural replacement primitive.

  Better yet, if we go that way, it's easy to take setting ->d_op and
  modifying 'd_flags' under ->d_lock, which eliminates the headache as
  far as 'd_flags' exclusion rules are concerned. Other exceptions are
  minor and easy to deal with.

  What this series does:

   - d_set_d_op() is no longer available; instead a new primitive
     (d_splice_alias_ops()) is provided, equivalent to combination of
     d_set_d_op() and d_splice_alias().

   - new field of struct super_block - 's_d_flags'. This sets the
     default value of 'd_flags' to be used when allocating dentries on
     this filesystem.

   - new primitive for setting 's_d_op': set_default_d_op(). This
     replaces stores to 's_d_op' at mount time.

     All in-tree filesystems converted; out-of-tree ones will get caught
     by the compiler ('s_d_op' is renamed, so stores to it will be
     caught). 's_d_flags' is set by the same primitive to match the
     's_d_op'.

   - a lot of filesystems had sb->s_d_op->d_delete equal to
     always_delete_dentry; that is equivalent to setting
     DCACHE_DONTCACHE in 'd_flags', so such filesystems can bloody well
     set that bit in 's_d_flags' and drop 'd_delete()' from
     dentry_operations.

     In quite a few cases that results in empty dentry_operations, which
     means that we can get rid of those.

   - kill simple_dentry_operations - not needed anymore

   - massage d_alloc_parallel() to get rid of the other exception wrt
     'd_flags' stores - we can set DCACHE_PAR_LOOKUP as soon as we
     allocate the new dentry; no need to delay that until we commit to
     using the sucker.

  As the result, 'd_flags' stores are all either under ->d_lock or done
  before the dentry becomes visible in any shared data structures"

Link: https://lore.kernel.org/all/20250224010624.GT1977892@ZenIV/ [1]

* tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (21 commits)
  configfs: use DCACHE_DONTCACHE
  debugfs: use DCACHE_DONTCACHE
  efivarfs: use DCACHE_DONTCACHE instead of always_delete_dentry()
  9p: don't bother with always_delete_dentry
  ramfs, hugetlbfs, mqueue: set DCACHE_DONTCACHE
  kill simple_dentry_operations
  devpts, sunrpc, hostfs: don't bother with ->d_op
  shmem: no dentry retention past the refcount reaching zero
  d_alloc_parallel(): set DCACHE_PAR_LOOKUP earlier
  make d_set_d_op() static
  simple_lookup(): just set DCACHE_DONTCACHE
  tracefs: Add d_delete to remove negative dentries
  set_default_d_op(): calculate the matching value for ->d_flags
  correct the set of flags forbidden at d_set_d_op() time
  split d_flags calculation out of d_set_d_op()
  new helper: set_default_d_op()
  fuse: no need for special dentry_operations for root dentry
  switch procfs from d_set_d_op() to d_splice_alias_ops()
  new helper: d_splice_alias_ops()
  procfs: kill ->proc_dops
  ...
2025-07-28 09:17:57 -07:00
Benjamin Coddington
99765233ab NFS: Fixup allocation flags for nfsiod's __GFP_NORETRY
If the NFS client is doing writeback from a workqueue context, avoid using
__GFP_NORETRY for allocations if the task has set PF_MEMALLOC_NOIO or
PF_MEMALLOC_NOFS.  The combination of these flags makes memory allocation
failures much more likely.

We've seen those allocation failures show up when the loopback driver is
doing writeback from a workqueue to a file on NFS, where memory allocation
failure results in errors or corruption within the loopback device's
filesystem.

Suggested-by: Trond Myklebust <trondmy@kernel.org>
Fixes: 0bae835b63 ("NFS: Avoid writeback threads getting stuck in mempool_alloc()")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/f83ac1155a4bc670f2663959a7a068571e06afd9.1752111622.git.bcodding@redhat.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-28 12:01:21 -04:00
Olga Kornievskaia
9acb237def NFSv4.2: another fix for listxattr
Currently, when the server supports NFS4.1 security labels then
security.selinux label in included twice. Instead, only add it
when the server doesn't possess security label support.

Fixes: 243fea1346 ("NFSv4.2: fix listxattr to return selinux security label")
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Link: https://lore.kernel.org/r/20250722205641.79394-1-okorniev@redhat.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-28 11:51:18 -04:00
Trond Myklebust
ef93a685e0 NFS: Fix filehandle bounds checking in nfs_fh_to_dentry()
The function needs to check the minimal filehandle length before it can
access the embedded filehandle.

Reported-by: zhangjian <zhangjian496@huawei.com>
Fixes: 20fa190272 ("nfs: add export operations")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-22 09:50:56 -04:00
Trond Myklebust
ec0abdda89 NFS: Clean up pnfs_put_layout_hdr()/pnfs_destroy_layout_final()
Use the wake_up_var_locked() and wait_var_event_spinlock() helpers.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-22 08:10:41 -04:00
Trond Myklebust
1db3a48e83 NFS: Fix wakeup of __nfs_lookup_revalidate() in unblock_revalidate()
Use store_release_wake_up() to add the appropriate memory barrier before
calling wake_up_var(&dentry->d_fsdata).

Reported-by: Lukáš Hejtmánek<xhejtman@ics.muni.cz>
Suggested-by: Santosh Pradhan <santosh.pradhan@gmail.com>
Link: https://lore.kernel.org/all/18945D18-3EDB-4771-B019-0335CE671077@ics.muni.cz/
Fixes: 99bc9f2eb3 ("NFS: add barriers when testing for NFS_FSDATA_BLOCKED")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-22 08:10:41 -04:00
Christoph Hellwig
f5b3108e6a NFS: use a hash table for delegation lookup
nfs_delegation_find_inode currently has to walk the entire list of
delegations per inode, which can become pretty large, and can become even
larger when increasing the delegation watermark.

Add a hash table to speed up the delegation lookup, sized as a fraction
of the delegation watermark.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250718081509.2607553-6-hch@lst.de
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-22 08:10:41 -04:00
Christoph Hellwig
2fb4af5ea3 NFS: track active delegations per-server
The active delegation watermark was added to avoid overloading servers.
Track the active delegation per-server instead of globally so that clients
talking to multiple servers aren't limited by the global limit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20250718081509.2607553-5-hch@lst.de
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-22 08:10:41 -04:00
Christoph Hellwig
aee077d8ed NFS: move the delegation_watermark module parameter
Keep the module_param_named next to the variable declaration instead of
somewhere unrelated, following the best practice in the rest of the
kernel.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20250718081509.2607553-4-hch@lst.de
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-22 08:10:41 -04:00
Christoph Hellwig
7375bbad46 NFS: cleanup nfs_inode_reclaim_delegation
Reduce a level of indentation for most of the code in this function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20250718081509.2607553-3-hch@lst.de
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-22 08:10:41 -04:00
Christoph Hellwig
67173860a7 NFS: cleanup error handling in nfs4_server_common_setup
Return error directly instead of using a goto label for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250718081509.2607553-2-hch@lst.de
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-22 08:10:41 -04:00
Tigran Mkrtchyan
f06bedfa62 pNFS/flexfiles: don't attempt pnfs on fatal DS errors
When an applications get killed (SIGTERM/SIGINT) while pNFS client performs a connection
to DS, client ends in an infinite loop of connect-disconnect. This
source of the issue, it that flexfilelayoutdev#nfs4_ff_layout_prepare_ds gets an error
on nfs4_pnfs_ds_connect with status ERESTARTSYS, which is set by rpc_signal_task, but
the error is treated as transient, thus retried.

The issue is reproducible with Ctrl+C the following script(there should be ~1000 files in
a directory, client should must not have any connections to DSes):

```
echo 3 > /proc/sys/vm/drop_caches

for i in *
do
   head -1 $i
done
```

The change aims to propagate the nfs4_ff_layout_prepare_ds error state
to the caller that can decide whatever this is a retryable error or not.

Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Link: https://lore.kernel.org/r/20250627071751.189663-1-tigran.mkrtchyan@desy.de
Fixes: 260f32adb8 ("pNFS/flexfiles: Check the result of nfs4_pnfs_ds_connect")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-22 08:10:17 -04:00
Christoph Hellwig
c262b444bd NFS: drop __exit from nfs_exit_keyring
Otherwise built-in NFS can lead to sectіon mismatches.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250714062450.1468117-1-hch@lst.de
Fixes: 87268f7a4f ("nfs: create a kernel keyring")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-22 08:10:17 -04:00
Christoph Hellwig
f3fc8f0649 NFS: pass struct nfs_client_initdata to nfs4_set_client
Passed the partially filled out structure to nfs4_set_client instead of
11 arguments that then get stashed into the structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-22 08:10:17 -04:00
Taotao Chen
e9d8e2bf23
fs: change write_begin/write_end interface to take struct kiocb *
Change the address_space_operations callbacks write_begin() and
write_end() to take struct kiocb * as the first argument instead of
struct file *.

Update all affected function prototypes, implementations, call sites,
and related documentation across VFS, filesystems, and block layer.

Part of a series refactoring address_space_operations write_begin and
write_end callbacks to use struct kiocb for passing write context and
flags.

Signed-off-by: Taotao Chen <chentaotao@didiglobal.com>
Link: https://lore.kernel.org/20250716093559.217344-4-chentaotao@didiglobal.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-16 14:48:18 +02:00
Sergey Bashirov
7db6e66663 pNFS: Fix disk addr range check in block/scsi layout
At the end of the isect translation, disc_addr represents the physical
disk offset. Thus, end calculated from disk_addr is also a physical disk
offset. Therefore, range checking should be done using map->disk_offset,
not map->start.

Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250702133226.212537-1-sergeybashirov@gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14 15:20:28 -07:00
Sergey Bashirov
81438498a2 pNFS: Fix stripe mapping in block/scsi layout
Because of integer division, we need to carefully calculate the
disk offset. Consider the example below for a stripe of 6 volumes,
a chunk size of 4096, and an offset of 70000.

chunk = div_u64(offset, dev->chunk_size) = 70000 / 4096 = 17
offset = chunk * dev->chunk_size = 17 * 4096 = 69632
disk_offset_wrong = div_u64(offset, dev->nr_children) = 69632 / 6 = 11605
disk_chunk = div_u64(chunk, dev->nr_children) = 17 / 6 = 2
disk_offset = disk_chunk * dev->chunk_size = 2 * 4096 = 8192

Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250701122341.199112-1-sergeybashirov@gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14 15:20:28 -07:00
Sergey Bashirov
d897d81671 pNFS: Handle RPC size limit for layoutcommits
When there are too many block extents for a layoutcommit, they may not
all fit into the maximum-sized RPC. This patch allows the generic pnfs
code to properly handle -ENOSPC returned by the block/scsi layout driver
and trigger additional layoutcommits if necessary.

Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250630183537.196479-5-sergeybashirov@gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14 15:20:28 -07:00
Sergey Bashirov
66642bbee5 pNFS: Add prepare commit trace to block/scsi layout
Replace dprintk with trace event in ext_tree_prepare_commit() function.

Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250630183537.196479-4-sergeybashirov@gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14 15:20:28 -07:00
Sergey Bashirov
d84c4754f8 pNFS: Fix extent encoding in block/scsi layout
The ext_tree_encode_commit() function may be called multiple times for
the same file, layout, and last written byte if the provided buffer is
not large enough to encode all extents in it.

The first problem is that the last written byte field must be zeroed
only on a successful call, otherwise we will lose its actual value and
get an integer overflow on the next encoding attempt.

The second problem is that we can't count and encode in one pass. The
extent state changes during encoding, so if we return -ENOSPC but have
already encoded some extents into a small buffer, they will not be
re-encoded into a new larger buffer on the next try. As a result, the
client never commits these extents to the server.

Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250630183537.196479-3-sergeybashirov@gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14 15:20:28 -07:00
Sergey Bashirov
9768797c21 pNFS: Fix uninited ptr deref in block/scsi layout
The error occurs on the third attempt to encode extents. When function
ext_tree_prepare_commit() reallocates a larger buffer to retry encoding
extents, the "layoutupdate_pages" page array is initialized only after the
retry loop. But ext_tree_free_commitdata() is called on every iteration
and tries to put pages in the array, thus dereferencing uninitialized
pointers.

An additional problem is that there is no limit on the maximum possible
buffer_size. When there are too many extents, the client may create a
layoutcommit that is larger than the maximum possible RPC size accepted
by the server.

During testing, we observed two typical scenarios. First, one memory page
for extents is enough when we work with small files, append data to the
end of the file, or preallocate extents before writing. But when we fill
a new large file without preallocating, the number of extents can be huge,
and counting the number of written extents in ext_tree_encode_commit()
does not help much. Since this number increases even more between
unlocking and locking of ext_tree, the reallocated buffer may not be
large enough again and again.

Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250630183537.196479-2-sergeybashirov@gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14 15:20:28 -07:00
Dr. David Alan Gilbert
3b3bc9a1f7 NFS: Remove unused function nfs_umount
nfs_umount() has been unused since 2013's
commit 4580a92d44 ("NFS: Use server-recommended security flavor by
default (NFSv3)")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://lore.kernel.org/r/20250218215250.263709-1-linux@treblig.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14 15:20:28 -07:00
Christoph Hellwig
87268f7a4f nfs: create a kernel keyring
Create a kernel .nfs keyring similar to the nvme .nvme one.  Unlike for
a userspace-created keyrind, tlshd is a possesor of the keys with this
and thus the keys don't need user read permissions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20250515115107.33052-3-hch@lst.de
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14 15:20:28 -07:00
Christoph Hellwig
90c9550a8d NFS: support the kernel keyring for TLS
Allow tlshd to use a per-mount key from the kernel keyring similar
to NVMe over TCP.

Note that tlshd expects keys and certificates stored in the kernel
keyring to be in DER format, not the PEM format used for file based keys
and certificates, so they need to be converted before they are added
to the keyring, which is a bit unexpected.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20250515115107.33052-2-hch@lst.de
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14 15:20:27 -07:00
Trond Myklebust
72508db0fe NFS: Allow folio migration for the case of mode == MIGRATE_SYNC
When the mode is MIGRATE_SYNC, we are allowed to call nfs_wb_folio()
under the folio lock.

Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14 15:20:27 -07:00
Jeff Layton
b0b7cdc994 nfs: new tracepoint in match_stateid operation
Add new tracepoints in the NFSv4 match_stateid minorversion op that show
the info in both stateids.

Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20250618-nfs-tracepoints-v2-4-540c9fb48da2@kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14 15:20:27 -07:00
Jeff Layton
5dd03d14b3 nfs: new tracepoint in nfs_delegation_need_return
Add a tracepoint in the function that decides whether to return a
delegation to the server.

Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20250618-nfs-tracepoints-v2-3-540c9fb48da2@kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14 15:20:27 -07:00
Jeff Layton
0139a30ada nfs: add a tracepoint to nfs_inode_detach_delegation_locked
We have tracepoints for setting a delegation and reclaiming them. Add a
tracepoint for when the delegation is being detached from the inode.

Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20250618-nfs-tracepoints-v2-2-540c9fb48da2@kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14 15:20:27 -07:00
Jeff Layton
8c206b0a12 nfs: add cache_validity to the nfs_inode_event tracepoints
Managing the cache_validity flags is the deep voodoo of NFS cache
coherency. Let's have a little extra visibility into that value via the
nfs_inode_event tracepoints.

Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20250618-nfs-tracepoints-v2-1-540c9fb48da2@kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14 15:20:27 -07:00
Anthony Iliopoulos
74a33326cf NFS: remove unused time_delta field from struct nfs_server
The last code that was using this was removed via commit ca0daa277a
("NFS: Cache aggressively when file is open for writing") which was
merged in v4.8-rc1, so it can be removed completely.

Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
Link: https://lore.kernel.org/r/20250613094439.82338-3-ailiop@suse.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14 15:20:21 -07:00
Anthony Iliopoulos
0715a72ee9 NFS: remove unused wpages field from struct nfs_server
The wpages field is not serving any purpose since commit c63c7b0513
("NFS: Fix a race when doing NFS write coalescing") which was merged in
v2.6.22-rc1. Remove it completely.

Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
Link: https://lore.kernel.org/r/20250613094439.82338-2-ailiop@suse.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14 15:20:19 -07:00
Tigran Mkrtchyan
a9e2183720 pnfs: add pnfs_ds_connect trace point
This tracepoint aims to expose pnfs DS connect status

Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Link: https://lore.kernel.org/r/20250610151246.9147-1-tigran.mkrtchyan@desy.de
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14 15:20:03 -07:00
NeilBrown
c1b0b9d79f nfs: use lock_two_nondirectories()
Rather than open-coding this function call it to make intention clear
and to use "correct" nesting levels (parent and child are for
directories).

This is purely cosmetic with no expected change in behaviour.

Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/r/174942091741.608730.3327223511347232829@noble.neil.brown.name
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14 15:20:02 -07:00
Trond Myklebust
4b54274147 NFS: Return the file btime in the statx results when appropriate
If the server supports the NFSv4.x "create_time" attribute, then return
it as part of the statx results.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/eae27d6467e08aaa67e0ac6ae7119263a0f83349.1748515333.git.bcodding@redhat.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14 15:20:02 -07:00
Anne Marie Merritt
1c7ae2dd3f nfs: Add timecreate to nfs inode
Add tracking of the create time (a.k.a. btime) along with corresponding
bitfields, request, and decode xdr routines.

Signed-off-by: Anne Marie Merritt <annemarie.merritt@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/1e3677b0655fa2bbaba0817b41d111d94a06e5ee.1748515333.git.bcodding@redhat.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14 15:20:02 -07:00
Trond Myklebust
ce60ab3964 Expand the type of nfs_fattr->valid
We need to be able to track more than 32 attributes per inode.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/1e3405fca54efd0be7c91c1da77917b94f5dfcc4.1748515333.git.bcodding@redhat.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14 15:20:02 -07:00
Christoph Hellwig
a8fb49c6ab mm: remove the for_reclaim field from struct writeback_control
This field is now only set to one in the i915 gem code that only calls
writeback_iter on it, which ignores the flag.  All other checks are thuse
dead code and the field can be removed.

Link: https://lkml.kernel.org/r/20250610054959.2057526-7-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:41:58 -07:00
Al Viro
19a6314a99 rpc_mkpipe_dentry(): saner calling conventions
Instead of returning a dentry or ERR_PTR(-E...), return 0 and store
dentry into pipe->dentry on success and return -E... on failure.

Callers are happier that way...

NOTE: dummy rpc_pipe is getting ->dentry set; we never access that,
since we
	1) never call rpc_unlink() for it (dentry is taken out by
->kill_sb())
	2) never call rpc_queue_upcall() for it (writing to that
sucker fails; no downcalls are ever submitted, so no replies are
going to arrive)
IOW, having that ->dentry set (and left dangling) is harmless,
if ugly; cleaner solution will take more massage.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-07-02 22:44:55 -04:00
Al Viro
bccea4ed06 rpc_unlink(): saner calling conventions
1) pass it pipe instead of pipe->dentry
2) zero pipe->dentry afterwards
3) it always returns 0; why bother?

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-07-02 22:44:55 -04:00
Trond Myklebust
38074de35b NFSv4/flexfiles: Fix handling of NFS level errors in I/O
Allow the flexfiles error handling to recognise NFS level errors (as
opposed to RPC level errors) and handle them separately. The main
motivator is the NFSERR_PERM errors that get returned if the NFS client
connects to the data server through a port number that is lower than
1024. In that case, the client should disconnect and retry a READ on a
different data server, or it should retry a WRITE after reconnecting.

Reviewed-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Fixes: d67ae825a5 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-06-26 13:46:44 -04:00
Benjamin Coddington
c017762874 NFSv4/pNFS: Fix a race to wake on NFS_LAYOUT_DRAIN
We found a few different systems hung up in writeback waiting on the same
page lock, and one task waiting on the NFS_LAYOUT_DRAIN bit in
pnfs_update_layout(), however the pnfs_layout_hdr's plh_outstanding count
was zero.

It seems most likely that this is another race between the waiter and waker
similar to commit ed0172af5d ("SUNRPC: Fix a race to wake a sync task").
Fix it up by applying the advised barrier.

Fixes: 880265c77a ("pNFS: Avoid a live lock condition in pnfs_update_layout()")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-06-23 11:01:16 -04:00
Kuniyuki Iwashima
e8d6f3ab59 nfs: Clean up /proc/net/rpc/nfs when nfs_fs_proc_net_init() fails.
syzbot reported a warning below [1] following a fault injection in
nfs_fs_proc_net_init(). [0]

When nfs_fs_proc_net_init() fails, /proc/net/rpc/nfs is not removed.

Later, rpc_proc_exit() tries to remove /proc/net/rpc, and the warning
is logged as the directory is not empty.

Let's handle the error of nfs_fs_proc_net_init() properly.

[0]:
FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
CPU: 1 UID: 0 PID: 6120 Comm: syz.2.27 Not tainted 6.16.0-rc1-syzkaller-00010-g2c4a1f3fe03e #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
Call Trace:
 <TASK>
  dump_stack_lvl (lib/dump_stack.c:123)
 should_fail_ex (lib/fault-inject.c:73 lib/fault-inject.c:174)
 should_failslab (mm/failslab.c:46)
 kmem_cache_alloc_noprof (mm/slub.c:4178 mm/slub.c:4204)
 __proc_create (fs/proc/generic.c:427)
 proc_create_reg (fs/proc/generic.c:554)
 proc_create_net_data (fs/proc/proc_net.c:120)
 nfs_fs_proc_net_init (fs/nfs/client.c:1409)
 nfs_net_init (fs/nfs/inode.c:2600)
 ops_init (net/core/net_namespace.c:138)
 setup_net (net/core/net_namespace.c:443)
 copy_net_ns (net/core/net_namespace.c:576)
 create_new_namespaces (kernel/nsproxy.c:110)
 unshare_nsproxy_namespaces (kernel/nsproxy.c:218 (discriminator 4))
 ksys_unshare (kernel/fork.c:3123)
 __x64_sys_unshare (kernel/fork.c:3190)
 do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
 </TASK>

[1]:
remove_proc_entry: removing non-empty directory 'net/rpc', leaking at least 'nfs'
 WARNING: CPU: 1 PID: 6120 at fs/proc/generic.c:727 remove_proc_entry+0x45e/0x530 fs/proc/generic.c:727
Modules linked in:
CPU: 1 UID: 0 PID: 6120 Comm: syz.2.27 Not tainted 6.16.0-rc1-syzkaller-00010-g2c4a1f3fe03e #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
 RIP: 0010:remove_proc_entry+0x45e/0x530 fs/proc/generic.c:727
Code: 3c 02 00 0f 85 85 00 00 00 48 8b 93 d8 00 00 00 4d 89 f0 4c 89 e9 48 c7 c6 40 ba a2 8b 48 c7 c7 60 b9 a2 8b e8 33 81 1d ff 90 <0f> 0b 90 90 e9 5f fe ff ff e8 04 69 5e ff 90 48 b8 00 00 00 00 00
RSP: 0018:ffffc90003637b08 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88805f534140 RCX: ffffffff817a92c8
RDX: ffff88807da99e00 RSI: ffffffff817a92d5 RDI: 0000000000000001
RBP: ffff888033431ac0 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff888033431a00
R13: ffff888033431ae4 R14: ffff888033184724 R15: dffffc0000000000
FS:  0000555580328500(0000) GS:ffff888124a62000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f71733743e0 CR3: 000000007f618000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  sunrpc_exit_net+0x46/0x90 net/sunrpc/sunrpc_syms.c:76
  ops_exit_list net/core/net_namespace.c:200 [inline]
  ops_undo_list+0x2eb/0xab0 net/core/net_namespace.c:253
  setup_net+0x2e1/0x510 net/core/net_namespace.c:457
  copy_net_ns+0x2a6/0x5f0 net/core/net_namespace.c:574
  create_new_namespaces+0x3ea/0xa90 kernel/nsproxy.c:110
  unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:218
  ksys_unshare+0x45b/0xa40 kernel/fork.c:3121
  __do_sys_unshare kernel/fork.c:3192 [inline]
  __se_sys_unshare kernel/fork.c:3190 [inline]
  __x64_sys_unshare+0x31/0x40 kernel/fork.c:3190
  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
  do_syscall_64+0xcd/0x490 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fa1a6b8e929
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff3a090368 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
RAX: ffffffffffffffda RBX: 00007fa1a6db5fa0 RCX: 00007fa1a6b8e929
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000040000080
RBP: 00007fa1a6c10b39 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fa1a6db5fa0 R14: 00007fa1a6db5fa0 R15: 0000000000000001
 </TASK>

Fixes: d47151b79e ("nfs: expose /proc/net/sunrpc/nfs in net namespaces")
Reported-by: syzbot+a4cc4ac22daa4a71b87c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=a4cc4ac22daa4a71b87c
Tested-by: syzbot+a4cc4ac22daa4a71b87c@syzkaller.appspotmail.com
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-06-23 11:01:16 -04:00