linux-loongson

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson synced 2025-09-01 15:14:52 +00:00

Author	SHA1	Message	Date
Linus Torvalds	0a08238acf	xfs bug fixes 6.14-rc2 Signed-off-by: Carlos Maiolino <cem@kernel.org> -----BEGIN PGP SIGNATURE----- iJUEABMJAB0WIQSmtYVZ/MfVMGUq1GNcsMJ8RxYuYwUCZ6C1iQAKCRBcsMJ8RxYu Y1k+AX9qlNEvvzfcJRP7wcqjX64B0IIpuBNd86flWIYcRUug9d/vIExKvRF2A+t7 x1kc5bsBgONUBNY+xDTMAXU58MnONDIWoy9oeP/WXGaXWelN5TKlVDhHU30gXNT1 CoDNOQ/B1w== =pduU -----END PGP SIGNATURE----- Merge tag 'xfs-fixes-6.14-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs bug fixes from Carlos Maiolino: "A few fixes for XFS, but the most notable one is: - xfs: remove xfs_buf_cache.bc_lock which has been hit by different persons including syzbot" * tag 'xfs-fixes-6.14-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: remove xfs_buf_cache.bc_lock xfs: Add error handling for xfs_reflink_cancel_cow_range xfs: Propagate errors from xfs_reflink_cancel_cow_range in xfs_dax_write_iomap_end xfs: don't call remap_verify_area with sb write protection held xfs: remove an out of data comment in _xfs_buf_alloc xfs: fix the entry condition of exact EOF block allocation optimization	2025-02-03 08:51:24 -08:00
Joel Granados	1751f872cc	treewide: const qualify ctl_tables where applicable Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysctl_table and the ones calling register_net_sysctl (./net, drivers/inifiniband dirs). These are special cases as they use a registration function with a non-const qualified ctl_table argument or modify the arrays before passing them on to the registration function. Constifying ctl_table structs will prevent the modification of proc_handler function pointers as the arrays would reside in .rodata. This is made possible after commit `78eb4ea25c` ("sysctl: treewide: constify the ctl_table argument of proc_handlers") constified all the proc_handlers. Created this by running an spatch followed by a sed command: Spatch: virtual patch @ depends on !(file in "net") disable optional_qualifier @ identifier table_name != { watchdog_hardlockup_sysctl, iwcm_ctl_table, ucma_ctl_table, memory_allocation_profiling_sysctls, loadpin_sysctl_table }; @@ + const struct ctl_table table_name [] = { ... }; sed: sed --in-place \ -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \ kernel/utsname_sysctl.c Reviewed-by: Song Liu <song@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/ Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Acked-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Joel Granados <joel.granados@kernel.org>	2025-01-28 13:48:37 +01:00
Christoph Hellwig	a9ab28b3d2	xfs: remove xfs_buf_cache.bc_lock xfs_buf_cache.bc_lock serializes adding buffers to and removing them from the hashtable. But as the rhashtable code already uses fine grained internal locking for inserts and removals the extra protection isn't actually required. It also happens to fix a lock order inversion vs b_lock added by the recent lookup race fix. Fixes: `ee10f6fcdb` ("xfs: fix buffer lookup vs release race") Reported-by: Lai, Yi <yi1.lai@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-28 11:18:22 +01:00
Wentao Liang	26b63bee2f	xfs: Add error handling for xfs_reflink_cancel_cow_range In xfs_inactive(), xfs_reflink_cancel_cow_range() is called without error handling, risking unnoticed failures and inconsistent behavior compared to other parts of the code. Fix this issue by adding an error handling for the xfs_reflink_cancel_cow_range(), improving code robustness. Fixes: `6231848c3a` ("xfs: check for cow blocks before trying to clear them") Cc: stable@vger.kernel.org # v4.17 Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-27 11:42:48 +01:00
Wentao Liang	fb95897b8c	xfs: Propagate errors from xfs_reflink_cancel_cow_range in xfs_dax_write_iomap_end In xfs_dax_write_iomap_end(), directly return the result of xfs_reflink_cancel_cow_range() when !written, ensuring proper error propagation and improving code robustness. Fixes: `ea6c49b784` ("xfs: support CoW in fsdax mode") Cc: stable@vger.kernel.org # v6.0 Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-27 11:40:57 +01:00
Linus Torvalds	9c5968db9e	The various patchsets are summarized below. Plus of course many indivudual patches which are described in their changelogs. - "Allocate and free frozen pages" from Matthew Wilcox reorganizes the page allocator so we end up with the ability to allocate and free zero-refcount pages. So that callers (ie, slab) can avoid a refcount inc & dec. - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to use large folios other than PMD-sized ones. - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance and fixes for this small built-in kernel selftest. - "mas_anode_descend() related cleanup" from Wei Yang tidies up part of the mapletree code. - "mm: fix format issues and param types" from Keren Sun implements a few minor code cleanups. - "simplify split calculation" from Wei Yang provides a few fixes and a test for the mapletree code. - "mm/vma: make more mmap logic userland testable" from Lorenzo Stoakes continues the work of moving vma-related code into the (relatively) new mm/vma.c. - "mm/page_alloc: gfp flags cleanups for alloc_contig_()" from David Hildenbrand cleans up and rationalizes handling of gfp flags in the page allocator. - "readahead: Reintroduce fix for improper RA window sizing" from Jan Kara is a second attempt at fixing a readahead window sizing issue. It should reduce the amount of unnecessary reading. - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng addresses an issue where "huge" amounts of pte pagetables are accumulated (https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/). Qi's series addresses this windup by synchronously freeing PTE memory within the context of madvise(MADV_DONTNEED). - "selftest/mm: Remove warnings found by adding compiler flags" from Muhammad Usama Anjum fixes some build warnings in the selftests code when optional compiler warnings are enabled. - "mm: don't use __GFP_HARDWALL when migrating remote pages" from David Hildenbrand tightens the allocator's observance of __GFP_HARDWALL. - "pkeys kselftests improvements" from Kevin Brodsky implements various fixes and cleanups in the MM selftests code, mainly pertaining to the pkeys tests. - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to estimate application working set size. - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn provides some cleanups to memcg's hugetlb charging logic. - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song removes the global swap cgroup lock. A speedup of 10% for a tmpfs-based kernel build was demonstrated. - "zram: split page type read/write handling" from Sergey Senozhatsky has several fixes and cleaups for zram in the area of zram_write_page(). A watchdog softlockup warning was eliminated. - "move pagetable__dtor() to __tlb_remove_table()" from Kevin Brodsky cleans up the pagetable destructor implementations. A rare use-after-free race is fixed. - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes simplifies and cleans up the debugging code in the VMA merging logic. - "Account page tables at all levels" from Kevin Brodsky cleans up and regularizes the pagetable ctor/dtor handling. This results in improvements in accounting accuracy. - "mm/damon: replace most damon_callback usages in sysfs with new core functions" from SeongJae Park cleans up and generalizes DAMON's sysfs file interface logic. - "mm/damon: enable page level properties based monitoring" from SeongJae Park increases the amount of information which is presented in response to DAMOS actions. - "mm/damon: remove DAMON debugfs interface" from SeongJae Park removes DAMON's long-deprecated debugfs interfaces. Thus the migration to sysfs is completed. - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from Peter Xu cleans up and generalizes the hugetlb reservation accounting. - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino removes a never-used feature of the alloc_pages_bulk() interface. - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park extends DAMOS filters to support not only exclusion (rejecting), but also inclusion (allowing) behavior. - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi "introduces a new memory descriptor for zswap.zpool that currently overlaps with struct page for now. This is part of the effort to reduce the size of struct page and to enable dynamic allocation of memory descriptors." - "mm, swap: rework of swap allocator locks" from Kairui Song redoes and simplifies the swap allocator locking. A speedup of 400% was demonstrated for one workload. As was a 35% reduction for kernel build time with swap-on-zram. - "mm: update mips to use do_mmap(), make mmap_region() internal" from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that mmap_region() can be made MM-internal. - "mm/mglru: performance optimizations" from Yu Zhao fixes a few MGLRU regressions and otherwise improves MGLRU performance. - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae Park updates DAMON documentation. - "Cleanup for memfd_create()" from Isaac Manjarres does that thing. - "mm: hugetlb+THP folio and migration cleanups" from David Hildenbrand provides various cleanups in the areas of hugetlb folios, THP folios and migration. - "Uncached buffered IO" from Jens Axboe implements the new RWF_DONTCACHE flag which provides synchronous dropbehind for pagecache reading and writing. To permite userspace to address issues with massive buildup of useless pagecache when reading/writing fast devices. - "selftests/mm: virtual_address_range: Reduce memory" from Thomas Weißschuh fixes and optimizes some of the MM selftests. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ5a+cwAKCRDdBJ7gKXxA jtoyAP9R58oaOKPJuTizEKKXvh/RpMyD6sYcz/uPpnf+cKTZxQEAqfVznfWlw/Lz uC3KRZYhmd5YrxU4o+qjbzp9XWX/xAE= =Ib2s -----END PGP SIGNATURE----- Merge tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "The various patchsets are summarized below. Plus of course many indivudual patches which are described in their changelogs. - "Allocate and free frozen pages" from Matthew Wilcox reorganizes the page allocator so we end up with the ability to allocate and free zero-refcount pages. So that callers (ie, slab) can avoid a refcount inc & dec - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to use large folios other than PMD-sized ones - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance and fixes for this small built-in kernel selftest - "mas_anode_descend() related cleanup" from Wei Yang tidies up part of the mapletree code - "mm: fix format issues and param types" from Keren Sun implements a few minor code cleanups - "simplify split calculation" from Wei Yang provides a few fixes and a test for the mapletree code - "mm/vma: make more mmap logic userland testable" from Lorenzo Stoakes continues the work of moving vma-related code into the (relatively) new mm/vma.c - "mm/page_alloc: gfp flags cleanups for alloc_contig_()" from David Hildenbrand cleans up and rationalizes handling of gfp flags in the page allocator - "readahead: Reintroduce fix for improper RA window sizing" from Jan Kara is a second attempt at fixing a readahead window sizing issue. It should reduce the amount of unnecessary reading - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng addresses an issue where "huge" amounts of pte pagetables are accumulated: https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/ Qi's series addresses this windup by synchronously freeing PTE memory within the context of madvise(MADV_DONTNEED) - "selftest/mm: Remove warnings found by adding compiler flags" from Muhammad Usama Anjum fixes some build warnings in the selftests code when optional compiler warnings are enabled - "mm: don't use __GFP_HARDWALL when migrating remote pages" from David Hildenbrand tightens the allocator's observance of __GFP_HARDWALL - "pkeys kselftests improvements" from Kevin Brodsky implements various fixes and cleanups in the MM selftests code, mainly pertaining to the pkeys tests - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to estimate application working set size - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn provides some cleanups to memcg's hugetlb charging logic - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song removes the global swap cgroup lock. A speedup of 10% for a tmpfs-based kernel build was demonstrated - "zram: split page type read/write handling" from Sergey Senozhatsky has several fixes and cleaups for zram in the area of zram_write_page(). A watchdog softlockup warning was eliminated - "move pagetable__dtor() to __tlb_remove_table()" from Kevin Brodsky cleans up the pagetable destructor implementations. A rare use-after-free race is fixed - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes simplifies and cleans up the debugging code in the VMA merging logic - "Account page tables at all levels" from Kevin Brodsky cleans up and regularizes the pagetable ctor/dtor handling. This results in improvements in accounting accuracy - "mm/damon: replace most damon_callback usages in sysfs with new core functions" from SeongJae Park cleans up and generalizes DAMON's sysfs file interface logic - "mm/damon: enable page level properties based monitoring" from SeongJae Park increases the amount of information which is presented in response to DAMOS actions - "mm/damon: remove DAMON debugfs interface" from SeongJae Park removes DAMON's long-deprecated debugfs interfaces. Thus the migration to sysfs is completed - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from Peter Xu cleans up and generalizes the hugetlb reservation accounting - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino removes a never-used feature of the alloc_pages_bulk() interface - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park extends DAMOS filters to support not only exclusion (rejecting), but also inclusion (allowing) behavior - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi introduces a new memory descriptor for zswap.zpool that currently overlaps with struct page for now. This is part of the effort to reduce the size of struct page and to enable dynamic allocation of memory descriptors - "mm, swap: rework of swap allocator locks" from Kairui Song redoes and simplifies the swap allocator locking. A speedup of 400% was demonstrated for one workload. As was a 35% reduction for kernel build time with swap-on-zram - "mm: update mips to use do_mmap(), make mmap_region() internal" from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that mmap_region() can be made MM-internal - "mm/mglru: performance optimizations" from Yu Zhao fixes a few MGLRU regressions and otherwise improves MGLRU performance - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae Park updates DAMON documentation - "Cleanup for memfd_create()" from Isaac Manjarres does that thing - "mm: hugetlb+THP folio and migration cleanups" from David Hildenbrand provides various cleanups in the areas of hugetlb folios, THP folios and migration - "Uncached buffered IO" from Jens Axboe implements the new RWF_DONTCACHE flag which provides synchronous dropbehind for pagecache reading and writing. To permite userspace to address issues with massive buildup of useless pagecache when reading/writing fast devices - "selftests/mm: virtual_address_range: Reduce memory" from Thomas Weißschuh fixes and optimizes some of the MM selftests" * tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) mm/compaction: fix UBSAN shift-out-of-bounds warning s390/mm: add missing ctor/dtor on page table upgrade kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags() tools: add VM_WARN_ON_VMG definition mm/damon/core: use str_high_low() helper in damos_wmark_wait_us() seqlock: add missing parameter documentation for raw_seqcount_try_begin() mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh mm/page_alloc: remove the incorrect and misleading comment zram: remove zcomp_stream_put() from write_incompressible_page() mm: separate move/undo parts from migrate_pages_batch() mm/kfence: use str_write_read() helper in get_access_type() selftests/mm/mkdirty: fix memory leak in test_uffdio_copy() kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags() selftests/mm: virtual_address_range: avoid reading from VM_IO mappings selftests/mm: vm_util: split up /proc/self/smaps parsing selftests/mm: virtual_address_range: unmap chunks after validation selftests/mm: virtual_address_range: mmap() without PROT_WRITE selftests/memfd/memfd_test: fix possible NULL pointer dereference mm: add FGP_DONTCACHE folio creation flag mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue ...	2025-01-26 18:36:23 -08:00
Luiz Capitulino	6bf9b5b40a	mm: alloc_pages_bulk: rename API The previous commit removed the page_list argument from alloc_pages_bulk_noprof() along with the alloc_pages_bulk_list() function. Now that only the *_array() flavour of the API remains, we can do the following renaming (along with the _noprof() ones): alloc_pages_bulk_array -> alloc_pages_bulk alloc_pages_bulk_array_mempolicy -> alloc_pages_bulk_mempolicy alloc_pages_bulk_array_node -> alloc_pages_bulk_node Link: https://lkml.kernel.org/r/275a3bbc0be20fbe9002297d60045e67ab3d4ada.1734991165.git.luizcap@redhat.com Signed-off-by: Luiz Capitulino <luizcap@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-01-25 20:22:31 -08:00
Christoph Hellwig	f5f0ed89f1	xfs: don't call remap_verify_area with sb write protection held The XFS_IOC_EXCHANGE_RANGE ioctl with the XFS_EXCHANGE_RANGE_TO_EOF flag operates on a range bounded by the end of the file. This means the actual amount of blocks exchanged is derived from the inode size, which is only stable with the IOLOCK (i_rwsem) held. Do that, it currently calls remap_verify_area from inside the sb write protection which nests outside the IOLOCK. But this makes fsnotify_file_area_perm which is called from remap_verify_area unhappy when the kernel is built with lockdep and the recently added CONFIG_FANOTIFY_ACCESS_PERMISSIONS option. Fix this by always calling remap_verify_area before taking the write protection, and passing a 0 size to remap_verify_area similar to the FICLONE/FICLONERANGE ioctls when they are asked to clone until the file end. (Note: the size argument gets passed to fsnotify_file_area_perm, but then isn't actually used there). Fixes: `9a64d9b310` ("xfs: introduce new file range exchange ioctl") Cc: <stable@vger.kernel.org> # v6.10 Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-24 12:08:50 +01:00
Christoph Hellwig	89841b2380	xfs: remove an out of data comment in _xfs_buf_alloc There hasn't been anything like an io_length for a long time. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-24 12:06:18 +01:00
Jinliang Zheng	915175b49f	xfs: fix the entry condition of exact EOF block allocation optimization When we call create(), lseek() and write() sequentially, offset != 0 cannot be used as a judgment condition for whether the file already has extents. Furthermore, when xfs_bmap_adjacent() has not given a better blkno, it is not necessary to use exact EOF block allocation. Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-24 12:05:12 +01:00
Linus Torvalds	8883957b3c	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmePs7oACgkQnJ2qBz9k QNmHuAf9GkLnY5u1/81xP5V9ukZ4N2yeMW0dydLS5cjWj/St5ELeMAza3jeqtJtD j36vbnmy2c5pPaGLAK8BJpMXT/R2TkmmKD004zcfqF2S3SgbGzdgO1zMZzq9KJpM woRKZtLuglDajedsDEBBcKotBhlN2+C/sQlFuL1mX4zitk9ajr0qYUB1+JqOeg5f qwPsDLT077ADpxd7lVIMcm+OqbduP5KWkBKYHpn7lJcLe1eqVMMzceJroW42zhVG Dq8Iln26bbU9Wx6FSPFCUcHEzHRHUfXmu07HN9U0X++0QgWjrmBQQLooGFB/bR4a edBrPpVas6xE4/brjgFX3gOKtv8xYg== =ewDV -----END PGP SIGNATURE----- Merge tag 'fsnotify_hsm_for_v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify pre-content notification support from Jan Kara: "This introduces a new fsnotify event (FS_PRE_ACCESS) that gets generated before a file contents is accessed. The event is synchronous so if there is listener for this event, the kernel waits for reply. On success the execution continues as usual, on failure we propagate the error to userspace. This allows userspace to fill in file content on demand from slow storage. The context in which the events are generated has been picked so that we don't hold any locks and thus there's no risk of a deadlock for the userspace handler. The new pre-content event is available only for users with global CAP_SYS_ADMIN capability (similarly to other parts of fanotify functionality) and it is an administrator responsibility to make sure the userspace event handler doesn't do stupid stuff that can DoS the system. Based on your feedback from the last submission, fsnotify code has been improved and now file->f_mode encodes whether pre-content event needs to be generated for the file so the fast path when nobody wants pre-content event for the file just grows the additional file->f_mode check. As a bonus this also removes the checks whether the old FS_ACCESS event needs to be generated from the fast path. Also the place where the event is generated during page fault has been moved so now filemap_fault() generates the event if and only if there is no uptodate folio in the page cache. Also we have dropped FS_PRE_MODIFY event as current real-world users of the pre-content functionality don't really use it so let's start with the minimal useful feature set" * tag 'fsnotify_hsm_for_v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (21 commits) fanotify: Fix crash in fanotify_init(2) fs: don't block write during exec on pre-content watched files fs: enable pre-content events on supported file systems ext4: add pre-content fsnotify hook for DAX faults btrfs: disable defrag on pre-content watched files xfs: add pre-content fsnotify hook for DAX faults fsnotify: generate pre-content permission event on page fault mm: don't allow huge faults for files with pre content watches fanotify: disable readahead if we have pre-content watches fanotify: allow to set errno in FAN_DENY permission response fanotify: report file range info with pre-content events fanotify: introduce FAN_PRE_ACCESS permission event fsnotify: generate pre-content permission event on truncate fsnotify: pass optional file access range in pre-content event fsnotify: introduce pre-content permission events fanotify: reserve event bit of deprecated FAN_DIR_MODIFY fanotify: rename a misnamed constant fanotify: don't skip extra event info if no info_mode is set fsnotify: check if file is actually being watched for pre-content events on open fsnotify: opt-in for permission events at file open time ...	2025-01-23 13:36:06 -08:00
Linus Torvalds	b477ff98d9	New XFS code for 6.14 * Implement reflink support for the realtime device * Implement reverse-mapping support for the realtime device * Several bug fixes and cleanups Signed-off-by: Carlos Maiolino <cem@kernel.org> -----BEGIN PGP SIGNATURE----- iJUEABMJAB0WIQSmtYVZ/MfVMGUq1GNcsMJ8RxYuYwUCZ490YgAKCRBcsMJ8RxYu Y0ERAX9ClltZD861L93lA30CnQwqlWNTdZDu/YB0LfPw2hM6jVIp6WLJIW7kz08q jvJMD8UBf0jRKI0I5OlD2Vw60kB1GmliUPo4QlkAzy9YNLtAg8OO2mGbEIJA21oS y3rAdzPBFg== =K947 -----END PGP SIGNATURE----- Merge tag 'xfs-merge-6.14' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull XFS updates from Carlos Maiolino: "This is mostly focused on the implementation of reflink and reverse-mapping support for XFS's real-time devices. It also includes several bugfixes. - Implement reflink support for the realtime device - Implement reverse-mapping support for the realtime device - Several bug fixes and cleanups" * tag 'xfs-merge-6.14' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (121 commits) xfs: fix buffer lookup vs release race xfs: check for dead buffers in xfs_buf_find_insert xfs: add a b_iodone callback to struct xfs_buf xfs: move b_li_list based retry handling to common code xfs: simplify xfsaild_resubmit_item xfs: always complete the buffer inline in xfs_buf_submit xfs: remove the extra buffer reference in xfs_buf_submit xfs: move invalidate_kernel_vmap_range to xfs_buf_ioend xfs: simplify buffer I/O submission xfs: move in-memory buftarg handling out of _xfs_buf_ioapply xfs: move write verification out of _xfs_buf_ioapply xfs: remove xfs_buf_delwri_submit_buffers xfs: simplify xfs_buf_delwri_pushbuf xfs: move xfs_buf_iowait out of (__)xfs_buf_submit xfs: remove the incorrect comment about the b_pag field xfs: remove the incorrect comment above xfs_buf_free_maps xfs: fix a double completion for buffers on in-memory targets xfs/libxfs: replace kmalloc() and memcpy() with kmemdup() xfs: constify feature checks xfs: refactor xfs_fs_statfs ...	2025-01-23 13:06:42 -08:00
Linus Torvalds	47c9f2b3c8	vfs-6.14-rc1.statx.dio -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ4pSTwAKCRCRxhvAZXjc oiSbAQCIWp8Jm2FX9Mv+eX8erLFlyQSDQauAnqPtW/SvMbpgFgEAuZOTSRU3GSqI NiowpYms9OckO638GlNHlSTUTcV4YwU= =obHT -----END PGP SIGNATURE----- Merge tag 'vfs-6.14-rc1.statx.dio' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs direct-io updates from Christian Brauner: "File systems that write out of place usually require different alignment for direct I/O writes than what they can do for reads. Add a separate dio read align field to statx, as many out of place write file systems can easily do reads aligned to the device sector size, but require bigger alignment for writes. This is usually papered over by falling back to buffered I/O for smaller writes and doing read-modify-write cycles, but performance for this sucks, so applications benefit from knowing the actual write alignment" * tag 'vfs-6.14-rc1.statx.dio' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: xfs: report larger dio alignment for COW inodes xfs: report the correct read/write dio alignment for reflinked inodes xfs: cleanup xfs_vn_getattr fs: add STATX_DIO_READ_ALIGN fs: reformat the statx definition	2025-01-20 11:16:50 -08:00
Christoph Hellwig	ee10f6fcdb	xfs: fix buffer lookup vs release race Since commit `298f342245` ("xfs: lockless buffer lookup") the buffer lookup fastpath is done without a hash-wide lock (then pag_buf_lock, now bc_lock) and only under RCU protection. But this means that nothing serializes lookups against the temporary 0 reference count for buffers that are added to the LRU after dropping the last regular reference, and a concurrent lookup would fail to find them. Fix this by doing all b_hold modifications under b_lock. We're already doing this for release so this "only" ~ doubles the b_lock round trips. We'll later look into the lockref infrastructure to optimize the number of lock round trips again. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-16 10:19:59 +01:00
Christoph Hellwig	07eae0fa67	xfs: check for dead buffers in xfs_buf_find_insert Commit `32dd4f9c50` ("xfs: remove a superflous hash lookup when inserting new buffers") converted xfs_buf_find_insert to use rhashtable_lookup_get_insert_fast and thus an operation that returns the existing buffer when an insert would duplicate the hash key. But this code path misses the check for a buffer with a reference count of zero, which could lead to reusing an about to be freed buffer. Fix this by using the same atomic_inc_not_zero pattern as xfs_buf_insert. Fixes: `32dd4f9c50` ("xfs: remove a superflous hash lookup when inserting new buffers") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Cc: stable@vger.kernel.org # v6.0 Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-16 10:19:59 +01:00
Christoph Hellwig	4e35be63c4	xfs: add a b_iodone callback to struct xfs_buf Stop open coding the log item completions and instead add a callback into back into the submitter. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-14 11:38:15 +01:00
Christoph Hellwig	4f1aefd13e	xfs: move b_li_list based retry handling to common code The dquot and inode version are very similar, which is expected given the overall b_li_list logic. The differences are that the inode version also clears the XFS_LI_FLUSHING which is defined in common but only ever set by the inode item, and that the dquot version takes the ail_lock over the list iteration. While this seems sensible given that additions and removals from b_li_list are protected by the ail_lock, log items are only added before buffer submission, and are only removed when completing the buffer, so nothing can change the list when retrying a buffer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-14 11:38:15 +01:00
Christoph Hellwig	46eba93d4f	xfs: simplify xfsaild_resubmit_item Since commit `acc8f8628c` ("xfs: attach dquot buffer to dquot log item buffer") all buf items that use bp->b_li_list are explicitly checked for in the branch to just clears XFS_LI_FAILED. Remove the dead arm that calls xfs_clear_li_failed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-14 11:38:15 +01:00
Christoph Hellwig	819f29cc7b	xfs: always complete the buffer inline in xfs_buf_submit xfs_buf_submit now only completes a buffer on error, or for in-memory buftargs. There is no point in using a workqueue for the latter as the completion will just wake up the caller. Optimize this case by avoiding the workqueue roundtrip. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-14 11:38:15 +01:00
Christoph Hellwig	6dca5abb3d	xfs: remove the extra buffer reference in xfs_buf_submit Nothing touches the buffer after it has been submitted now, so the need for the extra transient reference went away as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-14 11:38:15 +01:00
Christoph Hellwig	5c82a471c2	xfs: move invalidate_kernel_vmap_range to xfs_buf_ioend Invalidating cache lines can be fairly expensive, so don't do it in interrupt context. Note that in practice very few setup will actually do anything here as virtually indexed caches are rather uncommon, but we might as well move the call to the proper place while touching this area. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-14 11:38:15 +01:00
Christoph Hellwig	fac69ec8cd	xfs: simplify buffer I/O submission The code in _xfs_buf_ioapply is unnecessarily complicated because it doesn't take advantage of modern bio features. Simplify it by making use of bio splitting and chaining, that is build a single bio for the pages in the buffer using a simple loop, and then split that bio on the map boundaries for discontiguous multi-FSB buffers and chain the split bios to the main one so that there is only a single I/O completion. This not only simplifies the code to build the buffer, but also removes the need for the b_io_remaining field as buffer ownership is granted to the bio on submit of the final bio with no chance for a completion before that as well as the b_io_error field that is now superfluous because there always is exactly one completion. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-14 11:38:15 +01:00
Christoph Hellwig	8db65d312b	xfs: move in-memory buftarg handling out of _xfs_buf_ioapply No I/O to apply for in-memory buffers, so skip the function call entirely. Clean up the b_io_error initialization logic to allow for this. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-14 11:38:14 +01:00
Christoph Hellwig	0195647aba	xfs: move write verification out of _xfs_buf_ioapply Split the write verification logic out of _xfs_buf_ioapply into a new xfs_buf_verify_write helper called by xfs_buf_submit given that it isn't about applying the I/O and doesn't really fit in with the rest of _xfs_buf_ioapply. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-14 11:38:14 +01:00
Christoph Hellwig	72842dbc2b	xfs: remove xfs_buf_delwri_submit_buffers xfs_buf_delwri_submit_buffers has two callers for synchronous and asynchronous writes that share very little logic. Split out a helper for the shared per-buffer loop and otherwise open code the submission in the two callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-14 11:38:14 +01:00
Christoph Hellwig	eb43b0b5ca	xfs: simplify xfs_buf_delwri_pushbuf xfs_buf_delwri_pushbuf synchronously writes a buffer that is on a delwri list already. Instead of doing a complicated dance with the delwri and wait list, just leave them alone and open code the actual buffer write. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-14 11:38:14 +01:00
Christoph Hellwig	05b5968f33	xfs: move xfs_buf_iowait out of (__)xfs_buf_submit There is no good reason to pass a bool argument to wait for a buffer when the callers that want that can easily just wait themselves. This means the wait moves out of the extra hold of the buffer, but as the callers of synchronous buffer I/O need to hold a reference anyway that is perfectly fine. Because all async buffer submitters ignore the error return value, and the synchronous ones catch the error condition through b_error and xfs_buf_iowait this also means the new xfs_buf_submit doesn't have to return an error code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-14 11:38:14 +01:00
Christoph Hellwig	411ff3f738	xfs: remove the incorrect comment about the b_pag field The rbtree root is long gone. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-14 11:38:14 +01:00
Christoph Hellwig	83e9c69dcf	xfs: remove the incorrect comment above xfs_buf_free_maps The comment above xfs_buf_free_maps talks about fields not even used in the function and also doesn't add any other value. Remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-14 11:38:14 +01:00
Christoph Hellwig	cbd6883ed8	xfs: fix a double completion for buffers on in-memory targets __xfs_buf_submit calls xfs_buf_ioend when b_io_remaining hits zero. For in-memory buftargs b_io_remaining is never incremented from it's initial value of 1, so this always happens. Thus the extra call to xfs_buf_ioend in _xfs_buf_ioapply causes a double completion. Fortunately __xfs_buf_submit is only used for synchronous reads on in-memory buftargs due to the peculiarities of how they work, so this is mostly harmless and just causes a little extra work to be done. Fixes: `5076a6040c` ("xfs: support in-memory buffer cache targets") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-14 11:38:14 +01:00
Mirsad Todorovac	9d9b724726	xfs/libxfs: replace kmalloc() and memcpy() with kmemdup() The source static analysis tool gave the following advice: ./fs/xfs/libxfs/xfs_dir2.c:382:15-22: WARNING opportunity for kmemdup → 382 args->value = kmalloc(len, 383 GFP_KERNEL \| __GFP_NOLOCKDEP \| __GFP_RETRY_MAYFAIL); 384 if (!args->value) 385 return -ENOMEM; 386 → 387 memcpy(args->value, name, len); 388 args->valuelen = len; 389 return -EEXIST; Replacing kmalloc() + memcpy() with kmemdump() doesn't change semantics. Original code works without fault, so this is not a bug fix but proposed improvement. Link: https://lwn.net/Articles/198928/ Fixes: `94a69db236` ("xfs: use __GFP_NOLOCKDEP instead of GFP_NOFS") Fixes: `384f3ced07` ("[XFS] Return case-insensitive match for dentry cache") Fixes: `2451337dd0` ("xfs: global error sign conversion") Cc: Carlos Maiolino <cem@kernel.org> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Chandan Babu R <chandanbabu@kernel.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: linux-xfs@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Mirsad Todorovac <mtodorovac69@gmail.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-13 14:58:04 +01:00
Christoph Hellwig	183d988ae9	xfs: constify feature checks They will eventually be needed to be const for zoned growfs, but even now having such simpler helpers as const as possible is a good thing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-13 14:57:08 +01:00
Christoph Hellwig	dd324cb79e	xfs: refactor xfs_fs_statfs Split out helpers for data, rt data and inode related information, and assigning f_bavail once instead of in three places. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-13 14:56:30 +01:00
Christoph Hellwig	72843ca624	xfs: don't take m_sb_lock in xfs_fs_statfs The only non-constant value read under m_sb_lock in xfs_fs_statfs is sb_dblocks, and it could become stale right after dropping the lock anyway. Remove the thus pointless lock section. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-13 14:56:24 +01:00
Christoph Hellwig	f4752daf47	xfs: fix the comment above xfs_discard_endio pagb_lock has been replaced with eb_lock. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-13 14:56:20 +01:00
Long Li	09f7680dea	xfs: remove bp->b_error check in xfs_attr3_root_inactive The b_error check right after xfs_trans_get_buf() is redundant: 1) If the buffer is found in transaction via xfs_trans_buf_item_match(), any corrupted metadata error would have already been exposed during previous reads like xfs_da3_node_read(). 2) If the buffer is obtained via xfs_buf_get_map(): - It's called without XBF_READ flag, so won't return buffer with b_error set, since xfs_buf_get_map() will clear it anyway. - Buffer found in cache normally won't have error since previous reads had checked it, unless someone corrupts the buffer and the AIL pushes it out to disk while the buffer's unlocked. But in this case, AIL will shut down the log. Remove this redundant check to simplify the code, make the code consistent with most other xfs_trans_get_buf() callers in XFS. Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-13 14:56:15 +01:00
Long Li	adcaff355b	xfs: remove redundant update for ticket->t_curr_res in xfs_log_ticket_regrant The current reservation of the log ticket has already been updated a few lines above in xfs_log_ticket_regrant(), so there is no need to update it again. This is just a code cleanup with no functional changes. Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-13 14:56:09 +01:00
Long Li	99fc33d16b	xfs: clean up xfs_end_ioend() to reuse local variables Use already initialized local variables 'offset' and 'size' instead of accessing ioend members directly in xfs_setfilesize() call. This is just a code cleanup with no functional changes. Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-13 14:56:02 +01:00
Long Li	efebe42d95	xfs: fix mount hang during primary superblock recovery failure When mounting an image containing a log with sb modifications that require log replay, the mount process hang all the time and stack as follows: [root@localhost ~]# cat /proc/557/stack [<0>] xfs_buftarg_wait+0x31/0x70 [<0>] xfs_buftarg_drain+0x54/0x350 [<0>] xfs_mountfs+0x66e/0xe80 [<0>] xfs_fs_fill_super+0x7f1/0xec0 [<0>] get_tree_bdev_flags+0x186/0x280 [<0>] get_tree_bdev+0x18/0x30 [<0>] xfs_fs_get_tree+0x1d/0x30 [<0>] vfs_get_tree+0x2d/0x110 [<0>] path_mount+0xb59/0xfc0 [<0>] do_mount+0x92/0xc0 [<0>] __x64_sys_mount+0xc2/0x160 [<0>] x64_sys_call+0x2de4/0x45c0 [<0>] do_syscall_64+0xa7/0x240 [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e During log recovery, while updating the in-memory superblock from the primary SB buffer, if an error is encountered, such as superblock corruption occurs or some other reasons, we will proceed to out_release and release the xfs_buf. However, this is insufficient because the xfs_buf's log item has already been initialized and the xfs_buf is held by the buffer log item as follows, the xfs_buf will not be released, causing the mount thread to hang. xlog_recover_do_primary_sb_buffer xlog_recover_do_reg_buffer xlog_recover_validate_buf_type xfs_buf_item_init(bp, mp) The solution is straightforward, we simply need to allow it to be handled by the normal buffer write process. The filesystem will be shutdown before the submission of buffer_list in xlog_do_recovery_pass(), ensuring the correct release of the xfs_buf as follows: xlog_do_recovery_pass error = xlog_recover_process xlog_recover_process_data xlog_recover_process_ophdr xlog_recovery_process_trans ... xlog_recover_buf_commit_pass2 error = xlog_recover_do_primary_sb_buffer //Encounter error and return if (error) goto out_writebuf ... out_writebuf: xfs_buf_delwri_queue(bp, buffer_list) //add bp to list return error ... if (!list_empty(&buffer_list)) if (error) xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); //shutdown first xfs_buf_delwri_submit(&buffer_list); //submit buffers in list __xfs_buf_submit if (bp->b_mount->m_log && xlog_is_shutdown(bp->b_mount->m_log)) xfs_buf_ioend_fail(bp) //release bp correctly Fixes: `6a18765b54` ("xfs: update the file system geometry after recoverying superblock buffers") Cc: stable@vger.kernel.org # v6.12 Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-13 14:55:37 +01:00
Christoph Hellwig	471511d6ef	xfs: remove the t_magic field in struct xfs_trans The t_magic field is only ever assigned to, but never read. Remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-13 14:55:19 +01:00
Christoph Hellwig	415dee1e06	xfs: remove XFS_ILOG_NONCORE XFS_ILOG_NONCORE is not used in the kernel code or xfsprogs, remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-13 14:55:14 +01:00
Christoph Hellwig	23ebf63925	xfs: mark xfs_dir_isempty static And return bool instead of a boolean condition as int. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-13 14:55:06 +01:00
Carlos Maiolino	156d1c389c	xfs: reflink on the realtime device [v6.2 05/14] This patchset enables use of the file data block sharing feature (i.e. reflink) on the realtime device. It follows the same basic sequence as the realtime rmap series -- first a few cleanups; then introduction of the new btree format and inode fork format. Next comes enabling CoW and remapping for the rt device; new scrub, repair, and health reporting code; and at the end we implement some code to lengthen write requests so that rt extents are always CoWed fully. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZ2nQ6gAKCRBKO3ySh0YR phxFAQDsng0DbICm7bQPkjOxCG8SGMv0DRIABYe7kmk4NJuZkQEA/mHiDSA6CLg+ bmqS9dfMRMSG/hl+9+Vp3e3CsZgMxQ4= =dWPC -----END PGP SIGNATURE----- Merge tag 'realtime-reflink_2024-12-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into for-next xfs: reflink on the realtime device [v6.2 05/14] This patchset enables use of the file data block sharing feature (i.e. reflink) on the realtime device. It follows the same basic sequence as the realtime rmap series -- first a few cleanups; then introduction of the new btree format and inode fork format. Next comes enabling CoW and remapping for the rt device; new scrub, repair, and health reporting code; and at the end we implement some code to lengthen write requests so that rt extents are always CoWed fully. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>	2025-01-13 14:54:52 +01:00
Carlos Maiolino	a938bbe473	xfs: realtime reverse-mapping support [v6.2 04/14] This is the latest revision of a patchset that adds to XFS kernel support for reverse mapping for the realtime device. This time around I've fixed some of the bitrot that I've noticed over the past few months, and most notably have converted rtrmapbt to use the metadata inode directory feature instead of burning more space in the superblock. At the beginning of the set are patches to implement storing B+tree leaves in an inode root, since the realtime rmapbt is rooted in an inode, unlike the regular rmapbt which is rooted in an AG block. Prior to this, the only btree that could be rooted in the inode fork was the block mapping btree; if all the extent records fit in the inode, format would be switched from 'btree' to 'extents'. The next few patches enhance the reverse mapping routines to handle the parts that are specific to rtgroups -- adding the new btree type, adding a new log intent item type, and wiring up the metadata directory tree entries. Finally, implement GETFSMAP with the rtrmapbt and scrub functionality for the rtrmapbt and rtbitmap and online fsck functionality. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZ2nQ6gAKCRBKO3ySh0YR puNFAP4j44Z3j+WEMSONKXLgl3Goo7Xahm/teGTzSN42l7lUtQEA5zhKcvDgiNIe KXxtJcRmHCqq8qaxqcnM6vlcHok92wM= =XjiL -----END PGP SIGNATURE----- Merge tag 'realtime-rmap_2024-12-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into for-next xfs: realtime reverse-mapping support [v6.2 04/14] This is the latest revision of a patchset that adds to XFS kernel support for reverse mapping for the realtime device. This time around I've fixed some of the bitrot that I've noticed over the past few months, and most notably have converted rtrmapbt to use the metadata inode directory feature instead of burning more space in the superblock. At the beginning of the set are patches to implement storing B+tree leaves in an inode root, since the realtime rmapbt is rooted in an inode, unlike the regular rmapbt which is rooted in an AG block. Prior to this, the only btree that could be rooted in the inode fork was the block mapping btree; if all the extent records fit in the inode, format would be switched from 'btree' to 'extents'. The next few patches enhance the reverse mapping routines to handle the parts that are specific to rtgroups -- adding the new btree type, adding a new log intent item type, and wiring up the metadata directory tree entries. Finally, implement GETFSMAP with the rtrmapbt and scrub functionality for the rtrmapbt and rtbitmap and online fsck functionality. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>	2025-01-13 14:54:41 +01:00
Carlos Maiolino	8a092f440e	xfs: enable in-core block reservation for rt metadata [v6.2 03/14] In preparation for adding reverse mapping and refcounting to the realtime device, enhance the metadir code to reserve free space for btree shape changes as delayed allocation blocks. This enables us to pre-allocate space for the rmap and refcount btrees in the same manner as we do for the data device counterparts, which is how we avoid ENOSPC failures when space is low but we've already committed to a COW operation. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZ2nQ6gAKCRBKO3ySh0YR pluAAP9u4A74HNTi7Df4C70diZIpXynSnyzAxVirQzda/WlUzAD+OjZEo7VkLgG4 AJIqspcX632tFBFhBAhf3XzqWlM+zws= =hOw3 -----END PGP SIGNATURE----- Merge tag 'reserve-rt-metadata-space_2024-12-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into for-next xfs: enable in-core block reservation for rt metadata [v6.2 03/14] In preparation for adding reverse mapping and refcounting to the realtime device, enhance the metadir code to reserve free space for btree shape changes as delayed allocation blocks. This enables us to pre-allocate space for the rmap and refcount btrees in the same manner as we do for the data device counterparts, which is how we avoid ENOSPC failures when space is low but we've already committed to a COW operation. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>	2025-01-13 14:54:33 +01:00
Carlos Maiolino	9a2ce7254c	xfs: refactor btrees to support records in inode root [v6.2 02/14] Amend the btree code to support storing btree rcords in the inode root, because the current bmbt code does not support this. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZ2nQ6QAKCRBKO3ySh0YR piSBAP0dmPfWwKtdIx1AQUmfQ4FeNQbSk5PTKLuXIjyfUug9QwEAu7L616xtrgt6 6/9CBlIUNp3gJriZzLbCjS72F7hssgQ= =Jk+2 -----END PGP SIGNATURE----- Merge tag 'btree-ifork-records_2024-12-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into for-next xfs: refactor btrees to support records in inode root [v6.2 02/14] Amend the btree code to support storing btree rcords in the inode root, because the current bmbt code does not support this. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>	2025-01-13 14:54:23 +01:00
Carlos Maiolino	69bf6cd7f3	xfs: bug fixes for 6.13 [01/14] Bug fixes for 6.13. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZ2nQ6QAKCRBKO3ySh0YR pjVrAQD9ty5VSpobCR0HRKImR1zCvjreaTzDty+XXnCrYVka1AEA1NbC53RxAPoy bW5GJ0ZEvQhamkx9YuBfiJXoqYw1jgI= =ZmFb -----END PGP SIGNATURE----- Merge tag 'xfs-6.13-fixes_2024-12-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into for-next xfs: bug fixes for 6.13 [01/14] Bug fixes for 6.13. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>	2025-01-13 14:54:14 +01:00
Darrick J. Wong	111d36d627	xfs: lock dquot buffer before detaching dquot from b_li_list We have to lock the buffer before we can delete the dquot log item from the buffer's log item list. Cc: stable@vger.kernel.org # v6.13-rc3 Fixes: `acc8f8628c` ("xfs: attach dquot buffer to dquot log item buffer") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-10 10:12:48 +01:00
Christoph Hellwig	468210ec76	xfs: report larger dio alignment for COW inodes For I/O to reflinked blocks we always need to write an entire new file system block, and the code enforces the file system block alignment for the entire file if it has any reflinked blocks. Mirror the larger value reported in the statx in the dio_offset_align in the xfs-specific XFS_IOC_DIOINFO ioctl for the same reason. Don't bother adding a new field for the read alignment to this legacy ioctl as all new users should use statx instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250109083109.1441561-6-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-01-09 16:23:18 +01:00
Christoph Hellwig	7422bbd030	xfs: report the correct read/write dio alignment for reflinked inodes For I/O to reflinked blocks we always need to write an entire new file system block, and the code enforces the file system block alignment for the entire file if it has any reflinked blocks. Use the new STATX_DIO_READ_ALIGN flag to report the asymmetric read vs write alignments for reflinked files. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250109083109.1441561-5-hch@lst.de Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-01-09 16:23:18 +01:00
Christoph Hellwig	7e17483c7b	xfs: cleanup xfs_vn_getattr Split the two bits of optional statx reporting into their own helpers so that they are self-contained instead of deeply indented in the main getattr handler. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250109083109.1441561-4-hch@lst.de Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-01-09 16:23:17 +01:00
Christoph Hellwig	7ee7c9b39e	xfs: don't return an error from xfs_update_last_rtgroup_size for !XFS_RT Non-rtg file systems have a fake RT group even if they do not have a RT device, and thus an rgcount of 1. Ensure xfs_update_last_rtgroup_size doesn't fail when called for !XFS_RT to handle this case. Fixes: `87fe4c34a3` ("xfs: create incore realtime group structures") Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-01-08 15:04:40 +01:00
Darrick J. Wong	155debbe7e	xfs: enable realtime reflink Enable reflink for realtime devices, as long as the realtime allocation unit is a single fsblock. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:17 -08:00
Darrick J. Wong	fd97fe1112	xfs: fix CoW forks for realtime files Port the copy on write fork repair to realtime files. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:17 -08:00
Darrick J. Wong	12f4d20328	xfs: check for shared rt extents when rebuilding rt file's data fork When we're rebuilding the data fork of a realtime file, we need to cross-reference each mapping with the rt refcount btree to ensure that the reflink flag is set if there are any shared extents found. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:16 -08:00
Darrick J. Wong	92b2019493	xfs: repair inodes that have a refcount btree in the data fork Plumb knowledge of refcount btrees into the inode core repair code. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:16 -08:00
Darrick J. Wong	83ccffc489	xfs: online repair of the realtime refcount btree Port the data device's refcount btree repair code to the realtime refcount btree. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:16 -08:00
Darrick J. Wong	fe2efe9508	xfs: capture realtime CoW staging extents when rebuilding rt rmapbt Walk the realtime refcount btree to find the CoW staging extents when we're rebuilding the realtime rmap btree. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:16 -08:00
Darrick J. Wong	477493082f	xfs: walk the rt reference count tree when rebuilding rmap When we're rebuilding the data device rmap, if we encounter a "refcount" format fork, we have to walk the (realtime) refcount btree inode to build the appropriate mappings. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:16 -08:00
Darrick J. Wong	6470ceef32	xfs: check new rtbitmap records against rt refcount btree When we're rebuilding the realtime bitmap, check the proposed free extents against the rt refcount btree to make sure we don't commit any grievous errors. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:16 -08:00
Darrick J. Wong	cca34a3054	xfs: don't flag quota rt block usage on rtreflink filesystems Quota space usage is allowed to exceed the size of the physical storage when reflink is enabled. Now that we have reflink for the realtime volume, apply this same logic to the rtb repair logic. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:15 -08:00
Darrick J. Wong	ca757af07f	xfs: scrub the metadir path of rt refcount btree files Add a new XFS_SCRUB_METAPATH subtype so that we can scrub the metadata directory tree path to the refcount btree file for each rt group. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:15 -08:00
Darrick J. Wong	a9600db96f	xfs: detect and repair misaligned rtinherit directory cowextsize hints If we encounter a directory that has been configured to pass on a CoW extent size hint to a new realtime file and the hint isn't an integer multiple of the rt extent size, we should flag the hint for administrative review and/or turn it off because that is a misconfiguration. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:15 -08:00
Darrick J. Wong	48bc170f2c	xfs: allow dquot rt block count to exceed rt blocks on reflink fs Update the quota scrubber to allow dquots where the realtime block count exceeds the block count of the rt volume if reflink is enabled. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:15 -08:00
Darrick J. Wong	30f47950dc	xfs: check reference counts of gaps between rt refcount records If there's a gap between records in the rt refcount btree, we ought to cross-reference the gap with the rtrmap records to make sure that there aren't any overlapping records for a region that doesn't have any shared ownership. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:15 -08:00
Darrick J. Wong	2d9a3e9805	xfs: allow overlapping rtrmapbt records for shared data extents Allow overlapping realtime reverse mapping records if they both describe shared data extents and the fs supports reflink on the realtime volume. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:15 -08:00
Darrick J. Wong	91683bb3f2	xfs: cross-reference checks with the rt refcount btree Use the realtime refcount btree to implement cross-reference checks in other data structures. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:14 -08:00
Darrick J. Wong	c27929670d	xfs: scrub the realtime refcount btree Add code to scrub realtime refcount btrees. Similar to the refcount btree checking code for the data device, we walk the rmap btree for each refcount record to confirm that the reference counts are correct. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:14 -08:00
Darrick J. Wong	026c8ed8d4	xfs: report realtime refcount btree corruption errors to the health system Whenever we encounter corrupt realtime refcount btree blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:14 -08:00
Darrick J. Wong	88a70768df	xfs: check that the rtrefcount maxlevels doesn't increase when growing fs The size of filesystem transaction reservations depends on the maximum height (maxlevels) of the realtime btrees. Since we don't want a grow operation to increase the reservation size enough that we'll fail the minimum log size checks on the next mount, constrain growfs operations if they would cause an increase in the rt refcount btree maxlevels. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:14 -08:00
Darrick J. Wong	8e84e8052b	xfs: enable extent size hints for CoW operations Wire up the copy-on-write extent size hint for realtime files, and connect it to the rt allocator so that we avoid fragmentation on rt filesystems. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:14 -08:00
Darrick J. Wong	4de1a7ba41	xfs: apply rt extent alignment constraints to CoW extsize hint The copy-on-write extent size hint is subject to the same alignment constraints as the regular extent size hint. Since we're in the process of adding reflink (and therefore CoW) to the realtime device, we must apply the same scattered rextsize alignment validation strategies to both hints to deal with the possibility of rextsize changing. Therefore, fix the inode validator to perform rextsize alignment checks on regular realtime files, and to remove misaligned directory hints. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:14 -08:00
Darrick J. Wong	6853d23bad	xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files Currently, we (ab)use xfs_get_extsz_hint so that it always returns a nonzero value for realtime files. This apparently was done to disable delayed allocation for realtime files. However, once we enable realtime reflink, we can also turn on the alwayscow flag to force CoW writes to realtime files. In this case, the logic will incorrectly send the write through the delalloc write path. Fix this by adjusting the logic slightly. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:13 -08:00
Darrick J. Wong	51e2326749	xfs: recover CoW leftovers in the realtime volume Scan the realtime refcount tree at mount time to get rid of leftover CoW staging extents. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:13 -08:00
Darrick J. Wong	c3d3605f96	xfs: allow inodes to have the realtime and reflink flags Now that we can share blocks between realtime files, allow this combination. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:13 -08:00
Darrick J. Wong	5519251da0	xfs: enable sharing of realtime file blocks Update the remapping routines to be able to handle realtime files. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:13 -08:00
Darrick J. Wong	26e97d9b4b	xfs: enable CoW for realtime data Update our write paths to support copy on write on the rt volume. This works in more or less the same way as it does on the data device, with the major exception that we never do delalloc on the rt volume. Because we consider unwritten CoW fork staging extents to be incore quota reservation, we update xfs_quota_reserve_blkres to support this case. Though xfs doesn't allow rt and quota together, the change is trivial and we shouldn't leave a logic bomb here. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:13 -08:00
Darrick J. Wong	3639c63d46	xfs: refactor reflink quota updates Hoist all quota updates for reflink into a helper function, since things are about to become more complicated. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:13 -08:00
Darrick J. Wong	c2694ff678	xfs: compute rtrmap btree max levels when reflink enabled Compute the maximum possible height of the realtime rmap btree when reflink is enabled. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:12 -08:00
Darrick J. Wong	0bada82331	xfs: update rmap to allow cow staging extents in the rt rmap Don't error out on CoW staging extent records when realtime reflink is enabled. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:12 -08:00
Darrick J. Wong	4ee3113aaf	xfs: create routine to allocate and initialize a realtime refcount btree inode Create a library routine to allocate and initialize an empty realtime refcountbt inode. We'll use this for growfs, mkfs, and repair. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:12 -08:00
Darrick J. Wong	e5a171729b	xfs: wire up realtime refcount btree cursors Wire up realtime refcount btree cursors wherever they're needed throughout the code base. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:12 -08:00
Christoph Hellwig	4e87047539	xfs: refactor xfs_reflink_find_shared Move lookup of the perag structure from the callers into the helpers, and return the offset into the extent of the shared region instead of the block number that needs post-processing. This prepares the callsites for the creation of an rt-specific variant in the next patch. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> [djwong: port to the middle of the rtreflink series for cleanliness] Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>	2024-12-23 13:06:12 -08:00
Darrick J. Wong	f0415af60f	xfs: wire up a new metafile type for the realtime refcount Plumb in the pieces we need to embed the root of the realtime refcount btree in an inode's data fork, complete with metafile type and on-disk interpretation functions. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:12 -08:00
Darrick J. Wong	bf0b994113	xfs: add metadata reservations for realtime refcount btree Reserve some free blocks so that we will always have enough free blocks in the data volume to handle expansion of the realtime refcount btree. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:11 -08:00
Darrick J. Wong	eaed472c40	xfs: add realtime refcount btree inode to metadata directory Add a metadir path to select the realtime refcount btree inode and load it at mount time. The rtrefcountbt inode will have a unique extent format code, which means that we also have to update the inode validation and flush routines to look for it. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:11 -08:00
Darrick J. Wong	e08d0f2004	xfs: add realtime refcount btree block detection to log recovery Identify rt refcount btree blocks in the log correctly so that we can validate them during log recovery. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:11 -08:00
Darrick J. Wong	ee6d434479	xfs: support recovering refcount intent items targetting realtime extents Now that we have reflink on the realtime device, refcount intent items have to support remapping extents on the realtime volume. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:11 -08:00
Darrick J. Wong	fd9300679c	xfs: add a realtime flag to the refcount update log redo items Extend the refcount update (CUI) log items with a new realtime flag that indicates that the updates apply against the realtime refcountbt. We'll wire up the actual refcount code later. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:11 -08:00
Darrick J. Wong	01cef1db24	xfs: prepare refcount functions to deal with rtrefcountbt Prepare the high-level refcount functions to deal with the new realtime refcountbt and its slightly different conventions. Provide the ability to talk to either refcountbt or rtrefcountbt formats from the same high level code. Note that we leave the _recover_cow_leftovers functions for a separate patch so that we can convert it all at once. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:10 -08:00
Darrick J. Wong	1a6f88ea53	xfs: add realtime refcount btree operations Implement the generic btree operations needed to manipulate rtrefcount btree blocks. This is different from the regular refcountbt in that we allocate space from the filesystem at large, and are neither constrained to the free space nor any particular AG. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:10 -08:00
Darrick J. Wong	2003c6a875	xfs: realtime refcount btree transaction reservations Make sure that there's enough log reservation to handle mapping and unmapping realtime extents. We have to reserve enough space to handle a split in the rtrefcountbt to add the record and a second split in the regular refcountbt to record the rtrefcountbt split. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:10 -08:00
Darrick J. Wong	9abe03a0e4	xfs: introduce realtime refcount btree ondisk definitions Add the ondisk structure definitions for realtime refcount btrees. The realtime refcount btree will be rooted from a hidden inode so it needs to have a separate btree block magic and pointer format. Next, add everything needed to read, write and manipulate refcount btree blocks. This prepares the way for connecting the btree operations implementation, though the changes to actually root the rtrefcount btree in an inode come later. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:10 -08:00
Darrick J. Wong	70fcf68665	xfs: namespace the maximum length/refcount symbols Actually namespace these variables properly, so that readers can tell that this is an XFS symbol, and that it's for the refcount functionality. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:10 -08:00
Darrick J. Wong	0d89af530c	xfs: prepare refcount btree cursor tracepoints for realtime Rework the refcount btree cursor tracepoints in preparation to handle the realtime refcount btree cursor. Mostly this involves renaming the field to "refcbno" and extracting the group number from the cursor when possible. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:10 -08:00
Darrick J. Wong	c2358439af	xfs: enable realtime rmap btree Permit mounting filesystems with realtime rmap btrees. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:09 -08:00
Darrick J. Wong	799e7e6566	xfs: react to fsdax failure notifications on the rt device Now that we have reverse mapping for the realtime device, use the information to kill processes that have mappings to bad pmem. This requires refactoring the existing routines to handle rtgroups or AGs; and splitting out the translation function to improve cohesion. Also make a proper header file for the dax holder ops. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:09 -08:00
Darrick J. Wong	f4ed930379	xfs: don't shut down the filesystem for media failures beyond end of log If the filesystem has an external log device on pmem and the pmem reports a media error beyond the end of the log area, don't shut down the filesystem because we don't use that space. Cc: <stable@vger.kernel.org> # v6.0 Fixes: `6f643c57d5` ("xfs: implement ->notify_failure() for XFS") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:09 -08:00
Darrick J. Wong	9515572be6	xfs: hook live realtime rmap operations during a repair operation Hook the regular realtime rmap code when an rtrmapbt repair operation is running so that we can unlock the AGF buffer to scan the filesystem and keep the in-memory btree up to date during the scan. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:09 -08:00
Darrick J. Wong	4a61f12eb1	xfs: create a shadow rmap btree during realtime rmap repair Create an in-memory btree of rmap records instead of an array. This enables us to do live record collection instead of freezing the fs. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:09 -08:00
Darrick J. Wong	6a849bd81b	xfs: online repair of the realtime rmap btree Repair the realtime rmap btree while mounted. Similar to the regular rmap btree repair code, we walk the data fork mappings of every realtime file in the filesystem to collect reverse-mapping records in an xfarray. Then we sort the xfarray, and use the btree bulk loader to create a new rtrmap btree ondisk. Finally, we swap the btree roots, and reap the old blocks in the usual way. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:09 -08:00
Darrick J. Wong	c6904f6788	xfs: support repairing metadata btrees rooted in metadir inodes Adapt the repair code so that we can stage a new btree in the data fork area of a metadir inode and reap the old blocks. We already have nearly all of the infrastructure; the only parts that were missing were the metadata inode reservation handling. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:08 -08:00
Darrick J. Wong	8defee8dff	xfs: online repair of realtime bitmaps for a realtime group For a given rt group, regenerate the bitmap contents from the group's realtime rmap btree. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:08 -08:00
Darrick J. Wong	3dd3aba6b9	xfs: repair rmap btree inodes Teach the inode repair code how to deal with realtime rmap btree inodes that won't load properly. This is most likely moot since the filesystem generally won't mount without the rtrmapbt inodes being usable, but we'll add this for completeness. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:08 -08:00
Darrick J. Wong	1bd0843027	xfs: repair inodes that have realtime extents Plumb into the inode core repair code the ability to search for extents on realtime devices. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:08 -08:00
Darrick J. Wong	f1a6d9b4c3	xfs: online repair of realtime file bmaps Now that we have a reverse-mapping index of the realtime device, we can rebuild the data fork forward-mappings of any realtime file. Enhance the existing bmbt repair code to walk the rtrmap btrees to gather this information. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:08 -08:00
Darrick J. Wong	2e0629e17c	xfs: walk the rt reverse mapping tree when rebuilding rmap When we're rebuilding the data device rmap, if we encounter an "rmap" format fork, we have to walk the (realtime) rmap btree inode to build the appropriate mappings. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:08 -08:00
Darrick J. Wong	366243cc99	xfs: scrub the metadir path of rt rmap btree files Add a new XFS_SCRUB_METAPATH subtype so that we can scrub the metadata directory tree path to the rmap btree file for each rt group. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:07 -08:00
Darrick J. Wong	a5542712f9	xfs: scan rt rmap when we're doing an intense rmap check of bmbt mappings Teach the bmbt scrubber how to perform a comprehensive check that the rmapbt does not contain /any/ mappings that are not described by bmbt records when it's dealing with a realtime file. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:07 -08:00
Darrick J. Wong	037a44d827	xfs: cross-reference the realtime rmapbt Teach the data fork and realtime bitmap scrubbers to cross-reference information with the realtime rmap btree. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:07 -08:00
Darrick J. Wong	1ebecab5ad	xfs: cross-reference realtime bitmap to realtime rmapbt scrubber When we're checking the realtime rmap btree entries, cross-reference those entries with the realtime bitmap too. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:07 -08:00
Darrick J. Wong	9a6cc4f6d0	xfs: scrub the realtime rmapbt Check the realtime reverse mapping btree against the rtbitmap, and modify the rtbitmap scrub to check against the rtrmapbt. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:07 -08:00
Darrick J. Wong	428e488465	xfs: allow queued realtime intents to drain before scrubbing When a writer thread executes a chain of log intent items for the realtime volume, the ILOCKs taken during each step are for each rt metadata file, not the entire rt volume itself. Although scrub takes all rt metadata ILOCKs, this isn't sufficient to guard against scrub checking the rt volume while that writer thread is in the middle of finishing a chain because there's no higher level locking primitive guarding the realtime volume. When there's a collision, cross-referencing between data structures (e.g. rtrmapbt and rtrefcountbt) yields false corruption events; if repair is running, this results in incorrect repairs, which is catastrophic. Fix this by adding to the mount structure the same drain that we use to protect scrub against concurrent AG updates, but this time for the realtime volume. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:06 -08:00
Darrick J. Wong	6d4933c221	xfs: report realtime rmap btree corruption errors to the health system Whenever we encounter corrupt realtime rmap btree blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:06 -08:00
Darrick J. Wong	59a57acbce	xfs: check that the rtrmapbt maxlevels doesn't increase when growing fs The size of filesystem transaction reservations depends on the maximum height (maxlevels) of the realtime btrees. Since we don't want a grow operation to increase the reservation size enough that we'll fail the minimum log size checks on the next mount, constrain growfs operations if they would cause an increase in those maxlevels. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:06 -08:00
Darrick J. Wong	b3683c74bf	xfs: wire up getfsmap to the realtime reverse mapping btree Connect the getfsmap ioctl to the realtime rmapbt. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:06 -08:00
Darrick J. Wong	71b8acb42b	xfs: create routine to allocate and initialize a realtime rmap btree inode Create a library routine to allocate and initialize an empty realtime rmapbt inode. We'll use this for mkfs and repair. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:06 -08:00
Darrick J. Wong	609a592865	xfs: wire up rmap map and unmap to the realtime rmapbt Connect the map and unmap reverse-mapping operations to the realtime rmapbt via the deferred operation callbacks. This enables us to perform rmap operations against the correct btree. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:06 -08:00
Darrick J. Wong	f33659e8a1	xfs: wire up a new metafile type for the realtime rmap Plumb in the pieces we need to embed the root of the realtime rmap btree in an inode's data fork, complete with new metafile type and on-disk interpretation functions. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:05 -08:00
Darrick J. Wong	8491a55cfc	xfs: add metadata reservations for realtime rmap btrees Reserve some free blocks so that we will always have enough free blocks in the data volume to handle expansion of the realtime rmap btree. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:05 -08:00
Darrick J. Wong	6b08901a6e	xfs: add realtime reverse map inode to metadata directory Add a metadir path to select the realtime rmap btree inode and load it at mount time. The rtrmapbt inode will have a unique extent format code, which means that we also have to update the inode validation and flush routines to look for it. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:05 -08:00
Darrick J. Wong	702c90f451	xfs: support file data forks containing metadata btrees Create a new fork format type for metadata btrees. This fork type requires that the inode is in the metadata directory tree, and only applies to the data fork. The actual type of the metadata btree itself is determined by the di_metatype field. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:05 -08:00
Darrick J. Wong	219ee99d36	xfs: pretty print metadata file types in error messages Create a helper function to turn a metadata file type code into a printable string, and use this to complain about lockdep problems with rtgroup inodes. We'll use this more in the next patch. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:05 -08:00
Darrick J. Wong	5e0679d1c6	xfs: support recovering rmap intent items targetting realtime extents Now that we have rmap on the realtime device and rmap intent items that target the realtime device, log recovery has to support remapping extents on the realtime volume. Make this work. Identify rtrmapbt blocks in the log correctly so that we can validate them during log recovery. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:05 -08:00
Darrick J. Wong	9e823fc274	xfs: add a realtime flag to the rmap update log redo items Extend the rmap update (RUI) log items to handle realtime volumes by adding a new log intent item type. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:04 -08:00
Darrick J. Wong	adafb31c80	xfs: prepare rmap functions to deal with rtrmapbt Prepare the high-level rmap functions to deal with the new realtime rmapbt and its slightly different conventions. Provide the ability to talk to either rmapbt or rtrmapbt formats from the same high level code. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:04 -08:00
Darrick J. Wong	d386b40243	xfs: add realtime rmap btree operations Implement the generic btree operations needed to manipulate rtrmap btree blocks. This is different from the regular rmapbt in that we allocate space from the filesystem at large, and are neither constrained to the free space nor any particular AG. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:04 -08:00
Darrick J. Wong	e1c76fce50	xfs: realtime rmap btree transaction reservations Make sure that there's enough log reservation to handle mapping and unmapping realtime extents. We have to reserve enough space to handle a split in the rtrmapbt to add the record and a second split in the regular rmapbt to record the rtrmapbt split. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:04 -08:00
Darrick J. Wong	fc6856c6ff	xfs: introduce realtime rmap btree ondisk definitions Add the ondisk structure definitions for realtime rmap btrees. The realtime rmap btree will be rooted from a hidden inode so it needs to have a separate btree block magic and pointer format. Next, add everything needed to read, write and manipulate rmap btree blocks. This prepares the way for connecting the btree operations implementation, though embedding the rtrmap btree root in the inode comes later in the series. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:04 -08:00
Darrick J. Wong	05290bd5c6	xfs: allow inode-based btrees to reserve space in the data device Create a new space reservation scheme so that btree metadata for the realtime volume can reserve space in the data device to avoid space underruns. Back when we were testing the rmap and refcount btrees for the data device, people observed occasional shutdowns when xfs_btree_split was called for either of those two btrees. This happened when certain operations (mostly writeback ioends) created new rmap or refcount records, which would expand the size of the btree. If there were no free blocks available the allocation would fail and the split would shut down the filesystem. I considered pre-reserving blocks for btree expansion at the time of a write() call, but there wasn't any good way to attach the reservations to an inode and keep them there all the way to ioend processing. Unlike delalloc reservations which have that indlen mechanism, there's no way to do that for mapped extents; and indlen blocks are given back during the delalloc -> unwritten transition. The solution was to reserve sufficient blocks for rmap/refcount btree expansion at mount time. This is what the XFS_AG_RESV_* flags provide; any expansion of those two btrees can come from the pre-reserved space. This patch brings that pre-reservation ability to inode-rooted btrees so that the rt rmap and refcount btrees can also save room for future expansion. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:03 -08:00
Darrick J. Wong	2f63b20b7a	xfs: support storing records in the inode core root Add the necessary flags and code so that we can support storing leaf records in the inode root block of a btree. This hasn't been necessary before, but the realtime rmapbt will need to be able to do this. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:03 -08:00
Darrick J. Wong	953f76bf7a	xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions Simplify the calling conventions by allowing callers to pass a fsbno (xfs_fsblock_t) directly into these functions, since we're just going to set it in a struct anyway. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:03 -08:00
Darrick J. Wong	84140a96cf	xfs: prepare to reuse the dquot pointer space in struct xfs_inode Files participating in the metadata directory tree are not accounted to the quota subsystem. Therefore, the i_[ugp]dquot pointers in struct xfs_inode are never used and should always be NULL. In the next patch we want to add a u64 count of fs blocks reserved for metadata btree expansion, but we don't want every inode in the fs to pay the memory price for this feature. The intent is to union those three pointers with the u64 counter, but for that to work we must guard against all access to the dquot pointers for metadata files. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:03 -08:00
Darrick J. Wong	d415fb34b4	xfs: prepare rmap btree cursor tracepoints for realtime Rework the rmap btree cursor tracepoints in preparation to handle the realtime rmap btree cursor. Mostly this involves renaming the field to "gbno" and extracting the group number from the cursor. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:03 -08:00
Darrick J. Wong	af32541081	xfs: add some rtgroup inode helpers Create some simple helpers to reduce the amount of typing whenever we access rtgroup inodes. Conversion was done with this spatch and some minor reformatting: @@ expression rtg; @@ - rtg->rtg_inodes[XFS_RTGI_BITMAP] + rtg_bitmap(rtg) @@ expression rtg; @@ - rtg->rtg_inodes[XFS_RTGI_SUMMARY] + rtg_summary(rtg) and the CLI command: $ spatch --sp-file /tmp/moo.cocci --dir fs/xfs/ --use-gitgrep --in-place Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:03 -08:00
Darrick J. Wong	505248719f	xfs: hoist the node iroot update code out of xfs_btree_kill_iroot In preparation for allowing records in an inode btree root, hoist the code that copies keyptrs from an existing node child into the root block to a separate function. Remove some unnecessary conditionals and clean up a few function calls in the new function. Note that this change reorders the ->free_block call with respect to the change in bc_nlevels to make it easier to support inode root leaf blocks in the next patch. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:02 -08:00
Darrick J. Wong	7708951ae5	xfs: hoist the node iroot update code out of xfs_btree_new_iroot In preparation for allowing records in an inode btree root, hoist the code that copies keyptrs from an existing node root into a child block to a separate function. Note that the new function explicitly computes the keys of the new child block and stores that in the root block; while the bmap btree could rely on leaving the key alone, realtime rmap needs to set the new high key. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:02 -08:00
Darrick J. Wong	c914081775	xfs: tidy up xfs_bmap_broot_realloc a bit Hoist out the code that migrates broot pointers during a resize operation to avoid code duplication and streamline the caller. Also use the correct bmbt pointer type for the sizeof operation. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:02 -08:00
Darrick J. Wong	eb9bff2231	xfs: make xfs_iroot_realloc a bmap btree function Move the inode fork btree root reallocation function part of the btree ops because it's now mostly bmbt-specific code. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:02 -08:00
Darrick J. Wong	6a92924275	xfs: make xfs_iroot_realloc take the new numrecs instead of deltas Change the calling signature of xfs_iroot_realloc to take the ifork and the new number of records in the btree block, not a diff against the current number. This will make the callsites easier to understand. Note that this function is misnamed because it is very specific to the single type of inode-rooted btree supported. This will be addressed in a subsequent patch. Return the new btree root to reduce the amount of code clutter. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:02 -08:00
Darrick J. Wong	6c1c55ac3c	xfs: refactor the inode fork memory allocation functions Hoist the code that allocates, frees, and reallocates if_broot into a single xfs_iroot_krealloc function. Eventually we're going to push xfs_iroot_realloc into the btree ops structure to handle multiple inode-rooted btrees, but first let's separate out the bits that should stay in xfs_inode_fork.c. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:02 -08:00
Darrick J. Wong	1aacd3fac2	xfs: release the dquot buf outside of qli_lock Lai Yi reported a lockdep complaint about circular locking: Chain exists of: &lp->qli_lock --> &bch->bc_lock --> &l->lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&l->lock); lock(&bch->bc_lock); lock(&l->lock); lock(&lp->qli_lock); I /think/ the problem here is that xfs_dquot_attach_buf during quotacheck will release the buffer while it's holding the qli_lock. Because this is a cached buffer, xfs_buf_rele_cached takes b_lock before decrementing b_hold. Other threads have taught lockdep that a locking dependency chain is bp->b_lock -> bch->bc_lock -> l(ru)->lock; and that another chain is l(ru)->lock -> lp->qli_lock. Hence we do not want to take b_lock while holding qli_lock. Reported-by: syzbot+3126ab3db03db42e7a31@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> # v6.13-rc3 Fixes: `ca378189fd` ("xfs: convert quotacheck to attach dquot buffers") Tested-by: syzbot+3126ab3db03db42e7a31@syzkaller.appspotmail.com Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:01 -08:00
Darrick J. Wong	4f13f0a3fc	xfs: tidy up xfs_iroot_realloc Tidy up this function a bit before we start refactoring the memory handling and move the function to the bmbt code. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:01 -08:00
Darrick J. Wong	4b8d867ca6	xfs: don't over-report free space or inodes in statvfs Emmanual Florac reports a strange occurrence when project quota limits are enabled, free space is lower than the remaining quota, and someone runs statvfs: # mkfs.xfs -f /dev/sda # mount /dev/sda /mnt -o prjquota # xfs_quota -x -c 'limit -p bhard=2G 55' /mnt # mkdir /mnt/dir # xfs_io -c 'chproj 55' -c 'chattr +P' -c 'stat -vvvv' /mnt/dir # fallocate -l 19g /mnt/a # df /mnt /mnt/dir Filesystem Size Used Avail Use% Mounted on /dev/sda 20G 20G 345M 99% /mnt /dev/sda 2.0G 0 2.0G 0% /mnt I think the bug here is that xfs_fill_statvfs_from_dquot unconditionally assigns to f_bfree without checking that the filesystem has enough free space to fill the remaining project quota. However, this is a longstanding behavior of xfs so it's unclear what to do here. Cc: <stable@vger.kernel.org> # v2.6.18 Fixes: `932f2c3231` ("[XFS] statvfs component of directory/project quota support, code originally by Glen.") Reported-by: Emmanuel Florac <eflorac@intellique.com> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-23 13:06:01 -08:00
Darrick J. Wong	12f2930f5f	xfs: port xfs_ioc_start_commit to multigrain timestamps Take advantage of the multigrain timestamp APIs to ensure that nobody can sneak in and write things to a file between starting a file update operation and committing the results. This should have been part of the multigrain timestamp merge, but I forgot to fling it at jlayton when he resubmitted the patchset due to developer bandwidth problems. Cc: <stable@vger.kernel.org> # v6.13-rc1 Fixes: `4e40eff0b5` ("fs: add infrastructure for multigrain timestamps") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@kernel.org>	2024-12-12 17:45:13 -08:00
Darrick J. Wong	7f8b718c58	xfs: return from xfs_symlink_verify early on V4 filesystems V4 symlink blocks didn't have headers, so return early if this is a V4 filesystem. Cc: <stable@vger.kernel.org> # v5.1 Fixes: `39708c20ab` ("xfs: miscellaneous verifier magic value fixups") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:13 -08:00
Darrick J. Wong	c004a793e0	xfs: fix zero byte checking in the superblock scrubber The logic to check that the region past the end of the superblock is all zeroes is wrong -- we don't want to check only the bytes past the end of the maximally sized ondisk superblock structure as currently defined in xfs_format.h; we want to check the bytes beyond the end of the ondisk as defined by the feature bits. Port the superblock size logic from xfs_repair and then put it to use in xfs_scrub. Cc: <stable@vger.kernel.org> # v4.15 Fixes: `21fb4cb198` ("xfs: scrub the secondary superblocks") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:13 -08:00
Darrick J. Wong	06b20ef09b	xfs: check pre-metadir fields correctly The checks that were added to the superblock scrubber for metadata directories aren't quite right -- the old inode pointers are now defined to be zeroes until someone else reuses them. Also consolidate the new metadir field checks to one place; they were inexplicably scattered around. Cc: <stable@vger.kernel.org> # v6.13-rc1 Fixes: `28d756d4d5` ("xfs: update sb field checks when metadir is turned on") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:12 -08:00
Darrick J. Wong	e57e083be9	xfs: don't crash on corrupt /quotas dirent If the /quotas dirent points to an inode but the inode isn't loadable (and hence mkdir returns -EEXIST), don't crash, just bail out. Cc: <stable@vger.kernel.org> # v6.13-rc1 Fixes: `e80fbe1ad8` ("xfs: use metadir for quota inodes") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:12 -08:00
Darrick J. Wong	3853b5e1d7	xfs: don't move nondir/nonreg temporary repair files to the metadir namespace Only directories or regular files are allowed in the metadata directory tree. Don't move the repair tempfile to the metadir namespace if this is not true; this will cause the inode verifiers to trip. xrep_tempfile_adjust_directory_tree opportunistically moves sc->tempip from the regular directory tree to the metadata directory tree if sc->ip is part of the metadata directory tree. However, the scrub setup functions grab sc->ip and create sc->tempip before we actually get around to checking if the file mode is the right type for the scrubber. IOWs, you can invoke the symlink scrubber with the file handle of a subdirectory in the metadir. xrep_setup_symlink will create a temporary symlink file, xrep_tempfile_adjust_directory_tree will foolishly try to set the METADATA flag on the temp symlink, which trips the inode verifier in the inode item precommit, which shuts down the filesystem when expensive checks are turned on. If they're /not/ turned on, then xchk_symlink will return ENOENT when it sees that it's been passed a symlink, but the invalid inode could still get flushed to disk. We don't want that. Cc: <stable@vger.kernel.org> # v6.13-rc1 Fixes: `9dc31acb01` ("xfs: move repair temporary files to the metadata directory tree") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:12 -08:00
Darrick J. Wong	7f8a44f372	xfs: fix sb_spino_align checks for large fsblock sizes For a sparse inodes filesystem, mkfs.xfs computes the values of sb_spino_align and sb_inoalignmt with the following code: int cluster_size = XFS_INODE_BIG_CLUSTER_SIZE; if (cfg->sb_feat.crcs_enabled) cluster_size = cfg->inodesize / XFS_DINODE_MIN_SIZE; sbp->sb_spino_align = cluster_size >> cfg->blocklog; sbp->sb_inoalignmt = XFS_INODES_PER_CHUNK cfg->inodesize >> cfg->blocklog; On a V5 filesystem with 64k fsblocks and 512 byte inodes, this results in cluster_size = 8192 * (512 / 256) = 16384. As a result, sb_spino_align and sb_inoalignmt are both set to zero. Unfortunately, this trips the new sb_spino_align check that was just added to xfs_validate_sb_common, and the mkfs fails: # mkfs.xfs -f -b size=64k, /dev/sda meta-data=/dev/sda isize=512 agcount=4, agsize=81136 blks = sectsz=512 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=1 = reflink=1 bigtime=1 inobtcount=1 nrext64=1 = exchange=0 metadir=0 data = bsize=65536 blocks=324544, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=65536 ascii-ci=0, ftype=1, parent=0 log =internal log bsize=65536 blocks=5006, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=65536 blocks=0, rtextents=0 = rgcount=0 rgsize=0 extents Discarding blocks...Sparse inode alignment (0) is invalid. Metadata corruption detected at 0x560ac5a80bbe, xfs_sb block 0x0/0x200 libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x1 mkfs.xfs: Releasing dirty buffer to free list! found dirty buffer (bulk) on free list! Sparse inode alignment (0) is invalid. Metadata corruption detected at 0x560ac5a80bbe, xfs_sb block 0x0/0x200 libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x1 mkfs.xfs: writing AG headers failed, err=22 Prior to commit `59e43f5479` this all worked fine, even if "sparse" inodes are somewhat meaningless when everything fits in a single fsblock. Adjust the checks to handle existing filesystems. Cc: <stable@vger.kernel.org> # v6.13-rc1 Fixes: `59e43f5479` ("xfs: sb_spino_align is not verified") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:12 -08:00
Darrick J. Wong	ca378189fd	xfs: convert quotacheck to attach dquot buffers Now that we've converted the dquot logging machinery to attach the dquot buffer to the li_buf pointer so that the AIL dqflush doesn't have to allocate or read buffers in a reclaim path, do the same for the quotacheck code so that the reclaim shrinker dqflush call doesn't have to do that either. Cc: <stable@vger.kernel.org> # v6.12 Fixes: `903edea6c5` ("mm: warn about illegal __GFP_NOFAIL usage in a more appropriate location and manner") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:12 -08:00
Darrick J. Wong	acc8f8628c	xfs: attach dquot buffer to dquot log item buffer Ever since 6.12-rc1, I've observed a pile of warnings from the kernel when running fstests with quotas enabled: WARNING: CPU: 1 PID: 458580 at mm/page_alloc.c:4221 __alloc_pages_noprof+0xc9c/0xf18 CPU: 1 UID: 0 PID: 458580 Comm: xfsaild/sda3 Tainted: G W 6.12.0-rc6-djwa #rc6 6ee3e0e531f6457e2d26aa008a3b65ff184b377c <snip> Call trace: __alloc_pages_noprof+0xc9c/0xf18 alloc_pages_mpol_noprof+0x94/0x240 alloc_pages_noprof+0x68/0xf8 new_slab+0x3e0/0x568 ___slab_alloc+0x5a0/0xb88 __slab_alloc.constprop.0+0x7c/0xf8 __kmalloc_noprof+0x404/0x4d0 xfs_buf_get_map+0x594/0xde0 [xfs 384cb02810558b4c490343c164e9407332118f88] xfs_buf_read_map+0x64/0x2e0 [xfs 384cb02810558b4c490343c164e9407332118f88] xfs_trans_read_buf_map+0x1dc/0x518 [xfs 384cb02810558b4c490343c164e9407332118f88] xfs_qm_dqflush+0xac/0x468 [xfs 384cb02810558b4c490343c164e9407332118f88] xfs_qm_dquot_logitem_push+0xe4/0x148 [xfs 384cb02810558b4c490343c164e9407332118f88] xfsaild+0x3f4/0xde8 [xfs 384cb02810558b4c490343c164e9407332118f88] kthread+0x110/0x128 ret_from_fork+0x10/0x20 ---[ end trace 0000000000000000 ]--- This corresponds to the line: WARN_ON_ONCE(current->flags & PF_MEMALLOC); within the NOFAIL checks. What's happening here is that the XFS AIL is trying to write a disk quota update back into the filesystem, but for that it needs to read the ondisk buffer for the dquot. The buffer is not in memory anymore, probably because it was evicted. Regardless, the buffer cache tries to allocate a new buffer, but those allocations are NOFAIL. The AIL thread has marked itself PF_MEMALLOC (aka noreclaim) since commit `43ff2122e6` ("xfs: on-stack delayed write buffer lists") presumably because reclaim can push on XFS to push on the AIL. An easy way to fix this probably would have been to drop the NOFAIL flag from the xfs_buf allocation and open code a retry loop, but then there's still the problem that for bs>ps filesystems, the buffer itself could require up to 64k worth of pages. Inode items had similar behavior (multi-page cluster buffers that we don't want to allocate in the AIL) which we solved by making transaction precommit attach the inode cluster buffers to the dirty log item. Let's solve the dquot problem in the same way. So: Make a real precommit handler to read the dquot buffer and attach it to the log item; pass it to dqflush in the push method; and have the iodone function detach the buffer once we've flushed everything. Add a state flag to the log item to track when a thread has entered the precommit -> push mechanism to skip the detaching if it turns out that the dquot is very busy, as we don't hold the dquot lock between log item commit and AIL push). Reading and attaching the dquot buffer in the precommit hook is inspired by the work done for inode cluster buffers some time ago. Cc: <stable@vger.kernel.org> # v6.12 Fixes: `903edea6c5` ("mm: warn about illegal __GFP_NOFAIL usage in a more appropriate location and manner") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:11 -08:00
Darrick J. Wong	ec88b41b93	xfs: clean up log item accesses in xfs_qm_dqflush{,_done} Clean up these functions a little bit before we move on to the real modifications, and make the variable naming consistent for dquot log items. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:11 -08:00
Darrick J. Wong	a40fe30868	xfs: separate dquot buffer reads from xfs_dqflush The first step towards holding the dquot buffer in the li_buf instead of reading it in the AIL is to separate the part that reads the buffer from the actual flush code. There should be no functional changes. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:11 -08:00
Darrick J. Wong	07137e925f	xfs: don't lose solo dquot update transactions Quota counter updates are tracked via incore objects which hang off the xfs_trans object. These changes are then turned into dirty log items in xfs_trans_apply_dquot_deltas just prior to commiting the log items to the CIL. However, updating the incore deltas do not cause XFS_TRANS_DIRTY to be set on the transaction. In other words, a pure quota counter update will be silently discarded if there are no other dirty log items attached to the transaction. This is currently not the case anywhere in the filesystem because quota updates always dirty at least one other metadata item, but a subsequent bug fix will add dquot log item precommits, so we actually need a dirty dquot log item prior to xfs_trans_run_precommits. Also let's not leave a logic bomb. Cc: <stable@vger.kernel.org> # v2.6.35 Fixes: `0924378a68` ("xfs: split out iclog writing from xfs_trans_commit()") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:11 -08:00
Darrick J. Wong	3762113b59	xfs: don't lose solo superblock counter update transactions Superblock counter updates are tracked via per-transaction counters in the xfs_trans object. These changes are then turned into dirty log items in xfs_trans_apply_sb_deltas just prior to commiting the log items to the CIL. However, updating the per-transaction counter deltas do not cause XFS_TRANS_DIRTY to be set on the transaction. In other words, a pure sb counter update will be silently discarded if there are no other dirty log items attached to the transaction. This is currently not the case anywhere in the filesystem because sb counter updates always dirty at least one other metadata item, but let's not leave a logic bomb. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:11 -08:00
Darrick J. Wong	a004afdc62	xfs: avoid nested calls to __xfs_trans_commit Currently, __xfs_trans_commit calls xfs_defer_finish_noroll, which calls __xfs_trans_commit again on the same transaction. In other words, there's a nested function call (albeit with slightly different arguments) that has caused minor amounts of confusion in the past. There's no reason to keep this around, since there's only one place where we actually want the xfs_defer_finish_noroll, and that is in the top level xfs_trans_commit call. This also reduces stack usage a little bit. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:11 -08:00
Darrick J. Wong	44d9b07e52	xfs: only run precommits once per transaction object Committing a transaction tx0 with a defer ops chain of (A, B, C) creates a chain of transactions that looks like this: tx0 -> txA -> txB -> txC Prior to commit `cb04211748`, __xfs_trans_commit would run precommits on tx0, then call xfs_defer_finish_noroll to convert A-C to tx[A-C]. Unfortunately, after the finish_noroll loop we forgot to run precommits on txC. That was fixed by adding the second precommit call. Unfortunately, none of us remembered that xfs_defer_finish_noroll calls __xfs_trans_commit a second time to commit tx0 before finishing work A in txA and committing that. In other words, we run precommits twice on tx0: xfs_trans_commit(tx0) __xfs_trans_commit(tx0, false) xfs_trans_run_precommits(tx0) xfs_defer_finish_noroll(tx0) xfs_trans_roll(tx0) txA = xfs_trans_dup(tx0) __xfs_trans_commit(tx0, true) xfs_trans_run_precommits(tx0) This currently isn't an issue because the inode item precommit is idempotent; the iunlink item precommit deletes itself so it can't be called again; and the buffer/dquot item precommits only check the incore objects for corruption. However, it doesn't make sense to run precommits twice. Fix this situation by only running precommits after finish_noroll. Cc: <stable@vger.kernel.org> # v6.4 Fixes: `cb04211748` ("xfs: defered work could create precommits") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:10 -08:00
Darrick J. Wong	53b001a21c	xfs: unlock inodes when erroring out of xfs_trans_alloc_dir Debugging a filesystem patch with generic/475 caused the system to hang after observing the following sequences in dmesg: XFS (dm-0): metadata I/O error in "xfs_imap_to_bp+0x61/0xe0 [xfs]" at daddr 0x491520 len 32 error 5 XFS (dm-0): metadata I/O error in "xfs_btree_read_buf_block+0xba/0x160 [xfs]" at daddr 0x3445608 len 8 error 5 XFS (dm-0): metadata I/O error in "xfs_imap_to_bp+0x61/0xe0 [xfs]" at daddr 0x138e1c0 len 32 error 5 XFS (dm-0): log I/O error -5 XFS (dm-0): Metadata I/O Error (0x1) detected at xfs_trans_read_buf_map+0x1ea/0x4b0 [xfs] (fs/xfs/xfs_trans_buf.c:311). Shutting down filesystem. XFS (dm-0): Please unmount the filesystem and rectify the problem(s) XFS (dm-0): Internal error dqp->q_ino.reserved < dqp->q_ino.count at line 869 of file fs/xfs/xfs_trans_dquot.c. Caller xfs_trans_dqresv+0x236/0x440 [xfs] XFS (dm-0): Corruption detected. Unmount and run xfs_repair XFS (dm-0): Unmounting Filesystem be6bcbcc-9921-4deb-8d16-7cc94e335fa7 The system is stuck in unmount trying to lock a couple of inodes so that they can be purged. The dquot corruption notice above is a clue to what happened -- a link() call tried to set up a transaction to link a child into a directory. Quota reservation for the transaction failed after IO errors shut down the filesystem, but then we forgot to unlock the inodes on our way out. Fix that. Cc: <stable@vger.kernel.org> # v6.10 Fixes: `bd5562111d` ("xfs: Hold inode locks in xfs_trans_alloc_dir") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:10 -08:00
Darrick J. Wong	ffc3ea4f3c	xfs: fix scrub tracepoints when inode-rooted btrees are involved Fix a minor mistakes in the scrub tracepoints that can manifest when inode-rooted btrees are enabled. The existing code worked fine for bmap btrees, but we should tighten the code up to be less sloppy. Cc: <stable@vger.kernel.org> # v5.7 Fixes: `92219c292a` ("xfs: convert btree cursor inode-private member names") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:10 -08:00
Darrick J. Wong	6d7b4bc1c3	xfs: update btree keys correctly when _insrec splits an inode root block In commit `2c813ad66a`, I partially fixed a bug wherein xfs_btree_insrec would erroneously try to update the parent's key for a block that had been split if we decided to insert the new record into the new block. The solution was to detect this situation and update the in-core key value that we pass up to the caller so that the caller will (eventually) add the new block to the parent level of the tree with the correct key. However, I missed a subtlety about the way inode-rooted btrees work. If the full block was a maximally sized inode root block, we'll solve that fullness by moving the root block's records to a new block, resizing the root block, and updating the root to point to the new block. We don't pass a pointer to the new block to the caller because that work has already been done. The new record will /always/ land in the new block, so in this case we need to use xfs_btree_update_keys to update the keys. This bug can theoretically manifest itself in the very rare case that we split a bmbt root block and the new record lands in the very first slot of the new block, though I've never managed to trigger it in practice. However, it is very easy to reproduce by running generic/522 with the realtime rmapbt patchset if rtinherit=1. Cc: <stable@vger.kernel.org> # v4.8 Fixes: `2c813ad66a` ("xfs: support btrees with overlapping intervals for keys") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:10 -08:00
Darrick J. Wong	23bee6f390	xfs: fix error bailout in xfs_rtginode_create smatch reported that we screwed up the error cleanup in this function. Fix it. Cc: <stable@vger.kernel.org> # v6.13-rc1 Fixes: `ae897e0bed` ("xfs: support creating per-RTG files in growfs") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:10 -08:00
Darrick J. Wong	af9f02457f	xfs: fix null bno_hint handling in xfs_rtallocate_rtg xfs_bmap_rtalloc initializes the bno_hint variable to NULLRTBLOCK (aka NULLFSBLOCK). If the allocation request is for a file range that's adjacent to an existing mapping, it will then change bno_hint to the blkno hint in the bmalloca structure. In other words, bno_hint is either a rt block number, or it's all 1s. Unfortunately, commit `ec12f97f1b` didn't take the NULLRTBLOCK state into account, which means that it tries to translate that into a realtime extent number. We then end up with an obnoxiously high rtx number and pointlessly feed that to the near allocator. This often fails and falls back to the by-size allocator. Seeing as we had no locality hint anyway, this is a waste of time. Fix the code to detect a lack of bno_hint correctly. This was detected by running xfs/009 with metadir enabled and a 28k rt extent size. Cc: <stable@vger.kernel.org> # v6.12 Fixes: `ec12f97f1b` ("xfs: make the rtalloc start hint a xfs_rtblock_t") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:09 -08:00
Darrick J. Wong	dc5a052739	xfs: mark metadir repair tempfiles with IRECOVERY Once in a long while, xfs/566 and xfs/801 report directory corruption in one of the metadata subdirectories while it's forcibly rebuilding all filesystem metadata. I observed the following sequence of events: 1. Initiate a repair of the parent pointers for the /quota/user file. This is the secret file containing user quota data. 2. The pptr repair thread creates a temporary file and begins staging parent pointers in the ondisk metadata in preparation for an exchange-range to commit the new pptr data. 3. At the same time, initiate a repair of the /quota directory itself. 4. The dir repair thread finds the temporary file from (2), scans it for parent pointers, and stages a dirent in its own temporary dir in preparation to commit the fixed directory. 5. The parent pointer repair completes and frees the temporary file. 6. The dir repair commits the new directory and scans it again. It finds the dirent that points to the old temporary file in (2) and marks the directory corrupt. Oops! Repair code must never scan the temporary files that other repair functions create to stage new metadata. They're not supposed to do that, but the predicate function xrep_is_tempfile is incorrect because it assumes that any XFS_DIFLAG2_METADATA file cannot ever be a temporary file, but xrep_tempfile_adjust_directory_tree creates exactly that. Fix this by setting the IRECOVERY flag on temporary metadata directory inodes and using that to correct the predicate. Repair code is supposed to erase all the data in temporary files before releasing them, so it's ok if a thread scans the temporary file after we drop IRECOVERY. Cc: <stable@vger.kernel.org> # v6.13-rc1 Fixes: `bb6cdd5529` ("xfs: hide metadata inodes from everyone because they are special") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:09 -08:00
Darrick J. Wong	6f4669708a	xfs: set XFS_SICK_INO_SYMLINK_ZAPPED explicitly when zapping a symlink If we need to reset a symlink target to the "durr it's busted" string, then we clear the zapped flag as well. However, this should be using the provided helper so that we don't set the zapped state on an otherwise ok symlink. Cc: <stable@vger.kernel.org> # v6.10 Fixes: `2651923d8d` ("xfs: online repair of symbolic links") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:09 -08:00
Darrick J. Wong	aa7bfb537e	xfs: separate healthy clearing mask during repair In commit `d9041681dd` we introduced some XFS_SICK_*ZAPPED flags so that the inode record repair code could clean up a damaged inode record enough to iget the inode but still be able to remember that the higher level repair code needs to be called. As part of that, we introduced a xchk_mark_healthy_if_clean helper that is supposed to cause the ZAPPED state to be removed if that higher level metadata actually checks out. This was done by setting additional bits in sick_mask hoping that xchk_update_health will clear all those bits after a healthy scrub. Unfortunately, that's not quite what sick_mask means -- bits in that mask are indeed cleared if the metadata is healthy, but they're set if the metadata is NOT healthy. fsck is only intended to set the ZAPPED bits explicitly. If something else sets the CORRUPT/XCORRUPT state after the xchk_mark_healthy_if_clean call, we end up marking the metadata zapped. This can happen if the following sequence happens: 1. Scrub runs, discovers that the metadata is fine but could be optimized and calls xchk_mark_healthy_if_clean on a ZAPPED flag. That causes the ZAPPED flag to be set in sick_mask because the metadata is not CORRUPT or XCORRUPT. 2. Repair runs to optimize the metadata. 3. Some other metadata used for cross-referencing in (1) becomes corrupt. 4. Post-repair scrub runs, but this time it sets CORRUPT or XCORRUPT due to the events in (3). 5. Now the xchk_health_update sets the ZAPPED flag on the metadata we just repaired. This is not the correct state. Fix this by moving the "if healthy" mask to a separate field, and only ever using it to clear the sick state. Cc: <stable@vger.kernel.org> # v6.8 Fixes: `d9041681dd` ("xfs: set inode sick state flags when we zap either ondisk fork") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:09 -08:00
Darrick J. Wong	7ce31f20a0	xfs: don't drop errno values when we fail to ficlone the entire range Way back when we first implemented FICLONE for XFS, life was simple -- either the the entire remapping completed, or something happened and we had to return an errno explaining what happened. Neither of those ioctls support returning partial results, so it's all or nothing. Then things got complicated when copy_file_range came along, because it actually can return the number of bytes copied, so commit `3f68c1f562` tried to make it so that we could return a partial result if the REMAP_FILE_CAN_SHORTEN flag is set. This is also how FIDEDUPERANGE can indicate that the kernel performed a partial deduplication. Unfortunately, the logic is wrong if an error stops the remapping and CAN_SHORTEN is not set. Because those callers cannot return partial results, it is an error for ->remap_file_range to return a positive quantity that is less than the @len passed in. Implementations really should be returning a negative errno in this case, because that's what btrfs (which introduced FICLONE{,RANGE}) did. Therefore, ->remap_range implementations cannot silently drop an errno that they might have when the number of bytes remapped is less than the number of bytes requested and CAN_SHORTEN is not set. Found by running generic/562 on a 64k fsblock filesystem and wondering why it reported corrupt files. Cc: <stable@vger.kernel.org> # v4.20 Fixes: `3fc9f5e409` ("xfs: remove xfs_reflink_remap_range") Really-Fixes: `3f68c1f562` ("xfs: support returning partial reflink results") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:09 -08:00
Darrick J. Wong	bd27c7bcdc	xfs: return a 64-bit block count from xfs_btree_count_blocks With the nrext64 feature enabled, it's possible for a data fork to have 2^48 extent mappings. Even with a 64k fsblock size, that maps out to a bmbt containing more than 2^32 blocks. Therefore, this predicate must return a u64 count to avoid an integer wraparound that will cause scrub to do the wrong thing. It's unlikely that any such filesystem currently exists, because the incore bmbt would consume more than 64GB of kernel memory on its own, and so far nobody except me has driven a filesystem that far, judging from the lack of complaints. Cc: <stable@vger.kernel.org> # v5.19 Fixes: `df9ad5cc7a` ("xfs: Introduce macros to represent new maximum extent counts for data/attr forks") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:09 -08:00
Darrick J. Wong	e1d8602b6c	xfs: keep quota directory inode loaded In the same vein as the previous patch, there's no point in the metapath scrub setup function doing a lookup on the quota metadir just so it can validate that lookups work correctly. Instead, retain the quota directory inode in memory for the lifetime of the mount so that we can check this meaningfully. Cc: <stable@vger.kernel.org> # v6.13-rc1 Fixes: `128a055291` ("xfs: scrub quota file metapaths") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:08 -08:00
Darrick J. Wong	9b72800103	xfs: metapath scrubber should use the already loaded inodes Don't waste time in xchk_setup_metapath_dqinode doing a second lookup of the quota inodes, just grab them from the quotainfo structure. The whole point of this scrubber is to make sure that the dirents exist, so it's completely silly to do lookups. Cc: <stable@vger.kernel.org> # v6.13-rc1 Fixes: `128a055291` ("xfs: scrub quota file metapaths") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:08 -08:00
Darrick J. Wong	a440a28ddb	xfs: fix off-by-one error in fsmap's end_daddr usage In commit `ca6448aed4`, we created an "end_daddr" variable to fix fsmap reporting when the end of the range requested falls in the middle of an unknown (aka free on the rmapbt) region. Unfortunately, I didn't notice that the the code sets end_daddr to the last sector of the device but then uses that quantity to compute the length of the synthesized mapping. Zizhi Wo later observed that when end_daddr isn't set, we still don't report the last fsblock on a device because in that case (aka when info->last is true), the info->high mapping that we pass to xfs_getfsmap_group_helper has a startblock that points to the last fsblock. This is also wrong because the code uses startblock to compute the length of the synthesized mapping. Fix the second problem by setting end_daddr unconditionally, and fix the first problem by setting start_daddr to one past the end of the range to query. Cc: <stable@vger.kernel.org> # v6.11 Fixes: `ca6448aed4` ("xfs: Fix missing interval for missing_owner in xfs fsmap") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reported-by: Zizhi Wo <wozizhi@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-12-12 17:45:08 -08:00
Josef Bacik	5121711eb8	fs: enable pre-content events on supported file systems Now that all the code has been added for pre-content events, and the various file systems that need the page fault hooks for fsnotify have been updated, add SB_I_ALLOW_HSM to the supported file systems. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/46960dcb2725fa0317895ed66a8409ba1c306a82.1731684329.git.josef@toxicpanda.com	2024-12-11 17:28:41 +01:00
Josef Bacik	7f4796a465	xfs: add pre-content fsnotify hook for DAX faults xfs has it's own handling for DAX faults, so we need to add the pre-content fsnotify hook for this case. Other faults go through filemap_fault so they're handled properly there. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/9eccdf59a65b72f0a1a5e2f2b9bff8eda2d4f2d9.1731684329.git.josef@toxicpanda.com	2024-12-11 17:28:41 +01:00
Linus Torvalds	9141c5d389	Bug fixes for 6.13-rc2 * Use xchg() in xlog_cil_insert_pcp_aggregate() * Fix ABBA deadlock on a race between mount and log shutdown * Fix quota softlimit incoherency on delalloc * Fix sparse inode limits on runt AG * remove unknown compat feature checks in SB write valdation * Eliminate a lockdep false positive -----BEGIN PGP SIGNATURE----- iJUEABMJAB0WIQQMHYkcUKcy4GgPe2RGdaER5QtfpgUCZ072iwAKCRBGdaER5Qtf prtCAX4kKOVnDzn2dX4YWpFaPAFvaWbiH0GIIVIiRuQLqzARya/lurNXjfanuotc 4oJ3JacBgJ1MWYiBX2j95AEHJaes/G3Nm+EsXeqefWWxxQvrCcQt5kdtw1fVY8kz 5NCMUtxjIQ== =FM4s -----END PGP SIGNATURE----- Merge tag 'xfs-fixes-6.13-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Carlos Maiolino: - Use xchg() in xlog_cil_insert_pcp_aggregate() - Fix ABBA deadlock on a race between mount and log shutdown - Fix quota softlimit incoherency on delalloc - Fix sparse inode limits on runt AG - remove unknown compat feature checks in SB write valdation - Eliminate a lockdep false positive * tag 'xfs-fixes-6.13-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: don't call xfs_bmap_same_rtgroup in xfs_bmap_add_extent_hole_delay xfs: Use xchg() in xlog_cil_insert_pcp_aggregate() xfs: prevent mount and log shutdown race xfs: delalloc and quota softlimit timers are incoherent xfs: fix sparse inode limits on runt AG xfs: remove unknown compat feature check in superblock write validation xfs: eliminate lockdep false positives in xfs_attr_shortform_list	2024-12-03 10:46:49 -08:00
Christoph Hellwig	cc2dba08cc	xfs: don't call xfs_bmap_same_rtgroup in xfs_bmap_add_extent_hole_delay xfs_bmap_add_extent_hole_delay works entirely on delalloc extents, for which xfs_bmap_same_rtgroup doesn't make sense. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2024-11-28 12:54:22 +01:00
Uros Bizjak	214093534f	xfs: Use xchg() in xlog_cil_insert_pcp_aggregate() try_cmpxchg() loop with constant "new" value can be substituted with just xchg() to atomically get and clear the location. The code on x86_64 improves from: 1e7f: 48 89 4c 24 10 mov %rcx,0x10(%rsp) 1e84: 48 03 14 c5 00 00 00 add 0x0(,%rax,8),%rdx 1e8b: 00 1e88: R_X86_64_32S __per_cpu_offset 1e8c: 8b 02 mov (%rdx),%eax 1e8e: 41 89 c5 mov %eax,%r13d 1e91: 31 c9 xor %ecx,%ecx 1e93: f0 0f b1 0a lock cmpxchg %ecx,(%rdx) 1e97: 75 f5 jne 1e8e <xlog_cil_commit+0x84e> 1e99: 48 8b 4c 24 10 mov 0x10(%rsp),%rcx 1e9e: 45 01 e9 add %r13d,%r9d to just: 1e7f: 48 03 14 cd 00 00 00 add 0x0(,%rcx,8),%rdx 1e86: 00 1e83: R_X86_64_32S __per_cpu_offset 1e87: 31 c9 xor %ecx,%ecx 1e89: 87 0a xchg %ecx,(%rdx) 1e8b: 41 01 cb add %ecx,%r11d No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <elder@riscstar.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2024-11-28 12:42:10 +01:00
Linus Torvalds	5c00ff742b	- The series "zram: optimal post-processing target selection" from Sergey Senozhatsky improves zram's post-processing selection algorithm. This leads to improved memory savings. - Wei Yang has gone to town on the mapletree code, contributing several series which clean up the implementation: - "refine mas_mab_cp()" - "Reduce the space to be cleared for maple_big_node" - "maple_tree: simplify mas_push_node()" - "Following cleanup after introduce mas_wr_store_type()" - "refine storing null" - The series "selftests/mm: hugetlb_fault_after_madv improvements" from David Hildenbrand fixes this selftest for s390. - The series "introduce pte_offset_map_{ro\|rw}_nolock()" from Qi Zheng implements some rationaizations and cleanups in the page mapping code. - The series "mm: optimize shadow entries removal" from Shakeel Butt optimizes the file truncation code by speeding up the handling of shadow entries. - The series "Remove PageKsm()" from Matthew Wilcox completes the migration of this flag over to being a folio-based flag. - The series "Unify hugetlb into arch_get_unmapped_area functions" from Oscar Salvador implements a bunch of consolidations and cleanups in the hugetlb code. - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain takes away the wp-fault time practice of turning a huge zero page into small pages. Instead we replace the whole thing with a THP. More consistent cleaner and potentiall saves a large number of pagefaults. - The series "percpu: Add a test case and fix for clang" from Andy Shevchenko enhances and fixes the kernel's built in percpu test code. - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett optimizes mremap() by avoiding doing things which we didn't need to do. - The series "Improve the tmpfs large folio read performance" from Baolin Wang teaches tmpfs to copy data into userspace at the folio size rather than as individual pages. A 20% speedup was observed. - The series "mm/damon/vaddr: Fix issue in damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON splitting. - The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt removes the long-deprecated memcgv2 charge moving feature. - The series "fix error handling in mmap_region() and refactor" from Lorenzo Stoakes cleanup up some of the mmap() error handling and addresses some potential performance issues. - The series "x86/module: use large ROX pages for text allocations" from Mike Rapoport teaches x86 to use large pages for read-only-execute module text. - The series "page allocation tag compression" from Suren Baghdasaryan is followon maintenance work for the new page allocation profiling feature. - The series "page->index removals in mm" from Matthew Wilcox remove most references to page->index in mm/. A slow march towards shrinking struct page. - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs interface tests" from Andrew Paniakin performs maintenance work for DAMON's self testing code. - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar improves zswap's batching of compression and decompression. It is a step along the way towards using Intel IAA hardware acceleration for this zswap operation. - The series "kasan: migrate the last module test to kunit" from Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests over to the KUnit framework. - The series "implement lightweight guard pages" from Lorenzo Stoakes permits userapace to place fault-generating guard pages within a single VMA, rather than requiring that multiple VMAs be created for this. Improved efficiencies for userspace memory allocators are expected. - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses tracepoints to provide increased visibility into memcg stats flushing activity. - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky fixes a zram buglet which potentially affected performance. - The series "mm: add more kernel parameters to control mTHP" from Maíra Canal enhances our ability to control/configuremultisize THP from the kernel boot command line. - The series "kasan: few improvements on kunit tests" from Sabyrzhan Tasbolatov has a couple of fixups for the KASAN KUnit tests. - The series "mm/list_lru: Split list_lru lock into per-cgroup scope" from Kairui Song optimizes list_lru memory utilization when lockdep is enabled. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZzwFqgAKCRDdBJ7gKXxA jkeuAQCkl+BmeYHE6uG0hi3pRxkupseR6DEOAYIiTv0/l8/GggD/Z3jmEeqnZaNq xyyenpibWgUoShU2wZ/Ha8FE5WDINwg= =JfWR -----END PGP SIGNATURE----- Merge tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - The series "zram: optimal post-processing target selection" from Sergey Senozhatsky improves zram's post-processing selection algorithm. This leads to improved memory savings. - Wei Yang has gone to town on the mapletree code, contributing several series which clean up the implementation: - "refine mas_mab_cp()" - "Reduce the space to be cleared for maple_big_node" - "maple_tree: simplify mas_push_node()" - "Following cleanup after introduce mas_wr_store_type()" - "refine storing null" - The series "selftests/mm: hugetlb_fault_after_madv improvements" from David Hildenbrand fixes this selftest for s390. - The series "introduce pte_offset_map_{ro\|rw}_nolock()" from Qi Zheng implements some rationaizations and cleanups in the page mapping code. - The series "mm: optimize shadow entries removal" from Shakeel Butt optimizes the file truncation code by speeding up the handling of shadow entries. - The series "Remove PageKsm()" from Matthew Wilcox completes the migration of this flag over to being a folio-based flag. - The series "Unify hugetlb into arch_get_unmapped_area functions" from Oscar Salvador implements a bunch of consolidations and cleanups in the hugetlb code. - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain takes away the wp-fault time practice of turning a huge zero page into small pages. Instead we replace the whole thing with a THP. More consistent cleaner and potentiall saves a large number of pagefaults. - The series "percpu: Add a test case and fix for clang" from Andy Shevchenko enhances and fixes the kernel's built in percpu test code. - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett optimizes mremap() by avoiding doing things which we didn't need to do. - The series "Improve the tmpfs large folio read performance" from Baolin Wang teaches tmpfs to copy data into userspace at the folio size rather than as individual pages. A 20% speedup was observed. - The series "mm/damon/vaddr: Fix issue in damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON splitting. - The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt removes the long-deprecated memcgv2 charge moving feature. - The series "fix error handling in mmap_region() and refactor" from Lorenzo Stoakes cleanup up some of the mmap() error handling and addresses some potential performance issues. - The series "x86/module: use large ROX pages for text allocations" from Mike Rapoport teaches x86 to use large pages for read-only-execute module text. - The series "page allocation tag compression" from Suren Baghdasaryan is followon maintenance work for the new page allocation profiling feature. - The series "page->index removals in mm" from Matthew Wilcox remove most references to page->index in mm/. A slow march towards shrinking struct page. - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs interface tests" from Andrew Paniakin performs maintenance work for DAMON's self testing code. - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar improves zswap's batching of compression and decompression. It is a step along the way towards using Intel IAA hardware acceleration for this zswap operation. - The series "kasan: migrate the last module test to kunit" from Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests over to the KUnit framework. - The series "implement lightweight guard pages" from Lorenzo Stoakes permits userapace to place fault-generating guard pages within a single VMA, rather than requiring that multiple VMAs be created for this. Improved efficiencies for userspace memory allocators are expected. - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses tracepoints to provide increased visibility into memcg stats flushing activity. - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky fixes a zram buglet which potentially affected performance. - The series "mm: add more kernel parameters to control mTHP" from Maíra Canal enhances our ability to control/configuremultisize THP from the kernel boot command line. - The series "kasan: few improvements on kunit tests" from Sabyrzhan Tasbolatov has a couple of fixups for the KASAN KUnit tests. - The series "mm/list_lru: Split list_lru lock into per-cgroup scope" from Kairui Song optimizes list_lru memory utilization when lockdep is enabled. * tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits) cma: enforce non-zero pageblock_order during cma_init_reserved_mem() mm/kfence: add a new kunit test test_use_after_free_read_nofault() zram: fix NULL pointer in comp_algorithm_show() memcg/hugetlb: add hugeTLB counters to memcg vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount zram: ZRAM_DEF_COMP should depend on ZRAM MAINTAINERS/MEMORY MANAGEMENT: add document files for mm Docs/mm/damon: recommend academic papers to read and/or cite mm: define general function pXd_init() kmemleak: iommu/iova: fix transient kmemleak false positive mm/list_lru: simplify the list_lru walk callback function mm/list_lru: split the lock to per-cgroup scope mm/list_lru: simplify reparenting and initial allocation mm/list_lru: code clean up for reparenting mm/list_lru: don't export list_lru_add mm/list_lru: don't pass unnecessary key parameters kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols ...	2024-11-23 09:58:07 -08:00
Dave Chinner	a858109960	xfs: prevent mount and log shutdown race I recently had an fstests hang where there were two internal tasks stuck like so: [ 6559.010870] task:kworker/24:45 state:D stack:12152 pid:631308 tgid:631308 ppid:2 flags:0x00004000 [ 6559.016984] Workqueue: xfs-buf/dm-2 xfs_buf_ioend_work [ 6559.020349] Call Trace: [ 6559.022002] <TASK> [ 6559.023426] __schedule+0x650/0xb10 [ 6559.025734] schedule+0x6d/0xf0 [ 6559.027835] schedule_timeout+0x31/0x180 [ 6559.030582] wait_for_common+0x10c/0x1e0 [ 6559.033495] wait_for_completion+0x1d/0x30 [ 6559.036463] __flush_workqueue+0xeb/0x490 [ 6559.039479] ? mempool_alloc_slab+0x15/0x20 [ 6559.042537] xlog_cil_force_seq+0xa1/0x2f0 [ 6559.045498] ? bio_alloc_bioset+0x1d8/0x510 [ 6559.048578] ? submit_bio_noacct+0x2f2/0x380 [ 6559.051665] ? xlog_force_shutdown+0x3b/0x170 [ 6559.054819] xfs_log_force+0x77/0x230 [ 6559.057455] xlog_force_shutdown+0x3b/0x170 [ 6559.060507] xfs_do_force_shutdown+0xd4/0x200 [ 6559.063798] ? xfs_buf_rele+0x1bd/0x580 [ 6559.066541] xfs_buf_ioend_handle_error+0x163/0x2e0 [ 6559.070099] xfs_buf_ioend+0x61/0x200 [ 6559.072728] xfs_buf_ioend_work+0x15/0x20 [ 6559.075706] process_scheduled_works+0x1d4/0x400 [ 6559.078814] worker_thread+0x234/0x2e0 [ 6559.081300] kthread+0x147/0x170 [ 6559.083462] ? __pfx_worker_thread+0x10/0x10 [ 6559.086295] ? __pfx_kthread+0x10/0x10 [ 6559.088771] ret_from_fork+0x3e/0x50 [ 6559.091153] ? __pfx_kthread+0x10/0x10 [ 6559.093624] ret_from_fork_asm+0x1a/0x30 [ 6559.096227] </TASK> [ 6559.109304] Workqueue: xfs-cil/dm-2 xlog_cil_push_work [ 6559.112673] Call Trace: [ 6559.114333] <TASK> [ 6559.115760] __schedule+0x650/0xb10 [ 6559.118084] schedule+0x6d/0xf0 [ 6559.120175] schedule_timeout+0x31/0x180 [ 6559.122776] ? call_rcu+0xee/0x2f0 [ 6559.125034] __down_common+0xbe/0x1f0 [ 6559.127470] __down+0x1d/0x30 [ 6559.129458] down+0x48/0x50 [ 6559.131343] ? xfs_buf_item_unpin+0x8d/0x380 [ 6559.134213] xfs_buf_lock+0x3d/0xe0 [ 6559.136544] xfs_buf_item_unpin+0x8d/0x380 [ 6559.139253] xlog_cil_committed+0x287/0x520 [ 6559.142019] ? sched_clock+0x10/0x30 [ 6559.144384] ? sched_clock_cpu+0x10/0x190 [ 6559.147039] ? psi_group_change+0x48/0x310 [ 6559.149735] ? _raw_spin_unlock+0xe/0x30 [ 6559.152340] ? finish_task_switch+0xbc/0x310 [ 6559.155163] xlog_cil_process_committed+0x6d/0x90 [ 6559.158265] xlog_state_shutdown_callbacks+0x53/0x110 [ 6559.161564] ? xlog_cil_push_work+0xa70/0xaf0 [ 6559.164441] xlog_state_release_iclog+0xba/0x1b0 [ 6559.167483] xlog_cil_push_work+0xa70/0xaf0 [ 6559.170260] process_scheduled_works+0x1d4/0x400 [ 6559.173286] worker_thread+0x234/0x2e0 [ 6559.175779] kthread+0x147/0x170 [ 6559.177933] ? __pfx_worker_thread+0x10/0x10 [ 6559.180748] ? __pfx_kthread+0x10/0x10 [ 6559.183231] ret_from_fork+0x3e/0x50 [ 6559.185601] ? __pfx_kthread+0x10/0x10 [ 6559.188092] ret_from_fork_asm+0x1a/0x30 [ 6559.190692] </TASK> This is an ABBA deadlock where buffer IO completion is triggering a forced shutdown with the buffer lock held. It is waiting for the CIL to flush as part of the log force. The CIL flush is blocked doing shutdown processing of all it's objects, trying to unpin a buffer item. That requires taking the buffer lock.... For the CIL to be doing shutdown processing, the log must be marked with XLOG_IO_ERROR, but that doesn't happen until after the log force is issued. Hence for xfs_do_force_shutdown() to be forcing the log on a shut down log, we must have had a racing xlog_force_shutdown and xfs_force_shutdown like so: p0 p1 CIL push <holds buffer lock> xlog_force_shutdown xfs_log_force test_and_set_bit(XLOG_IO_ERROR) xlog_state_release_iclog() sees XLOG_IO_ERROR xlog_state_shutdown_callbacks .... xfs_buf_item_unpin xfs_buf_lock <blocks on buffer p1 holds> xfs_force_shutdown xfs_set_shutdown(mp) wins xlog_force_shutdown xfs_log_force <blocks on CIL push> xfs_set_shutdown(mp) fails <shuts down rest of log> The deadlock can be mitigated by avoiding the log force on the second pass through xlog_force_shutdown. Do this by adding another atomic state bit (XLOG_OP_PENDING_SHUTDOWN) that is set on entry to xlog_force_shutdown() but doesn't mark the log as shutdown. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2024-11-22 11:24:51 +01:00
Dave Chinner	c9c293240e	xfs: delalloc and quota softlimit timers are incoherent I've been seeing this failure on during xfs/050 recently: XFS: Assertion failed: dst->d_spc_timer != 0, file: fs/xfs/xfs_qm_syscalls.c, line: 435 .... Call Trace: <TASK> xfs_qm_scall_getquota_fill_qc+0x2a2/0x2b0 xfs_qm_scall_getquota_next+0x69/0xa0 xfs_fs_get_nextdqblk+0x62/0xf0 quota_getnextxquota+0xbf/0x320 do_quotactl+0x1a1/0x410 __se_sys_quotactl+0x126/0x310 __x64_sys_quotactl+0x21/0x30 x64_sys_call+0x2819/0x2ee0 do_syscall_64+0x68/0x130 entry_SYSCALL_64_after_hwframe+0x76/0x7e It turns out that the _qmount call has silently been failing to unmount and mount the filesystem, so when the softlimit is pushed past with a buffered write, it is not getting synced to disk before the next quota report is being run. Hence when the quota report runs, we have 300 blocks of delalloc data on an inode, with a soft limit of 200 blocks. XFS dquots account delalloc reservations as used space, hence the dquot is over the soft limit. However, we don't update the soft limit timers until we do a transactional update of the dquot. That is, the dquot sits over the soft limit without a softlimit timer being started until writeback occurs and the allocation modifies the dquot and we call xfs_qm_adjust_dqtimers() from xfs_trans_apply_dquot_deltas() in xfs_trans_commit() context. This isn't really a problem, except for this debug code in xfs_qm_scall_getquota_fill_qc(): if (xfs_dquot_is_enforced(dqp) && dqp->q_id != 0) { if ((dst->d_space > dst->d_spc_softlimit) && (dst->d_spc_softlimit > 0)) { ASSERT(dst->d_spc_timer != 0); } .... It asserts taht if the used block count is over the soft limit, it must have a soft limit timer running. This is clearly not the case, because we haven't committed the delalloc space to disk yet. Hence the soft limit is only exceeded temporarily in memory (which isn't an issue) and we start the timer the moment we exceed the soft limit in journalled metadata. This debug was introduced in: commit 0d5ad8383061fbc0a9804fbb98218750000fe032 Author: Supriya Wickrematillake <sup@sgi.com> Date: Wed May 15 22:44:44 1996 +0000 initial checkin quotactl syscall functions. The very first quota support commit back in 1996. This is zero-day debug for Irix and, as it turns out, a zero-day bug in the debug code because the delalloc code on Irix didn't update the softlimit timers, either. IOWs, this issue has been in the code for 28 years. We obviously don't care if soft limit timers are a bit rubbery when we have delalloc reservations in memory. Production systems running quota reports have been exposed to this situation for 28 years and nobody has noticed it, so the debug code is essentially worthless at this point in time. We also have the on-disk dquot verifiers checking that the soft limit timer is running whenever the dquot is over the soft limit before we write it to disk and after we read it from disk. These aren't firing, so it is clear the issue is purely a temporary in-memory incoherency that I never would have noticed had the test not silently failed to unmount the filesystem. Hence I'm simply going to trash this runtime debug because it isn't useful in the slightest for catching quota bugs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2024-11-22 11:24:45 +01:00
Dave Chinner	1332533358	xfs: fix sparse inode limits on runt AG The runt AG at the end of a filesystem is almost always smaller than the mp->m_sb.sb_agblocks. Unfortunately, when setting the max_agbno limit for the inode chunk allocation, we do not take this into account. This means we can allocate a sparse inode chunk that overlaps beyond the end of an AG. When we go to allocate an inode from that sparse chunk, the irec fails validation because the agbno of the start of the irec is beyond valid limits for the runt AG. Prevent this from happening by taking into account the size of the runt AG when allocating inode chunks. Also convert the various checks for valid inode chunk agbnos to use xfs_ag_block_count() so that they will also catch such issues in the future. Fixes: `56d1115c9b` ("xfs: allocate sparse inode chunks on full chunk allocation failure") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2024-11-22 11:24:40 +01:00
Long Li	652f03db89	xfs: remove unknown compat feature check in superblock write validation Compat features are new features that older kernels can safely ignore, allowing read-write mounts without issues. The current sb write validation implementation returns -EFSCORRUPTED for unknown compat features, preventing filesystem write operations and contradicting the feature's definition. Additionally, if the mounted image is unclean, the log recovery may need to write to the superblock. Returning an error for unknown compat features during sb write validation can cause mount failures. Although XFS currently does not use compat feature flags, this issue affects current kernels' ability to mount images that may use compat feature flags in the future. Since superblock read validation already warns about unknown compat features, it's unnecessary to repeat this warning during write validation. Therefore, the relevant code in write validation is being removed. Fixes: `9e037cb797` ("xfs: check for unknown v5 feature bits in superblock write verifier") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2024-11-22 10:20:55 +01:00
Long Li	45f69d091b	xfs: eliminate lockdep false positives in xfs_attr_shortform_list xfs_attr_shortform_list() only called from a non-transactional context, it hold ilock before alloc memory and maybe trapped in memory reclaim. Since commit 204fae32d5f7("xfs: clean up remaining GFP_NOFS users") removed GFP_NOFS flag, lockdep warning will be report as [1]. Eliminate lockdep false positives by use __GFP_NOLOCKDEP to alloc memory in xfs_attr_shortform_list(). [1] https://lore.kernel.org/linux-xfs/000000000000e33add0616358204@google.com/ Reported-by: syzbot+4248e91deb3db78358a2@syzkaller.appspotmail.com Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2024-11-22 09:52:03 +01:00
Linus Torvalds	2edc8f933d	New xfs code for 6.13 * convert perag to use xarrays * create a new generic allocation group structure * Add metadata inode dir trees * Create in-core rt allocation groups * Shard the RT section into allocation groups * Persist quota options with the enw metadata dir tree * Enable quota for RT volumes * Enable metadata directory trees * Some bugfixes Signed-off-by: Carlos Maiolino <cem@kernel.org> -----BEGIN PGP SIGNATURE----- iJUEABMJAB0WIQQMHYkcUKcy4GgPe2RGdaER5QtfpgUCZzyNwAAKCRBGdaER5Qtf psV3AYCncK/pVhFfKQSFbnCvgPSoAe7N9n0Wt5gmjy0Ill2mbQXVl9ADXkH6a015 gcGM3t4BgIHLJQndL/Uz+3a0L5IriEb9QkAfzmx8t3vjiRBzBe3WfywEx9Yt7kZe xbxEJ2HQpA== =3ngC -----END PGP SIGNATURE----- Merge tag 'xfs-6.13-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs updates from Carlos Maiolino: "The bulk of this pull request is a major rework that Darrick and Christoph have been doing on XFS's real-time volume, coupled with a few features to support this rework. It does also includes some bug fixes. - convert perag to use xarrays - create a new generic allocation group structure - add metadata inode dir trees - create in-core rt allocation groups - shard the RT section into allocation groups - persist quota options with the enw metadata dir tree - enable quota for RT volumes - enable metadata directory trees - some bugfixes" * tag 'xfs-6.13-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (146 commits) xfs: port ondisk structure checks from xfs/122 to the kernel xfs: separate space btree structures in xfs_ondisk.h xfs: convert struct typedefs in xfs_ondisk.h xfs: enable metadata directory feature xfs: enable realtime quota again xfs: update sb field checks when metadir is turned on xfs: reserve quota for realtime files correctly xfs: create quota preallocation watermarks for realtime quota xfs: report realtime block quota limits on realtime directories xfs: persist quota flags with metadir xfs: advertise realtime quota support in the xqm stat files xfs: scrub quota file metapaths xfs: fix chown with rt quota xfs: use metadir for quota inodes xfs: refactor xfs_qm_destroy_quotainos xfs: use rtgroup busy extent list for FITRIM xfs: implement busy extent tracking for rtgroups xfs: port the perag discard code to handle generic groups xfs: move the min and max group block numbers to xfs_group xfs: adjust min_block usage in xfs_verify_agbno ...	2024-11-21 09:20:07 -08:00
Linus Torvalds	0f25f0e4ef	the bulk of struct fd memory safety stuff Making sure that struct fd instances are destroyed in the same scope where they'd been created, getting rid of reassignments and passing them by reference, converting to CLASS(fd{,_pos,_raw}). We are getting very close to having the memory safety of that stuff trivial to verify. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZzdikAAKCRBZ7Krx/gZQ 69nJAQCmbQHK3TGUbQhOw6MJXOK9ezpyEDN3FZb4jsu38vTIdgEA6OxAYDO2m2g9 CN18glYmD3wRyU6Bwl4vGODouSJvDgA= =gVH3 -----END PGP SIGNATURE----- Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 'struct fd' class updates from Al Viro: "The bulk of struct fd memory safety stuff Making sure that struct fd instances are destroyed in the same scope where they'd been created, getting rid of reassignments and passing them by reference, converting to CLASS(fd{,_pos,_raw}). We are getting very close to having the memory safety of that stuff trivial to verify" * tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits) deal with the last remaing boolean uses of fd_file() css_set_fork(): switch to CLASS(fd_raw, ...) memcg_write_event_control(): switch to CLASS(fd) assorted variants of irqfd setup: convert to CLASS(fd) do_pollfd(): convert to CLASS(fd) convert do_select() convert vfs_dedupe_file_range(). convert cifs_ioctl_copychunk() convert media_request_get_by_fd() convert spu_run(2) switch spufs_calls_{get,put}() to CLASS() use convert cachestat(2) convert do_preadv()/do_pwritev() fdget(), more trivial conversions fdget(), trivial conversions privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget() o2hb_region_dev_store(): avoid goto around fdget()/fdput() introduce "fd_pos" class, convert fdget_pos() users to it. fdget_raw() users: switch to CLASS(fd_raw) convert vmsplice() to CLASS(fd) ...	2024-11-18 12:24:06 -08:00
Linus Torvalds	241c7ed4d4	vfs-6.13.untorn.writes -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcopwAKCRCRxhvAZXjc oitWAQD68PGFI6/ES9x+qGsDFEZBH08icuO+a9dyaZXyNRosDgD/ex2zHj6F7IzS Ghgb9jiqWQ8l2+PDYfisxa/0jiqCbAk= =DmXf -----END PGP SIGNATURE----- Merge tag 'vfs-6.13.untorn.writes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs untorn write support from Christian Brauner: "An atomic write is a write issed with torn-write protection. This means for a power failure or any hardware failure all or none of the data from the write will be stored, never a mix of old and new data. This work is already supported for block devices. If a block device is opened with O_DIRECT and the block device supports atomic write, then FMODE_CAN_ATOMIC_WRITE is added to the file of the opened block device. This contains the work to expand atomic write support to filesystems, specifically ext4 and XFS. Currently, only support for writing exactly one filesystem block atomically is added. Since it's now possible to have filesystem block size > page size for XFS, it's possible to write 4K+ blocks atomically on x86" * tag 'vfs-6.13.untorn.writes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: drop an obsolete comment in iomap_dio_bio_iter ext4: Do not fallback to buffered-io for DIO atomic write ext4: Support setting FMODE_CAN_ATOMIC_WRITE ext4: Check for atomic writes support in write iter ext4: Add statx support for atomic writes xfs: Support setting FMODE_CAN_ATOMIC_WRITE xfs: Validate atomic writes xfs: Support atomic write for statx fs: iomap: Atomic write support fs: Export generic_atomic_write_valid() block: Add bdev atomic write limits helpers fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid() block/fs: Pass an iocb to generic_atomic_write_valid()	2024-11-18 11:30:09 -08:00
Linus Torvalds	6ac81fd55e	vfs-6.13.mgtime -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcScQAKCRCRxhvAZXjc oj+5AP4k822a77wc/3iPFk379naIvQ4dsrgemh0/Pb6ZvzvkFQEAi3vFCfzCDR2x SkJF/RwXXKZv6U31QXMRt2Qo6wfBuAc= =nVlm -----END PGP SIGNATURE----- Merge tag 'vfs-6.13.mgtime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs multigrain timestamps from Christian Brauner: "This is another try at implementing multigrain timestamps. This time with significant help from the timekeeping maintainers to reduce the performance impact. Thomas provided a base branch that contains the required timekeeping interfaces for the VFS. It serves as the base for the multi-grain timestamp work: - Multigrain timestamps allow the kernel to use fine-grained timestamps when an inode's attributes is being actively observed via ->getattr(). With this support, it's possible for a file to get a fine-grained timestamp, and another modified after it to get a coarse-grained stamp that is earlier than the fine-grained time. If this happens then the files can appear to have been modified in reverse order, which breaks VFS ordering guarantees. To prevent this, a floor value is maintained for multigrain timestamps. Whenever a fine-grained timestamp is handed out, record it, and when later coarse-grained stamps are handed out, ensure they are not earlier than that value. If the coarse-grained timestamp is earlier than the fine-grained floor, return the floor value instead. The timekeeper changes add a static singleton atomic64_t into timekeeper.c that is used to keep track of the latest fine-grained time ever handed out. This is tracked as a monotonic ktime_t value to ensure that it isn't affected by clock jumps. Because it is updated at different times than the rest of the timekeeper object, the floor value is managed independently of the timekeeper via a cmpxchg() operation, and sits on its own cacheline. Two new public timekeeper interfaces are added: (1) ktime_get_coarse_real_ts64_mg() fills a timespec64 with the later of the coarse-grained clock and the floor time (2) ktime_get_real_ts64_mg() gets the fine-grained clock value, and tries to swap it into the floor. A timespec64 is filled with the result. - The VFS has always used coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot metadata updates, down to around 1 per jiffy, even when a file is under heavy writes. Unfortunately, this has always been an issue when we're exporting via NFSv3, which relies on timestamps to validate caches. A lot of changes can happen in a jiffy, so timestamps aren't sufficient to help the client decide when to invalidate the cache. Even with NFSv4, a lot of exported filesystems don't properly support a change attribute and are subject to the same problems with timestamp granularity. Other applications have similar issues with timestamps (e.g backup applications). If we were to always use fine-grained timestamps, that would improve the situation, but that becomes rather expensive, as the underlying filesystem would have to log a lot more metadata updates. This adds a way to only use fine-grained timestamps when they are being actively queried. Use the (unused) top bit in inode->i_ctime_nsec as a flag that indicates whether the current timestamps have been queried via stat() or the like. When it's set, we allow the kernel to use a fine-grained timestamp iff it's necessary to make the ctime show a different value. This solves the problem of being able to distinguish the timestamp between updates, but introduces a new problem: it's now possible for a file being changed to get a fine-grained timestamp. A file that is altered just a bit later can then get a coarse-grained one that appears older than the earlier fine-grained time. This violates timestamp ordering guarantees. This is where the earlier mentioned timkeeping interfaces help. A global monotonic atomic64_t value is kept that acts as a timestamp floor. When we go to stamp a file, we first get the latter of the current floor value and the current coarse-grained time. If the inode ctime hasn't been queried then we just attempt to stamp it with that value. If it has been queried, then first see whether the current coarse time is later than the existing ctime. If it is, then we accept that value. If it isn't, then we get a fine-grained time and try to swap that into the global floor. Whether that succeeds or fails, we take the resulting floor time, convert it to realtime and try to swap that into the ctime. We take the result of the ctime swap whether it succeeds or fails, since either is just as valid. Filesystems can opt into this by setting the FS_MGTIME fstype flag. Others should be unaffected (other than being subject to the same floor value as multigrain filesystems)" * tag 'vfs-6.13.mgtime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: reduce pointer chasing in is_mgtime() test tmpfs: add support for multigrain timestamps btrfs: convert to multigrain timestamps ext4: switch to multigrain timestamps xfs: switch to multigrain timestamps Documentation: add a new file documenting multigrain timestamps fs: add percpu counters for significant multigrain timestamp events fs: tracepoints around multigrain timestamp events fs: handle delegated timestamps in setattr_copy_mgtime timekeeping: Add percpu counter for tracking floor swap events timekeeping: Add interfaces for handling timestamps with a floor value fs: have setattr_copy handle multigrain timestamps appropriately fs: add infrastructure for multigrain timestamps	2024-11-18 09:15:39 -08:00
Carlos Maiolino	5877dc24be	xfs: improve ondisk structure checks [v5.5 10/10] Reorganize xfs_ondisk.h to group the build checks by type, then add a bunch of missing checks that were in xfs/122 but not the build system. With this, we can get rid of xfs/122. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZyqQdAAKCRBKO3ySh0YR ph8MAP4nWT1zCLAEDpFqVQmJO7wjlS02x5xFTHEzcms30ptsIwEA73ycS74tAmP5 CGsPxFVzhG8oMkWgrWXnXuOGfpQBPQo= =Dx2e -----END PGP SIGNATURE----- Merge tag 'better-ondisk-6.13_2024-11-05' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into staging-merge xfs: improve ondisk structure checks [v5.5 10/10] Reorganize xfs_ondisk.h to group the build checks by type, then add a bunch of missing checks that were in xfs/122 but not the build system. With this, we can get rid of xfs/122. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-12 11:03:15 +01:00
Carlos Maiolino	052378aef8	xfs: enable metadir [v5.5 09/10] Actually enable this very large feature, which adds metadata directory trees, allocation groups on the realtime volume, persistent quota options, and quota for realtime files. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZyqQdAAKCRBKO3ySh0YR pvjKAP9A1v0QwkBo65ARf1gQb2AZYlebV8o6faSrpc+10vOPxgD/SV3GKhxQrtaT gOswA6f9QJN3E82IJ6fR2PZgSke+iQs= =xjZL -----END PGP SIGNATURE----- Merge tag 'metadir-6.13_2024-11-05' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into staging-merge xfs: enable metadir [v5.5 09/10] Actually enable this very large feature, which adds metadata directory trees, allocation groups on the realtime volume, persistent quota options, and quota for realtime files. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-12 11:02:55 +01:00
Carlos Maiolino	8ca118e17a	xfs: enable quota for realtime volumes [v5.5 08/10] At some point, I realized that I've refactored enough of the quota code in XFS that I should evaluate whether or not quota actually works on realtime volumes. It turns out that it nearly works: the only broken pieces are chown and delayed allocation, and reporting of project quotas in the statvfs output for projinherit+rtinherit directories. Fix these things and we can have realtime quotas again after 20 years. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZyqQdAAKCRBKO3ySh0YR pkh4AQCtjI73mwU9rhzs2MO5nLNlpg9bgOxute+G4eqGCP02CwEAvg/LpT9yA6qk 1jM5x8C6xy03yIWTUc+DcMPqoCqJzwc= =2QMv -----END PGP SIGNATURE----- Merge tag 'realtime-quotas-6.13_2024-11-05' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into staging-merge xfs: enable quota for realtime volumes [v5.5 08/10] At some point, I realized that I've refactored enough of the quota code in XFS that I should evaluate whether or not quota actually works on realtime volumes. It turns out that it nearly works: the only broken pieces are chown and delayed allocation, and reporting of project quotas in the statvfs output for projinherit+rtinherit directories. Fix these things and we can have realtime quotas again after 20 years. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-12 11:02:25 +01:00
Carlos Maiolino	93c0f79edf	xfs: persist quota options with metadir [v5.5 07/10] Store the quota files in the metadata directory tree instead of the superblock. Since we're introducing a new incompat feature flag, let's also make the mount process bring up quotas in whatever state they were when the filesystem was last unmounted, instead of requiring sysadmins to remember that themselves. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZyqQdAAKCRBKO3ySh0YR pod/AP9+FSHh3UF9ccpBQXMAG0NmhXYJBQ7grnAp4q89ko6fCAD+JyUsqi55zDw6 KnLSWZgNuO+aaCmMKXX4hEttA0//gAY= =0YFz -----END PGP SIGNATURE----- Merge tag 'metadir-quotas-6.13_2024-11-05' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into staging-merge xfs: persist quota options with metadir [v5.5 07/10] Store the quota files in the metadata directory tree instead of the superblock. Since we're introducing a new incompat feature flag, let's also make the mount process bring up quotas in whatever state they were when the filesystem was last unmounted, instead of requiring sysadmins to remember that themselves. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-12 11:01:12 +01:00
Carlos Maiolino	b939bcdca3	xfs: shard the realtime section [v5.5 06/10] Right now, the realtime section uses a single pair of metadata inodes to store the free space information. This presents a scalability problem since every thread trying to allocate or free rt extents have to lock these files. Solve this problem by sharding the realtime section into separate realtime allocation groups. While we're at it, define a superblock to be stamped into the start of the rt section. This enables utilities such as blkid to identify block devices containing realtime sections, and avoids the situation where anything written into block 0 of the realtime extent can be misinterpreted as file data. The best advantage for rtgroups will become evident later when we get to adding rmap and reflink to the realtime volume, since the geometry constraints are the same for rt groups and AGs. Hence we can reuse all that code directly. This is a very large patchset, but it catches us up with 20 years of technical debt that have accumulated. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZyqQdAAKCRBKO3ySh0YR pqk4AQD31pupAefiZ39TFLz0oA1+Q2WUOoLxH/3Ovqin1GJNPgD9EG04/14fDmRU WDUSVfU8JKKJYEXXZnLeJLsvEUL2EQ0= =1/oh -----END PGP SIGNATURE----- Merge tag 'realtime-groups-6.13_2024-11-05' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into staging-merge xfs: shard the realtime section [v5.5 06/10] Right now, the realtime section uses a single pair of metadata inodes to store the free space information. This presents a scalability problem since every thread trying to allocate or free rt extents have to lock these files. Solve this problem by sharding the realtime section into separate realtime allocation groups. While we're at it, define a superblock to be stamped into the start of the rt section. This enables utilities such as blkid to identify block devices containing realtime sections, and avoids the situation where anything written into block 0 of the realtime extent can be misinterpreted as file data. The best advantage for rtgroups will become evident later when we get to adding rmap and reflink to the realtime volume, since the geometry constraints are the same for rt groups and AGs. Hence we can reuse all that code directly. This is a very large patchset, but it catches us up with 20 years of technical debt that have accumulated. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-12 11:00:42 +01:00
Carlos Maiolino	cb288c9fb2	xfs: preparation for realtime allocation groups [v5.5 05/10] Prepare for realtime groups by adding a few bug fixes and generic code that will be necessary. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZyqQdAAKCRBKO3ySh0YR pgmeAP980iZAY49aL85dhr/QNl0G5YmuLOx6UW8DiAALCJxyxAEAnD5ilA7vQz40 80/cn+Y77fT3LptpAHTM5/FY+42IOgM= =AzwL -----END PGP SIGNATURE----- Merge tag 'rtgroups-prep-6.13_2024-11-05' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into staging-merge xfs: preparation for realtime allocation groups [v5.5 05/10] Prepare for realtime groups by adding a few bug fixes and generic code that will be necessary. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-12 11:00:16 +01:00
Carlos Maiolino	6b3582aca3	xfs: create incore rt allocation groups [v5.5 04/10] Add in-memory data structures for sharding the realtime volume into independent allocation groups. For existing filesystems, the entire rt volume is modelled as having a single large group, with (potentially) a number of rt extents exceeding 2^32 blocks, though these are not likely to exist because the codebase has been a bit broken for decades. The next series fills in the ondisk format and other supporting structures. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZyqQdAAKCRBKO3ySh0YR ptrrAP41PURivFpHWXqg0sajsIUUezhuAdfg41fJqOop81qWDAEA2CsLf1z0c9/P CQS/tlQ3xdwZ0MYZMaw2o0EgSHYjwg8= =qVdv -----END PGP SIGNATURE----- Merge tag 'incore-rtgroups-6.13_2024-11-05' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into staging-merge xfs: create incore rt allocation groups [v5.5 04/10] Add in-memory data structures for sharding the realtime volume into independent allocation groups. For existing filesystems, the entire rt volume is modelled as having a single large group, with (potentially) a number of rt extents exceeding 2^32 blocks, though these are not likely to exist because the codebase has been a bit broken for decades. The next series fills in the ondisk format and other supporting structures. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-12 10:59:34 +01:00
Carlos Maiolino	d7a5b69bf0	xfs: metadata inode directory trees [v5.5 03/10] This series delivers a new feature -- metadata inode directories. This is a separate directory tree (rooted in the superblock) that contains only inodes that contain filesystem metadata. Different metadata objects can be looked up with regular paths. Start by creating xfs_imeta{dir,file}* functions to mediate access to the metadata directory tree. By the end of this mega series, all existing metadata inodes (rt+quota) will use this directory tree instead of the superblock. Next, define the metadir on-disk format, which consists of marking inodes with a new iflag that says they're metadata. This prevents bulkstat and friends from ever getting their hands on fs metadata files. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZyqQdAAKCRBKO3ySh0YR ppNiAP9JgPFENv3P0UCJiCDqtWZlNWfz8a9ngAehm4AQMA0P9gD/XgVYNKZRY2Q3 P+3Sh1TVZ63dcEENlmEFE3myKjeJKAQ= =IJLc -----END PGP SIGNATURE----- Merge tag 'metadata-directory-tree-6.13_2024-11-05' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into staging-merge xfs: metadata inode directory trees [v5.5 03/10] This series delivers a new feature -- metadata inode directories. This is a separate directory tree (rooted in the superblock) that contains only inodes that contain filesystem metadata. Different metadata objects can be looked up with regular paths. Start by creating xfs_imeta{dir,file}* functions to mediate access to the metadata directory tree. By the end of this mega series, all existing metadata inodes (rt+quota) will use this directory tree instead of the superblock. Next, define the metadir on-disk format, which consists of marking inodes with a new iflag that says they're metadata. This prevents bulkstat and friends from ever getting their hands on fs metadata files. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-12 10:59:05 +01:00
Carlos Maiolino	28cf0d1a34	xfs: create a generic allocation group structure [v5.5 02/10] Soon we'll be sharding the realtime volume into separate allocation groups. These rt groups will /mostly/ behave the same as the ones on the data device, but since rt groups don't have quite the same set of struct fields as perags, let's hoist the parts that will be shared by both into a common xfs_group object. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZyqQdAAKCRBKO3ySh0YR pnJDAQCh14f3aSCHslr4XeG1YkT7NZ44AtMjBqEi5GRN26wC0wD+PeaVSRgt+5wy nONT/nFqU5pApe1w2pq78SoJ+vLJxAk= =la16 -----END PGP SIGNATURE----- Merge tag 'generic-groups-6.13_2024-11-05' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into staging-merge xfs: create a generic allocation group structure [v5.5 02/10] Soon we'll be sharding the realtime volume into separate allocation groups. These rt groups will /mostly/ behave the same as the ones on the data device, but since rt groups don't have quite the same set of struct fields as perags, let's hoist the parts that will be shared by both into a common xfs_group object. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-12 10:58:27 +01:00
Carlos Maiolino	131ffe5e69	xfs: convert perag to use xarrays [v5.5 01/10] Convert the xfs_mount perag tree to use an xarray instead of a radix tree. There should be no functional changes here. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZyqQcwAKCRBKO3ySh0YR pmKsAQDko7YQ/dJZsjDvBJRUEnjrJqK9ukR9Ovc00ak0WXIDYQD/cdm8xzkixHn2 yfwsuFxIm6k0PPzb9koSCDT/CQS6QAA= =WGih -----END PGP SIGNATURE----- Merge tag 'perag-xarray-6.13_2024-11-05' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into staging-merge xfs: convert perag to use xarrays [v5.5 01/10] Convert the xfs_mount perag tree to use an xarray instead of a radix tree. There should be no functional changes here. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-12 10:57:32 +01:00
Kairui Song	da0c02516c	mm/list_lru: simplify the list_lru walk callback function Now isolation no longer takes the list_lru global node lock, only use the per-cgroup lock instead. And this lock is inside the list_lru_one being walked, no longer needed to pass the lock explicitly. Link: https://lkml.kernel.org/r/20241104175257.60853-7-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Cc: Chengming Zhou <zhouchengming@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-11 17:22:26 -08:00
Kairui Song	fb56fdf8b9	mm/list_lru: split the lock to per-cgroup scope Currently, every list_lru has a per-node lock that protects adding, deletion, isolation, and reparenting of all list_lru_one instances belonging to this list_lru on this node. This lock contention is heavy when multiple cgroups modify the same list_lru. This lock can be split into per-cgroup scope to reduce contention. To achieve this, we need a stable list_lru_one for every cgroup. This commit adds a lock to each list_lru_one and introduced a helper function lock_list_lru_of_memcg, making it possible to pin the list_lru of a memcg. Then reworked the reparenting process. Reparenting will switch the list_lru_one instances one by one. By locking each instance and marking it dead using the nr_items counter, reparenting ensures that all items in the corresponding cgroup (on-list or not, because items have a stable cgroup, see below) will see the list_lru_one switch synchronously. Objcg reparent is also moved after list_lru reparent so items will have a stable mem cgroup until all list_lru_one instances are drained. The only caller that doesn't work the *_obj interfaces are direct calls to list_lru_{add,del}. But it's only used by zswap and that's also based on objcg, so it's fine. This also changes the bahaviour of the isolation function when LRU_RETRY or LRU_REMOVED_RETRY is returned, because now releasing the lock could unblock reparenting and free the list_lru_one, isolation function will have to return withoug re-lock the lru. prepare() { mkdir /tmp/test-fs modprobe brd rd_nr=1 rd_size=33554432 mkfs.xfs -f /dev/ram0 mount -t xfs /dev/ram0 /tmp/test-fs for i in $(seq 1 512); do mkdir "/tmp/test-fs/$i" for j in $(seq 1 10240); do echo TEST-CONTENT > "/tmp/test-fs/$i/$j" done & done; wait } do_test() { read_worker() { sleep 1 tar -cv "$1" &>/dev/null } read_in_all() { cd "/tmp/test-fs" && ls for i in $(seq 1 512); do (exec sh -c 'echo "$PPID"') > "/sys/fs/cgroup/benchmark/$i/cgroup.procs" read_worker "$i" & done; wait } for i in $(seq 1 512); do mkdir -p "/sys/fs/cgroup/benchmark/$i" done echo +memory > /sys/fs/cgroup/benchmark/cgroup.subtree_control echo 512M > /sys/fs/cgroup/benchmark/memory.max echo 3 > /proc/sys/vm/drop_caches time read_in_all } Above script simulates compression of small files in multiple cgroups with memory pressure. Run prepare() then do_test for 6 times: Before: real 0m7.762s user 0m11.340s sys 3m11.224s real 0m8.123s user 0m11.548s sys 3m2.549s real 0m7.736s user 0m11.515s sys 3m11.171s real 0m8.539s user 0m11.508s sys 3m7.618s real 0m7.928s user 0m11.349s sys 3m13.063s real 0m8.105s user 0m11.128s sys 3m14.313s After this commit (about ~15% faster): real 0m6.953s user 0m11.327s sys 2m42.912s real 0m7.453s user 0m11.343s sys 2m51.942s real 0m6.916s user 0m11.269s sys 2m43.957s real 0m6.894s user 0m11.528s sys 2m45.346s real 0m6.911s user 0m11.095s sys 2m43.168s real 0m6.773s user 0m11.518s sys 2m40.774s Link: https://lkml.kernel.org/r/20241104175257.60853-6-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Cc: Chengming Zhou <zhouchengming@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-11 17:22:26 -08:00
Darrick J. Wong	13877bc79d	xfs: port ondisk structure checks from xfs/122 to the kernel Check this with every kernel and userspace build, so we can drop the nonsense in xfs/122. Roughly drafted with: sed -e 's/^offsetof/\tXFS_CHECK_OFFSET/g' \ -e 's/^sizeof/\tXFS_CHECK_STRUCT_SIZE/g' \ -e 's/ = $[0-9]$/,\t\t\t\1);/g' \ -e 's/xfs_sb_t/struct xfs_dsb/g' \ -e 's/),/,/g' \ -e 's/xfs_$[a-z0-9_]$_t,/struct xfs_\1,/g' \ < tests/xfs/122.out \| sort and then manual fixups. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:47 -08:00
Darrick J. Wong	131a883fff	xfs: separate space btree structures in xfs_ondisk.h Create a separate section for space management btrees so that they're not mixed in with file structures. Ignore the dsb stuff sprinkled around for now, because we'll deal with that in a subsequent patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:47 -08:00
Darrick J. Wong	89b38282d1	xfs: convert struct typedefs in xfs_ondisk.h Replace xfs_foo_t with struct xfs_foo where appropriate. The next patch will import more checks from xfs/122, and it's easier to automate deduplication if we don't have to reason about typedefs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:47 -08:00
Darrick J. Wong	ea079efd36	xfs: enable metadata directory feature Enable the metadata directory feature. With this feature, all metadata inodes are placed in the metadata directory, and the only inumbers in the superblock are the roots of the two directory trees. The RT device is now sharded into a number of rtgroups, where 0 rtgroups mean that no RT extents are supported, and the traditional XFS stub RT bitmap and summary inodes don't exist. A single rtgroup gives roughly identical behavior to the traditional RT setup, but now with checksummed and self identifying free space metadata. For quota, the quota options are read from the superblock unless explicitly overridden via mount options. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:46 -08:00
Darrick J. Wong	edc038f7f3	xfs: enable realtime quota again Enable quotas for the realtime device. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:46 -08:00
Darrick J. Wong	28d756d4d5	xfs: update sb field checks when metadir is turned on When metadir is enabled, we want to check the two new rtgroups fields, and we don't want to check the old inumbers that are now in the metadir. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:46 -08:00
Darrick J. Wong	b7020ba86a	xfs: reserve quota for realtime files correctly Fix xfs_quota_reserve_blkres to reserve rt block quota whenever we're dealing with a realtime file. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:46 -08:00
Darrick J. Wong	5dd70852b0	xfs: create quota preallocation watermarks for realtime quota Refactor the quota preallocation watermarking code so that it'll work for realtime quota too. Convert the do_div calls into div_u64 for compactness. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:46 -08:00
Darrick J. Wong	9a17ebfea9	xfs: report realtime block quota limits on realtime directories On the data device, calling statvfs on a projinherit directory results in the block and avail counts being curtailed to the project quota block limits, if any are set. Do the same for realtime files or directories, only use the project quota rt block limits. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:46 -08:00
Darrick J. Wong	d5d9dd5b30	xfs: persist quota flags with metadir It's annoying that one has to keep reminding XFS about what quota options it should mount with, since the quota flags recording the previous state are sitting right there in the primary superblock. Even more strangely, there exists a noquota option to disable quotas completely, so it's odder still that providing no options is the same as noquota. Starting with metadir, let's change the behavior so that if the user does not specify any quota-related mount options at all, the ondisk quota flags will be used to bring up quota. In other words, the filesystem will mount in the same state and with the same functionality as it had during the last mount. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:45 -08:00
Darrick J. Wong	184c619f55	xfs: advertise realtime quota support in the xqm stat files Add a fifth column to this (really old) stat file to advertise that the kernel supports quota for realtime volumes. This will be used by fstests to detect kernel support. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:45 -08:00
Darrick J. Wong	128a055291	xfs: scrub quota file metapaths Enable online fsck for quota file metadata directory paths. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:45 -08:00
Darrick J. Wong	b28564cae1	xfs: fix chown with rt quota Make chown's quota adjustments work with realtime files. This is mostly a matter of calling xfs_inode_count_blocks on a given file to figure out the number of blocks allocated to the data device and to the realtime device, and using those quantities to update the quota accounting when the id changes. Delayed allocation reservations are moved from the old dquot's incore reservation to the new dquot's incore reservation. Note that there was a missing ILOCK bug in xfs_qm_dqusage_adjust that we must fix before calling xfs_iread_extents. Prior to 2.6.37 the locking was correct, but then someone removed the ILOCK as part of a cleanup. Nobody noticed because nowhere in the git history have we ever supported rt+quota so nobody can use this. I'm leaving git breadcrumbs in case anyone is desperate enough to try to backport the rtquota code to old kernels. Not-Cc: <stable@vger.kernel.org> # v2.6.37 Fixes: `52fda11424` ("xfs: simplify xfs_qm_dqusage_adjust") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:45 -08:00
Darrick J. Wong	e80fbe1ad8	xfs: use metadir for quota inodes Store the quota inodes in the /quota metadata directory if metadir is enabled. This enables us to stop using the sb_[ugp]uotino fields in the superblock. From this point on, all metadata files will be children of the metadata directory tree root. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:45 -08:00
Darrick J. Wong	fc23a426ce	xfs: refactor xfs_qm_destroy_quotainos Reuse this function instead of open-coding the logic. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:45 -08:00
Darrick J. Wong	a3315d1130	xfs: use rtgroup busy extent list for FITRIM For filesystems that have rtgroups and hence use the busy extent list for freed rt space, use that busy extent list so that FITRIM can issue discard commands asynchronously without worrying about other callers accidentally allocating and using space that is being discarded. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:44 -08:00
Darrick J. Wong	7e85fc2394	xfs: implement busy extent tracking for rtgroups For rtgroups filesystems, track newly freed (rt) space through the log until the rt EFIs have been committed to disk. This way we ensure that space cannot be reused until all traces of the old owner are gone. As a fringe benefit, we now support -o discard on the realtime device. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:44 -08:00
Darrick J. Wong	0c271d906e	xfs: port the perag discard code to handle generic groups Port xfs_discard_extents and its tracepoints to handle generic groups instead of just perags. This is needed to enable busy extent tracking for rtgroups. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:44 -08:00
Darrick J. Wong	e0b5b97dde	xfs: move the min and max group block numbers to xfs_group Move the min and max agblock numbers to the generic xfs_group structure so that we can start building validators for extents within an rtgroup. While we're at it, use check_add_overflow for the extent length computation because that has much better overflow checking. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:44 -08:00
Darrick J. Wong	ceaa0bd773	xfs: adjust min_block usage in xfs_verify_agbno There's some weird logic in xfs_verify_agbno -- min_block ought to be the first agblock number in the AG that can be used by non-static metadata. However, we initialize it to the last agblock of the static metadata, which works due to the <= check, even though this isn't technically correct. Change the check to < and set min_block to the next agblock past the static metadata. This hasn't been an issue up to now, but we're going to move these things into the generic group struct, and this will cause problems with rtgroups, where min_block can be zero for an rtgroup that doesn't have a rt superblock. Note that there's no user-visible impact with the old logic, so this isn't a bug fix. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:44 -08:00
Darrick J. Wong	7195f240c6	xfs: make xfs_rtblock_t a segmented address like xfs_fsblock_t Now that we've finished adding allocation groups to the realtime volume, let's make the file block mapping address (xfs_rtblock_t) a segmented value just like we do on the data device. This means that group number and block number conversions can be done with shifting and masking instead of integer division. While in theory we could continue caching the rgno shift value in m_rgblklog, the fact that we now always use the shift value means that we have an opportunity to increase the redundancy of the rt geometry by storing it in the ondisk superblock and adding more sb verifier code. Extend the sueprblock to store the rgblklog value. Now that we have segmented addresses, set the correct values in m_groups[XG_TYPE_RTG] so that the xfs_group helpers work correctly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:44 -08:00
Darrick J. Wong	3f0205ebe7	xfs: create helpers to deal with rounding xfs_filblks_t to rtx boundaries We're about to segment xfs_rtblock_t addresses, so we must create type-specific helpers to do rt extent rounding of file mapping block lengths because the rtb helpers soon will not do the right thing there. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:43 -08:00
Darrick J. Wong	fd7588fa64	xfs: create helpers to deal with rounding xfs_fileoff_t to rtx boundaries We're about to segment xfs_rtblock_t addresses, so we must create type-specific helpers to do rt extent rounding of file block offsets because the rtb helpers soon will not do the right thing there. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:43 -08:00
Darrick J. Wong	ea99122b18	xfs: mask off the rtbitmap and summary inodes when metadir in use Set the rtbitmap and summary file inumbers to NULLFSINO in the superblock and make sure they're zeroed whenever we write the superblock to disk, to mimic mkfs behavior. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:43 -08:00
Darrick J. Wong	a74923333d	xfs: scrub metadir paths for rtgroup metadata Add the code we need to scan the metadata directory paths of rt group metadata files. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:43 -08:00
Darrick J. Wong	1433f8f9ce	xfs: repair realtime group superblock Repair the realtime superblock if it has become out of date with the primary superblock. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:43 -08:00
Darrick J. Wong	3f1bdf50ab	xfs: scrub the realtime group superblock Enable scrubbing of realtime group superblocks. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:43 -08:00
Darrick J. Wong	7333c948c2	xfs: don't coalesce file mappings that cross rtgroup boundaries in scrub The bmbt scrubber will combine file mappings if they are mergeable to reduce the number of cross-referencing checks. However, we shouldn't combine mappings that cross rt group boundaries because that will cause verifiers to trip incorrectly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:42 -08:00
Christoph Hellwig	d162491c54	xfs: make the RT allocator rtgroup aware Make the allocator rtgroup aware by either picking a specific group if there is a hint, or loop over all groups otherwise. A simple rotor is provided to pick the placement for initial allocations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-05 13:38:42 -08:00
Darrick J. Wong	b91afef724	xfs: don't merge ioends across RTGs Unlike AGs, RTGs don't always have metadata in their first blocks, and thus we don't get automatic protection from merging I/O completions across RTG boundaries. Add code to set the IOMAP_F_BOUNDARY flag for ioends that start at the first block of a RTG so that they never get merged into the previous ioend. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-05 13:38:42 -08:00
Darrick J. Wong	44e69c9af1	xfs: use realtime EFI to free extents when rtgroups are enabled When rmap is enabled, XFS expects a certain order of operations, which is: 1) remove the file mapping, 2) remove the reverse mapping, and then 3) free the blocks. When reflink is enabled, XFS replaces (3) with a deferred refcount decrement operation that can schedule freeing the blocks if that was the last refcount. For realtime files, xfs_bmap_del_extent_real tries to do 1 and 3 in the same transaction, which will break both rmap and reflink unless we switch it to use realtime EFIs. Both rmap and reflink depend on the rtgroups feature, so let's turn on EFIs for all rtgroups filesystems. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:42 -08:00
Darrick J. Wong	fc91d9430e	xfs: support error injection when freeing rt extents A handful of fstests expect to be able to test what happens when extent free intents fail to actually free the extent. Now that we're supporting EFIs for realtime extents, add to xfs_rtfree_extent the same injection point that exists in the regular extent freeing code. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:42 -08:00
Darrick J. Wong	4c8900bbf1	xfs: support logging EFIs for realtime extents Teach the EFI mechanism how to free realtime extents. We're going to need this to enforce proper ordering of operations when we enable realtime rmap. Declare a new log intent item type (XFS_LI_EFI_RT) and a separate defer ops for rt extents. This keeps the ondisk artifacts and processing code completely separate between the rt and non-rt cases. Hopefully this will make it easier to debug filesystem problems. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:42 -08:00
Darrick J. Wong	b57283e1a0	xfs: force swapext to a realtime file to use the file content exchange ioctl xfs_swap_extent_rmap does not use log items to track the overall progress of an attempt to swap the extent mappings between two files. If the system crashes in the middle of swapping a partially written realtime extent, the mapping will be left in an inconsistent state wherein a file can point to multiple extents on the rt volume. The new file range exchange functionality handles this correctly, so all callers must upgrade to that. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:41 -08:00
Darrick J. Wong	e464d8e8bb	xfs: store rtgroup information with a bmap intent Make the bmap intent items take an active reference to the rtgroup containing the space that is being mapped or unmapped. We will need this functionality once we start enabling rmap and reflink on the rt volume. Technically speaking we need it even for !rtgroups filesystems to prevent the (dummy) rtgroup 0 from going away, even though this will never happen. As a bonus, we can rework the xfs_bmap_deferred_class tracepoint to use the xfs_group object to figure out the type and group number, widen the group block number field to fit 64-bit quantities, and get rid of the now redundant opdev and rtblock fields. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:41 -08:00
Darrick J. Wong	ee32135148	xfs: grow the realtime section when realtime groups are enabled Enable growing the rt section when realtime groups are enabled. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:41 -08:00
Darrick J. Wong	a2c2836739	xfs: encode the rtsummary in big endian format Currently, the ondisk realtime summary file counters are accessed in units of 32-bit words. There's no endian translation of the contents of this file, which means that the Bad Things Happen(tm) if you go from (say) x86 to powerpc. Since we have a new feature flag, let's take the opportunity to enforce an endianness on the file. Encode the summary information in big endian format, like most of the rest of the filesystem. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:41 -08:00
Darrick J. Wong	eba42c2c53	xfs: encode the rtbitmap in big endian format Currently, the ondisk realtime bitmap file is accessed in units of 32-bit words. There's no endian translation of the contents of this file, which means that the Bad Things Happen(tm) if you go from (say) x86 to powerpc. Since we have a new feature flag, let's take the opportunity to enforce an endianness on the file. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:41 -08:00
Darrick J. Wong	118895aa95	xfs: add block headers to realtime bitmap and summary blocks Upgrade rtbitmap and rtsummary blocks to have self describing metadata like most every other thing in XFS. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:40 -08:00
Darrick J. Wong	3fa7a6d0c7	xfs: export the geometry of realtime groups to userspace Create an ioctl so that the kernel can report the status of realtime groups to userspace. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:40 -08:00
Darrick J. Wong	ab7bd650e1	xfs: record rt group metadata errors in the health system Record the state of per-rtgroup metadata sickness in the rtgroup structure for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:40 -08:00
Darrick J. Wong	21e62bddf0	xfs: convert sick_map loops to use ARRAY_SIZE Convert these arrays to use ARRAY_SIZE insteead of requiring an empty sentinel array element at the end. This saves memory and would have avoided a bug that worked its way into the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:40 -08:00
Darrick J. Wong	35537f25d2	xfs: add frextents to the lazysbcounters when rtgroups enabled Make the free rt extent count a part of the lazy sb counters when the realtime groups feature is enabled. This is possible because the patch to recompute frextents from the rtbitmap during log recovery predates the code adding rtgroup support, hence we know that the value will always be correct during runtime. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:40 -08:00
Christoph Hellwig	8458c4944e	xfs: add a helper to prevent bmap merges across rtgroup boundaries Except for the rt superblock, realtime groups do not store any metadata at the start (or end) of the group. There is nothing to prevent the bmap code from merging allocations from multiple groups into a single bmap record. Add a helper to check for this case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> [djwong: massage the commit message after pulling this into rtgroups] Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-05 13:38:40 -08:00
Darrick J. Wong	9bb5127347	xfs: check that rtblock extents do not break rtsupers or rtgroups Check that rt block pointers do not point to the realtime superblock and that allocated rt space extents do not cross rtgroup boundaries. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:39 -08:00
Darrick J. Wong	8edde94d64	xfs: export realtime group geometry via XFS_FSOP_GEOM Export the realtime geometry information so that userspace can query it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:39 -08:00
Darrick J. Wong	76d3be00df	xfs: update realtime super every time we update the primary fs super Every time we update parts of the primary filesystem superblock that are echoed in the rt superblock, we must update the rt super. Avoid changing the log to support logging to the rt device by using ordered buffers. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:39 -08:00
Darrick J. Wong	18618e7100	xfs: check the realtime superblock at mount time Check the realtime superblock at mount time, to ensure that the label and uuids actually match the primary superblock on the data device. If the rt superblock is good, attach it to the xfs_mount so that the log can use ordered buffers to keep this primary in sync with the primary super on the data device. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:39 -08:00
Darrick J. Wong	96768e9151	xfs: define the format of rt groups Define the ondisk format of realtime group metadata, and a superblock for realtime volumes. rt supers are conditionally enabled by a predicate function so that they can be disabled if we ever implement zoned storage support for the realtime volume. For rt group enabled file systems there is a separate bitmap and summary file for each group and thus the number of bitmap and summary blocks needs to be calculated differently. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:39 -08:00
Christoph Hellwig	f220f6da5f	xfs: make RT extent numbers relative to the rtgroup To prepare for adding per-rtgroup bitmap files, make the xfs_rtxnum_t type encode the RT extent number relative to the rtgroup. The biggest part of this to clearly distinguish between the relative extent number that gets masked when converting from a global block number and length values that just have a factor applied to them when converting from file system blocks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-05 13:38:38 -08:00
Darrick J. Wong	dca94251f6	xfs: fix rt device offset calculations for FITRIM FITRIM on xfs has this bizarro uapi where we flatten all the physically addressable storage across two block devices into a linear address space. In this address space, the realtime device comes immediately after the data device. Therefore, the xfs_trim_rtdev_extents has to convert its input parameters from the linear address space to actual rtdev block addresses on the realtime volume. Right now the address space conversion is done in units of rtblocks. However, a future patchset will convert xfs_rtblock_t to be a segmented address space (group:blkno) like the data device. Change the conversion code to be done in units of daddrs since those will never be segmented. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:38 -08:00
Christoph Hellwig	f8c5a8415f	xfs: refactor xfs_rtsummary_blockcount Make xfs_rtsummary_blockcount take all the required information from the mount structure and return the number of summary levels from it as well. This cleans up many of the callers and prepares for making the rtsummary files per-rtgroup where they need to look at different value. This means we recalculate some values in some callers, but as all these calculations are outside the fast path and cheap, which seems like a price worth paying. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-05 13:38:38 -08:00
Christoph Hellwig	5a7566c8d6	xfs: refactor xfs_rtbitmap_blockcount Rename the existing xfs_rtbitmap_blockcount to xfs_rtbitmap_blockcount_len and add a new xfs_rtbitmap_blockcount wrapper around it that takes the number of extents from the mount structure. This will simplify the move to per-rtgroup bitmaps as those will need to pass in the number of extents per rtgroup instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-05 13:38:38 -08:00
Christoph Hellwig	bde86b42d2	xfs: factor out a xfs_growfs_check_rtgeom helper Split the check that the rtsummary fits into the log into a separate helper, and use xfs_growfs_rt_alloc_fake_mount to calculate the new RT geometry. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> [djwong: avoid division for the 0-rtx growfs check] Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-05 13:38:38 -08:00
Christoph Hellwig	fc233f1fb0	xfs: use xfs_growfs_rt_alloc_fake_mount in xfs_growfs_rt_alloc_blocks Use xfs_growfs_rt_alloc_fake_mount instead of manually recalculating the RT bitmap geometry. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-05 13:38:37 -08:00
Darrick J. Wong	1029f08dc5	xfs: factor out a xfs_growfs_rt_alloc_fake_mount helper Split the code to set up a fake mount point to calculate new RT geometry out of xfs_growfs_rt_bmblock so that it can be reused. Note that this changes the rmblocks calculation method to be based on the passed in rblocks and extsize and not the explicitly passed one, but both methods will always lead to the same result. The new version just does a little bit more math while being more general. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-05 13:38:37 -08:00
Christoph Hellwig	cb9cd6e56e	xfs: calculate RT bitmap and summary blocks based on sb_rextents Use the on-disk rextents to calculate the bitmap and summary blocks instead of the calculated one so that we can refactor the helpers for calculating them. As the RT bitmap and summary scrubbers already check that sb_rextents match the block count this does not change coverage of the scrubber. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-05 13:38:37 -08:00
Darrick J. Wong	c1442d22a0	xfs: remove XFS_ILOCK_RT* Now that we've centralized the realtime metadata locking routines, get rid of the ILOCK subclasses since we now use explicit lockdep classes. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:37 -08:00
Christoph Hellwig	ae897e0bed	xfs: support creating per-RTG files in growfs To support adding new RT groups in growfs, we need to be able to create the per-RT group files. Add a new xfs_rtginode_create helper to create a given per-RTG file. Most of the code for that is shared, but the details of the actual file are abstracted out using a new create method in struct xfs_rtginode_ops. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-05 13:38:37 -08:00
Christoph Hellwig	e3088ae2dc	xfs: move RT bitmap and summary information to the rtgroup Move the pointers to the RT bitmap and summary inodes as well as the summary cache to the rtgroups structure to prepare for having a separate bitmap and summary inodes for each rtgroup. Code using the inodes now needs to operate on a rtgroup. Where easily possible such code is converted to iterate over all rtgroups, else rtgroup 0 (the only one that can currently exist) is hardcoded. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-05 13:38:37 -08:00
Christoph Hellwig	c8edf1cbef	xfs: split xfs_trim_rtdev_extents Split xfs_trim_rtdev_extents into two parts to prepare for reusing the main validation also for RT group aware file systems. Use the fully features xfs_daddr_to_rtb helper to convert from a daddr to a xfs_rtblock_t to prepare for segmented addressing in RT groups. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-05 13:38:36 -08:00
Christoph Hellwig	d6d5c90ada	xfs: cleanup xfs_getfsmap_rtdev_rtbitmap Use mp->m_sb.sb_rblocks to calculate the end instead of sb_rextents that needs a conversion, use consistent names to xfs_rtblock_t types, and only calculated them by the time they are needed. Remove the pointless "high" local variable that only has a single user. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-05 13:38:36 -08:00
Christoph Hellwig	9154b5008c	xfs: factor out a xfs_growfs_rt_alloc_blocks helper Split out a helper to allocate or grow the rtbitmap and rtsummary files in preparation of per-RT group bitmap and summary files. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-05 13:38:36 -08:00
Christoph Hellwig	cd8d049082	xfs: add a xfs_qm_unmount_rt helper RT group enabled file systems fix the bug where we pointlessly attach quotas to the RT bitmap and summary files. Split the code to detach the quotas into a helper, make it conditional and document the differing behavior for RT group and pre-RT group file systems. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-05 13:38:36 -08:00
Christoph Hellwig	9c3cfb9c96	xfs: add a xfs_bmap_free_rtblocks helper Split the RT extent freeing logic from xfs_bmap_del_extent_real because it will become more complicated when adding RT group. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-05 13:38:36 -08:00
Darrick J. Wong	cd5b26f0c0	xfs: add rtgroup-based realtime scrubbing context management Create a state tracking structure and helpers to initialize the tracking structure so that we can check metadata records against the realtime space management metadata. Right now this is limited to grabbing the incore rtgroup object, but we'll eventually add to the tracking structure the ILOCK state and btree cursors. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:36 -08:00
Darrick J. Wong	0d2c636e48	xfs: repair metadata directory file path connectivity Fix disconnected or incorrect metadata directory paths. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:35 -08:00
Darrick J. Wong	65b1231b8c	xfs: support caching rtgroup metadata inodes Create the necessary per-rtgroup infrastructure that we need to load metadata inodes into memory. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:35 -08:00
Darrick J. Wong	c29237a65c	xfs: add a lockdep class key for rtgroup inodes Add a dynamic lockdep class key for rtgroup inodes. This will enable lockdep to deduce inconsistencies in the rtgroup metadata ILOCK locking order. Each class can have 8 subclasses, and for now we will only have 2 inodes per group. This enables rtgroup order and inode order checks when nesting ILOCKs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:35 -08:00
Darrick J. Wong	0e4875b3fb	xfs: define locking primitives for realtime groups Define helper functions to lock all metadata inodes related to a realtime group. There's not much to look at now, but this will become important when we add per-rtgroup metadata files and online fsck code for them. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:35 -08:00
Darrick J. Wong	87fe4c34a3	xfs: create incore realtime group structures Create an incore object that will contain information about a realtime allocation group. This will eventually enable us to shard the realtime section in a similar manner to how we shard the data section, but for now just a single object for the entire RT subvolume is created. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:35 -08:00
Christoph Hellwig	dcfc65befb	xfs: clean up xfs_getfsmap_helper arguments The calling conventions for xfs_getfsmap_helper are confusing -- callers pass in an rmap record, but they must also supply startblock and blockcount in daddr units. This was bolted onto the original fsmap implementation so that we could report something for realtime volumes, which do not support rmap and hence can draw only from the rt free space bitmap. Free space on the rt volume can be more than 2^32 fsblocks long, which means that we can't use the rmap startblock or blockcount fields. This is confusing for callers, because they must supplying redundant data, but not all of it is used. Streamline this by creating a separate fsmap irec structure that contains exactly the data we need, once. Note that we actually do need rm_startblock for rmap key comparisons when we're actually querying an rmap btree, so leave that field but document why it's there. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-05 13:38:35 -08:00
Darrick J. Wong	87b7c205da	xfs: confirm dotdot target before replacing it during a repair xfs_dir_replace trips an assertion if you tell it to change a dirent to point to an inumber that it already points at. Look up the dotdot entry directly to confirm that we need to make a change. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:34 -08:00
Darrick J. Wong	b3c03efa59	xfs: check metadata directory file path connectivity Create a new scrubber type that checks that well known metadata directory paths are connected to the metadata inode that the incore structures think is in use. For example, check that "/quota/user" in the metadata directory tree actually points to mp->m_quotainfo->qi_uquotaip->i_ino. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:34 -08:00
Darrick J. Wong	9dc31acb01	xfs: move repair temporary files to the metadata directory tree Due to resource acquisition rules, we have to create the ondisk temporary files used to stage a filesystem repair before we can acquire a reference to the inode that we actually want to repair. Therefore, we do not know at tempfile creation time whether the tempfile will belong to the regular directory tree or the metadata directory tree. This distinction becomes important when the swapext code tries to figure out the quota accounting of the two files whose mappings are being swapped. The swapext code assumes that accounting updates are required for a file if dqattach attaches dquots. Metadir files are never accounted in quota, which means that swapext must not update the quota accounting when swapping in a repaired directory/xattr/rtbitmap structure. Prior to the swapext call, therefore, both files must be marked as METADIR for dqattach so that dqattach will ignore them. Add support for a repair tempfile to be switched to the metadir tree and switched back before being released so that ifree will just free the file. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:34 -08:00
Darrick J. Wong	dcde94bdee	xfs: check the metadata directory inumber in superblocks When metadata directories are enabled, make sure that the secondary superblocks point to the metadata directory. This isn't strictly required because the secondaries are only used to recover damaged filesystems, and the metadir root inumber is fixed. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:34 -08:00
Darrick J. Wong	3d2c341111	xfs: scrub metadata directories Teach online scrub about the metadata directory tree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:34 -08:00
Darrick J. Wong	5dab2daa8a	xfs: fix di_metatype field of inodes that won't load Make sure that the di_metatype field is at least set plausibly so that later scrubbers could set the real type. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:34 -08:00
Darrick J. Wong	aec2eb7da8	xfs: adjust parent pointer scrubber for sb-rooted metadata files Starting with the metadata directory feature, we're allowed to call the directory and parent pointer scrubbers for every metadata file, including the ones that are children of the superblock. For these children, checking the link count against the number of parent pointers is a bit funny -- there's no such thing as a parent pointer for a child of the superblock since there's no corresponding dirent. For purposes of validating nlink, we pretend that there is a parent pointer. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:33 -08:00
Darrick J. Wong	91fb4232be	xfs: metadata files can have xattrs if metadir is enabled If parent pointers are enabled, then metadata files will store parent pointers in xattrs, just like files in the user visible directory tree. Therefore, scrub and repair need to handle attr forks for metadata files on metadir filesystems. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:33 -08:00
Darrick J. Wong	13af229ee0	xfs: do not count metadata directory files when doing online quotacheck Previously, we stated that files in the metadata directory tree are not counted in the dquot information. Fix the online quotacheck code to reflect this. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:33 -08:00
Darrick J. Wong	679b098b59	xfs: refactor directory tree root predicates Metadata directory trees make reasoning about the parent of a file more difficult. Traditionally, user files are children of sb_rootino, and metadata files are "children" of the superblock. Now, we add a third possibility -- some metadata files can be children of sb_metadirino, but the classic ones (rt free space data and quotas) are left alone. Let's add some helper functions (instead of open-coding the logic everywhere) to make scrub logic easier to understand. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:33 -08:00
Darrick J. Wong	be42fc1393	xfs: record health problems with the metadata directory Make a report to the health monitoring subsystem any time we encounter something in the metadata directory tree that looks like corruption. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:33 -08:00
Darrick J. Wong	61b6bdb30a	xfs: adjust xfs_bmap_add_attrfork for metadir Online repair might use the xfs_bmap_add_attrfork to repair a file in the metadata directory tree if (say) the metadata file lacks the correct parent pointers. In that case, it is not correct to check that the file is dqattached -- metadata files must be not have /any/ dquot attached at all. Adjust the assertions appropriately. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:32 -08:00
Darrick J. Wong	cc0cf84aa7	xfs: mark quota inodes as metadata files When we're creating quota files at mount time, make sure to mark them as metadir inodes if appropriate. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:32 -08:00
Darrick J. Wong	382e275f0e	xfs: don't count metadata directory files to quota Files in the metadata directory tree are internal to the filesystem. Don't count the inodes or the blocks they use in the root dquot because users do not need to know about their resource usage. This will also quiet down complaints about dquot usage not matching du output. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:32 -08:00
Darrick J. Wong	df866c538f	xfs: allow bulkstat to return metadata directories Allow the V5 bulkstat ioctl to return information about metadata directory files so that xfs_scrub can find and scrub them, since they are otherwise ordinary directories. (Metadata files of course require per-file scrub code and hence do not need exposure.) Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:32 -08:00
Darrick J. Wong	688828d8f8	xfs: advertise metadata directory feature Advertise the existence of the metadata directory feature; this will be used by scrub to decide if it needs to scan the metadir too. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:32 -08:00
Darrick J. Wong	bb6cdd5529	xfs: hide metadata inodes from everyone because they are special Metadata inodes are private files and therefore cannot be exposed to userspace. This means no bulkstat, no open-by-handle, no linking them into the directory tree, and no feeding them to LSMs. As such, we mark them S_PRIVATE, which stops all that. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:32 -08:00
Darrick J. Wong	8651b410ae	xfs: disable the agi rotor for metadata inodes Ideally, we'd put all the metadata inodes in one place if we could, so that the metadata all stay reasonably close together instead of spreading out over the disk. Furthermore, if the log is internal we'd probably prefer to keep the metadata near the log. Therefore, disable AGI rotoring for metadata inode allocations. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:31 -08:00
Darrick J. Wong	5d9b54a4ef	xfs: read and write metadata inode directory tree Plumb in the bits we need to load metadata inodes from a named entry in a metadir directory, create (or hardlink) inodes into a metadir directory, create metadir directories, and flag inodes as being metadata files. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:31 -08:00
Darrick J. Wong	7297fd0beb	xfs: enforce metadata inode flag Add checks for the metadata inode flag so that we don't ever leak metadata inodes out to userspace, and we don't ever try to read a regular inode as metadata. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:31 -08:00
Darrick J. Wong	c555dd9b8c	xfs: load metadata directory root at mount time Load the metadata directory root inode into memory at mount time and release it at unmount time. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:31 -08:00
Darrick J. Wong	dcf6069143	xfs: iget for metadata inodes Create a xfs_trans_metafile_iget function for metadata inodes to ensure that when we try to iget a metadata file, the inode is allocated and its file mode matches the metadata file type the caller expects. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:31 -08:00
Darrick J. Wong	4f3d4dd1b0	xfs: define the on-disk format for the metadir feature Define the on-disk layout and feature flags for the metadata inode directory feature. Add a xfs_sb_version_hasmetadir for benefit of xfs_repair, which needs to know where the new end of the superblock lies. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:31 -08:00
Christoph Hellwig	e5e5cae05b	xfs: store a generic group structure in the intents Replace the pag pointers in the extent free, bmap, rmap and refcount intent structures with a pointer to the generic group to prepare for adding intents for realtime groups. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-05 13:38:30 -08:00
Darrick J. Wong	ecc8065dfa	xfs: standardize EXPERIMENTAL warning generation Refactor the open-coded warnings about EXPERIMENTAL feature use into a standard helper before we go adding more experimental features. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:30 -08:00
Darrick J. Wong	4d272929a5	xfs: rename metadata inode predicates The predicate xfs_internal_inum tells us if an inumber refers to one of the inodes rooted in the superblock. Soon we're going to have internal inodes in a metadata directory tree, so this helper should be renamed to capture its limited scope. Ondisk inodes will soon have a flag to indicate that they're metadata inodes. Head off some confusion by renaming the xfs_is_metadata_inode predicate to xfs_is_internal_inode. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:30 -08:00
Darrick J. Wong	fdf5703b61	xfs: constify the xfs_inode predicates Change the xfs_inode predicates to take a const struct xfs_inode pointer because they do not change the inode. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:30 -08:00
Darrick J. Wong	8d939f4bd7	xfs: constify the xfs_sb predicates Change the xfs_sb predicates to take a const struct xfs_sb pointer because they do not change the superblock. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-11-05 13:38:30 -08:00
Christoph Hellwig	ba102a682d	xfs: remove xfs_group_intent_hold and xfs_group_intent_rele Each of them just has a single caller, so fold them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-05 13:38:29 -08:00

... 4 5 6 7 8 ...

9520 Commits