mirror_ubuntu-kernels/fs/xfs/libxfs
Darrick J. Wong 13ae04d8d4 xfs: force all buffers to be written during btree bulk load
While stress-testing online repair of btrees, I noticed periodic
assertion failures from the buffer cache about buffers with incorrect
DELWRI_Q state.  Looking further, I observed this race between the AIL
trying to write out a btree block and repair zapping a btree block after
the fact:

AIL:    Repair0:

pin buffer X
delwri_queue:
set DELWRI_Q
add to delwri list

        stale buf X:
        clear DELWRI_Q
        does not clear b_list
        free space X
        commit

delwri_submit   # oops

Worse yet, I discovered that running the same repair over and over in a
tight loop can result in a second race that causes data integrity
problems with the repair:

AIL:    Repair0:        Repair1:

pin buffer X
delwri_queue:
set DELWRI_Q
add to delwri list

        stale buf X:
        clear DELWRI_Q
        does not clear b_list
        free space X
        commit

                        find free space X
                        get buffer
                        rewrite buffer
                        delwri_queue:
                        set DELWRI_Q
                        already on a list, do not add
                        commit

                        BAD: committed tree root before all blocks written

delwri_submit   # too late now

I traced this to my own misunderstanding of how the delwri lists work,
particularly with regard to the AIL's buffer list.  If a buffer is
logged and committed, the buffer can end up on that AIL buffer list.  If
btree repairs are run twice in rapid succession, it's possible that the
first repair will invalidate the buffer and free it before the next time
the AIL wakes up.  Marking the buffer stale clears DELWRI_Q from the
buffer state without removing the buffer from its delwri list.  The
buffer doesn't know which list it's on, so it cannot know which lock to
take to protect the list for a removal.

If the second repair allocates the same block, it will then recycle the
buffer to start writing the new btree block.  Meanwhile, if the AIL
wakes up and walks the buffer list, it will ignore the buffer because it
can't lock it, and go back to sleep.

When the second repair calls delwri_queue to put the buffer on the
list of buffers to write before committing the new btree, it will set
DELWRI_Q again, but since the buffer hasn't been removed from the AIL's
buffer list, delwri_queue won't add it to the bulkload's buffer list.
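
The sequence described above can be replayed in a small userspace toy
model.  This is only a sketch: the toy_* names, the two-field buffer, and
the counter-based list are illustrative assumptions, not the kernel's
structures or functions.  It models just the two behaviours the text
describes: marking a buffer stale clears the flag but leaves list
membership alone, and queueing only links a buffer that is on no list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration; none of these names are the kernel's. */
struct toy_list {
	int nr;				/* number of buffers linked here */
};

struct toy_buf {
	bool delwri_q;			/* models the DELWRI_Q state flag */
	struct toy_list *on_list;	/* models membership via b_list */
};

/*
 * Queue a buffer for delayed write.  Returns true only if the buffer was
 * actually linked onto @list.
 */
static bool toy_delwri_queue(struct toy_buf *bp, struct toy_list *list)
{
	if (bp->delwri_q)
		return false;		/* already queued somewhere */

	bp->delwri_q = true;
	if (bp->on_list == NULL) {	/* only link if on no list at all */
		bp->on_list = list;
		list->nr++;
		return true;
	}
	return false;			/* flag set, but still on the old list */
}

/* Mark stale: clears the flag, does NOT touch list membership. */
static void toy_stale(struct toy_buf *bp)
{
	bp->delwri_q = false;
}

/* Replay the second race; returns how many buffers the bulkload list got. */
int toy_replay_race(void)
{
	struct toy_list ail = { 0 }, bulkload = { 0 };
	struct toy_buf x = { false, NULL };

	toy_delwri_queue(&x, &ail);	/* AIL queues buffer X */
	assert(x.on_list == &ail);

	toy_stale(&x);			/* Repair0 stales X, frees the space */

	toy_delwri_queue(&x, &bulkload);/* Repair1 reuses X and queues it */
	assert(x.delwri_q);		/* flag is set again ... */

	return bulkload.nr;		/* ... but X never reached this list */
}
```

In this model toy_replay_race() returns 0: the flag is set, yet the
bulkload list is empty, so submitting that list would write nothing.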

This is incorrect, because the bulkload caller relies on delwri_submit
to ensure that all the buffers have been sent to disk /before/
committing the new btree root pointer.  This ordering is required for
data consistency.

Worse, the AIL won't clear DELWRI_Q from the buffer when it does finally
drop it, so the next thread to walk through the btree will trip over a
debug assertion on that flag.

To fix this, create a new function that waits for the buffer to be
removed from any other delwri lists before adding the buffer to the
caller's delwri list.  By waiting for the buffer to clear both the
delwri list and any potential delwri wait list, we can be sure that
repair will initiate writes of all buffers and report all write errors
back to userspace instead of committing the new structure.
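
The idea behind the fix can be sketched in the same kind of userspace toy
model.  Everything here is an assumption made for illustration: the toy_*
names are hypothetical, and the "wait" is simulated by calling a stand-in
for the other list owner, whereas real code would sleep until that owner
drops the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration; none of these names are the kernel's. */
struct toy_list {
	int nr;				/* number of buffers linked here */
};

struct toy_buf {
	bool delwri_q;			/* models the DELWRI_Q state flag */
	struct toy_list *on_list;	/* models membership via b_list */
};

/* Stand-in for the other list's owner eventually dropping the buffer. */
static void toy_owner_drops(struct toy_buf *bp)
{
	if (bp->on_list != NULL) {
		bp->on_list->nr--;
		bp->on_list = NULL;
	}
}

/*
 * Queue @bp on the caller's list, first waiting for it to leave any other
 * list.  The loop body simulates the other owner making progress; a real
 * implementation would block here instead.
 */
static void toy_delwri_queue_wait(struct toy_buf *bp, struct toy_list *list)
{
	while (bp->on_list != NULL)
		toy_owner_drops(bp);

	assert(!bp->delwri_q);		/* nobody else has it queued now */
	bp->delwri_q = true;
	bp->on_list = list;
	list->nr++;
}

/* Replay the race with the waiting variant; returns 1 on the good outcome. */
int toy_replay_fixed(void)
{
	struct toy_list ail = { 0 }, bulkload = { 0 };
	struct toy_buf x = { false, NULL };

	/* AIL queues buffer X. */
	x.delwri_q = true;
	x.on_list = &ail;
	ail.nr++;

	/* Repair0 stales X: flag cleared, list membership left behind. */
	x.delwri_q = false;

	/* Repair1 reuses X and queues it with the waiting variant. */
	toy_delwri_queue_wait(&x, &bulkload);

	return x.on_list == &bulkload && bulkload.nr == 1 && ail.nr == 0;
}
```

In this model the buffer always ends up on the caller's own list, so
submitting that list covers every block that must reach disk before the
new root is committed.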

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15 10:03:27 -08:00
xfs_ag_resv.c xfs: inobt can use perags in many more places than it does 2023-02-13 09:14:52 +11:00
xfs_ag_resv.h xfs: move perag structure and setup to libxfs/xfs_ag.[ch] 2021-06-02 10:48:24 +10:00
xfs_ag.c xfs: remove __xfs_free_extent_later 2023-12-06 18:45:18 -08:00
xfs_ag.h xfs: allow queued AG intents to drain before scrubbing 2023-04-11 18:59:58 -07:00
xfs_alloc_btree.c xfs: implement masked btree key comparisons for _has_records scans 2023-04-11 19:00:11 -07:00
xfs_alloc_btree.h xfs: use separate btree cursor cache for each btree type 2021-10-19 11:45:16 -07:00
xfs_alloc.c xfs: pass the defer ops directly to xfs_defer_add 2023-12-14 11:13:52 +05:30
xfs_alloc.h xfs: automatic freeing of freshly allocated unwritten space 2023-12-06 18:45:18 -08:00
xfs_attr_leaf.c xfs: extract xfs_da_buf_copy() helper function 2023-12-07 14:57:14 +05:30
xfs_attr_leaf.h xfs: don't hold xattr leaf buffers across transaction rolls 2022-06-29 08:47:56 -07:00
xfs_attr_remote.c xfs: rework xfs_buf_incore() API 2022-07-07 22:05:18 +10:00
xfs_attr_remote.h xfs: rename struct xfs_attr_item to xfs_attr_intent 2022-05-22 16:00:26 +10:00
xfs_attr_sf.h
xfs_attr.c xfs: pass the defer ops directly to xfs_defer_add 2023-12-14 11:13:52 +05:30
xfs_attr.h xfs: replace XFS_IFORK_Q with a proper predicate function 2022-07-12 11:17:27 -07:00
xfs_bit.c
xfs_bit.h
xfs_bmap_btree.c xfs: remove __xfs_free_extent_later 2023-12-06 18:45:18 -08:00
xfs_bmap_btree.h xfs: use separate btree cursor cache for each btree type 2021-10-19 11:45:16 -07:00
xfs_bmap.c xfs: pass the defer ops directly to xfs_defer_add 2023-12-14 11:13:52 +05:30
xfs_bmap.h xfs: accumulate iextent records when checking bmap 2023-04-11 19:00:24 -07:00
xfs_btree_staging.c xfs: force all buffers to be written during btree bulk load 2023-12-15 10:03:27 -08:00
xfs_btree_staging.h xfs: remove unused fields from struct xbtree_ifakeroot 2023-12-06 18:45:18 -08:00
xfs_btree.c xfs: implement masked btree key comparisons for _has_records scans 2023-04-11 19:00:11 -07:00
xfs_btree.h overflow: Add struct_size_t() helper 2023-05-26 13:52:19 -07:00
xfs_cksum.h
xfs_da_btree.c xfs: extract xfs_da_buf_copy() helper function 2023-12-07 14:57:14 +05:30
xfs_da_btree.h xfs: extract xfs_da_buf_copy() helper function 2023-12-07 14:57:14 +05:30
xfs_da_format.h xfs: convert flex-array declarations in xfs attr shortform objects 2023-07-17 08:48:56 -07:00
xfs_defer.c xfs: pass the defer ops directly to xfs_defer_add 2023-12-14 11:13:52 +05:30
xfs_defer.h xfs: pass the defer ops directly to xfs_defer_add 2023-12-14 11:13:52 +05:30
xfs_dir2_block.c xfs: replace inode fork size macros with functions 2022-07-12 11:17:27 -07:00
xfs_dir2_data.c xfs: convert bp->b_bn references to xfs_buf_daddr() 2021-08-19 10:07:15 -07:00
xfs_dir2_leaf.c xfs: fix exception caused by unexpected illegal bestcount in leaf dir 2022-10-20 09:42:56 -07:00
xfs_dir2_node.c xfs: convert bp->b_bn references to xfs_buf_daddr() 2021-08-19 10:07:15 -07:00
xfs_dir2_priv.h xfs: constify the name argument to various directory functions 2022-03-14 10:23:17 -07:00
xfs_dir2_sf.c xfs: Remove the unneeded result variable 2022-09-19 06:52:14 +10:00
xfs_dir2.c xfs: stabilize the dirent name transformation function used for ascii-ci dir hash computation 2023-04-11 19:05:04 -07:00
xfs_dir2.h xfs: stabilize the dirent name transformation function used for ascii-ci dir hash computation 2023-04-11 19:05:04 -07:00
xfs_dquot_buf.c xfs: remove the xfs_dqblk_t typedef 2021-10-14 09:19:33 -07:00
xfs_errortag.h xfs: add debug knob to slow down write for fun 2022-11-28 17:54:49 -08:00
xfs_format.h xfs: use accessor functions for summary info words 2023-10-18 16:53:00 -07:00
xfs_fs.h xfs: allow userspace to rebuild metadata structures 2023-08-10 07:48:11 -07:00
xfs_health.h
xfs_ialloc_btree.c xfs: remove __xfs_free_extent_later 2023-12-06 18:45:18 -08:00
xfs_ialloc_btree.h xfs: standardize ondisk to incore conversion for inode btrees 2023-04-11 19:00:01 -07:00
xfs_ialloc.c xfs: remove __xfs_free_extent_later 2023-12-06 18:45:18 -08:00
xfs_ialloc.h xfs: convert xfs_ialloc_has_inodes_at_extent to return keyfill scan results 2023-04-11 19:00:15 -07:00
xfs_iext_tree.c xfs: prevent metadata files from being inactivated 2021-03-25 16:47:50 -07:00
xfs_inode_buf.c xfs: inode recovery does not validate the recovered inode 2023-11-13 09:11:41 +05:30
xfs_inode_buf.h xfs: kill xfs_sb_version_has_v3inode() 2021-08-19 10:07:14 -07:00
xfs_inode_fork.c xfs: standardize btree record checking code [v24.5] 2023-04-14 07:09:18 +10:00
xfs_inode_fork.h xfs: _{attr,data}_map_shared should take ILOCK_EXCL until iread_extents is completely done 2023-04-12 15:49:10 +10:00
xfs_log_format.h xfs: fix AGF vs inode cluster buffer deadlock 2023-06-05 04:08:27 +10:00
xfs_log_recover.h xfs: pass the defer ops instead of type to xfs_defer_start_recovery 2023-12-14 11:13:38 +05:30
xfs_log_rlimit.c xfs: reduce transaction reservations with reflink 2022-04-28 10:25:42 -07:00
xfs_ondisk.h xfs: move xfs_ondisk.h to libxfs/ 2023-12-07 15:15:29 +05:30
xfs_quota_defs.h xfs: remove warning counters from struct xfs_dquot_res 2022-05-11 17:12:09 +10:00
xfs_refcount_btree.c xfs: remove __xfs_free_extent_later 2023-12-06 18:45:18 -08:00
xfs_refcount_btree.h xfs: use separate btree cursor cache for each btree type 2021-10-19 11:45:16 -07:00
xfs_refcount.c xfs: pass the defer ops directly to xfs_defer_add 2023-12-14 11:13:52 +05:30
xfs_refcount.h xfs: replace xfs_btree_has_record with a general keyspace scanner 2023-04-11 19:00:10 -07:00
xfs_rmap_btree.c xfs: implement masked btree key comparisons for _has_records scans 2023-04-11 19:00:11 -07:00
xfs_rmap_btree.h xfs: use separate btree cursor cache for each btree type 2021-10-19 11:45:16 -07:00
xfs_rmap.c xfs: pass the defer ops directly to xfs_defer_add 2023-12-14 11:13:52 +05:30
xfs_rmap.h xfs: teach scrub to check for sole ownership of metadata objects 2023-04-11 19:00:15 -07:00
xfs_rtbitmap.c xfs: fix 32-bit truncation in xfs_compute_rextslog 2023-12-06 18:45:17 -08:00
xfs_rtbitmap.h xfs: don't allow overly small or large realtime volumes 2023-12-06 18:45:17 -08:00
xfs_sb.c xfs: don't allow overly small or large realtime volumes 2023-12-06 18:45:17 -08:00
xfs_sb.h xfs: bump max fsgeom struct version 2023-10-17 08:40:54 -07:00
xfs_shared.h xfs: tag transactions that contain intent done items 2022-05-04 11:46:21 +10:00
xfs_symlink_remote.c xfs: convert XFS_IFORK_PTR to a static inline helper 2022-07-09 15:17:21 -07:00
xfs_trans_inode.c xfs: convert to new timestamp accessors 2023-10-18 14:08:29 +02:00
xfs_trans_resv.c xfs: create helpers for rtbitmap block/wordcount computations 2023-10-18 10:58:58 -07:00
xfs_trans_resv.h xfs: rename xfs_*alloc*_log_count to _block_count 2022-04-28 10:25:59 -07:00
xfs_trans_space.h xfs: compute the maximum height of the rmap btree when reflink enabled 2021-10-19 11:45:16 -07:00
xfs_types.c xfs: rename xfs_verify_rtext to xfs_verify_rtbext 2023-10-17 16:24:22 -07:00
xfs_types.h xfs: convert rt summary macros to helpers 2023-10-17 17:45:38 -07:00