Commit Graph

99 Commits

Author SHA1 Message Date
Kent Overstreet
b2e2bed119 bcachefs: Add missing key type checks to check_snapshot_exists()
For now we only have one key type in these btrees, but forward
compatibility means we do have to check.

Reported-by: syzbot+b4cb4a6988aced0cec4b@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-19 14:37:04 -04:00
Kent Overstreet
f2ed089273 bcachefs: better __bch2_snapshot_is_ancestor() assert
Previously, we weren't checking the result of the skiplist walk, just
the is_ancestor bitmap.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-15 22:11:56 -04:00
Kent Overstreet
a4907d7f33 bcachefs: Run snapshot deletion out of system_long_wq
We don't want this running out of the same workqueue, and blocking,
writes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:36 -04:00
Kent Overstreet
e49cf9b54b bcachefs: Make check_key_has_snapshot safer
Snapshot deletion v2 added sentinal values for deleted snapshots, so
"key for deleted snapshot" - i.e. snapshot deletion missed something -
is safe to repair automatically.

But if we find a key for a missing snapshot we have no idea what
happened, and we shouldn't delete it unless we're very sure that
everything else is consistent.

So hook it up to the new bch2_require_recovery_pass(), we'll now only
delete if snapshots and subvolumes have recenlty been checked.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:36 -04:00
Kent Overstreet
0942b852d4 bcachefs: BCH_RECOVERY_PASS_NO_RATELIMIT
Add a superblock flag to temporarily disable ratelimiting for a recovery
pass.

This will be used to make check_key_has_snapshot safer: we don't want to
delete a key for a missing snapshot unless we know that the snapshots
and subvolumes btrees are consistent, i.e. check_snapshots and
check_subvols have run recently.

Changing those btrees - creating/deleting a subvolume or snapshot - will
set the "disable ratelimit" flag, i.e. ensuring that those passes run if
check_key_has_snapshot discovers an error.

We're only disabling ratelimiting in the snapshot/subvol delete paths,
we're not so concerned about the create paths.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:36 -04:00
Kent Overstreet
09b9c72bd4 bcachefs: bch_err_throw()
Add a tracepoint for any time we return an error and unwind.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:35 -04:00
Kent Overstreet
18dad454cd bcachefs: Replace rcu_read_lock() with guards
The new guard(), scoped_guard() allow for more natural code.

Some of the uses with creative flow control have been left.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-01 00:03:12 -04:00
Kent Overstreet
5802caf74f bcachefs: darray_find(), darray_find_p()
New helpers to avoid open coded loops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
801cb2bd6c bcachefs: bch2_get_snapshot_overwrites()
New helper for getting a list of snapshot IDs that have overwritten a
given key.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
77aeaa2f0f bcachefs: bch2_inum_snapshot_to_path()
Add a better helper for printing out paths of inodes when we don't know
the subvolume, for fsck.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:09 -04:00
Kent Overstreet
7ed4c14e20 bcachefs: Reduce usage of recovery.curr_pass
We want recovery.curr_pass to be private to the recovery passes code,
for better showing recovery pass status; also, it may rewind and is
generally not the correct member to use.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:03 -04:00
Kent Overstreet
68708efcac bcachefs: struct bch_fs_recovery
bch_fs has gotten obnoxiously big, let's start organizing thins a bit
better.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:03 -04:00
Kent Overstreet
8c69e2b52e bcachefs: Knob for manual snapshot deletion
Add 'opts.snapshot_deletion_enabled', enabled by default.

This may be turned off so that the new sysfs knob,
'internal/trigger_delete_dead_snapshots', may be used instead - this
will allow snapshot deletion to be profiled more easily.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:49 -04:00
Kent Overstreet
88f62ed60c bcachefs: delete_dead_snapshot_keys_v2()
Since extents, dirents and xattrs require an inode with the
corresponding snapshot ID to exists, we can avoid a lot of scanning by
only scanning those trees for keys to process if the correspending inode
exists.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:45 -04:00
Kent Overstreet
e9756dd29f bcachefs: bcachefs_metadata_version_snapshot_deletion_v2
We're going to be speeding up snapshot deletion, by only having it
process the extents/dirents/xattrs btrees if an inode of a given
snapshot ID was present.

This raises the possibility of 'bkey_in_missing_snapshot' errors popping
up, if we ever accidentally don't do the corresponding inode update, or
if the new algorithm has bugs.

So instead of deleting snapshot IDs, add a new deleted flag, so that
'key in missing snapshot' errors can more definitively tell what
happened and automatically repair.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:45 -04:00
Kent Overstreet
08d14d90a4 bcachefs: BCH_SNAPSHOT_DELETED -> BCH_SNAPSHOT_WILL_DELETE
We're going to be speeding up snapshot deletion, by only having it
process the extents/dirents/xattrs btrees if an inode of a given
snapshot ID was present.

This raises the possibility of 'bkey_in_missing_snapshot' errors popping
up, if we ever accidentally don't do the corresponding inode update, or
if the new algorithm has bugs.

So we'll want to be able to differentiate more definitively between
'snapshot went missing' (and perhaps needs to be reconstructed), and
'key in snapshot that was deleted'.

So instead of deleting snapshot IDs, we'll be adding a new deleted flag
and leaving them permanently.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:44 -04:00
Kent Overstreet
3f8e977265 bcachefs: Skip unrelated snapshot trees in snapshot deletion
Don't scan keys in inodes for which the snapshot tree doesn't match any
we're deleting from.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:43 -04:00
Kent Overstreet
15dbd0d814 bcachefs: snapshot delete progress indicator
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:40 -04:00
Kent Overstreet
3be132f93c bcachefs: bch2_btree_lost_data() now handles snapshots tree
We have a consolidated places for "this btree lost data, run this
repair", so use it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:33 -04:00
Kent Overstreet
c9b1d94a21 bcachefs: bch_fs.writes -> enumerated_refs
Drop the single-purpose write ref code in bcachefs.h, and convert to
enumarated refs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:27 -04:00
Kent Overstreet
b974357c63 bcachefs: bch2_snapshot_table_make_room()
Add a better helper for check_snapshot_exists().

create_snapids() can't be changed to use this, unfortunately, because
the transaction that creates new snapshot will also be inserting other
keys (e.g. root inode) that reference that snapshot ID, and they expect
the snapshot table to already be updated.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:13:50 -04:00
Kent Overstreet
aa6a591f0f bcachefs: Fix null ptr deref in bch2_snapshot_tree_oldest_subvol()
Reported-by: syzbot+baee8591f336cab0958b@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-20 19:41:38 -04:00
Kent Overstreet
83d539b1b0 bcachefs: Fix check_snapshot_exists() restart handling
Codepaths that create entries in the snapshots btree currently call
bch2_mark_snapshot(), which updates the in-memory snapshot table, before
transaction commit.

This is because bch2_mark_snapshot() is an atomic trigger, run with
btree write locks held, and isn't allowed to fail - but it might need to
reallocate the table, hence we call it early when we're still allowed to
fail.

This is generally harmless - if we fail, we'll have left an entry in the
snapshots table around, but nothing will reference it and it'll get
overwritten if reused by another transaction.

But check_snapshot_exists(), which reconstructs snapshots when the
snapshots btree has been corrupted or lost, was erronously rechecking if
the snapshot exists inside the transaction commit loop - so on
transaction restart (in this case mem_realloced), the second iteration
would return without repairing.

This code needs some cleanup: splitting out a "maybe realloc snapshots
table" helper would have avoided this, that will be in the next patch.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03 12:11:43 -04:00
Kent Overstreet
9180ad2e16 bcachefs: Kill btree_iter.trans
This was planned to be done ages ago, now finally completed; there are
places where we have quite a few btree_trans objects on the stack, so
this reduces stack usage somewhat.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02 10:24:34 -04:00
Kent Overstreet
1ece53237e bcachefs: Consistent indentation of multiline fsck errors
Add the new helper printbuf_indent_add_nextline(), and use it in
__bch2_fsck_err() to centralize setting the indentation of multiline
fsck errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-28 22:31:47 -04:00
Kent Overstreet
45f0e6c838 bcachefs: bch2_indirect_extent_missing_error() prints path, not just inode number
We want all error messages converted to print paths, not just inode
numbers - users want this information, and it speeds up debugging too.

Auditing and converting all error messages is going to be a big project,
so for the moment we're just doing this incrementally.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Alan Huang
7d8321a286 bcachefs: Fix subtraction underflow
When ancestor is less than IS_ANCESTOR_BITMAP, we would get an incorrect
result.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
5906dcb993 bcachefs: Silence read-only errors when deleting snapshots
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:42 -05:00
Kent Overstreet
4bd06f07bc bcachefs: Fixes for snapshot_tree.master_subvol
Ensure that snapshot_tree.master_subvol is cleared when we delete the
master subvolume in a tree of snapshots, and allow for snapshot trees
that don't have a master subvolume in fsck.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
17d678bcdd bcachefs: Log message in journal for snapshot deletion
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-29 13:30:39 -05:00
Kent Overstreet
d0855e2106 bcachefs: Kill snapshot_t->equiv
Now entirely dead code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-29 13:30:39 -05:00
Kent Overstreet
35c5609abf bcachefs: Snapshot deletion no longer uses snapshot_t->equiv
Switch to generating a private list of interior nodes to delete, instead
of using the equivalence class in the global data structure.

This eliminates possible races with snapshot creation, and is much
cleaner - it'll let us delete a lot of janky code for calculating and
maintaining the equivalence classes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:23 -05:00
Kent Overstreet
85c060f62d bcachefs: Kill equiv_seen arg to delete_dead_snapshots_process_key()
When deleting dead snapshots, we move keys from redundant interior
snapshot nodes to child nodes - unless there's already a key, in which
case the ancestor key is deleted.

Previously, we tracked via equiv_seen whether the child snapshot had a
key, but this was tricky w.r.t. transaction restarts, and not
transactionally safe w.r.t. updates in the child snapshot.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:23 -05:00
Kent Overstreet
9f95fc3c12 bcachefs: bch2_snapshot_exists()
bch2_snapshot_equiv() is going away; convert users that just wanted to
know if the snapshot exists to something better

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:23 -05:00
Kent Overstreet
be203120dc bcachefs: bch2_check_key_has_snapshot() prints btree id
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:23 -05:00
Kent Overstreet
ce70157112 bcachefs: kill flags param to bch2_subvolume_get()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:22 -05:00
Kent Overstreet
a6f4794fcd bcachefs: struct bkey_validate_context
Add a new parameter to bkey validate functions, and use it to improve
invalid bkey error messages: we can now print the btree and depth it
came from, or if it came from the journal, or is a btree root.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:20 -05:00
Kent Overstreet
a34b026482 bcachefs: Kill BCH_TRANS_COMMIT_lazy_rw
We unconditionally go read-write, if we're going to do so, before
journal replay: lazy_rw is obsolete.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:16 -05:00
Kent Overstreet
9e2f5f7988 bcachefs: better error message in check_snapshot_tree()
If we find a snapshot node and it didn't match the snapshot tree, we
should print it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:15 -05:00
Kent Overstreet
b836f22014 bcachefs: __bch2_key_has_snapshot_overwrites uses for_each_btree_key_reverse_norestart()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:14 -05:00
Kent Overstreet
c986dd7ecb bcachefs: Improve check_snapshot_exists()
Check if we have snapshot_trees or subvolumes that refer to the snapshot
node being reconstructed, and use them.

With this, the kill_btree_root test that blows away the snapshots btree
now passes, and we're able to successfully reconstruct.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-12 05:02:48 -04:00
Kent Overstreet
84878e8245 bcachefs: Kill bch2_propagate_key_to_snapshot_leaves()
Dead code now.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-09 16:42:53 -04:00
Kent Overstreet
7eb4a319db bcachefs: Fix infinite loop in propagate_key_to_snapshot_leaves()
As we iterate we need to mark that we no longer need iterators -
otherwise we'll infinite loop via the "too many iters" check when
there's many snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-23 18:46:58 -04:00
Ahmed Ehab
39c3aad43f bcachefs: Hold read lock in bch2_snapshot_tree_oldest_subvol()
Syzbot reports a problem that a warning is triggered due to suspicious
use of rcu_dereference_check(). That is triggered by a call of
bch2_snapshot_tree_oldest_subvol().

The cause of the warning is that inside
bch2_snapshot_tree_oldest_subvol(), snapshot_t() is called which calls
rcu_dereference() that requires a read lock to be held. Also, the call
of bch2_snapshot_tree_next() eventually calls snapshot_t().

To fix this, call rcu_read_lock() before calling snapshot_t(). Then,
release the lock after the termination of the while loop.

Reported-by: <syzbot+f7c41a878676b72c16a6@syzkaller.appspotmail.com>
Signed-off-by: Ahmed Ehab <bottaawesome633@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21 14:54:18 -04:00
Kent Overstreet
d97de0d017 bcachefs: Make bkey_fsck_err() a wrapper around fsck_err()
bkey_fsck_err() was added as an interface that looks like fsck_err(),
but previously all it did was ensure that the appropriate error counter
was incremented in the superblock.

This is a cleanup and bugfix patch that converts it to a wrapper around
fsck_err(). This is needed to fix an issue with the upgrade path to
disk_accounting_v3, where the "silent fix" error list now includes
bkey_fsck errors; fsck_err() handles this in a unified way, and since we
need to change printing of bkey fsck errors from the caller to the inner
bkey_fsck_err() calls, this ends up being a pretty big change.

Als,, rename .invalid() methods to .validate(), for clarity, while we're
changing the function signature anyways (to drop the printbuf argument).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13 23:00:50 -04:00
Kent Overstreet
a850bde649 bcachefs: fsck_err() may now take a btree_trans
fsck_err() now optionally takes a btree_trans; if the current thread has
one, it is required that it be passed.

The next patch will use this to unlock when waiting for user input.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14 19:00:14 -04:00
Pei Li
64cd7de998 bcachefs: Fix kmalloc bug in __snapshot_t_mut
When allocating too huge a snapshot table, we should fail gracefully
in __snapshot_t_mut() instead of fail in kmalloc().

Reported-by: syzbot+770e99b65e26fa023ab1@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=770e99b65e26fa023ab1
Tested-by: syzbot+770e99b65e26fa023ab1@syzkaller.appspotmail.com
Signed-off-by: Pei Li <peili.dev@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-25 20:51:14 -04:00
Kent Overstreet
0a2a507d40 bcachefs: set_worker_desc() for delete_dead_snapshots
this is long running - help users see what's going on

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-19 18:27:24 -04:00
Kent Overstreet
1ba44217f8 bcachefs: delete_dead_snapshots() doesn't need to go RW
We've been moving away from going RW lazily; if we want to go RW we do
that in set_may_go_rw(), and if we didn't go RW we don't need to delete
dead snapshots.

Reported-by: syzbot+4366624c0b5aac4906cf@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-19 18:27:24 -04:00
Kent Overstreet
08f50005e0 bcachefs: Run check_key_has_snapshot in snapshot_delete_keys()
delete_dead_snapshots now runs before the main fsck.c passes which check
for keys for invalid snapshots; thus, it needs those checks as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28 11:29:26 -04:00