linux-loongson

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson synced 2025-08-27 15:36:48 +00:00

Author	SHA1	Message	Date
Kent Overstreet	e1f0e1a45a	bcachefs: Fix restart handling in btree_node_scrub_work() btree node scrub was sometimes failing to rewrite nodes with errors; bch2_btree_node_rewrite() can return a transaction restart and we weren't checking - the lockrestart_do() needs to wrap the entire operation. And there's a better helper it should've been using, bch2_btree_node_rewrite_key(), which makes all this more convenient. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-17 11:42:06 -04:00
Kent Overstreet	6c4897caef	bcachefs: Fix bch2_read_bio_to_text() We can only pass negative error codes to bch2_err_str(); if it's a positive integer it's not an error and we trip an assert. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-16 20:35:42 -04:00
Kent Overstreet	495ba899d5	bcachefs: fsck: Fix check_path_loop() + snapshots A path exists in a particular snapshot: we should do the pathwalk in the snapshot ID of the inode we started from, _not_ change snapshot ID as we walk inodes and dirents. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-16 19:05:02 -04:00
Kent Overstreet	583ba52a40	bcachefs: fsck: check_subdir_count logs path We can easily go from inode number -> path now, which makes for more useful log messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-16 19:05:02 -04:00
Kent Overstreet	8d6ac82361	bcachefs: fsck: additional diagnostics for reattach_inode() Log the inode's new path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-16 19:05:02 -04:00
Kent Overstreet	3e5ceaa5bf	bcachefs: fsck: check_directory_structure runs in reverse order When we find a directory connectivity problem, we should do the repair in the oldest snapshot that has the issue - so that we don't end up duplicating work or making a real mess of things. Oldest snapshot IDs have the highest integer value, so - just walk inodes in reverse order. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-16 19:05:02 -04:00
Kent Overstreet	9fb09ace59	bcachefs: fsck: Fix reattach_inode() for subvol roots bch_subvolume.fs_path_parent needs to be updated as well, it should match inode.bi_parent_subvol. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-16 19:04:59 -04:00
Kent Overstreet	c1ca07a4dd	bcachefs: fsck: Fix remove_backpointer() for subvol roots The dirent will be in a different snapshot if the inode is a subvolume root. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-16 19:04:54 -04:00
Kent Overstreet	7029cc4d13	bcachefs: fsck: Print path when we find a subvol loop Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-16 19:04:48 -04:00
Kent Overstreet	9ba6930ef8	bcachefs: Fix __bch2_inum_to_path() when crossing subvol boundaries The bch2_subvolume_get_snapshot() call needs to happen before the dirent lookup - the dirent is in the parent subvolume. Also, check for loops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-16 19:04:48 -04:00
Kent Overstreet	1cddad0fcb	bcachefs: Call bch2_fs_init_rw() early if we'll be going rw kthread creation checks for pending signals, which is _very_ annoying if we have to do a long recovery and don't go rw until we've done significant work. Check if we'll be going rw and pre-allocate kthreads/workqueues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-16 19:04:48 -04:00
Kent Overstreet	f2a701fd94	bcachefs: fsck: Improve check_key_has_inode() Print out more info when we find a key (extent, dirent, xattr) for a missing inode - was there a good inode in an older snapshot, full(ish) list of keys for that missing inode, so we can make better decisions on how to repair. If it looks like it should've been deleted, autofix it. If we ever hit the non-autofix cases, we'll want to write more repair code (possibly reconstituting the inode). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-16 19:04:44 -04:00
Bharadwaj Raju	03208bd06a	bcachefs: don't return fsck_fix for unfixable node errors in __btree_err After `cd3cdb1ef7` ("Single err message for btree node reads"), all errors caused __btree_err to return -BCH_ERR_fsck_fix no matter what the actual error type was if the recovery pass was scanning for btree nodes. This lead to the code continuing despite things like bad node formats when they earlier would have caused a jump to fsck_err, because btree_err only jumps when the return from __btree_err does not match fsck_fix. Ultimately this lead to undefined behavior by attempting to unpack a key based on an invalid format. Make only errors of type -BCH_ERR_btree_node_read_err_fixable cause __btree_err to return -BCH_ERR_fsck_fix when scanning for btree nodes. Reported-by: syzbot+cfd994b9cdf00446fd54@syzkaller.appspotmail.com Fixes: `cd3cdb1ef7` ("bcachefs: Single err message for btree node reads") Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-16 19:03:52 -04:00
Alan Huang	56be92c63f	bcachefs: Fix pool->alloc NULL pointer dereference btree_interior_update_pool has not been initialized before the filesystem becomes read-write, thus mempool_alloc in bch2_btree_update_start will trigger pool->alloc NULL pointer dereference in mempool_alloc_noprof Reported-by: syzbot+2f3859bd28f20fa682e6@syzkaller.appspotmail.com Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-16 19:03:52 -04:00
Alan Huang	d89a34b14d	bcachefs: Move bset size check before csum check In syzbot's crash, the bset's u64s is larger than the btree node. Reported-by: syzbot+bfaeaa8e26281970158d@syzkaller.appspotmail.com Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-16 19:03:52 -04:00
Kent Overstreet	7c9cef5f8b	bcachefs: mark more errors autofix Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-16 19:03:52 -04:00
Kent Overstreet	10dfe4926d	bcachefs: Kill unused tracepoints Dead code cleanup. Link: https://lore.kernel.org/linux-bcachefs/20250612224059.39fddd07@batman.local.home/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-16 19:03:52 -04:00
Kent Overstreet	17c3395e25	bcachefs: opts.journal_rewind Add a mount option for rewinding the journal, bringing the entire filesystem to where it was at a previous point in time. This is for extreme disaster recovery scenarios - it's not intended as an undelete operation. The option takes a journal sequence number; the desired sequence number can be determined with 'bcachefs list_journal' Caveats: - The 'journal_transaction_names' option must have been enabled (it's on by default). The option controls emitting of extra debug info in the journal, so we can see what individual transactions were doing; It also enables journalling of keys being overwritten, which is what we rely on here. - A full fsck run will be automatically triggered since alloc info will be inconsistent. Only leaf node updates to non-alloc btrees are rewound, since rewinding interior btree updates isn't possible or desirable. - We can't do anything about data that was deleted and overwritten. Lots of metadata updates after the point in time we're rewinding to shouldn't cause a problem, since we segragate data and metadata allocations (this is in order to make repair by btree node scan practical on larger filesystems; there's a small 64-bit per device bitmap in the superblock of device ranges with btree nodes, and we try to keep this small). However, having discards enabled will cause problems, since buckets are discarded as soon as they become empty (this is why we don't implement fstrim: we don't need it). Hopefully, this feature will be a one-off thing that's never used again: this was implemented for recovering from the "vfs i_nlink 0 -> subvol deletion" bug, and that bug was unusually disastrous and additional safeguards have since been implemented. But if it does turn out that we need this more in the future, I'll have to implement an option so that empty buckets aren't discarded immediately - lagging by perhaps 1% of device capacity. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-16 19:03:52 -04:00
Kent Overstreet	191334400d	bcachefs: fsck: fix extent past end of inode repair Fix the case where we're deleting in a different snapshot and need to emit a whiteout - that requires a regular BTREE_ITER_filter_snapshots iterator. Also, only delete the part of the extent that extents past i_size. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-15 22:11:56 -04:00
Kent Overstreet	b17d7bdb12	bcachefs: fsck: fix add_inode() the inode btree uses the offset field for the inum, not the inode field. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-15 22:11:56 -04:00
Kent Overstreet	c27e5782d9	bcachefs: Fix snapshot_key_missing_inode_snapshot repair When the inode was a whiteout, we were inserting a new whiteout at the wrong (old) snapshot. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-15 22:11:56 -04:00
Kent Overstreet	c1ccd43b35	bcachefs: Fix "now allowing incompatible features" message Check against version_incompat_allowed, not version_incompat. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-15 22:11:56 -04:00
Kent Overstreet	2ba562cc04	bcachefs: pass last_seq into fs_journal_start() Prep work for journal rewind, where the seq we're replaying from may be different than the last journal entry's last_seq. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-15 22:11:56 -04:00
Kent Overstreet	f2ed089273	bcachefs: better __bch2_snapshot_is_ancestor() assert Previously, we weren't checking the result of the skiplist walk, just the is_ancestor bitmap. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-15 22:11:56 -04:00
Kent Overstreet	425da82c63	bcachefs: btree_iter: fix updates, journal overlay We need to start searching from search_key - _not_ path->pos, which will point to the key we found in the btree Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-15 22:11:56 -04:00
Kent Overstreet	0e62fca2a6	bcachefs: Fix bch2_journal_keys_peek_prev_min() this code is rarely invoked, so - we had a few bugs left from basing it off of bch2_journal_keys_peek_max()... Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-15 22:11:55 -04:00
Alan Huang	0dc8eaebed	bcachefs: Delay calculation of trans->journal_u64s When there is commit error that need split btree leaf, fsck might change the value of trans->journal_entries.u64s, when retry commit, the value of trans->journal_u64s would be incorrect, which will lead to trans->journal_res.u64s underflow, and then out of bounds write will occur: [ 464.496970][T11969] Call trace: [ 464.496973][T11969] show_stack+0x3c/0x88 (C) [ 464.496995][T11969] dump_stack_lvl+0xf8/0x178 [ 464.497014][T11969] dump_stack+0x20/0x30 [ 464.497031][T11969] __bch2_trans_log_str+0x344/0x350 [ 464.497048][T11969] bch2_trans_log_str+0x3c/0x60 [ 464.497065][T11969] __bch2_fsck_err+0x11bc/0x1390 [ 464.497083][T11969] bch2_check_discard_freespace_key+0xad4/0x10d0 [ 464.497100][T11969] bch2_bucket_alloc_freelist+0x99c/0x1130 [ 464.497117][T11969] bch2_bucket_alloc_trans+0x79c/0xcb8 [ 464.497133][T11969] bch2_bucket_alloc_set_trans+0x378/0xc20 [ 464.497151][T11969] __open_bucket_add_buckets+0x7fc/0x1c00 [ 464.497168][T11969] open_bucket_add_buckets+0x184/0x3a8 [ 464.497185][T11969] bch2_alloc_sectors_start_trans+0xa04/0x1da0 [ 464.497203][T11969] bch2_btree_reserve_get+0x6e0/0xef0 [ 464.497220][T11969] bch2_btree_update_start+0x1618/0x2600 [ 464.497239][T11969] bch2_btree_split_leaf+0xcc/0x730 [ 464.497258][T11969] bch2_trans_commit_error+0x22c/0xc30 [ 464.497276][T11969] __bch2_trans_commit+0x207c/0x4e30 [ 464.497292][T11969] bch2_journal_replay+0x9e0/0x1420 [ 464.497305][T11969] __bch2_run_recovery_passes+0x458/0xf98 [ 464.497318][T11969] bch2_run_recovery_passes+0x280/0x478 [ 464.497331][T11969] bch2_fs_recovery+0x24f0/0x3a28 [ 464.497344][T11969] bch2_fs_start+0xb80/0x1248 [ 464.497358][T11969] bch2_fs_get_tree+0xe94/0x1708 [ 464.497377][T11969] vfs_get_tree+0x84/0x2d0 Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-15 22:11:55 -04:00
Alan Huang	e31144f8cb	bcachefs: Add missing EBUG_ON Just like the EBUG_ON in bch2_journal_add_entry(). Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-15 22:11:55 -04:00
Alan Huang	9b54efe66c	bcachefs: Fix alloc_req use after free Now the alloc_req is allocated from the bump allocator, if there is reallocation, the memory of alloc_req would be frees, fix by delaying the reallocation to transaction restart, it has to restart anyway. Reported-by: syzbot+2887a13a5c387e616a68@syzkaller.appspotmail.com Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-15 22:11:55 -04:00
Alan Huang	9b9a327009	bcachefs: Don't allocate new memory when mempool is exhausted Allocating new memory when mempool is exhausted is too complicated, just return ENOMEM is fine. memcpy is not needed, since there might be pointers point to the old memory, that's the bug. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-15 22:11:55 -04:00
Kent Overstreet	3bd6f8aeae	bcachefs: btree iter tracepoints We've been seeing some livelock-ish behavior in the index update part of the main write path, and while we've got low level btree path tracepoints, we've been lacking high level btree iterator tracepoints. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-15 22:11:55 -04:00
Kent Overstreet	0f4dd2ce35	bcachefs: trace_extent_trim_atomic Add a tracepoint for when we insert only part of an extent, due to too many overwrites. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-15 22:11:55 -04:00
Kent Overstreet	aef22f6fe7	bcachefs: Don't trace should_be_locked unless changing Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:25:41 -04:00
Kent Overstreet	cd1124244b	bcachefs: Ensure that snapshot creation propagates has_case_insensitive We normally can't create a new directory with the case-insensitive option already set - except when we're creating a snapshot. And if casefolding is enabled filesystem wide, we should still set it even though not strictly required, for consistency. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:24:21 -04:00
Kent Overstreet	b68baf9a87	bcachefs: Print devices we're mounting on multi device filesystems Previously, we only ever logged the filesystem UUID. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:24:21 -04:00
Kent Overstreet	205da7c026	bcachefs: Don't trust sb->nr_devices in members_to_text() We have to be able to print superblock sections even if they fail to validate (for debugging), so we have to calculate the number of entries from the field size. Reported-by: syzbot+5138f00559ffb3cb3610@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:24:20 -04:00
Kent Overstreet	625c494db9	bcachefs: Fix version checks in validate_bset() It seems btree node scan picked up a partially overwritten btree node, and corrected the "bset version older than sb version_min" error - resulting in an invalid superblock with a bad version_min field. Don't run this check at all when we're in btree node scan, and when we do run it, do something saner if the bset version is totally crazy. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:21:30 -04:00
Arnd Bergmann	e82b3a63a9	bcachefs: ioctl: avoid stack overflow warning Multiple ioctl handlers individually use a lot of stack space, and clang chooses to inline them into the bch2_fs_ioctl() function, blowing through the warning limit: fs/bcachefs/chardev.c:655:6: error: stack frame size (1032) exceeds limit (1024) in 'bch2_fs_ioctl' [-Werror,-Wframe-larger-than] 655 \| long bch2_fs_ioctl(struct bch_fs c, unsigned cmd, void __user arg) By marking the largest two of them as noinline_for_stack, no indidual code path ends up using this much, which avoids the warning and reduces the possible total stack usage in the ioctl handler. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:21:30 -04:00
Kent Overstreet	c3dd25319c	bcachefs: Don't pass trans to fsck_err() in gc_accounting_done fsck_err() can return a transaction restart if passed a transaction object - this has always been true when it has to drop locks to prompt for user input, but we're seeing this more now that we're logging the error being corrected in the journal. gc_accounting_done() doesn't call fsck_err() from an actual commit loop, and it doesn't need to be holding btree locks when it calls fsck_err(), so the easy fix here for the unhandled transaction restart is to just not pass it the transaction object. We'll miss out on the fancy new logging, but that's ok. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:21:30 -04:00
Kent Overstreet	9e48f574e5	bcachefs: Fix leak in bch2_fs_recovery() error path Fix a small leak of the superblock 'clean' section. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:21:30 -04:00
Kent Overstreet	54aacfe397	bcachefs: Fix rcu_pending for PREEMPT_RT PREEMPT_RT redefines how standard spinlocks work, so local_irq_save() + spin_lock() is no longer equivalent to spin_lock_irqsave(). Fortunately, we don't strictly need to do it that way. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:21:30 -04:00
Kent Overstreet	082c744114	bcachefs: Fix downgrade_table_extra() Fix a UAF: we were calling darray_make_room() and retaining a pointer to the old buffer. And fix an UBSAN warning: struct bch_sb_field_downgrade_entry uses __counted_by, so set dst->nr_errors before assigning to the array entry. Reported-by: syzbot+14c52d86ddbd89bea13e@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:21:30 -04:00
Kent Overstreet	757601ef85	bcachefs: Don't put rhashtable on stack Object debugging generally needs special provisions for putting said objects on the stack, which rhashtable does not have. Reported-by: syzbot+bcc38a9556d0324c2ec2@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:21:30 -04:00
Kent Overstreet	f946ce0be4	bcachefs: Make sure opts.read_only gets propagated back to VFS If we think we're read-only but the VFS doesn't, fun will ensue. And now that we know we have to be able to do this safely, just make nochanges imply ro. Reported-by: syzbot+a7d6ceaba099cc21dee4@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:21:30 -04:00
Alan Huang	0acb385ec1	bcachefs: Fix possible console lock involved deadlock Link: https://lore.kernel.org/all/6822ab02.050a0220.f2294.00cb.GAE@google.com/T/ Reported-by: syzbot+2c3ef91c9523c3d1a25c@syzkaller.appspotmail.com Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:21:30 -04:00
Kent Overstreet	3315113af1	bcachefs: mark more errors autofix Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:21:30 -04:00
Kent Overstreet	263561649e	bcachefs: Don't persistently run scan_for_btree_nodes bch2_btree_lost_data() gets called on btree node read error, but the error might be transient. btree_node_scan is expensive, and there's no need to run it persistently (marking it in the superblock as required to run) - check_topology will run it if required, via bch2_get_scanned_nodes(). Running it non-persistently is fine, to avoid check_topology having to rewind recovery to run it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:21:30 -04:00
Kent Overstreet	dd22844f48	bcachefs: Read error message now prints if self healing Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:21:30 -04:00
Kent Overstreet	b47a82ff47	bcachefs: Only run 'increase_depth' for keys from btree node csan bch2_btree_increase_depth() was originally for disaster recovery, to get some data back from the journal when a btree root was bad. We don't need it for that purpose anymore; on bad btree root we'll launch btree node scan and reconstruct all the interior nodes. If there's a key in the journal for a depth that doesn't exists, and it's not from check_topology/btree node scan, we should just ignore it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:21:30 -04:00
Kent Overstreet	7b0e6b198e	bcachefs: Mark need_discard_freespace_key_bad autofix Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:21:30 -04:00
Kent Overstreet	af5b88618a	bcachefs: Update /dev/disk/by-uuid on device add Invalidate pagecache after we write the new superblock and send a uevent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:21:30 -04:00
Kent Overstreet	b76cce1270	bcachefs: Add more flags to btree nodes for rewrite reason It seems excessive forced btree node rewrites can cause interior btree updates to become wedged during recovery, before we're using the write buffer for backpointer updates. Add more flags so we can determine where these are coming from. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:21:30 -04:00
Kent Overstreet	c7e351be7a	bcachefs: Add range being updated to btree_update_to_text() We had a deadlock during recovery where interior btree updates became wedged and all open_buckets were consumed; start adding more introspection. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:21:29 -04:00
Kent Overstreet	b43f724927	bcachefs: Log fsck errors in the journal Log the specific error being corrected in the journal when we're repairing, this helps greatly with 'bcachefs list_journal' analysis. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:21:29 -04:00
Kent Overstreet	47fe65b105	bcachefs: Add missing restart handling to check_topology() The next patch will add logging of the specific error being corrected in repair paths to the journal; this means __bch2_fsck_err() can return transaction restarts in places that previously weren't expecting them. check_topology() is old code that doesn't use btree iterators for btree node locking - it'll have to be rewritten in the future to work online. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:21:29 -04:00
Linus Torvalds	ff0905bbf9	bcachefs updates for 6.16, part 2 - More stack usage improvements (~600 bytes). - Define CLASS()es for some commonly used types, and convert most rcu_read_lock() uses to the new lock guards - New introspection: - Superblock error counters are now available in sysfs: previously, they were only visible with 'show-super', which doesn't provide a live view - New tracepoint, error_throw(), which is called any time we return an error and start to unwind - Repair - check_fix_ptrs() can now repair btree node roots - We can now repair when we've somehow ended up with the journal using a superblock bucket - Revert some leftovers from the aborted directory i_size feature, and add repair code: some userspace programs (e.g. sshfs) were getting confused. It seems in 6.15 there's a bug where i_nlink on the vfs inode has been getting incorrectly set to 0, with some unfortunate results; list_journal analysis showed bch2_inode_rm() being called (by bch2_evict_inode()) when it clearly should not have been. - bch2_inode_rm() now runs "should we be deleting this inode?" checks that were previously only run when deleting unlinked inodes in recovery. - check_subvol() was treating a dangling subvol (pointing to a missing root inode) like a dangling dirent, and deleting it. This was the really unfortunate one: check_subvol() will now recreate the root inode if necessary. This took longer to debug than it should have, and we lost several filesystems unnecessarily, becuase users have been ignoring the release notes and blindly running 'fsck -y'. Debugging required reconstructing what happened through analyzing the journal, when ideally someone would have noticed 'hey, fsck is asking me if I want to repair this: it usually doesn't, maybe I should run this in dry run mode and check what's going on?'. As a reminder, fsck errors are being marked as autofix once we've verified, in real world usage, that they're working correctly; blindly running 'fsck -y' on an experimental filesystem is playing with fire. Up to this incident we've had an excellent track record of not losing data, so let's try to learn from this one. This is a community effort, I wouldn't be able to get this done without the help of all the people QAing and providing excellent bug reports and feedback based on real world usage. But please don't ignore advice and expect me to pick up the pieces. If an error isn't marked as autofix, and it /is/ happening in the wild, that's also something I need to know about so we can check it out and add it to the autofix list if repair looks good. I haven't been getting those reports, and I should be; since we don't have any sort of telemetry yet I am absolutely dependent on user reports. Now I'll be spending the weekend working on new repair code to see if I can get a filesystem back for a user who didn't have backups. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmhAuL0ACgkQE6szbY3K bnZlCg/+Pu2TgWBbkwrmHgKH9v4K3pwQRREXSj0TlbWQp9bK00zEBrmdEfTZKgUC q5nAAa6zCs0w/A9TFA7t1W/3+JY28ENhoArKFWemLhFZ2qEEXTZlVHvqyHOyuPBf Loe+hQO8qgWJm6KO9VMCT1pEupslQLRlhI8GhbPPcxPvYXVjmTne7KCanhjeSEx5 TLaOiMn7jr+qPeLZ7xSMaaUTbH2SASjwl2E9/4kG6VqaTTF2MnPNwrdJI0exjyvs QRaUvYbwBBTe/ru5ddmJuWj+61awKS87ANg+rkO2FWpOrai2HfgHd6o+zge/IR2Z /Cfarv1SSd1+0caVaGUAzhnoVhOpY1FU4emJwVvcwnBXeXdGIb/kpaw+Lxm7fr+U J6EnqgUoBsBWBCWgvUxlNHVeJ6wBdVNtDlTHabaH8RSCJZjgjg2JaSQM/v9VPLNa 6jTy3rhkPo50BJBb/F/AZmrobWXR2MkgID3iPEMcpjEyLaRZvW9FPqMFIxKQrUfB XGDU4dAu3C+Q9i1KDkFIvIG3e7z9nSmv6np4O57CgrmrmmCpRUz7Yy0yhqNs36/H WhLh/Pjb9gupdFK0TwFiEEG3wfnmXlde2c8IfrXXzKSKPIZ0T/RnLZapS7i94c2E DumhLYjNjSCiciQZh4eLK0bKx0NETUG79eLUTz5Gi3Pc02E0pU8= =ZGDn -----END PGP SIGNATURE----- Merge tag 'bcachefs-2025-06-04' of git://evilpiepirate.org/bcachefs Pull more bcachefs updates from Kent Overstreet: "More bcachefs updates: - More stack usage improvements (~600 bytes) - Define CLASS()es for some commonly used types, and convert most rcu_read_lock() uses to the new lock guards - New introspection: - Superblock error counters are now available in sysfs: previously, they were only visible with 'show-super', which doesn't provide a live view - New tracepoint, error_throw(), which is called any time we return an error and start to unwind - Repair - check_fix_ptrs() can now repair btree node roots - We can now repair when we've somehow ended up with the journal using a superblock bucket - Revert some leftovers from the aborted directory i_size feature, and add repair code: some userspace programs (e.g. sshfs) were getting confused It seems in 6.15 there's a bug where i_nlink on the vfs inode has been getting incorrectly set to 0, with some unfortunate results; list_journal analysis showed bch2_inode_rm() being called (by bch2_evict_inode()) when it clearly should not have been. - bch2_inode_rm() now runs "should we be deleting this inode?" checks that were previously only run when deleting unlinked inodes in recovery - check_subvol() was treating a dangling subvol (pointing to a missing root inode) like a dangling dirent, and deleting it. This was the really unfortunate one: check_subvol() will now recreate the root inode if necessary This took longer to debug than it should have, and we lost several filesystems unnecessarily, because users have been ignoring the release notes and blindly running 'fsck -y'. Debugging required reconstructing what happened through analyzing the journal, when ideally someone would have noticed 'hey, fsck is asking me if I want to repair this: it usually doesn't, maybe I should run this in dry run mode and check what's going on?' As a reminder, fsck errors are being marked as autofix once we've verified, in real world usage, that they're working correctly; blindly running 'fsck -y' on an experimental filesystem is playing with fire Up to this incident we've had an excellent track record of not losing data, so let's try to learn from this one This is a community effort, I wouldn't be able to get this done without the help of all the people QAing and providing excellent bug reports and feedback based on real world usage. But please don't ignore advice and expect me to pick up the pieces If an error isn't marked as autofix, and it /is/ happening in the wild, that's also something I need to know about so we can check it out and add it to the autofix list if repair looks good. I haven't been getting those reports, and I should be; since we don't have any sort of telemetry yet I am absolutely dependent on user reports Now I'll be spending the weekend working on new repair code to see if I can get a filesystem back for a user who didn't have backups" * tag 'bcachefs-2025-06-04' of git://evilpiepirate.org/bcachefs: (69 commits) bcachefs: add cond_resched() to handle_overwrites() bcachefs: Make journal read log message a bit quieter bcachefs: Fix subvol to missing root repair bcachefs: Run may_delete_deleted_inode() checks in bch2_inode_rm() bcachefs: delete dead code from may_delete_deleted_inode() bcachefs: Add flags to subvolume_to_text() bcachefs: Fix oops in btree_node_seq_matches() bcachefs: Fix dirent_casefold_mismatch repair bcachefs: Fix bch2_fsck_rename_dirent() for casefold bcachefs: Redo bch2_dirent_init_name() bcachefs: Fix -Wc23-extensions in bch2_check_dirents() bcachefs: Run check_dirents second time if required bcachefs: Run snapshot deletion out of system_long_wq bcachefs: Make check_key_has_snapshot safer bcachefs: BCH_RECOVERY_PASS_NO_RATELIMIT bcachefs: bch2_require_recovery_pass() bcachefs: bch_err_throw() bcachefs: Repair code for directory i_size bcachefs: Kill un-reverted directory i_size code bcachefs: Delete redundant fsck_err() ...	2025-06-04 19:14:24 -07:00
Kent Overstreet	3d11125ff6	bcachefs: add cond_resched() to handle_overwrites() Fix soft lockup warnings in btree nodes can. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-04 16:45:41 -04:00
Kent Overstreet	a4b0f75050	bcachefs: Make journal read log message a bit quieter Users seem to be assuming that the 'dropped unflushed entries' message at the end of journal read indicates some sort of problem, when it does not - we expect there to be entries in the journal that weren't commited, it's purely informational so that we can correlate journal sequence numbers elsewhere when debugging. Shorten the log message a bit to hopefully make this clearer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-04 16:45:41 -04:00
Kent Overstreet	29cc6fb7c0	bcachefs: Fix subvol to missing root repair We had a bug where the root inode of a subvolume was erronously deleted: bch2_evict_inode() called bch2_inode_rm(), meaning the VFS inode's i_nlink was somehow set to 0 when it shouldn't have - the inode in the btree indicated it clearly was not unlinked. This has been addressed with additional safety checks in bch2_inode_rm() - pulling in the safety checks we already were doing when deleting unlinked inodes in recovery - but the really disastrous bug was in check_subvols(), which on finding a dangling subvol (subvol with a missing root inode) would delete the subvolume. I assume this bug dates from early check_directory_structure() code, which originally handled subvolumes and normal paths - the idea being that still live contents of the subvolume would get reattached somewhere. But that's incorrect, and disastrously so; deleting a subvolume triggers deleting the snapshot ID it points to, deleting the entire contents. The correct way to repair is to recreate the root inode if it's missing; then any contents will get reattached under that subvolume's lost+found. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-04 16:45:41 -04:00
Kent Overstreet	09fb85ae56	bcachefs: Run may_delete_deleted_inode() checks in bch2_inode_rm() We had a bug where bch2_evict_inode() incorrectly called bch2_inode_rm() - the journal clearly showed the inode was not unlinked. We've got checks that we use in recovery when cleaning up deleted inodes, lift them to bch2_inode_rm() as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-04 16:45:41 -04:00
Kent Overstreet	bb6689bbee	bcachefs: delete dead code from may_delete_deleted_inode() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-04 16:45:41 -04:00
Kent Overstreet	bfaac2c546	bcachefs: Add flags to subvolume_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-04 16:45:41 -04:00
Kent Overstreet	9f2dc5f394	bcachefs: Fix oops in btree_node_seq_matches() btree_update_nodes_written() needs to wait on in-flight writes to old nodes before marking them as freed. But it has no reason to pin those old nodes in memory, so some trickyness ensues. The update we're completing deleted references to those nodes from the btree, so we know if they've been evicted they can't be pulled back in. We just have to check if the nodes we have pointers to are still those old nodes, and haven't been reused. To do that we check the node's "sequence number" (actually a random 64 bit cookie), but that lives in the node's data buffer. 'struct btree' can't be freed until filesystem shutdown (as they're quite small), but the data buffers can be freed or swapped around. Commit `1f88c35674`, which was fixing a kmsan warning, assumed that we could safely do this locklessly with just a READ_ONCE() - if we've got a non-null ptr it would be safe to read from. But that's not true if the data buffer is a vmalloc allocation, so we need to restore the locking that commit deleted (or alternatively RCU free those data buffers, but there's no other reason for that). Fixes: `1f88c35674` ("bcachefs: Fix a KMSAN splat in btree_update_nodes_written()") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-04 16:45:41 -04:00
Kent Overstreet	2bf380c005	bcachefs: Fix dirent_casefold_mismatch repair Instead of simply recreating a mis-casefolded dirent, use the str_hash repair code, which will rename it if necessary - the dirent might have been created again with the correct casefolding. Factor out out bch2_str_hash_repair key() from __bch2_str_hash_check_key() for the new path to use, and export bch2_dirent_create_key() as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-04 16:45:41 -04:00
Kent Overstreet	b938d3c970	bcachefs: Fix bch2_fsck_rename_dirent() for casefold bch2_fsck_renamed_dirent was creating bch_dirent keys open-coded - but we need to use the appropriate helper, if the directory is casefolded. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-04 16:45:41 -04:00
Kent Overstreet	35c1f131bc	bcachefs: Redo bch2_dirent_init_name() Redo (and simplify somewhat) how casefolded and non casefolded dirents are initialized, and export this to be used by fsck_rename_dirent(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-04 16:45:41 -04:00
Nathan Chancellor	01d925f7e1	bcachefs: Fix -Wc23-extensions in bch2_check_dirents() Clang warns (or errors with CONFIG_WERROR=y): fs/bcachefs/fsck.c:2325:2: error: label followed by a declaration is a C23 extension [-Werror,-Wc23-extensions] 2325 \| int ret = bch2_trans_run(c, \| ^ On clang-17 and older, this is an unconditional error: fs/bcachefs/fsck.c:2325:2: error: expected expression 2325 \| int ret = bch2_trans_run(c, \| ^ Move the declaration of ret to the top of the function to resolve both ways this issue manifests. Fixes: `c72def5237` ("bcachefs: Run check_dirents second time if required") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-04 16:45:38 -04:00
Kent Overstreet	c72def5237	bcachefs: Run check_dirents second time if required If we move a key backwards, we'll need a second pass to run the rest of the fsck checks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-02 12:16:36 -04:00
Kent Overstreet	a4907d7f33	bcachefs: Run snapshot deletion out of system_long_wq We don't want this running out of the same workqueue, and blocking, writes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-02 12:16:36 -04:00
Kent Overstreet	e49cf9b54b	bcachefs: Make check_key_has_snapshot safer Snapshot deletion v2 added sentinal values for deleted snapshots, so "key for deleted snapshot" - i.e. snapshot deletion missed something - is safe to repair automatically. But if we find a key for a missing snapshot we have no idea what happened, and we shouldn't delete it unless we're very sure that everything else is consistent. So hook it up to the new bch2_require_recovery_pass(), we'll now only delete if snapshots and subvolumes have recenlty been checked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-02 12:16:36 -04:00
Kent Overstreet	0942b852d4	bcachefs: BCH_RECOVERY_PASS_NO_RATELIMIT Add a superblock flag to temporarily disable ratelimiting for a recovery pass. This will be used to make check_key_has_snapshot safer: we don't want to delete a key for a missing snapshot unless we know that the snapshots and subvolumes btrees are consistent, i.e. check_snapshots and check_subvols have run recently. Changing those btrees - creating/deleting a subvolume or snapshot - will set the "disable ratelimit" flag, i.e. ensuring that those passes run if check_key_has_snapshot discovers an error. We're only disabling ratelimiting in the snapshot/subvol delete paths, we're not so concerned about the create paths. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-02 12:16:36 -04:00
Kent Overstreet	a2ffab0e65	bcachefs: bch2_require_recovery_pass() Add a helper for requiring that a recovery pass has already run: either run it directly, if we're still in recovery, or if we're not in recovery check if it has run recently and schedule it if it hasn't. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-02 12:16:35 -04:00
Kent Overstreet	09b9c72bd4	bcachefs: bch_err_throw() Add a tracepoint for any time we return an error and unwind. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-02 12:16:35 -04:00
Kent Overstreet	36a2fdf7c5	bcachefs: Repair code for directory i_size We had a bug due due to an incomplete revert of the patch implementing directory i_size (summing up the size of the dirents), leading to completely screwy i_size values that underflow. Most userspace programs don't seem to care (e.g. du ignores it), but it turns out this broke sshfs, so needs to be repaired. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-02 12:16:35 -04:00
Kent Overstreet	95fafc0f34	bcachefs: Kill un-reverted directory i_size code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-02 12:16:35 -04:00
Kent Overstreet	d47db3e636	bcachefs: Delete redundant fsck_err() 'inode_has_wrong_backpointer'; we have more specific errors for every case afterwards. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-02 12:16:35 -04:00
Kent Overstreet	165815c296	bcachefs: Convert BUG() to error Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-02 12:16:35 -04:00
Kent Overstreet	132263220d	bcachefs: Add better logging to fsck_rename_dirent() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-01 00:03:12 -04:00
Kent Overstreet	18dad454cd	bcachefs: Replace rcu_read_lock() with guards The new guard(), scoped_guard() allow for more natural code. Some of the uses with creative flow control have been left. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-01 00:03:12 -04:00
Kent Overstreet	9cb49fbf73	bcachefs: CLASS(btree_trans) Allow btree_trans to be used with CLASS(). Automatic cleanup, instead of manually calling bch2_trans_put(). We don't use DEFINE_CLASS because using a static inline for the constructor breaks bch2_trans_get()'s use of __func__, so we have to open code it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-01 00:03:12 -04:00
Linus Torvalds	7d4e49a77d	- The 3 patch series "hung_task: extend blocking task stacktrace dump to semaphore" from Lance Yang enhances the hung task detector. The detector presently dumps the blocking tasks's stack when it is blocked on a mutex. Lance's series extends this to semaphores. - The 2 patch series "nilfs2: improve sanity checks in dirty state propagation" from Wentao Liang addresses a couple of minor flaws in nilfs2. - The 2 patch series "scripts/gdb: Fixes related to lx_per_cpu()" from Illia Ostapyshyn fixes a couple of issues in the gdb scripts. - The 9 patch series "Support kdump with LUKS encryption by reusing LUKS volume keys" from Coiby Xu addresses a usability problem with kdump. When the dump device is LUKS-encrypted, the kdump kernel may not have the keys to the encrypted filesystem. A full writeup of this is in the series [0/N] cover letter. - The 2 patch series "sysfs: add counters for lockups and stalls" from Max Kellermann adds /sys/kernel/hardlockup_count and /sys/kernel/hardlockup_count and /sys/kernel/rcu_stall_count. - The 3 patch series "fork: Page operation cleanups in the fork code" from Pasha Tatashin implements a number of code cleanups in fork.c. - The 3 patch series "scripts/gdb/symbols: determine KASLR offset on s390 during early boot" from Ilya Leoshkevich fixes some s390 issues in the gdb scripts. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaDuCvQAKCRDdBJ7gKXxA jrkxAQCnFAp/uK9ckkbN4nfpJ0+OMY36C+A+dawSDtuRsIkXBAEAq3e6MNAUdg5W Ca0cXdgSIq1Op7ZKEA+66Km6Rfvfow8= =g45L -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2025-05-31-15-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "hung_task: extend blocking task stacktrace dump to semaphore" from Lance Yang enhances the hung task detector. The detector presently dumps the blocking tasks's stack when it is blocked on a mutex. Lance's series extends this to semaphores - "nilfs2: improve sanity checks in dirty state propagation" from Wentao Liang addresses a couple of minor flaws in nilfs2 - "scripts/gdb: Fixes related to lx_per_cpu()" from Illia Ostapyshyn fixes a couple of issues in the gdb scripts - "Support kdump with LUKS encryption by reusing LUKS volume keys" from Coiby Xu addresses a usability problem with kdump. When the dump device is LUKS-encrypted, the kdump kernel may not have the keys to the encrypted filesystem. A full writeup of this is in the series [0/N] cover letter - "sysfs: add counters for lockups and stalls" from Max Kellermann adds /sys/kernel/hardlockup_count and /sys/kernel/hardlockup_count and /sys/kernel/rcu_stall_count - "fork: Page operation cleanups in the fork code" from Pasha Tatashin implements a number of code cleanups in fork.c - "scripts/gdb/symbols: determine KASLR offset on s390 during early boot" from Ilya Leoshkevich fixes some s390 issues in the gdb scripts * tag 'mm-nonmm-stable-2025-05-31-15-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (67 commits) llist: make llist_add_batch() a static inline delayacct: remove redundant code and adjust indentation squashfs: add optional full compressed block caching crash_dump, nvme: select CONFIGFS_FS as built-in scripts/gdb/symbols: determine KASLR offset on s390 during early boot scripts/gdb/symbols: factor out pagination_off() scripts/gdb/symbols: factor out get_vmlinux() kernel/panic.c: format kernel-doc comments mailmap: update and consolidate Casey Connolly's name and email nilfs2: remove wbc->for_reclaim handling fork: define a local GFP_VMAP_STACK fork: check charging success before zeroing stack fork: clean-up naming of vm_stack/vm_struct variables in vmap stacks code fork: clean-up ifdef logic around stack allocation kernel/rcu/tree_stall: add /sys/kernel/rcu_stall_count kernel/watchdog: add /sys/kernel/{hard,soft}lockup_count x86/crash: make the page that stores the dm crypt keys inaccessible x86/crash: pass dm crypt keys to kdump kernel Revert "x86/mm: Remove unused __set_memory_prot()" crash_dump: retrieve dm crypt keys in kdump kernel ...	2025-05-31 19:12:53 -07:00
Kent Overstreet	42359f1615	bcachefs: CLASS(darray) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-31 22:03:17 -04:00
Kent Overstreet	237a8e16bd	bcachefs: CLASS(printbuf) Add a DEFINE_CLASS() for printbufs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-31 22:03:17 -04:00
Kent Overstreet	a0f7437906	bcachefs: sysfs trigger_journal_commit Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-31 22:03:17 -04:00
Kent Overstreet	1f42a0335a	bcachefs: sysfs trigger_emergency_read_only Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-31 22:03:17 -04:00
Kent Overstreet	5802caf74f	bcachefs: darray_find(), darray_find_p() New helpers to avoid open coded loops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-31 22:03:17 -04:00
Kent Overstreet	9a1accd3a5	bcachefs: Journal keys are retained until shutdown, or journal replay finishes If we don't finish journal replay we need to keep journal keys around until the filesystem shuts down - otherwise e.g. -o norecovery, various tools (dump, list) break, and eventually we'll be doing journal replay in the background. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-31 22:03:17 -04:00
Kent Overstreet	6447544c3d	bcachefs: Improve error printing in btree_node_check_topology() We had a bug report where the errors from btree_node_check_topology() don't seem to be getting printed; log_fsck_err() does some fancy ratelimiting-type stuff that we don't want here. Instead, just use bch2_count_fsck_err(); this is simpler, and modelled after how we're currently handling bucket ref update errors in buckets.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-31 22:03:17 -04:00
Kent Overstreet	f402d9710b	bcachefs: bch2_readdir() now calls str_hash_check_key() More self healing code: readdir will now notice if there are dirents hashed incorrectly, and it'll repair them if errors=fix_safe. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-31 22:03:17 -04:00
Kent Overstreet	a592268260	bcachefs: bch2_str_hash_check_key() may now be called without snapshots_seen We don't track snapshot overwrites outside of fsck, so for this to be called at runtime outside of fsck we need to create it on demand, when we have repair to do. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-31 22:03:17 -04:00
Kent Overstreet	cb6f5d0dec	bcachefs: __bch2_insert_snapshot_whiteouts() refactoring Now uses bch2_get_snapshot_overwrites(), and much shorter. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-31 22:03:17 -04:00
Kent Overstreet	801cb2bd6c	bcachefs: bch2_get_snapshot_overwrites() New helper for getting a list of snapshot IDs that have overwritten a given key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-31 22:03:17 -04:00
Kent Overstreet	d21262d4e3	bcachefs: bch2_dev_journal_bucket_delete() Recover from "journal and btree in same bucket". Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-31 22:03:17 -04:00
Kent Overstreet	0224d17d76	bcachefs: Runtime self healing for keys for deleted snapshots If snapshot deletion incorrectly missing some keys and leaves keys for deleted snapshots, that causes a bit of a problem for data move - we can't move an extent for a nonexistent snapshot, because the extent might have to be fragmented, and maintaining correct visibility in child snapshots doesn't work if it doesn't have a snapshot. Previously we'd just skip these keys, but it turns out that causes copygc to spin. So we need runtime self healing, i.e. calling check_key_has_snapshot() from the data move path. Snapshot deletion v2 included sentinal values for deleted snapshot nodes, so this is quite safe. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-31 22:03:17 -04:00
Kent Overstreet	f02d153274	bcachefs: Don't unlock trans before data_update_init() data_update_init() does need to do btree operations, delay doing the unlock-before-io. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-31 22:03:17 -04:00
Kent Overstreet	642c1aabb0	bcachefs: Use bch2_err_matches() for BCH_ERR_fsck_(fix\|ignore) We'll be adding subtypes of these errors, and new error code tracing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-31 22:03:16 -04:00
Kent Overstreet	dc43f6a70b	bcachefs: Mark bch_errcode helpers __attribute__((const)) These don't access global memory or defer pointer arguments - this enables CSE optimizations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-30 11:20:18 -04:00
Kent Overstreet	66621f016d	bcachefs: Add missing printbuf_reset() in bch2_check_dirent_inode_dirent() We were accidentally including the contents from the previous fsck_err(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-30 11:20:18 -04:00
Kent Overstreet	f1dc067bc1	bcachefs: sysfs/errors Make the superblock error counters available in sysfs; the only other way they can be seen is 'show-super', but we don't write the superblock every time the error count gets incremented. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-30 11:20:18 -04:00
Kent Overstreet	66b7c51ceb	bcachefs: bch2_check_fix_ptrs() can now repair btree roots This is straightforward enough: check_fix_ptrs() currently only runs before we go RW, so updating the btree root pointer in c->btree_roots suffices - it'll be written out in the first journal write we do. For that, do_bch2_trans_commit_to_journal_replay() now handles JSET_ENTRY_btree_root entries. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-30 01:21:13 -04:00

1 2 3 4 5 ...

5280 Commits