This converts bch2_gc_stripes_done() and bch2_gc_reflink_done() to the
new for_each_btree_key_commit() macro.
The new for_each_btree_key2() and for_each_btree_key_commit() macros
handle transaction retries, allowing us to avoid nested transactions -
which we want to avoid since they're tricky to do completely correctly
and upcoming assertions are going to be checking for that.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This introduces two new macros for iterating through the btree, with
transaction restart handling
- for_each_btree_key2()
- for_each_btree_key_commit()
Every iteration is now in an implicit transaction, and - as with
lockrestart_do() and commit_do() - returning -EINTR will cause the
transaction to be restarted, at the same key.
This patch converts a bunch of code that was open coding this to these
new macros, saving a substantial amount of code.
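As a rough illustration - argument order here is approximate, and
do_one_key() is a made-up helper - the open-coded pattern

    retry:
            bch2_trans_begin(&trans);

            k = bch2_btree_iter_peek(&iter);
            ...
            ret = bch2_trans_commit(&trans, NULL, NULL, 0);
            if (ret == -EINTR)
                    goto retry;

collapses into

    ret = for_each_btree_key_commit(&trans, iter, BTREE_ID_alloc, POS_MIN,
                                    BTREE_ITER_PREFETCH, k,
                                    NULL, NULL, BTREE_INSERT_NOFAIL,
            do_one_key(&trans, &iter, k));

with the macro handling bch2_trans_begin(), restart on -EINTR, and the
per-key commit.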
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Better/more descriptive naming, and prep for adding
nested_lockrestart_do() and nested_commit_do().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
There's no need to print fsck errors for errors that are expected, and
the user has already opted to repair.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
These messages log the updates we're doing in bch2_check_fix_ptrs(),
which is useful when debugging but not usually needed.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
If a btree node is unreadable, it's the topology repair that fixes that
and it's kicked off by btree_gc, so btree_gc needs to touch every node
and verify that it can be read.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
If we were at the end of the node, when breaking out of the loop we'd
pop the assertion on line 446 when cur wasn't NULL.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Delete some obsolete tracepoints, organize alloc tracepoints better,
make a few tracepoints more consistent.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We're seeing occasional firings of the assertion in the key cache
shutdown code that nr_dirty == 0, which means we must sometimes be doing
transaction commits after we've gone read only.
Cleanups & changes:
- BCH_FS_ALLOC_CLEAN renamed to BCH_FS_CLEAN_SHUTDOWN
- new helper bch2_btree_interior_updates_flush(), which returns true if
it had to wait
- bch2_btree_flush_writes() now also returns true if there were btree
writes in flight
- __bch2_fs_read_only now checks if btree writes were in flight in the
shutdown loop: btree write completion does a transaction update, to
update the pointer in the parent node (see the sketch after this list)
- assert that !BCH_FS_CLEAN_SHUTDOWN in __bch2_trans_commit
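Roughly, the shutdown loop now looks like the following sketch - the
helper names are the ones from this patch, the loop structure is
illustrative:

    bool in_flight;

    do {
            in_flight  = bch2_btree_interior_updates_flush(c);
            in_flight |= bch2_btree_flush_writes(c);
            /*
             * btree write completion commits a transaction to update the
             * pointer in the parent node, so if anything was in flight we
             * have to make another pass:
             */
    } while (in_flight);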
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
It's an error if a bucket is in state BCH_DATA_cached but not on the LRU
btree - i.e. io_time[READ] == 0 - so make sure it's set before adding
it.
Also, make some of the LRU code a bit clearer and more direct.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Previously, we were missing accounting for buckets in need_gc_gens and
need_discard states. This matters because buckets in those states need
other btree operations done before they can be used, so they can't be
counted when checking the current number of free buckets against the
allocation watermark.
Also, we weren't directly counting free buckets at all. Now, data type 0
== BCH_DATA_free, and free buckets are counted; this means we can get
rid of the separate (poorly defined) count of unavailable buckets.
This is a new on disk format version, with upgrade and fsck required for
the accounting changes.
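For reference, the bucket data types now look roughly like this - the
point here is BCH_DATA_free == 0 and the two "needs work before reuse"
states; the rest of the ordering is illustrative:

    enum bch_data_type {
            BCH_DATA_free           = 0,    /* now counted directly */
            BCH_DATA_sb,
            BCH_DATA_journal,
            BCH_DATA_btree,
            BCH_DATA_user,
            BCH_DATA_cached,
            BCH_DATA_parity,
            BCH_DATA_stripe,
            BCH_DATA_need_gc_gens,          /* gen must be bumped first */
            BCH_DATA_need_discard,          /* discard must be issued first */
    };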
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
- We were failing to start topology repair, because we hadn't set the
superblock flag indicating it needed to run
- set_node_min() forgot to update the btree node's key
- bch2_gc_alloc_reset() didn't reset data type, leading to inserting an
invalid key that was empty but had nonzero data type
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This switches struct bucket to using a lock, instead of cmpxchg. And now
that the protected members no longer need to fit into a u64, we can
expand the sector counts to 32 bits.
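The new struct looks roughly like this (abbreviated; field names
approximate):

    struct bucket {
            u8      lock;           /* bit spinlock */
            u8      gen;
            u8      data_type;
            u32     dirty_sectors;  /* previously 16 bit, packed into a u64 */
            u32     cached_sectors;
    };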
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
All code using the in-memory bucket array, excluding GC, has now been
converted to use the alloc btree directly - so we can finally delete it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Now that we have new persistent data structures for the allocator, this
patch converts the allocator to use them.
Now, foreground bucket allocation uses the freespace btree to find
buckets to allocate, instead of popping buckets off the freelist.
The background allocator threads are no longer needed and are deleted,
as well as the allocator freelists. Now we only need background tasks
for invalidating buckets containing cached data (when we are low on
empty buckets), and for issuing discards.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This introduces a new alloc key which doesn't use varints. Soon we'll be
adding backpointers and storing them in alloc keys, which means our
pack/unpack workflow for alloc keys won't really work - we'll need to be
mutating alloc keys in place.
Instead of bch2_alloc_unpack(), we now have bch2_alloc_to_v4() that
converts older types of alloc keys to v4 if needed.
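Read side usage is now along these lines (sketch; the exact helper
signature may differ):

    struct bch_alloc_v4 a;

    bch2_alloc_to_v4(k, &a);    /* unpacks v1/v2/v3 keys, copies v4 directly */
    /* now read/modify a.gen, a.data_type, a.dirty_sectors, ... in place */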
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This patch changes journal_entry_open() to initialize the new journal
entry, not __journal_entry_close().
This also means that journal_cur_seq() refers to the sequence number of
the last journal entry when we don't have an open journal entry, not the
next one.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This patch changes printbufs to dynamically allocate and reallocate a
buffer as needed. Stack usage has become a bit of a problem, and a major
cause of that has been statically sized string buffers on the stack.
The most involved part of this refactoring is that printbufs must now be
exited with printbuf_exit().
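The usage pattern is now roughly the following (sketch; helper names as
of this point in the series):

    struct printbuf buf = PRINTBUF;

    pr_buf(&buf, "error at offset %llu", offset);
    bch_err(c, "%s", buf.buf);
    printbuf_exit(&buf);        /* frees the heap allocated buffer */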
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This improves __bch2_trans_commit - early in the recovery process, when
we're running btree_gc and before we want to go RW, it now uses
bch2_journal_key_insert() to add the update to the list of updates for
journal replay to do, instead of btree_gc having to use separate
interfaces depending on whether we're running at bringup or, later,
runtime.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Before we had dedicated gc code for bucket->oldest_gen, this was
btree_gc's responsibility, but now that we have that we can rip it out,
simplifying the already overcomplicated btree_gc.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The reflink repair code was incorrectly inserting a nonzero deleted key
via journal replay - this is due to bch2_journal_key_insert() being
somewhat hacky, and so this fix is also hacky for now.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Like the previous patches, this converts bch2_gc_gens() to use the alloc
btree directly, and private arrays of generation numbers for its own
recalculation of oldest_gen.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This changes the btree_gc code to only use the second bucket array, the
one dedicated to GC. On completion, it compares what's in its in memory
bucket array to the allocation information in the btree and writes it
directly, instead of updating the main in-memory bucket array and
writing that.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
btree_gc sometimes needs another pass when it corrects bucket generation
numbers or data types - when it finds multiple pointers of different
data types to the same bucket, it may want to keep the second one it
found.
When this happens, we now clear out bucket sector counts _without_
resetting the bucket generation/data types that we already found,
instead of resetting them to what we have in the alloc btree.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The repair code for btree_ptrs was saying one thing and doing another -
fortunately, that code can just be deleted.
Also, when we update a btree node pointer, we also have to update the
node in memory, if it exists in the btree node cache - this fixes
bch2_check_fix_ptrs() to do that.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This adds a new btree iterator flag, BTREE_ITER_WITH_JOURNAL, that is
automatically enabled when initializing a btree iterator before journal
replay has completed - it overlays the contents of the journal with the
btree.
This lets us delete bch2_btree_and_journal_walk() and just use the
normal btree iterator interface instead - which also lets us delete a
significant amount of duplicated code.
Note that BTREE_ITER_WITH_JOURNAL is still unoptimized in this patch -
we're redoing the binary search over keys in the journal every time we
call bch2_btree_iter_peek().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_ec_mem_alloc() was only used by GC, and there's no real need to
preallocate the stripes radix tree since we can cope fine with memory
allocation failure when we use the radix tree. This deletes a fair bit
of code, and it's also needed for the upcoming patch because
bch2_btree_iter_peek_prev() won't be working before journal replay
completes (and using it was incorrect previously, as well).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Since the main in memory bucket array is going away, we don't want to be
calling bucket() or __bucket() when what we want is the GC in-memory
bucket.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
bch2_journal_key_insert() used to assume that the key passed to it was
allocated with kmalloc(), and on success took ownership. This patch
deletes that behaviour, making it more similar to
bch2_trans_update()/bch2_trans_commit().
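I.e. callers can now do something like this, with the key living on the
stack (sketch; error handling elided):

    struct bkey_i delete;

    bkey_init(&delete.k);
    delete.k.p = pos;

    ret = bch2_journal_key_insert(c, btree_id, level, &delete);
    /* on success, &delete is still ours - no kfree() or ownership handoff */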
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Now that bch2_bucket_alloc_new_fs() isn't looking at bucket marks to
decide what buckets are eligible to allocate, we can clean up the
filesystem initialization and device add paths. Previously, we had to
use ancient code to mark superblock/journal buckets in the in memory
bucket marks as we allocated them, and then zero that out and re-do that
marking using the newer transactional bucket mark paths. Now, we can
simply delete the in-memory bucket marking.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This adds more latency/event measurements and breaks some apart into
more events. Journal writes are broken apart into flush writes and
noflush writes, btree compactions are broken out from btree splits,
btree merges are added, as well as btree_interior_updates - foreground
and total.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We have two radix trees of stripes - one that mirrors some information
from the stripes btree in normal operation, and another that GC uses to
recalculate block usage counts.
The normal one is now only used for finding partially empty stripes in
order to reuse them - the normal stripes radix tree and the GC stripes
radix tree are used significantly differently, so this patch splits them
into separate types.
In an upcoming patch we'll be replacing c->stripes with a btree that
indexes stripes by the order we want to reuse them.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
When we added the stripe and stripe_redundancy fields to alloc keys, we
neglected to add them to the functions that convert back and forth with
the in-memory types.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This simplifies the code quite a bit and eliminates an inconsistency - a
given bkey doesn't necessarily translate to a single replicas entry for
disk space accounting.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This changes the bch2_mark_key() and related paths to take mark lock
where it is needed, instead of taking it in the upper transaction commit
path - by pushing down locking we'll be able to handle fsck errors
locally instead of requiring a separate check in the btree_gc code for
replicas being marked.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We were setting BCH_FS_ERROR on startup if the superblock was marked as
containing errors, which is not what we wanted - BCH_FS_ERROR indicates
whether errors have been found, so that after a successful fsck we're
able to clear the error bit in the superblock.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This allows triggers to distinguish between a key entering the btree -
i.e. being called from the trans commit path - vs. being called on a key
that already exists, i.e. by GC.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This helps to unify the interface between bch2_mark_key() and
bch2_trans_mark_key() - and it also gives access to the journal
reservation and journal seq in the mark_key path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
- The backpointer that ec_stripe_update_ptrs() uses now needs to include
the snapshot ID, which means we have to change where we add the
backpointer to after getting the snapshot ID for the new extents
- ec_stripe_update_ptrs() needs to be calling bch2_trans_begin()
- improve error message in bch2_mark_stripe()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We have been getting away from handling transaction restarts locally -
convert bch2_btree_node_rewrite() to the newer style.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We no longer need to call it from outside the btree iterator code,
since it's called by bch2_trans_begin() and
bch2_btree_path_traverse().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This changes for_each_btree_node() to work like for_each_btree_key(),
and to that end bch2_btree_iter_peek_node() and next_node() also return
error ptrs.
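Callers now check for errors the same way as with keys, e.g. (sketch):

    b = bch2_btree_iter_peek_node(iter);
    ret = PTR_ERR_OR_ZERO(b);
    if (ret)
            goto err;
    if (!b)
            break;          /* no more nodes */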
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
When a reflink pointer points to an indirect extent that doesn't exist,
we need to replace it with a KEY_TYPE_error key.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
When force-removing a device, we were silently dropping extents that we
no longer had pointers for - we should have been switching them to
KEY_TYPE_error, so that reads for data that was lost return errors.
This patch adds the logic for switching a key to KEY_TYPE_error to
bch2_bkey_drop_ptr(), and improves the logic somewhat.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We were getting spurious "multiple types of data in same bucket" errors
in fsck, because the check was running for (cached) stale pointers -
oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This splits btree_iter into two components: btree_iter is now the
externally visible component, and it points to a btree_path which is now
reference counted.
This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Start tracking when btree transactions have been restarted - and assert
that we're always calling bch2_trans_begin() immediately after
transaction restart.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
btree_trans should always be passed when we have one - iter->trans is
disfavoured. This mainly updates old code in btree_update_interior.c,
some of which predates btree_trans.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Internal btree code really wants a POS_MAX with all fields ~0; external
code more likely wants the snapshot field to be 0, because when we're
passing it to bch2_trans_get_iter() it's used for the snapshot we're
operating in, which should be 0 for most btrees that don't use
snapshots.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
It's unhelpful if we see "Halting mark and sweep to start topology
repair" but we don't see the error that triggered it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
- We no longer mark subsets of extents, they're marked like regular
keys now - which means we can drop the offset & sectors arguments
to trigger functions
- Drop other arguments that are no longer needed anymore in various
places - fs_usage
- Drop the logic for handling extents in bch2_mark_update() that isn't
needed anymore, to match bch2_trans_mark_update()
- Better logic for handling the BTREE_ITER_CACHED_NOFILL case, where we
don't have an old key to mark
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This improves the handling of overlapping btree nodes; now, we handle
the case where one btree node completely overwrites another.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Upcoming refactoring is going to change bch2_trans_update() to start
returning transaction restarts.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
bch2_check_fix_ptrs() is awkward; we need to find a way to improve it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We really need debug mode assertions that ca->ref and ca->io_ref are
used correctly.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
There was a bug that led to duplicate btree node pointers being inserted
at the wrong level. The new topology repair code can fix that, except
that the btree cache code gets confused when we read in a btree node
from the pointer that was at the wrong level. This patch evicts nodes
that we're deleting, which nicely solves the problem.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This splits out btree topology repair into a separate pass, and makes
some improvements:
- When we have to pick which of two overlapping nodes to drop keys
from, we use the btree node header sequence number to preserve the
newer node
- the gc code has been changed so that it doesn't bail out if we're
continuing/ignoring on fsck error - this way the dump tool can skip
running the repair pass but still walk all reachable metadata
- add a new superblock flag indicating when a filesystem is known to
have btree topology issues, and the topology repair pass should be
run
- changing the start/end of a node might mean keys in that node have to
be deleted: this patch handles that better by splitting it out into a
separate function and running it explicitly in the topology repair
code, previously those keys were only being dropped when the btree
node was read in.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_check_fix_ptrs() was being called after checking if the replicas
set was marked - but repair could change which replicas set needed to be
marked. Oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The owned_by_allocator field is a purely in memory thing; even if/when
we bring back GC at runtime there's no need for it to be recalculating
this field. This is prep work for pulling it out of struct bucket, and
eventually getting rid of the bucket array.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We have foreground btree node merging now, and any future btree node
merging improvements are going to be based off of that code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We don't want it to block, if it can't allocate it should just continue
instead of possibly deadlocking.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_btree_iter_peek() wasn't properly checking for
BTREE_ITER_IS_EXTENTS when updating iter->pos.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Since we're using a NOT_EXTENTS iterator, we shouldn't be setting the
iter pos to the start of the extent.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_btree_update_start() is now responsible for taking gc_lock and
upgrading the iterator to lock parent nodes - greatly simplifying error
handling and all of the callers.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This patch starts treating the bpos.snapshot field like part of the key
in the btree code:
* bpos_successor() and bpos_predecessor() now include the snapshot field
* Keys in btrees that will be using snapshots (extents, inodes, dirents
and xattrs) now always have their snapshot field set to U32_MAX
The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that
determines whether we're iterating over keys in all snapshots or not -
internally, this controls whether bkey_(successor|predecessor)
increment/decrement the snapshot field, or only the higher bits of the
key.
We add a new member to struct btree_iter, iter->snapshot: when
BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always
equal iter->snapshot, which will be 0 for btrees that don't use
snapshots, and always U32_MAX for btrees that will use snapshots
(until we enable snapshot creation).
This patch also introduces a new metadata version number, and compat
code for reading from/writing to older versions - this isn't a forced
upgrade (yet).
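Illustrative semantics of the two iteration modes (not the literal
implementations):

    /* BTREE_ITER_ALL_SNAPSHOTS: snapshot is the low order part of the key */
    struct bpos bpos_successor(struct bpos p)
    {
            if (!++p.snapshot &&
                !++p.offset &&
                !++p.inode)
                    BUG();
            return p;
    }

    /* otherwise: step over all snapshots at this position */
    struct bpos bkey_successor(struct bpos p)
    {
            p.snapshot = 0;
            if (!++p.offset &&
                !++p.inode)
                    BUG();
            return p;
    }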
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With snapshots, we're going to need to differentiate between comparisons
that should and shouldn't include the snapshot field. bpos_cmp is now
the comparison function that does include the snapshot field, used by
core btree code.
Upper level filesystem code generally does _not_ want to compare against
the snapshot field - that code wants keys to compare as equal even when
one of them is in an ancestor snapshot.
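Sketch of the distinction, assuming a straightforward field-by-field
compare:

    /* core btree code: snapshot participates in ordering */
    static inline int bpos_cmp(struct bpos l, struct bpos r)
    {
            return  cmp_int(l.inode,    r.inode) ?:
                    cmp_int(l.offset,   r.offset) ?:
                    cmp_int(l.snapshot, r.snapshot);
    }

    /* upper layers: keys in ancestor snapshots compare as equal */
    static inline int bkey_cmp(struct bpos l, struct bpos r)
    {
            return  cmp_int(l.inode,  r.inode) ?:
                    cmp_int(l.offset, r.offset);
    }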
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Bkey noops were introduced to deal with trimming inline data extents in
place in the btree: if the u64s field of a bkey was 0, that u64 was a
noop and we'd start looking for the next bkey immediately after it.
But extent handling has been lifted above the btree - we no longer
modify existing extents in place in the btree, and the compatibility code
for old style extent btree nodes is gone, so we can completely drop this
code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The way btree iterators work internally has been changing, particularly
with the iter->real_pos changes, and bch2_btree_iter_next() is no longer
hyper optimized - it's just advance followed by peek, so it's more
efficient to just call advance where we're not using the return value of
bch2_btree_iter_next().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This code used to be used for running some assertions on alloc info at
runtime, but it long predates fsck and hasn't been good for much in
ages - we can delete it now.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We keep running into occasional bugs with btree transaction iterators
overflowing - this will make those bugs more visible.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If a ptr gen doesn't match the bucket gen, the bucket likely doesn't
contain the data we want - but it's still possible the data we want
might have been overwritten, and for btree node pointers we can verify
whether or not the node is the one we wanted with the node's sequence
number, so it's better to keep the pointer and try reading from it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_check_fix_ptrs() can update/reallocate k
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is useful for the filesystem dump debugging tool - when we're
hitting bugs we want to skip as much of the recovery process as
possible, and the dump tool only needs to know where metadata lives.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is to generate strings for them, so that we can print them out.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
These are only complained about when building in userspace, for some
reason.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We dropped support for !BTREE_NODE_NEW_EXTENT_OVERWRITE but it turned
out there were people who still had filesystems with btree nodes in that
format in the wild. This adds a new compat feature that indicates we've
scanned for and rewritten nodes in the old format, and does that scan at
mount time if the option isn't set.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
More repair code, now that we can repair extents during initial gc.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This eliminates the need to scan every bucket to regenerate dev_usage at
mount time.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Originally, bcachefs - going back to bcache - stored, for each bucket, a
16 bit counter corresponding to how long it had been since the bucket
was read from. But, this required periodically rescaling counters on
every bucket to avoid wraparound. That wasn't an issue in bcache, where
we'd periodically rewrite the per bucket metadata all at once, but in
bcachefs we're trying to avoid having to walk every single bucket.
This patch switches to persisting 64 bit io clocks, corresponding to the
64 bit bucket timestamps introduced in the previous patch with
KEY_TYPE_alloc_v2.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Now that we can repair metadata during GC, we can handle bad pointers
that would otherwise trigger errors when being marked, when all they
really need is to be dropped.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When we walk the btrees during recovery, part of that is checking that
btree topology is correct: for every interior btree node, its child
nodes should exactly span the range the parent node covers.
Previously, we had checks for this, but not repair code. Now that we
have the ability to do btree updates during initial GC, this patch adds
that repair code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Some errors may need to be fixed in order for GC to successfully run -
walk and mark all metadata. But we can't start the allocators and do
normal btree updates until after GC has completed, and allocation
information is known to be consistent, so we need a different method of
doing btree updates.
Fortunately, we already have code for walking the btree while overlaying
keys from the journal to be replayed. This patch adds an update path
that adds keys to the list of keys to be replayed by journal replay, and
also fixes up iterators.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Still a lot of work to be done here: we can't yet repair btree topology
issues, but this patch refactors things so that we have better access to
what we need in the topology checks. Next up will be figuring out a way
to do btree updates during gc, before journal replay is done.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This was useful before we had transactional updates to interior btree
nodes - but now, it's just extra unneeded complexity.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This makes bch2_stripes_write() work more like bch2_alloc_write().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The primary stripes radix tree can be sparse, which was causing an
assertion to pop because the one used for GC isn't. Fix this by changing
the algorithm to copy between the two radix trees.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes a bug where mark and sweep gc incorrectly was clearing out
the stripes heap and causing assertions to fire later - simpler to just
create the stripes heap after gc has finished.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Alloc info isn't stored on a particular device; it makes no sense to
only be writing it out for rw members - this was causing fsck to not fix
alloc info errors, oops.
Also, make sure we write out alloc info in other repair paths.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With various newer key types - stripe keys, inline data extents - the
old approach of calculating the maximum size of the value is becoming
more and more error prone. Better to switch to bkey_on_stack, which can
dynamically allocate if necessary to handle any size bkey.
In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Previously, we were using BTREE_INSERT_RESERVE in a lot of places where
it no longer makes sense.
- we now have more open_buckets than we used to, and the reserves work
better, so we shouldn't need to use BTREE_INSERT_RESERVE just because
we're holding open_buckets pinned anymore.
- We have the btree key cache for updates to the alloc btree, meaning
we no longer need the btree reserve to ensure the allocator can make
forward progress.
This means that we should only need a reserve for btree updates to
ensure that copygc can make forward progress.
Since it's now just for copygc, we can also fold RESERVE_BTREE into
RESERVE_MOVINGGC (the allocator's freelist reserve).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Various filesystem usage counters are kept in percpu counters, with one
set per in flight journal buffer. Right now all the code that deals with
it assumes that there's only two buffers/sets of counters, but the
number of journal bufs is getting increased to 4 in the next patch - so
refactor that code to not assume a constant.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
It's not used much anymore; the module parameter interface is better.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Now that we've got transactional alloc info updates (and have for
awhile), we don't need to write it out on shutdown, and we don't need to
write it out on startup except when GC found errors - this is a big
improvement to mount/unmount performance.
This patch also fixes a few bugs where we weren't writing out alloc
info (on new filesystems, and new devices) and should have been.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Awhile back, gcing of stale pointers was split out from full
mark-and-sweep gc - but, the bit to actually drop those stale pointers
wasn't implemented. Whoops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Awhile back the mechanism for garbage collecting unused replicas entries
was significantly improved, but some cleanup was missed - this patch
does that now.
This is also prep work for a patch to account for erasure coded parity
blocks separately - we need to consolidate the logic for
checking/marking the various replicas entries from one bkey into a
single function.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Soon we'll be able to modify existing stripes - replacing empty blocks
with new blocks and new p/q blocks. This patch updates the trigger code
to handle pointers changing in an existing stripe; also, it
significantly improves how the stripes heap works, which means we can
get rid of the stripe creation/deletion lock.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Full mark and sweep gc doesn't (yet?) work with the new btree key cache
code, but it also blocks updates to interior btree nodes for the
duration and isn't really necessary in practice; we aren't currently
attempting to repair errors in allocation info at runtime.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We now update the alloc info (bucket sector counts) atomically with
journalling the update to the interior btree nodes, and we also set new
btree roots atomically with the journalled part of the btree update.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Not legal to block on a journal prereservation with btree locks held.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When initial btree gc was changed to overlay journal keys as it walks
the btree, it also stopped checking btree topology.
Previously, checking btree topology was a fairly complicated affair -
but it's much easier now that btree_ptr_v2 has min_key in the pointer.
This rewrites the old range_checks code and uses it in both runtime and
initial gc.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Previously, BTREE_ID_INODES was special - inodes were indexed by the
inode field, which meant the offset field of struct bpos wasn't used,
which led to special cases in e.g. the btree iterator code.
Now, inodes in the inodes btree are indexed by the offset field.
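I.e., for inode number inum (sketch):

    struct bpos inode_pos_old = POS(inum, 0);   /* indexed by the inode field */
    struct bpos inode_pos_new = POS(0, inum);   /* indexed by the offset field */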
Also: previously min_key was special for extents btrees - min_key for
extents would equal max_key for the previous node. Now, min_key =
bkey_successor() of the previous node, same as non extent btrees.
This means we can completely get rid of
btree_type_successor/predecessor.
Also make some improvements to the metadata IO validate/compat code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Ever since the btree code was first written, handling of overwriting
existing extents - including partially overwriting and splitting existing
extents - was handled as part of the core btree insert path. The modern
transaction and iterator infrastructure didn't exist then, so that was
the only way for it to be done.
This patch moves that outside of the core btree code to a pass that runs
at transaction commit time.
This is a significant simplification to the btree code and overall
reduction in code size, but more importantly it gets us much closer to
the core btree code being completely independent of extents and is
important prep work for snapshots.
This introduces a new feature bit; the old and new extent update models
are incompatible when the filesystem needs journal replay.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The trigger flags really belong with individual btree_insert_entries,
not the transaction commit flags - this splits out those flags and
unifies them with the BCH_BUCKET_MARK flags. Todo - split out
btree_trigger.c from buckets.c
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
For upcoming inline data extents, we're going to need to be able to
shorten the value of existing bkeys in the btree - and to make that work
we're going to need to pad out the space the value previously
took up with something.
This patch changes the various code that iterates over bkeys to handle
k->u64s == 0 as meaning "skip the next 8 bytes".
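Concretely, walking packed keys in a bset now looks roughly like this
sketch - a zero u64s field means "skip one u64":

    struct bkey_packed *k = start;

    while (k != end) {
            if (!k->u64s) {
                    k = (void *) ((u64 *) k + 1);
                    continue;
            }

            /* process k */

            k = bkey_next(k);
    }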
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We have to free the old (in memory) btree node _before_ unlocking the
new nodes - else, some other thread with a read lock on the old node
could see stale data after another thread has already updated the new
node.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Running the filesystem under valgrind exposed a path where the max_stale
variable in bch2_gc_btree() might not be initialized before use in a
rare case when there are no btree nodes in a transaction.
Signed-off-by: Justin Husted <sigstop@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Importantly, we don't want to use bch2_fs_inconsistent_on() for errors
that fsck can repair, because that will just put us in RO mode and
prevent fsck from actually fixing stuff. Probably want to get rid of it
in the future.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Major simplification - gets rid of the need for marking buckets as
dirty; instead, we write buckets if the in memory mark is different from
what's in the btree.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is prep work for the btree key cache: btree iterators will point to
either struct btree, or a new struct bkey_cached.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This lets us get rid of a lot of extra switch statements - in a lot of
places we dispatch on the btree node type, and then the key type, so
this is a nice cleanup across a lot of code.
Also improve the on disk format versioning stuff.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Hit an assertion, probably spurious, indicating an iterator was unlocked
when it shouldn't have been (spurious because it wasn't locked at all
when the caller called btree_insert_at()).
Add a flag, BTREE_ITER_NOUNLOCK, and tighten up the assertions
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This means we can now use gc to verify the allocation information -
important for testing persistent alloc info.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_alloc_sectors_start() was a nightmare to work with - it's got some
tricky stuff to do, since it wants to use the buckets the writepoint
already has, unless they're not in the target it wants to write to,
unless it can't allocate from any other devices in which case it will
use those buckets if it has to - et cetera.
This restructures the code to start with a new empty list of open
buckets we're going to use for the new allocation, pulling buckets from
the write point's list as we decide that we really are going to use
them - making the code somewhat more functional and drastically easier
to understand.
Also fixes a bug where we could end up waiting on c->freelist_wait
(because allocating from one device failed) but return success from
bch2_bucket_alloc(), because allocating from a different device
succeeded.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This lifts the restriction that 0 size extents must not overlap with
other extents, which means we can now sort extents and non extents the
same way, and will let us simplify a bunch of other stuff as well.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.
Website: https://bcachefs.org
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>