linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2025-08-18 17:17:55 +00:00

Author	SHA1	Message	Date
Kent Overstreet	2232fa397c	bcachefs: Only allocate buckets_nouse when requested It's only needed by the migrate tool - this patch adds an option to enable allocating it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:24 -04:00
Kent Overstreet	2e63e18066	bcachefs: Stash a copy of key being overwritten in btree_insert_entry We currently need to call bch2_btree_path_peek_slot() multiple times in the transaction commit path - and some of those need to be updated to also check the keys from journal replay, too. Let's consolidate this and stash the key being overwritten in btree_insert_entry. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:23 -04:00
Kent Overstreet	80bf2f3454	bcachefs: Fix freeing in bch2_dev_buckets_resize() We were double-freeing old_buckets and not freeing old_buckets_gens: also, the code was supposed to free buckets, not old_buckets; old_buckets is only needed because we have to use rcu_assign_pointer() instead of swap(), and won't be set if we hit the error path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:23 -04:00
Kent Overstreet	0678cbe2cb	bcachefs: Ignore cached data when calculating fragmentation Previously, bucket fragmentation was considered to be bucket size - total amount of live data, both dirty and cached. This meant that if a bucket was full but only a small amount of data in it was dirty - the rest cached, we'd get stuck: copygc wouldn't move the dirty data out of the bucket and the allocator wouldn't be able to invalidate and drop the cached data. This changes fragmentation to exclude cached data, so that copygc will evacuate these buckets and copygc/the allocator will always be able to make forward progress. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:22 -04:00
Kent Overstreet	3763cb9566	bcachefs: Don't use in-memory bucket array for alloc updates More prep work for getting rid of the in-memory bucket array: now that we have BTREE_ITER_WITH_JOURNAL, the allocator code can do ntree lookups before journal replay is finished, and there's no longer any need for it to get allocation information from the in-memory bucket array. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:22 -04:00
Kent Overstreet	21aec962df	bcachefs: New data structure for buckets waiting on journal commit Implement a hash table, using cuckoo hashing, for empty buckets that are waiting on a journal commit before they can be reused. This replaces the journal_seq field of bucket_mark, and is part of eventually getting rid of the in memory bucket array. We may need to make bch2_bucket_needs_journal_commit() lockless, pending profiling and testing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:22 -04:00
Kent Overstreet	f443fa66c9	bcachefs: Also print out in-memory gen on stale dirty pointer We're trying to track down a bug that shows itself as newly-created extents having stale dirty pointers - possibly due to the in memory gen and the btree gen being inconsistent. This patch changes the error message to also print out the in memory bucket gen when this happens. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:22 -04:00
Kent Overstreet	f0f41a6d74	bcachefs: Add error messages for memory allocation failures This adds some missing diagnostics from rare but annoying to debug runtime allocation failure paths. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:20 -04:00
Kent Overstreet	e3ad29379e	bcachefs: Optimize bucket reuse If the btree updates pointing to a bucket were never flushed by the journal before the bucket became empty again, we can reuse the bucket without a journal flush. This tweaks the tracking of journal sequence numbers in alloc keys to implement this optimization: now, we only update the journal sequence number in alloc keys on transitions to and from empty. When a bucket becomes empty, we check if we can tell the journal not to flush entries starting from when the bucket was used. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:20 -04:00
Kent Overstreet	13f914ecb9	bcachefs: Kill bch2_ec_mem_alloc() bch2_ec_mem_alloc() was only used by GC, and there's no real need to preallocate the stripes radix tree since we can cope fine with memory allocation failure when we use the radix tree. This deletes a fair bit of code, and it's also needed for the upcoming patch because bch2_btree_iter_peek_prev() won't be working before journal replay completes (and using it was incorrect previously, as well). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:20 -04:00
Kent Overstreet	36f035e908	bcachefs: Fix allocator + journal interaction The allocator needs to wait until the last update touching a bucket has been commited before writing to it again. However, the code was checking against the last dirty journal sequence number, not the last flushed journal sequence number. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:20 -04:00
Kent Overstreet	a786087744	bcachefs: New in-memory array for bucket gens The main in-memory bucket array is going away, but we'll still need to keep bucket generations in memory, at least for now - ptr_stale() needs to be an efficient operation. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:20 -04:00
Kent Overstreet	47ac34ec98	bcachefs: Separate out gc_bucket() Since the main in memory bucket array is going away, we don't want to be calling bucket() or __bucket() when what we want is the GC in-memory bucket. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:20 -04:00
Kent Overstreet	e75b2d4c1c	bcachefs: bch2_journal_key_insert() no longer transfers ownership bch2_journal_key_insert() used to assume that the key passed to it was allocated with kmalloc(), and on success took ownership. This patch deletes that behaviour, making it more similar to bch2_trans_update()/bch2_trans_commit(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:19 -04:00
Kent Overstreet	77170d0dd7	bcachefs: bch2_bucket_alloc_new_fs() no longer depends on bucket marks Now that bch2_bucket_alloc_new_fs() isn't looking at bucket marks to decide what buckets are eligible to allocate, we can clean up the filesystem initialization and device add paths. Previously, we had to use ancient code to mark superblock/journal buckets in the in memory bucket marks as we allocated them, and then zero that out and re-do that marking using the newer transational bucket mark paths. Now, we can simply delete the in-memory bucket marking. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:19 -04:00
Kent Overstreet	8244f3209b	bcachefs: Option improvements This adds flags for options that must be a power of two (block size and btree node size), and options that are stored in the superblock as a power of two (encoded extent max). Also: options are now stored in memory in the same units they're displayed in (bytes): we now convert when getting and setting from the superblock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:19 -04:00
Kent Overstreet	20572300dc	bcachefs: Improve alloc_mem_to_key() This moves some common code into alloc_mem_to_key(), which translates from the in-memory format for a bucket to the btree key format. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:18 -04:00
Kent Overstreet	fb0e480872	bcachefs: bch2_alloc_write() This adds a new helper that much like the one we have for inode updates, that allocates the packed alloc key, packs it and calls bch2_trans_update. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:18 -04:00
Kent Overstreet	990d42d187	bcachefs: Split out struct gc_stripe from struct stripe We have two radix trees of stripes - one that mirrors some information from the stripes btree in normal operation, and another that GC uses to recalculate block usage counts. The normal one is now only used for finding partially empty stripes in order to reuse them - the normal stripes radix tree and the GC stripes radix tree are used significantly differently, so this patch splits them into separate types. In an upcoming patch we'll be replacing c->stripes with a btree that indexes stripes by the order we want to reuse them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:18 -04:00
Kent Overstreet	94a3e1a6c1	bcachefs: bch2_trans_update() is now __must_check With snapshots, bch2_trans_update() has to check if we need a whitout, which can cause a transaction restart, so this is important now. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:18 -04:00
Kent Overstreet	b547d005d5	bcachefs: Erasure coding fixes When we added the stripe and stripe_redundancy fields to alloc keys, we neglected to add them to the functions that convert back and forth with the in-memory types. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:18 -04:00
Kent Overstreet	181fe42a75	bcachefs: Handle replica marking fsck errors locally This simplifies the code quite a bit and eliminates an inconsistency - a given bkey doesn't necessarily translate to a single replicas entry for disk space accounting. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:18 -04:00
Kent Overstreet	58e1ea4bcb	bcachefs: Push c->mark_lock usage down to where it is needed This changes the bch2_mark_key() and related paths to take mark lock where it is needed, instead of taking it in the upper transaction commit path - by pushing down locking we'll be able to handle fsck errors locally instead of requiring a separate check in the btree_gc code for replicas being marked. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:18 -04:00
Kent Overstreet	502cfb3591	bcachefs: Kill bch2_replicas_delta_list_marked() This changes bch2_trans_fs_usage_apply() to handle failure (replicas entry missing) by reverting the changes it made - meaning we can make the main transaction commit path a bit slimmer, and perhaps also simplify some locking in upcoming patches. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:18 -04:00
Kent Overstreet	f0c3f88b35	bcachefs: Run insert triggers before overwrite triggers Currently, btree triggers are run in natural key order, which presents a problem for fallocate in INSERT_RANGE mode: since we're moving existing extents to higher offsets, the trigger for deleting the old extent runs before the trigger that adds the new extent, potentially leading to indirect extents being deleted that shouldn't be when the delete causes the refcount to hit 0. This changes the order we run triggers so that for a givin btree, we run all insert triggers before overwrite triggers, nicely sidestepping this issue. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:17 -04:00
Kent Overstreet	c714614bd0	bcachefs: Disk space accounting fix on brand-new fs The filesystem initialization path first marks superblock and journal buckets non transactionally, since the btree isn't functional yet. That path was updating the per-journal-buf percpu counters via bch2_dev_usage_update(), and updating the wrong set of counters so those updates didn't get written out until journal entry 4. The relevant code is going to get significantly rewritten in the future as we transition away from the in memory bucket array, so this just hacks around it for now. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:17 -04:00
Kent Overstreet	076c783cd3	bcachefs: Fix upgrade path for reflink_p fix Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:16 -04:00
Kent Overstreet	3e52c22255	bcachefs: Add journal_seq to inode & alloc keys Add fields to inode & alloc keys that record the journal sequence number when they were most recently modified. For alloc keys, this is needed to know what journal sequence number we have to flush before the bucket can be reused. Currently this is tracked in memory, but we'll be getting rid of the in memory bucket array. For inodes, this is needed for fsync when the inode has been evicted from the vfs cache. Currently we use a bloom filter per outstanding journal buf - but that mechanism has been broken since we added the ability to not issue a flush/fua for every journal write. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:16 -04:00
Kent Overstreet	2debb1b875	bcachefs: BTREE_TRIGGER_INSERT now only means insert This allows triggers to distinguish between a key entering the btree - i.e. being called from the trans commit path - vs. being called on a key that already exists, i.e. by GC. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:15 -04:00
Kent Overstreet	904823de49	bcachefs: Convert bch2_mark_key() to take a btree_trans * This helps to unify the interface between bch2_mark_key() and bch2_trans_mark_key() - and it also gives access to the journal reservation and journal seq in the mark_key path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:15 -04:00
Kent Overstreet	961b2d6282	bcachefs: Assorted ec fixes - The backpointer that ec_stripe_update_ptrs() uses now needs to include the snapshot ID, which means we have to change where we add the backpointer to after getting the snapshot ID for the new extents - ec_stripe_update_ptrs() needs to be calling bch2_trans_begin() - improve error message in bch2_mark_stripe() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:15 -04:00
Kent Overstreet	37f72492f4	bcachefs: Fix bch2_mark_update() When the old or new key doesn't exist, we should still pass in a deleted key with the correct pos. This fixes a bug in the ec code, when bch2_mark_stripe() was looking up the wrong in-memory stripe. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:15 -04:00
Kent Overstreet	f3b1e19379	bcachefs: Improve error messages in trans_mark_reflink_p() We should always print out the key we were marking. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:15 -04:00
Kent Overstreet	396a887d8f	bcachefs: Fix fsck path for refink pointers The way __bch2_mark_reflink_p returns errors was clashing with returning the number of sectors processed - we weren't returning FSCK_ERR_EXIT correctly. Fix this by only using the return code for errors, which actually ends up simplifying the overall logic. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:15 -04:00
Kent Overstreet	6d76aefea1	bcachefs: Fix for leaking of reflinked extents When a reflink pointer points to only part of an indirect extent, and then that indirect extent is fragmented (e.g. by copygc), if the reflink pointer only points to one of the fragments we leak a reference. Fix this by storing front/back pad values in reflink pointers - when inserting reflink pointesr, we initialize them to cover the full range of the indirect extents we reference. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:14 -04:00
Kent Overstreet	dfc276df91	bcachefs: Improve reflink repair code When a reflink pointer points to an indirect extent that doesn't exist, we need to replace it with a KEY_TYPE_error key. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:14 -04:00
Kent Overstreet	14b393ee76	bcachefs: Subvolumes, snapshots This patch adds subvolume.c - support for the subvolumes and snapshots btrees and related data types and on disk data structures. The next patches will start hooking up this new code to existing code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:12 -04:00
Kent Overstreet	67e0dd8f0d	bcachefs: btree_path This splits btree_iter into two components: btree_iter is now the externally visible componont, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:11 -04:00
Kent Overstreet	6fba6b83b4	bcachefs: Prefer using btree_insert_entry to btree_iter This moves some data dependencies forward, to improve pipelining. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:11 -04:00
Brett Holman	fd0bd123d5	bcachefs: Fix 32 bit build failures This fix replaces multiple 64 bit divisions with do_div() equivalents. Signed-off-by: Brett Holman <bholman.devel@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:10 -04:00
Kent Overstreet	62df3d443c	bcachefs: Disk space accounting fix DIV_ROUND_UP() wasn't doing what we wanted when passing it negative numbers - fix it by just not passing it negative numbers anymore. Also, no need to do the scaling by compression ratio for incompressible data. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:10 -04:00
Kent Overstreet	297d89343d	bcachefs: Extensive triggers cleanups - We no longer mark subsets of extents, they're marked like regular keys now - which means we can drop the offset & sectors arguments to trigger functions - Drop other arguments that are no longer needed anymore in various places - fs_usage - Drop the logic for handling extents in bch2_mark_update() that isn't needed anymore, to match bch2_trans_mark_update() - Better logic for hanlding the BTREE_ITER_CACHED_NOFILL case, where we don't have an old key to mark Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:07 -04:00
Kent Overstreet	8c3f6da9fc	bcachefs: Improve iter->should_be_locked Adding iter->should_be_locked introduced a regression where it ended up not being set on the iterator passed to bch2_btree_update_start(), which is definitely not what we want. This patch requires it to be set when calling bch2_trans_update(), and adds various fixups to make that happen. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Kent Overstreet	8ee529e9c1	bcachefs: Make sure bch2_trans_mark_update uses correct iter flags Now that bch2_btree_iter_peek_with_updates() has been removed in favor of BTREE_ITER_WITH_UPDATES, we need to make sure it's not used where we don't want it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Kent Overstreet	290448ed2e	bcachefs: Don't underflow c->sectors_available This rarely used error path should've been checking for underflow - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Kent Overstreet	953ee28a3e	bcachefs: Kill bch2_btree_iter_peek_cached() It's now been rolled into bch2_btree_iter_peek_slot() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Kent Overstreet	c1949baa51	bcachefs: Simplify reflink trigger Now that we only mark entire extents, we can ditch the "reflink_p_frag_references" code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Kent Overstreet	8e6bbc4181	bcachefs: Move extent_handle_overwrites() to bch2_trans_update() This lifts handling of overlapping extents out of __bch2_trans_commit() and moves it to where we first do the update - which means that BTREE_ITER_WITH_UPDATES can now work correctly in extents mode. Also, this patch reworks how extent triggers work: previously, on partial extent overwrite we would pass this information to the trigger, telling it what part of the extent was being overwritten. But, this approach has had too many subtle corner cases - now, we only mark whole extents, meaning on partial extent overwrite we unmark the old extent and mark the new extent. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Kent Overstreet	224ec3e677	bcachefs: Don't mark superblocks past end of usable space bcachefs-tools recently started putting a backup superblock at the end of the device. This causes a problem if the bucket size doesn't divide the device size - but we can fix it by just skipping marking that part. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:05 -04:00
Kent Overstreet	bc3f8b25f3	bcachefs: Check for errors from bch2_trans_update() Upcoming refactoring is going to change bch2_trans_update() to start returning transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:05 -04:00
Kent Overstreet	890b74f03d	bcachefs: Fsck for reflink refcounts Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:05 -04:00
Kent Overstreet	9eba7c8d15	bcachefs: Reflink refcount fix __bch2_trans_mark_reflink_p wasn't always correctly returning the number of sectors processed - the new logic is a bit more straightforward overall too. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:04 -04:00
Kent Overstreet	7e94eeffe0	bcachefs: Inline fastpath of bch2_disk_reservation_add() The fastpath now doesn't even disable preemption - instead we use a (non locked) cmpxchg. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:04 -04:00
Dan Robertson	ed34341189	bcachefs: statfs resports incorrect avail blocks The current implementation of bch_statfs does not scale the number of available blocks provided in f_bavail by the reserve factor. This causes an allocation of a file of this size to fail. Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Kent Overstreet	bbfcb4519d	bcachefs: Fix bch2_extent_can_insert() call It was being skipped when hole punching, leading to problems when splitting compressed extents. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:03 -04:00
Brett Holman	2cd0563461	bcachefs: made changes to support clang, fixed a couple bugs fs/bcachefs/bset.c edited prefetch macro to add clang support fs/bcachefs/btree_iter.c bugfix: initialize iter->real_pos in bch2_btree_iter_init for later use fs/bcachefs/io.c bugfix: eliminated undefined behavior (negative bitshift) fs/bcachefs/buckets.c bugfix: invert sign to handle 64bit abs() Signed-off-by: Brett Holman <bpholman5@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Dan Robertson	d125615a4e	bcachefs: properly initialize used values - Ensure the second key value in bch_hash_info is initialized to zero if the info type is of type BCH_STR_HASH_SIPHASH. - Initialize the possibly returned value in bch2_inode_create. Assuming bch2_btree_iter_peek returns bkey_s_c_null, the uninitialized value of ret could be returned to the user as an error pointer. - Fix compiler warning in initialization of bkey_s_c_stripe fs/bcachefs/buckets.c:1646:35: warning: suggest braces around initialization of subobject [-Wmissing-braces] struct bkey_s_c_stripe new_s = { NULL }; ^~~~ Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Kent Overstreet	933532b8b2	bcachefs: Fix reflink trigger The trigger for reflink pointers wasn't always incrementing/decrementing the refcounts correctly - this patch fixes that logic. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Kent Overstreet	3a402c8dab	bcachefs: Fix some refcounting bugs We really need debug mode assertions that ca->ref and ca->io_ref are used correctly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Kent Overstreet	d99af4f194	bcachefs: Call bch2_inconsistent_error() on missing stripe/indirect extent Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	eb365fbc33	bcachefs: Don't BUG() in update_replicas Apparently, we have a bug where in mark and sweep while accounting for a key, a replicas entry isn't found. Change the code to print out the key we couldn't mark and halt instead of a BUG_ON(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	04903131db	bcachefs: Handle errors in bch2_trans_mark_update() It's not actually the case that iterators are always checked here - __bch2_trans_commit() checks for that after running triggers. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	dac1525d9c	bcachefs: gc shouldn't care about owned_by_allocator The owned_by_allocator field is a purely in memory thing, even if/when we bring back GC at runtime there's no need for it to be recalculating this field. This is prep work for pulling it out of struct bucket, and eventually getting rid of the bucket array. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	d62ab355d7	bcachefs: Fix bch2_trans_mark_dev_sb() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	319c130507	bcachefs: Fix heap overrun in bch2_fs_usage_read() XXX squash oops Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	ecc1420944	bcachefs: Fix an uninitialized variable Fortunately it was just used in an error message Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	35d5aff263	bcachefs: Kill bch2_fs_usage_scratch_get() This is an important cleanup, eliminating an unnecessary copy in the transaction commit path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	9c2e624290	bcachefs: Fix livelock calling bch2_mark_bkey_replicas() The bug was that we were trying to find a replicas entry that wasn't sorted - but, we can also simplify the code by not using bch2_mark_bkey_replicas and instead ensuring the list of replicas entries exists directly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	b753d4b338	bcachefs: Fix this_cpu_ptr() usage Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	65bcd6579d	buckets.c fixups XXX squash Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	e9895f0ab9	bcachefs: Assert that iterators aren't being double freed Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:56 -04:00
Kent Overstreet	b3b66e3044	bcachefs: Have fsck check for stripe pointers matching stripe Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:55 -04:00
Kent Overstreet	41f8b09edc	bcachefs: Rename BTREE_ID enums for consistency with other enums Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:55 -04:00
Kent Overstreet	9620c3ec2f	bcachefs: Add a mempool for the replicas delta list Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:54 -04:00
Kent Overstreet	8042b5b715	bcachefs: Extents may now cross btree node boundaries When snapshots arrive, we won't necessarily be able to arbitrarily split existis - when we need to split an existing extent, we'll have to check if the extent was overwritten in child snapshots and if so emit a whiteout for the split in the child snapshot. Because extents couldn't span btree nodes previously, journal replay would sometimes have to split existing extents. That's no good anymore, but fortunately since extent handling has already been lifted above most of the btree code there's no real need for that rule anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	180fb49dea	bcachefs: Journal updates to dev usage This eliminates the need to scan every bucket to regenerate dev_usage at mount time. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	7f4e1d5d0f	bcachefs: KEY_TYPE_alloc_v2 This introduces a new version of KEY_TYPE_alloc, which uses the new varint encoding introduced for inodes. This means we'll eventually be able to support much larger bucket sizes (for SMR devices), and the read/write time fields are expanded to 64 bits - which will be used in the next patch to get rid of the periodic rescaling of those fields. Also, for buckets that are members of erasure coded stripes, this adds persistent fields for the index of the stripe they're members of and the stripe redundancy. This is part of work to get rid of having to scan and read into memory the alloc and stripes btrees at mount time. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	bfcf840ddf	bcachefs: Mark superblocks transactionally More work towards getting rid of the in memory struct bucket: this path adds code for marking superblock and journal buckets via the btree, and uses it in the device add and journal resize paths. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	9afc6652d1	bcachefs: Kill bch2_invalidate_bucket() This patch is working towards eventually getting rid of the in memory struct bucket, and relying only on the btree representation. Since bch2_invalidate_bucket() was only used for incrementing gens, not invalidating cached data, no other counters were being changed as a side effect - meaning it's safe for the allocator code to increment the bucket gen directly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	72eab8da47	bcachefs: Refactor dev usage This is to make it more amenable for serialization. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	2ef220cba2	bcachefs: Fix double counting of stripe block counts by GC Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:51 -04:00
Kent Overstreet	cd9f3dfe58	bcachefs: Fix integer overflow in bch2_disk_reservation_get() The sectors argument shouldn't have been a u32 - it can be up to U32_MAX (i.e. fallocate creating persistent reservations), and if replication is enabled we'll overflow when we calculate the real number of sectors to reserve. Oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:51 -04:00
Kent Overstreet	2a3731e34d	bcachefs: Erasure coding fixes & refactoring - Originally bch_extent_stripe_ptr didn't contain the block index, instead we'd have to search through the stripe pointers to figure out which pointer matched. When the block field was added to bch_extent_stripe_ptr, not all of the code was updated to use it. This patch fixes that, and we also now verify that field where it makes sense. - The ec_stripe_buf_init/exit() functions have been improved, and are now used by the bch2_ec_read_extent() (recovery read) path. - get_stripe_key() is now used by bch2_ec_read_extent(). - We now have a getter and setter for checksums within a stripe, like we had previously for block sector counts, and ec_generate_checksums and ec_validate_checksums are now quite a bit smaller and cleaner. ec.c still needs a lot of work, but this patch is slowly moving things in the right direction. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:51 -04:00
Kent Overstreet	3187aa8d57	bcachefs: Don't use BTREE_INSERT_USE_RESERVE so much Previously, we were using BTREE_INSERT_RESERVE in a lot of places where it no longer makes sense. - we now have more open_buckets than we used to, and the reserves work better, so we shouldn't need to use BTREE_INSERT_RESERVE just because we're holding open_buckets pinned anymore. - We have the btree key cache for updates to the alloc btree, meaning we no longer need the btree reserve to ensure the allocator can make forward progress. This means that we should only need a reserve for btree updates to ensure that copygc can make forward progress. Since it's now just for copygc, we can also fold RESERVE_BTREE into RESERVE_MOVINGGC (the allocator's freelist reserve). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:50 -04:00
Kent Overstreet	719fe7fb55	bcachefs: Update transactional triggers interface to pass old & new keys This is needed to fix a bug where we're overflowing iterators within a btree transaction, because we're updating the stripes btree (to update block counts) and the stripes btree trigger is unnecessarily updating the alloc btree - it doesn't need to update the alloc btree when the pointers within a stripe aren't changing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00
Kent Overstreet	f299d57350	bcachefs: Refactor filesystem usage accounting Various filesystem usage counters are kept in percpu counters, with one set per in flight journal buffer. Right now all the code that deals with it assumes that there's only two buffers/sets of counters, but the number of journal bufs is getting increased to 4 in the next patch - so refactor that code to not assume a constant. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00
Kent Overstreet	3eb26d0157	bcachefs: bch2_trans_get_iter() no longer returns errors Since we now always preallocate the maximum number of iterators when we initialize a btree transaction, getting an iterator never fails - we can delete a fair amount of error path code. This patch also simplifies the iterator allocation code a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:48 -04:00
Kent Overstreet	101d471367	bcachefs: Fix a 64 bit divide this fixes builds on 32 bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:47 -04:00
Kent Overstreet	801a3de642	bcachefs: Indirect inline data extents When inline data extents were added, reflink was forgotten about - we need indirect inline data extents for reflink + inline data to work correctly. This patch adds them, and a new feature bit that's flipped when they're used. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:45 -04:00
Kent Overstreet	5b088c1dd0	bcachefs: Fix bch2_mark_stripe() There's no reason not to always recalculate these fields Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:45 -04:00
Kent Overstreet	b88e971e45	bcachefs: Don't drop replicas when copygcing ec data Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:45 -04:00
Kent Overstreet	af4d05c46b	bcachefs: Account for stripe parity sectors separately Instead of trying to charge EC parity to the data within the stripe (which is subject to rounding errors), let's charge it to the stripe itself. It should also make -ENOSPC issues easier to deal with if we charge for parity blocks up front, and means we can also make more fine grained accounting available to the user. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:45 -04:00
Kent Overstreet	39283c712e	bcachefs: Fix for bad stripe pointers The allocator usually doesn't increment bucket gens right away on buckets that it's about to hand out (for reasons that need to be documented), instead deferring that to whatever extent update first references that bucket. But stripe pointers reference buckets without changing bucket sector counts, meaning we could end up with a pointer in a stripe with a gen newer than the bucket it points to. Fix this by adding a transactional trigger for KEY_TYPE_stripe that just writes out the keys in the alloc btree for the buckets it points to. Also - consolidate the code that checks pointer validity. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:44 -04:00
Kent Overstreet	f3721e12d0	bcachefs: Perf improvements for bch_alloc_read() On large filesystems reading in the alloc info takes a significant amount of time. But we don't need to be calling into the fully general bch2_mark_key() path, just open code what we need in bch2_alloc_read_fn(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:44 -04:00
Kent Overstreet	9ee38f62da	bcachefs: Fix off-by-one error in ptr gen check Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:44 -04:00
Kent Overstreet	3d080aa52f	bcachefs: Delete unused arguments Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:43 -04:00
Kent Overstreet	e6d1161530	bcachefs: Make copygc thread global Per device copygc threads don't move data to different devices and they make fragmentation works - they don't make much sense anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:42 -04:00
Kent Overstreet	89fd25be70	bcachefs: Use x-macros for data types Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:42 -04:00
Kent Overstreet	ba6dd1dd49	bcachefs: Improve stripe triggers/heap code Soon we'll be able to modify existing stripes - replacing empty blocks with new blocks and new p/q blocks. This patch updates the trigger code to handle pointers changing in an existing stripe; also, it significantly improves how the stripes heap works, which means we can get rid of the stripe creation/deletion lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:42 -04:00
Kent Overstreet	e63534a201	bcachefs: Rework triggers interface The trigger for stripe keys is shortly going to need both the old and the new key passed to the trigger - this patch does that rework. For now, this just changes the in memory triggers, and this doesn't change how extent triggers work. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:42 -04:00
Kent Overstreet	697e45b230	bcachefs: Kill BTREE_TRIGGER_NOOVERWRITES This is prep work for reworking the triggers machinery - we have triggers that need to know both the old and the new key. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:42 -04:00
Kent Overstreet	64f2a8803e	bcachefs: Fix bch2_extent_can_insert() not being called It's supposed to check whether we're splitting a compressed extent and if so get a bigger disk reservation - hence this fixes a "disk usage increased by x without a reservaiton" bug. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:42 -04:00
Kent Overstreet	649a9b68ac	bcachefs: Track sectors of erasure coded data Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:42 -04:00
Kent Overstreet	b9c3d13978	bcachefs: Fix a deadlock in the RO path Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:41 -04:00
Kent Overstreet	1d18678962	bcachefs: delete a slightly faulty assertion state lock isn't held at startup Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:41 -04:00
Kent Overstreet	5d20ba48f0	bcachefs: Use cached iterators for alloc btree Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:41 -04:00
Kent Overstreet	2ca88e5ad9	bcachefs: Btree key cache This introduces a new kind of btree iterator, cached iterators, which point to keys cached in a hash table. The cache also acts as a write cache - in the update path, we journal the update but defer updating the btree until the cached entry is flushed by journal reclaim. Cache coherency is for now up to the users to handle, which isn't ideal but should be good enough for now. These new iterators will be used for updating inodes and alloc info (the alloc and stripes btrees). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:41 -04:00
Kent Overstreet	1ada160618	bcachefs: Turn c->state_lock into an rwsem Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:41 -04:00
Kent Overstreet	255adc515a	bcachefs: Always increment bucket gen on bucket reuse Not doing so confuses copygc Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:40 -04:00
Kent Overstreet	9ef846a7a1	bcachefs: Improve assorted error messages This also consolidates the various checks in bch2_mark_pointer() and bch2_trans_mark_pointer(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:40 -04:00
Kent Overstreet	baeed3c3c0	bcachefs: Don't require alloc btree to be updated before buckets are used This is to break a circular dependency in the shutdown path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:40 -04:00
Kent Overstreet	00b8ccf707	bcachefs: Interior btree updates are now fully transactional We now update the alloc info (bucket sector counts) atomically with journalling the update to the interior btree nodes, and we also set new btree roots atomically with the journalled part of the btree update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:40 -04:00
Kent Overstreet	aafcf9bc12	bcachefs: Better error messages on bucket sector count overflows Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:40 -04:00
Kent Overstreet	19f24758ef	bcachefs: Don't use peek_filter() unnecessarily Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:36 -04:00
Kent Overstreet	e3e464ac6d	bcachefs: Move extent overwrite handling out of core btree code Ever since the btree code was first written, handling of overwriting existing extents - including partially overwriting and splittin existing extents - was handled as part of the core btree insert path. The modern transaction and iterator infrastructure didn't exist then, so that was the only way for it to be done. This patch moves that outside of the core btree code to a pass that runs at transaction commit time. This is a significant simplification to the btree code and overall reduction in code size, but more importantly it gets us much closer to the core btree code being completely independent of extents and is important prep work for snapshots. This introduces a new feature bit; the old and new extent update models are incompatible when the filesystem needs journal replay. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:36 -04:00
Kent Overstreet	2e70ce5634	bcachefs: More btree iter invariants Ensure that iter->pos always lies between the start and end of iter->k (the last key returned). Also, bch2_btree_iter_set_pos() now invalidates the key that peek() or next() returned. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:36 -04:00
Kent Overstreet	38f0664a5f	bcachefs: Fix error message on bucket sector count overflow Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:35 -04:00
Kent Overstreet	548b3d209f	bcachefs: btree_ptr_v2 Add a new btree ptr type which contains the sequence number (random 64 bit cookie, actually) for that btree node - this lets us verify that when we read in a btree node it really is the btree node we wanted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:35 -04:00
Kent Overstreet	24326cd12a	bcachefs: Sort & deduplicate updates in bch2_trans_update() Previously, when doing multiple update in the same transaction commit that overwrote each other, we relied on doing the updates in the same order as the bch2_trans_update() calls in order to get the correct result. But that wasn't correct for triggers; bch2_trans_mark_update() when marking overwrites would do the wrong thing because it hadn't seen the update that was being overwritten. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:33 -04:00
Kent Overstreet	2d594dfb53	bcachefs: Split out btree_trigger_flags The trigger flags really belong with individual btree_insert_entries, not the transaction commit flags - this splits out those flags and unifies them with the BCH_BUCKET_MARK flags. Todo - split out btree_trigger.c from buckets.c Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:33 -04:00
Kent Overstreet	54e86b5813	bcachefs: Make btree_insert_entry more private to update path This should be private to btree_update_leaf.c, and we might end up removing it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:33 -04:00
Kent Overstreet	ef496cd268	bcachefs: Don't BUG_ON() sector count overflow Return an error instead (still work in progress...) Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:31 -04:00
Kent Overstreet	b7ba66c845	bcachefs: Inline more of bch2_trans_commit hot path The main optimization here is that if we let bch2_replicas_delta_list_apply() fail, we can completely skip calling bch2_bkey_replicas_marked_locked(). And assorted other small optimizations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:30 -04:00
Kent Overstreet	ff929515cc	bcachefs: Trust btree alloc info at runtime This lets us avoid a cache miss in the write path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:30 -04:00
Kent Overstreet	77d63522f0	bcachefs: Make replicas_delta_list smaller Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:30 -04:00
Kent Overstreet	43de7376f3	bcachefs: Fix erasure coding disk space accounting Disk space accounting for erasure coding + compression was completely broken - we need to calculate the parity sectors delta the same way we calculate disk_sectors, by calculating the old and new usage and subtracting to get the difference. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:29 -04:00
Kent Overstreet	37954a275f	bcachefs: Limit pointers to being in only one stripe This make the disk accounting code saner, and it's not clear why we'd ever want the same data to be in multiple stripes simultaneously. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:29 -04:00
Kent Overstreet	332c6e5370	bcachefs: Fix bch2_mark_extent() If an extent only contained cached or erasure coded pointers, there won't be any devices in the normal dirty replicas list or an entry to update. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:29 -04:00
Kent Overstreet	64bc001153	bcachefs: Rework btree iterator lifetimes The btree_trans struct needs to memoize/cache btree iterators, so that on transaction restart we don't have to completely redo btree lookups, and so that we can do them all at once in the correct order when the transaction had to restart to avoid a deadlock. This switches the btree iterator lookups to work based on iterator position, instead of trying to match them up based on the stack trace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:28 -04:00
Kent Overstreet	a7199432c3	bcachefs: Kill deferred btree updates Will be replaced by cached btree iterators Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:28 -04:00
Kent Overstreet	6309589468	bcachefs: Improved bch2_fcollapse() Move extents instead of copying them - this way, we can iterate over only live extents, not the entire keyspace. Also, this means we can mostly skip running triggers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:26 -04:00
Kent Overstreet	36e9d69854	bcachefs: Do updates in order they were queued up in Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:26 -04:00
Kent Overstreet	4430ea7046	bcachefs: Kill BTREE_INSERT_NOMARK_INSERT Was dead code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:26 -04:00
Kent Overstreet	78854fca28	bcachefs: Fix BTREE_INSERT_NOMARK_OVERWRITES bch2_mark_update() was correct, but bch2_trans_mark_update() wasn't respecting BTREE_INSERT_NOMARK_OVERWRITES - key marking/triggers really need to be cleaned up. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:26 -04:00
Kent Overstreet	06ab329c15	bcachefs: Improve pointer marking checks and error messages Importantly, we don't want to use bch2_fs_inconsistent_on() for errors that fsck can repair, becuase that will just put us in RO mode and prevent fsck from actually fixing stuff. Probably want to get rid of it in the future. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:26 -04:00
Kent Overstreet	9940a791ea	bcachefs: Fix error message on bucket overflow Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:26 -04:00
Kent Overstreet	df5d4dae0b	bcachefs: Fixes for replicas tracking The continue statement in bch2_trans_mark_extent() was wrong - by bailing out early, we'd be constructing the wrong replicas list to update. Also, the assertion in update_replicas() was wrong - due to rounding with compressed extents, it is possible for sectors to be 0 sometimes. Also, change extent_to_replicas() in replicas.c to match the replicas list we construct in buckets.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:26 -04:00
Kent Overstreet	6671a7089f	bcachefs: Refactor bch2_alloc_write() Major simplification - gets rid of the need for marking buckets as dirty, instead we write buckets if the in memory mark is different from what's in the btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:26 -04:00
Kent Overstreet	67163cded3	bcachefs: Trust in memory bucket mark This fixes a bug in the journal replay -> extent_replay_key -> split_compressed path, when we do an update that changes alloc info but the alloc info in the btree isn't up to date yet. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:26 -04:00
Kent Overstreet	41fcd62150	bcachefs: Fix faulty assertion Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:25 -04:00
Kent Overstreet	76426098e4	bcachefs: Reflink Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:25 -04:00
Kent Overstreet	3c7f3b7aeb	bcachefs: Refactor bch2_extent_trim_atomic() for reflink Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:25 -04:00
Kent Overstreet	2cbe5cfe27	bcachefs: Rework calling convention for marking overwrites Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:25 -04:00
Kent Overstreet	e3d3a9d91a	bcachefs: trans_get_key() now works correctly for extents More prep work for reflink: for extents, we're not looking for an exact mach on pos, rather that the pos is within the range of the key the iterator points to. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:24 -04:00
Kent Overstreet	0c04f5eb0d	bcachefs: Don't overflow trans with iters from triggers Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:24 -04:00
Kent Overstreet	8d591d5da4	bcachefs: Convert some assertions to fsck errors Actual repair code will come later, but this is a start Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:24 -04:00
Kent Overstreet	91052b9de8	bcachefs: Refactor trans_(get\|update)_key these are still pretty ugly... Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:23 -04:00
Kent Overstreet	88767d65d8	bcachefs: Update path now handles triggers that generate more triggers Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:23 -04:00
Kent Overstreet	6e738539cd	bcachefs: Improve key marking interface Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:22 -04:00
Kent Overstreet	4ee202e2b7	bcachefs: better BTREE_INSERT_NO_CLEAR_REPLICAS Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:22 -04:00
Kent Overstreet	3838be7841	bcachefs: Don't use a fixed size buffer for fs_usage_deltas Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:22 -04:00
Kent Overstreet	6fb076e60d	bcachefs: Fix spurious inconsistency in recovery Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:22 -04:00
Kent Overstreet	7cfac5f506	bcachefs: Fix for the stripes mark path and gc Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:21 -04:00
Kent Overstreet	460651ee86	bcachefs: Various improvements to bch2_alloc_write() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:21 -04:00
Kent Overstreet	932aa83745	bcachefs: bch2_trans_mark_update() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:21 -04:00
Kent Overstreet	5e82a9a1f4	bcachefs: Write out fs usage consistently Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:21 -04:00
Kent Overstreet	fca1223ccf	bcachefs: Avoid write lock on mark_lock mark_lock is a frequently taken lock, and there's also potential for deadlocks since currently bch2_clear_page_bits which is called from memory reclaim has to take it to drop disk reservations. The disk reservation get path takes it when it recalculates the number of sectors known to be available, but it's not really needed for consistency. We just want to make sure we only have one thread updating the sectors_available count, which we can do with a dedicated mutex. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:21 -04:00
Kent Overstreet	94f651e2c7	bcachefs: Return errors from for_each_btree_key() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:20 -04:00
Kent Overstreet	201a4d4cbe	bcachefs: fix triggers for stripes btree Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:20 -04:00
Kent Overstreet	c6dd04f8f5	bcachefs: Mark overwrites from journal replay in initial gc Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:20 -04:00
Kent Overstreet	a1d58243f9	bcachefs: add ability to run gc on metadata only Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:20 -04:00
Kent Overstreet	36e916e13b	bcachefs: Caller now responsible for calling mark_key for gc Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:19 -04:00
Kent Overstreet	3a0e06db71	bcachefs: Assorted preemption fixes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:19 -04:00
Kent Overstreet	4d8100daa9	bcachefs: Allocate fs_usage in do_btree_insert_at() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:18 -04:00
Kent Overstreet	0dc17247f1	bcachefs: kill struct btree_insert Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:18 -04:00
Kent Overstreet	59928c1220	bcachefs: Don't BUG_ON() on bucket sector count overflow Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:17 -04:00
Kent Overstreet	ecf37a4a80	bcachefs: fs_usage_u64s() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:16 -04:00
Kent Overstreet	768ac63924	bcachefs: Add a mechanism for blocking the journal Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:16 -04:00
Kent Overstreet	8fe826f90a	bcachefs: Convert bucket invalidation to key marking path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:16 -04:00
Kent Overstreet	73c27c6095	bcachefs: fixes for cached data accounting Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:16 -04:00
Kent Overstreet	8777210b92	bcachefs: refactor key marking code a bit Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:16 -04:00
Kent Overstreet	2ecc6171a3	bcachefs: Fix double counting when gc is running Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:16 -04:00
Kent Overstreet	39fbc5a49f	bcachefs: gc lock no longer needed for disk reservations Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:16 -04:00
Kent Overstreet	76f4c7b0c3	bcachefs: Fix oldest_gen handling Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:15 -04:00
Kent Overstreet	3577df5f7f	bcachefs: serialize persistent_reserved Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:15 -04:00
Kent Overstreet	3e0745e283	bcachefs: initialize fs usage summary in recovery Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:15 -04:00
Kent Overstreet	4c97e04aa8	bcachefs: percpu utility code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:15 -04:00
Kent Overstreet	bdba6c29ff	bcachefs: fix inode counting Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:15 -04:00
Kent Overstreet	61c8d7c8eb	bcachefs: Persist stripe blocks_used Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:15 -04:00
Kent Overstreet	430735cd1a	bcachefs: Persist alloc info on clean shutdown - Does not persist alloc info for stripes yet - Also does not yet include filesystem block/sector counts yet, from struct fs_usage - Not made use of just yet Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:14 -04:00
Kent Overstreet	7ef2a73a58	bcachefs: Fix check for if extent update is allocating Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:14 -04:00
Kent Overstreet	d0cc3defba	bcachefs: More allocator startup improvements Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:14 -04:00
Kent Overstreet	23f80d2b3b	bcachefs: Factor out acc_u64s() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:14 -04:00
Kent Overstreet	06b7345cc2	bcachefs: Include summarized counts in fs_usage Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:13 -04:00
Kent Overstreet	5663a41521	bcachefs: refactor bch_fs_usage Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:13 -04:00
Kent Overstreet	641ab73643	bcachefs: improve/clarify ptr_disk_sectors() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:13 -04:00
Kent Overstreet	9166b41db1	bcachefs: s/usage_lock/mark_lock better describes what it's for, and we're going to call a new lock usage_lock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:13 -04:00
Kent Overstreet	8eb7f3ee46	bcachefs: move dirty into bucket_mark Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:12 -04:00
Kent Overstreet	f0cfb963ec	bcachefs: Track nr_inodes with the key marking machinery Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:12 -04:00
Kent Overstreet	26609b619f	bcachefs: Make bkey types globally unique this lets us get rid of a lot of extra switch statements - in a lot of places we dispatch on the btree node type, and then the key type, so this is a nice cleanup across a lot of code. Also improve the on disk format versioning stuff. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:12 -04:00
Kent Overstreet	eeb83e25bb	bcachefs: Hold usage_lock over mark_key and fs_usage_apply Fixes an inconsistency at the end of gc Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:12 -04:00
Kent Overstreet	dfe9bfb32e	bcachefs: Stripes now properly subject to gc gc now verifies the contents of the stripes radix tree, important for persistent alloc info Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:12 -04:00
Kent Overstreet	9ca53b55f7	bcachefs: gc now operates on second set of bucket marks This means we can now use gc to verify the allocation information - important for testing persistant alloc info Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:12 -04:00
Kent Overstreet	61274e9d45	bcachefs: Allocator startup improvements Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:12 -04:00
Kent Overstreet	cd575ddf57	bcachefs: Erasure coding Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:11 -04:00
Kent Overstreet	b35b192583	bcachefs: Move key marking out of extents.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:11 -04:00
Kent Overstreet	4628529f15	bcachefs: Disk usage in compressed sectors, not uncompressed Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:11 -04:00
Kent Overstreet	8b335baef2	bcachefs: Assorted fixes for running on very small devices It's now possible to create and use a filesystem on a 512k device with 4k buckets (though at that size we still waste almost half to internal reserves) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:11 -04:00
Kent Overstreet	b092dadd55	bcachefs: Scale down number of writepoints when low on space this means we don't have to reserve space for them when calculating filesystem capacity Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:11 -04:00
Kent Overstreet	47799326bc	bcachefs: more key marking refactoring prep work for erasure coding Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:10 -04:00
Kent Overstreet	1742237ba1	bcachefs: extent_for_each_ptr_decode() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:10 -04:00
Kent Overstreet	7b3f84ea7d	bcachefs: Split out alloc_background.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:10 -04:00
Kent Overstreet	6eac2c2e24	bcachefs: Change how replicated data is accounted Due to compression, the different replicas of a replicated extent don't necessarily have to take up the same amount of space - so replicated data sector counts shouldn't be stored divided by the number of replicas. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:08 -04:00
Kent Overstreet	5b650fd11a	bcachefs: Account for internal fragmentation better Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:08 -04:00
Kent Overstreet	09f3297ac9	bcachefs: kill s_alloc, use bch_data_type Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:08 -04:00
Kent Overstreet	a7c7a3092e	bcachefs: bch2_mark_key() now takes bch_data_type Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:08 -04:00
Kent Overstreet	3142e7ef4b	bcachefs: fix nbuckets usage on device resize Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:08 -04:00
Kent Overstreet	b29e197aaf	bcachefs: Invalidate buckets when writing to alloc btree Prep work for persistent alloc information. Refactoring also lets us make free_inc much smaller, which means a lot fewer buckets stranded on freelists. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:08 -04:00
Kent Overstreet	b2be7c8b73	bcachefs: kill bucket mark sector count saturation Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:08 -04:00
Kent Overstreet	c692399529	bcachefs: don't call bch2_bucket_seq_cleanup from journal_buf_switch journal_buf_switch is called from the foreground when getting a journal reservation and thus is somewhat latency sensitive; bch2_bucket_seq_cleanup has to run infrequently but is a bit expensive when it does run. Call it from the journal write path instead, and punt the journal write to worqueue context. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:08 -04:00
Kent Overstreet	1c6fdbd8f2	bcachefs: Initial commit Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write filesystem with every feature you could possibly want. Website: https://bcachefs.org Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:07 -04:00

... 3 4 5 6 7 ...

411 Commits