Commit Graph

144 Commits

Author SHA1 Message Date
Kent Overstreet
9432e90df1 bcachefs: Check for invalid bucket from bucket_gen(), gc_bucket()
Turn more asserts into proper recoverable error paths.

Reported-by: syzbot+246b47da27f8e7e7d6fb@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:16 -04:00
Kent Overstreet
9c4acd19bb bcachefs: Replace bucket_valid() asserts in bucket lookup with proper checks
The bucket_gens array and gc_buckets array known their own size; we
should be using those members, and returning an error.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:16 -04:00
Kent Overstreet
3858aa4268 bcachefs: ptr_stale() -> dev_ptr_stale()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:23 -04:00
Kent Overstreet
1f2f92ec3f bcachefs: PTR_BUCKET_POS() now takes bch_dev
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:23 -04:00
Kent Overstreet
07d7c4da7b bcachefs: bch2_bucket_ref_update() now takes bch_dev
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:23 -04:00
Kent Overstreet
9b3059a1b3 bcachefs: bch2_check_alloc_key() -> bch2_dev_tryget_noerror()
More elimination of bch2_dev_bkey_exists() usage.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:22 -04:00
Kent Overstreet
ffcbec6076 bcachefs: Kill opts.buckets_nouse
Now explicitly allocate and free the buckets_nouse bitmap - this is
going to be used for online fsck.

To go RW when we haven't check allocations, we'll do a much slimmed down
version that just initializes the buckets_nouse bitmaps.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:22 -04:00
Kent Overstreet
c02eb9e891 bcachefs: kill bch2_dev_usage_update_m()
by using bucket_m_to_alloc() more, we can get some nice code cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:21 -04:00
Kent Overstreet
f40d13f94d bcachefs: Run bch2_check_fix_ptrs() via triggers
Currently, the reflink_p gc trigger does repair as well - turning a
reflink_p key into an error key if the reflink_v it points to doesn't
exist.

This won't work with online check/repair, because the repair path once
online will be subject to transaction restarts, but BTREE_TRIGGER_gc is
not idempotant - we can't run it multiple times if we get a transaction
restart.

So we need to split these paths; to do so this patch calls
check_fix_ptrs() by a new general path - a new trigger type,
BTREE_TRIGGER_check_repair.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:20 -04:00
Kent Overstreet
70e3e039cf bcachefs: bch2_bucket_ref_update()
If we hit an inconsistency when updating allocation information, we
don't want to fail the update if it's for a deletion - only if it's for
a new key.

Rename check_bucket_ref() -> bucket_ref_update() so we can centralize
the logic to do this.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:20 -04:00
Kent Overstreet
2f724563fc bcachefs: member helper cleanups
Some renaming for better consistency

bch2_member_exists	-> bch2_member_alive
bch2_dev_exists		-> bch2_member_exists
bch2_dev_exsits2	-> bch2_dev_exists
bch_dev_locked		-> bch2_dev_locked
bch_dev_bkey_exists	-> bch2_dev_bkey_exists

new helper - bch2_dev_safe

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:19 -04:00
Kent Overstreet
d155272b6e bcachefs: bucket_valid()
cut out a branch from doing it the obvious way

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:19 -04:00
Kent Overstreet
5dd8c60e1e bcachefs: iter/update/trigger/str_hash flag cleanup
Combine iter/update/trigger/str_hash flags into a single enum, and
x-macroize them for a to_text() function later.

These flags are all for a specific iter/key/update context, so it makes
sense to group them together - iter/update/trigger flags were already
given distinct bits, this cleans up and unifies that handling.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:18 -04:00
Kent Overstreet
c281db0fa5 bcachefs: mark_superblock cleanup
Consolidate mark_superblock() and trans_mark_superblock(), like we did
with the other trigger paths.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:18 -04:00
Kent Overstreet
9abb6dd7ce bcachefs: Standardize helpers for printing enum strs with bounds checks
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13 22:48:17 -04:00
Kent Overstreet
e2a316b3cc bcachefs: BCH_WATERMARK_interior_updates
This adds a new watermark, higher priority than BCH_WATERMARK_reclaim,
for interior btree updates. We've seen a deadlock where journal replay
triggers a ton of btree node merges, and these use up all available open
buckets and then interior updates get stuck.

One cause of this is that we're currently lacking btree node merging on
write buffer btrees - that needs to be fixed as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01 21:14:02 -04:00
Kent Overstreet
5b14ce35af bcachefs: bch2_trans_account_disk_usage_change()
The disk space accounting rewrite is splitting out accounting for each
replicas set - those are moving to btree keys, instead of percpu
counters.

This breaks bch2_trans_fs_usage_apply() up, splitting out the part we
will still need.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 06:01:45 -05:00
Kent Overstreet
e58f963cec bcachefs: helpers for printing data types
We need bounds checking since new versions may introduce new data types.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 06:01:45 -05:00
Kent Overstreet
4f9ec59f8f bcachefs: unify extent trigger
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:20 -05:00
Kent Overstreet
f4f78779bb bcachefs: move stripe triggers to ec.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:20 -05:00
Kent Overstreet
6820ac2cdc bcachefs: move bch2_mark_alloc() to alloc_background.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:20 -05:00
Kent Overstreet
6cacd0c414 bcachefs: unify reservation trigger
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:20 -05:00
Kent Overstreet
282e7c37eb bcachefs: kill mem_trigger_run_overwrite_then_insert()
now that type signatures are unified, redundant

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:19 -05:00
Kent Overstreet
ad00bce07d bcachefs: mark now takes bkey_s
Prep work for disk space accounting rewrite: we're going to want to use
a single callback for both of our current triggers, so we need to change
them to have the same type signature first.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:19 -05:00
Kent Overstreet
717296c34c bcachefs: trans_mark now takes bkey_s
Prep work for disk space accounting rewrite: we're going to want to use
a single callback for both of our current triggers, so we need to change
them to have the same type signature first.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:19 -05:00
Kent Overstreet
0d963a635d bcachefs: Move reflink_p triggers into reflink.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:40 -05:00
Kent Overstreet
ed0cd515cd bcachefs: bch2_dev_usage_to_text()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet
25f64e997e bcachefs: Don't use update_cached_sectors() in bch2_mark_alloc()
bch2_update_cached_sectors_list() is closer to how the new disk space
accounting works, called from trans_mark().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet
523f33efbf bcachefs: All triggers are BTREE_TRIGGER_WANTS_OLD_AND_NEW
Upcoming rebalance_work btree will require extent triggers to be
BTREE_TRIGGER_WANTS_OLD_AND_NEW - so to reduce potential confusion,
let's just make all triggers BTREE_TRIGGER_WANTS_OLD_AND_NEW.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31 12:18:37 -04:00
Kent Overstreet
bbe682c767 bcachefs: Ensure devices are always correctly initialized
We can't mark device superblocks or allocate journal on a device that
isn't online.

That means we may need to do this on every mount, because we may have
formatted a new filesystem and then done the first mount
(bch2_fs_initialize()) in degraded mode.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31 12:18:37 -04:00
Kent Overstreet
73bbeaa2de bcachefs: bucket_lock() is now a sleepable lock
fsck_err() may sleep - it takes a mutex and may allocate memory, so
bucket_lock() needs to be a sleepable lock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:15 -04:00
Kent Overstreet
8c2d82a6fe bcachefs: Change bucket_lock() to use bit_spin_lock()
bucket_lock() previously open coded a spinlock, because we need to cram
a spinlock into a single byte.

But it turns out not all archs support xchg() on a single byte; since we
need struct bucket to be small, this means we have to play fun games
with casts and ifdefs for endianness.

This fixes building on 32 bit arm, and likely other architectures.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: linux-bcachefs@vger.kernel.org
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:14 -04:00
Josh Poimboeuf
3764647b25 bcachefs: Remove undefined behavior in bch2_dev_buckets_reserved()
In general it's a good idea to avoid using bare unreachable() because it
introduces undefined behavior in compiled code.  In this case it even
confuses GCC into emitting an empty unused
bch2_dev_buckets_reserved.part.0() function.

Use BUG() instead, which is nice and defined.  While in theory it should
never trigger, if something were to go awry and the BCH_WATERMARK_NR
case were to actually hit, the failure mode is much more robust.

Fixes the following warnings:

  vmlinux.o: warning: objtool: bch2_bucket_alloc_trans() falls through to next function bch2_reset_alloc_cursors()
  vmlinux.o: warning: objtool: bch2_dev_buckets_reserved.part.0() is missing an ELF size annotation

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:13 -04:00
Kent Overstreet
fb8e5b4cae bcachefs: sb-members.c
Split out a new file for bch_sb_field_members - we'll likely want to
move more code here in the future.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:10 -04:00
Kent Overstreet
4dc5bb9adf bcachefs: move inode triggers to inode.c
bit of reorg

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:08 -04:00
Kent Overstreet
494036d862 bcachefs: BCH_WATERMARK_reclaim
Add another watermark for journal reclaim - this is needed for the next
patches, that unify BCH_WATERMARK with JOURNAL_WATERMARK.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:05 -04:00
Kent Overstreet
e53a961c6b bcachefs: Rename enum alloc_reserve -> bch_watermark
This is prep work for consolidating with JOURNAL_WATERMARK.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:04 -04:00
Kent Overstreet
962210b281 bcachefs: Fix a buffer overrun in bch2_fs_usage_read()
We were copying the size of a struct bch_fs_usage_online to a struct
bch_fs_usage, which is 8 bytes smaller.

This adds some new helpers so we can do this correctly, and get rid of
some magic +1s too.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:01 -04:00
Kent Overstreet
e84face6f0 bcachefs: RESERVE_stripe
Rework stripe creation path - new algorithm for deciding when to create
new stripes or reuse existing stripes.

We add a new allocation watermark, RESERVE_stripe, above RESERVE_none.
Then we always try to create a new stripe by doing RESERVE_stripe
allocations; if this fails, we reuse an existing stripe and allocate
buckets for it with the reserve watermark for the given write
(RESERVE_none or RESERVE_movinggc).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:55 -04:00
Kent Overstreet
b1cfe5ed2b bcachefs: Improve dev_alloc_debug_to_text()
Now we also print the number of buckets reserved for each watermark.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:55 -04:00
Kent Overstreet
2611a041ae bcachefs: bch2_mark_key() now takes btree_id & level
btree & level are passed to trans_mark - for backpointers -
bch2_mark_key() should take them as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:55 -04:00
Kent Overstreet
a8c752bb1d bcachefs: New on disk format: Backpointers
This patch adds backpointers: we now have a reverse index from device
and offset on that device (specifically, offset within a bucket) back to
btree nodes and (non cached) data extents.

The first 40 backpointers within a bucket are stored in the alloc key;
after that backpointers spill over to the next backpointers btree. This
is to help avoid performance regressions from additional btree updates
on large streaming workloads.

This patch adds all the code for creating, checking and repairing
backpointers. The next patch in the series is going to use backpointers
for copygc - finally getting rid of the need to scan all extents to do
copygc.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:50 -04:00
Kent Overstreet
920e69bc3d bcachefs: Btree write buffer
This adds a new method of doing btree updates - a straight write buffer,
implemented as a flat fixed size array.

This is only useful when we don't need to read from the btree in order
to do the update, and when reading is infrequent - perfect for the LRU
btree.

This will make LRU btree updates fast enough that we'll be able to use
it for persistently indexing buckets by fragmentation, which will be a
massive boost to copygc performance.

Changes:
 - A new btree_insert_type enum, for btree_insert_entries. Specifies
   btree, btree key cache, or btree write buffer.

 - bch2_trans_update_buffered(): updates via the btree write buffer
   don't need a btree path, so we need a new update path.

 - Transaction commit path changes:
   The update to the btree write buffer both mutates global, and can
   fail if there isn't currently room. Therefore we do all write buffer
   updates in the transaction all at once, and also if it fails we have
   to revert filesystem usage counter changes.

   If there isn't room we flush the write buffer in the transaction
   commit error path and retry.

 - A new persistent option, for specifying the number of entries in the
   write buffer.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:50 -04:00
Kent Overstreet
b2d1d56b1d bcachefs: Fixes for building in userspace
- Marking a non-static function as inline doesn't actually work and is
   now causing problems - drop that

 - Introduce BCACHEFS_LOG_PREFIX for when we want to prefix log messages
   with bcachefs (filesystem name)

 - Userspace doesn't have real percpu variables (maybe we can get this
   fixed someday), put an #ifdef around bch2_disk_reservation_add()
   fastpath

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:46 -04:00
Kent Overstreet
77671e8fff bcachefs: Move bkey bkey_unpack_key() to bkey.h
Long ago, bkey_unpack_key() was added to bset.h instead of bkey.h
because bkey.h didn't include btree_types.h, which it needs for the
compiled unpack function.

This patch finally moves it to the proper location.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:45 -04:00
Kent Overstreet
ed80c5699a bcachefs: Optimize bch2_dev_usage_read()
- add bch2_dev_usage_read_fast(), which doesn't return by value -
   bch_dev_usage is big enough that we don't want the silent memcpy
 - tweak the allocation path to only call bch2_dev_usage_read() once per
   bucket allocated

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:44 -04:00
Daniel Hill
58aaa0836b bcachefs: fix __dev_available().
__dev_available() now calculates available buckets correctly. Previously
it would almost always return 0 when we have cached data.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:34 -04:00
Kent Overstreet
30f0349d62 bcachefs: Split out dev_buckets_free()
Previously, dev_buckets_available() only counted buckets that are
eligible to be allocated right now - i.e. buckets that don't have cached
data, or need discard, or need gc gens, etc.

But most users of this function want to know how many buckets are
eligible to be allocated from without moving data around - copygc,
allocator striping, which means we should be including cached data
buckets etc.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
e1b8f5f5ca bcachefs: Plumb btree_id & level to trans_mark
For backpointers, we'll need the full key location - that means btree_id
and btree level. This patch plumbs it through.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:32 -04:00
Kent Overstreet
822835ffea bcachefs: Fold bucket_state in to BCH_DATA_TYPES()
Previously, we were missing accounting for buckets in need_gc_gens and
need_discard states. This matters because buckets in those states need
other btree operations done before they can be used, so they can't be
conuted when checking current number of free buckets against the
allocation watermark.

Also, we weren't directly counting free buckets at all. Now, data type 0
== BCH_DATA_free, and free buckets are counted; this means we can get
rid of the separate (poorly defined) count of unavailable buckets.

This is a new on disk format version, with upgrade and fsck required for
the accounting changes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:30 -04:00