Commit Graph

284 Commits

Author SHA1 Message Date
Kent Overstreet
c9cabfb215 bcachefs: improve bch2_new_stripe_to_text()
also print out the new stripe key

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21 11:39:48 -04:00
Kent Overstreet
a4b7a0c037 bcachefs: ec_stripe_head.nr_created
additional debug stat

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21 11:39:48 -04:00
Kent Overstreet
fa85c47397 bcachefs: bch_stripe.disk_label
When reshaping existing stripes, we should keep them on the same target
that they were allocated on; to do this, we need to add a field to the
btree stripe type.

This is a tad awkward, because we only have 8 bits left, and targets are
16 bits - but we only need to store a label, not a full target.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21 11:39:48 -04:00
Kent Overstreet
1b11c4d365 bcachefs: stripe_to_mem()
factor out a common helper

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21 11:39:48 -04:00
Kent Overstreet
54a12984a9 bcachefs: EIO errcode cleanup
We want to be using private errcodes whenever possible, for better error
messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21 11:39:48 -04:00
Kent Overstreet
2e95497e81 bcachefs: fix prototype to bch2_alloc_sectors_start_trans()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21 11:35:20 -04:00
Kent Overstreet
d97de0d017 bcachefs: Make bkey_fsck_err() a wrapper around fsck_err()
bkey_fsck_err() was added as an interface that looks like fsck_err(),
but previously all it did was ensure that the appropriate error counter
was incremented in the superblock.

This is a cleanup and bugfix patch that converts it to a wrapper around
fsck_err(). This is needed to fix an issue with the upgrade path to
disk_accounting_v3, where the "silent fix" error list now includes
bkey_fsck errors; fsck_err() handles this in a unified way, and since we
need to change printing of bkey fsck errors from the caller to the inner
bkey_fsck_err() calls, this ends up being a pretty big change.

Als,, rename .invalid() methods to .validate(), for clarity, while we're
changing the function signature anyways (to drop the printbuf argument).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13 23:00:50 -04:00
Kent Overstreet
2caca9fb16 bcachefs: ec should not allocate from ro devs
This fixes a device removal deadlock when using erasure coding.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-07 08:31:10 -04:00
Kent Overstreet
c1e4446247 bcachefs: Improved allocator debugging for ec
chasing down a device removal deadlock with erasure coding

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-07 08:31:10 -04:00
Linus Torvalds
527eff227d - In the series "treewide: Refactor heap related implementation",
Kuan-Wei Chiu has significantly reworked the min_heap library code and
   has taught bcachefs to use the new more generic implementation.
 
 - Yury Norov's series "Cleanup cpumask.h inclusion in core headers"
   reworks the cpumask and nodemask headers to make things generally more
   rational.
 
 - Kuan-Wei Chiu has sent along some maintenance work against our sorting
   library code in the series "lib/sort: Optimizations and cleanups".
 
 - More library maintainance work from Christophe Jaillet in the series
   "Remove usage of the deprecated ida_simple_xx() API".
 
 - Ryusuke Konishi continues with the nilfs2 fixes and clanups in the
   series "nilfs2: eliminate the call to inode_attach_wb()".
 
 - Kuan-Ying Lee has some fixes to the gdb scripts in the series "Fix GDB
   command error".
 
 - Plus the usual shower of singleton patches all over the place.  Please
   see the relevant changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZp2GvwAKCRDdBJ7gKXxA
 jlf/AP48xP5ilIHbtpAKm2z+MvGuTxJQ5VSC0UXFacuCbc93lAEA+Yo+vOVRmh6j
 fQF2nVKyKLYfSz7yqmCyAaHWohIYLgg=
 =Stxz
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2024-07-21-15-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - In the series "treewide: Refactor heap related implementation",
   Kuan-Wei Chiu has significantly reworked the min_heap library code
   and has taught bcachefs to use the new more generic implementation.

 - Yury Norov's series "Cleanup cpumask.h inclusion in core headers"
   reworks the cpumask and nodemask headers to make things generally
   more rational.

 - Kuan-Wei Chiu has sent along some maintenance work against our
   sorting library code in the series "lib/sort: Optimizations and
   cleanups".

 - More library maintainance work from Christophe Jaillet in the series
   "Remove usage of the deprecated ida_simple_xx() API".

 - Ryusuke Konishi continues with the nilfs2 fixes and clanups in the
   series "nilfs2: eliminate the call to inode_attach_wb()".

 - Kuan-Ying Lee has some fixes to the gdb scripts in the series "Fix
   GDB command error".

 - Plus the usual shower of singleton patches all over the place. Please
   see the relevant changelogs for details.

* tag 'mm-nonmm-stable-2024-07-21-15-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (98 commits)
  ia64: scrub ia64 from poison.h
  watchdog/perf: properly initialize the turbo mode timestamp and rearm counter
  tsacct: replace strncpy() with strscpy()
  lib/bch.c: use swap() to improve code
  test_bpf: convert comma to semicolon
  init/modpost: conditionally check section mismatch to __meminit*
  init: remove unused __MEMINIT* macros
  nilfs2: Constify struct kobj_type
  nilfs2: avoid undefined behavior in nilfs_cnt32_ge macro
  math: rational: add missing MODULE_DESCRIPTION() macro
  lib/zlib: add missing MODULE_DESCRIPTION() macro
  fs: ufs: add MODULE_DESCRIPTION()
  lib/rbtree.c: fix the example typo
  ocfs2: add bounds checking to ocfs2_check_dir_entry()
  fs: add kernel-doc comments to ocfs2_prepare_orphan_dir()
  coredump: simplify zap_process()
  selftests/fpu: add missing MODULE_DESCRIPTION() macro
  compiler.h: simplify data_race() macro
  build-id: require program headers to be right after ELF header
  resource: add missing MODULE_DESCRIPTION()
  ...
2024-07-21 17:56:22 -07:00
Kent Overstreet
e0d5bc6a66 bcachefs: Fix missing BTREE_TRIGGER_bucket_invalidate flag
This fixes an accounting mismatch for cached data.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14 19:00:16 -04:00
Kent Overstreet
fb23d57a6d bcachefs: Convert gc to new accounting
Rewrite fsck/gc for the new accounting scheme.

This adds a second set of in-memory accounting counters for gc to use;
like with other parts of gc we run all trigger in TRIGGER_GC mode, then
compare what we calculated to existing in-memory accounting at the end.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14 19:00:13 -04:00
Kent Overstreet
1d16c605cc bcachefs: Disk space accounting rewrite
Main part of the disk accounting rewrite.

This is a wholesale rewrite of the existing disk space accounting, which
relies on percepu counters that are sharded by journal buffer, and
rolled up and added to each journal write.

With the new scheme, every set of counters is a distinct key in the
accounting btree; this fixes scaling limitations of the old scheme,
where counters took up space in each journal entry and required multiple
percpu counters.

Now, in memory accounting requires a single set of percpu counters - not
multiple for each in flight journal buffer - and in the future we'll
probably also have counters that don't use in memory percpu counters,
they're not strictly required.

An accounting update is now a normal btree update, using the btree write
buffer path. At transaction commit time, we apply accounting updates to
the in memory counters, which are percpu counters indexed in an
eytzinger tree by the accounting key.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14 19:00:13 -04:00
Kuan-Wei Chiu
1fcce6b8a7 bcachefs: remove heap-related macros and switch to generic min_heap
Drop the heap-related macros from bcachefs and replacing them with the
generic min_heap implementation from include/linux.  By doing so, code
readability is improved by using functions instead of macros.  Moreover,
the min_heap implementation in include/linux adopts a bottom-up variation
compared to the textbook version currently used in bcachefs.  This
bottom-up variation allows for approximately 50% reduction in the number
of comparison operations during heap siftdown, without changing the number
of swaps, thus making it more efficient.

[visitorckw@gmail.com: fix missing assignment of minimum element]
  Link: https://lkml.kernel.org/r/20240602174828.1955320-1-visitorckw@gmail.com
Link: https://lkml.kernel.org/ioyfizrzq7w7mjrqcadtzsfgpuntowtjdw5pgn4qhvsdp4mqqg@nrlek5vmisbu
Link: https://lkml.kernel.org/r/20240524152958.919343-17-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: Coly Li <colyli@suse.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Sakai <msakai@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-24 22:25:00 -07:00
Kent Overstreet
9432e90df1 bcachefs: Check for invalid bucket from bucket_gen(), gc_bucket()
Turn more asserts into proper recoverable error paths.

Reported-by: syzbot+246b47da27f8e7e7d6fb@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:16 -04:00
Kent Overstreet
088d0de812 bcachefs: btree_gc can now handle unknown btrees
Compatibility fix - we no longer have a separate table for which order
gc walks btrees in, and special case the stripes btree directly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28 11:29:26 -04:00
Kent Overstreet
65eaf4e24a bcachefs: s/bkey_invalid_flags/bch_validate_flags
We're about to start using bch_validate_flags for superblock section
validation - it's no longer bkey specific.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09 16:23:36 -04:00
Kent Overstreet
2c91ab7262 bcachefs: bch2_dev_get_ioref() checks for device not present
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09 16:23:36 -04:00
Kent Overstreet
3858aa4268 bcachefs: ptr_stale() -> dev_ptr_stale()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:23 -04:00
Kent Overstreet
c387d84413 bcachefs: ec_validate_checksums() -> bch2_dev_tryget()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:23 -04:00
Kent Overstreet
8783856ab1 bcachefs: ob_dev()
Wrapper around bch2_dev_have_ref() for open_buckets; we do guarantee
that the device an open_bucket points to exists.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:23 -04:00
Kent Overstreet
633cf06944 bcachefs: Kill bch2_dev_bkey_exists() in backpointer code
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:23 -04:00
Kent Overstreet
1f2f92ec3f bcachefs: PTR_BUCKET_POS() now takes bch_dev
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:23 -04:00
Kent Overstreet
07d7c4da7b bcachefs: bch2_bucket_ref_update() now takes bch_dev
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:23 -04:00
Kent Overstreet
9a768ab75b bcachefs: bch2_bkey_drop_ptrs() declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:22 -04:00
Kent Overstreet
abe2f470bc bcachefs: simplify bch2_trans_start_alloc_update()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:22 -04:00
Kent Overstreet
0acf2169a5 bcachefs: __mark_stripe_bucket() now takes bch_alloc_v4
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:21 -04:00
Kent Overstreet
c02eb9e891 bcachefs: kill bch2_dev_usage_update_m()
by using bucket_m_to_alloc() more, we can get some nice code cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:21 -04:00
Kent Overstreet
fa9bb741fe bcachefs: alloc_data_type_set()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:21 -04:00
Kent Overstreet
f40d13f94d bcachefs: Run bch2_check_fix_ptrs() via triggers
Currently, the reflink_p gc trigger does repair as well - turning a
reflink_p key into an error key if the reflink_v it points to doesn't
exist.

This won't work with online check/repair, because the repair path once
online will be subject to transaction restarts, but BTREE_TRIGGER_gc is
not idempotant - we can't run it multiple times if we get a transaction
restart.

So we need to split these paths; to do so this patch calls
check_fix_ptrs() by a new general path - a new trigger type,
BTREE_TRIGGER_check_repair.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:20 -04:00
Kent Overstreet
70e3e039cf bcachefs: bch2_bucket_ref_update()
If we hit an inconsistency when updating allocation information, we
don't want to fail the update if it's for a deletion - only if it's for
a new key.

Rename check_bucket_ref() -> bucket_ref_update() so we can centralize
the logic to do this.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:20 -04:00
Kent Overstreet
9cc455d1bc bcachefs: Consolidate mark_stripe_bucket() and trans_mark_stripe_bucket()
This eliminates some duplicated logic, and the gc path now handles
stripe updates and deletions - we need this since soon we're bringing
back runtime gc.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:20 -04:00
Kent Overstreet
d930764650 bcachefs: mark_stripe_bucket cleanup
Start to work on unifying mark_stripe_bucket() and
trans_mark_stripe_bucket(); first, clean up all the unnecessary and
gratuitious differences.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:20 -04:00
Kent Overstreet
c4e8db2b5d bcachefs: bucket_data_type_mismatch()
We're working on potentially unifying bch2_check_bucket_ref() and
bch2_check_fix_ptrs() - or at least eliminating gratuitious differences.

Most immediately, there's a bunch of cleanups to be done regarding
BCH_DATA_stripe.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:20 -04:00
Kent Overstreet
2f724563fc bcachefs: member helper cleanups
Some renaming for better consistency

bch2_member_exists	-> bch2_member_alive
bch2_dev_exists		-> bch2_member_exists
bch2_dev_exsits2	-> bch2_dev_exists
bch_dev_locked		-> bch2_dev_locked
bch_dev_bkey_exists	-> bch2_dev_bkey_exists

new helper - bch2_dev_safe

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:19 -04:00
Kent Overstreet
5dd8c60e1e bcachefs: iter/update/trigger/str_hash flag cleanup
Combine iter/update/trigger/str_hash flags into a single enum, and
x-macroize them for a to_text() function later.

These flags are all for a specific iter/key/update context, so it makes
sense to group them together - iter/update/trigger flags were already
given distinct bits, this cleans up and unifies that handling.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:18 -04:00
Kent Overstreet
9abb6dd7ce bcachefs: Standardize helpers for printing enum strs with bounds checks
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13 22:48:17 -04:00
Kent Overstreet
2aeed876d7 bcachefs: fix unsafety in bch2_stripe_to_text()
.to_text() functions need to work on key values that didn't pass .valid

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13 22:48:16 -04:00
Kent Overstreet
3ed94062e3 bcachefs: Improve bch2_fatal_error()
error messages should always include __func__

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18 00:24:24 -04:00
Kent Overstreet
cb6fc943b6 bcachefs: kill kvpmalloc()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 18:39:12 -04:00
Kent Overstreet
e58f963cec bcachefs: helpers for printing data types
We need bounds checking since new versions may introduce new data types.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 06:01:45 -05:00
Kent Overstreet
38c23fb809 bcachefs: BTREE_TRIGGER_ATOMIC
Add a new flag to be explicit about when we're running atomic triggers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 06:01:45 -05:00
Kent Overstreet
e4eb3e5ae4 bcachefs: unify stripe trigger
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:20 -05:00
Kent Overstreet
f4f78779bb bcachefs: move stripe triggers to ec.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:20 -05:00
Kent Overstreet
0beebd9245 bcachefs: bkey_for_each_ptr() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
41b84fb489 bcachefs: for_each_member_device_rcu() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
80eab7a7c2 bcachefs: for_each_btree_key() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
cf904c8d96 bcachefs: bch_err_(fn|msg) check if should print
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
09caeabe1a bcachefs: btree write buffer now slurps keys from journal
Previosuly, the transaction commit path would have to add keys to the
btree write buffer as a separate operation, requiring additional global
synchronization.

This patch introduces a new journal entry type, which indicates that the
keys need to be copied into the btree write buffer prior to being
written out. We switch the journal entry type back to
JSET_ENTRY_btree_keys prior to write, so this is not an on disk format
change.

Flushing the btree write buffer may require pulling keys out of journal
entries yet to be written, and quiescing outstanding journal
reservations; we previously added journal->buf_lock for synchronization
with the journal write path.

We also can't put strict bounds on the number of keys in the journal
destined for the write buffer, which means we might overflow the size of
the preallocated buffer and have to reallocate - this introduces a
potentially fatal memory allocation failure. This is something we'll
have to watch for, if it becomes an issue in practice we can do
additional mitigation.

The transaction commit path no longer has to explicitly check if the
write buffer is full and wait on flushing; this is another performance
optimization. Instead, when the btree write buffer is close to full we
change the journal watermark, so that only reservations for journal
reclaim are allowed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
5028b9078c bcachefs: Rename for_each_btree_key2() -> for_each_btree_key()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:40 -05:00
Kent Overstreet
27b2df982f bcachefs: Kill for_each_btree_key()
for_each_btree_key() handles transaction restarts, like
for_each_btree_key2(), but only calls bch2_trans_begin() after a
transaction restart - for_each_btree_key2() wraps every loop iteration
in a transaction.

The for_each_btree_key() behaviour is problematic when it leads to
holding the SRCU lock that prevents key cache reclaim for an unbounded
amount of time - there's no real need to keep it around.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:40 -05:00
Kent Overstreet
183bcc89b8 bcachefs: Clean up btree write buffer write ref handling
__bch2_btree_write_buffer_flush() now assumes a write ref is already
held (as called by the transaction commit path); and the wrappers
bch2_write_buffer_flush() and flush_sync() take an explicit write ref.

This means internally the write buffer code can always use
BTREE_INSERT_NOCHECK_RW, instead of in the previous code passing flags
around and hoping the NOCHECK_RW flag was always carried around
correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:39 -05:00
Kent Overstreet
3c471b6588 bcachefs: convert bch_fs_flags to x-macro
Now we can print out filesystem flags in sysfs, useful for debugging
various "what's my filesystem doing" issues.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet
cb52d23e77 bcachefs: Rename BTREE_INSERT flags
BTREE_INSERT flags are actually transaction commit flags - rename them
for clarity.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
4d6128dca6 bcachefs: Guard against insufficient devices to create stripes
We can't create stripes if we don't have enough devices - this
manifested as an integer underflow bug later.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-13 21:42:22 -05:00
Kent Overstreet
c7046ed0cf bcachefs: Improve stripe checksum error message
We now include the name of the device in the error message - and also
increment the number of checksum errors on that device.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-05 13:14:22 -05:00
Kent Overstreet
aa98266588 bcachefs: bch2_ec_read_extent() now takes btree_trans
We're not supposed to have more than one btree_trans at a time in a
given thread - that causes recursive locking deadlocks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-05 13:13:57 -05:00
Kent Overstreet
da4aa3b001 bcachefs: bch2_stripe_to_text() now prints ptr gens
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-05 13:13:57 -05:00
Kent Overstreet
b65db750e2 bcachefs: Enumerate fsck errors
This patch adds a superblock error counter for every distinct fsck
error; this means that when analyzing filesystems out in the wild we'll
be able to see what sorts of inconsistencies are being found and repair,
and hence what bugs to look for.

Errors validating bkeys are not yet considered distinct fsck errors, but
this patch adds a new helper, bkey_fsck_err(), in order to add distinct
error types for them as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01 21:11:08 -04:00
Kent Overstreet
94119eeb02 bcachefs: Add IO error counts to bch_member
We now track IO errors per device since filesystem creation.

IO error counts can be viewed in sysfs, or with the 'bcachefs
show-super' command.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01 21:11:08 -04:00
Kent Overstreet
6bd68ec266 bcachefs: Heap allocate btree_trans
We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.

But we have to do a heap allocatation to initialize it anyways, so
there's no real downside to heap allocating the entire thing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:13 -04:00
Kent Overstreet
1809b8cba7 bcachefs: Break up io.c
More reorganization, this splits up io.c into
 - io_read.c
 - io_misc.c - fallocate, fpunch, truncate
 - io_write.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:12 -04:00
Kent Overstreet
e46c181af9 bcachefs: Convert more code to bch_err_msg()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:12 -04:00
Kent Overstreet
bf5a261c7a bcachefs: Assorted fixes for clang
clang had a few more warnings about enum conversion, and also didn't
like the opts.c initializer.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:09 -04:00
Kent Overstreet
7c50140fce bcachefs: Convert more -EROFS to private error codes
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:06 -04:00
Kent Overstreet
8726dc936f bcachefs: Change check for invalid key types
As part of the forward compatibility patch series, we need to allow for
new key types without complaining loudly when running an old version.

This patch changes the flags parameter of bkey_invalid to an enum, and
adds a new flag to indicate we're being called from the transaction
commit path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:06 -04:00
Kent Overstreet
73bd774d28 bcachefs: Assorted sparse fixes
- endianness fixes
 - mark some things static
 - fix a few __percpu annotations
 - fix silent enum conversions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:06 -04:00
Kent Overstreet
e53a961c6b bcachefs: Rename enum alloc_reserve -> bch_watermark
This is prep work for consolidating with JOURNAL_WATERMARK.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:04 -04:00
Kent Overstreet
1bb3c2a974 bcachefs: New error message helpers
Add two new helpers for printing error messages with __func__ and
bch2_err_str():
 - bch_err_fn
 - bch_err_msg

Also kill the old error strings in the recovery path, which were causing
us to incorrectly report memory allocation failures - they're not needed
anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:04 -04:00
Kent Overstreet
99a3d39893 bcachefs: ec: Fix a lost wakeup
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:04 -04:00
Kent Overstreet
d95dd378c2 bcachefs: allocate_dropping_locks()
Add two new helpers for allocating memory with btree locks held: The
idea is to first try the allocation with GFP_NOWAIT|__GFP_NOWARN, then
if that fails - unlock, retry with GFP_KERNEL, and then call
trans_relock().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:03 -04:00
Kent Overstreet
19c304bebd bcachefs: GFP_NOIO -> GFP_NOFS
GFP_NOIO dates from the bcache days, when we operated under the block
layer. Now, GFP_NOFS is more appropriate, so switch all GFP_NOIO uses to
GFP_NOFS.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:03 -04:00
Kent Overstreet
bcb79a51cb bcachefs: bch2_bkey_get_iter() helpers
Introduce new helpers for a common pattern:

  bch2_trans_iter_init();
  bch2_btree_iter_peek_slot();

 - bch2_bkey_get_iter_type() returns -ENOENT if it doesn't find a key of
   the correct type
 - bch2_bkey_get_val_typed() copies the val out of the btree to a
   (typically stack allocated) variable; it handles the case where the
   value in the btree is smaller than the current version of the type,
   zeroing out the remainder.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:00 -04:00
Kent Overstreet
174f930b8e bcachefs: bkey_ops.min_val_size
This adds a new field to bkey_ops for the minimum size of the value,
which standardizes that check and also enforces the new rule (previously
done somewhat ad-hoc) that we can extend value types by adding new
fields on to the end.

To make that work we do _not_ initialize min_val_size with sizeof,
instead we initialize it to the size of the first version of those
values.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:00 -04:00
Kent Overstreet
62a03559d6 bcachefs: Rip out code for storing backpointers in alloc keys
We don't store backpointers in alloc keys anymore, since we gained the
btree write buffer.

This patch drops support for backpointers in alloc keys, and revs the on
disk format version so that we know a fsck is required.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:59 -04:00
Kent Overstreet
6bdefe9c39 bcachefs: Use BTREE_ITER_INTENT in ec_stripe_update_extent()
This adds a flags param to bch2_backpointer_get_key() so that we can
pass BTREE_ITER_INTENT, since ec_stripe_update_extent() is updating the
extent immediately.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:58 -04:00
Brian Foster
8bff9875a6 bcachefs: use dedicated workqueue for tasks holding write refs
A workqueue resource deadlock has been observed when running fsck
on a filesystem with a full/stuck journal. fsck is not currently
able to repair the fs due to fairly rapid emergency shutdown, but
rather than exit gracefully the fsck process hangs during the
shutdown sequence. Fortunately this is easily recoverable from
userspace, but the root cause involves code shared between the
kernel and userspace and so should be addressed.

The deadlock scenario involves the main task in the bch2_fs_stop()
-> bch2_fs_read_only() path waiting on write references to drain
with the fs state lock held. A bch2_read_only_work() workqueue task
is scheduled on the system_long_wq, blocked on the state lock.
Finally, various other write ref holding workqueue tasks are
scheduled to run on the same workqueue and must complete in order to
release references that the initial task is waiting on.

To avoid this problem, we can split the dependent workqueue tasks
across different workqueues. It's a bit of a waste to create a
dedicated wq for the read-only worker, but there are several tasks
throughout the fs that follow the pattern of acquiring a write
reference and then scheduling to the system wq. Use a local wq
for such tasks to break the subtle dependency between these and the
read-only worker.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:58 -04:00
Kent Overstreet
b40901b0f7 bcachefs: New erasure coding shutdown path
This implements a new shutdown path for erasure coding, which is needed
for the upcoming BCH_WRITE_WAIT_FOR_EC write path.

The process is:
 - Cancel new stripes being built up
 - Close out/cancel open buckets on write points or the partial list
   that are for stripes
 - Shutdown rebalance/copygc
 - Then wait for in flight new stripes to finish

With BCH_WRITE_WAIT_FOR_EC, move ops will be waiting on stripes to fill
up before they complete; the new ec shutdown path is needed for shutting
down copygc/rebalance without deadlocking.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:57 -04:00
Kent Overstreet
65d48e3525 bcachefs: Private error codes: ENOMEM
This adds private error codes for most (but not all) of our ENOMEM uses,
which makes it easier to track down assorted allocation failures.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:57 -04:00
Kent Overstreet
aebe7a679c bcachefs: Fix stripe create error path
If we errored out on a new stripe before fully allocating it, we
shouldn't be zeroing out unwritten data.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:57 -04:00
Kent Overstreet
e6539b0aeb bcachefs: Improve bch2_new_stripes_to_text()
Print out the alloc reserve, and format it a bit more nicely.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:57 -04:00
Kent Overstreet
e28ef07e0e bcachefs: Simplify stripe_idx_to_delete
This is not technically correct - it's subject to a race if we ever end
up with a stripe with all empty blocks (that needs to be deleted) being
held open. But the "correct" version was much too inefficient, and soon
we'll be adding a stripes LRU.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:57 -04:00
Kent Overstreet
fba053d2aa bcachefs: Second layer of refcounting for new stripes
This will be used for move writes, which will be waiting until the
stripe is created to do the index update. They need to prevent the
stripe from being reclaimed until their index update is done, so we need
another refcount that just keeps the stripe open.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# Conflicts:
#	fs/bcachefs/ec.c
#	fs/bcachefs/io.c
2023-10-22 17:09:56 -04:00
Kent Overstreet
10d9f7d285 bcachefs: ec: fall back to creating new stripes for copygc
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:56 -04:00
Kent Overstreet
702ffea204 bcachefs: Extent helper improvements
- __bch2_bkey_drop_ptr() -> bch2_bkey_drop_ptr_noerror(), now available
   outside extents.

 - Split bch2_bkey_has_device() and bch2_bkey_has_device_c(), const and
   non const versions

 - bch2_extent_has_ptr() now returns the pointer it found

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:56 -04:00
Kent Overstreet
7635e1a6d6 bcachefs: Rework open bucket partial list allocation
Now, any open_bucket can go on the partial list: allocating from the
partial list has been moved to its own dedicated function,
open_bucket_add_bucets() -> bucket_alloc_set_partial().

In particular, this means that erasure coded buckets can safely go on
the partial list; the new location works with the "allocate an ec bucket
first, then the rest" logic.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:56 -04:00
Kent Overstreet
26bab33b69 bcachefs: Fix "btree node in stripe" error
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:56 -04:00
Kent Overstreet
2a912a9a39 bcachefs: Kill bch2_ec_bucket_written()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:56 -04:00
Kent Overstreet
81c771b266 bcachefs: Improve bch2_new_stripes_to_text()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:56 -04:00
Kent Overstreet
5be6a274ff bcachefs: Fix stripe reuse path
It's possible that we reuse a stripe that doesn't have quite the same
configuration as the stripe_head we're allocating from. In that case, we
have to make sure that the new stripe uses the settings from the stripe
we resue, not the stripe head, and make sure the buffer is allocated
correctly.

This fixes the ec_mixed_tiers test.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:55 -04:00
Kent Overstreet
e84face6f0 bcachefs: RESERVE_stripe
Rework stripe creation path - new algorithm for deciding when to create
new stripes or reuse existing stripes.

We add a new allocation watermark, RESERVE_stripe, above RESERVE_none.
Then we always try to create a new stripe by doing RESERVE_stripe
allocations; if this fails, we reuse an existing stripe and allocate
buckets for it with the reserve watermark for the given write
(RESERVE_none or RESERVE_movinggc).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:55 -04:00
Kent Overstreet
9d32097f3b bcachefs: More stripe create cleanup/fixes
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:55 -04:00
Kent Overstreet
a1fb08f5df bcachefs: Plumb alloc_reserve through stripe create path
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:55 -04:00
Kent Overstreet
8f2bbcdd9b bcachefs: ec: Improve error message for btree node in stripe
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:55 -04:00
Kent Overstreet
a64adedb86 bcachefs: ec: Ensure new stripe is closed in error path
This fixes a use-after-free bug.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:55 -04:00
Kent Overstreet
94bc95c468 bcachefs: ec: zero_out_rest_of_ec_bucket()
Occasionally, we won't write to an entire bucket. This fixes the EC code
to handle this case, zeroing out the rest of the bucket as needed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:55 -04:00
Kent Overstreet
0d763863af bcachefs: Improve bch2_stripe_to_text()
We now print pointers as bucket:offset, the same as how we print extent
pointers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:54 -04:00
Kent Overstreet
70ded998c5 bcachefs: get_stripe_key_trans()
Another nested btree_trans fix

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:54 -04:00
Kent Overstreet
e3877382fb bcachefs: Fix erasure coding shutdown path
It's possible when shutting down to for a stripe head to have a new
stripe that doesn't yet have any blocks allocated - we just need to free
it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:54 -04:00
Kent Overstreet
64784ade4f bcachefs: Fix buffer overrun in ec_stripe_update_extent()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:54 -04:00