The new guard(), scoped_guard() allow for more natural code.
Some of the uses with creative flow control have been left.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Fix a missing wakeup in
'bcachefs set-file-option' -> xattr option update -> inode_write
this was missing because the wakeup needs to happen after transaction
commit. Also, add a 'kick' counter, to make sure we don't miss a wakeup
that occured right after we finished checking the rebalance_work btree.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Let's not move poisoned extents unnecessarily, since we can't guard
against introducing more bitrot.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Fix bch2_bkey_clear_needs_rebalance(): indirect extents are never
supposed to have bch_extent_rebalance stripped off, because that's how
we get the IO path options when we don't have the original inode it
belonged to.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Previously, copygc and rebalance weren't started until the very end of
mounting, after all recvoery passes have finished.
But copygc really should be started earlier, since it may be needed for
allocations to make forward progress. Additionally, we've been seeing
occasional bug reports where starting the kthread fails due to a pending
signal - i.e. we're getting timed out by systemd (during a version
upgrade), but we're not seeing the signal until mount is about to
complete.
Additionally, we now have copygc/rebalance explicitly wait for
check_snapshots to complete (if being run); they require that for
snapshot_is_ancestor() in the data move path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This was planned to be done ages ago, now finally completed; there are
places where we have quite a few btree_trans objects on the stack, so
this reduces stack usage somewhat.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Not all compilers fully initialize these - they're not guaranteed to
because of the union shenanigans.
Fixes: https://github.com/koverstreet/bcachefs/issues/844
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
These are commonly needed when debugging, and saves from having to ask
users to dig.
Also, rebalance_status now includes pending rebalance work.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Previously, bch2_bkey_sectors_need_rebalance() called
bch2_target_accepts_data(), checking whether the target is writable.
However, this means that adding or removing devices from a target would
change the value of bch2_bkey_sectors_need_rebalance() for an existing
extent; this needs to be invariant so that the extent trigger can
correctly maintain rebalance_work accounting.
Instead, check target_accepts_data() in io_opts_to_rebalance_opts(),
before creating the bch_extent_rebalance entry.
This fixes (one?) cause of rebalance_work accounting being off.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We unconditionally go read-write, if we're going to do so, before
journal replay: lazy_rw is obsolete.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Since bch2_move_get_io_opts() now synchronizes io_opts with options from
bch_extent_rebalance, delete the ad-hoc logic in rebalance.c that
previously did this.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_move_get_io_opts() now synchronizes options loaded from the
filesystem and inode (if present, i.e. not walking the reflink btree
directly) with options from the bch_extent_rebalance_entry, updating the
extent if necessary.
Since bch_extent_rebalance tracks where its option came from we can
preserve "inode options override filesystem options", even for indirect
extents where we don't have access to the inode the options came from.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Previously, BCHFS_IOC_REINHERIT_ATTRS didn't trigger rebalance scans
when changing rebalance options - it had been missed, only the xattr
interface triggered them.
Ideally they'd be done by the transactional trigger, but unpacking the
inode to get the options is too heavy to be done in the low level
trigger - the inode trigger is run on every extent update, since the
bch_inode.bi_journal_seq has to be updated for fsync.
bch2_write_inode() is a good compromise, it already unpacks and repacks
and is not run in any super-fast paths.
Additionally, creating the new rebalance entry to trigger the scan is
now done in the same transaction as the inode update that changed the
options.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Using commit_do() to call alloc_sectors_start_trans() breaks when we're
randomly injecting transaction restarts - the restart in the commit
causes us to leak the lock that alloc_sectorS_start_trans() takes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
this was an oversight: rebalance is moving data to a specific device, so
we don't want it falling back to the full filesystem
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We're about to add new asserts for btree_trans locking consistency, and
part of that requires that aren't using the btree_trans while it's
unlocked.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Combine iter/update/trigger/str_hash flags into a single enum, and
x-macroize them for a to_text() function later.
These flags are all for a specific iter/key/update context, so it makes
sense to group them together - iter/update/trigger flags were already
given distinct bits, this cleans up and unifies that handling.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The "apply this compression method in the background" paths now use the
compression option if background_compression is not set; this means that
setting or changing the compression option will cause existing data to
be compressed accordingly in the background.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The variable tmp is being assigned a value but it isn't being
read afterwards. The assignment is redundant and so tmp can be
removed.
Cleans up clang scan build warning:
warning: Although the value stored to 'ret' is used in the enclosing
expression, the value is never actually read from 'ret'
[deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Also print out the data_opts, so that we can see what specifically is
being done to an extent.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes a bug with rebalance IOs getting stuck with reads completed,
but writes never being issued.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a tracepoint for rebalance, printing out
- the target option
- the compression option
- the key being rebalanced
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds a new btree, rebalance_work, to eliminate scanning required
for finding extents that need work done on them in the background - i.e.
for the background_target and background_compression options.
rebalance_work is a bitset btree, where a KEY_TYPE_set corresponds to an
extent in the extents or reflink btree at the same pos.
A new extent field is added, bch_extent_rebalance, which indicates that
this extent has work that needs to be done in the background - and which
options to use. This allows per-inode options to be propagated to
indirect extents - at least in some circumstances. In this patch,
changing IO options on a file will not propagate the new options to
indirect extents pointed to by that file.
Updating (setting/clearing) the rebalance_work btree is done by the
extent trigger, which looks at the bch_extent_rebalance field.
Scanning is still requrired after changing IO path options - either just
for a given inode, or for the whole filesystem. We indicate that
scanning is required by adding a KEY_TYPE_cookie key to the
rebalance_work btree: the cookie counter is so that we can detect that
scanning is still required when an option has been flipped mid-way
through an existing scan.
Future possible work:
- Propagate options to indirect extents when being changed
- Add other IO path options - nr_replicas, ec, to rebalance_work so
they can be applied in the background when they change
- Add a counter, for bcachefs fs usage output, showing the pending
amount of rebalance work: we'll probably want to do this after the
disk space accounting rewrite (moving it to a new btree)
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
More reorganization, this splits up io.c into
- io_read.c
- io_misc.c - fallocate, fpunch, truncate
- io_write.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This allows including a compression level when specifying a compression
type, e.g.
compression=zstd:15
Values from 1 through 15 indicate compression levels, 0 or unspecified
indicates the default.
For LZ4, values 3-15 specify that the HC algorithm should be used.
Note that for compatibility, extents themselves only include the
compression type, not the compression level. This means that specifying
the same compression algorithm but different compression levels for the
compression and background_compression options will have no effect.
XXX: perhaps we could add a warning for this
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>