Commit Graph

185 Commits

Author SHA1 Message Date
Kent Overstreet
96dea3d599 bcachefs: Fix W=12 build errors
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:13 -04:00
Kent Overstreet
5902cc283c bcachefs: New io_misc.c helpers
This pulls the non vfs specific parts of truncate and finsert/fcollapse
out of fs-io.c, and moves them to io_misc.c.

This is prep work for logging these operations, to make them atomic in
the event of a crash.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:12 -04:00
Kent Overstreet
1809b8cba7 bcachefs: Break up io.c
More reorganization, this splits up io.c into
 - io_read.c
 - io_misc.c - fallocate, fpunch, truncate
 - io_write.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:12 -04:00
Kent Overstreet
e46c181af9 bcachefs: Convert more code to bch_err_msg()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:12 -04:00
Kent Overstreet
7573041ab9 bcachefs: Fix bch2_mount error path
In the bch2_mount() error path, we were calling
deactivate_locked_super(), which calls ->kill_sb(), which in our case
was calling bch2_fs_free() without __bch2_fs_stop().

This changes bch2_mount() to just call bch2_fs_stop() directly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:11 -04:00
Kent Overstreet
8e877caaad bcachefs: Split out snapshot.c
subvolume.c has gotten a bit large, this splits out a separate file just
for managing snapshot trees - BTREE_ID_snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:11 -04:00
Joshua Ashton
01a7e74fe1 bcachefs: Introduce bch2_dirent_get_name
A nice cleanup that avoids a bunch of open-coding name/string usage
around dirent usage.

Will be used by casefolding impl in future commits.

Signed-off-by: Joshua Ashton <joshua@froggi.es>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:10 -04:00
Joshua Ashton
791236b85c bcachefs: Add btree_trans* to inode_set_fn
This will be used when we need to re-hash a directory tree when setting
flags.

It is not possible to have concurrent btree_trans on a thread.

Signed-off-by: Joshua Ashton <joshua@froggi.es>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:10 -04:00
Kent Overstreet
dbbfca9f41 bcachefs: Split up fs-io.[ch]
fs-io.c is too big - time for some reorganization
 - fs-dio.c: direct io
 - fs-pagecache.c: pagecache data structures (bch_folio), utility code

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:10 -04:00
Kent Overstreet
e691b391f0 bcachefs: Add logging to bch2_inode_peek() & related
Add error messages when we fail to lookup an inode, and also add a few
missing bch2_err_class() calls.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:09 -04:00
Mikulas Patocka
5eaa76d813 bcachefs: mark bch_inode_info and bkey_cached as reclaimable
Mark these caches as reclaimable, so that available memory is correctly
reported when there is a lot of cached inodes.

Note that more work is needed - you should add __GFP_RECLAIMABLE to some
of the kmalloc calls, so that they are allocated from the "kmalloc-rcl-*"
caches.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:07 -04:00
Kent Overstreet
a83e108fc1 bcachefs: fiemap: Fix a lockdep splat
As with the previous patch, we generally can't hold btree locks while
copying to userspace, as that may incur a page fault and require
mmap_lock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:04 -04:00
Kent Overstreet
b0e8c75e40 bcachefs: Fix subvol deletion deadlock
d_prune_aliases() may call bch2_evict_inode(), which needs
c->vfs_inodes_list_lock.

Fix this by always calling igrab() before putting the inodes onto our
disposal list, and then calling d_prune_aliases() with
c->vfs_inodes_lock dropped.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:03 -04:00
Kent Overstreet
e47a390aa5 bcachefs: Convert -ENOENT to private error codes
As with previous conversions, replace -ENOENT uses with more informative
private error codes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:03 -04:00
Kent Overstreet
550a6a496d bcachefs: Enable large folios
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:59 -04:00
Kent Overstreet
dde72e1827 bcachefs: Add missing bch2_err_class() call
We're not supposed to return our private error codes to userspace.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:59 -04:00
Kent Overstreet
711bf946d5 bcachefs: Add an assert in inode_write for -ENOENT
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:57 -04:00
Kent Overstreet
9edbcc72f6 bcachefs: Fix bch2_evict_subvolume_inodes()
This fixes a bug in bch2_evict_subvolume_inodes(): d_mark_dontcache()
doesn't handle the case where i_count is already 0, we need to grab and
put the inode in order for it to be dropped.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:57 -04:00
Kent Overstreet
2d33036ca9 bcachefs: Fix for 'missing subvolume' error
Subvolumes, including their root inodes, get deleted asynchronously
after an unlink. But we still need to ensure that we tell the VFS the
inode has been deleted, otherwise VFS writeback could fire after
asynchronous deletion has finished, and try to write to an
inode/subvolume that no longer exists.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:57 -04:00
Kent Overstreet
792031116b bcachefs: Unwritten extents support
- bch2_extent_merge checks unwritten bit
 - read path returns 0s for unwritten extents without actually reading
 - reflink path skips over unwritten extents
 - bch2_bkey_ptrs_invalid() checks for extents with both written and
   unwritten extents, and non-normal extents (stripes, btree ptrs) with
   unwritten ptrs
 - fiemap checks for unwritten extents and returns
   FIEMAP_EXTENT_UNWRITTEN

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:51 -04:00
Kent Overstreet
78c0b75c34 bcachefs: More errcode cleanup
We shouldn't be overloading standard error codes now that we have
provisions for bcachefs-specific errorcodes: this patch converts super.c
and super-io.c to per error site errcodes, with a bit of cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:48 -04:00
Kent Overstreet
a7ecd30c83 bcachefs: Factor out two_state_shared_lock
We have a unique lock used for controlling adding to the pagecache: the
lock has two states, where both states are shared - the lock may be held
multiple times for either state - but not both states at the same time.

This is exactly what we need for nocow mode locking, so this patch pulls
it out of fs.c into its own file.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:45 -04:00
Kent Overstreet
3e3e02e6bc bcachefs: Assorted checkpatch fixes
checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:

 ASSIGN_IN_IF
 UNSPECIFIED_INT	- bcachefs coding style prefers single token type names
 NEW_TYPEDEFS		- typedefs are occasionally good
 FUNCTION_ARGUMENTS	- we prefer to look at functions in .c files
			  (hopefully with docbook documentation), not .h
			  file prototypes
 MULTISTATEMENT_MACRO_USE_DO_WHILE
			- we have _many_ x-macros and other macros where
			  we can't do this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:44 -04:00
Kent Overstreet
5c1ef830f6 bcachefs: Errcodes can now subtype standard error codes
The next patch is going to be adding private error codes for all the
places we return -ENOSPC.

Additionally, this patch updates return paths at all module boundaries
to call bch2_err_class(), to return the standard error code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:40 -04:00
Kent Overstreet
549d173c1b bcachefs: EINTR -> BCH_ERR_transaction_restart
Now that we have error codes, with subtypes, we can switch to our own
error code for transaction restarts - and even better, a distinct error
code for each transaction restart reason: clearer code and better
debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:37 -04:00
Kent Overstreet
d4bf5eecd7 bcachefs: Use bch2_err_str() in error messages
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:36 -04:00
Kent Overstreet
e68914ca84 bcachefs: Rename __bch2_trans_do() -> commit_do()
Better/more descriptive naming, and prep for adding
nested_lockrestart_do() and nested_commit_do().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:35 -04:00
Kent Overstreet
facc81479c bcachefs: Delete bch_writepage
Per Dave Chinner and the xfs folks, .writepage is no longer needed, and
it's better not to define it if .writepages is the intended path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:32 -04:00
Kent Overstreet
91d961badf bcachefs: darrays
Inspired by CCAN darray - simple, stupid resizable (dynamic) arrays.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:28 -04:00
Kent Overstreet
85d8cf161f bcachefs: bch2_btree_iter_peek_upto()
In BTREE_ITER_FILTER_SNAPHOTS mode, we skip over keys in unrelated
snapshots. When we hit the end of an inode, if the next inode(s) are in
a different subvolume, we could potentially have to skip past many keys
before finding a key we can return to the caller, so they can terminate
the iteration.

This adds a peek_upto() variant to solve this problem, to be used when
we know the range we're searching within.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:27 -04:00
Kent Overstreet
5521b1dfa2 bcachefs: Convert bch2_sb_to_text to master option list
Options no longer have to be manually added to bch2_sb_to_text() - it
now uses the master list of options in opts.h. Also, improve some of the
formatting by converting it to tabstops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:27 -04:00
Kent Overstreet
4eea53de8a bcachefs: Fix transaction path overflow in fiemap
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:27 -04:00
Kent Overstreet
fa8e94faee bcachefs: Heap allocate printbufs
This patch changes printbufs dynamically allocate and reallocate a
buffer as needed. Stack usage has become a bit of a problem, and a major
cause of that has been static size string buffers on the stack.

The most involved part of this refactoring is that printbufs must now be
exited with printbuf_exit().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:25 -04:00
Kent Overstreet
7c8f6f980d bcachefs: btree_id_cached()
Add a new helper that returns true if the given btree ID uses the btree
key cache. This enables some new cleanups, since the helper can check
the options for whether caching is enabled on a given btree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:23 -04:00
Kent Overstreet
669f87a5da bcachefs: Switch to __func__for recording where btree_trans was initialized
Symbol decoding, via %ps, isn't supported in userspace - this will also
be faster when we're using trans->fn in the fast path, as with the new
BCH_JSET_ENTRY_log journal messages.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:21 -04:00
Kent Overstreet
8244f3209b bcachefs: Option improvements
This adds flags for options that must be a power of two (block size and
btree node size), and options that are stored in the superblock as a
power of two (encoded extent max).

Also: options are now stored in memory in the same units they're
displayed in (bytes): we now convert when getting and setting from the
superblock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:19 -04:00
Kent Overstreet
9ca4853b98 bcachefs: Fix quota support for snapshots
Quota support was disabled when snapshots were released, because of some
tricky interactions with snpashots. We're sidestepping that for now -
we're simply disabling quota accounting on snapshot subvolumes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:17 -04:00
Kent Overstreet
85e95ca7cc bcachefs: Update export_operations for snapshots
When support for snapshots was merged, export operations weren't
updated yet. This patch adds new filehandle types for bcachefs that
include the subvolume ID and updates export operations for subvolumes -
and also .get_parent, support for which was added just prior to
snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:17 -04:00
Kent Overstreet
e3f2db39b3 bcachefs: Tweak vfs cache shrinker behaviour
In bcachefs, inodes and dentries are also cached - more compactly - by
the btree node cache, they don't require seeks to recreate.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:16 -04:00
Kent Overstreet
32b26e8c7f bcachefs: bch2_assert_pos_locked()
This adds a new assertion to be used by bch2_inode_update_after_write(),
which updates the VFS inode based on the update to the btree inode we
just did - we require that the btree inode still be locked when we do
that update.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:16 -04:00
Kent Overstreet
68a2054d88 bcachefs: Switch fsync to use bi_journal_seq
Now that we're recording in each inode the journal sequence number of
the most recent update, fsync becomes a lot simpler and we can delete
all the plumbing for ei_journal_seq.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:16 -04:00
Kent Overstreet
0e030f5e20 bcachefs: Kill journal buf bloom filter
This was used for recording which inodes have been modified by in flight
journal writes, but was broken and has been superceded.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:16 -04:00
Kent Overstreet
3e52c22255 bcachefs: Add journal_seq to inode & alloc keys
Add fields to inode & alloc keys that record the journal sequence number
when they were most recently modified.

For alloc keys, this is needed to know what journal sequence number we
have to flush before the bucket can be reused. Currently this is tracked
in memory, but we'll be getting rid of the in memory bucket array.

For inodes, this is needed for fsync when the inode has been evicted
from the vfs cache. Currently we use a bloom filter per outstanding
journal buf - but that mechanism has been broken since we added the
ability to not issue a flush/fua for every journal write.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:16 -04:00
Kent Overstreet
41f9b7d39f bcachefs: Move bch2_evict_subvolume_inodes() to fs.c
This fixes building in userspace - code that's coupled to the kernel VFS
interface should live in fs.c

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:15 -04:00
Kent Overstreet
2027875bd8 bcachefs: Add BCH_SUBVOLUME_UNLINKED
Snapshot deletion needs to become a multi step process, where we unlink,
then tear down the page cache, then delete the subvolume - the deleting
flag is equivalent to an inode with i_nlink = 0.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:15 -04:00
Kent Overstreet
9a796fdb06 bcachefs: bch2_trans_exit() no longer returns errors
Now that peek_node()/next_node() are converted to return errors
directly, we don't need bch2_trans_exit() to return errors - it's
cleaner this way and wasn't used much anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:14 -04:00
Kent Overstreet
42d237320e bcachefs: Snapshot creation, deletion
This is the final patch in the patch series implementing snapshots.
This patch implements two new ioctls that work like creation and
deletion of directories, but fancier.

 - BCH_IOCTL_SUBVOLUME_CREATE, for creating new subvolumes and snaphots
 - BCH_IOCTL_SUBVOLUME_DESTROY, for deleting subvolumes and snapshots

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:13 -04:00
Kent Overstreet
6fed42bb77 bcachefs: Plumb through subvolume id
To implement snapshots, we need every filesystem btree operation (every
btree operation without a subvolume) to start by looking up the
subvolume and getting the current snapshot ID, with
bch2_subvolume_get_snapshot() - then, that snapshot ID is used for doing
btree lookups in BTREE_ITER_FILTER_SNAPSHOTS mode.

This patch adds those bch2_subvolume_get_snapshot() calls, and also
switches to passing around a subvol_inum instead of just an inode
number.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:12 -04:00
Kent Overstreet
284ae18c1d bcachefs: Add subvolume to ei_inode_info
Filesystem operations generally operate within a subvolume: at the start
of every btree transaction we'll be looking up (and locking) the
subvolume to get the current snapshot ID, which we then use for our
other btree lookups in BTREE_ITER_FILTER_SNAPSHOTS mode.

But inodes don't record what subvolume they're in - they can't, because
if they did we'd have to update every single inode within a subvolume
when taking a snapshot in order to keep that field up to date. So it
needs to be tracked in memory, based on how we got to that inode.

Hence this patch adds a subvolume field to ei_inode_info, and switches
to iget5() so we can index by it in the inode hash table.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:12 -04:00
Kent Overstreet
67e0dd8f0d bcachefs: btree_path
This splits btree_iter into two components: btree_iter is now the
externally visible componont, and it points to a btree_path which is now
reference counted.

This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:11 -04:00
Kent Overstreet
1a488e7306 bcachefs: Kill BTREE_INSERT_NOUNLOCK
With the recent transaction restart changes, it's no longer needed - all
transaction commits have BTREE_INSERT_NOUNLOCK semantics.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:10 -04:00
Kent Overstreet
700c25b32a bcachefs: Use bch2_trans_begin() more consistently
Upcoming patch will require that a transaction restart is always
immediately followed by bch2_trans_begin().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
47924527e6 Revert "bcachefs: statfs bfree and bavail should be the same"
This reverts commit 664f9847bec525d396d62d2db094ca9020289ae0.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:08 -04:00
Dan Robertson
e8e9607f3c bcachefs: statfs bfree and bavail should be the same
The value of f_bfree and f_bavail should be the same. The value of
f_bfree is not currently scaled by the availability factor.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:07 -04:00
Kent Overstreet
68a507a2e8 bcachefs: fix truncate with ATTR_MODE
After the v5.12 rebase, we started oopsing when truncate was passed
ATTR_MODE, due to not passing mnt_userns to setattr_copy(). This
refactors things so that truncate/extend finish by using
bch2_setattr_nonsize(), which solves the problem.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:07 -04:00
Dan Robertson
044c8c9e05 bcachefs: mount: fix null deref with null devname
- Fix null deref on mount when given a null device name.
 - Move the dev_name checks to return EINVAL when it is invalid.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:06 -04:00
Kent Overstreet
f7beb4ca04 bcachefs: Preallocate transaction mem
This helps avoid transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
ddc7dd62f0 bcachefs: Don't use uuid in tracepoints
%pU for printing out pointers to uuids doesn't work in perf trace

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:04 -04:00
Dan Robertson
ed34341189 bcachefs: statfs resports incorrect avail blocks
The current implementation of bch_statfs does not scale the number of
available blocks provided in f_bavail by the reserve factor. This causes
an allocation of a file of this size to fail.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:03 -04:00
Stijn Tintel
ffcf9ec78c bcachefs: avoid out-of-bounds in split_devs
Calling mount with an empty source string causes an out-of-bounds error
in split_devs. Check the length of the source string to avoid this.

Signed-off-by: Stijn Tintel <stijn@linux-ipv6.be>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:03 -04:00
Kent Overstreet
595c1e9bab bcachefs: Fix time handling
There were some overflows in the time conversion functions - fix this by
converting tv_sec and tv_nsec separately. Also, set sb->time_min and
sb->time_max.

Fixes xfstest generic/258.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:02 -04:00
Kent Overstreet
050197b1c1 bcachefs: Ensure that fpunch updates inode timestamps
Fixes xfstests generic/059

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:02 -04:00
Kent Overstreet
e0ba3b6429 bcachefs: Replace bch2_btree_iter_next() calls with bch2_btree_iter_advance
The way btree iterators work internally has been changing, particularly
with the iter->real_pos changes, and bch2_btree_iter_next() is no longer
hyper optimized - it's just advance followed by peek, so it's more
efficient to just call advance where we're not using the return value of
bch2_btree_iter_next().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:56 -04:00
Kent Overstreet
50dc0f692a bcachefs: Require all btree iterators to be freed
We keep running into occasional bugs with btree transaction iterators
overflowing - this will make those bugs more visible.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:56 -04:00
Kent Overstreet
5ff75ccbbc bcachefs: Fix read retry path for indirect extents
In the read path, for retry of indirect extents to work we need to
differentiate between the location in the btree the read was for, vs.
the location where we found the data. This patch adds that plumbing to
bch_read_bio.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:56 -04:00
Kent Overstreet
07bca3bd1e bcachefs: Kill ei_str_hash
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:55 -04:00
Kent Overstreet
5f0e4ae1c7 bcachefs: Use __bch2_trans_do() in a few more places
Minor cleanup, it was being open coded.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:55 -04:00
Kent Overstreet
41f8b09edc bcachefs: Rename BTREE_ID enums for consistency with other enums
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:55 -04:00
Kent Overstreet
07a1006ae8 bcachefs: Reduce/kill BKEY_PADDED use
With various newer key types - stripe keys, inline data extents - the
old approach of calculating the maximum size of the value is becoming
more and more error prone. Better to switch to bkey_on_stack, which can
dynamically allocate if necessary to handle any size bkey.

In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:50 -04:00
Kent Overstreet
33c74e4119 bcachefs: Flag inodes that had btree update errors
On write error, the vfs inode's i_size may be inconsistent with the
btree inode's i_size - flag this so we don't have spurious assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:49 -04:00
Kent Overstreet
6584e84a97 bcachefs: Don't use bkey cache for inode update in fsck
fsck doesn't know about the btree key cache, and non-cached iterators
aren't cache coherent (yet?)

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:48 -04:00
Kent Overstreet
0b5c9f5940 bcachefs: Set preallocated transaction mem to avoid restarts
this will reduce transaction restarts, from observation of tracepoints.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:47 -04:00
Kent Overstreet
eb8e6e9ccb bcachefs: Deadlock prevention for ei_pagecache_lock
In the dio write path, when get_user_pages() invokes the fault handler
we have a recursive locking situation - we have to handle the lock
ordering ourselves or we have a deadlock: this patch addresses that by
checking for locking ordering violations and doing the unlock/relock
dance if necessary.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:46 -04:00
Kent Overstreet
b735d73a00 bcachefs: Build fixes for 32bit x86
PAGE_SIZE and size_t are not unsigned longs on 32 bit, annoying...

also switch to atomic64_cmpxchg instead of cmpxchg() for
journal_seq_copy, as atomic64_cmpxchg has a fallback that uses spinlocks
for when it's not supported.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:46 -04:00
Kent Overstreet
df082b3a50 bcachefs: Report inode counts via statfs
Took awhile to figure out exactly what statfs wanted...

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:46 -04:00
Kent Overstreet
527087c741 bcachefs: Fix stack corruption
A bkey_on_stack_realloc() call was in the wrong place, and broken for
indirect extents

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet
e7b854b1f7 bcachefs: fiemap fixes
- fiemap didn't know about inline extents, fixed
 - advancing to the next extent after we'd chased a pointer to the
   reflink btree was wrong, fixed

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet
13dcd4abcd bcachefs: Fix rare use after free in read path
If the bkey_on_stack_reassemble() call in __bch2_read_indirect_extent()
reallocates the buffer, k in bch2_read - which we pointed at the
bkey_on_stack buffer - will now point to a stale buffer. Whoops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet
a10e677a15 bcachefs: Fix for passing target= opts as mount opts
Some options can't be parsed until the filesystem initialized;
previously, passing these options to mount or remount would cause mount
to fail.

This changes the mount path so that we parse the options passed in
twice, and just ignore any options that can't be parsed the first time.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet
61ce38b862 bcachefs: Fix journal_seq_copy()
We also need to update the journal's bloom filter of inode numbers that
each journal write has upudates for - in case the inode gets evicted
before it gets fsynced.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
d5e4dcc29c bcachefs: Fix unmount path
There was a long standing race in the mount/unmount code - the VFS
intends for mount/unmount synchronizatino to be handled by the list of
superblocks, but we were still holding devices open after tearing down
our superblock in the unmount path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
625104ea21 bcachefs: Don't fail mount if device has been removed
Also - make sure to show the devices we actually have open in /proc

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
ac7eef0318 bcachefs: Don't report inodes to statfs
We don't have a limit on the number of inodes in a filesystem, so this
is apparently the right way to report that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
1ada160618 bcachefs: Turn c->state_lock into an rwsem
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:41 -04:00
Yuxuan Shui
22d8a33d30 bcachefs: fix stack corruption
When a bkey_on_stack is passed to bch_read_indirect_extent, there is no
guarantee that it will be big enough to hold the bkey. And
bch_read_indirect_extent is not aware of bkey_on_stack to call realloc
on it. This cause a stack corruption.

This commit makes bch_read_indirect_extent aware of bkey_on_stack so it
can call realloc when appropriate.

Tested-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:39 -04:00
Kent Overstreet
47c46c9531 bcachefs: Add another mssing bch2_trans_iter_put() call
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:38 -04:00
Kent Overstreet
58e2388f9e bcachefs: Kill BTREE_INSERT_ATOMIC
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:33 -04:00
Justin Husted
184b1dc1a6 bcachefs: Update directory timestamps during link
Timestamp updates on the directory during a link operation were cached.
This is inconsistent with other metadata operations such as rename, as
well as being less efficient.

Signed-off-by: Justin Husted <sigstop@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:33 -04:00
Kent Overstreet
35189e09ab bcachefs: bkey_on_stack
This implements code for storing small bkeys on the stack and allocating
out of a mempool if they're too big.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Kent Overstreet
4a1d8d3efc bcachefs: Fix setting of attributes mask in getattr
Discovered by xfstests generic/553

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet
821a99b7ba bcachefs: Switch to .iterate_shared for readdir
We definitely don't need an exclusive inode lock for readdir.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
Kent Overstreet
e0541a9346 bcachefs: Kill some dependencies on ei_inode
Moving bch2_extent_update() to io.c will be greatly simplified if we
no longer have to keep ei_inode.bi_size/bi_sectors up to date.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
Kent Overstreet
9638574229 bcachefs: Factor out fs-common.c
This refactoring makes the code easier to understand by separating the
bcachefs btree transactional code from the linux VFS code - but more
importantly, it's also to share code with the fuse port.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:28 -04:00
Kent Overstreet
58677a1d40 bcachefs: bch2_inode_peek()/bch2_inode_write()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:28 -04:00
Kent Overstreet
6988e85be5 bcachefs: Trust inode in btree over bch_inode_info
This is the start of some refactoring work to make less code depend on
the linux VFS - here the inode cache - to make e.g. the fuse port
easier.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:28 -04:00
Kent Overstreet
a7199432c3 bcachefs: Kill deferred btree updates
Will be replaced by cached btree iterators

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:28 -04:00
Kent Overstreet
b43a0f60a6 bcachefs: Cleanup i_nlink handling
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:28 -04:00
Kent Overstreet
05cf02b5a1 bcachefs: Fix fiemap (again)
when iterating over reflink pointers, we use the key we just emitted to
set the iterator position - which means we have to be setting the key's
inode field as well

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:26 -04:00
Kent Overstreet
7d5224fcdc bcachefs: Optimize fiemap
Reflink caused fiemap performance to regress badly - this gets us back
to where we were.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:26 -04:00
Kent Overstreet
3fb5ebcdd4 bcachefs: Inline some fast paths
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:25 -04:00