linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2025-08-17 18:55:22 +00:00

Author	SHA1	Message	Date
Kent Overstreet	30792947c6	bcachefs: io_read: remove from async obj list in rbio_done() Previously, only split rbios allocated in io_read.c would be removed from the async obj list. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-07-13 17:45:13 -04:00
Kent Overstreet	6c4897caef	bcachefs: Fix bch2_read_bio_to_text() We can only pass negative error codes to bch2_err_str(); if it's a positive integer it's not an error and we trip an assert. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-16 20:35:42 -04:00
Kent Overstreet	dd22844f48	bcachefs: Read error message now prints if self healing Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-11 23:21:30 -04:00
Kent Overstreet	09b9c72bd4	bcachefs: bch_err_throw() Add a tracepoint for any time we return an error and unwind. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-02 12:16:35 -04:00
Kent Overstreet	18dad454cd	bcachefs: Replace rcu_read_lock() with guards The new guard(), scoped_guard() allow for more natural code. Some of the uses with creative flow control have been left. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-06-01 00:03:12 -04:00
Kent Overstreet	156d9e8341	bcachefs: Emit a single log message on data read error Instead of emitting a message immediately when we get an error in the read path, and then another at the end if we successfully retry - emit one single log message before returning from bch2_rbio_retry(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-21 20:14:32 -04:00
Kent Overstreet	41e51769b8	bcachefs: Make various async objs visible in debugfs Add async objs list for - promote_op - bch_read_bio - btree_read_bio - btree_write_bio This gets us introspection on in-flight async ops, and because under the hood it uses fast_lists (percpu slot buffer on top of a radix tree), it'll be fast enough to enable in production. This will be very helpful for debugging "something got stuck" issues, which have been cropping up from time to time (in the CI, especially with folio writeback). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-21 20:14:30 -04:00
Kent Overstreet	989b4c375a	bcachefs: bch2_read_bio_to_text Pretty printer for struct bch_read_bio. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-21 20:14:29 -04:00
Kent Overstreet	cca2c0d224	bcachefs: bch_dev.io_ref -> enumerated_ref Convert device IO refs to enumerated_refs, for easier debugging of refcount issues. Simple conversion: enumerate all users and convert to the new helpers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-21 20:14:28 -04:00
Kent Overstreet	c9b1d94a21	bcachefs: bch_fs.writes -> enumerated_refs Drop the single-purpose write ref code in bcachefs.h, and convert to enumarated refs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-21 20:14:27 -04:00
Kent Overstreet	d4d71b58e5	bcachefs: RO mounts now use less memory Defer memory allocations only needed in RW mode until we actually go RW. This is part of improved support for RO images. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-21 20:14:04 -04:00
Kent Overstreet	bcaea61adc	bcachefs: add missing include Hygeine, and fix build in userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-21 20:13:52 -04:00
Kent Overstreet	cb8336ca42	bcachefs: Data move can read from poisoned extents Now, if an extent is poisoned we can move it even if there was a checksum error. We'll have to give it a new checksum, but the poison bit means that userspace will still see the appropriate error when they try to read it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-21 20:13:21 -04:00
Kent Overstreet	760be1ad5e	bcachefs: Poison extents that can't be read due to checksum errors Copygc needs to be able to move extents that have bitrotted. We don't want to delete them - in the future we'll have an API for "read me the data even if there's checksum errors", and in general we don't want to delete anything unless the user asks us to. That will require writing it with a new checksum, which means we can't forget that there was a checksum error so we return the correct error to userspace. Rebalance also wants to skip bad extents; we can now use the poison flag for that. This is currently disabled by default, as we want read fua support so that we can distinguish between transient and permanent errors from the device. It may be enabled with the module parameter: poison_extents_on_checksum_error Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-21 20:13:19 -04:00
Kent Overstreet	6659ba3b18	bcachefs: Be precise about bch_io_failures If the extent we're reading from changes, due to be being overwritten or moved (possibly partially) - we need to reset bch_io_failures so that we don't accidentally mark a new extent as poisoned prematurely. This means we have to separately track (in the retry path) the extent we previously read from. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-21 20:13:17 -04:00
Kent Overstreet	a06459657e	bcachefs: Silence extent_poisoned error messages extent poisoning is partly so that we don't keep spewing the dmesg log when we've got unreadable data - we don't want to print these. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-04-15 11:34:37 -04:00
Kent Overstreet	7dfd42a07a	bcachefs: Don't print data read retry success on non-errors We may end up in the data read retry path when reading cached data and racing with invalidation, or on checksum error when we were reading into a userspace buffer that might have been modified while the read was in flight. These aren't real errors, so we shouldn't print the 'retry success' message. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-04-13 10:17:00 -04:00
Eric Biggers	4bf4b5046d	bcachefs: use library APIs for ChaCha20 and Poly1305 Just use the ChaCha20 and Poly1305 libraries instead of the clunky crypto API. This is much simpler. It is also slightly faster, since the libraries provide more direct access to the same architecture-optimized ChaCha20 and Poly1305 code. I've tested that existing encrypted bcachefs filesystems can be continue to be accessed with this patch applied. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-04-06 19:33:53 -04:00
Kent Overstreet	9180ad2e16	bcachefs: Kill btree_iter.trans This was planned to be done ages ago, now finally completed; there are places where we have quite a few btree_trans objects on the stack, so this reduces stack usage somewhat. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-04-02 10:24:34 -04:00
Kent Overstreet	dcffc3b1ae	bcachefs: Split up bch_dev.io_ref We now have separate per device io_refs for read and write access. This fixes a device removal bug where the discard workers were still running while we're removing alloc info for that device. It's also a bit of hardening; we no longer allow writes to devices that are read-only. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-04-02 10:24:34 -04:00
Kent Overstreet	7fdc3fa3cb	bcachefs: Log original key being moved in data updates There's something going on with the data move path; log the original key being moved for debugging. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-30 18:25:12 -04:00
Kent Overstreet	3ba0240a87	bcachefs: Fix silent short reads in data read retry path __bch2_read, before calling __bch2_read_extent(), sets bvec_iter.bi_size to "the size we can read from the current extent" with a swap, and restores it to "the size for the total read" after the read_extent call with another swap. But we neglected to do the restore before the "if (ret) goto err;" - which is a problem if we're retrying those errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-25 11:49:16 -04:00
Kent Overstreet	5e67243ea6	bcachefs: Add missing random.h includes Fix build in userspace, and good hygeine. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-24 09:50:34 -04:00
Kent Overstreet	35de2abc22	bcachefs: __bch2_read() now takes a btree_trans Next patch will be checking if the extent we're reading from matches the IO failure we saw before marking the failure. For this to work, __bch2_read() needs to take the same transaction context that bch2_rbio_retry() uses to do that check. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-24 09:50:34 -04:00
Kent Overstreet	3fb8bacb14	bcachefs: BCH_READ_data_update -> bch_read_bio.data_update Read flags are codepath dependent and change as they're passed around, while the fields in rbio._state are mostly fixed properties of that particular object. Losing track of BCH_READ_data_update would be bad, and previously it was not obvious if it was always correctly set in the rbio, so this is a safety cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-24 09:50:34 -04:00
Kent Overstreet	be31e412ac	bcachefs: Checksum errors get additional retries It's possible for checksum errors to be transient - e.g. flakey controller or cable, thus we need additional retries (besides retrying from different replicas) before we can definitely return an error. This is particularly important for the next patch, which will allow the data move path to move extents with checksum errors - we don't want to accidentally introduce bitrot due to a transient error! - bch2_bkey_pick_read_device() is substantially reworked, and bch2_dev_io_failures is expanded to record more information about the type of failure (i.e. number of checksum errors). It now returns an error code that describes more precisely the reason for the failure - checksum error, io error, or offline device, instead of the previous generic "insufficient devices". This is important for the next patches that add poisoning, as we only want to poison extents when we've got real checksum errors (or perhaps IO errors?) - not because a device was offline. - Add a new option and superblock field for the number of checksum retries. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-16 13:47:55 -04:00
Kent Overstreet	ccba9029b0	bcachefs: Print message on successful read retry Users have been asking for this, and now that errors are returned to the top level read retry path - we can. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-16 13:47:55 -04:00
Kent Overstreet	de73677ff8	bcachefs: Return errors to top level bch2_rbio_retry() Next patch will be adding an additional retry loop for checksum errors, so that we can rule out transient errors before marking an extent as poisoned. Prerequisite to this is returning errors to bch2_rbio_retry(); this will also let us add a "successful retry" message. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-16 13:47:55 -04:00
Kent Overstreet	881b598ef1	bcachefs: BCH_ERR_data_read_buffer_too_small Now that the read path uses proper error codes, we can get rid of the weird rbio->hole signalling to the move path that the read didn't happen. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-16 13:47:55 -04:00
Kent Overstreet	f4b84bac20	bcachefs: Read error message now indicates if it was for an internal move Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-16 13:47:55 -04:00
Kent Overstreet	e75993b0bf	bcachefs: Fix BCH_ERR_data_read_csum_err_maybe_userspace in retry path When we do a read to a buffer that's mapped into userspace, it's possible to get a spurious checksum error if userspace was modified the buffer at the same time. When we retry those, they have to be bounced before we know definitively whether we're reading corrupt data. But the retry path propagates read flags differently, so needs special handling. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-16 13:47:55 -04:00
Kent Overstreet	943f0cfb15	bcachefs: Convert read path to standard error codes Kill the READ_ERR/READ_RETRY/READ_RETRY_AVOID enums, and add standard error codes that describe precisely which error occured. This is going to be used for the data move path, to move but poison extents with checksum errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-16 13:47:55 -04:00
Kent Overstreet	5a06cb8000	bcachefs: Debug params for data corruption injection dm-flakey is busted, and this is simpler anyways - this lets us test the checksum error retry ptahs Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-16 13:47:55 -04:00
Kent Overstreet	3526bca36b	bcachefs: bch2_account_io_completion() We need to start accounting successes for every IO, not just failures, so introduce a unified hook for io completion accounting and convert io_read.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:16 -04:00
Kent Overstreet	3480aecd5f	bcachefs: Fix read path io_ref handling We were using our device pointer after we'd released our ref to it. Unlikely to be a race that's practical to hit, since actually removing a member device is a whole process besides just taking it offline, but - needs to be fixed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:16 -04:00
Kent Overstreet	06284963e3	bcachefs: bch2_inum_offset_err_msg_trans() no longer handles transaction restarts we're starting to use error messages with paths in fsck_errors(), where we do not want nested transaction restart handling, so let's prepare for that. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:12 -04:00
Kent Overstreet	157ea58341	bcachefs: Read/move path counter work Reorganize counters a bit, grouping related counters together. New counters: - io_read_inline - io_read_hole Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:12 -04:00
Kent Overstreet	f269ae55d2	bcachefs: Scrub Add a new data op to walk all data and metadata in a filesystem, checking if it can be read successfully, and on error repairing from another copy if possible. - New helper: bch2_dev_idx_is_online(), so that we can bail out and report to userspace when we're unable to scrub because the device is offline - data_update_opts, which controls the data move path, now understands scrub: data is only read, not written. The read path is responsible for rewriting on read error, as with other reads. - scrub_pred skips data extents that don't have checksums - bch_ioctl_data has a new scrub member, which has a data_types field for data types to check - i.e. all data types, or only metadata. - Add new entries to bch_move_stats so that we can report numbers for corrected and uncorrected errors - Add a new enum to bch_ioctl_data_event for explicitly reporting completion and return code (i.e. device offline) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:12 -04:00
Kent Overstreet	ca24130ee4	bcachefs: bch2_bkey_pick_read_device() can now specify a device To be used for scrub, where we want the read to come from a specific device. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:12 -04:00
Kent Overstreet	dff6de9518	bcachefs: Internal reads can now correct errors Rework the read path so that BCH_READ_NODECODE reads now also self-heal after a read error and a successful retry - prerequisite for scrub. - __bch2_read_endio() now handles a read that's both BCH_READ_NODECODE and a bounce. Normally, we don't want a BCH_READ_NODECODE read to ever allocate a split bch_read_bio: we want to maintain the relationship between the bch_read_bio and the data_update it's embedded in. But correcting read errors requires allocating a split/bounce rbio that's embedded in a promote_op. We do still have a 1-1 relationship, i.e. we only allocate a single split/bounce if it's a BCH_READ_NODECODE, so things hopefully don't get too crazy. - __bch2_read_extent() now is allowed to allocate the promote_op for rewriting after a failed read, even if it's BCH_READ_NODECODE. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:12 -04:00
Kent Overstreet	7b1d655106	bcachefs: Don't self-heal if a data update is already rewriting Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:12 -04:00
Kent Overstreet	4dfb76e0ad	bcachefs: Don't start promotes from bch2_rbio_free() we don't want to block completion of the read - starting a promote calls into the write path, which will block. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:12 -04:00
Kent Overstreet	29ad31c780	bcachefs: Self healing writes are BCH_WRITE_alloc_nowait If a drive is failing and we're moving data off of it, we can't necessairly depend on capacity/disk reservation calculations to avoid deadlocking/blocking on the allocator. And, we don't want to queue up infinite self healing moves anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:11 -04:00
Kent Overstreet	8ff92a9e4e	bcachefs: Promotes should use BCH_WRITE_only_specified_devs Promotes, like most other internal moves, should only go to the specified target and not fall back to allocating from the full filesystem. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:11 -04:00
Kent Overstreet	d0148e7169	bcachefs: Be stricter in bch2_read_retry_nodecode() Now that data_update embeds bch_read_bio, BCH_READ_NODECODE means that the read is embedded in a a data_update - and we can check in the retry path if the extent has changed and bail out. This likely fixes some subtle bugs with read errors and data moves. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:11 -04:00
Kent Overstreet	6f7111f820	bcachefs: cleanup redundant code around data_update_op initialization Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:11 -04:00
Kent Overstreet	8f97793d67	bcachefs: promote_op uses embedded bch_read_bio Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:11 -04:00
Kent Overstreet	dfa204b169	bcachefs: rbio_init() cleanup Move more initialization to rbio_init(), to assist in further cleanups. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:11 -04:00
Kent Overstreet	0f856b7228	bcachefs: rbio_init_fragment() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:11 -04:00
Kent Overstreet	14e2523fc5	bcachefs: Rename BCH_WRITE flags fer consistency with other x-macros enums The uppercase/lowercase style is nice for making the namespace explicit. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:11 -04:00

1 2 3

101 Commits