linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2025-08-18 03:39:02 +00:00

Author	SHA1	Message	Date
Brian Foster	bf98ee10d4	bcachefs: folio pos to bch_folio_sector index helper Create a small helper to translate from file offset to the associated bch_folio_sector index in the underlying bch_folio. The helper assumes the file offset is covered by the passed folio. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:00 -04:00
Brian Foster	6b9857b208	bcachefs: use u64 for folio end pos to avoid overflows Some of the folio_end_() helpers are prone to overflow of signed 64-bit types because the mapping is only limited by the max value of loff_t and the associated helpers return the start offset of the next folio. Therefore, a folio_end_pos() of the max allowable folio in a mapping returns a value that overflows loff_t. This makes it hard to rely on such values when doing folio processing across a range of a file, as bcachefs attempts to do with the recent folio changes. For example, generic/564 causes problems in the buffered write path when testing writes at max boundary conditions. The current understanding is that the pagecache historically limited the mapping to one less page to avoid this problem and this was dropped with some of the folio conversions, but may be reinstated to properly address the problem. In the meantime, update the internal folio_end_() helpers in bcachefs to return a u64, and all of the associated code to use or cast to u64 to avoid overflow problems. This allows generic/564 to pass and can be reverted back to using loff_t if at any point the pagecache subsystem can guarantee these boundary conditions will not overflow. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:00 -04:00
Brian Foster	335f7d4f22	bcachefs: clean up post-eof folios on -ENOSPC The buffered write path batches folio creations in the file mapping based on the requested size of the write. Under low free space conditions, it is possible to add a bunch of folios to the mapping and then return a short write or -ENOSPC due to lack of space. If this occurs on an extending write, the file size is updated based on the amount of data successfully written to the file. If folios were added beyond the final i_size, they may hang around until reclaimed, truncated or encountered unexpectedly by another operation. For example, generic/083 reproduces a sequence of events where a short write leaves around one or more post-EOF folios on an inode, a subsequent zero range request extends beyond i_size and overlaps with an aforementioned folio, and __bch2_truncate_folio() happens across it and complains. Update __bch2_buffered_write() to keep track of the start offset of the last folio added to the mapping for a prospective write. After i_size is updated, check whether this offset starts beyond EOF. If so, truncate pagecache beyond the latest EOF to clean up any folios that don't reside at least partially within EOF upon completion of the write. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:00 -04:00
Brian Foster	4ad6aa46e1	bcachefs: fix truncate overflow if folio is beyond EOF generic/083 occasionally reproduces a panic caused by an overflow when accessing the bch_folio_sector array of the folio being processed by __bch2_truncate_folio(). The immediate cause of the overflow is that the folio offset is beyond i_size, and therefore the sector index calculation underflows on subtraction of the folio offset. One cause of this is mainly observed on nocow mounts. When nocow is enabled, fallocate performs physical block allocation (as opposed to block reservation in cow mode), which range_has_data() then interprets as valid data that requires partial zeroing on truncate. Therefore, if a post-eof zero range request lands across post-eof preallocated blocks, __bch2_truncate_folio() may actually create a post-eof folio in order to perform zeroing. To avoid this problem, update range_has_data() to filter out unwritten blocks from folio creation and partial zeroing. Even though we should never create folios beyond EOF like this, the mere existence of such folios is not necessarily a fatal error. Fix up the truncate code to warn about this condition and not overflow the sector array and possibly crash the system. The addition of this warning without the corresponding unwritten extent fix has shown that various other fstests are able to reproduce this problem fairly frequently, but often in ways that don't necessarily result in a kernel panic or a change in user observable behavior, and therefore the problem goes undetected. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	34fdcf0632	bcachefs: Check for folios that don't have bch_folio attached With large folios, it's now incidentally possible to end up with a clean, uptodate folio in the page cache that doesn't have a bch_folio attached, if a folio has to be split. This patch fixes __bch2_truncate_folio() to check for this; other code paths appear to handle it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	9567413c82	bcachefs: bch2_readahead() large folio conversion Readahead now uses the new filemap_get_contig_folios_d() helper. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	40022c0115	bcachefs: filemap_get_contig_folios_d() Add a new helper for getting a range of contiguous folios and returning them in a darray. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	a1774a0564	bcachefs: bch_folio_sector_state improvements - X-macro-ize the bch_folio_sector_state enum: this means we can easily generate strings, which is helpful for debugging. - Add helpers for state transitions: folio_sector_dirty(), folio_sector_undirty(), folio_sector_reserve() - Add folio_sector_set(), a single helper for changing folio sector state just so that we have a single place to instrument when we're debugging. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	959f7368d6	bcachefs: bch2_truncate_page() large folio conversion Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	c42b57c451	bcachefs: bch2_buffered_write large folio conversion Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	49fe78ff33	bcachefs: bch_folio can now handle multi-order folios Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	33e2eb9677	bcachefs: More assorted large folio conversion Various misc small conversions in fs-io.c for large folios. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	a86a92cb0d	bcachefs: bch2_seek_pagecache_data() folio conversion This converts bch2_seek_pagecache_data() to handle large folios. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	e8d28c3e47	bcachefs: bch2_seek_pagecache_hole() folio conversion This converts bch2_seek_pagecache_hole() to handle large folios. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	ff9c301f28	bcachefs: bio_for_each_segment_all() -> bio_for_each_folio_all() This converts the writepage end_io path to folios. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	30bff5944e	bcachefs: Initial folio conversion This converts fs-io.c to pass folios, not pages. We're not handling large folios yet, there's no functional changes in this patch - just a lot of churn doing the initial type conversions. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	3342ac134d	bcachefs: Rename bch_page_state -> bch_folio Start of the large folio conversion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	c437e15379	bcachefs: Add a bch_page_state assert Seeing an odd bug with page/folio state not being properly initialized, this is to help track it down. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	65d48e3525	bcachefs: Private error codes: ENOMEM This adds private error codes for most (but not all) of our ENOMEM uses, which makes it easier to track down assorted allocation failures. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:57 -04:00
Kent Overstreet	a8b3a677e7	bcachefs: Nocow support This adds support for nocow mode, where we do writes in-place when possible. Patch components: - New boolean filesystem and inode option, nocow: note that when nocow is enabled, data checksumming and compression are implicitly disabled - To prevent in-place writes from racing with data moves (data_update.c) or bucket reuse (i.e. a bucket being reused and re-allocated while a nocow write is in flight), we have a new locking mechanism. Buckets can be locked for either data update or data move, using a fixed size hash table of two_state_shared locks. We don't have any chaining, meaning updates and moves to different buckets that hash to the same lock will wait unnecessarily - we'll want to watch for this becoming an issue. - The allocator path also needs to check for in-place writes in flight to a given bucket before giving it out: thus we add another counter to bucket_alloc_state so we can track this. - Fsync now may need to issue cache flushes to block devices instead of flushing the journal. We add a device bitmask to bch_inode_info, ei_devs_need_flush, which tracks devices that need to have flushes issued - note that this will lead to unnecessary flushes when other codepaths have already issued flushes, we may want to replace this with a sequence number. - New nocow write path: look up extents, and if they're writable write to them - otherwise fall back to the normal COW write path. XXX: switch to sequence numbers instead of bitmask for devs needing journal flush XXX: ei_quota_lock being a mutex means bch2_nocow_write_done() needs to run in process context - see if we can improve this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:51 -04:00
Kent Overstreet	792031116b	bcachefs: Unwritten extents support - bch2_extent_merge checks unwritten bit - read path returns 0s for unwritten extents without actually reading - reflink path skips over unwritten extents - bch2_bkey_ptrs_invalid() checks for extents with both written and unwritten extents, and non-normal extents (stripes, btree ptrs) with unwritten ptrs - fiemap checks for unwritten extents and returns FIEMAP_EXTENT_UNWRITTEN Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:51 -04:00
Kent Overstreet	70de7a47e2	bcachefs: bch2_extent_fallocate() This factors out part of __bchfs_fallocate() in fs-io.c into a new, lower level io.c helper, which creates a single extent reservation. This is prep work for nocow support - the new helper will shortly gain the ability to create unwritten extents. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:51 -04:00
Kent Overstreet	d94189ad56	bcachefs: Debug mode for c->writes references This adds a debug mode where we split up the c->writes refcount into distinct refcounts for every codepath that takes a reference, and adds sysfs code to print the value of each ref. This will make it easier to debug shutdown hangs due to refcount leaks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:50 -04:00
Kent Overstreet	c72f687a1f	bcachefs: Use for_each_btree_key_upto() more consistently It's important that in BTREE_ITER_FILTER_SNAPSHOTS mode we always use peek_upto() and provide an end for the interval we're searching for - otherwise, when we hit the end of the inode the next inode may be in a different subvolume and not have any keys in the current snapshot, and we'd iterate over arbitrarily many keys before returning one. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:50 -04:00
Kent Overstreet	01ad673727	bcachefs: bch2_inode_opts_get() This improves io_opts() and makes it a non-inline function - it's big enough that it probably shouldn't be. Also, bch_io_opts no longer needs fields for whether options are defined, so we can slim it down a bit. We'd like to stop passing around the full bch_io_opts, but that'll be tricky because of bch2_rebalance_add_key(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:49 -04:00
Kent Overstreet	e88a75ebe8	bcachefs: New bpos_cmp(), bkey_cmp() replacements This patch introduces - bpos_eq() - bpos_lt() - bpos_le() - bpos_gt() - bpos_ge() and equivalent replacements for bkey_cmp(). Looking at the generated assembly these could probably be improved further, but we already see a significant code size improvement with this patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:47 -04:00
Kent Overstreet	4d868d18e5	bcachefs: More dio inlining Eliminate another function call in the O_DIRECT write path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:46 -04:00
Kent Overstreet	7fec8266af	bcachefs: Error message improvement - Centralize format strings in bcachefs.h - Add bch2_fmt_inum_offset() and related helpers - Switch error messages for inodes to also print out the offset, in bytes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:46 -04:00
Kent Overstreet	8eb71e9e1a	bcachefs: Improve a few warnings Warnings ought to always have a format string/log message - makes them considerably more useful. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:46 -04:00
Kent Overstreet	6b1b186a5a	bcachefs: Minor dio write path improvements This switches where we take quota reservations to be per bch_write_op instead of per dio_write, so we can drop the quota reservation in the same place as we call i_sectors_acct(), and only take/release ei_quota_lock once. In the future we'd like ei_quota_lock to not be a mutex, so that we can avoid punting to process context before delivering write completions in nocow mode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:46 -04:00
Kent Overstreet	a7ecd30c83	bcachefs: Factor out two_state_shared_lock We have a unique lock used for controlling adding to the pagecache: the lock has two states, where both states are shared - the lock may be held multiple times for either state - but not both states at the same time. This is exactly what we need for nocow mode locking, so this patch pulls it out of fs.c into its own file. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:45 -04:00
Kent Overstreet	a1ee777bfc	bcachefs: Kill BCH_WRITE_FLUSH BCH_WRITE_FLUSH is a write flag that causes a journal flush. It's only used in the direct IO path, and this will allow for some consolidation with the regular fsync path, which will help with the upcoming nocow mode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:45 -04:00
Kent Overstreet	182c7bbfbf	bcachefs: DIO write path optimization - With BCH_WRITE_SYNC, we no longer need the completion in struct dio_write - Pull out bch2_dio_write_copy_iov() into a separate non-inline function, it's code that doesn't run in the common case - Copy mapping and inode pointers into dio_write, avoiding pointer chasing at the start of bch2_dio_write_loop() - kthread_use_mm() is not needed in the common case; move it into bch2_dio_write_loop_async() - factor out various helpers from bch2_dio_write_loop() and rework control flow for better icache utilization Other small optimizations: - bch2_keylist_free() is only used in one place, at the end of the bch2_write() path - drop the reinit - in bch2_disk_reservation_put(), check if res->sectors is nonzero before touching c->online_reserved, since that will likely be a cache miss Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> bcachefs: More DIO write path optimization Better code prefetching (?) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:45 -04:00
Kent Overstreet	1df3e19996	bcachefs: BCH_WRITE_SYNC This adds a new flag for the write path, BCH_WRITE_SYNC, and switches the O_DIRECT write path to use it when we're not running asynchronously. It runs the btree update after the write in the original thread's context instead of a kworker, cutting context switches in half. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:45 -04:00
Kent Overstreet	80fe580c8d	bcachefs: Fix a spurious warning Fixes fstests generic/648 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:45 -04:00
Kent Overstreet	353448f3ea	bcachefs: Fix buffered write path for generic/275 Per fstests generic/275, on -ENOSPC we're supposed to write until the filesystem is full - i.e. do a partial write instead of failing the full write. This is a partial fix for the buffered write path: we'll still fail on a page boundary. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:45 -04:00
Kent Overstreet	3e3e02e6bc	bcachefs: Assorted checkpatch fixes checkpatch.pl gives lots of warnings that we don't want - suggested ignore list: ASSIGN_IN_IF UNSPECIFIED_INT - bcachefs coding style prefers single token type names NEW_TYPEDEFS - typedefs are occasionally good FUNCTION_ARGUMENTS - we prefer to look at functions in .c files (hopefully with docbook documentation), not .h file prototypes MULTISTATEMENT_MACRO_USE_DO_WHILE - we have _many_ x-macros and other macros where we can't do this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:44 -04:00
Kent Overstreet	bd954215ca	bcachefs: Quota fixes - We now correctly allow soft limits to be exceeded, instead of always returning -EDQUOT - Disk quota grace times/warnings can now be set, not just the systemwide defaults Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:44 -04:00
Kent Overstreet	07bfcc0b4c	bcachefs: Fix for not dropping privs in fallocate When modifying a file, we may be required to drop the suid/sgid bits - we were missing a file_modified() call to do this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:43 -04:00
Kent Overstreet	3a4d3656e5	bcachefs: Fix bch2_write_begin() An error case was jumping to the wrong label, creating an infinite loop - oops. This fixes fstests generic/648. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:43 -04:00
Kent Overstreet	e8540e5681	bcachefs: Reflink now respects quotas This adds a new helper, quota_reserve_range(), which takes a quota reservation for unallocated blocks in a given file range, and uses it in bch2_remap_file_range(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:43 -04:00
Kent Overstreet	2d848dacb2	bcachefs: Kill io_in_flight semaphore This used to be needed more for buffered IO, but now the block layer has writeback throttling - we can delete this now. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:42 -04:00
Kent Overstreet	098ef98d5b	bcachefs: Add private error codes for ENOSPC Continuing the saga of introducing private dedicated error codes for each error path, this patch converts ENOSPC to error codes that are subtypes of ENOSPC. We've recently had a test failure where we got -ENOSPC where we shouldn't have, and didn't have enough information to tell where it came from, so this patch will solve that problem. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:40 -04:00
Kent Overstreet	5c1ef830f6	bcachefs: Errcodes can now subtype standard error codes The next patch is going to be adding private error codes for all the places we return -ENOSPC. Additionally, this patch updates return paths at all module boundaries to call bch2_err_class(), to return the standard error code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:40 -04:00
Kent Overstreet	549d173c1b	bcachefs: EINTR -> BCH_ERR_transaction_restart Now that we have error codes, with subtypes, we can switch to our own error code for transaction restarts - and even better, a distinct error code for each transaction restart reason: clearer code and better debugging. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:37 -04:00
Kent Overstreet	a3d7afa5c1	bcachefs: Always use percpu_ref_tryget_live() on c->writes If we're trying to get a ref and the refcount has been killed, it means we're doing an emergency shutdown - we always want tryget_live(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Kent Overstreet	facc81479c	bcachefs: Delete bch_writepage Per Dave Chinner and the xfs folks, .writepage is no longer needed, and it's better not to define it if .writepages is the intended path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:32 -04:00
Kent Overstreet	b33bf1bc0d	bcachefs: Go emergency RO when i_blocks underflows This improves some of our warnings and assertions - they imply possible filesystem inconsistencies, so they should be calling bch2_fs_inconsistent(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:31 -04:00
Kent Overstreet	7c4ca54ae6	bcachefs: Don't skip triggers in fcollapse() With backpointers this doesn't work anymore - backpointers always need to be updated to point to the new extent position. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:31 -04:00
Kent Overstreet	f8494d2535	bcachefs: Convert some WARN_ONs to WARN_ON_ONCE These warnings are symptomatic of something else going wrong, we don't want them spamming up the logs as that'll make it harder to find the real issue. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:28 -04:00
Kent Overstreet	9552e19f6f	bcachefs: Fix dio write path with loopback dio mode When the iov_iter is a bvec iter, it's possible the IO was submitted from a kthread that didn't have an mm to switch to. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:27 -04:00
Kent Overstreet	4d126dc8b3	bcachefs: Use bio_iov_vecs_to_alloc() This fixes a bug in the DIO read path where, when using a loopback device in DIO mode, we'd allocate a biovec that would get overwritten and leaked in bio_iov_iter_get_pages() -> bio_iov_bvec_set(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:27 -04:00
Kent Overstreet	eb331fe5a4	bcachefs: Check for stale dirty pointer before reads Since we retry reads when we discover we read from a pointer that went stale, if a dirty pointer is erroneously stale it would cause us to loop retrying that read forever - unless we check before issuing the read, while the btree is still locked, when we know that a dirty pointer should never be stale. This patch adds that check, along with printing some helpful debug info. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:24 -04:00
Kent Overstreet	57cfdd8b54	bcachefs: BTREE_ITER_FILTER_SNAPSHOTS is selected automatically It doesn't have to be specified - this patch deletes the two instances where it was. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:21 -04:00
Kent Overstreet	51c4e406aa	bcachefs: Fix an assertion in bch2_truncate() We recently added an assertion that when we truncate a file to 0, i_blocks should also go to 0 - but that's not necessarily true if we're doing an emergency shutdown, lots of invariants no longer hold true in that case. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:18 -04:00
Kent Overstreet	f54788cc8c	bcachefs: Convert a BUG_ON() to a warning A user reported hitting this assertion, and we can't reproduce it yet, but it shouldn't be fatal - so convert it to a warning. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:18 -04:00
Kent Overstreet	dcfc593f7b	bcachefs: Fix page state after fallocate This tweaks the fallocate code to also update the page cache to reflect the new on disk reservations, giving us better i_sectors consistency. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:17 -04:00
Kent Overstreet	e6ec361f95	bcachefs: Fix page state when reading into !PageUptodate pages This patch adds code to read page state before writing to pages that aren't uptodate, which corrects i_sectors being temporarily too large and means we may not need to get a disk reservation. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> # Conflicts: # fs/bcachefs/fs-io.c	2023-10-22 17:09:17 -04:00
Kent Overstreet	7279c1a24c	bcachefs: Kill PAGE_SECTOR_SHIFT Replace it with the new, standard PAGE_SECTORS_SHIFT Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:17 -04:00
Kent Overstreet	084d42bbd6	bcachefs: Apply workaround for too many btree iters to read path Reading from cached data, which calls bch2_bucket_io_time_reset(), is leading to transaction iterator overflows - this standardizes the workaround. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:17 -04:00
Kent Overstreet	b44a66a641	bcachefs: SECTOR_DIRTY_RESERVED This fixes another i_sectors accounting bug - we need to differentiate between dirty writes that overwrite a reservation and dirty writes to unallocated space - dirty writes to unallocated space increase i_sectors, dirty writes over a reservation do not. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:17 -04:00
Kent Overstreet	b19d307dc1	bcachefs: Fix i_sectors_leak in bch2_truncate_page When bch2_truncate_page() discards dirty sectors in the page cache, we need to account for that - we don't need to account for allocated sectors because that'll be done by the bch2_fpunch() call when it updates the btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:17 -04:00
Kent Overstreet	8810386f6b	bcachefs: Fix an i_sectors accounting bug We weren't checking for errors before calling i_sectors_acct() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:17 -04:00
Kent Overstreet	f74a5051b0	bcachefs: Don't check for -ENOSPC in page writeback If at all possible we'd prefer to not fail page writeback unless the filesystem has been shut down; allowing errors in page writeback means things we'd like to assert about i_size consistency between the VFS and the btree go out the window. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:16 -04:00
Kent Overstreet	74163da7c8	bcachefs: Fallocate fixes - fpunch wasn't always correctly updating i_size - when we drop buffered writes that were extending a file, we become responsible for writing i_size. - fzero was sometimes zeroing out more data than it should have - block_start and block_end were being rounded in the wrong directions Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:16 -04:00
Kent Overstreet	68a2054d88	bcachefs: Switch fsync to use bi_journal_seq Now that we're recording in each inode the journal sequence number of the most recent update, fsync becomes a lot simpler and we can delete all the plumbing for ei_journal_seq. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:16 -04:00
Kent Overstreet	e5fa91d7ac	bcachefs: Fix restart handling in for_each_btree_key() Code that uses for_each_btree_key often wants transaction restarts to be handled locally and not returned. Originally, we wouldn't return transaction restarts if there was a single iterator in the transaction - the reasoning being if there weren't other iterators being invalidated, and the current iterator was being advanced/retraversed, there weren't any locks or iterators we were required to preserve. But with the btree_path conversion that approach doesn't work anymore - even when we're using for_each_btree_key() with a single iterator there will still be two paths in the transaction, since we now always preserve the path at the pos the iterator was initialized at - the reason being that on restart we often restart from the same place. And it turns out there's now a lot of for_each_btree_key() uses that _do not_ want transaction restarts handled locally, and should be returning them. This patch splits out for_each_btree_key_norestart() and for_each_btree_key_continue_norestart(), and converts existing users as appropriate. for_each_btree_key(), for_each_btree_key_continue(), and for_each_btree_node() now handle transaction restarts themselves by calling bch2_trans_begin() when necessary - and the old hack to not return transaction restarts when there's a single path in the transaction has been deleted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:14 -04:00
Kent Overstreet	9a796fdb06	bcachefs: bch2_trans_exit() no longer returns errors Now that peek_node()/next_node() are converted to return errors directly, we don't need bch2_trans_exit() to return errors - it's cleaner this way and wasn't used much anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:14 -04:00
Kent Overstreet	8c6d298ab2	bcachefs: Convert io paths for snapshots This plumbs around the subvolume ID as was done previously for other filesystem code, but now for the IO paths - the control flow in the IO paths is trickier so the changes in this patch are more involved. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:12 -04:00
Kent Overstreet	6fed42bb77	bcachefs: Plumb through subvolume id To implement snapshots, we need every filesystem btree operation (every btree operation without a subvolume) to start by looking up the subvolume and getting the current snapshot ID, with bch2_subvolume_get_snapshot() - then, that snapshot ID is used for doing btree lookups in BTREE_ITER_FILTER_SNAPSHOTS mode. This patch adds those bch2_subvolume_get_snapshot() calls, and also switches to passing around a subvol_inum instead of just an inode number. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:12 -04:00
Kent Overstreet	67e0dd8f0d	bcachefs: btree_path This splits btree_iter into two components: btree_iter is now the externally visible component, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:11 -04:00
Kent Overstreet	9f6bd30703	bcachefs: Reduce iter->trans usage Disfavoured, and should go away. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:10 -04:00
Kent Overstreet	3737e0ddfb	bcachefs: Fix an unhandled transaction restart __bch2_read() -> __bch2_read_extent() -> bch2_bucket_io_time_reset() may cause a transaction restart, which we don't return an error for because it doesn't prevent us from making forward progress on the read we're submitting. Instead, change __bch2_read() and bchfs_read() to check for transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:10 -04:00
Kent Overstreet	700c25b32a	bcachefs: Use bch2_trans_begin() more consistently Upcoming patch will require that a transaction restart is always immediately followed by bch2_trans_begin(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:09 -04:00
Kent Overstreet	8b3e9bd65f	bcachefs: Always check for transaction restarts On transaction restart iterators won't be locked anymore - make sure we're always checking for errors. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:09 -04:00
Kent Overstreet	b97bbd4ec3	bcachefs: Use bch2_inode_find_by_inum() in truncate This is needed for snapshots because we need to start handling lock restarts even when just calling bch2_inode_peek(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:09 -04:00
Kent Overstreet	5468f1195d	bcachefs: Fix a memory leak in the dio write path There were some error paths where we were leaking page refs - oops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:08 -04:00
Dan Robertson	78d66ab1ca	bcachefs: fix truncate without a size change Do not attempt to shortcut a truncate when the given new size is the same as the current size. There may be blocks allocated to the file that extend beyond the i_size. The ctime and mtime should not be updated in this case. Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:07 -04:00
Kent Overstreet	68a507a2e8	bcachefs: fix truncate with ATTR_MODE After the v5.12 rebase, we started oopsing when truncate was passed ATTR_MODE, due to not passing mnt_userns to setattr_copy(). This refactors things so that truncate/extend finish by using bch2_setattr_nonsize(), which solves the problem. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:07 -04:00
Kent Overstreet	8c3f6da9fc	bcachefs: Improve iter->should_be_locked Adding iter->should_be_locked introduced a regression where it ended up not being set on the iterator passed to bch2_btree_update_start(), which is definitely not what we want. This patch requires it to be set when calling bch2_trans_update(), and adds various fixups to make that happen. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Kent Overstreet	2ed5cd508d	bcachefs: Fix a memory leak in dio write path Commit `c42bca92be` "bio: don't copy bvec for direct IO" changed bio_iov_iter_get_pages() to point bio->bi_iovec at the incoming biovec, meaning if we already allocated one, it'll be leaked. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Kent Overstreet	f7beb4ca04	bcachefs: Preallocate transaction mem This helps avoid transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:05 -04:00
Kent Overstreet	9f311f2166	bcachefs: Don't use bch_write_op->cl for delivering completions We already had op->end_io as an alternative mechanism to op->cl.parent for delivering write completions; this switches all code paths to using op->end_io. Two reasons: - op->end_io is more efficient, due to fewer atomic ops; this completes the conversion that was originally only done for the direct IO path. - We'll be restructuring the write path to use a different mechanism for punting to process context; refactoring to not use op->cl will make that easier. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:04 -04:00
Kent Overstreet	a6336910b1	bcachefs: Fix for buffered writes getting -ENOSPC Buffered writes may have to increase their disk reservation at btree update time, due to compression and erasure coding being unpredictable: O_DIRECT writes should be checking for -ENOSPC, but buffered writes have already been accepted and should not. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:04 -04:00
Kent Overstreet	e7084c9c81	bcachefs: Make bch2_remap_range respect O_SYNC Caught by xfstest generic/628 Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:04 -04:00
Kent Overstreet	ef1b20924b	bcachefs: Ratelimiting for writeback IOs Writeback throttling is a kernel config option and not always enabled. When it's not enabled we need a fallback, to avoid unbounded memory pinning and work item backlogs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Kent Overstreet	050197b1c1	bcachefs: Ensure that fpunch updates inode timestamps Fixes xfstests generic/059 Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	694015c2b1	bcachefs: Refactor bchfs_fallocate() to not nest btree_trans on stack Upcoming patch is going to disallow multiple btree_trans on the stack. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	50dc0f692a	bcachefs: Require all btree iterators to be freed We keep running into occasional bugs with btree transaction iterators overflowing - this will make those bugs more visible. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:56 -04:00
Kent Overstreet	87a432f5d7	bcachefs: Kill reflink option An option was added to control whether reflink support was on or off because for a long time, reflink + inline data extent support was missing - but that's since been fixed, so we can drop the option now. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:56 -04:00
Kent Overstreet	5ff75ccbbc	bcachefs: Fix read retry path for indirect extents In the read path, for retry of indirect extents to work we need to differentiate between the location in the btree the read was for, vs. the location where we found the data. This patch adds that plumbing to bch_read_bio. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:56 -04:00
Kent Overstreet	41f8b09edc	bcachefs: Rename BTREE_ID enums for consistency with other enums Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:55 -04:00
Kent Overstreet	3d4955952f	bcachefs: Fix bch2_btree_iter_peek_prev() This makes bch2_btree_iter_peek_prev() and bch2_btree_iter_prev() consistent with peek() and next(), w.r.t. iter->pos. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	b4725cc1a4	bcachefs: Fix loopback in dio mode We had a deadlock on page_lock, because buffered reads signal completion by unlocking the page, but the dio read path normally dirties the pages it's reading to with set_page_dirty_lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:51 -04:00
Kent Overstreet	032ac32c51	bcachefs: Fix .splice_write Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:51 -04:00
Kent Overstreet	07a1006ae8	bcachefs: Reduce/kill BKEY_PADDED use With various newer key types - stripe keys, inline data extents - the old approach of calculating the maximum size of the value is becoming more and more error prone. Better to switch to bkey_on_stack, which can dynamically allocate if necessary to handle any size bkey. In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:50 -04:00
Kent Overstreet	35a067b42d	bcachefs: Change when we allow overwrites Originally, we'd check for -ENOSPC when getting a disk reservation whenever the new extent took up more space on disk than the old extent. Erasure coding screwed this up, because with erasure coding writes are initially replicated, and then in the background the extra replicas are dropped when the stripe is created. This means that with erasure coding enabled, writes will always take up more space on disk than the data they're overwriting - but, according to posix, overwrites aren't supposed to return ENOSPC. So, in this patch we fudge things: if the new extent has more replicas than the _effective_ replicas of the old extent, or if the old extent is compressed and the new one isn't, we check for ENOSPC when getting the disk reservation - otherwise, we don't. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:50 -04:00
Kent Overstreet	f30dd86012	bcachefs: Don't write bucket IO time lazily With the btree key cache code, we don't need to update the alloc btree lazily - and this will mean we can remove the bch2_alloc_write() call in the shutdown path. Future work: we really need to expand the bucket IO clocks from 16 to 64 bits, so that we don't have to rescale them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:50 -04:00
Kent Overstreet	33c74e4119	bcachefs: Flag inodes that had btree update errors On write error, the vfs inode's i_size may be inconsistent with the btree inode's i_size - flag this so we don't have spurious assertions. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00
Kent Overstreet	0fefe8d8ef	bcachefs: Improve some IO error messages It's useful to know whether an error was for a read or a write - this also standardizes error messages a bit more. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00

257 Commits