This fixes assertion in brt_sync_table() on debug builds when last
cloned block on the vdev is freed and bv_meta_dirty is cleared,
while bv_entcount_dirty is not. Should not matter in production.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#16791
Update the META file to reflect compatibility with the 6.12 kernel.
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#16793
Previous implementation of zap_leaf_array_free() put chunks on the
free list in reverse order. Also zap_leaf_transfer_entry() and
zap_entry_remove() were freeing name and value arrays in reverse
order. Together this created a mess in the free list, making
following allocations much more fragmented than necessary.
This patch re-implements zap_leaf_array_free() to keep existing
chunks order, and implements non-destructive zap_leaf_array_copy()
to be used in zap_leaf_transfer_entry() to allow properly ordered
freeing name and value arrays there and in zap_entry_remove().
With this change test of some writes and deletes shows percent of
non-contiguous chunks in DDT reducing from 61% and 47% to 0% and
17% for arrays and frees respectively. Sure some explicit sorting
could do even better, especially for ZAPs with variable-size arrays,
but it would also cost much more, while this should be very cheap.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#16766
The current "Requires" lines only ensure the old kernel is
available on the system but it does not prevent fedora from
updating to an incompatible and breaking user's system.
Set Conflicts to block incompatible kernels from being installed.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: tleydxdy <shironeko.github@tesaguri.club>
Closes#16139
zio_delay_interrupt(), apparently used for fault injection, is executed
in the I/O pipeline. It can cause the calling thread to go to sleep,
which is not allowed on FreeBSD. This happens only for small delays,
though, and there's no apparent reason to avoid deferring to a taskqueue
in that case, as it already does otherwise.
Simply go to sleep unconditionally. This fixes an occasional panic I
see when running the ZTS on FreeBSD. Also remove an unhelpful comment
referencing the non-existent timeout_generic().
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes#16785
- With both pending and current AVL-trees being per-vdev and having
effectively identical comparison functions (pending tree compared
also birth time, but I don't believe it is possible for them to be
different for the same offset within one transaction group), it
makes no sense to move entries from one to another. Instead inline
dramatically simplified brt_entry_addref() into brt_pending_apply().
It no longer requires bv_lock, since there is nothing concurrent
to it at the time. And it does not need to search the tree for the
previous entries, since it is the same tree, we already have the
entry and we know it is unique.
- Put brt_vdev_lookup() and brt_vdev_addref() into different tree
traversals to avoid false positives in the first due to the second
entcount modifications. It saves dramatic amount of time when a
file cloned first time by not looking for non-existent ZAP entries.
- Remove avl_is_empty(bv_tree) check from brt_maybe_exists(). I
don't think it is needed, since by the time all added entries are
already accounted in bv_entcount. The extra check must be producing
too many false positives for no reason. Also we don't need bv_lock
there, since bv_entcount pointer must be table at this point, and
we don't care about false positive races here, while false negative
should be impossible, since all brt_vdev_addref() have already
completed by this point. This dramatically reduces lock contention
on massive deletes of cloned blocks. The only remaining one is
between multiple parallel free threads calling brt_entry_decref().
- Do not update ZAP if net change for a block over the TXG was 0.
In combination with above it makes file move between datasets as
cheap operation as originally intended if it fits into one TXG.
- Do not allocate vdevs on pool creation or import if it did not
have active block cloning. This allows to save a bit in few cases.
- While here, add proper error handling in brt_load() on pool
import instead of assertions.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#16773
Increase the injected delay to 1000ms and the ZIO_SLOW_IO_MS threshold
to 750ms to avoid false positives due to unrelated slow IOs which may
occur in the CI environment. Additionally, clear the fault injection as
soon as it is no longer required for the test case.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#16769
Without doing that there is a race window on export when history
log write by completed rebuild dirties transaction beyond final,
triggering assertion.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#16714Closes#16782
Those values require global atomics to get current hash_elements
values in few of the hottest code paths, while in all the years I
never cared about it. If somebody wants, it should be easy to
get it by periodic sampling, since neither ARC header nor DBUF
counts change so fast that it would be difficult to catch.
For now I've left hash_elements_max kstat for ARC, since it was
used/reported by arc_summary and it would break older versions,
but now it just reports the current value.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#16759
Compression names actually aren't used in dedup table names, but
checksum names are.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes#16776
Also fix comment cross-referencing to zpool.8.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Steve Mokris <smokris@softpixel.com>
Closes#16777
It should be __VA_ARGS__, not __VA_ARGS.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#16780
When an OFFLINE device is physically removed, a spare is automatically
activated. However, this behavior differs in FreeBSD, where we do not
transition from OFFLINE state to REMOVED.
Our support team has encountered cases where customers experienced
unexpected behavior during drive replacements, with multiple spares
activating for the same VDEV due to a single disk replacement. This
patch ensures that a drive in an OFFLINE state remains in that state,
preventing it from transitioning to REMOVED and being automatically
replaced by a spare.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes#16751
While block cloning operation from the beginning was made per-vdev,
before this change most of its data were protected by two pool-
wide locks. It created lots of lock contention in many workload.
This change makes most of block cloning data structures per-vdev,
which allows to lock them separately. The only pool-wide lock now
it spa_brt_lock, protecting array of per-vdev pointers and in most
cases taken as reader. Also this splits per-vdev locks into three
different ones: bv_pending_lock protects the AVL-tree of pending
operations in open context, bv_mos_entries_lock protects BRT ZAP
object from while being prefetched, and bv_lock protects the rest
of per-vdev context during TXG commit process. There should be
no functional difference aside of some optimizations.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#16740
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#16740
We are doing exactly the same checks around all brt_pending_add().
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#16740
If we write less than 113 bytes with enabled compression we get
embeded block, which then fails check for number of cloned blocks
in bclone_test.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#16740
Welcome to the party 🎉
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes#16762
This patch fixes compilation with uClibc by applying the same fallback
as commit e12d76176d to the `getversion.c`
file, which was previously overlooked.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Closes#16735Closes#16741
Since zvol read and write can process up to (DMU_MAX_ACCESS / 2) bytes
in a single operation, the current optimal I/O size is too low. SCST
directly reports this value as the optimal transfer length for the
target SCSI device. Increasing it from the previous volblocksize results
in performance improvement for large block parallel I/O workloads.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes#16750
- If we don't want dmu_read_pages() to perform extra readahead/behind,
pass a pointer to 0 instead of a null pointer, as dum_read_pages()
expects rahead and rbehind to be non-null.
- Avoid unneeded iterations in a loop.
Sponsored-by: Klara, Inc.
Reported-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes#16758
dsl_free() calls zio_free() to free the block. For most blocks, this
simply calls metaslab_free() without doing any IO or putting anything on
the IO pipeline.
Some blocks however require additional IO to free. This at least
includes gang, dedup and cloned blocks. For those, zio_free() will issue
a ZIO_TYPE_FREE IO and return.
If a huge number of blocks are being freed all at once, it's possible
for dsl_dataset_block_kill() to be called millions of time on a single
transaction (eg a 2T object of 128K blocks is 16M blocks). If those are
all IO-inducing frees, that then becomes 16M FREE IOs placed on the
pipeline. At time of writing, a zio_t is 1280 bytes, so for just one 2T
object that requires a 20G allocation of resident memory from the
zio_cache. If that can't be satisfied by the kernel, an out-of-memory
condition is raised.
This would be better handled by improving the cases that the
dmu_tx_assign() throttle will handle, or by reducing the overheads
required by the IO pipeline, or with a better central facility for
freeing blocks.
For now, we simply check for the cases that would cause zio_free() to
create a FREE IO, and instead put the block on the pool's freelist. This
is the same place that blocks from destroyed datasets go, and the async
destroy machinery will automatically see them and trickle them out as
normal.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes#6783Closes#16708Closes#16722Closes#16697
..., before we make the header or the log block visible to others.
It should fix assertion on allocated space going negative if the
header is freed once the lock is dropped, while the write is still
going.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#16040Closes#16743
As a deadlock avoidance measure, zfs_getpages() would only try to
acquire a rangelock, falling back to a single-page read if this was not
possible. However, this is incompatible with direct I/O.
Instead, release the busy lock before trying to acquire the rangelock in
blocking mode. This means that it's possible for the page to be
replaced, so we have to re-lookup.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes#16643
mappedread_sf() may allocate pages; if it fails to populate a page
can't free it, it needs to ensure that it's placed into a page queue,
otherwise it can't be reclaimed until the vnode is destroyed.
I think this is quite unlikely to happen in practice, it was noticed by
code inspection.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes#16643
In zpool_get_user_prop, when called from zpool_expand_proplist and
collect_pool, we often have zpool_props present in zpool_handle_t equal
to NULL. This mostly happens when only one user property is requested
using zpool list -o <user_property>. Checking for this case and
correctly initializing the zpool_props field in zpool_handle_t fixes
this issue.
Interestingly, this issue does not occur if we query any other property
like name or guid along with a user property with -o flag because while
accessing properties like guid, zpool_prop_get_int is called which
checks for this case specifically and calls zpool_get_all_props.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes#16734
This commit fixes JSON output for zpool list when user properties are
requested with -o flag. This case needed to be handled specifically
since zpool_prop_to_name does not return property name for user
properties, instead it is stored in pl->pl_user_prop.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes#16734
This commit fixes JSON output for zfs list when user properties are
requested with -o flag. This case needed to be handled specifically
since zfs_prop_to_name does not return property name for user
properties, instead it is stored in pl->pl_user_prop.
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes#16732
When building on musl, we get:
```
In file included from tests/zfs-tests/cmd/getversion.c:22:
/usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect
#include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>
In file included from module/os/linux/zfs/vdev_file.c:36:
/usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect
#include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>
```
Bug: https://bugs.gentoo.org/925235
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sam James <sam@gentoo.org>
Closes#15925
a10e552 updated abd_free_linear_page() to no longer call
abd_update_scatter_stat(). This meant that linear pages that were not
attached to Direct I/O requests were not doing waste accounting for the
ARC. This led to performance issues due to incorrect ARC accounting that
resulted in 100% of CPU time being spent in arc_evict() during prolonged
I/O workloads with the ARC.
The call to abd_update_scatter_stats() is now conditionally called in
abd_free_linear_page() when the ABD is not from a Direct I/O request.
Reviewed-by: Mark Maybee <mmaybee@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes#16729
Currently, even though send_reader_thread prefetches spill block,
do_dump() will not use it and issues its own blocking arc_read. This
causes significant performance degradation when sending datasets with
lots of spill blocks.
For unmodified spill blocks, we also create send_range struct for them
in send_reader_thread and issue prefetches for them. We piggyback them
on the dnode send_range instead of enqueueing them so we don't break
send_range_after check.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: david.chen <david.chen@nutanix.com>
Closes#16701
Avoids using fallback_migrate_folio, which starts unnecessary writeback
(leading to BUG in migrate_folio_extra).
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: tstabrawa <59430211+tstabrawa@users.noreply.github.com>
Closes#16568Closes#16723
This reverts commit b052035990.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: tstabrawa <59430211+tstabrawa@users.noreply.github.com>
Closes#16568Closes#16723
- Set/remove "Work in Progress"/"Code Review Needed" for drafts.
- Remove "Accepted", "Inactive", "Revision Needed" and "Stale" on
pushes and reopens.
I hope this reduce chances of PRs being forgotten after requested
modifications done due to stale labels. It is better to have no
labels than incorrect ones saying there is nothing to look at.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#16721
Not all udev devices have parent devices.
Calling udev_device_get_ functions yield an assertion error
if called with a NULL pointer.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Sietse <sietse@wizdom.nu>
Co-authored-by: Sietse <sietse@wizdom.nu>
Closes#16705Closes#16717
Small block workloads may use a very large number of dirty records.
During simple block cloning test due to BRT still using 4KB blocks
I can easily see up to 2.5M of those used. Before this change
dbuf_dirty_record_t structures representing them were allocated via
kmem_zalloc(), that rounded their size up to 512 bytes.
Introduction of specialized kmem cache allows to reduce the size
from 512 to 408 bytes. Additionally, since override and raw params
in dirty records are mutually exclusive, puting them into a union
allows to reduce structure size down to 368 bytes, increasing the
saving to 28%, that can be a 0.5GB or more of RAM.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#16694
I think we've done enough experiments.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes#16189Closes#16712
Fedora 41 was released 10/29/24, and Fedora 39 will be EOL on 11/12/24.
Update Fedora runners in the test suite. Some minor tweaks also needed
to support ksh 1.0.10.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#16700
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes#16692
I was surprised to discover today that `zpool online` and
`zpool offline` don't print any information about why they failed in
many cases, they just return 1 with no information about why.
Let's improve that where we can without changing the library function.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#16244
Now that we can handle these different alignments, we don't this
workaround.
This reverts commit aefc2da8a5.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes#16687
Freeing an ABD can take sleeping locks to update various stats. We
aren't allowed to sleep on an interrupt handler. So, move the free off
to the io_done callback.
We should never have been freeing things in the interrupt handler, but
we got away with it because we were usually freeing a linear ABD, which
at most is returning two objects to a cache and never sleeping. Scatter
ABDs can be used now, and those have more complex locking.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes#16687
It seems out our notion of "properly" aligned IO was incomplete. In
particular, dm-crypt does its own splitting, and assumes that a logical
block will never cross an order-0 page boundary (ie, the physical page
size, not compound size). This effectively means that it needs to be
possible to split a BIO at any page or block size boundary and have it
work correctly.
This updates the alignment check function to enforce these rules (to the
extent possible).
Our response to misaligned data is to make some new allocation that is
properly aligned, and copy the data into it. It turns out that
linearising (via abd_borrow_buf()) is not enough, because we allocate eg
4K blocks from a general purpose slab, and so may receive (or already
have) a 4K block that crosses pages.
So instead, we allocate a new ABD, which is guaranteed to be aligned
properly to block sizes, and then copy everything into it, and back out
on the way back.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes#16687#16631#15646#15533#14533
While reading some code @grwilson came across the above function that
seemingly had no consumers besides a ztest callback that ensures that
the tx_callback infrastructure works correctly. It turns out that Lustre
is the main (and potentially the only) consumer of this. Refer to
`osd_trans_commit_cb` of `lustre/osd-zfs/osd_handler.c` in the Lustre
repo for more info. Let's add a comment highlighting this before someone
removes it by mistake.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Serapheim Dimitropoulos <serapheimd@gmail.com>
Closes#16698
If on the first open device's logical ashift is bigger than set
by pool's ashift property, ignore the last as unusable instead of
creating vdev that will fail most of I/Os due to misalignment.
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#16690
In FreeBSD's `zio_do_crypt_data()`, ensure that two `struct uio`
variables are cleared before copying data out of them. This avoids
accessing garbage data, and fixes gcc `-Wuninitialized` warnings.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Toomas Soome <tsoome@me.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes#16688