Commit Graph

9702 Commits

Author SHA1 Message Date
Robert Evans
47b7dc976b Add Makefile dependencies for scripts/zfs-tests.sh -c
This updates the Makefile to be more correct for parallel make.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16030
Closes #16922
2025-01-04 11:58:15 -08:00
Toomas Soome
125731436d ZTS: checkpoint_discard_busy should use save_tunable/restore_tunable
Instead of using hardwired value for SPA_DISCARD_MEMORY_LIMIT,
use save_tunable and restore_tunable to restore the pre-test state.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16919
2025-01-03 15:23:49 -08:00
Rob Norris
4425a7bb85 vdev_open: clear async remove flag after reopen
It's possible for a vdev to be flagged for async remove after the pool
has suspended. If the removed device has been returned when the pool is
resumed, the ASYNC_REMOVE task will still run at the end of txg, and
remove the device from the pool again.

To fix, we clear the async remove flag at reopen, just as we did for the
async fault flag in 5de3ac223.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16921
2025-01-03 15:23:49 -08:00
Toomas Soome
e47b033eae ZTS: remove unused TESTDIRS from pam/cleanup.ksh
Remove TESTDIRS as it is not set for pam tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16920
2025-01-03 15:23:49 -08:00
pstef
cfec8f13a2 zfs_vnops_os.c: fallocate is valid but not supported on FreeBSD
This works around
/usr/lib/go-1.18/pkg/tool/linux_amd64/link:
mapping output file failed: invalid argument

It's happened to me under a Linux jail, but it's also happened to other
people, see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=270247#c4

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: pstef <pstef@users.noreply.github.com>
Closes #16918
2025-01-03 15:23:49 -08:00
Toomas Soome
997db7a7fc ZTS: checkpoint_discard_busy does not set 16M on cleanup
Originally hex value is used as decimal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16917
2025-01-02 17:04:10 -08:00
Toomas Soome
e411081aa0 ZTS: functional/mount scripts are not removing /var/tmp/testdir.X dirs
cleanup.ksh is assuming we have TESTDIRS set.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16915
2025-01-02 17:04:10 -08:00
Toomas Soome
a55b6fe94a ZTS: zfs_mount_all_fail leaves /var/tmp/testrootPIDNUM directory around
Before we can remove test files, we need to unmount datasets
used by test first.

See also: zfs_mount_all_mountpoints.ksh

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16914
2025-01-02 17:04:10 -08:00
James Reilly
939e9f0b6a ZTS: add centos stream10 (#16904)
Added centos as optional runners via workflow_dispatch

removed centos-stream9 from the FULL_OS runner list as CentOS is not
officially support by ZFS. This commit will add preliminary support for
EL10 and allow testing ZFS ahead of EL10 codebase solidifying in ~6
months

Signed-off-by: James Reilly <jreilly1821@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
2025-01-02 17:04:10 -08:00
Andrew Walker
679b164cd3 Add missing zfs_exit() when snapdir is disabled (#16912)
zfs_vget doesn't zfs_exit when erroring out due to snapdir
being disabled.

Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Reviewed-by: @bmeagherix
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-01-02 17:04:10 -08:00
shodanshok
c2d9494f99 set zfs_arc_shrinker_limit to 0 by default
zfs_arc_shrinker_limit was introduced to avoid ARC collapse due to
aggressive kernel reclaim. While useful, the current default (10000) is
too prone to OOM especially when MGLRU-enabled kernels with default
min_ttl_ms are used. Even when no OOM happens, it often causes too much
swap usage.

This patch sets zfs_arc_shrinker_limit=0 to not ignore kernel reclaim
requests. ARC now plays better with both kernel shrinker and pagecache
but, should ARC collapse happen again, MGLRU behavior can be tuned or
even disabled.

Anyway, zfs should not cause OOM when ARC can be released.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #16909
2024-12-29 11:53:45 -08:00
Ameer Hamza
b952e061df zvol: implement platform-independent part of block cloning
In Linux, block devices currently lack support for `copy_file_range`
API because the kernel does not provide the necessary functionality.
However, there is an ongoing upstream effort to address this
limitation: https://patchwork.kernel.org/project/dm-devel/cover/20240520102033.9361-1-nj.shetty@samsung.com/.
We have adopted this upstream kernel patch into the TrueNAS kernel and
made some additional modifications to enable block cloning specifically
for the zvol block device. This patch implements the platform-
independent portions of these changes for inclusion in OpenZFS.
This patch does not introduce any new functionality directly into
OpenZFS. The `TX_CLONE_RANGE` replay capability is only relevant when
zvols are migrated to non-TrueNAS systems that support Clone Range
replay in the ZIL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16901
2024-12-29 11:53:45 -08:00
Alexander Motin
0fea7fc109 ZTS: Reduce file size in redacted_panic to 1GB
This test takes 3 minutes on RELEASE FreeBSD bots, but on CURRENT,
probably due to debugging it has in kernel, it does not complete
within 10 minutes, ending up killed.  As I see all the redacting
here happens within the first ~128MB of the file, so I hope it
won't matter if there is 1GB of data instead of 2GB.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #11141
2024-12-29 11:53:45 -08:00
Alexander Motin
0f6d955a35 ZTS: Remove procfs use from zpool_import_status
procfs might be not mounted on FreeBSD.  Plus checking for specific
PID might be not exactly reliable.  Check for empty list of jobs
instead.

Premature loop exit can result in failed test and failed cleanup,
failing also some following tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #11141
2024-12-29 11:53:45 -08:00
Alexander Motin
c3d2412b05 ZTS: Remove non-standard awk hex numbers usage
FreeBSD recently removed non-standard hex numbers support from awk.
Neither it supports -n argument, enabling it in gawk.  Instead of
depending on those rewrite list_file_blocks() fuction to handle the
hex math in shell instead of awk.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #11141
2024-12-29 11:53:45 -08:00
Rob Norris
74064cb175 zpool_get_vdev_prop_value: show missing vdev userprops
If a vdev userprop is not found, present it as value '-', default
source, so it matches the output from pool userprops.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16887
2024-12-29 11:53:45 -08:00
Alexander Motin
30b97ce218 ZTS: Increase write sizes for RAIDZ/dRAID tests
Many RAIDZ/dRAID tests filled files doing millions of 100 or even
10 byte writes.  It makes very little sense since we are not
micro-benchmarking syscalls or VFS layer here, while before the
blocks reach the vdev layer absolute majority of the small writes
will be aggregated.  In some cases I see we spend almost as much
time creating the test files as actually running the tests.  And
sometimes the tests even time out after that.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16905
2024-12-29 11:53:45 -08:00
Rob Norris
9519e7ebcc microzap: set hard upper limit of 1M
The count of chunks in a microzap block is stored as an uint16_t
(mze_chunkid). Each chunk is 64 bytes, and the first is used to store a
header, so there are 32767 usable chunks, which is just under 2M. 1M is
the largest power-2-rounded block size under 2M, so we must set the
limit there.

If it goes higher, the loop in mzap_addent can overflow and fall into
the PANIC case.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16888
2024-12-29 11:53:45 -08:00
Alexander Motin
f9b02fe7e3 Fix readonly check for vdev user properties
VDEV_PROP_USERPROP is equal do VDEV_PROP_INVAL and so is not a real
property.  That's why vdev_prop_readonly() does not work right for
it.  In particular it may declare all vdev user properties readonly
on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16890
2024-12-29 11:53:45 -08:00
Umer Saleem
cb8da70329 Skip iterating over snapshots for share properties
Setting sharenfs and sharesmb properties on a dataset can become costly
if there are large number of snapshots, since setting the share
properties iterates over all snapshots present for a dataset. If it is
the root dataset for which we are trying to set the share property,
snapshots for all child datasets and their children will also be
iterated.

There is no need to iterate over snapshots for share properties
because we do not allow share properties or any other property,
to be set on a snapshot itself execpt for user properties.

This commit skips iterating over snapshots for share properties,
instead iterate over all child dataset and their children for share
properties.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16877
2024-12-29 11:53:45 -08:00
Rob Norris
c944c46a98 zfs_main: fix alignment on props usage output
I guess we've got some long property names since this was first set up!

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16883
2024-12-29 11:53:45 -08:00
Tino Reichardt
166a7bc602 CI: Fix FreeBSD 13.4 STABLE build
In #16869 we added FreeBSD 13.4 STABLE, but forget the special
thing, that the virtio nic within FreeBSD 13.x is buggy.

This fix adds the needed rtl8139 nic to the VM.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16885
2024-12-29 11:53:45 -08:00
Rob Norris
e90124a7c8 zprop: fix value help for ZPOOL_PROP_CAPACITY
It's a percentage and documented as such, but we were showing it as
<size>.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16881
2024-12-29 11:53:45 -08:00
Brian Behlendorf
18b3bea861 CI: Add FreeBSD 14.2 RELEASE+STABLE builds
Update the CI to include FreeBSD 14.2 as a regularly tested platform.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16869
2024-12-29 11:53:45 -08:00
Brian Atkinson
d67eb17e27 Use pin_user_pages API for Direct I/O requests
As of kernel v5.8, pin_user_pages* interfaced were introduced. These
interfaces use the FOLL_PIN flag. This is preferred interface now for
Direct I/O requests in the kernel. The reasoning for using this new
interface for Direct I/O requests is explained in the kernel
documenetation:
Documentation/core-api/pin_user_pages.rst

If pin_user_pages_unlocked is available, the all Direct I/O requests
will use this new API to stay uptodate with the kernel API requirements.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16856
2024-12-16 10:26:52 -08:00
Brian Atkinson
1862c1c0a8 Removing old code outside of 4.18 kernsls
There were checks still in place to verify we could completely use
iov_iter's on the Linux side. All interfaces are available as of kernel
4.18, so there is no reason to check whether we should use that
interface at this point. This PR completely removes the UIO_USERSPACE
type. It also removes the check for the direct_IO interface checks.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16856
2024-12-16 10:26:49 -08:00
Shengqi Chen
b57f53036d simd_stat: fix undefined CONFIG_KERNEL_MODE_NEON error on armel
CONFIG_KERNEL_MODE_NEON depends on CONFIG_NEON. Neither is defined
on armel. Add a guard to avoid compilation errors.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16871
2024-12-16 10:26:45 -08:00
Brian Behlendorf
4b8bf3c48a Fix stray "no" in configure output
This is purely a cosmetic fix which removes a stray "no" from
the configure output.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16867
2024-12-16 10:26:42 -08:00
Alexander Motin
696943533c Fix use-afer-free regression in RAIDZ expansion
We should not dereference rra after the last zio_nowait() is called.
It seems very unlikely, but ASAN in ztest managed to catch it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16868
2024-12-16 10:26:39 -08:00
kotauskas
2284a61129 Remount datasets on soft-reboot
The one-shot zfs-mount.service is incorrectly deemed active by 
Systemd after a systemctl soft-reboot. As such, soft-rebooting
prevents zfs mount -a from being ran automatically.

This commit makes it so that zfs-mount.service is marked as being 
undone by the time umount.target is reached, so that zfs.target then 
pulls it in again and gets it restarted after a soft reboot.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: kotauskas <v.toncharov@gmail.com>
Closes #16845
2024-12-16 10:26:35 -08:00
Rob Norris
e1833a72f9 flush: only detect lack of flush support in one place
It seems there's no good reason for vdev_disk & vdev_geom to explicitly
detect no support for flush and set vdev_nowritecache.  Instead, just
signal it by setting the error to ENOTSUP, and let zio_vdev_io_assess()
take care of it in one place.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16855
2024-12-16 10:26:30 -08:00
Rob Norris
5bb034f533 flush: don't report flush error when disabling flush support
The first time a device returns ENOTSUP in repsonse to a flush request,
we set vdev_nowritecache so we don't issue flushes in the future and
instead just pretend the succeeded. However, we still return an error
for the initial flush, even though we just decided such errors are
meaningless!

So, when setting vdev_nowritecache in response to a flush error, also
reset the error code to assume success.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16855
2024-12-16 10:26:27 -08:00
Poscat
1e08e49a28 build: use correct bashcompletiondir on arch
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: poscat <poscat@poscat.moe>
Closes #16861
2024-12-16 10:26:23 -08:00
Rob Norris
6604fe9a06 backtrace: fix off-by-one on string output
sizeof("foo") includes the trailing null byte, so all the output had
nulls through it. Most terminals quietly ignore it, but it makes some
tools misdetect file types and other annoyances.

Easy fix: subtract 1.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16862
2024-12-16 10:26:20 -08:00
Brian Behlendorf
7cbe7bbbd4 Tag 2.3.0-rc4
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-12-12 16:20:30 -08:00
Chunwei Chen
2dcc8fe035 Fix DR_OVERRIDDEN use-after-free race in dbuf_sync_leaf
In dbuf_sync_leaf, we clone the arc_buf in dr if we share it with db
except for overridden case. However, this exception causes a race where
dbuf_new_size could free the arc_buf after the last dereference of
*datap and causes use-after-free. We fix this by cloning the buf
regardless if it's overridden.

The race:
--
P0                                     P1

                                       dbuf_hold_impl()
                                         // dbuf_hold_copy passed
                                         // because db_data_pending NULL

dbuf_sync_leaf()
  // doesn't clone *datap
  // *datap derefed to db_buf
  dbuf_write(*datap)

                                       dbuf_new_size()
                                         dmu_buf_will_dirty()
                                           dbuf_fix_old_data()
                                             // alloc new buf for P0 dr
                                             // but can't change *datap

                                         arc_alloc_buf()
                                         arc_buf_destroy()
                                           // alloc new buf for db_buf
                                           // and destroy old buf

  dbuf_write() // continue
    abd_get_from_buf(data->b_data,
    arc_buf_size(data))
      // use-after-free
--

Here's an example when it happens:

BUG: kernel NULL pointer dereference, address: 000000000000002e
RIP: 0010:arc_buf_size+0x1c/0x30 [zfs]
Call Trace:
 dbuf_write+0x3ff/0x580 [zfs]
 dbuf_sync_leaf+0x13c/0x530 [zfs]
 dbuf_sync_list+0xbf/0x120 [zfs]
 dnode_sync+0x3ea/0x7a0 [zfs]
 sync_dnodes_task+0x71/0xa0 [zfs]
 taskq_thread+0x2b8/0x4e0 [spl]
 kthread+0x112/0x130
 ret_from_fork+0x1f/0x30

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: Chunwei Chen <david.chen@nutanix.com>
Closes #16854
2024-12-12 16:20:30 -08:00
Alexander Motin
38875918d8 BRT: Check bv_mos_entries in brt_entry_lookup()
When vdev first sees some block cloning, there is a window when
brt_maybe_exists() might already return true since something was
cloned, but bv_mos_entries is still 0 since BRT ZAP was not yet
created.  In such case we should not try to look into the ZAP
and dereference NULL bv_mos_entries_dnode.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16851
2024-12-12 16:20:30 -08:00
Rob Norris
0d51852ec7 Remove unnecessary CSTYLED escapes on top-level macro invocations
cstyle can handle these cases now, so we don't need to disable it.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 09:05:02 -08:00
Rob Norris
73a73cba71 cstyle: ignore old non-POSIX types in macro invocations
In code generation macros, we often use names like `uint` when
constructing handler functions. These are not being used as types, so
exclude them from the admonishment to use POSIX type names.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 09:05:02 -08:00
Rob Norris
87947f2440 cstyle: understand macro params can be empty
It's not uncommon to have empty parameters in code generator macros,
usually when multiple parameters are concatenated or stringified into a
single token or literal. So, exclude the space-before-comma check, which
will allow construction like `MACRO_CALL(foo, , baz)`.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 09:05:02 -08:00
Rob Norris
0e87150b6c cstyle: understand basic top-level macro invocations
We quite often invoke macros outside of functions, usually to generate
functions or data. cstyle inteprets these as function headers, which at
least have opinions for indenting.

This introduces a separate state for top-level macro invocations, and
excludes it from matching functions. For the moment, most of the
existing rules will continue to apply, but this gives us a way to add or
removes rules targeting macros specifically.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 09:05:02 -08:00
Alexander Motin
7742e29387 Optimize RAIDZ expansion
- Instead of copying one ashift-sized block per ZIO, copy as much
as we have contiguous data up to 16MB per old vdev.  To avoid data
moves use gang ABDs, so that read ZIOs can directly fill buffers
for write ZIOs.  ABDs have much smaller overhead than ZIOs in both
memory usage and processing time, plus big I/Os do not depend on
I/O aggregation and scheduling to reach decent performance on HDDs.
 - Reduce raidz_expand_max_copy_bytes to 16MB on 32bit platforms.
 - Use 32bit range tree when possible (practically always now) to
slightly reduce memory usage.
 - Use ZIO_PRIORITY_REMOVAL for early stages of expansion, same as
for main ones.
 - Fix rate overflows in `zpool status` reporting.

With these changes expanding RAIDZ1 from 4 to 5 children I am able
to reach 6-12GB/s rate on SSDs and ~500MB/s on HDDs, both are
limited by devices instead of CPU.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15680
Closes #16819
2024-12-06 09:05:02 -08:00
Alexander Motin
f54052a122 Fix false assertion in dmu_tx_dirty_buf() on cloning
Same as writes block cloning can increase block size and number of
indirection levels.  That means it can dirty block 0 at level 0 or
at new top indirection level without explicitly holding them.

A block cloning test case for large offsets has been added.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16825
2024-12-05 11:49:06 -08:00
Rob Norris
d874f27776 zdb_il: use flex array member to access ZIL records
In 6f50f8e16 we added flex arrays to lr_XX_t structs to silence kernel
bounds check warnings. Userspace code was mostly not updated to use them
though.

It seems that in the right circumstances, compilers can get confused
about sizes in the same way, and throw warnings. This commits switch
those uses over to use the flex array fields also.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16832
2024-12-05 09:33:21 -08:00
Alexander Motin
84d7d53e91 Improve speculative prefetcher for block cloning
- Issue prescient prefetches for demand indirect blocks after the
first one.  It should be quite rare for reads/writes, but much more
useful for cloning due to much bigger (up to 1022 blocks) accesses.
It covers the gap during the first couple accesses when we can not
speculate yet, but we know what is needed right now.  It reduces
dbuf_hold() sync read delays in dmu_buf_hold_array_by_dnode().
 - Increase maximum prefetch distance for indirect blocks from 64
to 128MB.  It should cover the maximum 1022 blocks of block cloning
access size in case of default 128KB recordsize used.  In case of
bigger recordsize the above prescient prefetch should also help.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16814
2024-12-05 09:33:21 -08:00
Alexander Motin
d90042dedb Allow dsl_deadlist_open() return errors
In some cases like dsl_dataset_hold_obj() it is possible to handle
those errors, so failure to hold dataset should be better than
kernel panic.  Some other places where these errors are still not
handled but asserted should be less dangerous just as unreachable.

We have a user report about pool corruption leading to assertions
on these errors.  Hopefully this will make behavior a bit nicer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16836
2024-12-05 09:33:21 -08:00
Mark Johnston
0e46085ee6 FreeBSD: Remove an incorrect assertion in zfs_getpages()
The pages in the array may become valid after this initial unbusying,
so the assertion only holds during the first iteration of the outer
loop.

Later in zfs_getpages(), the dmu_read_pages() loop handles already-valid
pages.  Just drop the assertion, it's not terribly useful.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reported-by: Peter Holm <pho@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Sponsored-by: Klara, Inc.
Closes #16810
Closes #16834
2024-12-05 09:33:21 -08:00
Mariusz Zaborski
3b0c1131ef Add ability to scrub from last scrubbed txg
Some users might want to scrub only new data because they would like
to know if the new write wasn't corrupted.  This PR adds possibility
scrub only newly written data.

This introduces new `last_scrubbed_txg` property, indicating the
transaction group (TXG) up to which the most recent scrub operation
has checked and repaired the dataset, so users can run scrub only
from the last saved point. We use a scn_max_txg and scn_min_txg
which are already built into scrub, to accomplish that.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-By: Wasabi Technology, Inc.
Sponsored-By: Klara Inc.
Closes #16301
2024-12-05 09:33:21 -08:00
shodanshok
5988de77b0 Fix race in libzfs_run_process_impl
When replacing a disk, a child process is forked to run a script called
zfs_prepare_disk (which can be useful for disk firmware update or health
check). The parent than calls waitpid and checks the child error/status
code.

However, the _reap_children thread (created from zed_exec_process to
manage zedlets) also waits for all children with the same PGID and can
stole the signal, causing the replace operation to be aborted.

As waitpid returns -1, the parent incorrectly assume that the child
process had an error or was killed. This, in turn, leaves the newly
added disk in REMOVED or UNAVAIL status rather than completing the
replace process.

This patch changes the PGID of the child process execuing the
prepare script, shielding it from the _reap_children thread.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #16801
2024-12-05 09:33:21 -08:00
Alexander Motin
00debc1361 FreeBSD: Remove some illumos compat from vnode.h
Should make no difference, just some dead code cleanup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Martin Matuska <mm@FreeBSD.org>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16808
2024-12-05 09:33:21 -08:00