Compare commits

...

150 Commits

Author SHA1 Message Date
Brian Behlendorf
bc06d8164b Linux: Enable Direct IO by default
Aligns the 2.3 release branch with the well tested default behavior
in the master branch.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-01-13 13:53:41 -08:00
Brian Behlendorf
76745cf5b8 Tag 2.3.0-1
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-01-13 10:03:37 -08:00
Brian Behlendorf
0c88ae6187 Tag 2.3.0-rc5
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-01-05 17:31:26 -08:00
n0-1
307fd0da1f Support for cross-compiling kernel modules
In order to correctly cross-compile, one has to pass ARCH and
CROSS_COMPILE make flags to kernel module build calls. Facilitate this
in the same way as for custom CC flag by recognizing KERNEL_-prefixed
configure environment variables of same name.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Closes #16924
2025-01-05 17:31:26 -08:00
Robert Evans
9f1c5e0b10 Remove duplicate dedup_legacy_create in common.run
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16926
2025-01-05 17:31:26 -08:00
Richard Kojedzinszky
5ba50c8135 fix: make zfs_strerror really thread-safe and portable
#15793 wanted to make zfs_strerror threadsafe, unfortunately, it
turned out that strerror_l() usage was wrong, and also, some libc 
implementations dont have strerror_l().

zfs_strerror() now simply calls original strerror() and copies the 
result to a thread-local buffer, then returns that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Kojedzinszky <richard@kojedz.in>
Closes #15793
Closes #16640
Closes #16923
2025-01-04 11:58:15 -08:00
Don Brady
25565403aa Too many vdev probe errors should suspend pool
Similar to what we saw in #16569, we need to consider that a
replacing vdev should not be considered as fully contributing
to the redundancy of a raidz vdev even though current IO has
enough redundancy.

When a failed vdev_probe() is faulting a disk, it now checks
if that disk is required, and if so it suspends the pool until
the admin can return the missing disks.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #16864
2025-01-04 11:58:15 -08:00
Robert Evans
47b7dc976b Add Makefile dependencies for scripts/zfs-tests.sh -c
This updates the Makefile to be more correct for parallel make.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16030
Closes #16922
2025-01-04 11:58:15 -08:00
Toomas Soome
125731436d ZTS: checkpoint_discard_busy should use save_tunable/restore_tunable
Instead of using hardwired value for SPA_DISCARD_MEMORY_LIMIT,
use save_tunable and restore_tunable to restore the pre-test state.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16919
2025-01-03 15:23:49 -08:00
Rob Norris
4425a7bb85 vdev_open: clear async remove flag after reopen
It's possible for a vdev to be flagged for async remove after the pool
has suspended. If the removed device has been returned when the pool is
resumed, the ASYNC_REMOVE task will still run at the end of txg, and
remove the device from the pool again.

To fix, we clear the async remove flag at reopen, just as we did for the
async fault flag in 5de3ac223.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16921
2025-01-03 15:23:49 -08:00
Toomas Soome
e47b033eae ZTS: remove unused TESTDIRS from pam/cleanup.ksh
Remove TESTDIRS as it is not set for pam tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16920
2025-01-03 15:23:49 -08:00
pstef
cfec8f13a2 zfs_vnops_os.c: fallocate is valid but not supported on FreeBSD
This works around
/usr/lib/go-1.18/pkg/tool/linux_amd64/link:
mapping output file failed: invalid argument

It's happened to me under a Linux jail, but it's also happened to other
people, see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=270247#c4

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: pstef <pstef@users.noreply.github.com>
Closes #16918
2025-01-03 15:23:49 -08:00
Toomas Soome
997db7a7fc ZTS: checkpoint_discard_busy does not set 16M on cleanup
Originally hex value is used as decimal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16917
2025-01-02 17:04:10 -08:00
Toomas Soome
e411081aa0 ZTS: functional/mount scripts are not removing /var/tmp/testdir.X dirs
cleanup.ksh is assuming we have TESTDIRS set.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16915
2025-01-02 17:04:10 -08:00
Toomas Soome
a55b6fe94a ZTS: zfs_mount_all_fail leaves /var/tmp/testrootPIDNUM directory around
Before we can remove test files, we need to unmount datasets
used by test first.

See also: zfs_mount_all_mountpoints.ksh

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16914
2025-01-02 17:04:10 -08:00
James Reilly
939e9f0b6a ZTS: add centos stream10 (#16904)
Added centos as optional runners via workflow_dispatch

removed centos-stream9 from the FULL_OS runner list as CentOS is not
officially support by ZFS. This commit will add preliminary support for
EL10 and allow testing ZFS ahead of EL10 codebase solidifying in ~6
months

Signed-off-by: James Reilly <jreilly1821@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
2025-01-02 17:04:10 -08:00
Andrew Walker
679b164cd3 Add missing zfs_exit() when snapdir is disabled (#16912)
zfs_vget doesn't zfs_exit when erroring out due to snapdir
being disabled.

Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Reviewed-by: @bmeagherix
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-01-02 17:04:10 -08:00
shodanshok
c2d9494f99 set zfs_arc_shrinker_limit to 0 by default
zfs_arc_shrinker_limit was introduced to avoid ARC collapse due to
aggressive kernel reclaim. While useful, the current default (10000) is
too prone to OOM especially when MGLRU-enabled kernels with default
min_ttl_ms are used. Even when no OOM happens, it often causes too much
swap usage.

This patch sets zfs_arc_shrinker_limit=0 to not ignore kernel reclaim
requests. ARC now plays better with both kernel shrinker and pagecache
but, should ARC collapse happen again, MGLRU behavior can be tuned or
even disabled.

Anyway, zfs should not cause OOM when ARC can be released.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #16909
2024-12-29 11:53:45 -08:00
Ameer Hamza
b952e061df zvol: implement platform-independent part of block cloning
In Linux, block devices currently lack support for `copy_file_range`
API because the kernel does not provide the necessary functionality.
However, there is an ongoing upstream effort to address this
limitation: https://patchwork.kernel.org/project/dm-devel/cover/20240520102033.9361-1-nj.shetty@samsung.com/.
We have adopted this upstream kernel patch into the TrueNAS kernel and
made some additional modifications to enable block cloning specifically
for the zvol block device. This patch implements the platform-
independent portions of these changes for inclusion in OpenZFS.
This patch does not introduce any new functionality directly into
OpenZFS. The `TX_CLONE_RANGE` replay capability is only relevant when
zvols are migrated to non-TrueNAS systems that support Clone Range
replay in the ZIL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16901
2024-12-29 11:53:45 -08:00
Alexander Motin
0fea7fc109 ZTS: Reduce file size in redacted_panic to 1GB
This test takes 3 minutes on RELEASE FreeBSD bots, but on CURRENT,
probably due to debugging it has in kernel, it does not complete
within 10 minutes, ending up killed.  As I see all the redacting
here happens within the first ~128MB of the file, so I hope it
won't matter if there is 1GB of data instead of 2GB.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #11141
2024-12-29 11:53:45 -08:00
Alexander Motin
0f6d955a35 ZTS: Remove procfs use from zpool_import_status
procfs might be not mounted on FreeBSD.  Plus checking for specific
PID might be not exactly reliable.  Check for empty list of jobs
instead.

Premature loop exit can result in failed test and failed cleanup,
failing also some following tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #11141
2024-12-29 11:53:45 -08:00
Alexander Motin
c3d2412b05 ZTS: Remove non-standard awk hex numbers usage
FreeBSD recently removed non-standard hex numbers support from awk.
Neither it supports -n argument, enabling it in gawk.  Instead of
depending on those rewrite list_file_blocks() fuction to handle the
hex math in shell instead of awk.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #11141
2024-12-29 11:53:45 -08:00
Rob Norris
74064cb175 zpool_get_vdev_prop_value: show missing vdev userprops
If a vdev userprop is not found, present it as value '-', default
source, so it matches the output from pool userprops.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16887
2024-12-29 11:53:45 -08:00
Alexander Motin
30b97ce218 ZTS: Increase write sizes for RAIDZ/dRAID tests
Many RAIDZ/dRAID tests filled files doing millions of 100 or even
10 byte writes.  It makes very little sense since we are not
micro-benchmarking syscalls or VFS layer here, while before the
blocks reach the vdev layer absolute majority of the small writes
will be aggregated.  In some cases I see we spend almost as much
time creating the test files as actually running the tests.  And
sometimes the tests even time out after that.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16905
2024-12-29 11:53:45 -08:00
Rob Norris
9519e7ebcc microzap: set hard upper limit of 1M
The count of chunks in a microzap block is stored as an uint16_t
(mze_chunkid). Each chunk is 64 bytes, and the first is used to store a
header, so there are 32767 usable chunks, which is just under 2M. 1M is
the largest power-2-rounded block size under 2M, so we must set the
limit there.

If it goes higher, the loop in mzap_addent can overflow and fall into
the PANIC case.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16888
2024-12-29 11:53:45 -08:00
Alexander Motin
f9b02fe7e3 Fix readonly check for vdev user properties
VDEV_PROP_USERPROP is equal do VDEV_PROP_INVAL and so is not a real
property.  That's why vdev_prop_readonly() does not work right for
it.  In particular it may declare all vdev user properties readonly
on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16890
2024-12-29 11:53:45 -08:00
Umer Saleem
cb8da70329 Skip iterating over snapshots for share properties
Setting sharenfs and sharesmb properties on a dataset can become costly
if there are large number of snapshots, since setting the share
properties iterates over all snapshots present for a dataset. If it is
the root dataset for which we are trying to set the share property,
snapshots for all child datasets and their children will also be
iterated.

There is no need to iterate over snapshots for share properties
because we do not allow share properties or any other property,
to be set on a snapshot itself execpt for user properties.

This commit skips iterating over snapshots for share properties,
instead iterate over all child dataset and their children for share
properties.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16877
2024-12-29 11:53:45 -08:00
Rob Norris
c944c46a98 zfs_main: fix alignment on props usage output
I guess we've got some long property names since this was first set up!

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16883
2024-12-29 11:53:45 -08:00
Tino Reichardt
166a7bc602 CI: Fix FreeBSD 13.4 STABLE build
In #16869 we added FreeBSD 13.4 STABLE, but forget the special
thing, that the virtio nic within FreeBSD 13.x is buggy.

This fix adds the needed rtl8139 nic to the VM.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16885
2024-12-29 11:53:45 -08:00
Rob Norris
e90124a7c8 zprop: fix value help for ZPOOL_PROP_CAPACITY
It's a percentage and documented as such, but we were showing it as
<size>.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16881
2024-12-29 11:53:45 -08:00
Brian Behlendorf
18b3bea861 CI: Add FreeBSD 14.2 RELEASE+STABLE builds
Update the CI to include FreeBSD 14.2 as a regularly tested platform.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16869
2024-12-29 11:53:45 -08:00
Brian Atkinson
d67eb17e27 Use pin_user_pages API for Direct I/O requests
As of kernel v5.8, pin_user_pages* interfaced were introduced. These
interfaces use the FOLL_PIN flag. This is preferred interface now for
Direct I/O requests in the kernel. The reasoning for using this new
interface for Direct I/O requests is explained in the kernel
documenetation:
Documentation/core-api/pin_user_pages.rst

If pin_user_pages_unlocked is available, the all Direct I/O requests
will use this new API to stay uptodate with the kernel API requirements.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16856
2024-12-16 10:26:52 -08:00
Brian Atkinson
1862c1c0a8 Removing old code outside of 4.18 kernsls
There were checks still in place to verify we could completely use
iov_iter's on the Linux side. All interfaces are available as of kernel
4.18, so there is no reason to check whether we should use that
interface at this point. This PR completely removes the UIO_USERSPACE
type. It also removes the check for the direct_IO interface checks.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16856
2024-12-16 10:26:49 -08:00
Shengqi Chen
b57f53036d simd_stat: fix undefined CONFIG_KERNEL_MODE_NEON error on armel
CONFIG_KERNEL_MODE_NEON depends on CONFIG_NEON. Neither is defined
on armel. Add a guard to avoid compilation errors.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16871
2024-12-16 10:26:45 -08:00
Brian Behlendorf
4b8bf3c48a Fix stray "no" in configure output
This is purely a cosmetic fix which removes a stray "no" from
the configure output.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16867
2024-12-16 10:26:42 -08:00
Alexander Motin
696943533c Fix use-afer-free regression in RAIDZ expansion
We should not dereference rra after the last zio_nowait() is called.
It seems very unlikely, but ASAN in ztest managed to catch it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16868
2024-12-16 10:26:39 -08:00
kotauskas
2284a61129 Remount datasets on soft-reboot
The one-shot zfs-mount.service is incorrectly deemed active by 
Systemd after a systemctl soft-reboot. As such, soft-rebooting
prevents zfs mount -a from being ran automatically.

This commit makes it so that zfs-mount.service is marked as being 
undone by the time umount.target is reached, so that zfs.target then 
pulls it in again and gets it restarted after a soft reboot.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: kotauskas <v.toncharov@gmail.com>
Closes #16845
2024-12-16 10:26:35 -08:00
Rob Norris
e1833a72f9 flush: only detect lack of flush support in one place
It seems there's no good reason for vdev_disk & vdev_geom to explicitly
detect no support for flush and set vdev_nowritecache.  Instead, just
signal it by setting the error to ENOTSUP, and let zio_vdev_io_assess()
take care of it in one place.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16855
2024-12-16 10:26:30 -08:00
Rob Norris
5bb034f533 flush: don't report flush error when disabling flush support
The first time a device returns ENOTSUP in repsonse to a flush request,
we set vdev_nowritecache so we don't issue flushes in the future and
instead just pretend the succeeded. However, we still return an error
for the initial flush, even though we just decided such errors are
meaningless!

So, when setting vdev_nowritecache in response to a flush error, also
reset the error code to assume success.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16855
2024-12-16 10:26:27 -08:00
Poscat
1e08e49a28 build: use correct bashcompletiondir on arch
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: poscat <poscat@poscat.moe>
Closes #16861
2024-12-16 10:26:23 -08:00
Rob Norris
6604fe9a06 backtrace: fix off-by-one on string output
sizeof("foo") includes the trailing null byte, so all the output had
nulls through it. Most terminals quietly ignore it, but it makes some
tools misdetect file types and other annoyances.

Easy fix: subtract 1.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16862
2024-12-16 10:26:20 -08:00
Brian Behlendorf
7cbe7bbbd4 Tag 2.3.0-rc4
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-12-12 16:20:30 -08:00
Chunwei Chen
2dcc8fe035 Fix DR_OVERRIDDEN use-after-free race in dbuf_sync_leaf
In dbuf_sync_leaf, we clone the arc_buf in dr if we share it with db
except for overridden case. However, this exception causes a race where
dbuf_new_size could free the arc_buf after the last dereference of
*datap and causes use-after-free. We fix this by cloning the buf
regardless if it's overridden.

The race:
--
P0                                     P1

                                       dbuf_hold_impl()
                                         // dbuf_hold_copy passed
                                         // because db_data_pending NULL

dbuf_sync_leaf()
  // doesn't clone *datap
  // *datap derefed to db_buf
  dbuf_write(*datap)

                                       dbuf_new_size()
                                         dmu_buf_will_dirty()
                                           dbuf_fix_old_data()
                                             // alloc new buf for P0 dr
                                             // but can't change *datap

                                         arc_alloc_buf()
                                         arc_buf_destroy()
                                           // alloc new buf for db_buf
                                           // and destroy old buf

  dbuf_write() // continue
    abd_get_from_buf(data->b_data,
    arc_buf_size(data))
      // use-after-free
--

Here's an example when it happens:

BUG: kernel NULL pointer dereference, address: 000000000000002e
RIP: 0010:arc_buf_size+0x1c/0x30 [zfs]
Call Trace:
 dbuf_write+0x3ff/0x580 [zfs]
 dbuf_sync_leaf+0x13c/0x530 [zfs]
 dbuf_sync_list+0xbf/0x120 [zfs]
 dnode_sync+0x3ea/0x7a0 [zfs]
 sync_dnodes_task+0x71/0xa0 [zfs]
 taskq_thread+0x2b8/0x4e0 [spl]
 kthread+0x112/0x130
 ret_from_fork+0x1f/0x30

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: Chunwei Chen <david.chen@nutanix.com>
Closes #16854
2024-12-12 16:20:30 -08:00
Alexander Motin
38875918d8 BRT: Check bv_mos_entries in brt_entry_lookup()
When vdev first sees some block cloning, there is a window when
brt_maybe_exists() might already return true since something was
cloned, but bv_mos_entries is still 0 since BRT ZAP was not yet
created.  In such case we should not try to look into the ZAP
and dereference NULL bv_mos_entries_dnode.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16851
2024-12-12 16:20:30 -08:00
Rob Norris
0d51852ec7 Remove unnecessary CSTYLED escapes on top-level macro invocations
cstyle can handle these cases now, so we don't need to disable it.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 09:05:02 -08:00
Rob Norris
73a73cba71 cstyle: ignore old non-POSIX types in macro invocations
In code generation macros, we often use names like `uint` when
constructing handler functions. These are not being used as types, so
exclude them from the admonishment to use POSIX type names.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 09:05:02 -08:00
Rob Norris
87947f2440 cstyle: understand macro params can be empty
It's not uncommon to have empty parameters in code generator macros,
usually when multiple parameters are concatenated or stringified into a
single token or literal. So, exclude the space-before-comma check, which
will allow construction like `MACRO_CALL(foo, , baz)`.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 09:05:02 -08:00
Rob Norris
0e87150b6c cstyle: understand basic top-level macro invocations
We quite often invoke macros outside of functions, usually to generate
functions or data. cstyle inteprets these as function headers, which at
least have opinions for indenting.

This introduces a separate state for top-level macro invocations, and
excludes it from matching functions. For the moment, most of the
existing rules will continue to apply, but this gives us a way to add or
removes rules targeting macros specifically.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 09:05:02 -08:00
Alexander Motin
7742e29387 Optimize RAIDZ expansion
- Instead of copying one ashift-sized block per ZIO, copy as much
as we have contiguous data up to 16MB per old vdev.  To avoid data
moves use gang ABDs, so that read ZIOs can directly fill buffers
for write ZIOs.  ABDs have much smaller overhead than ZIOs in both
memory usage and processing time, plus big I/Os do not depend on
I/O aggregation and scheduling to reach decent performance on HDDs.
 - Reduce raidz_expand_max_copy_bytes to 16MB on 32bit platforms.
 - Use 32bit range tree when possible (practically always now) to
slightly reduce memory usage.
 - Use ZIO_PRIORITY_REMOVAL for early stages of expansion, same as
for main ones.
 - Fix rate overflows in `zpool status` reporting.

With these changes expanding RAIDZ1 from 4 to 5 children I am able
to reach 6-12GB/s rate on SSDs and ~500MB/s on HDDs, both are
limited by devices instead of CPU.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15680
Closes #16819
2024-12-06 09:05:02 -08:00
Alexander Motin
f54052a122 Fix false assertion in dmu_tx_dirty_buf() on cloning
Same as writes block cloning can increase block size and number of
indirection levels.  That means it can dirty block 0 at level 0 or
at new top indirection level without explicitly holding them.

A block cloning test case for large offsets has been added.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16825
2024-12-05 11:49:06 -08:00
Rob Norris
d874f27776 zdb_il: use flex array member to access ZIL records
In 6f50f8e16 we added flex arrays to lr_XX_t structs to silence kernel
bounds check warnings. Userspace code was mostly not updated to use them
though.

It seems that in the right circumstances, compilers can get confused
about sizes in the same way, and throw warnings. This commits switch
those uses over to use the flex array fields also.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16832
2024-12-05 09:33:21 -08:00
Alexander Motin
84d7d53e91 Improve speculative prefetcher for block cloning
- Issue prescient prefetches for demand indirect blocks after the
first one.  It should be quite rare for reads/writes, but much more
useful for cloning due to much bigger (up to 1022 blocks) accesses.
It covers the gap during the first couple accesses when we can not
speculate yet, but we know what is needed right now.  It reduces
dbuf_hold() sync read delays in dmu_buf_hold_array_by_dnode().
 - Increase maximum prefetch distance for indirect blocks from 64
to 128MB.  It should cover the maximum 1022 blocks of block cloning
access size in case of default 128KB recordsize used.  In case of
bigger recordsize the above prescient prefetch should also help.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16814
2024-12-05 09:33:21 -08:00
Alexander Motin
d90042dedb Allow dsl_deadlist_open() return errors
In some cases like dsl_dataset_hold_obj() it is possible to handle
those errors, so failure to hold dataset should be better than
kernel panic.  Some other places where these errors are still not
handled but asserted should be less dangerous just as unreachable.

We have a user report about pool corruption leading to assertions
on these errors.  Hopefully this will make behavior a bit nicer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16836
2024-12-05 09:33:21 -08:00
Mark Johnston
0e46085ee6 FreeBSD: Remove an incorrect assertion in zfs_getpages()
The pages in the array may become valid after this initial unbusying,
so the assertion only holds during the first iteration of the outer
loop.

Later in zfs_getpages(), the dmu_read_pages() loop handles already-valid
pages.  Just drop the assertion, it's not terribly useful.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reported-by: Peter Holm <pho@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Sponsored-by: Klara, Inc.
Closes #16810
Closes #16834
2024-12-05 09:33:21 -08:00
Mariusz Zaborski
3b0c1131ef Add ability to scrub from last scrubbed txg
Some users might want to scrub only new data because they would like
to know if the new write wasn't corrupted.  This PR adds possibility
scrub only newly written data.

This introduces new `last_scrubbed_txg` property, indicating the
transaction group (TXG) up to which the most recent scrub operation
has checked and repaired the dataset, so users can run scrub only
from the last saved point. We use a scn_max_txg and scn_min_txg
which are already built into scrub, to accomplish that.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-By: Wasabi Technology, Inc.
Sponsored-By: Klara Inc.
Closes #16301
2024-12-05 09:33:21 -08:00
shodanshok
5988de77b0 Fix race in libzfs_run_process_impl
When replacing a disk, a child process is forked to run a script called
zfs_prepare_disk (which can be useful for disk firmware update or health
check). The parent than calls waitpid and checks the child error/status
code.

However, the _reap_children thread (created from zed_exec_process to
manage zedlets) also waits for all children with the same PGID and can
stole the signal, causing the replace operation to be aborted.

As waitpid returns -1, the parent incorrectly assume that the child
process had an error or was killed. This, in turn, leaves the newly
added disk in REMOVED or UNAVAIL status rather than completing the
replace process.

This patch changes the PGID of the child process execuing the
prepare script, shielding it from the _reap_children thread.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #16801
2024-12-05 09:33:21 -08:00
Alexander Motin
00debc1361 FreeBSD: Remove some illumos compat from vnode.h
Should make no difference, just some dead code cleanup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Martin Matuska <mm@FreeBSD.org>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16808
2024-12-05 09:33:21 -08:00
Alexander Motin
af10714e42 FreeBSD: Return ifndef IN_BASE back to fix the build
FreeBSD's libprocstat seems to build kernel code in user space,
which does not work here due to undefined vnode_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Martin Matuska <mm@FreeBSD.org>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16808
2024-12-05 09:33:21 -08:00
Rob Norris
747781a345 zinject(8): rename "ioctl" to "flush"
Doc bug missed in d7605ae77.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16827
2024-12-02 18:14:26 -08:00
Alexander Motin
b17ea73f9d Fix regression in dmu_buf_will_fill()
Direct I/O implementation added condition to call dbuf_undirty()
only in case of block cloning.  But the condition is not right if
the block is no longer dirty in this TXG, but still in DB_NOFILL
state.  It resulted in block not reverting to DB_UNCACHED and
following NULL de-reference on attempt to access absent db_data.

While there, add assertions for db_data to make debugging easier.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16829
2024-12-02 18:14:26 -08:00
Alexander Motin
17cdb7a2b1 Add missing parenthesis in VERIFYF()
Without them the order of operations might get unexpected.

Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16826
2024-12-02 18:14:26 -08:00
Pavel Snajdr
b673bcba4d Linux: fix zfs_uio_dio_check_for_zero_page
The intent here is to replace the zero page pointer in the array of
pointers to pages in the struct.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #16812 
Closes #16689
Closes #16642
2024-12-02 18:14:26 -08:00
Ivan Volosyuk
d8886275df Linux: Fix detection of register_sysctl_sz
Adjust the m4 function to mimic sentinel we use in spl-proc.c
This fixes the detection on kernels compiled with CONFIG_RANDSTRUCT=y

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ivan Volosyuk <Ivan.Volosyuk@gmail.com>
Closes: #16620
Closes: #16805
2024-12-02 18:14:26 -08:00
Rob Norris
1aee375947 zdb: show dedup table and log attributes
There's interesting info in there that is going to help with
understanding dedup behavior at any given moment.

Since this is a format change, tests that rely on that output have been
modified to match.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #16755
2024-12-02 18:14:26 -08:00
Pavel Snajdr
a1907b038a Assert if we're logging after final txg was set
This allowed to debug #16714, fixed in #16782.  Without assertions
added here it is difficult to figure out what logs cause the problem,
since the assertion happens in sync thread context.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Co-authored-by: Alexander Motin <mav@FreeBSD.org>
Closes #16795
2024-12-02 18:14:26 -08:00
Alexander Motin
d359f7f547 FreeBSD: Reduce copy_file_range() source lock to shared
Linux locks copy_file_range() source as shared.  FreeBSD was doing
it also, but then was changed to exclusive, partially because KPI
of that time was doing so, and partially seems out of caution.
Considering zfs_clone_range() uses range locks on both source and
destination, neither should require exclusive vnode locks. But one
step at a time, just sync it with Linux for now.

Reviewed-by: Alan Somers <asomers@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16789
Closes #16797
2024-12-02 18:14:26 -08:00
Alexander Motin
90603601b4 FreeBSD: Lock vnode in zfs_ioctl()
Previously vnode was not locked there, unlike Linux.  It required
locking it in vn_flush_cached_data(), which recursed on the lock
if called from zfs_clone_range(), having the vnode locked.

Reviewed-by: Alan Somers <asomers@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16789
Closes #16796
2024-12-02 18:14:26 -08:00
Pavel Snajdr
ecd0b1528e Linux: Fix zfs_prune panics
by protecting against sb->s_shrink eviction on umount with newer kernels

deactivate_locked_super calls shrinker_free and only then
sops->kill_sb cb, resulting in UAF on umount when trying
to reach for the shrinker functions in zpl_prune_sb of
in-umount dataset

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #16770
2024-12-02 18:14:26 -08:00
Brian Behlendorf
3ed1d608a8 Linux 6.12 compat: META
Update the META file to reflect compatibility with the 6.12 kernel.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16793
2024-11-21 08:24:37 -08:00
Alexander Motin
c165daa0b1 BRT: Clear bv_entcount_dirty on destroy
This fixes assertion in brt_sync_table() on debug builds when last
cloned block on the vdev is freed and bv_meta_dirty is cleared,
while bv_entcount_dirty is not.  Should not matter in production.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16791
2024-11-21 08:24:37 -08:00
Alexander Motin
1a5414ba2f BRT: More optimizations after per-vdev splitting
- With both pending and current AVL-trees being per-vdev and having
effectively identical comparison functions (pending tree compared
also birth time, but I don't believe it is possible for them to be
different for the same offset within one transaction group), it
makes no sense to move entries from one to another.  Instead inline
dramatically simplified brt_entry_addref() into brt_pending_apply().
It no longer requires bv_lock, since there is nothing concurrent
to it at the time.  And it does not need to search the tree for the
previous entries, since it is the same tree, we already have the
entry and we know it is unique.
 - Put brt_vdev_lookup() and brt_vdev_addref() into different tree
traversals to avoid false positives in the first due to the second
entcount modifications.  It saves dramatic amount of time when a
file cloned first time by not looking for non-existent ZAP entries.
 - Remove avl_is_empty(bv_tree) check from brt_maybe_exists().  I
don't think it is needed, since by the time all added entries are
already accounted in bv_entcount. The extra check must be producing
too many false positives for no reason.  Also we don't need bv_lock
there, since bv_entcount pointer must be table at this point, and
we don't care about false positive races here, while false negative
should be impossible, since all brt_vdev_addref() have already
completed by this point.  This dramatically reduces lock contention
on massive deletes of cloned blocks.  The only remaining one is
between multiple parallel free threads calling brt_entry_decref().
 - Do not update ZAP if net change for a block over the TXG was 0.
In combination with above it makes file move between datasets as
cheap operation as originally intended if it fits into one TXG.
 - Do not allocate vdevs on pool creation or import if it did not
have active block cloning. This allows to save a bit in few cases.
 - While here, add proper error handling in brt_load() on pool
import instead of assertions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16773
2024-11-21 08:24:37 -08:00
Alexander Motin
409aad3f33 BRT: Rework structures and locks to be per-vdev
While block cloning operation from the beginning was made per-vdev,
before this change most of its data were protected by two pool-
wide locks.  It created lots of lock contention in many workload.

This change makes most of block cloning data structures per-vdev,
which allows to lock them separately.  The only pool-wide lock now
it spa_brt_lock, protecting array of per-vdev pointers and in most
cases taken as reader.  Also this splits per-vdev locks into three
different ones: bv_pending_lock protects the AVL-tree of pending
operations in open context, bv_mos_entries_lock protects BRT ZAP
object from while being prefetched, and bv_lock protects the rest
of per-vdev context during TXG commit process.  There should be
no functional difference aside of some optimizations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
2024-11-21 08:24:37 -08:00
Alexander Motin
1917c26944 ZAP: Add by_dnode variants to lookup/prefetch_uint64
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
2024-11-21 08:24:37 -08:00
Alexander Motin
2b64d41be8 BRT: Don't call brt_pending_remove() on holes/embedded
We are doing exactly the same checks around all brt_pending_add().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
2024-11-21 08:24:37 -08:00
Alexander Motin
9753feaa63 ZTS: Avoid embedded blocks in bclone/bclone_prop_sync
If we write less than 113 bytes with enabled compression we get
embeded block, which then fails check for number of cloned blocks
in bclone_test.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
2024-11-21 08:24:37 -08:00
tleydxdy
3c0b8da206 fix: block incompatible kernel from being installed
The current "Requires" lines only ensure the old kernel is
available on the system but it does not prevent fedora from
updating to an incompatible and breaking user's system.

Set Conflicts to block incompatible kernels from being installed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: tleydxdy <shironeko.github@tesaguri.club>
Closes #16139
2024-11-21 08:24:37 -08:00
Mark Johnston
d7abeef621 zio: Avoid sleeping in the I/O path
zio_delay_interrupt(), apparently used for fault injection, is executed
in the I/O pipeline.  It can cause the calling thread to go to sleep,
which is not allowed on FreeBSD.  This happens only for small delays,
though, and there's no apparent reason to avoid deferring to a taskqueue
in that case, as it already does otherwise.

Simply go to sleep unconditionally.  This fixes an occasional panic I
see when running the ZTS on FreeBSD.  Also remove an unhelpful comment
referencing the non-existent timeout_generic().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #16785
2024-11-21 08:24:37 -08:00
Brian Behlendorf
7fb7eb9a63 ZTS: Fix zpool_status_008_pos false positive
Increase the injected delay to 1000ms and the ZIO_SLOW_IO_MS threshold
to 750ms to avoid false positives due to unrelated slow IOs which may
occur in the CI environment.  Additionally, clear the fault injection as
soon as it is no longer required for the test case.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16769
2024-11-21 08:24:37 -08:00
Alexander Motin
3f9af023f6 L2ARC: Stop rebuild before setting spa_final_txg
Without doing that there is a race window on export when history
log write by completed rebuild dirties transaction beyond final,
triggering assertion.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16714
Closes #16782
2024-11-21 08:24:37 -08:00
Alexander Motin
f7675ae30f Remove hash_elements_max accounting from DBUF and ARC
Those values require global atomics to get current hash_elements
values in few of the hottest code paths, while in all the years I
never cared about it.  If somebody wants, it should be easy to
get it by periodic sampling, since neither ARC header nor DBUF
counts change so fast that it would be difficult to catch.

For now I've left hash_elements_max kstat for ARC, since it was
used/reported by arc_summary and it would break older versions,
but now it just reports the current value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16759
2024-11-21 08:24:37 -08:00
Rob Norris
920603990a Move "no name changes" from compression to checksum table
Compression names actually aren't used in dedup table names, but
checksum names are.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16776
2024-11-21 08:24:37 -08:00
Steve Mokris
9a4b2f08d3 Expand zpool-remove.8 manpage with example results
Also fix comment cross-referencing to zpool.8.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Steve Mokris <smokris@softpixel.com>
Closes #16777
2024-11-21 08:24:37 -08:00
Alexander Motin
8023d9d4b5 Fix few __VA_ARGS typos in assertions
It should be __VA_ARGS__, not __VA_ARGS.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16780
2024-11-21 08:24:37 -08:00
Ameer Hamza
23f063d2e6 zed: prevent automatic replacement of offline vdevs
When an OFFLINE device is physically removed, a spare is automatically
activated. However, this behavior differs in FreeBSD, where we do not
transition from OFFLINE state to REMOVED.
Our support team has encountered cases where customers experienced
unexpected behavior during drive replacements, with multiple spares
activating for the same VDEV due to a single disk replacement. This
patch ensures that a drive in an OFFLINE state remains in that state,
preventing it from transitioning to REMOVED and being automatically
replaced by a spare.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16751
2024-11-21 08:24:37 -08:00
Rob Norris
18474efeec AUTHORS: refresh with recent new contributors
Welcome to the party 🎉

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16762
2024-11-15 15:08:59 -08:00
José Luis Salvador Rufo
260065099e tests: fix uClibc for getversion.c
This patch fixes compilation with uClibc by applying the same fallback
as commit e12d76176d to the `getversion.c`
file, which was previously overlooked.
 
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Closes #16735
Closes #16741
2024-11-14 16:56:34 -08:00
Ameer Hamza
4c9f2cec46 zvol_os.c: Increase optimal IO size
Since zvol read and write can process up to (DMU_MAX_ACCESS / 2) bytes
in a single operation, the current optimal I/O size is too low. SCST
directly reports this value as the optimal transfer length for the
target SCSI device. Increasing it from the previous volblocksize results
in performance improvement for large block parallel I/O workloads.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16750
2024-11-14 16:52:10 -08:00
Mark Johnston
ee3677d321 Fix some nits in zfs_getpages()
- If we don't want dmu_read_pages() to perform extra readahead/behind,
  pass a pointer to 0 instead of a null pointer, as dum_read_pages()
  expects rahead and rbehind to be non-null.
- Avoid unneeded iterations in a loop.

Sponsored-by: Klara, Inc.
Reported-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #16758
2024-11-14 16:52:06 -08:00
Rob Norris
0274a9a57d dsl_dataset: put IO-inducing frees on the pool deadlist
dsl_free() calls zio_free() to free the block. For most blocks, this
simply calls metaslab_free() without doing any IO or putting anything on
the IO pipeline.

Some blocks however require additional IO to free. This at least
includes gang, dedup and cloned blocks. For those, zio_free() will issue
a ZIO_TYPE_FREE IO and return.

If a huge number of blocks are being freed all at once, it's possible
for dsl_dataset_block_kill() to be called millions of time on a single
transaction (eg a 2T object of 128K blocks is 16M blocks). If those are
all IO-inducing frees, that then becomes 16M FREE IOs placed on the
pipeline. At time of writing, a zio_t is 1280 bytes, so for just one 2T
object that requires a 20G allocation of resident memory from the
zio_cache. If that can't be satisfied by the kernel, an out-of-memory
condition is raised.

This would be better handled by improving the cases that the
dmu_tx_assign() throttle will handle, or by reducing the overheads
required by the IO pipeline, or with a better central facility for
freeing blocks.

For now, we simply check for the cases that would cause zio_free() to
create a FREE IO, and instead put the block on the pool's freelist. This
is the same place that blocks from destroyed datasets go, and the async
destroy machinery will automatically see them and trickle them out as
normal.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #6783
Closes #16708
Closes #16722 
Closes #16697
2024-11-14 16:52:02 -08:00
Alexander Motin
025f8b2e74 L2ARC: Move different stats updates earlier
..., before we make the header or the log block visible to others.
It should fix assertion on allocated space going negative if the
header is freed once the lock is dropped, while the write is still
going.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16040
Closes #16743
2024-11-14 16:51:58 -08:00
Mark Johnston
37e8f3ae17 Grab the rangelock unconditionally in zfs_getpages()
As a deadlock avoidance measure, zfs_getpages() would only try to
acquire a rangelock, falling back to a single-page read if this was not
possible.  However, this is incompatible with direct I/O.

Instead, release the busy lock before trying to acquire the rangelock in
blocking mode.  This means that it's possible for the page to be
replaced, so we have to re-lookup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #16643
2024-11-14 16:51:20 -08:00
Mark Johnston
7313c6e382 Fix a potential page leak in mappedread_sf()
mappedread_sf() may allocate pages; if it fails to populate a page
can't free it, it needs to ensure that it's placed into a page queue,
otherwise it can't be reclaimed until the vnode is destroyed.

I think this is quite unlikely to happen in practice, it was noticed by
code inspection.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #16643
2024-11-14 16:51:17 -08:00
Umer Saleem
1c6b0302ef Fix user properties output for zpool list
In zpool_get_user_prop, when called from zpool_expand_proplist and
collect_pool, we often have zpool_props present in zpool_handle_t equal
to NULL. This mostly happens when only one user property is requested
using zpool list -o <user_property>. Checking for this case and
correctly initializing the zpool_props field in zpool_handle_t fixes
this issue.

Interestingly, this issue does not occur if we query any other property
like name or guid along with a user property with -o flag because while
accessing properties like guid, zpool_prop_get_int is called which
checks for this case specifically and calls zpool_get_all_props.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16734
2024-11-14 16:51:12 -08:00
Umer Saleem
7e3af4658b JSON: fix user properties output for zpool list
This commit fixes JSON output for zpool list when user properties are
requested with -o flag. This case needed to be handled specifically
since zpool_prop_to_name does not return property name for user
properties, instead it is stored in pl->pl_user_prop.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16734
2024-11-14 16:51:07 -08:00
Brian Behlendorf
1a54b13aaf Tag 2.3.0-rc3
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-11-07 11:33:51 -08:00
Umer Saleem
9061a4da0b JSON: fix user properties output for zfs list
This commit fixes JSON output for zfs list when user properties are
requested with -o flag. This case needed to be handled specifically
since zfs_prop_to_name does not return property name for user
properties, instead it is stored in pl->pl_user_prop.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16732
2024-11-07 11:33:51 -08:00
Sam James
e12d76176d Use <fcntl.h> instead of <sys/fcntl.h>
When building on musl, we get:

```
In file included from tests/zfs-tests/cmd/getversion.c:22:
/usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect
 #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>

In file included from module/os/linux/zfs/vdev_file.c:36:
/usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect
 #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>
```

Bug: https://bugs.gentoo.org/925235
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sam James <sam@gentoo.org>
Closes #15925
2024-11-07 11:33:51 -08:00
Brian Atkinson
8131793d6f Update ABD stats for linear page Linux
a10e552 updated abd_free_linear_page() to no longer call
abd_update_scatter_stat(). This meant that linear pages that were not
attached to Direct I/O requests were not doing waste accounting for the
ARC. This led to performance issues due to incorrect ARC accounting that
resulted in 100% of CPU time being spent in arc_evict() during prolonged
I/O workloads with the ARC.

The call to abd_update_scatter_stats() is now conditionally called in
abd_free_linear_page() when the ABD is not from a Direct I/O request.

Reviewed-by: Mark Maybee <mmaybee@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16729
2024-11-07 11:33:51 -08:00
Chunwei Chen
c82eb27b22 ZFS send should use spill block prefetched from send_reader_thread
Currently, even though send_reader_thread prefetches spill block,
do_dump() will not use it and issues its own blocking arc_read. This
causes significant performance degradation when sending datasets with
lots of spill blocks.

For unmodified spill blocks, we also create send_range struct for them
in send_reader_thread and issue prefetches for them. We piggyback them
on the dnode send_range instead of enqueueing them so we don't break
send_range_after check.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: david.chen <david.chen@nutanix.com>
Closes #16701
2024-11-06 11:54:32 -08:00
tstabrawa
661bb434e6 Use simple folio migration function
Avoids using fallback_migrate_folio, which starts unnecessary writeback
(leading to BUG in migrate_folio_extra).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: tstabrawa <59430211+tstabrawa@users.noreply.github.com>
Closes #16568
Closes #16723
2024-11-06 11:54:32 -08:00
tstabrawa
ae48c2f6a9 Revert "Avoid BUG in migrate_folio_extra"
This reverts commit b052035990.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: tstabrawa <59430211+tstabrawa@users.noreply.github.com>
Closes #16568
Closes #16723
2024-11-06 11:54:32 -08:00
Uglymotha
b96845b632 Verify parent_dev before calling udev_device_get_sysattr_value
Not all udev devices have parent devices.
Calling udev_device_get_ functions yield an assertion error
if called with a NULL pointer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Sietse <sietse@wizdom.nu>
Co-authored-by: Sietse <sietse@wizdom.nu>
Closes #16705 
Closes #16717
2024-11-04 16:46:39 -08:00
Alexander Motin
55cbd1f9bd Reduce dirty records memory usage
Small block workloads may use a very large number of dirty records.
During simple block cloning test due to BRT still using 4KB blocks
I can easily see up to 2.5M of those used.  Before this change
dbuf_dirty_record_t structures representing them were allocated via
kmem_zalloc(), that rounded their size up to 512 bytes.

Introduction of specialized kmem cache allows to reduce the size
from 512 to 408 bytes.  Additionally, since override and raw params
in dirty records are mutually exclusive, puting them into a union
allows to reduce structure size down to 368 bytes, increasing the
saving to 28%, that can be a 0.5GB or more of RAM.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16694
2024-11-04 16:46:39 -08:00
Rob Norris
880b73956b zfs(4): remove "experimental" from zfs_bclone_enabled
I think we've done enough experiments.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16189 
Closes #16712
2024-11-04 16:46:39 -08:00
Tony Hutter
d367ef2995 ZTS: Add Fedora 41, remove Fedora 39
Fedora 41 was released 10/29/24, and Fedora 39 will be EOL on 11/12/24.
Update Fedora runners in the test suite.  Some minor tweaks also needed
to support ksh 1.0.10.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16700
2024-11-01 10:03:05 -07:00
Rob Norris
7546fbd6e9 zdb: add extra -T flag to show histograms of BRT refcounts
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16692
2024-11-01 09:49:54 -07:00
Rich Ercolani
903d3f9187 Added output to zpool online and offline
I was surprised to discover today that `zpool online` and
`zpool offline` don't print any information about why they failed in
many cases, they just return 1 with no information about why.

Let's improve that where we can without changing the library function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #16244
2024-11-01 09:49:49 -07:00
Rob Norris
86b5853cfb vdev_disk: move abd return and free off the interrupt handler
Freeing an ABD can take sleeping locks to update various stats. We
aren't allowed to sleep on an interrupt handler. So, move the free off
to the io_done callback.

We should never have been freeing things in the interrupt handler, but
we got away with it because we were usually freeing a linear ABD, which
at most is returning two objects to a cache and never sleeping. Scatter
ABDs can be used now, and those have more complex locking.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16687
2024-11-01 09:49:23 -07:00
Rob Norris
19a8dd48e1 vdev_disk: try harder to ensure IO alignment rules
It seems out our notion of "properly" aligned IO was incomplete. In
particular, dm-crypt does its own splitting, and assumes that a logical
block will never cross an order-0 page boundary (ie, the physical page
size, not compound size). This effectively means that it needs to be
possible to split a BIO at any page or block size boundary and have it
work correctly.

This updates the alignment check function to enforce these rules (to the
extent possible).

Our response to misaligned data is to make some new allocation that is
properly aligned, and copy the data into it. It turns out that
linearising (via abd_borrow_buf()) is not enough, because we allocate eg
4K blocks from a general purpose slab, and so may receive (or already
have) a 4K block that crosses pages.

So instead, we allocate a new ABD, which is guaranteed to be aligned
properly to block sizes, and then copy everything into it, and back out
on the way back.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16687 #16631 #15646 #15533 #14533
2024-11-01 09:49:14 -07:00
Serapheim Dimitropoulos
8ac70aade7 Add warning for external consumers of dmu_tx_callback_register
While reading some code @grwilson came across the above function that
seemingly had no consumers besides a ztest callback that ensures that
the tx_callback infrastructure works correctly. It turns out that Lustre
is the main (and potentially the only) consumer of this. Refer to
`osd_trans_commit_cb` of `lustre/osd-zfs/osd_handler.c` in the Lustre
repo for more info. Let's add a comment highlighting this before someone
removes it by mistake.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Serapheim Dimitropoulos <serapheimd@gmail.com>
Closes #16698
2024-11-01 09:49:05 -07:00
Alexander Motin
bbc0d34bfd On the first vdev open ignore impossible ashift hints
If on the first open device's logical ashift is bigger than set
by pool's ashift property, ignore the last as unusable instead of
creating vdev that will fail most of I/Os due to misalignment.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #16690
2024-11-01 09:49:00 -07:00
Dimitry Andric
f3823a9ab2 Fix gcc uninitialized warning in FreeBSD zio_crypt.c
In FreeBSD's `zio_do_crypt_data()`, ensure that two `struct uio`
variables are cleared before copying data out of them. This avoids
accessing garbage data, and fixes gcc `-Wuninitialized` warnings.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Toomas Soome <tsoome@me.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #16688
2024-11-01 09:48:55 -07:00
Dimitry Andric
fd2cae969f Fix gcc unused value warning in FreeBSD simd.h
The macros `simd_stat_init()` and `simd_stat_fini()` in FreeBSD's
`simd.h` are defined as zero, but they are actually only used as
statements. Replace the definitions with `do {} while (0)` instead, to
avoid gcc `-Wunused-value` warnings.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Toomas Soome <tsoome@me.com>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #16693
2024-11-01 09:48:51 -07:00
Tony Hutter
f7b4bca66a ZTS: Add LUKS sanity test
Add a LUKS sanity test to trigger: #16631

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16681
2024-11-01 09:48:47 -07:00
Alexander Motin
7e3ce4efaa Pack dmu_buf_impl_t by 16 bytes
On 64bit FreeBSD this reduces one from 296 to 280 bytes.  On small
block workloads dbufs may consume gigabytes of ARC, and this saves
5% of it.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16684
2024-11-01 09:48:42 -07:00
Tino Reichardt
77d81974b6 Fix dependency install on Debian 11 (#16683)
Adding cryptsetup breaks some dialog things on Debian 11.
Apply some workaround for it.

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-11-01 09:48:37 -07:00
Brian Behlendorf
5237760b17 ZTS: Add additional exceptions
The following tests have been observed to occasionally fail when
running under the CI.  Updated our exceptions list to track them.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16670
2024-10-21 13:02:07 -07:00
Rob Norris
ede715d1e4 spl/thread: explicitly define thread_func_t as noreturn
All of our thread entry functions have this signature:

    void (*)(void*) __attribute__((noreturn))

The low-level `__thread_create()` function accepts a `thread_func_t` as
the entry point, which is defined more simply as:

    void (*)(void *)

And then the `thread_create()` and `thread_create_named()` macros cast
the passed-in function point down to `thread_func_t`, that is, casting
away the `noreturn` attribute.

Clang considers casting between these two types to be invalid because
both the caller and the callee may have elided parts of the stack frame
save and restore, knowing that they won't be needed.

Recent Linux appears to be setting `-Wcast-function-type-strict`, which
causes this invalid cast to emit a warning, which with `-Werror` is
converted to an error, breaking the build.

This commit fixes this in the simplest possible way: adding `noreturn`
to the `thread_func_t` attribute. Since all our thread entry functions
already have this attribute, it's arguably a just a consistency fix
anyway.

I considered removing the casts in the macros, which silences the
warnings, but it turns out that Clang has a bug that won't emit this
error for implicit conversions, only explicit casts. So leaving them
there seems like a reasonable belt-and-suspenders approach. Also,
frankly, this whole mechanism seems a little undercooked inside LLVM, so
I'm content go with my intuition about the smallest, least invaisve
change.

**NOTE**: `__thread_create` is exported by `spl.ko` and has a
`thread_func_t` arg, so this is an ABI break. Whether that matters in
practice, I have no idea.

Further reading:
- 1aad641c79
- https://github.com/llvm/llvm-project/issues/7325
- https://github.com/llvm/llvm-project/issues/41465

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16672 
Closes #16673
2024-10-21 13:02:07 -07:00
Rob Norris
e30c69365d config: fix dequeue_signal check for kernels <4.20
Before 4.20, kernel_siginfo_t was just called siginfo_t. This was
causing the kthread_dequeue_signal_3arg_task check, which uses
kernel_siginfo_t, to fail on older kernels.

In d6b8c17f1, we started checking for the "new" three-arg
dequeue_signal() by testing for the "old" version. Because that test is
explicitly using kernel_siginfo_t, it would fail, leading to the build
trying to use the new three-arg version, which would then not compile.

This commit fixes that by avoiding checking for the old 3-arg
dequeue_signal entirely. Instead, we check for the new one, as well as
the 4-arg form, and we use the old form as a fallback. This way, we
never have to test for it explicitly, and once we're building
HAVE_SIGINFO will make sure we get the right kernel_siginfo_t for it, so
everything works out nice.

Original-patch-by: Finix <yancw@info2soft.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16666
2024-10-21 13:02:07 -07:00
Rob Norris
78d39d91fa zdb: show bp in uberblock dump
Just another useful nugget of info in times of strife.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16667
2024-10-21 13:02:07 -07:00
Tomohiro Kusumi
2b359c7824 Fix compile-time warnings caused by duplicate struct typedefs
Some compiler/versions warn these typedefs according to #16660.

The platform specific header sys/abd_os.h shouldn't define or use abd_t,
as it's defined in its non-platform specific consumer sys/abd.h.
Do the same as what FreeBSD header does.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #16660 
Closes #16665
2024-10-21 13:02:07 -07:00
Alexander Motin
ace2e17a9b zfs_debug: Restore log size limit for userspace
For some reason it was dropped when split from kernel, that makes
raidz_test to accumulate in RAM up to 100GB of logs we don't need.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by:  Rob Norris <robn@despairlabs.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16492
Closes #16566
Closes #16664
2024-10-21 13:02:07 -07:00
Rob Norris
b4cd10ce5b libspl/backtrace: comment and harden libunwind backtracer
This is the sort of code that we get right once and never look at again.
Anyone reading this code is already likely in the middle of a debugging
nightmare, and then they have a wall of manual string construction and
an unfamiliar and idiosyncratic library to deal with. So, comment the
whole thing to try to make it clear what's going on.

In pursuit of the above, I've added return checks to some of the
libunwind calls, fixed the frame loop to not skip the "top" frame
(however unseful it may be), and fix a couple of calls to
spl_bt_u64_to_hex_str() which requested 18 digits instead of 16.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16653
2024-10-21 13:02:07 -07:00
Rob Norris
f52d7aaaac libspl/backtrace: rename and document hex conversion function
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16653
2024-10-21 13:02:07 -07:00
Rob Norris
d5db840260 libspl/backtrace: helper macros for output
My eyes are going blurry looking at all those write calls. This is much
nicer.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Close #16653
2024-10-21 13:02:07 -07:00
Rob Norris
bcd61d9579 libspl/backtrace: dump registers in libunwind backtraces
More useful stuff, especially when trying to follow a disassembly.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16653
2024-10-21 13:02:07 -07:00
Umer Saleem
36a67b50a2 Fix inconsistent mount options for ZFS root
While mounting ZFS root during boot on Linux distributions from initrd,
mount from busybox is effectively used which executes mount system call
directly. This skips the ZFS helper mount.zfs, which checks and enables
the mount options as specified in dataset properties. As a result,
datasets mounted during boot from initrd do not have correct mount
options as specified in ZFS dataset properties.

There has been an attempt to use mount.zfs in zfs initrd script,
responsible for mounting the ZFS root filesystem (PR#13305). This was
later reverted (PR#14908) after discovering that using mount.zfs breaks
mounting of snapshots on root (/) and other child datasets of root have
the same issue (Issue#9461).

This happens because switching from busybox mount to mount.zfs correctly
parses the mount options but also adds 'mntpoint=/root' to the mount
options, which is then prepended to the snapshot mountpoint in
'.zfs/snapshot'. '/root' is the directory on Debian with initramfs-tools
where root filesystem is mounted before pivot_root. When Linux runtime
is reached, trying to access the snapshots on root results in
automounting the snapshot on '/root/.zfs/*', which fails.

This commit attempts to fix the automounting of snapshots on root, while
using mount.zfs in initrd script. Since the mountpoint of dataset is
stored in vfs_mntpoint field, we can check if current mountpoint of
dataset and vfs_mntpoint are same or not. If they are not same, reset
the vfs_mntpoint field with current mountpoint. This fixes the
mountpoints of root dataset and children in respective vfs_mntpoint
fields when we try to access the snapshots of root dataset or its
children. With correct mountpoint for root dataset and children stored
in vfs_mntpoint, all snapshots of root dataset are mounted correctly
and become accessible.

This fix will come into play only if current process, that is trying to
access the snapshots is not in chroot context. The Linux kernel API
that is used to convert struct path into char format (d_path), returns
the complete path for given struct path. It works in chroot environment
as well and returns the correct path from original filesystem root.

However d_path fails to return the complete path if any directory from
original root filesystem is mounted using --bind flag or --rbind flag
in chroot environment. In this case, if we try to access the snapshot
from outside the chroot environment, d_path returns the path correctly,
i.e. it returns the correct path to the directory that is mounted with
--bind flag. However inside the chroot environment, it only returns the
path inside chroot.

For now, there is not a better way in my understanding that gives the
complete path in char format and handles the case where directories from
root filesystem are mounted with --bind or --rbind on another path which
user will later chroot into. So this fix gets enabled if current
process trying to access the snapshot is not in chroot context.

With the snapshots issue fixed for root filesystem, using mount.zfs in
ZFS initrd script, mounts the datasets with correct mount options.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16646
2024-10-21 13:02:07 -07:00
Warner Losh
3d9129a7b6 freebsd: Use compiler.h from FreeBSD's base's linuxkpi
The FreeBSD linux/compiler.h in OpenZFS was copied from a very old
version of FreeBSD's linuxkpi's linux/compiler.h. There's no need for
this duplication. Use FreeBSD's linuxkpi version instead, and provide
zfs_fallthrough to augment it (it's all that's needed). Use #pragma once
to avoid naming issues for guard variables. Since this is a complete
rewrite, use my copyright here (the original code in FreeBSD still
credits everybody). This works back at least to FreeBSD 12.4, which
is not out of support, and all newer releases.

Remove extra copies of macros that were defined elsewhere, but are now
properly defined in LinuxKPI so are redundant.

Sponsored-by: Netflix
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Warner Losh <imp@bsdimp.com>
Closes #16650
2024-10-21 13:02:07 -07:00
Brian Behlendorf
0409c47fe0 Tag 2.3.0-rc2
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-10-13 19:25:58 -07:00
Tino Reichardt
b5a3825244 ZTS: Make use of optimal CPU pinning
With CPU pinning, we should get some speedup because of better
cpu cache re-use.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16641
2024-10-13 19:25:58 -07:00
Tino Reichardt
77df762a1b ZTS: Optimize Kernel Same-page Merging (KSM)
Kernel same-page Merging (KSM) allows KVM guests to share identical
memory pages. These shared pages are usually common libraries or other
identical, high-use data.

The current configuration was a bit to lazy - so KSM didn't work very
well. With the new configuration I could run 3 Linux VMs in parralel.

FreeBSD can't benefit from it. But FreeBSD is not so memory hungry in
general, so there is no need for it ;)

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16641
2024-10-13 19:25:58 -07:00
Brian Behlendorf
56871e465a Fallback to strerror() when strerror_l() isn't available
Some C libraries, such as uClibc, do not provide strerror_l() in
which case we fallback to strerror().

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16636 
Closes #16640
2024-10-13 19:25:58 -07:00
Brian Behlendorf
c645b07eaa ZTS: Increase zpool_import_parallel_pos import margin
Increase the pool import time allowed by assuming a minimum reduction
to 1/2 instead of 1/3 when comparing sequential to parallel import
times.  This is sufficient to verify parallel imports are working as
intended and should address the occasional false positive failure
when the time is slightly exceeded.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16638
2024-10-11 16:21:25 -07:00
Brian Behlendorf
5bc27acf51 ZTS: Slightly increase dedup_quota limit
As described in the comment above this check the space used by
logged entries is not accounted for and some margin needs to be
added in.  While uncommon we have slightly exceeded the 600,000
threshold on some CI run so we increase the limit a bit more.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16637
2024-10-11 16:21:25 -07:00
Brian Behlendorf
7f830d783b CI: Stick with ubuntu-22.04 for CodeQL analysis
The ubuntu-latest alias now refers to ubuntu-24.04 instead of
ubuntu-22.04 which causes CodeQL's autobuild to fail with:

    cpp/autobuilder: deptrace not supported in ubuntu 24.04

Until deptrace is supported by ubuntu-24.04 hosted runners request
ubuntu-22.04 which is supported.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Closes #16639
2024-10-11 16:21:25 -07:00
Martin Matuška
58162960a1 zdb: fix printf format in dump_zap()
When compiling zdb.c on 32-bit platforms, a format conversion error 
is reported for a printf() in dump_zap().  Change %l to macro 
%" PRIu64 " to match the platform size of a 64-bit unsigned integer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16635
2024-10-11 16:21:25 -07:00
Rob Norris
666903610d zpool/zfs: allow --json wherever -j is allowed
Mostly so that with the JSON formatting options are also used, they all
look the same. To my eye, `-j --json-flat-vdevs` suggests that they are
different or unrelated, while `--json --json-flat-vdevs` invites no
further questions.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16632
2024-10-11 16:21:25 -07:00
Brian Atkinson
26ecd8b993 Always validate checksums for Direct I/O reads
This fixes an oversight in the Direct I/O PR. There is nothing that
stops a process from manipulating the contents of a buffer for a
Direct I/O read while the I/O is in flight. This can lead checksum
verify failures. However, the disk contents are still correct, and this
would lead to false reporting of checksum validation failures.

To remedy this, all Direct I/O reads that have a checksum verification
failure are treated as suspicious. In the event a checksum validation
failure occurs for a Direct I/O read, then the I/O request will be
reissued though the ARC. This allows for actual validation to happen and
removes any possibility of the buffer being manipulated after the I/O
has been issued.

Just as with Direct I/O write checksum validation failures, Direct I/O
read checksum validation failures are reported though zpool status -d in
the DIO column. Also the zevent has been updated to have both:
1. dio_verify_wr -> Checksum verification failure for writes
2. dio_verify_rd -> Checksum verification failure for reads.
This allows for determining what I/O operation was the culprit for the
checksum verification failure. All DIO errors are reported only on the
top-level VDEV.

Even though FreeBSD can write protect pages (stable pages) it still has
the same issue as Linux with Direct I/O reads.

This commit updates the following:
1. Propogates checksum failures for reads all the way up to the
   top-level VDEV.
2. Reports errors through zpool status -d as DIO.
3. Has two zevents for checksum verify errors with Direct I/O. One for
   read and one for write.
4. Updates FreeBSD ABD code to also check for ABD_FLAG_FROM_PAGES and
   handle ABD buffer contents validation the same as Linux.
5. Updated manipulate_user_buffer.c to also manipulate a buffer while a
   Direct I/O read is taking place.
6. Adds a new ZTS test case dio_read_verify that stress tests the new
   code.
7. Updated man pages.
8. Added an IMPLY statement to zio_checksum_verify() to make sure that
   Direct I/O reads are not issued as speculative.
9. Removed self healing through mirror, raidz, and dRAID VDEVs for
   Direct I/O reads.

This issue was first observed when installing a Windows 11 VM on a ZFS
dataset with the dataset property direct set to always. The zpool
devices would report checksum failures, but running a subsequent zpool
scrub would not repair any data and report no errors.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16598
2024-10-09 13:45:06 -07:00
Martin Matuška
774dcba86d FreeBSD: ignore some includes when not building kernel
The function abd_alloc_from_pages() is used only in kernel.
Excluding sys/vm.h, and vm/vm_page.h includes avoids dependency
problems.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16616
2024-10-09 13:45:02 -07:00
Brian Behlendorf
09f6b2ebe3 ztest: Fix scrub check in ztest_raidz_expand_check()
The scrub code may return EBUSY under several possible scenarios
causing ztest to incorrectly ASSERT when verifying the result of
a raidz expansion.  Update the test case to allow EBUSY since it
does not indicate pool damage.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16627
2024-10-09 13:44:58 -07:00
Matthew Heller
2609d93b65 vdev_id: multi-lun disks & slot num zero pad
Add ability to generate disk names that contain both a slot number
and a lun number in order to support multi-actuator SAS hard drives
with multiple luns. Also add the ability to zero pad slot numbers to
a desired digit length for easier sorting.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Heller <matthew.f.heller@accre.vanderbilt.edu>
Closes #16603
2024-10-09 13:44:55 -07:00
Brian Behlendorf
10f46d2aba ZTS: resilver_restart_001.ksh restore defaults
Update resilver_restart_001.ksh to restore the default
resilver_defer_percent when the test completes.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16618
2024-10-09 13:44:50 -07:00
Umer Saleem
0df10dc911 Only serialize native-deb* targets
.NOTPARALLEL target is being forced on userspace as well. This commit
removes .NOTPARALEL target and only serializes the execution of
native-deb* targets.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16622
2024-10-09 13:44:46 -07:00
Rob Norris
0fbe9d352c zpool/zfs: restore -V & --version options
The -j option added a round of getopt, which didn't know the magic
version flags. So just bypass the whole thing and go straight to the
human output function for the special case.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16615 
Closes #16617
2024-10-09 13:44:42 -07:00
Martin Matuška
84f44ec07f Return boolean_t in inline functions of lib/libspl/include/sys/uio.h
The inline functions zfs_dio_offset_aligned(), zfs_dio_size_aligned()
and zfs_dio_aligned() are declared as boolean_t but return the bool
type.

This fixes the build of FreeBSD.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16613
2024-10-09 13:44:36 -07:00
Shengqi Chen
fc9608e2e6 Bump SONAME of libzfs and libzpool
The ABI of libzfs and libzpool have breaking changes since last
SONAME bump in commit fe6babc:

* libzfs: `zpool_print_unsup_feat` removed (used by zpool cmd).
* libzpool: multiple `ddt_*` symbols removed (used by zdb cmd).

Bump them to avoid ABI breakage.

See: https://github.com/openzfs/zfs/pull/11817
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16609
2024-10-09 13:44:32 -07:00
Shengqi Chen
d32c05949a contrib/debian: add new manpages to installation list
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16609
2024-10-09 13:44:26 -07:00
JKDingwall
1ebb6b866f Fix generation of kernel uevents for snapshot rename on linux
`zvol_rename_minors()` needs to be given the full path not just the
snapshot name.  Use code removed in a0bd735ad as a guide
to providing the necessary values.

Add ZTS check for /dev changes after snapshot rename.  After
renaming a snapshot with 'snapdev=visible' ensure that the /dev
entries are updated to reflect the rename.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: James Dingwall <james@dingwall.me.uk>
Closes #14223 
Closes #16600
2024-10-09 13:44:22 -07:00
Tino Reichardt
f019b445f3 ZTS: Fix summary page creation again - second try
In PR #16599 I used 'return' like in C - which is wrong :/
This fix generates the summary as needed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16611
2024-10-09 13:44:18 -07:00
Tino Reichardt
03822a61be ZTS: Remove FreeBSD 13.4-STABLE
Current CI is failing on FreeBSD 13.4-STABLE, because samba4 can't be
installed there. Lets remove it for now.

Update also the FreeBSD version definitions a bit.

The naming is like this now:

FreeBSD variants:
- freebsd13-3r, freebsd13-4r, freebsd14-0r, freebsd14-1r (RELEASE)
- freebsd13-4s, freebsd14-1s (STABLE)
- freebsd15-0c (CURRENT)

RHL based distros:
- almalinux8, almalinux9, centos-stream9, fedora39, fedora40

Debian based:
- debian11, debian12, ubuntu20, ubuntu22, ubuntu24

Misc Linux distros:
- archlinux, tumbleweed

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16610
2024-10-09 13:43:46 -07:00
230 changed files with 4227 additions and 2684 deletions

View File

@ -11,7 +11,7 @@ concurrency:
jobs:
analyze:
name: Analyze
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
permissions:
actions: read
contents: read

View File

@ -18,19 +18,21 @@ ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -q -N ""
# we expect RAM shortage
cat << EOF | sudo tee /etc/ksmtuned.conf > /dev/null
# /etc/ksmtuned.conf - Configuration file for ksmtuned
# https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/chap-ksm
KSM_MONITOR_INTERVAL=60
# Millisecond sleep between ksm scans for 16Gb server.
# Smaller servers sleep more, bigger sleep less.
KSM_SLEEP_MSEC=10
KSM_NPAGES_BOOST=300
KSM_NPAGES_DECAY=-50
KSM_NPAGES_MIN=64
KSM_NPAGES_MAX=2048
KSM_SLEEP_MSEC=30
KSM_THRES_COEF=25
KSM_THRES_CONST=2048
KSM_NPAGES_BOOST=0
KSM_NPAGES_DECAY=0
KSM_NPAGES_MIN=1000
KSM_NPAGES_MAX=25000
KSM_THRES_COEF=80
KSM_THRES_CONST=8192
LOGFILE=/var/log/ksmtuned.log
DEBUG=1

View File

@ -14,7 +14,7 @@ OSv=$OS
# compressed with .zst extension
REPO="https://github.com/mcmilk/openzfs-freebsd-images"
FREEBSD="$REPO/releases/download/v2024-09-16"
FREEBSD="$REPO/releases/download/v2024-12-14"
URLzs=""
# Ubuntu mirrors
@ -40,6 +40,12 @@ case "$OS" in
# dns sometimes fails with that url :/
echo "89.187.191.12 geo.mirror.pkgbuild.com" | sudo tee /etc/hosts > /dev/null
;;
centos-stream10)
OSNAME="CentOS Stream 10"
# TODO: #16903 Overwrite OSv to stream9 for virt-install until it's added to osinfo
OSv="centos-stream9"
URL="https://cloud.centos.org/centos/10-stream/x86_64/images/CentOS-Stream-GenericCloud-10-latest.x86_64.qcow2"
;;
centos-stream9)
OSNAME="CentOS Stream 9"
URL="https://cloud.centos.org/centos/9-stream/x86_64/images/CentOS-Stream-GenericCloud-9-latest.x86_64.qcow2"
@ -52,43 +58,56 @@ case "$OS" in
OSNAME="Debian 12"
URL="https://cloud.debian.org/images/cloud/bookworm/latest/debian-12-generic-amd64.qcow2"
;;
fedora39)
OSNAME="Fedora 39"
OSv="fedora39"
URL="https://download.fedoraproject.org/pub/fedora/linux/releases/39/Cloud/x86_64/images/Fedora-Cloud-Base-39-1.5.x86_64.qcow2"
;;
fedora40)
OSNAME="Fedora 40"
OSv="fedora39"
OSv="fedora-unknown"
URL="https://download.fedoraproject.org/pub/fedora/linux/releases/40/Cloud/x86_64/images/Fedora-Cloud-Base-Generic.x86_64-40-1.14.qcow2"
;;
freebsd13r)
fedora41)
OSNAME="Fedora 41"
OSv="fedora-unknown"
URL="https://download.fedoraproject.org/pub/fedora/linux/releases/41/Cloud/x86_64/images/Fedora-Cloud-Base-Generic-41-1.4.x86_64.qcow2"
;;
freebsd13-3r)
OSNAME="FreeBSD 13.3-RELEASE"
OSv="freebsd13.0"
URLzs="$FREEBSD/amd64-freebsd-13.3-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash"
NIC="rtl8139"
;;
freebsd13-4r)
OSNAME="FreeBSD 13.4-RELEASE"
OSv="freebsd13.0"
URLzs="$FREEBSD/amd64-freebsd-13.4-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash"
NIC="rtl8139"
;;
freebsd13)
freebsd14-1r)
OSNAME="FreeBSD 14.1-RELEASE"
OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-14.1-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash"
;;
freebsd14-2r)
OSNAME="FreeBSD 14.2-RELEASE"
OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-14.2-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash"
;;
freebsd13-4s)
OSNAME="FreeBSD 13.4-STABLE"
OSv="freebsd13.0"
URLzs="$FREEBSD/amd64-freebsd-13.4-STABLE.qcow2.zst"
BASH="/usr/local/bin/bash"
NIC="rtl8139"
;;
freebsd14r)
OSNAME="FreeBSD 14.1-RELEASE"
freebsd14-2s)
OSNAME="FreeBSD 14.2-STABLE"
OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-14.1-RELEASE.qcow2.zst"
URLzs="$FREEBSD/amd64-freebsd-14.2-STABLE.qcow2.zst"
BASH="/usr/local/bin/bash"
;;
freebsd14)
OSNAME="FreeBSD 14.1-STABLE"
OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-14.1-STABLE.qcow2.zst"
BASH="/usr/local/bin/bash"
;;
freebsd15)
freebsd15-0c)
OSNAME="FreeBSD 15.0-CURRENT"
OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-15.0-CURRENT.qcow2.zst"

View File

@ -13,10 +13,10 @@ function archlinux() {
echo "##[endgroup]"
echo "##[group]Install Development Tools"
sudo pacman -Sy --noconfirm base-devel bc cpio dhclient dkms fakeroot \
fio gdb inetutils jq less linux linux-headers lsscsi nfs-utils parted \
pax perf python-packaging python-setuptools qemu-guest-agent ksh samba \
sysstat rng-tools rsync wget xxhash
sudo pacman -Sy --noconfirm base-devel bc cpio cryptsetup dhclient dkms \
fakeroot fio gdb inetutils jq less linux linux-headers lsscsi nfs-utils \
parted pax perf python-packaging python-setuptools qemu-guest-agent ksh \
samba sysstat rng-tools rsync wget xxhash
echo "##[endgroup]"
}
@ -30,11 +30,11 @@ function debian() {
echo "##[group]Install Development Tools"
sudo apt-get install -y \
acl alien attr autoconf bc cpio curl dbench dh-python dkms fakeroot \
fio gdb gdebi git ksh lcov isc-dhcp-client jq libacl1-dev libaio-dev \
libattr1-dev libblkid-dev libcurl4-openssl-dev libdevmapper-dev libelf-dev \
libffi-dev libmount-dev libpam0g-dev libselinux-dev libssl-dev libtool \
libtool-bin libudev-dev libunwind-dev linux-headers-$(uname -r) \
acl alien attr autoconf bc cpio cryptsetup curl dbench dh-python dkms \
fakeroot fio gdb gdebi git ksh lcov isc-dhcp-client jq libacl1-dev \
libaio-dev libattr1-dev libblkid-dev libcurl4-openssl-dev libdevmapper-dev \
libelf-dev libffi-dev libmount-dev libpam0g-dev libselinux-dev libssl-dev \
libtool libtool-bin libudev-dev libunwind-dev linux-headers-$(uname -r) \
lsscsi nfs-kernel-server pamtester parted python3 python3-all-dev \
python3-cffi python3-dev python3-distlib python3-packaging \
python3-setuptools python3-sphinx qemu-guest-agent rng-tools rpm2cpio \
@ -66,16 +66,23 @@ function rhel() {
echo "##[endgroup]"
echo "##[group]Install Development Tools"
sudo dnf group install -y "Development Tools"
# Alma wants "Development Tools", Fedora 41 wants "development-tools"
if ! sudo dnf group install -y "Development Tools" ; then
echo "Trying 'development-tools' instead of 'Development Tools'"
sudo dnf group install -y development-tools
fi
sudo dnf install -y \
acl attr bc bzip2 curl dbench dkms elfutils-libelf-devel fio gdb git \
jq kernel-rpm-macros ksh libacl-devel libaio-devel libargon2-devel \
libattr-devel libblkid-devel libcurl-devel libffi-devel ncompress \
libselinux-devel libtirpc-devel libtool libudev-devel libuuid-devel \
lsscsi mdadm nfs-utils openssl-devel pam-devel pamtester parted perf \
python3 python3-cffi python3-devel python3-packaging kernel-devel \
python3-setuptools qemu-guest-agent rng-tools rpcgen rpm-build rsync \
samba sysstat systemd watchdog wget xfsprogs-devel xxhash zlib-devel
acl attr bc bzip2 cryptsetup curl dbench dkms elfutils-libelf-devel fio \
gdb git jq kernel-rpm-macros ksh libacl-devel libaio-devel \
libargon2-devel libattr-devel libblkid-devel libcurl-devel libffi-devel \
ncompress libselinux-devel libtirpc-devel libtool libudev-devel \
libuuid-devel lsscsi mdadm nfs-utils openssl-devel pam-devel pamtester \
parted perf python3 python3-cffi python3-devel python3-packaging \
kernel-devel python3-setuptools qemu-guest-agent rng-tools rpcgen \
rpm-build rsync samba sysstat systemd watchdog wget xfsprogs-devel xxhash \
zlib-devel
echo "##[endgroup]"
}
@ -97,7 +104,7 @@ case "$1" in
sudo dnf install -y kernel-abi-whitelists
echo "##[endgroup]"
;;
almalinux9|centos-stream9)
almalinux9|centos-stream9|centos-stream10)
echo "##[group]Enable epel and crb repositories"
sudo dnf config-manager -y --set-enabled crb
sudo dnf install -y epel-release
@ -111,6 +118,7 @@ case "$1" in
archlinux
;;
debian*)
echo 'debconf debconf/frontend select Noninteractive' | sudo debconf-set-selections
debian
echo "##[group]Install Debian specific"
sudo apt-get install -yq linux-perf dh-sequence-dkms

View File

@ -83,7 +83,7 @@ function rpm_build_and_install() {
echo "##[endgroup]"
echo "##[group]Install"
run sudo dnf -y --skip-broken localinstall $(ls *.rpm | grep -v src.rpm)
run sudo dnf -y --nobest install $(ls *.rpm | grep -v src.rpm)
echo "##[endgroup]"
}

View File

@ -14,17 +14,21 @@ PID=$(pidof /usr/bin/qemu-system-x86_64)
tail --pid=$PID -f /dev/null
sudo virsh undefine openzfs
# definitions of per operating system
# default values per test vm:
VMs=2
CPU=2
# cpu pinning
CPUSET=("0,1" "2,3")
case "$OS" in
freebsd*)
VMs=2
CPU=3
# FreeBSD can't be optimized via ksmtuned
RAM=6
;;
*)
VMs=2
CPU=3
RAM=7
# Linux can be optimized via ksmtuned
RAM=8
;;
esac
@ -73,6 +77,7 @@ EOF
--cpu host-passthrough \
--virt-type=kvm --hvm \
--vcpus=$CPU,sockets=1 \
--cpuset=${CPUSET[$((i-1))]} \
--memory $((1024*RAM)) \
--memballoon model=virtio \
--graphics none \

View File

@ -11,12 +11,10 @@ function output() {
}
function outfile() {
test -s "$1" || return
cat "$1" >> "out-$logfile.md"
}
function outfile_plain() {
test -s "$1" || return
output "<pre>"
cat "$1" >> "out-$logfile.md"
output "</pre>"
@ -45,6 +43,8 @@ if [ ! -f out-1.md ]; then
tar xf "$tarfile"
test -s env.txt || continue
source env.txt
# when uname.txt is there, the other files are also ok
test -s uname.txt || continue
output "\n## Functional Tests: $OSNAME\n"
outfile_plain uname.txt
outfile_plain summary.txt

View File

@ -3,6 +3,18 @@ name: zfs-qemu
on:
push:
pull_request:
workflow_dispatch:
inputs:
include_stream9:
type: boolean
required: false
default: false
description: 'Test on CentOS 9 stream'
include_stream10:
type: boolean
required: false
default: false
description: 'Test on CentOS 10 stream'
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
@ -22,8 +34,8 @@ jobs:
- name: Generate OS config and CI type
id: os
run: |
FULL_OS='["almalinux8", "almalinux9", "centos-stream9", "debian11", "debian12", "fedora39", "fedora40", "freebsd13", "freebsd13r", "freebsd14", "freebsd14r", "ubuntu20", "ubuntu22", "ubuntu24"]'
QUICK_OS='["almalinux8", "almalinux9", "debian12", "fedora40", "freebsd13", "freebsd14", "ubuntu24"]'
FULL_OS='["almalinux8", "almalinux9", "debian11", "debian12", "fedora40", "fedora41", "freebsd13-3r", "freebsd13-4s", "freebsd14-1r", "freebsd14-2s", "freebsd15-0c", "ubuntu20", "ubuntu22", "ubuntu24"]'
QUICK_OS='["almalinux8", "almalinux9", "debian12", "fedora41", "freebsd13-3r", "freebsd14-2r", "ubuntu24"]'
# determine CI type when running on PR
ci_type="full"
if ${{ github.event_name == 'pull_request' }}; then
@ -37,19 +49,35 @@ jobs:
os_selection="$FULL_OS"
fi
os_json=$(echo ${os_selection} | jq -c)
# Add optional runners
if [ "${{ github.event.inputs.include_stream9 }}" == 'true' ]; then
os_json=$(echo $os_json | jq -c '. += ["centos-stream9"]')
fi
if [ "${{ github.event.inputs.include_stream10 }}" == 'true' ]; then
os_json=$(echo $os_json | jq -c '. += ["centos-stream10"]')
fi
echo $os_json
echo "os=$os_json" >> $GITHUB_OUTPUT
echo "ci_type=$ci_type" >> $GITHUB_OUTPUT
qemu-vm:
name: qemu-x86
needs: [ test-config ]
strategy:
fail-fast: false
matrix:
# all:
# os: [almalinux8, almalinux9, archlinux, centos-stream9, fedora39, fedora40, debian11, debian12, freebsd13, freebsd13r, freebsd14, freebsd14r, freebsd15, ubuntu20, ubuntu22, ubuntu24]
# openzfs:
# os: [almalinux8, almalinux9, centos-stream9, debian11, debian12, fedora39, fedora40, freebsd13, freebsd13r, freebsd14, freebsd14r, ubuntu20, ubuntu22, ubuntu24]
# rhl: almalinux8, almalinux9, centos-stream9, fedora40, fedora41
# debian: debian11, debian12, ubuntu20, ubuntu22, ubuntu24
# misc: archlinux, tumbleweed
# FreeBSD variants of 2024-12:
# FreeBSD Release: freebsd13-3r, freebsd13-4r, freebsd14-1r, freebsd14-2r
# FreeBSD Stable: freebsd13-4s, freebsd14-2s
# FreeBSD Current: freebsd15-0c
os: ${{ fromJson(needs.test-config.outputs.test_os) }}
runs-on: ubuntu-24.04
steps:

View File

@ -70,6 +70,7 @@ Rob Norris <robn@despairlabs.com>
Rob Norris <rob.norris@klarasystems.com>
Sam Lunt <samuel.j.lunt@gmail.com>
Sanjeev Bagewadi <sanjeev.bagewadi@gmail.com>
Sebastian Wuerl <s.wuerl@mailbox.org>
Stoiko Ivanov <github@nomore.at>
Tamas TEVESZ <ice@extreme.hu>
WHR <msl0000023508@gmail.com>
@ -78,6 +79,7 @@ Youzhong Yang <youzhong@gmail.com>
# Signed-off-by: overriding Author:
Ryan <errornointernet@envs.net> <error.nointernet@gmail.com>
Sietse <sietse@wizdom.nu> <uglymotha@wizdom.nu>
Qiuhao Chen <chenqiuhao1997@gmail.com> <haohao0924@126.com>
Yuxin Wang <yuxinwang9999@gmail.com> <Bi11gates9999@gmail.com>
Zhenlei Huang <zlei@FreeBSD.org> <zlei.huang@gmail.com>

View File

@ -423,6 +423,7 @@ CONTRIBUTORS:
Mathieu Velten <matmaul@gmail.com>
Matt Fiddaman <github@m.fiddaman.uk>
Matthew Ahrens <matt@delphix.com>
Matthew Heller <matthew.f.heller@gmail.com>
Matthew Thode <mthode@mthode.org>
Matthias Blankertz <matthias@blankertz.org>
Matt Johnston <matt@fugro-fsi.com.au>
@ -562,6 +563,7 @@ CONTRIBUTORS:
Scot W. Stevenson <scot.stevenson@gmail.com>
Sean Eric Fagan <sef@ixsystems.com>
Sebastian Gottschall <s.gottschall@dd-wrt.com>
Sebastian Wuerl <s.wuerl@mailbox.org>
Sebastien Roy <seb@delphix.com>
Sen Haerens <sen@senhaerens.be>
Serapheim Dimitropoulos <serapheim@delphix.com>
@ -574,6 +576,7 @@ CONTRIBUTORS:
Shawn Bayern <sbayern@law.fsu.edu>
Shengqi Chen <harry-chen@outlook.com>
Shen Yan <shenyanxxxy@qq.com>
Sietse <sietse@wizdom.nu>
Simon Guest <simon.guest@tesujimath.org>
Simon Klinkert <simon.klinkert@gmail.com>
Sowrabha Gopal <sowrabha.gopal@delphix.com>
@ -629,6 +632,7 @@ CONTRIBUTORS:
Trevor Bautista <trevrb@trevrb.net>
Trey Dockendorf <treydock@gmail.com>
Troels Nørgaard <tnn@tradeshift.com>
tstabrawa <tstabrawa@users.noreply.github.com>
Tulsi Jain <tulsi.jain@delphix.com>
Turbo Fredriksson <turbo@bayour.com>
Tyler J. Stachecki <stachecki.tyler@gmail.com>

4
META
View File

@ -2,9 +2,9 @@ Meta: 1
Name: zfs
Branch: 1.0
Version: 2.3.0
Release: rc1
Release: 1
Release-Tags: relext
License: CDDL
Author: OpenZFS
Linux-Maximum: 6.11
Linux-Maximum: 6.12
Linux-Minimum: 4.18

View File

@ -662,10 +662,7 @@ def section_arc(kstats_dict):
print()
print('ARC hash breakdown:')
prt_i1('Elements max:', f_hits(arc_stats['hash_elements_max']))
prt_i2('Elements current:',
f_perc(arc_stats['hash_elements'], arc_stats['hash_elements_max']),
f_hits(arc_stats['hash_elements']))
prt_i1('Elements:', f_hits(arc_stats['hash_elements']))
prt_i1('Collisions:', f_hits(arc_stats['hash_collisions']))
prt_i1('Chain max:', f_hits(arc_stats['hash_chain_max']))

View File

@ -1131,7 +1131,7 @@ dump_zap(objset_t *os, uint64_t object, void *data, size_t size)
!!(zap_getflags(zc.zc_zap) & ZAP_FLAG_UINT64_KEY);
if (key64)
(void) printf("\t\t0x%010lx = ",
(void) printf("\t\t0x%010" PRIu64 "x = ",
*(uint64_t *)attrp->za_name);
else
(void) printf("\t\t%s = ", attrp->za_name);
@ -1967,17 +1967,53 @@ dump_dedup_ratio(const ddt_stat_t *dds)
static void
dump_ddt_log(ddt_t *ddt)
{
if (ddt->ddt_version != DDT_VERSION_FDT ||
!(ddt->ddt_flags & DDT_FLAG_LOG))
return;
for (int n = 0; n < 2; n++) {
ddt_log_t *ddl = &ddt->ddt_log[n];
char flagstr[64] = {0};
if (ddl->ddl_flags > 0) {
flagstr[0] = ' ';
int c = 1;
if (ddl->ddl_flags & DDL_FLAG_FLUSHING)
c += strlcpy(&flagstr[c], " FLUSHING",
sizeof (flagstr) - c);
if (ddl->ddl_flags & DDL_FLAG_CHECKPOINT)
c += strlcpy(&flagstr[c], " CHECKPOINT",
sizeof (flagstr) - c);
if (ddl->ddl_flags &
~(DDL_FLAG_FLUSHING|DDL_FLAG_CHECKPOINT))
c += strlcpy(&flagstr[c], " UNKNOWN",
sizeof (flagstr) - c);
flagstr[1] = '[';
flagstr[c++] = ']';
}
uint64_t count = avl_numnodes(&ddl->ddl_tree);
if (count == 0)
continue;
printf(DMU_POOL_DDT_LOG ": %lu log entries\n",
zio_checksum_table[ddt->ddt_checksum].ci_name, n, count);
printf(DMU_POOL_DDT_LOG ": flags=0x%02x%s; obj=%llu; "
"len=%llu; txg=%llu; entries=%llu\n",
zio_checksum_table[ddt->ddt_checksum].ci_name, n,
ddl->ddl_flags, flagstr,
(u_longlong_t)ddl->ddl_object,
(u_longlong_t)ddl->ddl_length,
(u_longlong_t)ddl->ddl_first_txg, (u_longlong_t)count);
if (dump_opt['D'] < 4)
if (ddl->ddl_flags & DDL_FLAG_CHECKPOINT) {
const ddt_key_t *ddk = &ddl->ddl_checkpoint;
printf(" checkpoint: "
"%016llx:%016llx:%016llx:%016llx:%016llx\n",
(u_longlong_t)ddk->ddk_cksum.zc_word[0],
(u_longlong_t)ddk->ddk_cksum.zc_word[1],
(u_longlong_t)ddk->ddk_cksum.zc_word[2],
(u_longlong_t)ddk->ddk_cksum.zc_word[3],
(u_longlong_t)ddk->ddk_prop);
}
if (count == 0 || dump_opt['D'] < 4)
continue;
ddt_lightweight_entry_t ddlwe;
@ -1991,7 +2027,7 @@ dump_ddt_log(ddt_t *ddt)
}
static void
dump_ddt(ddt_t *ddt, ddt_type_t type, ddt_class_t class)
dump_ddt_object(ddt_t *ddt, ddt_type_t type, ddt_class_t class)
{
char name[DDT_NAMELEN];
ddt_lightweight_entry_t ddlwe;
@ -2016,11 +2052,8 @@ dump_ddt(ddt_t *ddt, ddt_type_t type, ddt_class_t class)
ddt_object_name(ddt, type, class, name);
(void) printf("%s: %llu entries, size %llu on disk, %llu in core\n",
name,
(u_longlong_t)count,
(u_longlong_t)dspace,
(u_longlong_t)mspace);
(void) printf("%s: dspace=%llu; mspace=%llu; entries=%llu\n", name,
(u_longlong_t)dspace, (u_longlong_t)mspace, (u_longlong_t)count);
if (dump_opt['D'] < 3)
return;
@ -2043,24 +2076,52 @@ dump_ddt(ddt_t *ddt, ddt_type_t type, ddt_class_t class)
(void) printf("\n");
}
static void
dump_ddt(ddt_t *ddt)
{
if (!ddt || ddt->ddt_version == DDT_VERSION_UNCONFIGURED)
return;
char flagstr[64] = {0};
if (ddt->ddt_flags > 0) {
flagstr[0] = ' ';
int c = 1;
if (ddt->ddt_flags & DDT_FLAG_FLAT)
c += strlcpy(&flagstr[c], " FLAT",
sizeof (flagstr) - c);
if (ddt->ddt_flags & DDT_FLAG_LOG)
c += strlcpy(&flagstr[c], " LOG",
sizeof (flagstr) - c);
if (ddt->ddt_flags & ~DDT_FLAG_MASK)
c += strlcpy(&flagstr[c], " UNKNOWN",
sizeof (flagstr) - c);
flagstr[1] = '[';
flagstr[c] = ']';
}
printf("DDT-%s: version=%llu [%s]; flags=0x%02llx%s; rootobj=%llu\n",
zio_checksum_table[ddt->ddt_checksum].ci_name,
(u_longlong_t)ddt->ddt_version,
(ddt->ddt_version == 0) ? "LEGACY" :
(ddt->ddt_version == 1) ? "FDT" : "UNKNOWN",
(u_longlong_t)ddt->ddt_flags, flagstr,
(u_longlong_t)ddt->ddt_dir_object);
for (ddt_type_t type = 0; type < DDT_TYPES; type++)
for (ddt_class_t class = 0; class < DDT_CLASSES; class++)
dump_ddt_object(ddt, type, class);
dump_ddt_log(ddt);
}
static void
dump_all_ddts(spa_t *spa)
{
ddt_histogram_t ddh_total = {{{0}}};
ddt_stat_t dds_total = {0};
for (enum zio_checksum c = 0; c < ZIO_CHECKSUM_FUNCTIONS; c++) {
ddt_t *ddt = spa->spa_ddt[c];
if (!ddt || ddt->ddt_version == DDT_VERSION_UNCONFIGURED)
continue;
for (ddt_type_t type = 0; type < DDT_TYPES; type++) {
for (ddt_class_t class = 0; class < DDT_CLASSES;
class++) {
dump_ddt(ddt, type, class);
}
}
dump_ddt_log(ddt);
}
for (enum zio_checksum c = 0; c < ZIO_CHECKSUM_FUNCTIONS; c++)
dump_ddt(spa->spa_ddt[c]);
ddt_get_dedup_stats(spa, &dds_total);
@ -2119,9 +2180,6 @@ dump_brt(spa_t *spa)
return;
}
brt_t *brt = spa->spa_brt;
VERIFY(brt);
char count[32], used[32], saved[32];
zdb_nicebytes(brt_get_used(spa), used, sizeof (used));
zdb_nicebytes(brt_get_saved(spa), saved, sizeof (saved));
@ -2132,11 +2190,8 @@ dump_brt(spa_t *spa)
if (dump_opt['T'] < 2)
return;
for (uint64_t vdevid = 0; vdevid < brt->brt_nvdevs; vdevid++) {
brt_vdev_t *brtvd = &brt->brt_vdevs[vdevid];
if (brtvd == NULL)
continue;
for (uint64_t vdevid = 0; vdevid < spa->spa_brt_nvdevs; vdevid++) {
brt_vdev_t *brtvd = spa->spa_brt_vdevs[vdevid];
if (!brtvd->bv_initiated) {
printf("BRT: vdev %" PRIu64 ": empty\n", vdevid);
continue;
@ -2152,34 +2207,54 @@ dump_brt(spa_t *spa)
if (dump_opt['T'] < 3)
return;
char dva[64];
printf("\n%-16s %-10s\n", "DVA", "REFCNT");
/* -TTT shows a per-vdev histograms; -TTTT shows all entries */
boolean_t do_histo = dump_opt['T'] == 3;
for (uint64_t vdevid = 0; vdevid < brt->brt_nvdevs; vdevid++) {
brt_vdev_t *brtvd = &brt->brt_vdevs[vdevid];
if (brtvd == NULL || !brtvd->bv_initiated)
char dva[64];
if (!do_histo)
printf("\n%-16s %-10s\n", "DVA", "REFCNT");
for (uint64_t vdevid = 0; vdevid < spa->spa_brt_nvdevs; vdevid++) {
brt_vdev_t *brtvd = spa->spa_brt_vdevs[vdevid];
if (!brtvd->bv_initiated)
continue;
uint64_t counts[64] = {};
zap_cursor_t zc;
zap_attribute_t *za = zap_attribute_alloc();
for (zap_cursor_init(&zc, brt->brt_mos, brtvd->bv_mos_entries);
for (zap_cursor_init(&zc, spa->spa_meta_objset,
brtvd->bv_mos_entries);
zap_cursor_retrieve(&zc, za) == 0;
zap_cursor_advance(&zc)) {
uint64_t refcnt;
VERIFY0(zap_lookup_uint64(brt->brt_mos,
VERIFY0(zap_lookup_uint64(spa->spa_meta_objset,
brtvd->bv_mos_entries,
(const uint64_t *)za->za_name, 1,
za->za_integer_length, za->za_num_integers,
&refcnt));
uint64_t offset = *(const uint64_t *)za->za_name;
if (do_histo)
counts[highbit64(refcnt)]++;
else {
uint64_t offset =
*(const uint64_t *)za->za_name;
snprintf(dva, sizeof (dva), "%" PRIu64 ":%llx", vdevid,
(u_longlong_t)offset);
printf("%-16s %-10llu\n", dva, (u_longlong_t)refcnt);
snprintf(dva, sizeof (dva), "%" PRIu64 ":%llx",
vdevid, (u_longlong_t)offset);
printf("%-16s %-10llu\n", dva,
(u_longlong_t)refcnt);
}
}
zap_cursor_fini(&zc);
zap_attribute_free(za);
if (do_histo) {
printf("\nBRT: vdev %" PRIu64
": DVAs with 2^n refcnts:\n", vdevid);
dump_histogram(counts, 64, 0);
}
}
}
@ -4266,6 +4341,10 @@ dump_uberblock(uberblock_t *ub, const char *header, const char *footer)
(void) printf("\ttimestamp = %llu UTC = %s",
(u_longlong_t)ub->ub_timestamp, ctime(&timestamp));
char blkbuf[BP_SPRINTF_LEN];
snprintf_blkptr(blkbuf, sizeof (blkbuf), &ub->ub_rootbp);
(void) printf("\tbp = %s\n", blkbuf);
(void) printf("\tmmp_magic = %016llx\n",
(u_longlong_t)ub->ub_mmp_magic);
if (MMP_VALID(ub)) {
@ -6874,7 +6953,7 @@ iterate_deleted_livelists(spa_t *spa, ll_iter_t func, void *arg)
for (zap_cursor_init(&zc, mos, zap_obj);
zap_cursor_retrieve(&zc, attrp) == 0;
(void) zap_cursor_advance(&zc)) {
dsl_deadlist_open(&ll, mos, attrp->za_first_integer);
VERIFY0(dsl_deadlist_open(&ll, mos, attrp->za_first_integer));
func(&ll, arg);
dsl_deadlist_close(&ll);
}
@ -8204,14 +8283,11 @@ dump_mos_leaks(spa_t *spa)
}
}
if (spa->spa_brt != NULL) {
brt_t *brt = spa->spa_brt;
for (uint64_t vdevid = 0; vdevid < brt->brt_nvdevs; vdevid++) {
brt_vdev_t *brtvd = &brt->brt_vdevs[vdevid];
if (brtvd != NULL && brtvd->bv_initiated) {
mos_obj_refd(brtvd->bv_mos_brtvdev);
mos_obj_refd(brtvd->bv_mos_entries);
}
for (uint64_t vdevid = 0; vdevid < spa->spa_brt_nvdevs; vdevid++) {
brt_vdev_t *brtvd = spa->spa_brt_vdevs[vdevid];
if (brtvd->bv_initiated) {
mos_obj_refd(brtvd->bv_mos_brtvdev);
mos_obj_refd(brtvd->bv_mos_entries);
}
}

View File

@ -67,19 +67,19 @@ zil_prt_rec_create(zilog_t *zilog, int txtype, const void *arg)
const lr_create_t *lrc = arg;
const _lr_create_t *lr = &lrc->lr_create;
time_t crtime = lr->lr_crtime[0];
char *name, *link;
const char *name, *link;
lr_attr_t *lrattr;
name = (char *)(lr + 1);
name = (const char *)&lrc->lr_data[0];
if (lr->lr_common.lrc_txtype == TX_CREATE_ATTR ||
lr->lr_common.lrc_txtype == TX_MKDIR_ATTR) {
lrattr = (lr_attr_t *)(lr + 1);
lrattr = (lr_attr_t *)&lrc->lr_data[0];
name += ZIL_XVAT_SIZE(lrattr->lr_attr_masksize);
}
if (txtype == TX_SYMLINK) {
link = name + strlen(name) + 1;
link = (const char *)&lrc->lr_data[strlen(name) + 1];
(void) printf("%s%s -> %s\n", tab_prefix, name, link);
} else if (txtype != TX_MKXATTR) {
(void) printf("%s%s\n", tab_prefix, name);
@ -104,7 +104,7 @@ zil_prt_rec_remove(zilog_t *zilog, int txtype, const void *arg)
const lr_remove_t *lr = arg;
(void) printf("%sdoid %llu, name %s\n", tab_prefix,
(u_longlong_t)lr->lr_doid, (char *)(lr + 1));
(u_longlong_t)lr->lr_doid, (const char *)&lr->lr_data[0]);
}
static void
@ -115,7 +115,7 @@ zil_prt_rec_link(zilog_t *zilog, int txtype, const void *arg)
(void) printf("%sdoid %llu, link_obj %llu, name %s\n", tab_prefix,
(u_longlong_t)lr->lr_doid, (u_longlong_t)lr->lr_link_obj,
(char *)(lr + 1));
(const char *)&lr->lr_data[0]);
}
static void
@ -124,8 +124,8 @@ zil_prt_rec_rename(zilog_t *zilog, int txtype, const void *arg)
(void) zilog, (void) txtype;
const lr_rename_t *lrr = arg;
const _lr_rename_t *lr = &lrr->lr_rename;
char *snm = (char *)(lr + 1);
char *tnm = snm + strlen(snm) + 1;
const char *snm = (const char *)&lrr->lr_data[0];
const char *tnm = (const char *)&lrr->lr_data[strlen(snm) + 1];
(void) printf("%ssdoid %llu, tdoid %llu\n", tab_prefix,
(u_longlong_t)lr->lr_sdoid, (u_longlong_t)lr->lr_tdoid);
@ -211,7 +211,7 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, const void *arg)
/* data is stored after the end of the lr_write record */
data = abd_alloc(lr->lr_length, B_FALSE);
abd_copy_from_buf(data, lr + 1, lr->lr_length);
abd_copy_from_buf(data, &lr->lr_data[0], lr->lr_length);
}
(void) printf("%s", tab_prefix);
@ -309,7 +309,7 @@ zil_prt_rec_setsaxattr(zilog_t *zilog, int txtype, const void *arg)
(void) zilog, (void) txtype;
const lr_setsaxattr_t *lr = arg;
char *name = (char *)(lr + 1);
const char *name = (const char *)&lr->lr_data[0];
(void) printf("%sfoid %llu\n", tab_prefix,
(u_longlong_t)lr->lr_foid);
@ -318,7 +318,7 @@ zil_prt_rec_setsaxattr(zilog_t *zilog, int txtype, const void *arg)
(void) printf("%sXAT_VALUE NULL\n", tab_prefix);
} else {
(void) printf("%sXAT_VALUE ", tab_prefix);
char *val = name + (strlen(name) + 1);
const char *val = (const char *)&lr->lr_data[strlen(name) + 1];
for (int i = 0; i < lr->lr_size; i++) {
(void) printf("%c", *val);
val++;

View File

@ -445,8 +445,8 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
* its a loopback event from spa_async_remove(). Just
* ignore it.
*/
if (vs->vs_state == VDEV_STATE_REMOVED &&
state == VDEV_STATE_REMOVED)
if ((vs->vs_state == VDEV_STATE_REMOVED && state ==
VDEV_STATE_REMOVED) || vs->vs_state == VDEV_STATE_OFFLINE)
return;
/* Remove the vdev since device is unplugged */

View File

@ -139,7 +139,8 @@ dev_event_nvlist(struct udev_device *dev)
* is /dev/sda.
*/
struct udev_device *parent_dev = udev_device_get_parent(dev);
if ((value = udev_device_get_sysattr_value(parent_dev, "size"))
if (parent_dev != NULL &&
(value = udev_device_get_sysattr_value(parent_dev, "size"))
!= NULL) {
uint64_t numval = DEV_BSIZE;

View File

@ -500,7 +500,7 @@ usage_prop_cb(int prop, void *cb)
{
FILE *fp = cb;
(void) fprintf(fp, "\t%-15s ", zfs_prop_to_name(prop));
(void) fprintf(fp, "\t%-22s ", zfs_prop_to_name(prop));
if (zfs_prop_readonly(prop))
(void) fprintf(fp, " NO ");
@ -561,40 +561,40 @@ usage(boolean_t requested)
(void) fprintf(fp, "%s",
gettext("\nThe following properties are supported:\n"));
(void) fprintf(fp, "\n\t%-14s %s %s %s\n\n",
(void) fprintf(fp, "\n\t%-21s %s %s %s\n\n",
"PROPERTY", "EDIT", "INHERIT", "VALUES");
/* Iterate over all properties */
(void) zprop_iter(usage_prop_cb, fp, B_FALSE, B_TRUE,
ZFS_TYPE_DATASET);
(void) fprintf(fp, "\t%-15s ", "userused@...");
(void) fprintf(fp, "\t%-22s ", "userused@...");
(void) fprintf(fp, " NO NO <size>\n");
(void) fprintf(fp, "\t%-15s ", "groupused@...");
(void) fprintf(fp, "\t%-22s ", "groupused@...");
(void) fprintf(fp, " NO NO <size>\n");
(void) fprintf(fp, "\t%-15s ", "projectused@...");
(void) fprintf(fp, "\t%-22s ", "projectused@...");
(void) fprintf(fp, " NO NO <size>\n");
(void) fprintf(fp, "\t%-15s ", "userobjused@...");
(void) fprintf(fp, "\t%-22s ", "userobjused@...");
(void) fprintf(fp, " NO NO <size>\n");
(void) fprintf(fp, "\t%-15s ", "groupobjused@...");
(void) fprintf(fp, "\t%-22s ", "groupobjused@...");
(void) fprintf(fp, " NO NO <size>\n");
(void) fprintf(fp, "\t%-15s ", "projectobjused@...");
(void) fprintf(fp, "\t%-22s ", "projectobjused@...");
(void) fprintf(fp, " NO NO <size>\n");
(void) fprintf(fp, "\t%-15s ", "userquota@...");
(void) fprintf(fp, "\t%-22s ", "userquota@...");
(void) fprintf(fp, "YES NO <size> | none\n");
(void) fprintf(fp, "\t%-15s ", "groupquota@...");
(void) fprintf(fp, "\t%-22s ", "groupquota@...");
(void) fprintf(fp, "YES NO <size> | none\n");
(void) fprintf(fp, "\t%-15s ", "projectquota@...");
(void) fprintf(fp, "\t%-22s ", "projectquota@...");
(void) fprintf(fp, "YES NO <size> | none\n");
(void) fprintf(fp, "\t%-15s ", "userobjquota@...");
(void) fprintf(fp, "\t%-22s ", "userobjquota@...");
(void) fprintf(fp, "YES NO <size> | none\n");
(void) fprintf(fp, "\t%-15s ", "groupobjquota@...");
(void) fprintf(fp, "\t%-22s ", "groupobjquota@...");
(void) fprintf(fp, "YES NO <size> | none\n");
(void) fprintf(fp, "\t%-15s ", "projectobjquota@...");
(void) fprintf(fp, "\t%-22s ", "projectobjquota@...");
(void) fprintf(fp, "YES NO <size> | none\n");
(void) fprintf(fp, "\t%-15s ", "written@<snap>");
(void) fprintf(fp, "\t%-22s ", "written@<snap>");
(void) fprintf(fp, " NO NO <size>\n");
(void) fprintf(fp, "\t%-15s ", "written#<bookmark>");
(void) fprintf(fp, "\t%-22s ", "written#<bookmark>");
(void) fprintf(fp, " NO NO <size>\n");
(void) fprintf(fp, gettext("\nSizes are specified in bytes "
@ -2162,6 +2162,7 @@ zfs_do_get(int argc, char **argv)
cb.cb_type = ZFS_TYPE_DATASET;
struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
{"json-int", no_argument, NULL, ZFS_OPTION_JSON_NUMS_AS_INT},
{0, 0, 0, 0}
};
@ -3760,8 +3761,13 @@ collect_dataset(zfs_handle_t *zhp, list_cbdata_t *cb)
if (cb->cb_json) {
if (pl->pl_prop == ZFS_PROP_NAME)
continue;
const char *prop_name;
if (pl->pl_prop != ZPROP_USERPROP)
prop_name = zfs_prop_to_name(pl->pl_prop);
else
prop_name = pl->pl_user_prop;
if (zprop_nvlist_one_property(
zfs_prop_to_name(pl->pl_prop), propstr,
prop_name, propstr,
sourcetype, source, NULL, props,
cb->cb_json_as_int) != 0)
nomem();
@ -3852,6 +3858,7 @@ zfs_do_list(int argc, char **argv)
nvlist_t *data = NULL;
struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
{"json-int", no_argument, NULL, ZFS_OPTION_JSON_NUMS_AS_INT},
{0, 0, 0, 0}
};
@ -7436,9 +7443,15 @@ share_mount(int op, int argc, char **argv)
uint_t nthr;
jsobj = data = item = NULL;
struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
{0, 0, 0, 0}
};
/* check options */
while ((c = getopt(argc, argv, op == OP_MOUNT ? ":ajRlvo:Of" : "al"))
!= -1) {
while ((c = getopt_long(argc, argv,
op == OP_MOUNT ? ":ajRlvo:Of" : "al",
op == OP_MOUNT ? long_options : NULL, NULL)) != -1) {
switch (c) {
case 'a':
do_all = 1;
@ -8374,8 +8387,14 @@ zfs_do_channel_program(int argc, char **argv)
boolean_t sync_flag = B_TRUE, json_output = B_FALSE;
zpool_handle_t *zhp;
struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
{0, 0, 0, 0}
};
/* check options */
while ((c = getopt(argc, argv, "nt:m:j")) != -1) {
while ((c = getopt_long(argc, argv, "nt:m:j", long_options,
NULL)) != -1) {
switch (c) {
case 't':
case 'm': {
@ -9083,7 +9102,13 @@ zfs_do_version(int argc, char **argv)
int c;
nvlist_t *jsobj = NULL, *zfs_ver = NULL;
boolean_t json = B_FALSE;
while ((c = getopt(argc, argv, "j")) != -1) {
struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
{0, 0, 0, 0}
};
while ((c = getopt_long(argc, argv, "j", long_options, NULL)) != -1) {
switch (c) {
case 'j':
json = B_TRUE;
@ -9187,7 +9212,7 @@ main(int argc, char **argv)
* Special case '-V|--version'
*/
if ((strcmp(cmdname, "-V") == 0) || (strcmp(cmdname, "--version") == 0))
return (zfs_do_version(argc, argv));
return (zfs_version_print() != 0);
/*
* Special case 'help'

View File

@ -512,7 +512,8 @@ get_usage(zpool_help_t idx)
return (gettext("\tinitialize [-c | -s | -u] [-w] <pool> "
"[<device> ...]\n"));
case HELP_SCRUB:
return (gettext("\tscrub [-s | -p] [-w] [-e] <pool> ...\n"));
return (gettext("\tscrub [-e | -s | -p | -C] [-w] "
"<pool> ...\n"));
case HELP_RESILVER:
return (gettext("\tresilver <pool> ...\n"));
case HELP_TRIM:
@ -6882,8 +6883,13 @@ collect_pool(zpool_handle_t *zhp, list_cbdata_t *cb)
if (cb->cb_json) {
if (pl->pl_prop == ZPOOL_PROP_NAME)
continue;
const char *prop_name;
if (pl->pl_prop != ZPROP_USERPROP)
prop_name = zpool_prop_to_name(pl->pl_prop);
else
prop_name = pl->pl_user_prop;
(void) zprop_nvlist_one_property(
zpool_prop_to_name(pl->pl_prop), propstr,
prop_name, propstr,
sourcetype, NULL, NULL, props, cb->cb_json_as_int);
} else {
/*
@ -7340,6 +7346,7 @@ zpool_do_list(int argc, char **argv)
current_prop_type = ZFS_TYPE_POOL;
struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
{"json-int", no_argument, NULL, ZPOOL_OPTION_JSON_NUMS_AS_INT},
{"json-pool-key-guid", no_argument, NULL,
ZPOOL_OPTION_POOL_KEY_GUID},
@ -7965,8 +7972,11 @@ zpool_do_online(int argc, char **argv)
poolname = argv[0];
if ((zhp = zpool_open(g_zfs, poolname)) == NULL)
if ((zhp = zpool_open(g_zfs, poolname)) == NULL) {
(void) fprintf(stderr, gettext("failed to open pool "
"\"%s\""), poolname);
return (1);
}
for (i = 1; i < argc; i++) {
vdev_state_t oldstate;
@ -7987,12 +7997,15 @@ zpool_do_online(int argc, char **argv)
&l2cache, NULL);
if (tgt == NULL) {
ret = 1;
(void) fprintf(stderr, gettext("couldn't find device "
"\"%s\" in pool \"%s\"\n"), argv[i], poolname);
continue;
}
uint_t vsc;
oldstate = ((vdev_stat_t *)fnvlist_lookup_uint64_array(tgt,
ZPOOL_CONFIG_VDEV_STATS, &vsc))->vs_state;
if (zpool_vdev_online(zhp, argv[i], flags, &newstate) == 0) {
if ((rc = zpool_vdev_online(zhp, argv[i], flags,
&newstate)) == 0) {
if (newstate != VDEV_STATE_HEALTHY) {
(void) printf(gettext("warning: device '%s' "
"onlined, but remains in faulted state\n"),
@ -8018,6 +8031,9 @@ zpool_do_online(int argc, char **argv)
}
}
} else {
(void) fprintf(stderr, gettext("Failed to online "
"\"%s\" in pool \"%s\": %d\n"),
argv[i], poolname, rc);
ret = 1;
}
}
@ -8102,8 +8118,11 @@ zpool_do_offline(int argc, char **argv)
poolname = argv[0];
if ((zhp = zpool_open(g_zfs, poolname)) == NULL)
if ((zhp = zpool_open(g_zfs, poolname)) == NULL) {
(void) fprintf(stderr, gettext("failed to open pool "
"\"%s\""), poolname);
return (1);
}
for (i = 1; i < argc; i++) {
uint64_t guid = zpool_vdev_path_to_guid(zhp, argv[i]);
@ -8411,12 +8430,13 @@ wait_callback(zpool_handle_t *zhp, void *data)
}
/*
* zpool scrub [-s | -p] [-w] [-e] <pool> ...
* zpool scrub [-e | -s | -p | -C] [-w] <pool> ...
*
* -e Only scrub blocks in the error log.
* -s Stop. Stops any in-progress scrub.
* -p Pause. Pause in-progress scrub.
* -w Wait. Blocks until scrub has completed.
* -C Scrub from last saved txg.
*/
int
zpool_do_scrub(int argc, char **argv)
@ -8432,9 +8452,10 @@ zpool_do_scrub(int argc, char **argv)
boolean_t is_error_scrub = B_FALSE;
boolean_t is_pause = B_FALSE;
boolean_t is_stop = B_FALSE;
boolean_t is_txg_continue = B_FALSE;
/* check options */
while ((c = getopt(argc, argv, "spwe")) != -1) {
while ((c = getopt(argc, argv, "spweC")) != -1) {
switch (c) {
case 'e':
is_error_scrub = B_TRUE;
@ -8448,6 +8469,9 @@ zpool_do_scrub(int argc, char **argv)
case 'w':
wait = B_TRUE;
break;
case 'C':
is_txg_continue = B_TRUE;
break;
case '?':
(void) fprintf(stderr, gettext("invalid option '%c'\n"),
optopt);
@ -8459,6 +8483,18 @@ zpool_do_scrub(int argc, char **argv)
(void) fprintf(stderr, gettext("invalid option "
"combination :-s and -p are mutually exclusive\n"));
usage(B_FALSE);
} else if (is_pause && is_txg_continue) {
(void) fprintf(stderr, gettext("invalid option "
"combination :-p and -C are mutually exclusive\n"));
usage(B_FALSE);
} else if (is_stop && is_txg_continue) {
(void) fprintf(stderr, gettext("invalid option "
"combination :-s and -C are mutually exclusive\n"));
usage(B_FALSE);
} else if (is_error_scrub && is_txg_continue) {
(void) fprintf(stderr, gettext("invalid option "
"combination :-e and -C are mutually exclusive\n"));
usage(B_FALSE);
} else {
if (is_error_scrub)
cb.cb_type = POOL_SCAN_ERRORSCRUB;
@ -8467,6 +8503,8 @@ zpool_do_scrub(int argc, char **argv)
cb.cb_scrub_cmd = POOL_SCRUB_PAUSE;
} else if (is_stop) {
cb.cb_type = POOL_SCAN_NONE;
} else if (is_txg_continue) {
cb.cb_scrub_cmd = POOL_SCRUB_FROM_LAST_TXG;
} else {
cb.cb_scrub_cmd = POOL_SCRUB_NORMAL;
}
@ -9224,6 +9262,12 @@ vdev_stats_nvlist(zpool_handle_t *zhp, status_cbdata_t *cb, nvlist_t *nv,
}
}
if (cb->cb_print_dio_verify) {
nice_num_str_nvlist(vds, "dio_verify_errors",
vs->vs_dio_verify_errors, cb->cb_literal,
cb->cb_json_as_int, ZFS_NICENUM_1024);
}
if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NOT_PRESENT,
&notpresent) == 0) {
nice_num_str_nvlist(vds, ZPOOL_CONFIG_NOT_PRESENT,
@ -10010,9 +10054,8 @@ print_removal_status(zpool_handle_t *zhp, pool_removal_stat_t *prs)
(void) printf(gettext("Removal of %s canceled on %s"),
vdev_name, ctime(&end));
} else {
uint64_t copied, total, elapsed, mins_left, hours_left;
uint64_t copied, total, elapsed, rate, mins_left, hours_left;
double fraction_done;
uint_t rate;
assert(prs->prs_state == DSS_SCANNING);
@ -10108,9 +10151,8 @@ print_raidz_expand_status(zpool_handle_t *zhp, pool_raidz_expand_stat_t *pres)
copied_buf, time_buf, ctime((time_t *)&end));
} else {
char examined_buf[7], total_buf[7], rate_buf[7];
uint64_t copied, total, elapsed, secs_left;
uint64_t copied, total, elapsed, rate, secs_left;
double fraction_done;
uint_t rate;
assert(pres->pres_state == DSS_SCANNING);
@ -10975,6 +11017,7 @@ zpool_do_status(int argc, char **argv)
struct option long_options[] = {
{"power", no_argument, NULL, ZPOOL_OPTION_POWER},
{"json", no_argument, NULL, 'j'},
{"json-int", no_argument, NULL, ZPOOL_OPTION_JSON_NUMS_AS_INT},
{"json-flat-vdevs", no_argument, NULL,
ZPOOL_OPTION_JSON_FLAT_VDEVS},
@ -12583,6 +12626,7 @@ zpool_do_get(int argc, char **argv)
current_prop_type = cb.cb_type;
struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
{"json-int", no_argument, NULL, ZPOOL_OPTION_JSON_NUMS_AS_INT},
{"json-pool-key-guid", no_argument, NULL,
ZPOOL_OPTION_POOL_KEY_GUID},
@ -13497,7 +13541,12 @@ zpool_do_version(int argc, char **argv)
int c;
nvlist_t *jsobj = NULL, *zfs_ver = NULL;
boolean_t json = B_FALSE;
while ((c = getopt(argc, argv, "j")) != -1) {
struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
};
while ((c = getopt_long(argc, argv, "j", long_options, NULL)) != -1) {
switch (c) {
case 'j':
json = B_TRUE;
@ -13613,7 +13662,7 @@ main(int argc, char **argv)
* Special case '-V|--version'
*/
if ((strcmp(cmdname, "-V") == 0) || (strcmp(cmdname, "--version") == 0))
return (zpool_do_version(argc, argv));
return (zfs_version_print() != 0);
/*
* Special case 'help'

View File

@ -6717,6 +6717,17 @@ out:
*
* Only after a full scrub has been completed is it safe to start injecting
* data corruption. See the comment in zfs_fault_inject().
*
* EBUSY may be returned for the following six cases. It's the callers
* responsibility to handle them accordingly.
*
* Current state Requested
* 1. Normal Scrub Running Normal Scrub or Error Scrub
* 2. Normal Scrub Paused Error Scrub
* 3. Normal Scrub Paused Pause Normal Scrub
* 4. Error Scrub Running Normal Scrub or Error Scrub
* 5. Error Scrub Paused Pause Error Scrub
* 6. Resilvering Anything else
*/
static int
ztest_scrub_impl(spa_t *spa)
@ -8082,8 +8093,14 @@ ztest_raidz_expand_check(spa_t *spa)
(void) printf("verifying an interrupted raidz "
"expansion using a pool scrub ...\n");
}
/* Will fail here if there is non-recoverable corruption detected */
VERIFY0(ztest_scrub_impl(spa));
int error = ztest_scrub_impl(spa);
if (error == EBUSY)
error = 0;
VERIFY0(error);
if (ztest_opts.zo_verbose >= 1) {
(void) printf("raidz expansion scrub check complete\n");
}

View File

@ -58,9 +58,9 @@ deb-utils: deb-local rpm-utils-initramfs
pkg1=$${name}-$${version}.$${arch}.rpm; \
pkg2=libnvpair3-$${version}.$${arch}.rpm; \
pkg3=libuutil3-$${version}.$${arch}.rpm; \
pkg4=libzfs5-$${version}.$${arch}.rpm; \
pkg5=libzpool5-$${version}.$${arch}.rpm; \
pkg6=libzfs5-devel-$${version}.$${arch}.rpm; \
pkg4=libzfs6-$${version}.$${arch}.rpm; \
pkg5=libzpool6-$${version}.$${arch}.rpm; \
pkg6=libzfs6-devel-$${version}.$${arch}.rpm; \
pkg7=$${name}-test-$${version}.$${arch}.rpm; \
pkg8=$${name}-dracut-$${version}.noarch.rpm; \
pkg9=$${name}-initramfs-$${version}.$${arch}.rpm; \
@ -72,7 +72,7 @@ deb-utils: deb-local rpm-utils-initramfs
path_prepend=`mktemp -d /tmp/intercept.XXXXXX`; \
echo "#!$(SHELL)" > $${path_prepend}/dh_shlibdeps; \
echo "`which dh_shlibdeps` -- \
-xlibuutil3linux -xlibnvpair3linux -xlibzfs5linux -xlibzpool5linux" \
-xlibuutil3linux -xlibnvpair3linux -xlibzfs6linux -xlibzpool6linux" \
>> $${path_prepend}/dh_shlibdeps; \
## These -x arguments are passed to dpkg-shlibdeps, which exclude the
## Debianized packages from the auto-generated dependencies of the new debs,
@ -93,13 +93,17 @@ debian:
cp -r contrib/debian debian; chmod +x debian/rules;
native-deb-utils: native-deb-local debian
while [ -f debian/deb-build.lock ]; do sleep 1; done; \
echo "native-deb-utils" > debian/deb-build.lock; \
cp contrib/debian/control debian/control; \
$(DPKGBUILD) -b -rfakeroot -us -uc;
$(DPKGBUILD) -b -rfakeroot -us -uc; \
$(RM) -f debian/deb-build.lock
native-deb-kmod: native-deb-local debian
while [ -f debian/deb-build.lock ]; do sleep 1; done; \
echo "native-deb-kmod" > debian/deb-build.lock; \
sh scripts/make_gitrev.sh; \
fakeroot debian/rules override_dh_binary-modules;
fakeroot debian/rules override_dh_binary-modules; \
$(RM) -f debian/deb-build.lock
native-deb: native-deb-utils native-deb-kmod
.NOTPARALLEL: native-deb native-deb-utils native-deb-kmod

View File

@ -17,14 +17,21 @@ AC_DEFUN([ZFS_AC_KERNEL_KTHREAD_COMPLETE_AND_EXIT], [
AC_DEFUN([ZFS_AC_KERNEL_KTHREAD_DEQUEUE_SIGNAL], [
dnl #
dnl # 5.17 API: enum pid_type * as new 4th dequeue_signal() argument,
dnl # 5768d8906bc23d512b1a736c1e198aa833a6daa4 ("signal: Requeue signals in the appropriate queue")
dnl # prehistory:
dnl # int dequeue_signal(struct task_struct *task, sigset_t *mask,
dnl # siginfo_t *info)
dnl #
dnl # int dequeue_signal(struct task_struct *task, sigset_t *mask, kernel_siginfo_t *info);
dnl # int dequeue_signal(struct task_struct *task, sigset_t *mask, kernel_siginfo_t *info, enum pid_type *type);
dnl # 4.20: kernel_siginfo_t introduced, replaces siginfo_t
dnl # int dequeue_signal(struct task_struct *task, sigset_t *mask,
dnl kernel_siginfo_t *info)
dnl #
dnl # 6.12 API: first arg struct_task* removed
dnl # int dequeue_signal(sigset_t *mask, kernel_siginfo_t *info, enum pid_type *type);
dnl # 5.17: enum pid_type introduced as 4th arg
dnl # int dequeue_signal(struct task_struct *task, sigset_t *mask,
dnl # kernel_siginfo_t *info, enum pid_type *type)
dnl #
dnl # 6.12: first arg struct_task* removed
dnl # int dequeue_signal(sigset_t *mask, kernel_siginfo_t *info,
dnl # enum pid_type *type)
dnl #
AC_MSG_CHECKING([whether dequeue_signal() takes 4 arguments])
ZFS_LINUX_TEST_RESULT([kthread_dequeue_signal_4arg], [
@ -33,11 +40,11 @@ AC_DEFUN([ZFS_AC_KERNEL_KTHREAD_DEQUEUE_SIGNAL], [
[dequeue_signal() takes 4 arguments])
], [
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether dequeue_signal() a task argument])
ZFS_LINUX_TEST_RESULT([kthread_dequeue_signal_3arg_task], [
AC_MSG_CHECKING([whether 3-arg dequeue_signal() takes a type argument])
ZFS_LINUX_TEST_RESULT([kthread_dequeue_signal_3arg_type], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_DEQUEUE_SIGNAL_3ARG_TASK, 1,
[dequeue_signal() takes a task argument])
AC_DEFINE(HAVE_DEQUEUE_SIGNAL_3ARG_TYPE, 1,
[3-arg dequeue_signal() takes a type argument])
], [
AC_MSG_RESULT(no)
])
@ -56,17 +63,6 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_KTHREAD_COMPLETE_AND_EXIT], [
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_KTHREAD_DEQUEUE_SIGNAL], [
ZFS_LINUX_TEST_SRC([kthread_dequeue_signal_3arg_task], [
#include <linux/sched/signal.h>
], [
struct task_struct *task = NULL;
sigset_t *mask = NULL;
kernel_siginfo_t *info = NULL;
int error __attribute__ ((unused));
error = dequeue_signal(task, mask, info);
])
ZFS_LINUX_TEST_SRC([kthread_dequeue_signal_4arg], [
#include <linux/sched/signal.h>
], [
@ -78,6 +74,17 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_KTHREAD_DEQUEUE_SIGNAL], [
error = dequeue_signal(task, mask, info, type);
])
ZFS_LINUX_TEST_SRC([kthread_dequeue_signal_3arg_type], [
#include <linux/sched/signal.h>
], [
sigset_t *mask = NULL;
kernel_siginfo_t *info = NULL;
enum pid_type *type = NULL;
int error __attribute__ ((unused));
error = dequeue_signal(mask, info, type);
])
])
AC_DEFUN([ZFS_AC_KERNEL_KTHREAD], [

View File

@ -0,0 +1,33 @@
dnl #
dnl # Check for pin_user_pages_unlocked().
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_PIN_USER_PAGES], [
ZFS_LINUX_TEST_SRC([pin_user_pages_unlocked], [
#include <linux/mm.h>
],[
unsigned long start = 0;
unsigned long nr_pages = 1;
struct page **pages = NULL;
unsigned int gup_flags = 0;
long ret __attribute__ ((unused));
ret = pin_user_pages_unlocked(start, nr_pages, pages,
gup_flags);
])
])
AC_DEFUN([ZFS_AC_KERNEL_PIN_USER_PAGES], [
dnl #
dnl # Kernal 5.8 introduced the pin_user_pages* interfaces which should
dnl # be used for Direct I/O requests.
dnl #
AC_MSG_CHECKING([whether pin_user_pages_unlocked() is available])
ZFS_LINUX_TEST_RESULT([pin_user_pages_unlocked], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_PIN_USER_PAGES_UNLOCKED, 1,
[pin_user_pages_unlocked() is available])
],[
AC_MSG_RESULT(no)
])
])

View File

@ -36,7 +36,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_REGISTER_SYSCTL_SZ], [
ZFS_LINUX_TEST_SRC([has_register_sysctl_sz], [
#include <linux/sysctl.h>
],[
struct ctl_table test_table[] __attribute__((unused)) = {0};
struct ctl_table test_table[] __attribute__((unused)) = {{}};
register_sysctl_sz("", test_table, 0);
])
])

View File

@ -1,57 +0,0 @@
dnl #
dnl # Check for Direct I/O interfaces.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_DIRECT_IO], [
ZFS_LINUX_TEST_SRC([direct_io_iter], [
#include <linux/fs.h>
static ssize_t test_direct_IO(struct kiocb *kiocb,
struct iov_iter *iter) { return 0; }
static const struct address_space_operations
aops __attribute__ ((unused)) = {
.direct_IO = test_direct_IO,
};
],[])
ZFS_LINUX_TEST_SRC([direct_io_iter_offset], [
#include <linux/fs.h>
static ssize_t test_direct_IO(struct kiocb *kiocb,
struct iov_iter *iter, loff_t offset) { return 0; }
static const struct address_space_operations
aops __attribute__ ((unused)) = {
.direct_IO = test_direct_IO,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_DIRECT_IO], [
dnl #
dnl # Linux 4.6.x API change
dnl #
AC_MSG_CHECKING([whether aops->direct_IO() uses iov_iter])
ZFS_LINUX_TEST_RESULT([direct_io_iter], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_VFS_DIRECT_IO_ITER, 1,
[aops->direct_IO() uses iov_iter without rw])
],[
AC_MSG_RESULT([no])
dnl #
dnl # Linux 4.1.x API change
dnl #
AC_MSG_CHECKING(
[whether aops->direct_IO() uses offset])
ZFS_LINUX_TEST_RESULT([direct_io_iter_offset], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_VFS_DIRECT_IO_ITER_OFFSET, 1,
[aops->direct_IO() uses iov_iter with offset])
],[
AC_MSG_RESULT([no])
ZFS_LINUX_TEST_ERROR([Direct I/O])
])
])
])

View File

@ -1,33 +0,0 @@
dnl #
dnl # Linux 5.18 uses invalidate_folio in lieu of invalidate_page
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_INVALIDATE_FOLIO], [
ZFS_LINUX_TEST_SRC([vfs_has_invalidate_folio], [
#include <linux/fs.h>
static void
test_invalidate_folio(struct folio *folio, size_t offset,
size_t len) {
(void) folio; (void) offset; (void) len;
return;
}
static const struct address_space_operations
aops __attribute__ ((unused)) = {
.invalidate_folio = test_invalidate_folio,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_INVALIDATE_FOLIO], [
dnl #
dnl # Linux 5.18 uses invalidate_folio in lieu of invalidate_page
dnl #
AC_MSG_CHECKING([whether invalidate_folio exists])
ZFS_LINUX_TEST_RESULT([vfs_has_invalidate_folio], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_VFS_INVALIDATE_FOLIO, 1, [invalidate_folio exists])
],[
AC_MSG_RESULT([no])
])
])

View File

@ -13,34 +13,6 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_IOV_ITER], [
error = fault_in_iov_iter_readable(&iter, size);
])
ZFS_LINUX_TEST_SRC([iov_iter_get_pages2], [
#include <linux/uio.h>
], [
struct iov_iter iter = { 0 };
struct page **pages = NULL;
size_t maxsize = 4096;
unsigned maxpages = 1;
size_t start;
size_t ret __attribute__ ((unused));
ret = iov_iter_get_pages2(&iter, pages, maxsize, maxpages,
&start);
])
ZFS_LINUX_TEST_SRC([iov_iter_get_pages], [
#include <linux/uio.h>
], [
struct iov_iter iter = { 0 };
struct page **pages = NULL;
size_t maxsize = 4096;
unsigned maxpages = 1;
size_t start;
size_t ret __attribute__ ((unused));
ret = iov_iter_get_pages(&iter, pages, maxsize, maxpages,
&start);
])
ZFS_LINUX_TEST_SRC([iov_iter_type], [
#include <linux/fs.h>
#include <linux/uio.h>
@ -49,6 +21,15 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_IOV_ITER], [
__attribute__((unused)) enum iter_type i = iov_iter_type(&iter);
])
ZFS_LINUX_TEST_SRC([iter_is_ubuf], [
#include <linux/uio.h>
],[
struct iov_iter iter = { 0 };
bool ret __attribute__((unused));
ret = iter_is_ubuf(&iter);
])
ZFS_LINUX_TEST_SRC([iter_iov], [
#include <linux/fs.h>
#include <linux/uio.h>
@ -59,7 +40,6 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_IOV_ITER], [
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_IOV_ITER], [
enable_vfs_iov_iter="yes"
AC_MSG_CHECKING([whether fault_in_iov_iter_readable() is available])
ZFS_LINUX_TEST_RESULT([fault_in_iov_iter_readable], [
@ -70,27 +50,6 @@ AC_DEFUN([ZFS_AC_KERNEL_VFS_IOV_ITER], [
AC_MSG_RESULT(no)
])
dnl #
dnl # Kernel 6.0 changed iov_iter_get_pages() to iov_iter_page_pages2().
dnl #
AC_MSG_CHECKING([whether iov_iter_get_pages2() is available])
ZFS_LINUX_TEST_RESULT([iov_iter_get_pages2], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_IOV_ITER_GET_PAGES2, 1,
[iov_iter_get_pages2() is available])
], [
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether iov_iter_get_pages() is available])
ZFS_LINUX_TEST_RESULT([iov_iter_get_pages], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_IOV_ITER_GET_PAGES, 1,
[iov_iter_get_pages() is available])
], [
AC_MSG_RESULT(no)
enable_vfs_iov_iter="no"
])
])
dnl #
dnl # This checks for iov_iter_type() in linux/uio.h. It is not
dnl # required, however, and the module will compiled without it
@ -106,14 +65,15 @@ AC_DEFUN([ZFS_AC_KERNEL_VFS_IOV_ITER], [
])
dnl #
dnl # As of the 4.9 kernel support is provided for iovecs, kvecs,
dnl # bvecs and pipes in the iov_iter structure. As long as the
dnl # other support interfaces are all available the iov_iter can
dnl # be correctly used in the uio structure.
dnl # Kernel 6.0 introduced the ITER_UBUF iov_iter type. iter_is_ubuf()
dnl # was also added to determine if the iov_iter is an ITER_UBUF.
dnl #
AS_IF([test "x$enable_vfs_iov_iter" = "xyes"], [
AC_DEFINE(HAVE_VFS_IOV_ITER, 1,
[All required iov_iter interfaces are available])
AC_MSG_CHECKING([whether iter_is_ubuf() is available])
ZFS_LINUX_TEST_RESULT([iter_is_ubuf], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_ITER_IS_UBUF, 1, [iter_is_ubuf() is available])
],[
AC_MSG_RESULT(no)
])
dnl #

View File

@ -0,0 +1,27 @@
dnl #
dnl # Linux 6.0 uses migrate_folio in lieu of migrate_page
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_MIGRATE_FOLIO], [
ZFS_LINUX_TEST_SRC([vfs_has_migrate_folio], [
#include <linux/fs.h>
#include <linux/migrate.h>
static const struct address_space_operations
aops __attribute__ ((unused)) = {
.migrate_folio = migrate_folio,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_MIGRATE_FOLIO], [
dnl #
dnl # Linux 6.0 uses migrate_folio in lieu of migrate_page
dnl #
AC_MSG_CHECKING([whether migrate_folio exists])
ZFS_LINUX_TEST_RESULT([vfs_has_migrate_folio], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_VFS_MIGRATE_FOLIO, 1, [migrate_folio exists])
],[
AC_MSG_RESULT([no])
])
])

View File

@ -1,32 +0,0 @@
dnl #
dnl # Linux 5.19 uses release_folio in lieu of releasepage
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_RELEASE_FOLIO], [
ZFS_LINUX_TEST_SRC([vfs_has_release_folio], [
#include <linux/fs.h>
static bool
test_release_folio(struct folio *folio, gfp_t gfp) {
(void) folio; (void) gfp;
return (0);
}
static const struct address_space_operations
aops __attribute__ ((unused)) = {
.release_folio = test_release_folio,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_RELEASE_FOLIO], [
dnl #
dnl # Linux 5.19 uses release_folio in lieu of releasepage
dnl #
AC_MSG_CHECKING([whether release_folio exists])
ZFS_LINUX_TEST_RESULT([vfs_has_release_folio], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_VFS_RELEASE_FOLIO, 1, [release_folio exists])
],[
AC_MSG_RESULT([no])
])
])

View File

@ -54,7 +54,6 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_XATTR_HANDLER_GET_DENTRY_INODE_FLAGS], [
])
AC_DEFUN([ZFS_AC_KERNEL_XATTR_HANDLER_GET_DENTRY_INODE_FLAGS], [
AC_MSG_RESULT(no)
AC_MSG_CHECKING(
[whether xattr_handler->get() wants dentry and inode and flags])
ZFS_LINUX_TEST_RESULT([xattr_handler_get_dentry_inode_flags], [

View File

@ -77,10 +77,8 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC_SGET
ZFS_AC_KERNEL_SRC_VFS_FILEMAP_DIRTY_FOLIO
ZFS_AC_KERNEL_SRC_VFS_READ_FOLIO
ZFS_AC_KERNEL_SRC_VFS_RELEASE_FOLIO
ZFS_AC_KERNEL_SRC_VFS_INVALIDATE_FOLIO
ZFS_AC_KERNEL_SRC_VFS_MIGRATE_FOLIO
ZFS_AC_KERNEL_SRC_VFS_FSYNC_2ARGS
ZFS_AC_KERNEL_SRC_VFS_DIRECT_IO
ZFS_AC_KERNEL_SRC_VFS_READPAGES
ZFS_AC_KERNEL_SRC_VFS_SET_PAGE_DIRTY_NOBUFFERS
ZFS_AC_KERNEL_SRC_VFS_IOV_ITER
@ -129,6 +127,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC_MM_PAGE_SIZE
ZFS_AC_KERNEL_SRC_MM_PAGE_MAPPING
ZFS_AC_KERNEL_SRC_FILE
ZFS_AC_KERNEL_SRC_PIN_USER_PAGES
case "$host_cpu" in
powerpc*)
ZFS_AC_KERNEL_SRC_CPU_HAS_FEATURE
@ -189,10 +188,8 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL_SGET
ZFS_AC_KERNEL_VFS_FILEMAP_DIRTY_FOLIO
ZFS_AC_KERNEL_VFS_READ_FOLIO
ZFS_AC_KERNEL_VFS_RELEASE_FOLIO
ZFS_AC_KERNEL_VFS_INVALIDATE_FOLIO
ZFS_AC_KERNEL_VFS_MIGRATE_FOLIO
ZFS_AC_KERNEL_VFS_FSYNC_2ARGS
ZFS_AC_KERNEL_VFS_DIRECT_IO
ZFS_AC_KERNEL_VFS_READPAGES
ZFS_AC_KERNEL_VFS_SET_PAGE_DIRTY_NOBUFFERS
ZFS_AC_KERNEL_VFS_IOV_ITER
@ -242,6 +239,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL_MM_PAGE_MAPPING
ZFS_AC_KERNEL_1ARG_ASSIGN_STR
ZFS_AC_KERNEL_FILE
ZFS_AC_KERNEL_PIN_USER_PAGES
case "$host_cpu" in
powerpc*)
ZFS_AC_KERNEL_CPU_HAS_FEATURE
@ -683,11 +681,16 @@ AC_DEFUN([ZFS_LINUX_COMPILE], [
building kernel modules])
AC_ARG_VAR([KERNEL_LLVM], [Binary option to
build kernel modules with LLVM/CLANG toolchain])
AC_ARG_VAR([KERNEL_CROSS_COMPILE], [Cross compile prefix
for kernel module builds])
AC_ARG_VAR([KERNEL_ARCH], [Architecture to build kernel modules for])
AC_TRY_COMMAND([
KBUILD_MODPOST_NOFINAL="$5" KBUILD_MODPOST_WARN="$6"
make modules -k -j$TEST_JOBS ${KERNEL_CC:+CC=$KERNEL_CC}
${KERNEL_LD:+LD=$KERNEL_LD} ${KERNEL_LLVM:+LLVM=$KERNEL_LLVM}
CONFIG_MODULES=y CFLAGS_MODULE=-DCONFIG_MODULES
${KERNEL_CROSS_COMPILE:+CROSS_COMPILE=$KERNEL_CROSS_COMPILE}
${KERNEL_ARCH:+ARCH=$KERNEL_ARCH}
-C $LINUX_OBJ $ARCH_UM M=$PWD/$1 >$1/build.log 2>&1])
AS_IF([AC_TRY_COMMAND([$2])], [$3], [$4])
])

View File

@ -393,6 +393,8 @@ AC_DEFUN([ZFS_AC_RPM], [
RPM_DEFINE_KMOD=${RPM_DEFINE_KMOD}' --define "kernel_cc KERNEL_CC=$(KERNEL_CC)"'
RPM_DEFINE_KMOD=${RPM_DEFINE_KMOD}' --define "kernel_ld KERNEL_LD=$(KERNEL_LD)"'
RPM_DEFINE_KMOD=${RPM_DEFINE_KMOD}' --define "kernel_llvm KERNEL_LLVM=$(KERNEL_LLVM)"'
RPM_DEFINE_KMOD=${RPM_DEFINE_KMOD}' --define "kernel_cross_compile KERNEL_CROSS_COMPILE=$(KERNEL_CROSS_COMPILE)"'
RPM_DEFINE_KMOD=${RPM_DEFINE_KMOD}' --define "kernel_arch KERNEL_ARCH=$(KERNEL_ARCH)"'
])
RPM_DEFINE_DKMS=''
@ -627,7 +629,7 @@ AC_DEFUN([ZFS_AC_DEFAULT_PACKAGE], [
AC_MSG_CHECKING([default bash completion directory])
case "$VENDOR" in
alpine|artix|debian|gentoo|ubuntu)
alpine|arch|artix|debian|gentoo|ubuntu)
bashcompletiondir=/usr/share/bash-completion/completions
;;
freebsd)

View File

@ -12,14 +12,14 @@ dist_noinst_DATA += %D%/openzfs-libpam-zfs.postinst
dist_noinst_DATA += %D%/openzfs-libpam-zfs.prerm
dist_noinst_DATA += %D%/openzfs-libuutil3.docs
dist_noinst_DATA += %D%/openzfs-libuutil3.install.in
dist_noinst_DATA += %D%/openzfs-libzfs4.docs
dist_noinst_DATA += %D%/openzfs-libzfs4.install.in
dist_noinst_DATA += %D%/openzfs-libzfs6.docs
dist_noinst_DATA += %D%/openzfs-libzfs6.install.in
dist_noinst_DATA += %D%/openzfs-libzfsbootenv1.docs
dist_noinst_DATA += %D%/openzfs-libzfsbootenv1.install.in
dist_noinst_DATA += %D%/openzfs-libzfs-dev.docs
dist_noinst_DATA += %D%/openzfs-libzfs-dev.install.in
dist_noinst_DATA += %D%/openzfs-libzpool5.docs
dist_noinst_DATA += %D%/openzfs-libzpool5.install.in
dist_noinst_DATA += %D%/openzfs-libzpool6.docs
dist_noinst_DATA += %D%/openzfs-libzpool6.install.in
dist_noinst_DATA += %D%/openzfs-python3-pyzfs.install
dist_noinst_DATA += %D%/openzfs-zfs-dkms.config
dist_noinst_DATA += %D%/openzfs-zfs-dkms.dkms

View File

@ -6,6 +6,6 @@ contrib/pyzfs/libzfs_core/bindings/__pycache__/
contrib/pyzfs/pyzfs.egg-info/
debian/openzfs-libnvpair3.install
debian/openzfs-libuutil3.install
debian/openzfs-libzfs4.install
debian/openzfs-libzfs6.install
debian/openzfs-libzfs-dev.install
debian/openzfs-libzpool5.install
debian/openzfs-libzpool6.install

View File

@ -78,9 +78,9 @@ Architecture: linux-any
Depends: libssl-dev | libssl1.0-dev,
openzfs-libnvpair3 (= ${binary:Version}),
openzfs-libuutil3 (= ${binary:Version}),
openzfs-libzfs4 (= ${binary:Version}),
openzfs-libzfs6 (= ${binary:Version}),
openzfs-libzfsbootenv1 (= ${binary:Version}),
openzfs-libzpool5 (= ${binary:Version}),
openzfs-libzpool6 (= ${binary:Version}),
${misc:Depends}
Replaces: libzfslinux-dev
Conflicts: libzfslinux-dev
@ -90,18 +90,18 @@ Description: OpenZFS filesystem development files for Linux
libraries of OpenZFS filesystem.
.
This package includes the development files of libnvpair3, libuutil3,
libzpool5 and libzfs4.
libzpool6 and libzfs6.
Package: openzfs-libzfs4
Package: openzfs-libzfs6
Section: contrib/libs
Architecture: linux-any
Depends: ${misc:Depends}, ${shlibs:Depends}
# The libcurl4 is loaded through dlopen("libcurl.so.4").
# https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=988521
Recommends: libcurl4
Breaks: libzfs2, libzfs4
Replaces: libzfs2, libzfs4, libzfs4linux
Conflicts: libzfs4linux
Breaks: libzfs2, libzfs4, libzfs4linux, libzfs6linux
Replaces: libzfs2, libzfs4, libzfs4linux, libzfs6linux
Conflicts: libzfs6linux
Description: OpenZFS filesystem library for Linux - general support
OpenZFS is a storage platform that encompasses the functionality of
traditional filesystems and volume managers. It supports data checksums,
@ -123,13 +123,13 @@ Description: OpenZFS filesystem library for Linux - label info support
.
The zfsbootenv library provides support for modifying ZFS label information.
Package: openzfs-libzpool5
Package: openzfs-libzpool6
Section: contrib/libs
Architecture: linux-any
Depends: ${misc:Depends}, ${shlibs:Depends}
Breaks: libzpool2, libzpool5
Replaces: libzpool2, libzpool5, libzpool5linux
Conflicts: libzpool5linux
Breaks: libzpool2, libzpool5, libzpool5linux, libzpool6linux
Replaces: libzpool2, libzpool5, libzpool5linux, libzpool6linux
Conflicts: libzpool6linux
Description: OpenZFS pool library for Linux
OpenZFS is a storage platform that encompasses the functionality of
traditional filesystems and volume managers. It supports data checksums,
@ -246,8 +246,8 @@ Architecture: linux-any
Pre-Depends: ${misc:Pre-Depends}
Depends: openzfs-libnvpair3 (= ${binary:Version}),
openzfs-libuutil3 (= ${binary:Version}),
openzfs-libzfs4 (= ${binary:Version}),
openzfs-libzpool5 (= ${binary:Version}),
openzfs-libzfs6 (= ${binary:Version}),
openzfs-libzpool6 (= ${binary:Version}),
python3,
${misc:Depends},
${shlibs:Depends}

View File

@ -98,6 +98,7 @@ usr/share/man/man8/zpool-attach.8
usr/share/man/man8/zpool-checkpoint.8
usr/share/man/man8/zpool-clear.8
usr/share/man/man8/zpool-create.8
usr/share/man/man8/zpool-ddtprune.8
usr/share/man/man8/zpool-destroy.8
usr/share/man/man8/zpool-detach.8
usr/share/man/man8/zpool-ddtprune.8
@ -113,6 +114,7 @@ usr/share/man/man8/zpool-list.8
usr/share/man/man8/zpool-offline.8
usr/share/man/man8/zpool-online.8
usr/share/man/man8/zpool-prefetch.8
usr/share/man/man8/zpool-prefetch.8
usr/share/man/man8/zpool-reguid.8
usr/share/man/man8/zpool-remove.8
usr/share/man/man8/zpool-reopen.8

View File

@ -344,7 +344,7 @@ mount_fs()
# Need the _original_ datasets mountpoint!
mountpoint=$(get_fs_value "$fs" mountpoint)
ZFS_CMD="mount -o zfsutil -t zfs"
ZFS_CMD="mount.zfs -o zfsutil"
if [ "$mountpoint" = "legacy" ] || [ "$mountpoint" = "none" ]; then
# Can't use the mountpoint property. Might be one of our
# clones. Check the 'org.zol:mountpoint' property set in
@ -359,9 +359,8 @@ mount_fs()
# isn't the root fs.
return 0
fi
# Don't use mount.zfs -o zfsutils for legacy mountpoint
if [ "$mountpoint" = "legacy" ]; then
ZFS_CMD="mount -t zfs"
ZFS_CMD="mount.zfs"
fi
# Last hail-mary: Hope 'rootmnt' is set!
mountpoint=""

View File

@ -8,6 +8,13 @@ After=systemd-remount-fs.service
Before=local-fs.target
ConditionPathIsDirectory=/sys/module/zfs
# This merely tells the service manager
# that unmounting everything undoes the
# effect of this service. No extra logic
# is ran as a result of these settings.
Conflicts=umount.target
Before=umount.target
[Service]
Type=oneshot
RemainAfterExit=yes

View File

@ -27,7 +27,7 @@
#define _LIBZUTIL_H extern __attribute__((visibility("default")))
#include <string.h>
#include <locale.h>
#include <pthread.h>
#include <sys/nvpair.h>
#include <sys/fs/zfs.h>
@ -276,7 +276,14 @@ _LIBZUTIL_H void update_vdev_config_dev_sysfs_path(nvlist_t *nv,
* Thread-safe strerror() for use in ZFS libraries
*/
static inline char *zfs_strerror(int errnum) {
return (strerror_l(errnum, uselocale(0)));
static __thread char errbuf[512];
static pthread_mutex_t zfs_strerror_lock = PTHREAD_MUTEX_INITIALIZER;
(void) pthread_mutex_lock(&zfs_strerror_lock);
(void) strlcpy(errbuf, strerror(errnum), sizeof (errbuf));
(void) pthread_mutex_unlock(&zfs_strerror_lock);
return (errbuf);
}
#ifdef __cplusplus

View File

@ -1,10 +1,5 @@
/*
* Copyright (c) 2010 Isilon Systems, Inc.
* Copyright (c) 2010 iXsystems, Inc.
* Copyright (c) 2010 Panasas, Inc.
* Copyright (c) 2013-2016 Mellanox Technologies, Ltd.
* Copyright (c) 2015 François Tigeot
* All rights reserved.
* Copyright (c) 2024 Warner Losh.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@ -26,76 +21,14 @@
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
* THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
* $FreeBSD$
*/
#ifndef _LINUX_COMPILER_H_
#define _LINUX_COMPILER_H_
#include <sys/cdefs.h>
/*
* FreeBSD's LinuxKPI compiler.h as far back as FreeBSD 12 has what we need,
* except zfs_fallthrough.
*/
#pragma once
#include <compat/linuxkpi/common/include/linux/compiler.h>
#define __user
#define __kernel
#define __safe
#define __force
#define __nocast
#define __iomem
#define __chk_user_ptr(x) ((void)0)
#define __chk_io_ptr(x) ((void)0)
#define __builtin_warning(x, y...) (1)
#define __acquires(x)
#define __releases(x)
#define __acquire(x) do { } while (0)
#define __release(x) do { } while (0)
#define __cond_lock(x, c) (c)
#define __bitwise
#define __devinitdata
#define __deprecated
#define __init
#define __initconst
#define __devinit
#define __devexit
#define __exit
#define __rcu
#define __percpu
#define __weak __weak_symbol
#define __malloc
#define ___stringify(...) #__VA_ARGS__
#define __stringify(...) ___stringify(__VA_ARGS__)
#define __attribute_const__ __attribute__((__const__))
#undef __always_inline
#define __always_inline inline
#define noinline __noinline
#define ____cacheline_aligned __aligned(CACHE_LINE_SIZE)
#define zfs_fallthrough __attribute__((__fallthrough__))
#if !defined(_KERNEL) && !defined(_STANDALONE)
#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#endif
#define typeof(x) __typeof(x)
#define uninitialized_var(x) x = x
#define __maybe_unused __unused
#define __always_unused __unused
#define __must_check __result_use_check
#define __printf(a, b) __printflike(a, b)
#define barrier() __asm__ __volatile__("": : :"memory")
#define ___PASTE(a, b) a##b
#define __PASTE(a, b) ___PASTE(a, b)
#define ACCESS_ONCE(x) (*(volatile __typeof(x) *)&(x))
#define WRITE_ONCE(x, v) do { \
barrier(); \
ACCESS_ONCE(x) = (v); \
barrier(); \
} while (0)
#define lockless_dereference(p) READ_ONCE(p)
#define _AT(T, X) ((T)(X))
#endif /* _LINUX_COMPILER_H_ */

View File

@ -70,15 +70,6 @@ hlist_del(struct hlist_node *n)
n->next->pprev = n->pprev;
}
/* BEGIN CSTYLED */
#define READ_ONCE(x) ({ \
__typeof(x) __var = ({ \
barrier(); \
ACCESS_ONCE(x); \
}); \
barrier(); \
__var; \
})
#define HLIST_HEAD_INIT { }
#define HLIST_HEAD(name) struct hlist_head name = HLIST_HEAD_INIT
#define INIT_HLIST_HEAD(head) (head)->first = NULL

View File

@ -95,10 +95,6 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
#ifndef expect
#define expect(expr, value) (__builtin_expect((expr), (value)))
#endif
#ifndef __linux__
#define likely(expr) expect((expr) != 0, 1)
#define unlikely(expr) expect((expr) != 0, 0)
#endif
#define PANIC(fmt, a...) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, fmt, ## a)
@ -109,7 +105,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
__FILE__, __FUNCTION__, __LINE__))
#define VERIFYF(cond, str, ...) do { \
if (unlikely(!cond)) \
if (unlikely(!(cond))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY(" #cond ") failed " str "\n", __VA_ARGS__);\
} while (0)
@ -205,7 +201,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
"failed (%lld " #OP " %lld) " STR "\n", \
(long long)(_verify3_left), \
(long long)(_verify3_right), \
__VA_ARGS); \
__VA_ARGS__); \
} while (0)
#define VERIFY3UF(LEFT, OP, RIGHT, STR, ...) do { \
@ -217,7 +213,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
"failed (%llu " #OP " %llu) " STR "\n", \
(unsigned long long)(_verify3_left), \
(unsigned long long)(_verify3_right), \
__VA_ARGS); \
__VA_ARGS__); \
} while (0)
#define VERIFY3PF(LEFT, OP, RIGHT, STR, ...) do { \

View File

@ -31,9 +31,9 @@
#include_next <sys/sdt.h>
#ifdef KDTRACE_HOOKS
/* BEGIN CSTYLED */
SDT_PROBE_DECLARE(sdt, , , set__error);
/* BEGIN CSTYLED */
#define SET_ERROR(err) ({ \
SDT_PROBE1(sdt, , , set__error, (uintptr_t)err); \
err; \

View File

@ -50,7 +50,7 @@
#define kfpu_fini() do {} while (0)
#endif
#define simd_stat_init() 0
#define simd_stat_fini() 0
#define simd_stat_init() do {} while (0)
#define simd_stat_fini() do {} while (0)
#endif

View File

@ -68,47 +68,30 @@ enum symfollow { NO_FOLLOW = NOFOLLOW };
#include <vm/vm_object.h>
typedef struct vop_vector vnodeops_t;
#define VOP_FID VOP_VPTOFH
#define vop_fid vop_vptofh
#define vop_fid_args vop_vptofh_args
#define a_fid a_fhp
#define rootvfs (rootvnode == NULL ? NULL : rootvnode->v_mount)
#ifndef IN_BASE
static __inline int
vn_is_readonly(vnode_t *vp)
{
return (vp->v_mount->mnt_flag & MNT_RDONLY);
}
#endif
#define vn_vfswlock(vp) (0)
#define vn_vfsunlock(vp) do { } while (0)
#define vn_ismntpt(vp) \
((vp)->v_type == VDIR && (vp)->v_mountedhere != NULL)
#define vn_mountedvfs(vp) ((vp)->v_mountedhere)
#ifndef IN_BASE
#define vn_has_cached_data(vp) \
((vp)->v_object != NULL && \
(vp)->v_object->resident_page_count > 0)
#ifndef IN_BASE
static __inline void
vn_flush_cached_data(vnode_t *vp, boolean_t sync)
{
if (vm_object_mightbedirty(vp->v_object)) {
int flags = sync ? OBJPC_SYNC : 0;
vn_lock(vp, LK_SHARED | LK_RETRY);
zfs_vmobject_wlock(vp->v_object);
vm_object_page_clean(vp->v_object, 0, 0, flags);
zfs_vmobject_wunlock(vp->v_object);
VOP_UNLOCK(vp);
}
}
#endif
#define vn_exists(vp) do { } while (0)
#define vn_invalid(vp) do { } while (0)
#define vn_free(vp) do { } while (0)
#define vn_matchops(vp, vops) ((vp)->v_op == &(vops))
#define VN_HOLD(v) vref(v)
@ -123,9 +106,6 @@ vn_flush_cached_data(vnode_t *vp, boolean_t sync)
#define vnevent_rename_dest(vp, dvp, name, ct) do { } while (0)
#define vnevent_rename_dest_dir(vp, ct) do { } while (0)
#define specvp(vp, rdev, type, cr) (VN_HOLD(vp), (vp))
#define MANDLOCK(vp, mode) (0)
/*
* We will use va_spare is place of Solaris' va_mask.
* This field is initialized in zfs_setattr().

View File

@ -26,8 +26,10 @@
#ifndef _ABD_OS_H
#define _ABD_OS_H
#ifdef _KERNEL
#include <sys/vm.h>
#include <vm/vm_page.h>
#endif
#ifdef __cplusplus
extern "C" {
@ -47,8 +49,10 @@ struct abd_linear {
#endif
};
#ifdef _KERNEL
__attribute__((malloc))
struct abd *abd_alloc_from_pages(vm_page_t *, unsigned long, uint64_t);
#endif
#ifdef __cplusplus
}

View File

@ -109,7 +109,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
__FILE__, __FUNCTION__, __LINE__))
#define VERIFYF(cond, str, ...) do { \
if (unlikely(!cond)) \
if (unlikely(!(cond))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY(" #cond ") failed " str "\n", __VA_ARGS__);\
} while (0)
@ -205,7 +205,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
"failed (%lld " #OP " %lld) " STR "\n", \
(long long)(_verify3_left), \
(long long)(_verify3_right), \
__VA_ARGS); \
__VA_ARGS__); \
} while (0)
#define VERIFY3UF(LEFT, OP, RIGHT, STR, ...) do { \
@ -217,7 +217,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
"failed (%llu " #OP " %llu) " STR "\n", \
(unsigned long long)(_verify3_left), \
(unsigned long long)(_verify3_right), \
__VA_ARGS); \
__VA_ARGS__); \
} while (0)
#define VERIFY3PF(LEFT, OP, RIGHT, STR, ...) do { \

View File

@ -38,8 +38,7 @@
#include <sys/rwlock.h>
#include <sys/wait.h>
#include <sys/wmsum.h>
typedef struct kstat_s kstat_t;
#include <sys/kstat.h>
#define TASKQ_NAMELEN 31

View File

@ -42,7 +42,7 @@
#define TS_ZOMB EXIT_ZOMBIE
#define TS_STOPPED TASK_STOPPED
typedef void (*thread_func_t)(void *);
typedef void (*thread_func_t)(void *) __attribute__((noreturn));
#define thread_create_named(name, stk, stksize, func, arg, len, \
pp, state, pri) \

View File

@ -40,7 +40,7 @@
*/
#define UIO_DIRECT 0x0001 /* Direct I/O request */
#if defined(HAVE_VFS_IOV_ITER) && defined(HAVE_FAULT_IN_IOV_ITER_READABLE)
#if defined(HAVE_FAULT_IN_IOV_ITER_READABLE)
#define iov_iter_fault_in_readable(a, b) fault_in_iov_iter_readable(a, b)
#endif
@ -52,12 +52,9 @@ typedef enum zfs_uio_rw {
} zfs_uio_rw_t;
typedef enum zfs_uio_seg {
UIO_USERSPACE = 0,
UIO_SYSSPACE = 1,
UIO_BVEC = 2,
#if defined(HAVE_VFS_IOV_ITER)
UIO_ITER = 3,
#endif
UIO_SYSSPACE = 0,
UIO_BVEC = 1,
UIO_ITER = 2,
} zfs_uio_seg_t;
/*
@ -72,9 +69,7 @@ typedef struct zfs_uio {
union {
const struct iovec *uio_iov;
const struct bio_vec *uio_bvec;
#if defined(HAVE_VFS_IOV_ITER)
struct iov_iter *uio_iter;
#endif
};
int uio_iovcnt; /* Number of iovecs */
offset_t uio_soffset; /* Starting logical offset */
@ -129,7 +124,7 @@ zfs_uio_iovec_init(zfs_uio_t *uio, const struct iovec *iov,
unsigned long nr_segs, offset_t offset, zfs_uio_seg_t seg, ssize_t resid,
size_t skip)
{
ASSERT(seg == UIO_USERSPACE || seg == UIO_SYSSPACE);
ASSERT(seg == UIO_SYSSPACE);
uio->uio_iov = iov;
uio->uio_iovcnt = nr_segs;
@ -175,7 +170,6 @@ zfs_uio_bvec_init(zfs_uio_t *uio, struct bio *bio, struct request *rq)
memset(&uio->uio_dio, 0, sizeof (zfs_uio_dio_t));
}
#if defined(HAVE_VFS_IOV_ITER)
static inline void
zfs_uio_iov_iter_init(zfs_uio_t *uio, struct iov_iter *iter, offset_t offset,
ssize_t resid, size_t skip)
@ -192,7 +186,6 @@ zfs_uio_iov_iter_init(zfs_uio_t *uio, struct iov_iter *iter, offset_t offset,
uio->uio_soffset = uio->uio_loffset;
memset(&uio->uio_dio, 0, sizeof (zfs_uio_dio_t));
}
#endif /* HAVE_VFS_IOV_ITER */
#if defined(HAVE_ITER_IOV)
#define zfs_uio_iter_iov(iter) iter_iov((iter))

View File

@ -30,6 +30,8 @@
extern "C" {
#endif
struct abd;
struct abd_scatter {
uint_t abd_offset;
uint_t abd_nents;
@ -41,10 +43,8 @@ struct abd_linear {
struct scatterlist *abd_sgl; /* for LINEAR_PAGE */
};
typedef struct abd abd_t;
typedef int abd_iter_page_func_t(struct page *, size_t, size_t, void *);
int abd_iterate_page_func(abd_t *, size_t, size_t, abd_iter_page_func_t *,
int abd_iterate_page_func(struct abd *, size_t, size_t, abd_iter_page_func_t *,
void *);
/*
@ -52,11 +52,11 @@ int abd_iterate_page_func(abd_t *, size_t, size_t, abd_iter_page_func_t *,
* Note: these are only needed to support vdev_classic. See comment in
* vdev_disk.c.
*/
unsigned int abd_bio_map_off(struct bio *, abd_t *, unsigned int, size_t);
unsigned long abd_nr_pages_off(abd_t *, unsigned int, size_t);
unsigned int abd_bio_map_off(struct bio *, struct abd *, unsigned int, size_t);
unsigned long abd_nr_pages_off(struct abd *, unsigned int, size_t);
__attribute__((malloc))
abd_t *abd_alloc_from_pages(struct page **, unsigned long, uint64_t);
struct abd *abd_alloc_from_pages(struct page **, unsigned long, uint64_t);
#ifdef __cplusplus
}

View File

@ -69,6 +69,7 @@ typedef struct vfs {
boolean_t vfs_do_relatime;
boolean_t vfs_nbmand;
boolean_t vfs_do_nbmand;
kmutex_t vfs_mntpt_lock;
} vfs_t;
typedef struct zfs_mnt {

View File

@ -347,6 +347,7 @@ void l2arc_fini(void);
void l2arc_start(void);
void l2arc_stop(void);
void l2arc_spa_rebuild_start(spa_t *spa);
void l2arc_spa_rebuild_stop(spa_t *spa);
#ifndef _KERNEL
extern boolean_t arc_watch;

View File

@ -942,6 +942,7 @@ typedef struct arc_sums {
wmsum_t arcstat_evict_l2_eligible_mru;
wmsum_t arcstat_evict_l2_ineligible;
wmsum_t arcstat_evict_l2_skip;
wmsum_t arcstat_hash_elements;
wmsum_t arcstat_hash_collisions;
wmsum_t arcstat_hash_chains;
aggsum_t arcstat_size;

View File

@ -86,28 +86,38 @@ typedef struct brt_vdev_phys {
uint64_t bvp_savedspace;
} brt_vdev_phys_t;
typedef struct brt_vdev {
struct brt_vdev {
/*
* Pending changes from open contexts.
*/
kmutex_t bv_pending_lock;
avl_tree_t bv_pending_tree[TXG_SIZE];
/*
* Protects bv_mos_*.
*/
krwlock_t bv_mos_entries_lock ____cacheline_aligned;
/*
* Protects all the fields starting from bv_initiated.
*/
krwlock_t bv_lock ____cacheline_aligned;
/*
* VDEV id.
*/
uint64_t bv_vdevid;
/*
* Is the structure initiated?
* (bv_entcount and bv_bitmap are allocated?)
*/
boolean_t bv_initiated;
uint64_t bv_vdevid ____cacheline_aligned;
/*
* Object number in the MOS for the entcount array and brt_vdev_phys.
*/
uint64_t bv_mos_brtvdev;
/*
* Object number in the MOS for the entries table.
* Object number in the MOS and dnode for the entries table.
*/
uint64_t bv_mos_entries;
dnode_t *bv_mos_entries_dnode;
/*
* Entries to sync.
* Is the structure initiated?
* (bv_entcount and bv_bitmap are allocated?)
*/
avl_tree_t bv_tree;
boolean_t bv_initiated;
/*
* Does the bv_entcount[] array needs byte swapping?
*/
@ -120,6 +130,26 @@ typedef struct brt_vdev {
* This is the array with BRT entry count per BRT_RANGESIZE.
*/
uint16_t *bv_entcount;
/*
* bv_entcount[] potentially can be a bit too big to sychronize it all
* when we just changed few entcounts. The fields below allow us to
* track updates to bv_entcount[] array since the last sync.
* A single bit in the bv_bitmap represents as many entcounts as can
* fit into a single BRT_BLOCKSIZE.
* For example we have 65536 entcounts in the bv_entcount array
* (so the whole array is 128kB). We updated bv_entcount[2] and
* bv_entcount[5]. In that case only first bit in the bv_bitmap will
* be set and we will write only first BRT_BLOCKSIZE out of 128kB.
*/
ulong_t *bv_bitmap;
/*
* bv_entcount[] needs updating on disk.
*/
boolean_t bv_entcount_dirty;
/*
* brt_vdev_phys needs updating on disk.
*/
boolean_t bv_meta_dirty;
/*
* Sum of all bv_entcount[]s.
*/
@ -133,65 +163,27 @@ typedef struct brt_vdev {
*/
uint64_t bv_savedspace;
/*
* brt_vdev_phys needs updating on disk.
* Entries to sync.
*/
boolean_t bv_meta_dirty;
/*
* bv_entcount[] needs updating on disk.
*/
boolean_t bv_entcount_dirty;
/*
* bv_entcount[] potentially can be a bit too big to sychronize it all
* when we just changed few entcounts. The fields below allow us to
* track updates to bv_entcount[] array since the last sync.
* A single bit in the bv_bitmap represents as many entcounts as can
* fit into a single BRT_BLOCKSIZE.
* For example we have 65536 entcounts in the bv_entcount array
* (so the whole array is 128kB). We updated bv_entcount[2] and
* bv_entcount[5]. In that case only first bit in the bv_bitmap will
* be set and we will write only first BRT_BLOCKSIZE out of 128kB.
*/
ulong_t *bv_bitmap;
uint64_t bv_nblocks;
} brt_vdev_t;
avl_tree_t bv_tree;
};
/*
* In-core brt
*/
typedef struct brt {
krwlock_t brt_lock;
spa_t *brt_spa;
#define brt_mos brt_spa->spa_meta_objset
uint64_t brt_rangesize;
uint64_t brt_usedspace;
uint64_t brt_savedspace;
avl_tree_t brt_pending_tree[TXG_SIZE];
kmutex_t brt_pending_lock[TXG_SIZE];
/* Sum of all entries across all bv_trees. */
uint64_t brt_nentries;
brt_vdev_t *brt_vdevs;
uint64_t brt_nvdevs;
} brt_t;
/* Size of bre_offset / sizeof (uint64_t). */
/* Size of offset / sizeof (uint64_t). */
#define BRT_KEY_WORDS (1)
#define BRE_OFFSET(bre) (DVA_GET_OFFSET(&(bre)->bre_bp.blk_dva[0]))
/*
* In-core brt entry.
* On-disk we use bre_offset as the key and bre_refcount as the value.
* On-disk we use ZAP with offset as the key and count as the value.
*/
typedef struct brt_entry {
uint64_t bre_offset;
uint64_t bre_refcount;
avl_node_t bre_node;
blkptr_t bre_bp;
uint64_t bre_count;
uint64_t bre_pcount;
} brt_entry_t;
typedef struct brt_pending_entry {
blkptr_t bpe_bp;
int bpe_count;
avl_node_t bpe_node;
} brt_pending_entry_t;
#ifdef __cplusplus
}
#endif

View File

@ -171,7 +171,6 @@ typedef struct dbuf_dirty_record {
* gets COW'd in a subsequent transaction group.
*/
arc_buf_t *dr_data;
blkptr_t dr_overridden_by;
override_states_t dr_override_state;
uint8_t dr_copies;
boolean_t dr_nopwrite;
@ -179,14 +178,21 @@ typedef struct dbuf_dirty_record {
boolean_t dr_diowrite;
boolean_t dr_has_raw_params;
/*
* If dr_has_raw_params is set, the following crypt
* params will be set on the BP that's written.
*/
boolean_t dr_byteorder;
uint8_t dr_salt[ZIO_DATA_SALT_LEN];
uint8_t dr_iv[ZIO_DATA_IV_LEN];
uint8_t dr_mac[ZIO_DATA_MAC_LEN];
/* Override and raw params are mutually exclusive. */
union {
blkptr_t dr_overridden_by;
struct {
/*
* If dr_has_raw_params is set, the
* following crypt params will be set
* on the BP that's written.
*/
boolean_t dr_byteorder;
uint8_t dr_salt[ZIO_DATA_SALT_LEN];
uint8_t dr_iv[ZIO_DATA_IV_LEN];
uint8_t dr_mac[ZIO_DATA_MAC_LEN];
};
};
} dl;
struct dirty_lightweight_leaf {
/*
@ -264,6 +270,27 @@ typedef struct dmu_buf_impl {
*/
uint8_t db_level;
/* This block was freed while a read or write was active. */
uint8_t db_freed_in_flight;
/*
* Evict user data as soon as the dirty and reference counts are equal.
*/
uint8_t db_user_immediate_evict;
/*
* dnode_evict_dbufs() or dnode_evict_bonus() tried to evict this dbuf,
* but couldn't due to outstanding references. Evict once the refcount
* drops to 0.
*/
uint8_t db_pending_evict;
/* Number of TXGs in which this buffer is dirty. */
uint8_t db_dirtycnt;
/* The buffer was partially read. More reads may follow. */
uint8_t db_partial_read;
/*
* Protects db_buf's contents if they contain an indirect block or data
* block of the meta-dnode. We use this lock to protect the structure of
@ -288,6 +315,9 @@ typedef struct dmu_buf_impl {
*/
dbuf_states_t db_state;
/* In which dbuf cache this dbuf is, if any. */
dbuf_cached_state_t db_caching_status;
/*
* Refcount accessed by dmu_buf_{hold,rele}.
* If nonzero, the buffer can't be destroyed.
@ -304,39 +334,10 @@ typedef struct dmu_buf_impl {
/* Link in dbuf_cache or dbuf_metadata_cache */
multilist_node_t db_cache_link;
/* Tells us which dbuf cache this dbuf is in, if any */
dbuf_cached_state_t db_caching_status;
uint64_t db_hash;
/* Data which is unique to data (leaf) blocks: */
/* User callback information. */
dmu_buf_user_t *db_user;
/*
* Evict user data as soon as the dirty and reference
* counts are equal.
*/
uint8_t db_user_immediate_evict;
/*
* This block was freed while a read or write was
* active.
*/
uint8_t db_freed_in_flight;
/*
* dnode_evict_dbufs() or dnode_evict_bonus() tried to
* evict this dbuf, but couldn't due to outstanding
* references. Evict once the refcount drops to 0.
*/
uint8_t db_pending_evict;
uint8_t db_dirtycnt;
/* The buffer was partially read. More reads may follow. */
uint8_t db_partial_read;
} dmu_buf_impl_t;
#define DBUF_HASH_MUTEX(h, idx) \
@ -351,6 +352,8 @@ typedef struct dbuf_hash_table {
typedef void (*dbuf_prefetch_fn)(void *, uint64_t, uint64_t, boolean_t);
extern kmem_cache_t *dbuf_dirty_kmem_cache;
uint64_t dbuf_whichblock(const struct dnode *di, const int64_t level,
const uint64_t offset);

View File

@ -381,6 +381,7 @@ typedef struct dmu_buf {
#define DMU_POOL_CREATION_VERSION "creation_version"
#define DMU_POOL_SCAN "scan"
#define DMU_POOL_ERRORSCRUB "error_scrub"
#define DMU_POOL_LAST_SCRUBBED_TXG "last_scrubbed_txg"
#define DMU_POOL_FREE_BPOBJ "free_bpobj"
#define DMU_POOL_BPTREE_OBJ "bptree_obj"
#define DMU_POOL_EMPTY_BPOBJ "empty_bpobj"

View File

@ -89,7 +89,7 @@ extern int zfs_livelist_min_percent_shared;
typedef int deadlist_iter_t(void *args, dsl_deadlist_entry_t *dle);
void dsl_deadlist_open(dsl_deadlist_t *dl, objset_t *os, uint64_t object);
int dsl_deadlist_open(dsl_deadlist_t *dl, objset_t *os, uint64_t object);
void dsl_deadlist_close(dsl_deadlist_t *dl);
void dsl_deadlist_iterate(dsl_deadlist_t *dl, deadlist_iter_t func, void *arg);
uint64_t dsl_deadlist_alloc(objset_t *os, dmu_tx_t *tx);

View File

@ -198,7 +198,7 @@ void dsl_dir_set_reservation_sync_impl(dsl_dir_t *dd, uint64_t value,
dmu_tx_t *tx);
void dsl_dir_zapify(dsl_dir_t *dd, dmu_tx_t *tx);
boolean_t dsl_dir_is_zapified(dsl_dir_t *dd);
void dsl_dir_livelist_open(dsl_dir_t *dd, uint64_t obj);
int dsl_dir_livelist_open(dsl_dir_t *dd, uint64_t obj);
void dsl_dir_livelist_close(dsl_dir_t *dd);
void dsl_dir_remove_livelist(dsl_dir_t *dd, dmu_tx_t *tx, boolean_t total);
int dsl_dir_wait(dsl_dir_t *dd, dsl_dataset_t *ds, zfs_wait_activity_t activity,

View File

@ -179,6 +179,12 @@ typedef struct dsl_scan {
dsl_errorscrub_phys_t errorscrub_phys;
} dsl_scan_t;
typedef struct {
pool_scan_func_t func;
uint64_t txgstart;
uint64_t txgend;
} setup_sync_arg_t;
typedef struct dsl_scan_io_queue dsl_scan_io_queue_t;
void scan_init(void);
@ -189,7 +195,8 @@ void dsl_scan_setup_sync(void *, dmu_tx_t *);
void dsl_scan_fini(struct dsl_pool *dp);
void dsl_scan_sync(struct dsl_pool *, dmu_tx_t *);
int dsl_scan_cancel(struct dsl_pool *);
int dsl_scan(struct dsl_pool *, pool_scan_func_t);
int dsl_scan(struct dsl_pool *, pool_scan_func_t, uint64_t starttxg,
uint64_t txgend);
void dsl_scan_assess_vdev(struct dsl_pool *dp, vdev_t *vd);
boolean_t dsl_scan_scrubbing(const struct dsl_pool *dp);
boolean_t dsl_errorscrubbing(const struct dsl_pool *dp);

View File

@ -42,7 +42,8 @@ extern "C" {
#define FM_EREPORT_ZFS_DATA "data"
#define FM_EREPORT_ZFS_DELAY "delay"
#define FM_EREPORT_ZFS_DEADMAN "deadman"
#define FM_EREPORT_ZFS_DIO_VERIFY "dio_verify"
#define FM_EREPORT_ZFS_DIO_VERIFY_WR "dio_verify_wr"
#define FM_EREPORT_ZFS_DIO_VERIFY_RD "dio_verify_rd"
#define FM_EREPORT_ZFS_POOL "zpool"
#define FM_EREPORT_ZFS_DEVICE_UNKNOWN "vdev.unknown"
#define FM_EREPORT_ZFS_DEVICE_OPEN_FAILED "vdev.open_failed"

View File

@ -265,6 +265,7 @@ typedef enum {
ZPOOL_PROP_DEDUP_TABLE_SIZE,
ZPOOL_PROP_DEDUP_TABLE_QUOTA,
ZPOOL_PROP_DEDUPCACHED,
ZPOOL_PROP_LAST_SCRUBBED_TXG,
ZPOOL_NUM_PROPS
} zpool_prop_t;
@ -1088,6 +1089,7 @@ typedef enum pool_scan_func {
typedef enum pool_scrub_cmd {
POOL_SCRUB_NORMAL = 0,
POOL_SCRUB_PAUSE,
POOL_SCRUB_FROM_LAST_TXG,
POOL_SCRUB_FLAGS_END
} pool_scrub_cmd_t;

View File

@ -53,6 +53,7 @@ extern "C" {
/*
* Forward references that lots of things need.
*/
typedef struct brt_vdev brt_vdev_t;
typedef struct spa spa_t;
typedef struct vdev vdev_t;
typedef struct metaslab metaslab_t;
@ -821,6 +822,8 @@ extern void spa_l2cache_drop(spa_t *spa);
/* scanning */
extern int spa_scan(spa_t *spa, pool_scan_func_t func);
extern int spa_scan_range(spa_t *spa, pool_scan_func_t func, uint64_t txgstart,
uint64_t txgend);
extern int spa_scan_stop(spa_t *spa);
extern int spa_scrub_pause_resume(spa_t *spa, pool_scrub_cmd_t flag);
@ -1079,6 +1082,7 @@ extern uint64_t spa_get_deadman_failmode(spa_t *spa);
extern void spa_set_deadman_failmode(spa_t *spa, const char *failmode);
extern boolean_t spa_suspended(spa_t *spa);
extern uint64_t spa_bootfs(spa_t *spa);
extern uint64_t spa_get_last_scrubbed_txg(spa_t *spa);
extern uint64_t spa_delegation(spa_t *spa);
extern objset_t *spa_meta_objset(spa_t *spa);
extern space_map_t *spa_syncing_log_sm(spa_t *spa);

View File

@ -318,6 +318,7 @@ struct spa {
uint64_t spa_scan_pass_scrub_spent_paused; /* total paused */
uint64_t spa_scan_pass_exam; /* examined bytes per pass */
uint64_t spa_scan_pass_issued; /* issued bytes per pass */
uint64_t spa_scrubbed_last_txg; /* last txg scrubbed */
/* error scrub pause time in milliseconds */
uint64_t spa_scan_pass_errorscrub_pause;
@ -412,8 +413,12 @@ struct spa {
uint64_t spa_dedup_dspace; /* Cache get_dedup_dspace() */
uint64_t spa_dedup_checksum; /* default dedup checksum */
uint64_t spa_dspace; /* dspace in normal class */
uint64_t spa_rdspace; /* raw (non-dedup) --//-- */
boolean_t spa_active_ddt_prune; /* ddt prune process active */
struct brt *spa_brt; /* in-core BRT */
brt_vdev_t **spa_brt_vdevs; /* array of per-vdev BRTs */
uint64_t spa_brt_nvdevs; /* number of vdevs in BRT */
uint64_t spa_brt_rangesize; /* pool's BRT range size */
krwlock_t spa_brt_lock; /* Protects brt_vdevs/nvdevs */
kmutex_t spa_vdev_top_lock; /* dueling offline/remove */
kmutex_t spa_proc_lock; /* protects spa_proc* */
kcondvar_t spa_proc_cv; /* spa_proc_state transitions */

View File

@ -57,7 +57,7 @@ void vdev_raidz_reconstruct(struct raidz_map *, const int *, int);
void vdev_raidz_child_done(zio_t *);
void vdev_raidz_io_done(zio_t *);
void vdev_raidz_checksum_error(zio_t *, struct raidz_col *, abd_t *);
struct raidz_row *vdev_raidz_row_alloc(int);
struct raidz_row *vdev_raidz_row_alloc(int, zio_t *);
void vdev_raidz_reflow_copy_scratch(spa_t *);
void raidz_dtl_reassessed(vdev_t *);

View File

@ -223,11 +223,15 @@ int zap_lookup_norm(objset_t *ds, uint64_t zapobj, const char *name,
boolean_t *normalization_conflictp);
int zap_lookup_uint64(objset_t *os, uint64_t zapobj, const uint64_t *key,
int key_numints, uint64_t integer_size, uint64_t num_integers, void *buf);
int zap_lookup_uint64_by_dnode(dnode_t *dn, const uint64_t *key,
int key_numints, uint64_t integer_size, uint64_t num_integers, void *buf);
int zap_contains(objset_t *ds, uint64_t zapobj, const char *name);
int zap_prefetch(objset_t *os, uint64_t zapobj, const char *name);
int zap_prefetch_object(objset_t *os, uint64_t zapobj);
int zap_prefetch_uint64(objset_t *os, uint64_t zapobj, const uint64_t *key,
int key_numints);
int zap_prefetch_uint64_by_dnode(dnode_t *dn, const uint64_t *key,
int key_numints);
int zap_lookup_by_dnode(dnode_t *dn, const char *name,
uint64_t integer_size, uint64_t num_integers, void *buf);
@ -236,9 +240,6 @@ int zap_lookup_norm_by_dnode(dnode_t *dn, const char *name,
matchtype_t mt, char *realname, int rn_len,
boolean_t *ncp);
int zap_count_write_by_dnode(dnode_t *dn, const char *name,
int add, zfs_refcount_t *towrite, zfs_refcount_t *tooverwrite);
/*
* Create an attribute with the given name and value.
*

View File

@ -208,25 +208,25 @@ typedef uint64_t zio_flag_t;
#define ZIO_FLAG_PROBE (1ULL << 16)
#define ZIO_FLAG_TRYHARD (1ULL << 17)
#define ZIO_FLAG_OPTIONAL (1ULL << 18)
#define ZIO_FLAG_DIO_READ (1ULL << 19)
#define ZIO_FLAG_VDEV_INHERIT (ZIO_FLAG_DONT_QUEUE - 1)
/*
* Flags not inherited by any children.
*/
#define ZIO_FLAG_DONT_QUEUE (1ULL << 19) /* must be first for INHERIT */
#define ZIO_FLAG_DONT_PROPAGATE (1ULL << 20)
#define ZIO_FLAG_IO_BYPASS (1ULL << 21)
#define ZIO_FLAG_IO_REWRITE (1ULL << 22)
#define ZIO_FLAG_RAW_COMPRESS (1ULL << 23)
#define ZIO_FLAG_RAW_ENCRYPT (1ULL << 24)
#define ZIO_FLAG_GANG_CHILD (1ULL << 25)
#define ZIO_FLAG_DDT_CHILD (1ULL << 26)
#define ZIO_FLAG_GODFATHER (1ULL << 27)
#define ZIO_FLAG_NOPWRITE (1ULL << 28)
#define ZIO_FLAG_REEXECUTED (1ULL << 29)
#define ZIO_FLAG_DELEGATED (1ULL << 30)
#define ZIO_FLAG_DIO_CHKSUM_ERR (1ULL << 31)
#define ZIO_FLAG_DONT_QUEUE (1ULL << 20) /* must be first for INHERIT */
#define ZIO_FLAG_DONT_PROPAGATE (1ULL << 21)
#define ZIO_FLAG_IO_BYPASS (1ULL << 22)
#define ZIO_FLAG_IO_REWRITE (1ULL << 23)
#define ZIO_FLAG_RAW_COMPRESS (1ULL << 24)
#define ZIO_FLAG_RAW_ENCRYPT (1ULL << 25)
#define ZIO_FLAG_GANG_CHILD (1ULL << 26)
#define ZIO_FLAG_DDT_CHILD (1ULL << 27)
#define ZIO_FLAG_GODFATHER (1ULL << 28)
#define ZIO_FLAG_NOPWRITE (1ULL << 29)
#define ZIO_FLAG_REEXECUTED (1ULL << 30)
#define ZIO_FLAG_DELEGATED (1ULL << 31)
#define ZIO_FLAG_DIO_CHKSUM_ERR (1ULL << 32)
#define ZIO_ALLOCATOR_NONE (-1)
#define ZIO_HAS_ALLOCATOR(zio) ((zio)->io_allocator != ZIO_ALLOCATOR_NONE)
@ -647,6 +647,7 @@ extern void zio_vdev_io_redone(zio_t *zio);
extern void zio_change_priority(zio_t *pio, zio_priority_t priority);
extern void zio_checksum_verified(zio_t *zio);
extern void zio_dio_chksum_verify_error_report(zio_t *zio);
extern int zio_worst_error(int e1, int e2);
extern enum zio_checksum zio_checksum_select(enum zio_checksum child,

View File

@ -88,6 +88,11 @@ int zvol_get_data(void *arg, uint64_t arg2, lr_write_t *lr, char *buf,
int zvol_init_impl(void);
void zvol_fini_impl(void);
void zvol_wait_close(zvol_state_t *zv);
int zvol_clone_range(zvol_state_handle_t *, uint64_t,
zvol_state_handle_t *, uint64_t, uint64_t);
void zvol_log_clone_range(zilog_t *zilog, dmu_tx_t *tx, int txtype,
uint64_t off, uint64_t len, uint64_t blksz, const blkptr_t *bps,
size_t nbps);
/*
* platform dependent functions exported to platform independent code

View File

@ -35,7 +35,6 @@
(void) __atomic_add_fetch(target, 1, __ATOMIC_SEQ_CST); \
}
/* BEGIN CSTYLED */
ATOMIC_INC(8, uint8_t)
ATOMIC_INC(16, uint16_t)
ATOMIC_INC(32, uint32_t)
@ -44,7 +43,6 @@ ATOMIC_INC(uchar, uchar_t)
ATOMIC_INC(ushort, ushort_t)
ATOMIC_INC(uint, uint_t)
ATOMIC_INC(ulong, ulong_t)
/* END CSTYLED */
#define ATOMIC_DEC(name, type) \
@ -53,7 +51,6 @@ ATOMIC_INC(ulong, ulong_t)
(void) __atomic_sub_fetch(target, 1, __ATOMIC_SEQ_CST); \
}
/* BEGIN CSTYLED */
ATOMIC_DEC(8, uint8_t)
ATOMIC_DEC(16, uint16_t)
ATOMIC_DEC(32, uint32_t)
@ -62,7 +59,6 @@ ATOMIC_DEC(uchar, uchar_t)
ATOMIC_DEC(ushort, ushort_t)
ATOMIC_DEC(uint, uint_t)
ATOMIC_DEC(ulong, ulong_t)
/* END CSTYLED */
#define ATOMIC_ADD(name, type1, type2) \
@ -77,7 +73,6 @@ atomic_add_ptr(volatile void *target, ssize_t bits)
(void) __atomic_add_fetch((void **)target, bits, __ATOMIC_SEQ_CST);
}
/* BEGIN CSTYLED */
ATOMIC_ADD(8, uint8_t, int8_t)
ATOMIC_ADD(16, uint16_t, int16_t)
ATOMIC_ADD(32, uint32_t, int32_t)
@ -86,7 +81,6 @@ ATOMIC_ADD(char, uchar_t, signed char)
ATOMIC_ADD(short, ushort_t, short)
ATOMIC_ADD(int, uint_t, int)
ATOMIC_ADD(long, ulong_t, long)
/* END CSTYLED */
#define ATOMIC_SUB(name, type1, type2) \
@ -101,7 +95,6 @@ atomic_sub_ptr(volatile void *target, ssize_t bits)
(void) __atomic_sub_fetch((void **)target, bits, __ATOMIC_SEQ_CST);
}
/* BEGIN CSTYLED */
ATOMIC_SUB(8, uint8_t, int8_t)
ATOMIC_SUB(16, uint16_t, int16_t)
ATOMIC_SUB(32, uint32_t, int32_t)
@ -110,7 +103,6 @@ ATOMIC_SUB(char, uchar_t, signed char)
ATOMIC_SUB(short, ushort_t, short)
ATOMIC_SUB(int, uint_t, int)
ATOMIC_SUB(long, ulong_t, long)
/* END CSTYLED */
#define ATOMIC_OR(name, type) \
@ -119,7 +111,6 @@ ATOMIC_SUB(long, ulong_t, long)
(void) __atomic_or_fetch(target, bits, __ATOMIC_SEQ_CST); \
}
/* BEGIN CSTYLED */
ATOMIC_OR(8, uint8_t)
ATOMIC_OR(16, uint16_t)
ATOMIC_OR(32, uint32_t)
@ -128,7 +119,6 @@ ATOMIC_OR(uchar, uchar_t)
ATOMIC_OR(ushort, ushort_t)
ATOMIC_OR(uint, uint_t)
ATOMIC_OR(ulong, ulong_t)
/* END CSTYLED */
#define ATOMIC_AND(name, type) \
@ -137,7 +127,6 @@ ATOMIC_OR(ulong, ulong_t)
(void) __atomic_and_fetch(target, bits, __ATOMIC_SEQ_CST); \
}
/* BEGIN CSTYLED */
ATOMIC_AND(8, uint8_t)
ATOMIC_AND(16, uint16_t)
ATOMIC_AND(32, uint32_t)
@ -146,7 +135,6 @@ ATOMIC_AND(uchar, uchar_t)
ATOMIC_AND(ushort, ushort_t)
ATOMIC_AND(uint, uint_t)
ATOMIC_AND(ulong, ulong_t)
/* END CSTYLED */
/*
@ -159,7 +147,6 @@ ATOMIC_AND(ulong, ulong_t)
return (__atomic_add_fetch(target, 1, __ATOMIC_SEQ_CST)); \
}
/* BEGIN CSTYLED */
ATOMIC_INC_NV(8, uint8_t)
ATOMIC_INC_NV(16, uint16_t)
ATOMIC_INC_NV(32, uint32_t)
@ -168,7 +155,6 @@ ATOMIC_INC_NV(uchar, uchar_t)
ATOMIC_INC_NV(ushort, ushort_t)
ATOMIC_INC_NV(uint, uint_t)
ATOMIC_INC_NV(ulong, ulong_t)
/* END CSTYLED */
#define ATOMIC_DEC_NV(name, type) \
@ -177,7 +163,6 @@ ATOMIC_INC_NV(ulong, ulong_t)
return (__atomic_sub_fetch(target, 1, __ATOMIC_SEQ_CST)); \
}
/* BEGIN CSTYLED */
ATOMIC_DEC_NV(8, uint8_t)
ATOMIC_DEC_NV(16, uint16_t)
ATOMIC_DEC_NV(32, uint32_t)
@ -186,7 +171,6 @@ ATOMIC_DEC_NV(uchar, uchar_t)
ATOMIC_DEC_NV(ushort, ushort_t)
ATOMIC_DEC_NV(uint, uint_t)
ATOMIC_DEC_NV(ulong, ulong_t)
/* END CSTYLED */
#define ATOMIC_ADD_NV(name, type1, type2) \
@ -201,7 +185,6 @@ atomic_add_ptr_nv(volatile void *target, ssize_t bits)
return (__atomic_add_fetch((void **)target, bits, __ATOMIC_SEQ_CST));
}
/* BEGIN CSTYLED */
ATOMIC_ADD_NV(8, uint8_t, int8_t)
ATOMIC_ADD_NV(16, uint16_t, int16_t)
ATOMIC_ADD_NV(32, uint32_t, int32_t)
@ -210,7 +193,6 @@ ATOMIC_ADD_NV(char, uchar_t, signed char)
ATOMIC_ADD_NV(short, ushort_t, short)
ATOMIC_ADD_NV(int, uint_t, int)
ATOMIC_ADD_NV(long, ulong_t, long)
/* END CSTYLED */
#define ATOMIC_SUB_NV(name, type1, type2) \
@ -225,7 +207,6 @@ atomic_sub_ptr_nv(volatile void *target, ssize_t bits)
return (__atomic_sub_fetch((void **)target, bits, __ATOMIC_SEQ_CST));
}
/* BEGIN CSTYLED */
ATOMIC_SUB_NV(8, uint8_t, int8_t)
ATOMIC_SUB_NV(char, uchar_t, signed char)
ATOMIC_SUB_NV(16, uint16_t, int16_t)
@ -234,7 +215,6 @@ ATOMIC_SUB_NV(32, uint32_t, int32_t)
ATOMIC_SUB_NV(int, uint_t, int)
ATOMIC_SUB_NV(long, ulong_t, long)
ATOMIC_SUB_NV(64, uint64_t, int64_t)
/* END CSTYLED */
#define ATOMIC_OR_NV(name, type) \
@ -243,7 +223,6 @@ ATOMIC_SUB_NV(64, uint64_t, int64_t)
return (__atomic_or_fetch(target, bits, __ATOMIC_SEQ_CST)); \
}
/* BEGIN CSTYLED */
ATOMIC_OR_NV(8, uint8_t)
ATOMIC_OR_NV(16, uint16_t)
ATOMIC_OR_NV(32, uint32_t)
@ -252,7 +231,6 @@ ATOMIC_OR_NV(uchar, uchar_t)
ATOMIC_OR_NV(ushort, ushort_t)
ATOMIC_OR_NV(uint, uint_t)
ATOMIC_OR_NV(ulong, ulong_t)
/* END CSTYLED */
#define ATOMIC_AND_NV(name, type) \
@ -261,7 +239,6 @@ ATOMIC_OR_NV(ulong, ulong_t)
return (__atomic_and_fetch(target, bits, __ATOMIC_SEQ_CST)); \
}
/* BEGIN CSTYLED */
ATOMIC_AND_NV(8, uint8_t)
ATOMIC_AND_NV(16, uint16_t)
ATOMIC_AND_NV(32, uint32_t)
@ -270,7 +247,6 @@ ATOMIC_AND_NV(uchar, uchar_t)
ATOMIC_AND_NV(ushort, ushort_t)
ATOMIC_AND_NV(uint, uint_t)
ATOMIC_AND_NV(ulong, ulong_t)
/* END CSTYLED */
/*
@ -300,7 +276,6 @@ atomic_cas_ptr(volatile void *target, void *exp, void *des)
return (exp);
}
/* BEGIN CSTYLED */
ATOMIC_CAS(8, uint8_t)
ATOMIC_CAS(16, uint16_t)
ATOMIC_CAS(32, uint32_t)
@ -309,7 +284,6 @@ ATOMIC_CAS(uchar, uchar_t)
ATOMIC_CAS(ushort, ushort_t)
ATOMIC_CAS(uint, uint_t)
ATOMIC_CAS(ulong, ulong_t)
/* END CSTYLED */
/*
@ -322,7 +296,6 @@ ATOMIC_CAS(ulong, ulong_t)
return (__atomic_exchange_n(target, bits, __ATOMIC_SEQ_CST)); \
}
/* BEGIN CSTYLED */
ATOMIC_SWAP(8, uint8_t)
ATOMIC_SWAP(16, uint16_t)
ATOMIC_SWAP(32, uint32_t)
@ -331,7 +304,6 @@ ATOMIC_SWAP(uchar, uchar_t)
ATOMIC_SWAP(ushort, ushort_t)
ATOMIC_SWAP(uint, uint_t)
ATOMIC_SWAP(ulong, ulong_t)
/* END CSTYLED */
void *
atomic_swap_ptr(volatile void *target, void *bits)

View File

@ -25,19 +25,32 @@
#include <sys/backtrace.h>
#include <sys/types.h>
#include <sys/debug.h>
#include <unistd.h>
/*
* libspl_backtrace() must be safe to call from inside a signal hander. This
* mostly means it must not allocate, and so we can't use things like printf.
* Output helpers. libspl_backtrace() must not block, must be thread-safe and
* must be safe to call from a signal handler. At least, that means not having
* printf, so we end up having to call write() directly on the fd. That's
* awkward, as we always have to pass through a length, and some systems will
* complain if we don't consume the return. So we have some macros to make
* things a little more palatable.
*/
#define spl_bt_write_n(fd, s, n) \
do { ssize_t r __maybe_unused = write(fd, s, n); } while (0)
#define spl_bt_write(fd, s) spl_bt_write_n(fd, s, sizeof (s)-1)
#if defined(HAVE_LIBUNWIND)
#define UNW_LOCAL_ONLY
#include <libunwind.h>
/*
* Convert `v` to ASCII hex characters. The bottom `n` nybbles (4-bits ie one
* hex digit) will be written, up to `buflen`. The buffer will not be
* null-terminated. Returns the number of digits written.
*/
static size_t
libspl_u64_to_hex_str(uint64_t v, size_t digits, char *buf, size_t buflen)
spl_bt_u64_to_hex_str(uint64_t v, size_t n, char *buf, size_t buflen)
{
static const char hexdigits[] = {
'0', '1', '2', '3', '4', '5', '6', '7',
@ -45,10 +58,10 @@ libspl_u64_to_hex_str(uint64_t v, size_t digits, char *buf, size_t buflen)
};
size_t pos = 0;
boolean_t want = (digits == 0);
boolean_t want = (n == 0);
for (int i = 15; i >= 0; i--) {
const uint64_t d = v >> (i * 4) & 0xf;
if (!want && (d != 0 || digits > i))
if (!want && (d != 0 || n > i))
want = B_TRUE;
if (want) {
buf[pos++] = hexdigits[d];
@ -62,40 +75,181 @@ libspl_u64_to_hex_str(uint64_t v, size_t digits, char *buf, size_t buflen)
void
libspl_backtrace(int fd)
{
ssize_t ret __attribute__((unused));
unw_context_t uc;
unw_cursor_t cp;
unw_word_t loc;
unw_word_t v;
char buf[128];
size_t n;
int err;
ret = write(fd, "Call trace:\n", 12);
/* Snapshot the current frame and state. */
unw_getcontext(&uc);
/*
* TODO: walk back to the frame that tripped the assertion / the place
* where the signal was recieved.
*/
/*
* Register dump. We're going to loop over all the registers in the
* top frame, and show them, with names, in a nice three-column
* layout, which keeps us within 80 columns.
*/
spl_bt_write(fd, "Registers:\n");
/* Initialise a frame cursor, starting at the current frame */
unw_init_local(&cp, &uc);
while (unw_step(&cp) > 0) {
unw_get_reg(&cp, UNW_REG_IP, &loc);
ret = write(fd, " [0x", 5);
n = libspl_u64_to_hex_str(loc, 10, buf, sizeof (buf));
ret = write(fd, buf, n);
ret = write(fd, "] ", 2);
unw_get_proc_name(&cp, buf, sizeof (buf), &loc);
for (n = 0; n < sizeof (buf) && buf[n] != '\0'; n++) {}
ret = write(fd, buf, n);
ret = write(fd, "+0x", 3);
n = libspl_u64_to_hex_str(loc, 2, buf, sizeof (buf));
ret = write(fd, buf, n);
#ifdef HAVE_LIBUNWIND_ELF
ret = write(fd, " (in ", 5);
unw_get_elf_filename(&cp, buf, sizeof (buf), &loc);
for (n = 0; n < sizeof (buf) && buf[n] != '\0'; n++) {}
ret = write(fd, buf, n);
ret = write(fd, " +0x", 4);
n = libspl_u64_to_hex_str(loc, 2, buf, sizeof (buf));
ret = write(fd, buf, n);
ret = write(fd, ")", 1);
#endif
ret = write(fd, "\n", 1);
/*
* libunwind's list of possible registers for this architecture is an
* enum, unw_regnum_t. UNW_TDEP_LAST_REG is the highest-numbered
* register in that list, however, not all register numbers in this
* range are defined by the architecture, and not all defined registers
* will be present on every implementation of that architecture.
* Moreover, libunwind provides nice names for most, but not all
* registers, but these are hardcoded; a name being available does not
* mean that register is available.
*
* So, we have to pull this all together here. We try to get the value
* of every possible register. If we get a value for it, then the
* register must exist, and so we get its name. If libunwind has no
* name for it, we synthesize something. These cases should be rare,
* and they're usually for uninteresting or niche registers, so it
* shouldn't really matter. We can see the value, and that's the main
* thing.
*/
uint_t cols = 0;
for (uint_t regnum = 0; regnum <= UNW_TDEP_LAST_REG; regnum++) {
/*
* Get the value. Any error probably means the register
* doesn't exist, and we skip it.
*/
if (unw_get_reg(&cp, regnum, &v) < 0)
continue;
/*
* Register name. If libunwind doesn't have a name for it,
* it will return "???". As a shortcut, we just treat '?'
* is an alternate end-of-string character.
*/
const char *name = unw_regname(regnum);
for (n = 0; name[n] != '\0' && name[n] != '?'; n++) {}
if (n == 0) {
/*
* No valid name, so make one of the form "?xx", where
* "xx" is the two-char hex of libunwind's register
* number.
*/
buf[0] = '?';
n = spl_bt_u64_to_hex_str(regnum, 2,
&buf[1], sizeof (buf)-1) + 1;
name = buf;
}
/*
* Two spaces of padding before each column, plus extra
* spaces to align register names shorter than three chars.
*/
spl_bt_write_n(fd, " ", 5-MIN(n, 3));
/* Register name and column punctuation */
spl_bt_write_n(fd, name, n);
spl_bt_write(fd, ": 0x");
/*
* Convert register value (from unw_get_reg()) to hex. We're
* assuming that all registers are 64-bits wide, which is
* probably fine for any general-purpose registers on any
* machine currently in use. A more generic way would be to
* look at the width of unw_word_t, but that would also
* complicate the column code a bit. This is fine.
*/
n = spl_bt_u64_to_hex_str(v, 16, buf, sizeof (buf));
spl_bt_write_n(fd, buf, n);
/* Every third column, emit a newline */
if (!(++cols % 3))
spl_bt_write(fd, "\n");
}
/* If we finished before the third column, emit a newline. */
if (cols % 3)
spl_bt_write(fd, "\n");
/* Now the main event, the backtrace. */
spl_bt_write(fd, "Call trace:\n");
/* Reset the cursor to the top again. */
unw_init_local(&cp, &uc);
do {
/*
* Getting the IP should never fail; libunwind handles it
* specially, because its used a lot internally. Still, no
* point being silly about it, as the last thing we want is
* our crash handler to crash. So if it ever does fail, we'll
* show an error line, but keep going to the next frame.
*/
if (unw_get_reg(&cp, UNW_REG_IP, &v) < 0) {
spl_bt_write(fd, " [couldn't get IP register; "
"corrupt frame?]");
continue;
}
/* IP & punctuation */
n = spl_bt_u64_to_hex_str(v, 16, buf, sizeof (buf));
spl_bt_write(fd, " [0x");
spl_bt_write_n(fd, buf, n);
spl_bt_write(fd, "] ");
/*
* Function ("procedure") name for the current frame. `v`
* receives the offset from the named function to the IP, which
* we show as a "+offset" suffix.
*
* If libunwind can't determine the name, we just show "???"
* instead. We've already displayed the IP above; that will
* have to do.
*
* unw_get_proc_name() will return ENOMEM if the buffer is too
* small, instead truncating the name. So we treat that as a
* success and use whatever is in the buffer.
*/
err = unw_get_proc_name(&cp, buf, sizeof (buf), &v);
if (err == 0 || err == -UNW_ENOMEM) {
for (n = 0; n < sizeof (buf) && buf[n] != '\0'; n++) {}
spl_bt_write_n(fd, buf, n);
/* Offset from proc name */
spl_bt_write(fd, "+0x");
n = spl_bt_u64_to_hex_str(v, 2, buf, sizeof (buf));
spl_bt_write_n(fd, buf, n);
} else
spl_bt_write(fd, "???");
#ifdef HAVE_LIBUNWIND_ELF
/*
* Newer libunwind has unw_get_elf_filename(), which gets
* the name of the ELF object that the frame was executing in.
* Like `unw_get_proc_name()`, `v` recieves the offset within
* the file, and UNW_ENOMEM indicates that a truncate filename
* was left in the buffer.
*/
err = unw_get_elf_filename(&cp, buf, sizeof (buf), &v);
if (err == 0 || err == -UNW_ENOMEM) {
for (n = 0; n < sizeof (buf) && buf[n] != '\0'; n++) {}
spl_bt_write(fd, " (in ");
spl_bt_write_n(fd, buf, n);
/* Offset within file */
spl_bt_write(fd, " +0x");
n = spl_bt_u64_to_hex_str(v, 2, buf, sizeof (buf));
spl_bt_write_n(fd, buf, n);
spl_bt_write(fd, ")");
}
#endif
spl_bt_write(fd, "\n");
} while (unw_step(&cp) > 0);
}
#elif defined(HAVE_BACKTRACE)
#include <execinfo.h>
@ -103,15 +257,12 @@ libspl_backtrace(int fd)
void
libspl_backtrace(int fd)
{
ssize_t ret __attribute__((unused));
void *btptrs[64];
size_t nptrs = backtrace(btptrs, 64);
ret = write(fd, "Call trace:\n", 12);
spl_bt_write(fd, "Call trace:\n");
backtrace_symbols_fd(btptrs, nptrs, fd);
}
#else
#include <sys/debug.h>
void
libspl_backtrace(int fd __maybe_unused)
{

View File

@ -57,8 +57,7 @@ typedef enum zfs_uio_rw {
} zfs_uio_rw_t;
typedef enum zfs_uio_seg {
UIO_USERSPACE = 0,
UIO_SYSSPACE = 1,
UIO_SYSSPACE = 0,
} zfs_uio_seg_t;
#elif defined(__FreeBSD__)
@ -92,20 +91,20 @@ zfs_dio_page_aligned(void *buf)
static inline boolean_t
zfs_dio_offset_aligned(uint64_t offset, uint64_t blksz)
{
return (IS_P2ALIGNED(offset, blksz));
return ((IS_P2ALIGNED(offset, blksz)) ? B_TRUE : B_FALSE);
}
static inline boolean_t
zfs_dio_size_aligned(uint64_t size, uint64_t blksz)
{
return ((size % blksz) == 0);
return (((size % blksz) == 0) ? B_TRUE : B_FALSE);
}
static inline boolean_t
zfs_dio_aligned(uint64_t offset, uint64_t size, uint64_t blksz)
{
return (zfs_dio_offset_aligned(offset, blksz) &&
zfs_dio_size_aligned(size, blksz));
return ((zfs_dio_offset_aligned(offset, blksz) &&
zfs_dio_size_aligned(size, blksz)) ? B_TRUE : B_FALSE);
}
static inline void

View File

@ -70,7 +70,7 @@ if BUILD_FREEBSD
libzfs_la_LIBADD += -lutil -lgeom
endif
libzfs_la_LDFLAGS += -version-info 5:0:1
libzfs_la_LDFLAGS += -version-info 6:0:0
pkgconfig_DATA += %D%/libzfs.pc

View File

@ -1,4 +1,4 @@
<abi-corpus version='2.0' architecture='elf-amd-x86_64' soname='libzfs.so.4'>
<abi-corpus version='2.0' architecture='elf-amd-x86_64' soname='libzfs.so.6'>
<elf-needed>
<dependency name='libzfs_core.so.3'/>
<dependency name='libnvpair.so.3'/>
@ -3132,7 +3132,8 @@
<enumerator name='ZPOOL_PROP_DEDUP_TABLE_SIZE' value='36'/>
<enumerator name='ZPOOL_PROP_DEDUP_TABLE_QUOTA' value='37'/>
<enumerator name='ZPOOL_PROP_DEDUPCACHED' value='38'/>
<enumerator name='ZPOOL_NUM_PROPS' value='39'/>
<enumerator name='ZPOOL_PROP_LAST_SCRUBBED_TXG' value='39'/>
<enumerator name='ZPOOL_NUM_PROPS' value='40'/>
</enum-decl>
<typedef-decl name='zpool_prop_t' type-id='af1ba157' id='5d0c23fb'/>
<typedef-decl name='regoff_t' type-id='95e97e5e' id='54a2a2a8'/>
@ -5984,7 +5985,8 @@
<underlying-type type-id='9cac1fee'/>
<enumerator name='POOL_SCRUB_NORMAL' value='0'/>
<enumerator name='POOL_SCRUB_PAUSE' value='1'/>
<enumerator name='POOL_SCRUB_FLAGS_END' value='2'/>
<enumerator name='POOL_SCRUB_FROM_LAST_TXG' value='2'/>
<enumerator name='POOL_SCRUB_FLAGS_END' value='3'/>
</enum-decl>
<typedef-decl name='pool_scrub_cmd_t' type-id='a1474cbd' id='b51cf3c2'/>
<enum-decl name='zpool_errata' id='d9abbf54'>

View File

@ -563,8 +563,15 @@ change_one(zfs_handle_t *zhp, void *data)
cn = NULL;
}
if (!clp->cl_alldependents)
ret = zfs_iter_children_v2(zhp, 0, change_one, data);
if (!clp->cl_alldependents) {
if (clp->cl_prop != ZFS_PROP_MOUNTPOINT) {
ret = zfs_iter_filesystems_v2(zhp, 0,
change_one, data);
} else {
ret = zfs_iter_children_v2(zhp, 0, change_one,
data);
}
}
/*
* If we added the handle to the changelist, we will re-use it
@ -738,6 +745,11 @@ changelist_gather(zfs_handle_t *zhp, zfs_prop_t prop, int gather_flags,
changelist_free(clp);
return (NULL);
}
} else if (clp->cl_prop != ZFS_PROP_MOUNTPOINT) {
if (zfs_iter_filesystems_v2(zhp, 0, change_one, clp) != 0) {
changelist_free(clp);
return (NULL);
}
} else if (zfs_iter_children_v2(zhp, 0, change_one, clp) != 0) {
changelist_free(clp);
return (NULL);

View File

@ -471,13 +471,15 @@ int
zpool_get_userprop(zpool_handle_t *zhp, const char *propname, char *buf,
size_t len, zprop_source_t *srctype)
{
nvlist_t *nv, *nvl;
nvlist_t *nv;
uint64_t ival;
const char *value;
zprop_source_t source = ZPROP_SRC_LOCAL;
nvl = zhp->zpool_props;
if (nvlist_lookup_nvlist(nvl, propname, &nv) == 0) {
if (zhp->zpool_props == NULL)
zpool_get_all_props(zhp);
if (nvlist_lookup_nvlist(zhp->zpool_props, propname, &nv) == 0) {
if (nvlist_lookup_uint64(nv, ZPROP_SOURCE, &ival) == 0)
source = ival;
verify(nvlist_lookup_string(nv, ZPROP_VALUE, &value) == 0);
@ -2796,7 +2798,7 @@ zpool_scan(zpool_handle_t *zhp, pool_scan_func_t func, pool_scrub_cmd_t cmd)
}
/*
* With EBUSY, five cases are possible:
* With EBUSY, six cases are possible:
*
* Current state Requested
* 1. Normal Scrub Running Normal Scrub or Error Scrub
@ -5340,7 +5342,8 @@ zpool_get_vdev_prop_value(nvlist_t *nvprop, vdev_prop_t prop, char *prop_name,
strval = fnvlist_lookup_string(nv, ZPROP_VALUE);
} else {
/* user prop not found */
return (-1);
src = ZPROP_SRC_DEFAULT;
strval = "-";
}
(void) strlcpy(buf, strval, len);
if (srctype)

View File

@ -932,6 +932,7 @@ libzfs_run_process_impl(const char *path, char *argv[], char *env[], int flags,
pid = fork();
if (pid == 0) {
/* Child process */
setpgid(0, 0);
devnull_fd = open("/dev/null", O_WRONLY | O_CLOEXEC);
if (devnull_fd < 0)

View File

@ -212,7 +212,7 @@ if BUILD_FREEBSD
libzpool_la_LIBADD += -lgeom
endif
libzpool_la_LDFLAGS += -version-info 5:0:0
libzpool_la_LDFLAGS += -version-info 6:0:0
if TARGET_CPU_POWERPC
module/zfs/libzpool_la-vdev_raidz_math_powerpc_altivec.$(OBJEXT) : CFLAGS += -maltivec

View File

@ -35,9 +35,25 @@ typedef struct zfs_dbgmsg {
static list_t zfs_dbgmsgs;
static kmutex_t zfs_dbgmsgs_lock;
static uint_t zfs_dbgmsg_size = 0;
static uint_t zfs_dbgmsg_maxsize = 4<<20; /* 4MB */
int zfs_dbgmsg_enable = B_TRUE;
static void
zfs_dbgmsg_purge(uint_t max_size)
{
while (zfs_dbgmsg_size > max_size) {
zfs_dbgmsg_t *zdm = list_remove_head(&zfs_dbgmsgs);
if (zdm == NULL)
return;
uint_t size = zdm->zdm_size;
kmem_free(zdm, size);
zfs_dbgmsg_size -= size;
}
}
void
zfs_dbgmsg_init(void)
{
@ -74,6 +90,8 @@ __zfs_dbgmsg(char *buf)
mutex_enter(&zfs_dbgmsgs_lock);
list_insert_tail(&zfs_dbgmsgs, zdm);
zfs_dbgmsg_size += size;
zfs_dbgmsg_purge(zfs_dbgmsg_maxsize);
mutex_exit(&zfs_dbgmsgs_lock);
}

View File

@ -18,7 +18,7 @@
.\"
.\" Copyright (c) 2024, Klara, Inc.
.\"
.Dd October 2, 2024
.Dd November 1, 2024
.Dt ZFS 4
.Os
.
@ -436,7 +436,7 @@ write.
It can also help to identify if reported checksum errors are tied to Direct I/O
writes.
Each verify error causes a
.Sy dio_verify
.Sy dio_verify_wr
zevent.
Direct Write I/O checkum verify errors can be seen with
.Nm zpool Cm status Fl d .
@ -867,14 +867,14 @@ where that percent may exceed
This
only operates during memory pressure/reclaim.
.
.It Sy zfs_arc_shrinker_limit Ns = Ns Sy 10000 Pq int
.It Sy zfs_arc_shrinker_limit Ns = Ns Sy 0 Pq int
This is a limit on how many pages the ARC shrinker makes available for
eviction in response to one page allocation attempt.
Note that in practice, the kernel's shrinker can ask us to evict
up to about four times this for one allocation attempt.
To reduce OOM risk, this limit is applied for kswapd reclaims only.
.Pp
The default limit of
For example a value of
.Sy 10000 Pq in practice, Em 160 MiB No per allocation attempt with 4 KiB pages
limits the amount of time spent attempting to reclaim ARC memory to
less than 100 ms per allocation attempt,
@ -1333,9 +1333,10 @@ results in vector instructions
from the respective CPU instruction set being used.
.
.It Sy zfs_bclone_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Enable the experimental block cloning feature.
Enables access to the block cloning feature.
If this setting is 0, then even if feature@block_cloning is enabled,
attempts to clone blocks will act as though the feature is disabled.
using functions and system calls that attempt to clone blocks will act as
though the feature is disabled.
.
.It Sy zfs_bclone_wait_dirty Ns = Ns Sy 0 Ns | Ns 1 Pq int
When set to 1 the FICLONE and FICLONERANGE ioctls wait for dirty data to be

View File

@ -92,6 +92,11 @@ before a generic mapping for the same slot.
In this way a custom mapping may be applied to a particular channel
and a default mapping applied to the others.
.
.It Sy zpad_slot Ar digits
Pad slot numbers with zeros to make them
.Ar digits
long, which can help to make disk names a consistent length and easier to sort.
.
.It Sy multipath Sy yes Ns | Ns Sy no
Specifies whether
.Xr vdev_id 8
@ -122,7 +127,7 @@ device is connected to.
The default is
.Sy 4 .
.
.It Sy slot Sy bay Ns | Ns Sy phy Ns | Ns Sy port Ns | Ns Sy id Ns | Ns Sy lun Ns | Ns Sy ses
.It Sy slot Sy bay Ns | Ns Sy phy Ns | Ns Sy port Ns | Ns Sy id Ns | Ns Sy lun Ns | Ns Sy bay_lun Ns | Ns Sy ses
Specifies from which element of a SAS identifier the slot number is
taken.
The default is
@ -138,6 +143,9 @@ use the SAS port as the slot number.
use the scsi id as the slot number.
.It Sy lun
use the scsi lun as the slot number.
.It Sy bay_lun
read the slot number from the bay identifier and append the lun number.
Useful for multi-lun multi-actuator hard drives.
.It Sy ses
use the SCSI Enclosure Services (SES) enclosure device slot number,
as reported by

View File

@ -28,7 +28,7 @@
.\" Copyright (c) 2021, Colm Buckley <colm@tuatha.org>
.\" Copyright (c) 2023, Klara Inc.
.\"
.Dd July 29, 2024
.Dd November 18, 2024
.Dt ZPOOLPROPS 7
.Os
.
@ -135,6 +135,19 @@ A unique identifier for the pool.
The current health of the pool.
Health can be one of
.Sy ONLINE , DEGRADED , FAULTED , OFFLINE, REMOVED , UNAVAIL .
.It Sy last_scrubbed_txg
Indicates the transaction group (TXG) up to which the most recent scrub
operation has checked and repaired the dataset.
This provides insight into the data integrity status of their pool at
a specific point in time.
.Xr zpool-scrub 8
can utilize this property to scan only data that has changed since the last
scrub completed, when given the
.Fl C
flag.
This property is not updated when performing an error scrub with the
.Fl e
flag.
.It Sy leaked
Space not released while
.Sy freeing

View File

@ -14,7 +14,7 @@
.\" Copyright (c) 2017 Lawrence Livermore National Security, LLC.
.\" Copyright (c) 2017 Intel Corporation.
.\"
.Dd November 18, 2023
.Dd October 27, 2024
.Dt ZDB 8
.Os
.
@ -408,6 +408,8 @@ blocks cloned, the space saving as a result of cloning, and the saving ratio.
.It Fl TT
Display the per-vdev BRT statistics, including total references.
.It Fl TTT
Display histograms of per-vdev BRT refcounts.
.It Fl TTTT
Dump the contents of the block reference tables.
.It Fl u , -uberblock
Display the current uberblock.

View File

@ -71,7 +71,7 @@ The following fields are displayed:
Used for scripting mode.
Do not print headers and separate fields by a single tab instead of arbitrary
white space.
.It Fl j Op Ar --json-int
.It Fl j , -json Op Ar --json-int
Print the output in JSON format.
Specify
.Sy --json-int

View File

@ -59,7 +59,7 @@
.Xc
Displays all ZFS file systems currently mounted.
.Bl -tag -width "-j"
.It Fl j
.It Fl j , -json
Displays all mounted file systems in JSON format.
.El
.It Xo

View File

@ -50,7 +50,7 @@ and any attempts to access or modify other pools will cause an error.
.
.Sh OPTIONS
.Bl -tag -width "-t"
.It Fl j
.It Fl j , -json
Display channel program output in JSON format.
When this flag is specified and standard output is empty -
channel program encountered an error.

View File

@ -130,7 +130,7 @@ The value
can be used to display all properties that apply to the given dataset's type
.Pq Sy filesystem , volume , snapshot , No or Sy bookmark .
.Bl -tag -width "-s source"
.It Fl j Op Ar --json-int
.It Fl j , -json Op Ar --json-int
Display the output in JSON format.
Specify
.Sy --json-int

View File

@ -23,7 +23,7 @@
.\"
.\" lint-ok: WARNING: sections out of conventional order: Sh SYNOPSIS
.\"
.Dd April 4, 2024
.Dd December 2, 2024
.Dt ZINJECT 8
.Os
.
@ -268,7 +268,7 @@ Run for this many seconds before reporting failure.
.It Fl T Ar failure
Set the failure type to one of
.Sy all ,
.Sy ioctl ,
.Sy flush ,
.Sy claim ,
.Sy free ,
.Sy read ,

View File

@ -98,7 +98,10 @@ This can be an indicator of problems with the underlying storage device.
The number of delay events is ratelimited by the
.Sy zfs_slow_io_events_per_second
module parameter.
.It Sy dio_verify
.It Sy dio_verify_rd
Issued when there was a checksum verify error after a Direct I/O read has been
issued.
.It Sy dio_verify_wr
Issued when there was a checksum verify error after a Direct I/O write has been
issued.
This event can only take place if the module parameter

View File

@ -98,7 +98,7 @@ See the
.Xr zpoolprops 7
manual page for more information on the available pool properties.
.Bl -tag -compact -offset Ds -width "-o field"
.It Fl j Op Ar --json-int, --json-pool-key-guid
.It Fl j , -json Op Ar --json-int, --json-pool-key-guid
Display the list of properties in JSON format.
Specify
.Sy --json-int
@ -157,7 +157,7 @@ See the
.Xr vdevprops 7
manual page for more information on the available pool properties.
.Bl -tag -compact -offset Ds -width "-o field"
.It Fl j Op Ar --json-int
.It Fl j , -json Op Ar --json-int
Display the list of properties in JSON format.
Specify
.Sy --json-int

View File

@ -59,7 +59,7 @@ is specified, the command exits after
.Ar count
reports are printed.
.Bl -tag -width Ds
.It Fl j Op Ar --json-int, --json-pool-key-guid
.It Fl j , -json Op Ar --json-int, --json-pool-key-guid
Display the list of pools in JSON format.
Specify
.Sy --json-int

View File

@ -109,7 +109,7 @@ Stops and cancels an in-progress removal of a top-level vdev.
.El
.
.Sh EXAMPLES
.\" These are, respectively, examples 14 from zpool.8
.\" These are, respectively, examples 15 from zpool.8
.\" Make sure to update them bidirectionally
.Ss Example 1 : No Removing a Mirrored top-level (Log or Data) Device
The following commands remove the mirrored log device
@ -142,9 +142,43 @@ The command to remove the mirrored log
.Ar mirror-2 No is :
.Dl # Nm zpool Cm remove Ar tank mirror-2
.Pp
At this point, the log device no longer exists
(both sides of the mirror have been removed):
.Bd -literal -compact -offset Ds
pool: tank
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sda ONLINE 0 0 0
sdb ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
sdc ONLINE 0 0 0
sdd ONLINE 0 0 0
.Ed
.Pp
The command to remove the mirrored data
.Ar mirror-1 No is :
.Dl # Nm zpool Cm remove Ar tank mirror-1
.Pp
After
.Ar mirror-1 No has been evacuated, the pool remains redundant, but
the total amount of space is reduced:
.Bd -literal -compact -offset Ds
pool: tank
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sda ONLINE 0 0 0
sdb ONLINE 0 0 0
.Ed
.
.Sh SEE ALSO
.Xr zpool-add 8 ,

View File

@ -26,7 +26,7 @@
.\" Copyright 2017 Nexenta Systems, Inc.
.\" Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
.\"
.Dd June 22, 2023
.Dd November 18, 2024
.Dt ZPOOL-SCRUB 8
.Os
.
@ -36,9 +36,8 @@
.Sh SYNOPSIS
.Nm zpool
.Cm scrub
.Op Fl s Ns | Ns Fl p
.Op Ns Fl e | Ns Fl p | Fl s Ns | Fl C Ns
.Op Fl w
.Op Fl e
.Ar pool Ns
.
.Sh DESCRIPTION
@ -114,6 +113,10 @@ The pool must have been scrubbed at least once with the
feature enabled to use this option.
Error scrubbing cannot be run simultaneously with regular scrubbing or
resilvering, nor can it be run when a regular scrub is paused.
.It Fl C
Continue scrub from last saved txg (see zpool
.Sy last_scrubbed_txg
property).
.El
.Sh EXAMPLES
.Ss Example 1

View File

@ -70,7 +70,7 @@ See the
option of
.Nm zpool Cm iostat
for complete details.
.It Fl j Op Ar --json-int, --json-flat-vdevs, --json-pool-key-guid
.It Fl j , -json Op Ar --json-int, --json-flat-vdevs, --json-pool-key-guid
Display the status for ZFS pools in JSON format.
Specify
.Sy --json-int
@ -82,14 +82,18 @@ Specify
.Sy --json-pool-key-guid
to set pool GUID as key for pool objects instead of pool names.
.It Fl d
Display the number of Direct I/O write checksum verify errors that have occured
on a top-level VDEV.
Display the number of Direct I/O read/write checksum verify errors that have
occured on a top-level VDEV.
See
.Sx zfs_vdev_direct_write_verify
in
.Xr zfs 4
for details about the conditions that can cause Direct I/O write checksum
verify failures to occur.
Direct I/O reads checksum verify errors can also occur if the contents of the
buffer are being manipulated after the I/O has been issued and is in flight.
In the case of Direct I/O read checksum verify errors, the I/O will be reissued
through the ARC.
.It Fl D
Display a histogram of deduplication statistics, showing the allocated
.Pq physically present on disk

View File

@ -405,9 +405,43 @@ The command to remove the mirrored log
.Ar mirror-2 No is :
.Dl # Nm zpool Cm remove Ar tank mirror-2
.Pp
At this point, the log device no longer exists
(both sides of the mirror have been removed):
.Bd -literal -compact -offset Ds
pool: tank
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sda ONLINE 0 0 0
sdb ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
sdc ONLINE 0 0 0
sdd ONLINE 0 0 0
.Ed
.Pp
The command to remove the mirrored data
.Ar mirror-1 No is :
.Dl # Nm zpool Cm remove Ar tank mirror-1
.Pp
After
.Ar mirror-1 No has been evacuated, the pool remains redundant, but
the total amount of space is reduced:
.Bd -literal -compact -offset Ds
pool: tank
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sda ONLINE 0 0 0
sdb ONLINE 0 0 0
.Ed
.
.Ss Example 16 : No Displaying expanded space on a device
The following command displays the detailed information for the pool

View File

@ -55,6 +55,8 @@ modules-Linux:
mkdir -p $(sort $(dir $(zfs-objs) $(zfs-)))
$(MAKE) -C @LINUX_OBJ@ $(if @KERNEL_CC@,CC=@KERNEL_CC@) \
$(if @KERNEL_LD@,LD=@KERNEL_LD@) $(if @KERNEL_LLVM@,LLVM=@KERNEL_LLVM@) \
$(if @KERNEL_CROSS_COMPILE@,CROSS_COMPILE=@KERNEL_CROSS_COMPILE@) \
$(if @KERNEL_ARCH@,ARCH=@KERNEL_ARCH@) \
M="$$PWD" @KERNEL_MAKE@ CONFIG_ZFS=m modules
modules-FreeBSD:

View File

@ -3281,7 +3281,6 @@ nvs_xdr_nvp_##type(XDR *xdrs, void *ptr, ...) \
#endif
/* BEGIN CSTYLED */
NVS_BUILD_XDRPROC_T(char);
NVS_BUILD_XDRPROC_T(short);
NVS_BUILD_XDRPROC_T(u_short);
@ -3289,7 +3288,6 @@ NVS_BUILD_XDRPROC_T(int);
NVS_BUILD_XDRPROC_T(u_int);
NVS_BUILD_XDRPROC_T(longlong_t);
NVS_BUILD_XDRPROC_T(u_longlong_t);
/* END CSTYLED */
/*
* The format of xdr encoded nvpair is:

View File

@ -31,5 +31,4 @@
#include <sys/queue.h>
#include <sys/sdt.h>
/* CSTYLED */
SDT_PROBE_DEFINE1(sdt, , , set__error, "int");

Some files were not shown because too many files have changed in this diff Show More