Compare commits

..

452 Commits

Author SHA1 Message Date
Tony Hutter
34f96a15c7 Tag zfs-2.3.4
Some checks failed
checkstyle / checkstyle (push) Has been cancelled
CodeQL / Analyze (cpp) (push) Has been cancelled
CodeQL / Analyze (python) (push) Has been cancelled
zloop / zloop (push) Has been cancelled
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2025-08-20 09:29:36 -07:00
Tino Reichardt
fdb5078d82 CI: Add Debian 13 to the FULL_OS runner list
This commit adds Debian 13 alias Trixie to the checked operating
systems. The image needs to be run with UEFI support.

Current Debian version overview:
- Debian 11 (Bullseye) -> "oldoldstable"
- Debian 12 (Bookworm) -> "oldstable"
- Debian 13 (Trixie) -> new "stable"

The CI will be run on Debian 12 and Debian 13 now.
Debian 11 is kept, but won't be used automatically.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17648
2025-08-20 09:29:36 -07:00
Shengqi Chen
8cca55f18b Debian rules: install scripts/objtool-wrapper.in into dkms tree
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #17633
Closes #17646
2025-08-20 09:20:45 -07:00
Attila Fülöp
ef1ee9421d objtool-wrapper: Update Debian packaging
6cf17f65 (#17456) introduced a change to `configure.ac` which
breaks the patching done in the Debian packages DKMS source
installation phase. This results in a failed module build.

Adapt the awk script doing the patching to handle the added
`AC_CONFIG_FILE` entry.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Tested-by: Shengqi Chen <harry-chen@outlook.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #17633
Closes #17646
2025-08-20 09:12:26 -07:00
shodanshok
435006d81d add uncompressed_size to arc_summary
Add uncompressed ARC size to statistics reported by arc_summary.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #17556
2025-08-19 10:30:04 -07:00
rmacklem
725886d67a FreeBSD: Add support for _PC_HAS_HIDDENSYSTEM
In FreeBSD there is now a pathconf name _PC_HAS_HIDDENSYSTEM.
This patch adds support for it to OpenZFS.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closes #17518
2025-08-19 10:30:04 -07:00
Meriel Luna Mittelbach
bce049389d Add templated zfs-mount@.service
Runs `zfs mount -R <dataset>` at boot, after `zfs mount -a`.
Intended to replace `mountpoint=legacy` in certain mount setups.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Meriel Luna Mittelbach <lunarlambda@gmail.com>
Closes #17483
2025-08-19 10:30:04 -07:00
Mark Johnston
c3d74a0d6f FreeBSD: Ensure that z_pflags is initialized for new znodes
The field is subsequently accessed in zfs_mknode(), in
zfs_inherit_projid().  The Linux implementation of zfs_create_fs() has
this initialization already; there is no counterpart to
zfs_create_share_dir() that I can see.

Reported-by: KMSAN
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #17486
2025-08-19 10:30:04 -07:00
Tony Hutter
9cf069b366 CI: Add optional patch level, fix hostname on F42
In the past there have been times when we need to generate new RPMs
for an existing ZFS release.  Typically this happens when a new RHEL
version comes out and the kernel symbols no longer match.  To get
users to auto-update we just bump the patch number.  For example, we
had to create zfs-2.1.13-1 for EL8.8 and zfs-2.1.13-2 for EL8.9.

This commit adds an optional patch level text box to the github
package builder runner.

In addition, this commit also uses `hostnamectl` instead of `hostname`
for F42+ compatibility, if available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17638
2025-08-18 17:06:55 -07:00
Richard Yao
3f87c9c276 Add CodeQL mismatched dsl_dataset_hold/_rele pairs check
This check is currently limited to checking mismatches that occur in the
same stack frame. It does not detect across stack frames.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard@ryao.dev>
Closes #17352
2025-08-18 17:06:55 -07:00
Patrick Fasano
9d14ce4db7 Add conflict/replacement with older SONAME libzfs and libzpool packages
In e8f0aa143e, the SONAMEs and package
names for libzfs and libzpool were bumped. The `contrib/debian/control`
file did not declare a conflict/replacement with the old package name.
This can cause dpkg to leave a system in an inconsistent state if the
old package is not manually uninstalled first.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Shengqi Chen <harry-chen@outlook.com>
Signed-off-by: Patrick Fasano <patrick@patrickfasano.com>
Closes #17586
2025-08-18 16:09:44 -07:00
Rob Norris
3b64a9619f FreeBSD: zfs_putpages: don't undirty pages until after write completes
Some checks failed
checkstyle / checkstyle (push) Has been cancelled
CodeQL / Analyze (cpp) (push) Has been cancelled
CodeQL / Analyze (python) (push) Has been cancelled
zloop / zloop (push) Has been cancelled
In syncing mode, zfs_putpages() would put the entire range of pages onto
the ZIL, then return VM_PAGER_OK for each page to the kernel. However,
an associated zil_commit() or txg sync had not happened at this point,
so the write may not actually be on disk.

So, we rework that case to use a ZIL commit callback, and do the
post-write work of undirtying the page and signaling completion there.
We return VM_PAGER_PEND to the kernel instead so it knows that we will
take care of it.

The original version of this (238eab7dc1) copied the Linux model and did
the cleanup in a ZIL callback for both async and sync. This was a
mistake, as FreeBSD does not have a separate "busy for writeback" flag
like Linux which keeps the page usable. The full sbusy flag locks the
entire page out until the itx callback fires, which for async is after
txg sync, which could be literal seconds in the future.

For the async case, the data is already on the DMU and the in-memory
ZIL, which is sufficient for async writeback, so the old method of
logging it without a callback, undirtying the page and returning is more
than sufficient and reclaims that lost performance.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17533
2025-08-12 22:41:17 -04:00
Mark Johnston
a072611eef Revert "FreeBSD: zfs_putpages: don't undirty pages until after write completes"
This causes async putpages to leave the pages sbusied for a long time,
which hurts concurrency.  Revert for now until we have a better
approach.

This reverts commit 238eab7dc1.

Reported by:    Ihor Antonov <ngor@hugpoint.tech>
Discussed with: Rob Norris <rob.norris@klarasystems.com>

References: freebsd/freebsd-src@738a9a7
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Ported-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17533
2025-08-12 22:41:17 -04:00
Brian Behlendorf
0fe10361ba Allow vmem_alloc backed multilists
Systems with a large number of CPU cores (192+) may trigger the large
allocation warning in multilist_create() on Linux.  Silence the warning
by converting the allocation to vmem_alloc().

On Linux this results in a call to kvalloc() which will alloc vmem
for large allocations and kmem for small allocations.

On FreeBSD both vmem_alloc and kmem_alloc internally use the same
allocator so there is no functional change.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17616
2025-08-12 17:24:30 -07:00
Brian Behlendorf
3e78905ffb Silence zstd large allocation warning
Allow zstd_mempool_init() to allocate using vmem_alloc() instead
of kmem_alloc() to silence the large allocation warning on Linux
during module load when the system has a large number of CPUs.

It's not at all clear to me that scaling the allocation size with
the number of CPUs is beneficial and that should be evaluated.
But for the moment this should resolve the warning without
introducing any unexpected side effects.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17620
Closes #11557
2025-08-12 17:24:26 -07:00
Colin Percival
46de04d2e9 FreeBSD 15.0 is now "PRERELEASE"
Chase URL change from the FreeBSD project.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colin Percival <cperciva@tarsnap.com>
Closes #17617
2025-08-12 17:24:22 -07:00
achill
41ca2296cd Linux 6.16 compat: META
Update the META file to reflect compatibility with the 6.16
kernel.

Tested with 6.16.0-0-stable of Alpine Linux edge, see
<https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/87929>.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Achill Gilgenast <achill@achill.org>
Closes #17578
2025-08-12 17:24:19 -07:00
René Wirnata
9651668457 zed: prettify slack notification message
This converts the body of a ZED slack notification from
plain text to code block style to help with readability.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: René Wirnata <rene.wirnata@pandascience.net>
Closes #17610
2025-08-12 17:24:15 -07:00
Rob Norris
a49c957299 linux/zvol_os: fix crash with blk-mq on Linux 4.19
03987f71e3 (#16069) added a workaround to get the blk-mq hardware
context for older kernels that don't cache it in the struct request.
However, this workaround appears to be incomplete.

In 4.19, the rq data context is optional. If its not initialised, then
the cached rq->cpu will be -1, and so using it to index into mq_map
causes a crash.

Given that the upstream 4.19 is now in extended LTS and rarely seen,
RHEL8 4.18+ has long carried "modern" blk-mq support, and the cached
hardware context has been available since 5.1, I'm not going to huge
lengths to get queue selection correct for the very few people that are
likely to feel it. To that end, we simply call raw_smp_processor_id() to
get a valid CPU id and use that instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #17597
2025-08-12 17:24:11 -07:00
Todd Zullinger
d1d706350e rpm: don't list /sbin/zgenhostid twice in %files
The location of zgenhostid was changed in 0ae733c7a (Install zgenhostid
to sbindir, 2021-01-21).  We include all files within sbindir two lines
earlier, which causes rpmbuild to report:

    File listed twice: /sbin/zgenhostid

Drop the redundant entry from the %files section.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Closes #17601
2025-08-12 17:24:08 -07:00
Attila Fülöp
11f844175e config: Avoid void main() in toolchain-simd.m4
Be standard-compliant by using `int main()`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13303
Closes #17590
2025-08-12 17:23:57 -07:00
Attila Fülöp
57b614e025 SIMD: Don't require definition of HAVE_XSAVE
Currently we fail the compilation via the #error directive if
`HAVE_XSAVE` isn't defined. This breaks i586 builds since we check
the toolchains SIMD support only on i686 and onward.

Remove the requirement to fix the build on i586.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13303
Closes #17590
2025-08-12 17:23:51 -07:00
Rob Norris
0c7d6e20e6 Linux: zfs_putpage: document (and fix!) confusing sync/commit modes
The structure of zfs_putpage() and its callers is tricky to follow.
There's a lot more we could do to improve it, but at least now we have
some description of one of the trickier bits.

Writing this exposed a very subtle bug: most async pages pushed out
through zpl_putpages() would go to the ZIL with commit=false, which can
yield a less-efficient write policy. So this commit updates that too.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17584
2025-08-12 17:23:46 -07:00
Rob Norris
b9c45fe68c Linux: zfs_putpage: complete async page writeback immediately
For async page writeback, we do not need to wait for the page to be on
disk before returning to the caller; it's enough that the data from the
dirty page be on the DMU and in the in-memory ZIL, just like any other
write.

So, if this is not a syncing write, don't add a callback to the itx, and
instead just unlock the page immediately.

(This is effectively the same concept used for FreeBSD in d323fbf49c).

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17584
Closes #14290
2025-08-12 17:23:43 -07:00
Rob Norris
f72226a75c Linux: sync: remove async/sync accounting
All this machinery is there to try to understand when there an async
writeback waiting to complete because the intent log callbacks are still
outstanding, and force them with a timely zil_commit(). The next commit
fixes this properly, so there's no need for all this extra housekeeping.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17584
2025-08-12 17:23:39 -07:00
Rob Norris
97fe86837c ZTS: mmap_ftruncate test to confirm async writeback behaviour
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17584
2025-08-12 17:23:35 -07:00
Rob Norris
df5e02d253 CI: match and trim out internal timestamp for test prefix
Adjust the regexes to match the test line with timestamps, then remove
them for the summary. The internal timestamp is still in the full logs.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17045
2025-08-12 17:23:28 -07:00
Rob Norris
245adb6a4f ZTS: include microsecond timestamps on all output
When reviewing test output after a failure, it's often quite difficult
to work out the order and timing of events, and to correlate test suite
output with kernel logs.

This adds timestamps to ZTS output to help with this, in three places:

- all of the standard log_XXX functions ultimately end up in _printline,
  which now prefixes output with a timestamp. An escape hatch
  environment variable is provided for user_cmd, which often calls the
  logging functions while also depending on the captured output.

- the test runner logging function log() also now prefixes its output
  with a timestamp.

- on failure, when capturing the kernel log in zfs_dmesg.ksh, the "iso"
  time format is requested.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17045
2025-08-12 17:23:07 -07:00
Brian Behlendorf
82a0868ce4 CI: Remove Debian backports
The latest Debian 11 image includes bullseye-backports as a default
repository in the /etc/apt/sources.list.  However, this repository
has gone end of life which effectively breaks the default install.

We shouldn't need anything in backports so lets unconditionally
remove backports on all Debian builders to resolve the issue.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17569
2025-08-12 17:22:45 -07:00
Coleman Kane
e7e0bb3b61 linux: Fix out-of-src builds
The linux kernel modules haven't been building successfully when the
build occurs in a separate directory than the source code, which is a
common build pattern in Linux. Was not able to determine the root cause,
but the %.o targets in subdirectories are no longer being matched by the
pattern targets in the Linux Kbuild system. This change fixes the issue
by dynamically creating the missing ones inside our Kbuild.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #17517
2025-08-12 17:22:40 -07:00
Paul Dagnelie
6af1f61ad4 Fix zdb pool/ with -k
When examining the root dataset with zdb -k, we get into a mismatched
state. main() knows we are not examining the whole pool, but it strips
off the trailing slash. import_checkpointed_state() then thinks we are
examining the whole pool, and does not update the target path
appropriately. The fix is to directly inform import_checkpointed_state
that we are examining a filesystem, and not the whole pool.

Sponsored-by: Klara, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17536
2025-08-12 17:21:47 -07:00
Carl George
8c4f625c12 CI: Add CentOS Stream 9/10 to the FULL_OS runner list
Testing on CentOS Stream provides several months advance notice of
changes coming to the RHEL kernel.  This should help OpenZFS be
proactive instead of reactive to new RHEL minor versions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Carl George <carlwgeorge@gmail.com>
ZFS-CI-Type: full
Closes #16904
Closes #17526
2025-08-12 17:20:16 -07:00
Tino Reichardt
7882e85a9b Delete unused .cirrus.yml
The Cirrus_CI was planned for testing FreeBSD, but never really used I
think. Currently it's not needed anymore, so remove it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17155
Closes #17535
2025-08-12 17:19:43 -07:00
Tino Reichardt
6b38d0f7ff ZTS: Fix FreeBSD 15.0 ksh errors
The package ksh93 is replaced by ksh now.
This works for FreeBSD 13 and 14 also.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17523
2025-08-12 17:19:32 -07:00
Alexander Motin
80b6457fcd CI: Switch from FreeBSD 13.4 to 13.5
FreeBSD 13.4 is EOL since June 30, 2025.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Closes #17519
2025-08-12 17:19:07 -07:00
Brian Behlendorf
2518f4b124 Revert "Fix incorrect expected error in ztest"
This reverts commit 2076011e0c.  The
comment which explains EINVAL should be expected for this case was
wrong, not the code.  The kernel will return ENOTSUP when attaching
a distributed spare to the wrong top-level dRAID vdev.  See the
check for this in spa_vdev_attach().

Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17503
2025-08-12 17:18:40 -07:00
Igor Ostapenko
90d2c4407a ztest: Fix false positive of ENOSPC handling
Before running a pass zs_enospc_count is checked to free up some space
by destroying a random dataset. But the space freed may still be not
re-usable during the TXG_DEFER window breaking the next dataset creation
in ztest_generic_run().
    
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Igor Ostapenko <igor.ostapenko@klarasystems.com>
Closes #17506
2025-08-12 17:18:34 -07:00
Brian Behlendorf
f7698f47e8 CI: run ztest on compressed zpool
When running ztest under the CI a common failure mode is for the
underlying filesystem to run out of available free space.  Since
the storage associated with a GitHub-hosted running is fixed, we
instead create a pool and use a compressed ZFS dataset to store
the ztest vdev files.  This significantly increases the available
capacity since the data written by ztest is highly compressible.
A compression ratio of over 40:1 is conservatively achieved using
the default lz4 compression.  Autotrimming is enabled to ensure
freed blocks are discarded from the backing cipool vdev file.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17501
2025-08-12 17:17:49 -07:00
Martin Rüegg
6c1130a730 pyzfs: Adapt python lib directory evaluation from ax_python_devel.m4
71216b91d2 introduced a regression
on debian/ubuntu systems during build.

The reason being, that building the RPM for pyzfs was using
a different library path than building the library itself.
This is now harmonized.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Rüegg <martin.rueegg@metaworx.ch>
Closes #16155
Closes #17480
2025-08-12 17:17:24 -07:00
Martin Rüegg
74b539d3dc pyzfs: Update ax_python_devel.m4 to serial 37
Fixes an obvious typo, where a variable was missing the required
leading dollar sign ($)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Rüegg <martin.rueegg@metaworx.ch>
Closes #17480
2025-08-12 17:17:17 -07:00
Chunwei Chen
024e60b927 Missing tests in make pkg
```
Warning: TestGroup '/var/tmp/tests/functional/ctime' not added to this
run. Auxiliary script '/var/tmp/tests/functional/ctime/setup' failed
verification.
```

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #17491
2025-08-12 17:17:09 -07:00
Olivier Certner
5289f6f961 spa: ZIO_TASKQ_ISSUE: Use symbolic priority
This allows to change the meaning of priority differences in FreeBSD
without requiring code changes in ZFS.

This upstreams commit fd141584cf89d7d2 from FreeBSD src.

Sponsored-by: The FreeBSD Foundation
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Olivier Certner <olce@FreeBSD.org>
Closes #17489
2025-08-12 17:16:00 -07:00
Paul Dagnelie
094305c937 Fix TestGroup warning due to missing tags
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17473
2025-08-12 15:59:59 -07:00
Tino Reichardt
a826f7a993 ZTS: Use FreeBSD cloudinit images
FreeBSD provides CI-IMAGES since some time. These images are
based on nuageinit, which does not support fqdn and sudo for
example. So we need currently some workarounds to get it
working.

The FreeBSD images will be more compatible with cloud-init in
some near future. Then we can remove the workaround things.

These versions are used for testing:
- freebsd13-4r (RELEASE)
- freebsd14-3s (STABLE)
- freebsd15-0c (CURRENT)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17462
2025-08-12 15:58:46 -07:00
Attila Fülöp
86bf73c1eb objtool wrapper: use absolute path to call the wrapper
Older kernel versions run make outside of the build directory. This
works since all paths are absolute. Relative paths will fail in such
a scenario.

Use an absolute path to the objtool wrapper as well, since the
relative path breaks the build on older kernels.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #17541
2025-08-07 13:10:58 -04:00
Attila Fülöp
1d293b377a Linux build: handle CONFIG_OBJTOOL_WERROR=y
Linux 5.16 by default fails the build on objtool warnings. We have
known and understood objtool warnings we can't fix without
involving Linux maintainers.

To work around this we introduce an objtool wrapper script which
removes the `--Werror` flag.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #17456
2025-08-07 13:10:33 -04:00
Alexander Motin
22eb2bdce3 Make TX abort after assign safer
It is not right, but there are few examples when TX is aborted
after being assigned in case of error.  To handle it better on
production systems add extra cleanup steps.

While here, replace couple dmu_tx_abort() in simple cases.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17438
2025-08-07 12:41:40 -04:00
Alexander Motin
809b553940 Introduce zfs rewrite subcommand (#17246)
This allows to rewrite content of specified file(s) as-is without
modifications, but at a different location, compression, checksum,
dedup, copies and other parameter values.  It is faster than read
plus write, since it does not require data copying to user-space.
It is also faster for sync=always datasets, since without data
modification it does not require ZIL writing.  Also since it is
protected by normal range range locks, it can be done under any
other load.  Also it does not affect file's modification time or
other properties.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
2025-08-07 12:34:28 -04:00
Rob Norris
abb6211e7a Linux 6.16: remove writepage and readahead_page
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17443
2025-08-07 12:29:45 -04:00
khoang98
c405a7a35c Skip dbuf_evict_one() from dbuf_evict_notify() for reclaim thread
Avoid calling dbuf_evict_one() from memory reclaim contexts (e.g. Linux
kswapd, FreeBSD pagedaemon). This prevents deadlock caused by reclaim
threads waiting for the dbuf hash lock in the call sequence:
dbuf_evict_one -> dbuf_destroy -> arc_buf_destroy

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Kaitlin Hoang <kthoang@amazon.com>
Closes #17561
2025-08-07 12:15:14 -04:00
shodanshok
4808641e71 enforce arc_dnode_limit
Linux kernel shrinker in the context of null/root memcg does not scan
dentry and inode caches added by a task running in non-root memcg. For
ZFS this means that dnode cache routinely overflows, evicting valuable
meta/data and putting additional memory pressure on the system.

This patch restores zfs_prune_aliases as fallback when the kernel
shrinker does nothing, enabling zfs to actually free dnodes. Moreover,
it (indirectly) calls arc_evict when dnode_size > dnode_limit.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #17487
Closes #17542
2025-08-07 12:11:34 -04:00
Alexander Motin
30fa92bff3 Increase meta-dnode redundancy in "some" mode
Loss of one indirect block of the meta dnode likely means loss of
the whole dataset.  It is worse than one file that the man page
promises, and in my opinion is not much better than "none" mode.

This change restores redundancy of the meta-dnode indirect blocks,
while same time still corrects expectations in the man page.

Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17339
2025-08-05 13:15:44 -04:00
Paul Dagnelie
fd5a27c9db Ensure that gang_copies is always at least as large as copies
As discussed in the comments of PR #17004, you can theoretically run
into a case where a gang child has more copies than the gang header,
which can lead to some odd accounting behavior (and even trip a
VERIFY). While the accounting code could be changed to handle this, it
fundamentally doesn't seem to make a lot of sense to allow this to
happen. If the data is supposed to have a certain level of reliability,
that isn't actually achieved unless the gang_copies property is set to
match it.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17484
2025-08-05 13:14:45 -04:00
Rob Norris
3ad3f439bb zts: add spdx license tags to gang_blocks tests (#17160)
Missed in #17073, probably because that PR was branched before #17001
was landed and never rebased.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-08-05 13:14:11 -04:00
Paul Dagnelie
a46ce73ca8 Make ganging redundancy respect redundant_metadata property (#17073)
The redundant_metadata setting in ZFS allows users to trade resilience
for performance and space savings. This applies to all data and metadata
blocks in zfs, with one exception: gang blocks. Gang blocks currently
just take the copies property of the IO being ganged and, if it's 1,
sets it to 2. This means that we always make at least two copies of a
gang header, which is good for resilience. However, if the users care
more about performance than resilience, their gang blocks will be even
more of a penalty than usual.

We add logic to calculate the number of gang headers copies directly,
and store it as a separate IO property. This is stored in the IO
properties and not calculated when we decide to gang because by that
point we may not have easy access to the relevant information about what
kind of block is being stored. We also check the redundant_metadata
property when doing so, and use that to decide whether to store an extra
copy of the gang headers, compared to the underlying blocks.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-08-05 13:10:40 -04:00
Ameer Hamza
90790955a6 SPDX: Add missing CDDL-1.0 license
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
2025-08-05 12:51:35 -04:00
Igor Ostapenko
95abbc71c3 range_tree: Provide more debug details upon unexpected add/remove
Sponsored-by: Klara, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Igor Ostapenko <igor.ostapenko@klarasystems.com>
Closes #17581
2025-08-05 12:34:54 -04:00
Tino Reichardt
fc658b9935 Faster checksum benchmark on system boot
While booting, only the needed 256KiB benchmarks are done now.

The delay for checking all checksums occurs when requested via:
- Linux: cat /proc/spl/kstat/zfs/chksum_bench
- FreeBSD: sysctl kstat.zfs.misc.chksum_bench

Reported by: Lahiru Gunathilake <gunathilakebllg@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Co-authored-by: Colin Percival <cperciva@tarsnap.com>
Closes #17563
Closes #17560
2025-08-05 12:34:13 -04:00
Paul Dagnelie
271b9797c5 Don't use wrong weight when passivating group
When we're passivating a metaslab group we start by passivating the 
metaslabs that have been activated for each of the allocators.  To do 
that, we need to provide a weight. However, currently this erroneously 
always uses a segment-based weight, even if segment-based weighting is 
disabled.

Use the normal weight function, which will decide which type of weight 
to use.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17566
2025-08-05 12:33:52 -04:00
Brian Behlendorf
582e7847f6 Default to zfs_bclone_wait_dirty=1
Update the default FICLONE and FICLONERANGE ioctl behavior to wait
on dirty blocks.  While this does remove some control from the
application, in practice ZFS is better positioned to the optimial
thing and immediately force a TXG sync.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17455
2025-08-05 12:33:30 -04:00
Andriy Tkachuk
6d378564b4 zdb: fix checksum calculation for decompressed blocks
Currently, when reading compressed blocks with -R and decompressing
them with :d option and specifying lsize, which is normally bigger
than psize for compressed blocks, the checksum is calculated on
decompressed data. But it makes no sense since zfs always calculates
checksum on physical, i.e. compressed data. So reading the same block
produces different checksum results depending on how we read it,
whether we decompress it or not, which, again, makes no sense.

Fix: use psize instead of lsize when calculating the checksum so that
it is always calculated on the physical block size, no matter was it
compressed or not.

Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17547
2025-08-05 12:33:00 -04:00
Ameer Hamza
0c928f7a37 ZED: Fix device type detection and pool iteration logic
During hotplug REMOVED events, devid matching fails for partition-based
spares because devid information is not stored in pool config for
partitioned devices. However, when devid is populated by the hotplug
event, the original code skipped the search logic entirely, skipping
vdev_guid matching and resulting in wrong device type detection that
caused spares to be incorrectly identified as l2arc devices.
Additionally, fix zfs_agent_iter_pool() to use the return value from
zfs_agent_iter_vdev() instead of relying on search parameters, which
was previously ignored. Also add pool_guid optimization to enable
targeted pool searching when pool_guid is available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17545
2025-08-05 12:32:28 -04:00
Chunwei Chen
c79d5e4f33 Define sops->free_inode() to prevent use-after-free during lookup
On Linux, when doing path lookup with LOOKUP_RCU, dentry and inode can
be dereferenced without refcounts and locks. For this reason, dentry and
inode must only be freed after RCU grace period.

However, zfs currently frees inode in zfs_inode_destroy synchronously
and we can't use GPL-only call_rcu() in zfs directly. Fortunately, on
Linux 5.2 and after, if we define sops->free_inode(), the kernel will do
call_rcu() for us.

This issue may be triggered more easily with init_on_free=1 boot
parameter:

BUG: kernel NULL pointer dereference, address: 0000000000000020
RIP: 0010:selinux_inode_permission+0x10e/0x1c0
Call Trace:
 ? show_trace_log_lvl+0x1be/0x2d9
 ? show_trace_log_lvl+0x1be/0x2d9
 ? show_trace_log_lvl+0x1be/0x2d9
 ? security_inode_permission+0x37/0x60
 ? __die_body.cold+0x8/0xd
 ? no_context+0x113/0x220
 ? exc_page_fault+0x6d/0x130
 ? asm_exc_page_fault+0x1e/0x30
 ? selinux_inode_permission+0x10e/0x1c0
 security_inode_permission+0x37/0x60
 link_path_walk.part.0.constprop.0+0xb5/0x360
 ? path_init+0x27d/0x3c0
 path_lookupat+0x3e/0x1a0
 filename_lookup+0xc0/0x1d0
 ? __check_object_size.part.0+0x123/0x150
 ? strncpy_from_user+0x4e/0x130
 ? getname_flags.part.0+0x4b/0x1c0
 vfs_statx+0x72/0x120
 ? ioctl_has_perm.constprop.0.isra.0+0xbd/0x120
 __do_sys_newlstat+0x39/0x70
 ? __x64_sys_ioctl+0x8d/0xd0
 do_syscall_64+0x30/0x40
 entry_SYSCALL_64_after_hwframe+0x62/0xc7

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: Chunwei Chen <david.chen@nutanix.com>
Closes #17546
2025-08-05 12:30:23 -04:00
Alexander Motin
347d68048a ZIL: Force writing of open LWB on suspend
Under parallel workloads ZIL may delay writes of open LWBs that
are not full enough.  On suspend we do not expect anything new to
appear since zil_get_commit_list() will not let it pass, only
returning TXG number to wait for.  But I suspect that waiting for
the TXG commit without having the last LWB issued may not wait for
its completion, resulting in panic described in #17509.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17521
2025-08-05 12:28:41 -04:00
Paul Dagnelie
acf3871ef8 Correct weight recalculation of space-based metaslabs
Currently, after a failed allocation, the metaslab code recalculates the
weight for a metaslab. However, for space-based metaslabs, it uses the
maximum free segment size instead of the normal weighting
algorithm. This is presumably because the normal metaslab weight is
(roughly) intended to estimate the size of the largest free segment, but
it doesn't do that reliably at most fragmentation levels. This means
that recalculated metaslabs are forced to a weight that isn't really
using the same units as the rest of them, resulting in undesirable
behaviors. We switch this to use the normal space-weighting function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Wasabi Technology, Inc.
Sponsored-by: Klara, Inc.
Closes #17531
2025-08-05 12:28:34 -04:00
Ameer Hamza
21d5f25724 Validate mountpoint on path-based unmount using statx
Use statx to verify that path-based unmounts proceed only if the
mountpoint reported by statx matches the MNTTAB entry reported by
libzfs, aborting the operation if they differ. Align
`zfs umount /path` behavior with `zfs umount dataset`.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17481
2025-08-05 12:27:25 -04:00
Paul Dagnelie
7e945a5b3f Fix other nonrot bugs
There are still a variety of bugs involving the vdev_nonrot property
that will cause problems if you try to run the test suite with
segment-based weighting disabled, and with other things in the weighting
code. Parents' nonrot property need to be updated when children are
added. When vdevs are expanded and more metaslabs are added, the weights
have to be recalculated (since the number of metaslabs is an input to
the lba bias function). When opening, faulted or unopenable children
should not be considered for whether a vdev is nonrot or not (since the
nonrot property is determined during a successful open, this can cause
false negatives). And draid spares need to have the nonrot property set
correctly.

Sponsored-by: Eshtek, creators of HexOS
Sponsored-by: Klara, Inc.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17469
2025-08-05 12:25:26 -04:00
Alexander Motin
85ce6b8ab2 Polish db_rwlock scope
dbuf_verify(): Don't need the lock, since we only compare pointers.

dbuf_findbp(): Don't need the lock, since aside of unneeded assert
we only produce the pointer, but don't de-reference it.

dnode_next_offset_level(): When working on top level indirection
should lock dnode buffer's db_rwlock, since it is our parent.  If
dnode has no buffer, then it is meta-dnode or one of quotas and we
should lock the dataset's ds_bp_rwlock instead.

Reviewed-by: Alan Somers <asomers@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17441
2025-08-05 12:23:32 -04:00
Mariusz Zaborski
954894ee53 scrub: generate scrub_finish event
The `scn_min_txg` can now be used not only with resilver. Instead
of checking `scn_min_txg` to determine whether it’s a resilver or
a scrub, simply check which function is defined. Thanks to this
change, a scrub_finish event is generated when performing a scrub
from the saved txg.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Closes #17432
2025-08-05 12:21:46 -04:00
Alexander Motin
a4e775d2ca Some arc_release() cleanup
- Don't drop L2ARC header if we have more buffers in this header.
Since we leave them the header, leave them the L2ARC header also.
Honestly we are not required to drop it even if there are no other
buffers, but then we'd need to allocate it a separate header, which
we might drop soon if the old block is really deleted.  Multiple
buffers in a header likely mean active snapshots or dedup, so we
know that the block in L2ARC will remain valid.  It might be rare,
but why not?
 - Remove some impossible assertions and conditions.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17126
2025-08-05 12:16:27 -04:00
Paul Dagnelie
661310ff5c FDT dedup log sync -- remove incremental
This PR condenses the FDT dedup log syncing into a single sync
pass. This reduces the overhead of modifying indirect blocks for the
dedup table multiple times per txg. In addition, changes were made to
the formula for how much to sync per txg. We now also consider the
backlog we have to clear, to prevent it from growing too large, or
remaining large on an idle system.

Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Authored-by: Don Brady <don.brady@klarasystems.com>
Authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17038
2025-08-05 12:15:21 -04:00
Alexander Motin
f9d59b579e ZIL: Relax parallel write ZIOs processing
ZIL introduced dependencies between its write ZIOs to permit flush
defer, when we flush vdev caches only once all the write ZIOs has
completed.  But it was recently spotted that it serializes not only
ZIO completions handling, but also their ready stage.  It means ZIO
pipeline can't calculate checksums for the following ZIOs until all
the previous are checksumed, even though it is not required.  On a
systems where memory throughput of a single CPU core is limited,
it creates single-core CPU bottleneck, which is difficult to see
due to ZIO pipeline design with many taskqueue threads.

While it would be great to bypass the ready stage waits, it would
require changes to ZIO code, and I haven't found a clean way to do
it.  But I've noticed that we don't need any dependency between
the write ZIOs if the previous one has some waiters, which means
it won't defer any flushes and work as a barrier for the earlier
ones.

Bypassing it won't help large single-thread writes, since all the
write ZIOs except the last in that case won't have waiters, and
so will be dependent.  But in that case the ZIO processing might
not be a bottleneck, since there will be only one thread populating
the write buffers, that will likely be the bottleneck.

But bypassing the ZIO dependency on multi-threaded write workloads
really allows them to scale beyond the checksuming throughput of
one CPU core.

My tests with writing 12 files on a same dataset on a pool with
4 striped NVMes as SLOGs from 12 threads with 1MB blocks on a
system with Xeon Silver 4114 CPU show total throughput increase
from 4.3GB/s to 8.5GB/s, increasing the SLOGs busy from ~30% to
~70%.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17458
2025-08-05 12:14:18 -04:00
Brian Behlendorf
1af41fd203 Tag zfs-2.3.3
Some checks failed
checkstyle / checkstyle (push) Has been cancelled
CodeQL / Analyze (cpp) (push) Has been cancelled
CodeQL / Analyze (python) (push) Has been cancelled
zloop / zloop (push) Has been cancelled
META file and changelog updated.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-06-19 08:21:04 -07:00
Tony Hutter
faefa5ffc3 Linux 6.15 compat: META
Update the META file to reflect compatibility with the 6.15
kernel.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17393
2025-06-17 10:50:27 -07:00
Germano Massullo
777d8ee345 Fix mixed-use-of-spaces-and-tabs rpmlint warning
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Germano Massullo <germano.massullo@gmail.com>
Closes #17461
2025-06-17 10:50:27 -07:00
Rob Norris
b00bc81b05 ioctl: remove FICLONE/FICLONERANGE/FIDEDUPERANGE compat
These are only required to support these ioctls on Linux <4.5. Since
4.18 is our cutoff, we don't need this code anymore.

Also removing related test things that will never match again.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17308
2025-06-17 10:50:27 -07:00
Alexander Motin
f7e6dcc68d Relax zfs_vnops_read_chunk_size limitations
It makes no sense to limit read size below the block size, since
DMU will any way consume resources for the whole block, while the
current zfs_vnops_read_chunk_size is only 1MB, which is smaller
that maximum block size of 16MB.  Plus in case of misaligned
Uncached I/O the buffer may get evicted between the chunks,
requiring repeating I/Os.

On 64-bit platforms increase zfs_vnops_read_chunk_size to 32MB.
It allows to less depend on speculative prefetcher if application
requests specific size, first not waiting for prefetcher to start
and later not prefetching more than needed.

Also while there, we don't need to align reads to the chunk size,
but only to a block size, which is smaller and so more forgiving.

My profiles show ~4% of CPU time saving when reading 16MB blocks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17415
2025-06-17 10:50:27 -07:00
Rob Norris
e2de00ca44 dmu_traverse: remove 'ignore_hole_birth' tunable alias
It's been many years, we can probably do without.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17376
2025-06-17 10:50:27 -07:00
Allan Jude
8e9ffe1b4f ARC: parallel eviction
On systems with enormous amounts of memory, the single arc_evict thread
can become a bottleneck if reads and writes are stuck behind it, waiting
for old data to be evicted before new data can take its place.

This commit adds support for evicting from multiple ARC lists in
parallel, by farming the evict work out to some number of threads and
then accumulating their results.

A new tuneable, zfs_arc_evict_threads, sets the number of threads. By
default, it will scale based on the number of CPUs.

Sponsored-by: Expensify, Inc.
Sponsored-by: Klara, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Youzhong Yang <youzhong@gmail.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Signed-off-by: Alexander Stetsenko <alex.stetsenko@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Co-authored-by: Rob Norris <rob.norris@klarasystems.com>
Co-authored-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Co-authored-by: Alexander Stetsenko <alex.stetsenko@klarasystems.com>
Closes #16486
2025-06-17 10:50:26 -07:00
Don Brady
6b67a5bdd3 During pool export flush the ARC asynchronously
This also includes removing L2 vdevs asynchronously.

This commit also guarantees that spa_load_guid is unique.

The zpool reguid feature introduced the spa_load_guid, which is a
transient value used for runtime identification purposes in the ARC.
This value is not the same as the spa's persistent pool guid.

However, the value is seeded from spa_generate_load_guid() which
does not check for uniqueness against the spa_load_guid from other
pools.  Although extremely rare, you can end up with two different
pools sharing the same spa_load_guid value! So we guarantee that
the value is always unique and additionally not still in use by an
async arc flush task.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #16215
2025-06-17 10:50:26 -07:00
Alexander Motin
4f1b91e343 CI: Automate some GitHub PR status labels manipulations
- Set/remove "Work in Progress"/"Code Review Needed" for drafts.
 - Remove "Accepted", "Inactive", "Revision Needed" and "Stale" on
pushes and reopens.

I hope this reduce chances of PRs being forgotten after requested
modifications done due to stale labels.  It is better to have no
labels than incorrect ones saying there is nothing to look at.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16721
2025-06-17 10:50:26 -07:00
Rob Norris
a65225ec7e FreeBSD: zfs_putpages: don't undirty pages until after write completes
zfs_putpages() would put the entire range of pages onto the ZIL, then
return VM_PAGER_OK for each page to the kernel. However, an associated
zil_commit() or txg sync had not happened at this point, so the write
may not actually be on disk.

So, we rework it to use a ZIL commit callback, and do the post-write
work of undirtying the page and signaling completion there. We return
VM_PAGER_PEND to the kernel instead so it knows that we will take care
of it.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17445
2025-06-17 10:50:26 -07:00
Rob Norris
9c0f5bc183 zfs_log_write: only put the callback on the last itx
If a write is split across mutliple itxs, we only want the callback on
the last one, otherwise it will be called for every itx associated with
this single write, which makes it very hard to know what to clean up.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17445
2025-06-17 10:50:26 -07:00
Rob Norris
e1dd433a44 zpl_sync_fs: work around kernels that ignore sync_fs errors
If the kernel will honour our error returns, use them. If not, fool it
by setting a writeback error on the superblock, if available.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17420
2025-06-17 10:50:26 -07:00
Rob Norris
08cec6532e zfs_sync: return error when pool suspends
If the pool is suspended, we'll just block in zil_commit(). If the
system is shutting down, blocking wouldn't help anyone. So, we should
keep this test for now, but at least return an error for anyone who is
actually interested.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17420
2025-06-17 10:50:26 -07:00
Rob Norris
d944641502 zfs_sync: remove support for impossible scenarios
The superblock pointer will always be set, as will z_log, so remove code
supporting cases that can't occur (on Linux at least).

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17420
2025-06-17 10:50:26 -07:00
Rob Norris
c758072b2f zts: test syncfs() behaviour when pool suspends
Fairly coarse, but if it returns while the pool suspends, it must be
with an error.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17420
2025-06-17 10:50:26 -07:00
Alexander Motin
0c9cdd1606 Improve block cloning transactions accounting
Previous dmu_tx_count_clone() was broken, stating that cloning is
similar to free.  While they might be from some points, cloning
is not net-free.  It will likely consume space and memory, and
unlike free it will do it no matter whether the destination has
the blocks or not (usually not, so previous code did nothing).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17431
2025-06-17 10:50:26 -07:00
Alexander Motin
23fad19818 Reduce zfs_dmu_offset_next_sync penalty
Looking on txg_wait_synced(, 0) I've noticed that it always syncs
5 TXGs: 3 TXG_CONCURRENT_STATES + 2 TXG_DEFER_SIZE.  But in case
of dmu_offset_next() we do not care about deferred frees. And even
concurrent TXGs we might not need sync all 3 if the dnode was not
dirtied in last few TXGs.

This patch makes dmu_offset_next() to sync one TXG at a time until
the dnode is clean, but no more than 3 TXG_CONCURRENT_STATES times.
My tests with random simultaneous writes and seeks over many files
on HDD pool show 7-14% performance increase.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17434
2025-06-17 10:50:26 -07:00
Alexander Motin
3897e86bd1 Make TX abort after assign safer
It is not right, but there are few examples when TX is aborted
after being assigned in case of error.  To handle it better on
production systems add extra cleanup steps.

While here, replace couple dmu_tx_abort() in simple cases.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17438
2025-06-17 10:50:26 -07:00
Alexander Motin
4c8d0471fa Allow zero compression if dedup is enabled
Having high-refcount dedup entries for zero blocks is inefficient
when they could be recorded as a holes instead.  Normally, zero
compression is not done if compression is disabled to not confuse
naive benchmarks.  But with dedup enabled, it is expected that the
write will be skipped anyway, so we are just optimizing the way it
is skipped.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17435
2025-06-17 10:50:26 -07:00
Tino Reichardt
ea3a600bba ZTS: Enable io_uring on CentOS Stream 9 and 10 also
The io_uring interface is available as a Technology Preview.
Details: https://access.redhat.com/solutions/4723221

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17447
2025-06-17 10:50:26 -07:00
Attila Fülöp
e9c1e08e07 Linux build: silence objtool warnings
After #17401 the Linux build produces some stack related warnings.

Silence them with the `STACK_FRAME_NON_STANDARD` macro.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17410
2025-06-17 10:50:26 -07:00
Brian Behlendorf
1688d9991d CI: Retire Fedora 40 builder
Fedora 40 has gone EOL as of May 2025, retire the CI builder.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17408
2025-06-17 10:50:26 -07:00
Tino Reichardt
e0ad633c64 ZTS: Enable io_uring support on el9/el10
The io_uring interface is available as a Technology Preview.
Details: https://access.redhat.com/solutions/4723221

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17397
2025-06-17 10:50:26 -07:00
Tino Reichardt
da4dfa85eb ZTS: Add AlmaLinux 10
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17397
2025-06-17 10:50:26 -07:00
Rob Norris
7be33d2d40 abd_os: move headers from libzpool to libspl
5b9e695 added specific userspace versions of abd_os.h and abd_impl_os.h
for libzpool. However, abd.h and abd_impl.h, which include them, are
packaged with libzfs, so other programs building against libzfs can
fail to build, either because the headers aren't installed, or because
they aren't on any standard include path.

So, move abd_os.h and abd_impl_os.h to libspl, where they we will be
installed alongside abd.h and abd_impl.h in a known path.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16940
Closes #17390
Closes #17394
2025-06-17 10:50:26 -07:00
Alexander Motin
bf4baee81e Set spa_final_txg in spa_unload()
I've noticed that after some dedup tests system reboot ends up in
assertion about ms_defer tree not free.  It seems to be caused by
DDT flushing still freeing some blocks while ZFS trying to reach
a final steady state due to spa_final_txg, while being set by
spa_export_common() on pool export, is not set when spa_unload()
is called by spa_evict_all() on system reboot/shutdown.  Setting
spa_final_txg in spa_unload() fixes this issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17395
2025-06-17 10:50:26 -07:00
Ameer Hamza
e93d15f112 zpool: clarify ZPOOL_STATUS_REMOVED_DEV status message
Disks can be removed either by the administrator via hotplug or by the
kernel when a disk failure occurs. The previous message implied that
removal was always manual, which could be confusing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17400
2025-06-17 10:50:26 -07:00
Ameer Hamza
f292b0f146 vdev: skip faulting disks pending removal
This patch fixes a race where vdev_remove_wanted may be set after probe
initiation, which could otherwise trigger redundant fault and removal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17400
2025-06-17 10:50:26 -07:00
Brian Behlendorf
bda0bc6304 CI: Retire Ubuntu 20.04 builder
Ubuntu 20.04 has gone EOL as of April 2025, retire the CI builder.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17403
2025-06-17 10:50:26 -07:00
Rob Norris
04493ca819 linux/zvol_os: don't try to set disk ops if alloc fails
If the kernel fails to allocate the gendisk, zvo_disk will be NULL, and
derefencing it will explode. So don't do that.

Sponsored-by: Klara, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17396
2025-06-17 10:50:26 -07:00
Attila Fülöp
2c53fe7764 Linux build: always use objtool
We silence `objtool` warnings on some object files using
`OBJECT_FILES_NON_STANDARD_some_file.o`. Nowadays `objtool` is
needed for CPU vulnerability mitigations and a lot more
functionality so its use is desirable.

Just remove the `OBJECT_FILES_NON_STANDARD` definitions. A follow-up
commit is needed to make the offending files standard and address
the compile time warnings.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #17401
Closes #17364
2025-06-17 10:50:26 -07:00
Rob Norris
d7bb6bbf13 tunables: fix spelling
Three occurences with an 'e', and all of them mine. Maybe it's an
British thing?

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris
b8f80812a3 tunables: remove __check_old_set_param workaround
This was fully removed from Linux in 4.15, so we won't be seeing it
again.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris
97696962b5 tunables: remove unused param get/set aliases
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris
06fd6dc6f7 tunables: use Linux ullong param ops for u64
Since 3.17 Linux has provided param ops for 64-bit ints, so we don't
need to use our own anymore.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris
28ff5ff1c6 tunables: remove support for s64 tunables
Nothing uses them now.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris
e9002887e2 tunables: remove direct use of module_param_cb
The use for spl_taskq_kick was the only use, and the comment that
module_param_call is obsolete is no longer true - it's still very much
used even in recent kernels.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris
840b070ec7 tunables: remove FreeBSD compat macros for Linux module params
Nothing in any FreeBSD code uses them.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris
d02d3add0d tunables: ensure tunable and variable have same define gate
If a variable is only available in the kernel, then the tunable should
also only be available there.

This matters very little so long as we don't have userspace tunables,
but its still good hygeine.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris
cc5724f38d tunables: don't assert initialisation in impl getters
It actually doesn't matter if it's not initialised when we first query
the current value; it just returns empty-string. A crash is quite
obnoxious even if it is a rare case.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris
8317244270 zfs_log: make zfs_immediate_write_sz uint
Likely it's only int64 for comparison with ssize_t, which is signed.
However, it would make no sense for it to be less than 0 or greater than
4G, so making it a regular uint will make it safe for comparison and
remove the only S64 tunable in core.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Paul Dagnelie
65cf521353 Only interrupt active disk I/Os in failmode=continue
failmode=continue is in a sorry state. Originally designed to fix a very
specific problem, it causes crashes and panics for most people who end
up trying to use it. At this point, we should either remove it entirely,
or try to make it more usable.

With this patch, I choose the latter. While the feature is fundamentally
unpredictable and prone to race conditions, it should be possible to get
it to the point where it can at least sometimes be useful for some
users. This patch fixes one of the major issues with failmode=continue:
it interrupts even ZIOs that are patiently waiting in line behind stuck
IOs.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17372
2025-06-17 10:49:40 -07:00
Pavel Snajdr
08caad8257 zcp: get_prop: fix encryptionroot and encryption
It was reported that channel programs' zfs.get_prop doesn't work for
dataset properties encryption and encryptionroot.

They are handled in get_special_prop due to the need to call
dsl_dataset_crypt_stats to load those dsl props.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Co-authored-by: Graham Christensen <graham@grahamc.com>
Closes #17280
2025-06-17 10:49:40 -07:00
Fedor Uporov
d187e3e1a7 ZVOL: Comment platform-specific empty functions bodies on FreeBSD side
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #17383
2025-06-17 10:49:40 -07:00
Ameer Hamza
1215c3b609 Expose dataset encryption status via fast stat path
In truenas_pylibzfs, we query list of encrypted datasets several times,
which is expensive. This commit exposes a public API zfs_is_encrypted()
to get encryption status from fast stat path without having to refresh
the properties.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17368
2025-06-17 10:49:40 -07:00
Alexander Motin
2fe0d5df94 ZIL: Improve write log size accounting
Before this change write log size TXG throttling mechanism was
accounting only user payload bytes.  But the actual ZIL both on
disk and especially in memory include headers of hundred(s) of
bytes.  Not accouting those may allow applications like
bonnie++, in their wisdom writing one byte at a time, to consume
excessive amount of memory and ZIL/SLOG in one TXG.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17373
2025-06-17 10:49:40 -07:00
George Amanakis
fa545db846 ZTS: testing for leaked key mappings in encrypted non-raw send
This test covers a bug fixed by commit ea74cde: performing an
incremental non-raw send from an encrypted filesystem followed by
exporting the pool. Before that commit, exporting the sending pool
in this scenario would trigger a panic:

VERIFY(avl_is_empty(&sk->sk_dsl_keys)) failed
PANIC at dsl_crypt.c:353:spa_keystore_fini()
Call Trace:
 spl_dumpstack+0x29/0x2f [spl]
 spl_panic+0xd1/0xe9 [spl]
 spl_assert.constprop.0+0x1a/0x30 [zfs]
 spa_keystore_fini+0xc2/0xf0 [zfs]
 spa_deactivate+0x25f/0x610 [zfs]
 spa_evict_all+0xf4/0x200 [zfs]
 spa_fini+0x13/0x140 [zfs]
 zfs_kmod_fini+0x72/0xc0 [zfs]
 openzfs_fini_os+0x13/0x3a [zfs]
 openzfs_fini+0x9/0x6b8 [zfs]

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #17366
2025-06-17 10:49:40 -07:00
Cameron Harr
cfb9cba51c Refactor man page and CLI help output per mandoc
The man page and the usage statement from the CLI have been refactored
to abide by the ManDoc standard. Style changes include:
 * Upper-case letters before lower-case
 * List short options w/o arguments first
 * Then list short options w/ arguments
 * Then list long arguments

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cameron Harr <harr1@llnl.gov>
Closes #17357
2025-06-17 10:49:40 -07:00
Cameron Harr
b647336cc4 Reformat cli help and man page to be in sync
The man page and CLI usage statements were both a little out
of sync and neither fully alphabetized correctly. That has
been fixed. One outstanding question is whether to get rid of
the ellipses on the CLI usage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cameron Harr <harr1@llnl.gov>
Closes #16004
Closes #17357
2025-06-17 10:49:40 -07:00
Paul Dagnelie
b9324a1e75 Fix off-by-one bug in range tree code
Without this fix, zfs_range_tree_find_in could return an overlap when
the found range starts immediately after the searched range, with no
actual overlap.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17363
2025-06-17 10:49:40 -07:00
Alexander Motin
64e77fdf3b Fix null dereference in spa_vdev_remove_cancel_sync()
We don't really need to access space map to know where the metaslab
ends, while msp->ms_sm might be NULL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Fixes #17164
Fixes #17359
Closes #17361
(cherry picked from commit 5c30b24381)
2025-05-28 16:00:28 -07:00
Andres
fd13ad0503 Update 69-vdev.rules.in
Add support to alias md-type devices in udev rules.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andres <a-d-j-i@users.noreply.github.com>
Closes #17345
(cherry picked from commit a6f20250de)
2025-05-28 16:00:28 -07:00
Rob Norris
b9f5227d2b lzc_ioctl_fd: add ZFS_IOC_TRACE envvar to enable ioctl tracing
When set, dumps all ZFS ioctl calls and returns and their nvlists to
STDERR, to make debugging and understanding a lot easier.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17344
(cherry picked from commit a387b7599c)
2025-05-28 16:00:28 -07:00
Rob Norris
b7d14266f1 lzc: move lzc_ioctl_fd() into lzc proper
Name the OS-specific call lzc_ioctl_fd_os(), and make lzc_ioctl_fd()
wrap it, so we can do more in the wrapper.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17344
(cherry picked from commit c4c3917b2a)
2025-05-28 16:00:28 -07:00
Rob Norris
13768f46e6 libzfs: ensure all ioctl calls go through lzc_ioctl_fd()
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17344
(cherry picked from commit f454cc1723)
2025-05-28 16:00:28 -07:00
Richard Yao
999717c5ac Add Quality Assurance to pull request template
PRs like #17352 have no applicable checkbox, so let us add one.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Richard Yao <richard@ryao.dev>
Closes #17354
(cherry picked from commit 2e5e4bb0f8)
2025-05-28 16:00:28 -07:00
Richard Yao
25ad9ce692 dmu_objset_hold_flags() should call dsl_dataset_rele_flags() on error
This was caught when doing a manual check to see if #17352 needed to be
improved to catch mismatches across stack frames of the kind that were
first found in #17340.

Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard@ryao.dev>
Closes #17353
(cherry picked from commit 83fa80a550)
2025-05-28 16:00:28 -07:00
Ameer Hamza
6b70ca665d arcstat: prevent ZeroDivisionError when L2ARC becomes empty
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard@ryao.dev>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17348
(cherry picked from commit f0baaa329a)
2025-05-28 16:00:28 -07:00
Rob Norris
8c0f7619b2 Linux 6.2/6.15: del_timer_sync() renamed to timer_delete_sync()
Renamed in 6.2, and the compat wrapper removed in 6.15. No signature or
functional change apart from that, so a very minimal update for us.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17229
(cherry picked from commit 841be1d049)
2025-05-28 16:00:28 -07:00
Rob Norris
e64d4718a7 Linux 6.15: mkdir now returns struct dentry *
The intent is that the filesystem may have a reference to an "old"
version of the new directory, eg if it was keeping it alive because a
remote NFS client still had it open.

We don't need anything like that, so this really just changes things so
we return error codes encoded in pointers.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17229
(cherry picked from commit bb740d66de)
2025-05-28 16:00:28 -07:00
Richard Yao
a4de1d38da icp: Use explicit_memset() exclusively in gcm_clear_ctx()
d634d20d1b had been intended to fix a
potential information leak issue where the compiler's optimization
passes appeared to remove `memset()` operations that sanitize sensitive
data before memory is freed for use by the rest of the kernel.

When I wrote it, I had assumed that the compiler would not remove the
other `memset()` operations, but upon reflection, I have realized that
this was a bad assumption to make. I would rather have a very slight
amount of additional overhead when calling `gcm_clear_ctx()` than risk a
future compiler remove `memset()` calls. This is likely to happen if
someone decides to try doing link time optimization and the person will
not think to audit the assembly output for issues like this, so it is
best to preempt the possibility before it happens.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Richard Yao <richard@ryao.dev>
Closes #17343
(cherry picked from commit d8a33bc0a5)
2025-05-28 16:00:28 -07:00
George Amanakis
f28c685a84 Fix 2 bugs in non-raw send with encryption
Bisecting identified the redacted send/receive as the source of the bug
for issue #12014. Specifically the call to
dsl_dataset_hold_obj(&fromds) has been replaced by
dsl_dataset_hold_obj_flags() which passes a DECRYPT flag and creates
a key mapping. A subsequent dsl_dataset_rele_flag(&fromds) is missing
and the key mapping is not cleared. This may be inadvertedly used, which
results in arc_untransform failing with ECKSUM in:
 arc_untransform+0x96/0xb0 [zfs]
 dbuf_read_verify_dnode_crypt+0x196/0x350 [zfs]
 dbuf_read+0x56/0x770 [zfs]
 dmu_buf_hold_by_dnode+0x4a/0x80 [zfs]
 zap_lockdir+0x87/0xf0 [zfs]
 zap_lookup_norm+0x5c/0xd0 [zfs]
 zap_lookup+0x16/0x20 [zfs]
 zfs_get_zplprop+0x8d/0x1d0 [zfs]
 setup_featureflags+0x267/0x2e0 [zfs]
 dmu_send_impl+0xe7/0xcb0 [zfs]
 dmu_send_obj+0x265/0x360 [zfs]
 zfs_ioc_send+0x10c/0x280 [zfs]

Fix this by restoring the call to dsl_dataset_hold_obj().

The same applies for to_ds: here replace dsl_dataset_rele(&to_ds) with
dsl_dataset_rele_flags().

Both leaked key mappings will cause a panic when exporting the
sending pool or unloading the zfs module after a non-raw send from
an encrypted filesystem.

Contributions-by: Hank Barta <hbarta@gmail.com>
Contributions-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12014
Closes #17340
(cherry picked from commit ea74cdedda)
2025-05-28 16:00:28 -07:00
Paul Dagnelie
6517ebf4da Cause zpool scan resume commands to get logged in history
Currently, commands that resume a scrub/errorscrub from a paused state
don't get logged in the pool history. This is because resumes actually
return ECANCELED, instead of 0. This causes the tsd code in the common
ioctl logic to not think the ioctl succeeded, which causes the
log_history ioctl to fail with EPERM. However, for resuming a scrub from
a paused state, ECANCELED is success.

There are two options for how to deal with this. The first is the one
that I implemented here; I can't find a good reason for dmu_scan to
return ECANCELED on resume instead of 0, so let's just not. The only
place we check for the ECANCELED value is in zpool_scan, where we just
convert it back to zero.  However, I am aware that this is changing an
ioctl interface, which I believe is a breaking change. I don't think
it's an important change, but maybe there is someone who relies on it.

The other option that could be implemented is to either allow ECANCELED
specifically from dsl_scan in the common ioctl code, or add a generic
facility to the common ioctl code that allows each command to specify
whether or not success happened, regardless of the return values. I am
open to feedback on which option people think would be better.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #17301
(cherry picked from commit 086105f4c4)
2025-05-28 16:00:28 -07:00
Alexander Motin
97c1fb6ad5 ARC: Notify dbuf cache about target size reduction
ARC target size might drop significantly under memory pressure,
especially if current ARC size was much smaller than the target.
Since dbuf cache size is a fraction of the target ARC size, it
might need eviction too.  Aside of memory from the dbuf eviction
itself, it might help ARC by making more buffers evictable.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17314
(cherry picked from commit 89a8a91582)
2025-05-28 16:00:28 -07:00
Alexander Motin
db290fd48b Linux: Stop using NR_FILE_PAGES for ARC scaling
I've found that QEMU/KVM guest memory accounted as shared also
included into NR_FILE_PAGES.  But it is actually a non-evictable
anonymous memory.  Using it as a base for zfs_arc_pc_percent
parameter makes ARC to ignore shrinker requests while page cache
does not really have anything to evict, ending up in OOM killer
killing the QEMU process.

Instead use of NR_ACTIVE_FILE + NR_INACTIVE_FILE should represent
the part of a page cache that is actually evictable, which should
be safer to use as a reference for ARC scaling.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17334
(cherry picked from commit 0aa83dce99)
2025-05-28 16:00:28 -07:00
Tony Hutter
a2bcf194f8 runners: Add option to install custom kernel on Fedora
Allow installing a custom kernel version from the Fedora experimental
kernel repos onto the github runners.  This is useful for testing if
ZFS works against a newer kernel.

Fedora has a number of repos with experimental kernel packages. This
PR allows installs from kernels in these repos:

@kernel-vanilla/stable
@kernel-vanilla/mainline
(https://fedoraproject.org/wiki/Kernel_Vanilla_Repositories)

You will need to manually kick of a github runner to test with a custom
kernel version.  To do that, go to the github actions tab under
'zfs-qemu' and click the drop-down for 'run workflow'.  In there you
will see a text box to specify the version (like '6.14').  The scripts
will do their best to match the version to the newest matching version
that the repos support (since they're may be multiple nightly versions
of, say, '6.14').  A full list of kernel versions can be seen in the
dependency stage output if you kick off a manual run.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17156
(cherry picked from commit b55256e5bb)
2025-05-28 16:00:28 -07:00
diwakar-kristappagari
280c3c3ace vdev_id: symlinks creation for multipath disk partitions (#17331)
It has been observed that the symlinks are not being created
for the disk partitions on multipath enabled systems.
This fix addresses the issue.

Signed-off-by: Diwakar Kristappagari <diwakar-k@hpe.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
(cherry picked from commit e2ba0f7643)
2025-05-28 16:00:28 -07:00
Rob Norris
2e1f95b6bc AUTHORS/mailmap: update with new contributors
Thanks everyone!

Signed-off-by: Rob Norris <robn@despairlabs.com>
(cherry picked from commit 485a2d0112)
2025-05-28 16:00:28 -07:00
Rob Norris
0dc3656a55 update_authors: output possible mailmap additions
Once we've selected a best ident for the AUTHORS file, it makes sense to
set up a corresponding mailmap entry for any other ident for that
committer, to ensure the git history also reflects this into the future.

So, here we output potential mailmap updates for a human to consider.

For the moment, this needs to be done by a human, because update_authors
uses git to get the author names, and thus is reliant on the mailmap
contents to generate its output, so having it update mailmap directly
would introduce a circular dependency that I'm not totally sure about.
It's definitely better than having to go back through the history and
check each commit by hand though.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
(cherry picked from commit ae2caf9cb0)
2025-05-28 16:00:28 -07:00
Rob Norris
f8bad9b1f9 update_authors: consider Signed-off-by trailers for committer idents
I've increasingly found that commits from new contributors have the
author set in the "Github noreply" obfuscated style. If they do have a
better canonical choice, it's usually in the Signed-off-by: trailer in
the commit message.

I had avoided using these in the first version of this program because
they aren't always present, aren't always correct, and some commits have
multiple signoffs. It seems however that requiring either the name or
the email address to match the commit author sufficiently narrows the
scope to be useful for the "Github noreply" situation, which is really
the main sticking point. And of course, if it gets it wrong, overriding
in .mailmap or AUTHORS is always an option.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
(cherry picked from commit 8e318fda80)
2025-05-28 16:00:28 -07:00
Rob Norris
56a61d54f4 test-runner: rework output dir construction
The old code would compare all the test group names to work out some
sort of common path, but it didn't appear to work consistently,
sometimes placing output in a top-level dir, other times in one or more
subdirs. (I confess, I do not quite understand what it's supposed to
do).

This is a very simple rework that simply looks at all the test group
paths, removes common leading components, and uses the remainder as the
output directory. This should work because groups paths are unique, and
means we get a output dir tree of roughly the same shape as the test
groups in the runfiles and the test source dirs themselves.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: @ImAwsumm
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17167
(cherry picked from commit 9aae14a14a)
2025-05-28 16:00:28 -07:00
Mariusz Zaborski
1c11d3a54f spa: clear checkpoint information during retry
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mariusz Zaborski <oshogbo@FreeBSD.org>
Closes #17319
(cherry picked from commit 8b9c4e643b)
2025-05-28 16:00:28 -07:00
Rob Norris
db988fabfb linux/uio: remove "skip" offset for UIO_ITER
For UIO_ITER, we are just wrapping a kernel iterator. It will take care
of its own offsets if necessary. We don't need to do anything, and if we
do try to do anything with it (like advancing the iterator by the skip
in zfs_uio_advance) we're just confusing the kernel iterator, ending up
at the wrong position or worse, off the end of the memory region.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17298
(cherry picked from commit 2ee5b51a57)
2025-05-28 16:00:28 -07:00
Alan Somers
fa2cdaa604 More aggressively assert that db_mtx protects db.db_data
db.db_mtx must be held any time that db.db_data is accessed.  All of
these functions do have the lock held by a parent; add assertions to
ensure that it stays that way.

See https://github.com/openzfs/zfs/discussions/17118

* Refactor dbuf_read_bonus to make it obvious why db_rwlock isn't
required.

* Refactor dbuf_hold_copy to eliminate the db_rwlock
Copy data into the newly allocated buffer before assigning it to the db.
That way, there will be no need to take db->db_rwlock.

* Refactor dbuf_read_hole
In the case of an indirect hole, initialize the newly allocated buffer
before assigning it to the dmu_buf_impl_t.

Sponsored by:	ConnectWise
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #17209
(cherry picked from commit c17bdc4914)
2025-05-28 16:00:28 -07:00
Olivier Certner
51ed9640e9 FreeBSD: Use new SYSCTL_SIZEOF()
SYSCTL_SIZEOF() has been introduced in FreeBSD by commit "sysctl(9):
Ease exporting struct sizes; Discourage doing that" (713abc9880aa) in
branch 'main'.  It will soon be backported to 'stable/14'.  We will thus
be able to remove the old, alternate version left in the '#else' branch
as soon as 'stable/13' goes out of support (April 30, 2026).

Sponsored-by:   The FreeBSD Foundation
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Olivier Certner <olce@FreeBSD.org>
Closes #17309
(cherry picked from commit 78628a5c15)
2025-05-28 16:00:28 -07:00
Alexander Motin
9b446fbb60 ARC: Avoid overflows in arc_evict_adj() (#17255)
With certain combinations of target ARC states balance and ghost
hit rates it was possible to get the fractions outside of allowed
range.  This patch limits maximum balance adjustment speed, which
should make it impossible, and also asserts it.

Fixes #17210
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
(cherry picked from commit b1ccab1721)
2025-05-28 16:00:28 -07:00
Rob Norris
ce9cd12c97 txg: generalise txg_wait_synced_sig() to txg_wait_synced_flags() (#17284)
txg_wait_synced_sig() is "wait for txg, unless a signal arrives". We
expect that future development will require similar "wait unless X"
behaviour.

This generalises the API as txg_wait_synced_flags(), where the provided
flags describe the events that should cause the call to return.

Instead of a boolean, the return is now an error code, which the caller
can use to know which event caused the call to return.

The existing call to txg_wait_synced_sig() is now
txg_wait_synced_flags(TXG_WAIT_SIGNAL).

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
(cherry picked from commit a7de203c86)
2025-05-28 16:00:28 -07:00
Alexander Motin
52749ebb49 ZTS: Restore some delays in online_offline tests
After more CI runs and code reading after #17259 I've found that
online starts resilver via async mechanism, which does not provide
wait primitives at this time.  Restore some delays to restore CI
until this is properly fixed.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
(cherry picked from commit f85c96edf7)
2025-05-28 16:00:28 -07:00
Alexander Motin
101edf7ed9 Fix race between resilver wait and offline/detach
We should not clear scn_state and notify waiters until we call
vdev_dtl_reassess(), otherwise following offline/detach request
may fail with "no valid replicas".

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
(cherry picked from commit f86d9af16b)
2025-05-28 16:00:28 -07:00
José Luis Salvador Rufo
a2593c1610 tests: fix S_IFMT undeclared at statx.c
`S_IFMT` is declared in `sys/stat.h`, but we cannot include this header
because it redeclares the `statx` function with different argument
types. Therefore, we define `S_IFMT` ourselves, in the same way as the
other definitions.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Closes #17293
Closes #17294
(cherry picked from commit 634c172ee8)
2025-05-28 16:00:28 -07:00
Tony Hutter
fcc7259789 ZTS: Stop zpool_status tests from spamming stdout (#17292)
zpool_status_003 and zpool_status_004_pos use 'dd' to trigger a read of
a file without specifying 'of=/dev/null'.  This spams the ZTS logs
with ~20MB of garbage data. This commit adds 'of=/dev/null'.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
(cherry picked from commit a6cca8a7da)
2025-05-28 16:00:28 -07:00
Tony Hutter
4b014840ea Fix double spares for failed vdev
It's possible for two spares to get attached to a single failed vdev.
This happens when you have a failed disk that is spared, and then you
replace the failed disk with a new disk, but during the resilver
the new disk fails, and ZED kicks in a spare for the failed new
disk.  This commit checks for that condition and disallows it.

Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes: #16547
Closes: #17231
(cherry picked from commit f40ab9e399)
2025-05-28 16:00:28 -07:00
Tino Reichardt
cd777ba5ad ZTS: Fix replacement/resilver_restart_001 on FreeBSD
Decrease the RESILVER_MIN_TIME_MS variable from 50 to 20.
So the test, which expects two 2 resilver starts will see them.

Logfile of the seen failures before this fix:
log: NOTE: expected 2 resilver start(s) after offline/online, found 1
log: expected 2 resilver start(s) after offline/online, found 1

The test time decreases also from around 00:42 to 00:24 seconds.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16822
Closes #17279
(cherry picked from commit 3b18877269)
2025-05-28 16:00:28 -07:00
Artem
ad63ab2d90 Sort the blocking snapshots list #12751 (#17264)
When multiple snapshots prevent the destruction/rollback of the
respective dataset/snapshot/volume via zfs destroy or zfs rollback,
the error message does not list the blocking snapshots sorted
according to their order of creation. This causes inconvenience and can
lead to confusion, and also creates a contrast with a returned message
from zfs list -t snap function.

Closes: #12751

Signed-off-by: Artem-OSSRevival <artem.vlasenko@ossrevival.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
(cherry picked from commit 27f3d94940)
2025-05-28 16:00:28 -07:00
Aleksandr Liber
edae295af9 Double quote variables to prevent globbing and word splitting
This change goes through and quotes variables where appropriate to
avoid issues with incorrect splitting. The performance tests ran into
an issue with $SUDO_COMMAND splitting incorrectly because it was not
quoted. This change fixes that issue and hopefully gets ahead of any
other similar problems.

Reviewed by: John Wren Kennedy <jwk404@gmail.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Aleksandr Liber <aleksandr.liber@perforce.com>
Closes #17235
(cherry picked from commit aa46cc9812)
2025-05-28 16:00:28 -07:00
Rob Norris
c85f2fd531 cred: properly pass and test creds on other threads (#17273)
### Background

Various admin operations will be invoked by some userspace task, but the
work will be done on a separate kernel thread at a later time. Snapshots
are an example, which are triggered through zfs_ioc_snapshot() ->
dsl_dataset_snapshot(), but the actual work is from a task dispatched to
dp_sync_taskq.

Many such tasks end up in dsl_enforce_ds_ss_limits(), where various
limits and permissions are enforced. Among other things, it is necessary
to ensure that the invoking task (that is, the user) has permission to
do things. We can't simply check if the running task has permission; it
is a privileged kernel thread, which can do anything.

However, in the general case it's not safe to simply query the task for
its permissions at the check time, as the task may not exist any more,
or its permissions may have changed since it was first invoked. So
instead, we capture the permissions by saving CRED() in the user task,
and then using it for the check through the secpolicy_* functions.

### Current implementation

The current code calls CRED() to get the credential, which gets a
pointer to the cred_t inside the current task and passes it to the
worker task. However, it doesn't take a reference to the cred_t, and so
expects that it won't change, and that the task continues to exist. In
practice that is always the case, because we don't let the calling task
return from the kernel until the work is done.

For Linux, we also take a reference to the current task, because the
Linux credential APIs for the most part do not check an arbitrary
credential, but rather, query what a task can do. See
secpolicy_zfs_proc(). Again, we don't take a reference on the task, just
a pointer to it.

### Changes

We change to calling crhold() on the task credential, and crfree() when
we're done with it. This ensures it stays alive and unchanged for the
duration of the call.

On the Linux side, we change the main policy checking function
priv_policy_ns() to use override_creds()/revert_creds() if necessary to
make the provided credential active in the current task, allowing the
standard task-permission APIs to do the needed check. Since the task
pointer is no longer required, this lets us entirely remove
secpolicy_zfs_proc() and the need to carry a task pointer around as
well.

Sponsored-by: https://despairlabs.com/sponsor/

Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kyle Evans <kevans@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
(cherry picked from commit c8fa39b46c)
2025-05-28 16:00:28 -07:00
Tino Reichardt
aa9335bbbc ZTS: Optimize KSM on Linux and remove it for FreeBSD
Don't use KSM on the FreeBSD VMs and optimize KSM settings for
Linux to have faster run times.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17247
(cherry picked from commit ba17cedf65)
2025-05-28 16:00:28 -07:00
Quentin Thébault
4f34e8dcf6 zfs-rollback.8: fix typo in example number
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Alexander Ziaee <ziaee@FreeBSD.org>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Quentin Thébault <quentin.thebault@defenso.fr>
Closes #17282
(cherry picked from commit 63de2d2dbd)
2025-05-28 16:00:28 -07:00
Tino Reichardt
a33e8b05ee ZTS: Use Ubuntu default url for cloud-image
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17278
(cherry picked from commit 88ec6c4f40)
2025-05-28 16:00:28 -07:00
Alexander Motin
fbff1ae9f6 ZTS: Make zvol_stress write some more
Sometimes it fails unable to see any injected write errors.
I guess writing 25KB of zeroes might be not enough to trigger
errors with probability set to 10%.  Lets try to write more.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17270
(cherry picked from commit d947b9aedd)
2025-05-28 16:00:28 -07:00
Alexander Motin
273db246a4 ZTS: Reduce extra caching in pool_checkpoint (#17268)
Those tests are write-mostly at the nested pool.  Considering we have
3 more layers of caching underneath, we can hint ZFS how to use the
memory better by setting primarycache=metadata.

While there, add missing zpool sync after rm in checkpoint_capacity
before we could potentially see the freed space, would not there be
a pool checkpoint.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
(cherry picked from commit 1ef706c4ad)
2025-05-28 16:00:28 -07:00
Sebastian Pauka
28f0c5cfdc Support using llvm-libunwind
This commit adds support for using llvm-libunwind for kernels built
using llvm and clang. The two differences are that the largest register
index is given by _LIBUNWIND_HIGHEST_DWARF_REGISTER, we need to check
whether the register is a floating point register and the prototype
for unw_regname takes the unwind cursor as the first argument.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Sebastian Pauka <me@spauka.se>
Closes #17230
(cherry picked from commit 1b4826b9a2)
2025-05-28 16:00:28 -07:00
Brian Atkinson
a77d641f01 Export correct symbols for Lustre Direct I/O
Originally the Lustre ZFS OSD code was going to use zfs_uio_t structs
for supporting Direct I/O with ZFS. However, this has changed to using
abd_t structs instead. This exports the proper symbols that will be used
by the Lustre ZFS OSD code.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #17256
(cherry picked from commit 7031a48c70)
2025-05-28 16:00:28 -07:00
Artem-OSSRevival
0956fd736c Add more descriptive destroy error message
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Artem-OSSRevival <artem.vlasenko@ossrevival.org>
Fixes: #14538
Closes: #17234
(cherry picked from commit 37a3e26552)
2025-05-28 16:00:28 -07:00
Alexander Motin
7bb7ff7b49 ZTS: Fix 256MB file leak in zed_cksum_reported
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes: #17267
(cherry picked from commit 38c3a8be83)
2025-05-28 16:00:28 -07:00
Tino Reichardt
658526db99 ZTS: Update FreeBSD version numbers
All defined variants:
- freebsd13-4r, freebsd13-5r, freebsd14-1r, freebsd14-2r (RELEASE)
- freebsd13-5s, freebsd14-2s (STABLE)
- freebsd15-0c (CURRENT)

Used for testing:
- freebsd13-4r (RELEASE)
- freebsd14-2s (STABLE)
- freebsd15-0c (CURRENT)

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes: #17260
(cherry picked from commit 6afb405d96)
2025-05-28 16:00:28 -07:00
Alexander Motin
b590bfc6c8 ZTS: Remove fixed sleeps from slog_006_pos
Replace `sleep 15` with `zpool wait`, which should take much less
than the 15 seconds.  And considering it is called 16 times, this
should save us up to 4 minutes total.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes: #17257
(cherry picked from commit 8f2c2dea3c)
2025-05-28 16:00:28 -07:00
Alexander Motin
03ac770008 ZTS: Polish online_offline tests
- Kill workload first for faster cleanup.
 - Use `zpool wait` for resilver instead of `sleep`.
 - Remove irrelevant workload from `online_offline_003_neg`.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes: #17259
(cherry picked from commit cb49e7701f)
2025-05-28 16:00:28 -07:00
Alexander Motin
95df01020d ZTS: Remove ashift setting from dedup_quota test (#17250)
The test writes 1M of 1KB blocks, which may produce up to 1GB of
dirty data.  On top of that ashift=12 likely produces additional
4GB of ZIO buffers during sync process.  On top of that we likely
need some page cache since the pool reside on files.  And finally
we need to cache the DDT.  Not surprising that the test regularly
ends up in OOMs, possibly depending on TXG size variations.

Also replace fio with pretty strange parameter set with a set of
dd writes and TXG commits, just as we neeed here.

While here, remove compression.  It has nothing to do here, but
waste CI CPU time.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
(cherry picked from commit 1d8f625233)
2025-05-28 16:00:28 -07:00
Alexander Motin
243a46f28d Cleanup VERIFY() macros (#17163)
- Fix VERIFY3B() when given non-boolean values.
 - Map EQUIV() into VERIFY3B(,==,) as equivalent.
 - Tune messages for better readability and to closer match source
code for easier search.  Unify user-space messages with kernel.
 - Tune printed types and remove %px outside of Linux kernel.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: @ImAwsumm
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
(cherry picked from commit 4866c2fabf)
2025-05-28 16:00:28 -07:00
Rob Norris
7fde3933fb vdev_to_nvlist_iter: ignore draid parameters when matching names (#17228)
Various tools will display draid vdev names with parameters embedded in
them, but would not accept them as valid vdev names when looking them
up, making it difficult to build pipelines involving draid vdevs.

This commit makes it so that if a full draid name is offered for match,
it gets truncated at the first ':' character.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
(cherry picked from commit 131df3bbf2)
2025-05-28 16:00:28 -07:00
Alexander Motin
c2424f8d1a Improve L2 caching control for prefetched indirects
dbuf_prefetch_impl() should look on level of current indirect, not
the target prefetch level.  dbuf_prefetch_indirect_done() should
call dnode_level_is_l2cacheable() if we have dpa_dnode to pass it.
It should fix some both false positive and negative L2ARC caching.

While there, fix redacted feature activation assertions.  One was
always true, while another could give false positive if dpa_dnode
is NULL.

George Amanakis <gamanakis@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17204
(cherry picked from commit a497c5fc8b)
2025-05-28 16:00:28 -07:00
Alexander Motin
40b9ad19cc ZTS: Remove TXG_TIMEOUT from dedup_quota test (#17150)
It seems `fio` in `ddt_dedup_vdev_limit` overwhelms the system
with the amount of dirty data caused by DDT updates within one
TXG due to tiny 1KB records used, while I see no reason for this
test to extend the TXGs beyond default.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: @ImAwsumm
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
(cherry picked from commit 21850f519b)
2025-05-28 16:00:28 -07:00
Alexander Motin
602fecc316 Prefer embedded blocks to dedup
Since embedded blocks introduction 11 years ago, their writing was
blocked if dedup is enabled.  After searching through the modern
code I see no reason for this restriction to exist.  Same time
embedded blocks are dramatically cheaper.  Even regular write of
so small blocks would likely be cheaper than deduplication, even
if the last is successful, not mentioning otherwise.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17113
(cherry picked from commit 09f4dd06c3)
2025-05-28 16:00:28 -07:00
Alexander Motin
588fa16830 ZAP: Reduce leaf array and free chunks fragmentation
Previous implementation of zap_leaf_array_free() put chunks on the
free list in reverse order.  Also zap_leaf_transfer_entry() and
zap_entry_remove() were freeing name and value arrays in reverse
order.  Together this created a mess in the free list, making
following allocations much more fragmented than necessary.

This patch re-implements zap_leaf_array_free() to keep existing
chunks order, and implements non-destructive zap_leaf_array_copy()
to be used in zap_leaf_transfer_entry() to allow properly ordered
freeing name and value arrays there and in zap_entry_remove().

With this change test of some writes and deletes shows percent of
non-contiguous chunks in DDT reducing from 61% and 47% to 0% and
17% for arrays and frees respectively.  Sure some explicit sorting
could do even better, especially for ZAPs with variable-size arrays,
but it would also cost much more, while this should be very cheap.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16766
(cherry picked from commit 9a81484e35)
2025-05-28 16:00:28 -07:00
Tony Hutter
92f430b00f Tag zfs-2.3.2
Some checks failed
checkstyle / checkstyle (push) Has been cancelled
CodeQL / Analyze (cpp) (push) Has been cancelled
CodeQL / Analyze (python) (push) Has been cancelled
zloop / zloop (push) Has been cancelled
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2025-04-30 10:58:28 -07:00
Tony Hutter
2f8fc4a869 RPM: Hold back incompatible kernel packages on Fedora
A user reported that when your upgrade your kernel packages on Fedora
with ZFS installed, only the kernel-devel package gets held back to the
ZFS-supported version, but not the other kernel packages. So if ZFS only
supports the 6.13 kernel, Fedora will still happily upgrade the kernel
RPM to 6.14, but hold back kernel-devel at 6.13, for example.

This commit includes version checks for the 'kernel-uname-r' dependency,
typically provided by the 'kernel-core' package.

Original-patch-by: @jkool702
Reviewed-by: @jkool702
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17265
Closes #17271
2025-04-30 10:58:28 -07:00
Tony Hutter
a39a14eb6e CI: Add Fedora 42 runner (#17249)
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-04-22 12:29:07 -07:00
Tony Hutter
36864e3d77 GCC 15: Fix unterminated-string-initialization (#17244)
Fix build errors on Fedora 42 like:

  module/zcommon/zfs_valstr.c:193:16: error: initializer-string for
  array of 'char' truncates NUL terminator but destination lacks
  'nonstring' attribute (3 chars into 2 available)

The arrays in zpool_vdev_os.c and zfs_valstr.c don't need to be
NULL terminated, but we do so to make GCC happy.

Closes: #17242

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-04-16 09:59:45 -07:00
Rob Norris
fea534d1d0 gcm_avx_init: zero the ghash state after hashing the IV
IVs != 96 bits get hashed with GHASH to bring them to 96 bits. Any call
to GHASH will mix the ghash state in gcm_ghash. This is expected to be
zero at first use in an encrypt or decrypt operation, so it needs to be
zeroed after using GHASH in setup.

gcm_init() does this, but gcm_avx_init() zeroed it before setup, not
after, resulting in incorrect encrypt/decrypt results when using AVX GCM
with an IV != 96 bits.

OpenZFS _always_ uses a 96 bit IV (ZIO_DATA_IV_LEN) so this will never
have been hit in any real-world use, which is extremely fortunate, as we
would have incorrectly-encrypted data on-disk. Still, as long as we have
this code here we should make sure it's correct.

Thanks-to: Joel Low <joel@joelsplace.sg>
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
2025-04-16 09:59:45 -07:00
IIIPr0t0typ3III
cc43549b08 Fixed zfs_notify_email for programs like sendmail
zfs_notify_email will now include an empty line separating the header
from the body of the email in case the subject is not provided via a
command line argument. This is necessary for programs like sendmail to
function correctly (everything up to the first empty line is interpreted
as header, which previously resulted in either missing message parts or
unsent emails)

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Felix Schmidt <felixschmidt20@aol.com>
Closed #17238
2025-04-16 09:59:45 -07:00
Rob Norris
04b02f0663 config: fix ZFS_LINUX_TEST_RESULT_SYMBOL with --enable-linux-builtin
The tiniest typo in dd2a46b5e6 (#17106) broke it, by setting the wrong
var with the test var, resulting in it always producing "no".

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17236
2025-04-16 09:59:45 -07:00
Tony Hutter
20f00819f3 Linux 6.0 compat: Check for migratepage VFS (#17217)
The 6.0 kernel removes the 'migratepage' VFS op. Check for
migratepage.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org
2025-04-16 09:59:45 -07:00
Tony Hutter
81de1eae4c debian: Add libtirpc-dev dependency (#17220)
Debian requires libtirpc-dev.  Update our debian/control file to
match Debian's upstream one.

Closes: #17197

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: @manfromafar
2025-04-16 09:59:45 -07:00
Richard Kojedzinszky
5952fc15b9 Fix memory leaks in pool properties handling
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Kojedzinszky <richard@kojedz.in>
Closes #17208
2025-04-16 09:59:45 -07:00
Syed Shahrukh Hussain
a486cac359 Added fix for zpool get state segfaults with two or more vdevs (#15972). (#17213)
The problem was identified in handling of the zpool get state command
line arguments. A pointer vdev was used to point to the argv[1], and
its address set to cb.cb_vdevs.cb_names(pointer to array of strings)
so any increment to cb_names resulted in a segfault. Fix covers a
special case of root parameter at argv[1] and remaining cases are
handled by passing in the argv + 1, which allows cb_names iteration
of next command line arguments (vdevs).

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>

Signed-off-by: Syed Shahrukh Hussain <syed.shahrukh@ossrevival.org>
2025-04-16 09:59:45 -07:00
Paul Dagnelie
fbac52e1e9 Fix FDT rollback to not overwrite unnecessary fields (#17205)
When a dedup write fails, we try to roll the DDT entry back to a known
good state. However, this also rolls the refcounts and the last-update
time back to the state they were at when we started this write. This
doesn't appear to be able to cause any refcount leaks (after the fix in
17123). This PR prevents that from happening by only rolling back the
parts of the DDT entry that have been updated by the write so far.

Sponsored-by: iXsystems, Inc.
Sponsored-by: Klara, Inc.

Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-04-16 09:59:45 -07:00
Rob Norris
8539bdf568 [2.3.2] uconv: add SPDX license tag
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-04-16 09:59:45 -07:00
Martin Matuška
c312a988b5 freebsd: unbreak module/Makefile.bsd build on 15-CURRENT-arm64
- don't include foreign machine assembly files
- reduce diff to FreeBSD module Makefile

Discovered in FreeBSD port filesystems/openzfs-kmod

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #17219
2025-04-16 09:59:45 -07:00
Paul Dagnelie
bd5465e4eb Fix nonrot property being incorrectly unset (#17206)
When opening a vdev and setting the nonrot property, we used to wait for
each child to be opened before examining its nonrot property. When the
change was made to open vdevs asynchronously, we didn't move the nonrot
check out of the main loop. As a result, the nonrot property is almost
always set to false, regardless of the actual type of the underlying
disks. The fix is simply to move the nonrot check to a separate loop
after the taskq has been waited for.

Sponsored-by: Klara, Inc.
Sponsored-by: Eshtek, Inc.

Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-04-16 09:59:45 -07:00
Martin Matuška
5fb1d520fe Multiple printf() size fixes (#17199)
cmd/zinject/zinject.c:
 - use PRIu64 when printing uint64_t

tests/zfs-tests/cmd/clonefile.c:
 - use an unsigned long long to store result from strtoull()
 - use %jd for printing off_t, %zu for size_t, %zd for ssize_t

tests/zfs-tests/tests/functional/vdev_disk/page_alignment.c:
 - use %zx to print size_t

Discovered when compiling on FreeBSD i386.

Signed-off-by: Martin Matuska <mm@FreeBSD.org>

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: @ImAwsumm
2025-04-16 09:59:45 -07:00
Alexander Motin
6f2080f1ab Fix lock reversal on device removal cancel
FreeBSD kernel's WITNESS code detected lock ordering violation in
spa_vdev_remove_cancel_sync().  It took svr_lock while holding
ms_lock, which is opposite to other places.  I was thinking to
resolve it similar to #17145, but looking closer I don't think
we even need svr_lock at that point, since we already asserted
svr_allocd_segs is empty, and we don't need to add there segments
we are going to call free_mapped_segment_cb for.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17164
2025-04-16 09:59:45 -07:00
Paul Dagnelie
9f0be8fca0 Fix dspace underflow bug
Since spa_dspace accounts only normal allocation class space,
spa_nonallocating_dspace should do the same.  Otherwise we may get
negative overflow or respective assertion spa_update_dspace() if
removed special/dedup vdev is bigger than all normal class space.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17183
2025-04-16 09:59:45 -07:00
Piotr Kubaj
12657df52a simd_powerpc.h: enable FPU on FreeBSD
FreeBSD nowadays supports FPU in the kernel on powerpc*, so enable it.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Piotr Kubaj <pkubaj@FreeBSD.org>
Closes #17191
2025-04-16 09:59:45 -07:00
aokblast
153c982aac spl_vfs: fix vrele task runner signature mismatch
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: SHENGYI HONG <aokblast@FreeBSD.org>
Closes #17101
2025-04-16 09:59:45 -07:00
Attila Fülöp
398bdcd884 ZTS: Fix zpool dry run tests output formating
Signed-off-by: Attila Fülöp <attila@fueloep.org>
2025-04-16 09:59:45 -07:00
Attila Fülöp
a9c37b7c38 ZTS: Fix zpool dry run tests depending on output format
Signed-off-by: Attila Fülöp <attila@fueloep.org>
2025-04-16 09:59:45 -07:00
Friedrich Weber
1c6d2f71aa contrib/initramfs: use LVM autoactivation for activating VGs (#17125)
Currently, the zfs initramfs-tools boot script under local-top calls
`vgchange -ay`, which unconditionally activates all logical volumes
(LVs) in all discovered volume groups (VGs). This causes all LVs to be
active after boot. However, users may prefer to not activate certain
VGs/LVs on boot. They might normally use the `--setautoactivation n`
VG/LV flag or the `auto_activation_volume_list` LVM config option to
achieve this, but since the script unconditionally activates all LVs,
neither has an effect.

To fix this, call `vgchange -aay` instead. This triggers LVM
autoactivation, which honors autoactivation settings such as the
`--setautoactivation` flag. It is also more in line with the LVM
documentation, which says autoactivation is "meant to be used by
activation commands that are run automatically by the system" [1].

Note that this change might break misconfigured setups that have ZFS
on top of an LV for which autoactivation is disabled.

[1] https://gitlab.com/lvmteam/lvm2/-/blob/cff93e4d/conf/example.conf.in#L1579


Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>

Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-04-16 09:59:45 -07:00
Ameer Hamza
ab455c7b80 zed: Ensure spare activation after kernel-initiated device removal
In addition to hotplug events, the kernel may also mark a failing vdev
as REMOVED. This was observed in a customer report and reproduced by
forcing the NVMe host driver to disable the device after a failed reset
due to command timeout. In such cases, the spare was not activated
because the device had already transitioned to a REMOVED state before
zed processed the event.
To address this, explicitly attempt hot spare activation when the
kernel marks a device as REMOVED.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17187
2025-04-16 09:59:45 -07:00
Rob Norris
76bd2ae5c8 config: cache results of kernel checks (#17106)
Kernel checks are the heaviest part of the configure checks. This allows
the results to be cached through the normal autoconf cache.

Since we don't want to reuse cached values for different kernels, but
don't want to discard the entire cache on every kernel, we instead add a
short checksum to kernel config cache keys, based on the version and
path, so the cache can hold results for multiple different kernels.

Sponsored-by: https://despairlabs.com/sponsor/

Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-04-16 09:59:45 -07:00
Alexander Motin
e6f8c1f612 Block remap for cloned blocks on device removal
When after device removal we handle block pointers remap, skip blocks
that might be cloned.  BRTs are indexed by vdev id and offset from
block pointer's DVA[0].  So if we start addressing the same block by
some different DVA, we won't get the proper reference counter.  As
result, we might either remap the block twice, that may result in
assertion during indirect mapping condense, or free it prematurely,
that may result in data overwrite, or free it twice, that may result
in assertion in spacemap code.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #15604
Closes #17180
2025-04-16 09:59:45 -07:00
Tony Hutter
4f2118edaf runners: Fix tarball build for zfs-qemu-packages workflow (#17158)
The initial tarballs we built for for zfs-2.3.1 were incorrect since
they did not have a ./configure script, and their files were not
in a top level zfs-2.3.1/ directory.  This commit copies the way we
built them on buildbot so the tarballs are created as expected.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-04-16 09:59:45 -07:00
Tony Hutter
8f6f85472f runners: Fix zfs-release RPM creation (#17173)
The zfs-qemu-packages workflow was incorrectly copying the built
zfs-release RPMs to ~/zfsonlinux.github.com rather than ~/zfs.  This
meant that the RPMs were not being correctly picked in the artifacts
files.  This fixes the issue.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: @ImAwsumm
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
2025-04-16 09:59:45 -07:00
Pavel Snajdr
c22f5c1c55 Linux: Fix zfs_prune panics v2 (#17121)
It turns out that approach taken in the original version of the patch
was wrong. So now, we're taking approach in-line with how kernel
actually does it - when sb is being torn down, access to it
is serialized via sb->s_umount rwsem, only when that lock is taken
is it okay to work with s_flags - and the other mistake I was doing
was trying to make SB_ACTIVE work, but apparently the kernel checks
the negative variant - not SB_DYING and not SB_BORN.

Kernels pre-6.6 don't have SB_DYING, but check if sb is hashed
instead.

Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-04-16 09:59:45 -07:00
Tony Hutter
01095b288f ZTS: Fix zpool_status_features_001_pos local test (#17174)
Update 'zfs-helpers.sh -i' to install the compatibility.d/ file
symlinks. These are need to run the zpool_status_features_001_pos test
from a local workspace (as opposed to running ZTS from a formal
'make install' or install from RPMs, which are unaffected).

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: @ImAwsumm
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-04-16 09:59:45 -07:00
Simon Howard
eff0634c1c Disambiguate reference to kibibytes, not kilobytes
A minor nitpick that is kind of obvious based on the surrounding context
and reference to powers of two. It's better to be explicit, though.

Signed-off-by: Simon Howard <fraggle@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-04-16 09:59:45 -07:00
Simon Howard
2c41e2a109 Fix spelling errors
Unlike some of my other fixes which are more subtle, these are
unambigously spelling errors.

Signed-off-by: Simon Howard <fraggle@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-04-16 09:59:45 -07:00
Simon Howard
81bcab3525 Correct "umount" to "unmount" in a couple of places
This is admittedly a nitpicky change, but `umount` is the command that
performs an *unmount*. So if we are talking about unmounting something
we should phrase it that way.

Signed-off-by: Simon Howard <fraggle@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-04-16 09:59:45 -07:00
Simon Howard
1422bcaaf8 Capitalize in various places where appropriate
These are mostly acronyms (CPUs; ZILs) but also proper nouns such as
"Unix" and "Unicode" which should also be capitalized.

Signed-off-by: Simon Howard <fraggle@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-04-16 09:59:45 -07:00
Simon Howard
97c2569371 Fix cases where "descendent" is used as a noun
As per Wiktionary: "descendent" may be used as an adjective (e.g.
"a descendent dataset") but for nouns (e.g. "descendants of this
dataset"), "descendant" is the correct spelling.

Signed-off-by: Simon Howard <fraggle@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-04-16 09:59:45 -07:00
Simon Howard
2bbc8e10ae Make use of "i.e." (id est) consistent
This is the most common way it is written throughout the manpages, but
there are a few cases where it is written slightly differently.

Signed-off-by: Simon Howard <fraggle@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-04-16 09:59:45 -07:00
Simon Howard
74fadc0cc9 Harmonize on American spelling in several places
Most of the documentation is written in American English, so it makes
sense to be consistent.

Signed-off-by: Simon Howard <fraggle@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-04-16 09:59:45 -07:00
Alexander Motin
08b0a161cc CI: Remove FreeBSD 13.3 and 14.1 tests (#17162)
They are out of support and we are really low on CI resources.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
2025-04-16 09:59:45 -07:00
Brian Atkinson
559d34e8c2 Updating dio_read_verify ZTS test (#16830)
There was a recent CI ZTS test failure on FreeBSD 14 for the
dio_read_verify test case. The failure reported there was no ARC reads
while the buffer wes being manipulated. All checksum verify errors for
Direct I/O reads are rerouted through the ARC, so there should be ARC
reads accounted for. In order to help debug any future failures of this
test case, the order of checks has been changed. First there is a check
for DIO verify failures for the reads and then ARC read counts are
checked.

This PR also contains general cleanup of the comments in the test
script.

Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-04-16 09:59:45 -07:00
Alexander Motin
a848b05b13 Fix deadlock on I/O errors during device removal
spa_vdev_remove_thread() should not hold svr_lock while loading a
metaslab.  It may block ZIO threads, required to handle metaslab
loading, at least in case of read errors causing recovery writes.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17145
2025-04-16 09:59:45 -07:00
Alan Somers
7cc60afb0b Always perform bounds-checking in metaslab_free_concrete
The vd->vdev_ms access can overflow due to on-disk corruption, not just
due to programming bugs.  So it makes sense to check its boundaries even
in production builds.

Sponsored by:	ConnectWise
Reviewed by: Alek Pinchuk <pinchuk.alek@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Closes #17136
2025-04-16 09:59:45 -07:00
Rob Norris
b0f2bcd063 convert_wycheproof: don't check tag len on invalid tests
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
2025-04-16 09:59:45 -07:00
Rob Norris
a6a2f37acd convert_wycheproof: fix compile failure
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
2025-04-16 09:59:45 -07:00
Rob Norris
9e009acbdc dmu_tx: rename dmu_tx_assign() flags from TXG_* to DMU_TX_* (#17143)
This helps to avoids confusion with the similarly-named
txg_wait_synced().

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-04-16 09:59:45 -07:00
Rob Norris
be73f72453 spdxcheck: program to check SPDX license tags
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-04-16 09:59:45 -07:00
Rob Norris
76d0c74c35 SPDX: license tags: LicenseRef-OpenZFS-ThirdParty-PublicDomain
SPDX have repeatedly rejected the creation of a tag for a public domain
dedication, as not all dedications are clear and unambiguious in their
meaning and not all jurisdictions permit relinquishing a copyright
anyway.

A reasonably common workaround appears to be to create a local
(project-specific) identifier to convey whatever meaning the project
wishes it to. To cover OpenZFS' use of third-party code with a public
domain dedication, we use this custom tag.

Further reading:
- https://github.com/spdx/old-wiki/blob/main/Pages/Legal%20Team/Decisions/Dealing%20with%20Public%20Domain%20within%20SPDX%20Files.md
- https://spdx.github.io/spdx-spec/v2.3/other-licensing-information-detected/
- https://cr.yp.to/spdx.html

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-04-16 09:59:45 -07:00
Rob Norris
c30a228608 SPDX: license tags: OpenSSL-standalone
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-04-16 09:59:45 -07:00
Rob Norris
846796c424 SPDX: license tags: Brian-Gladman-3-Clause
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-04-16 09:59:44 -07:00
Rob Norris
e4a2ab7c90 SPDX: license tags: BSD-2-Clause OR GPL-2.0-only
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-04-16 09:59:44 -07:00
Rob Norris
38468bbad6 SPDX: license tags: BSD-3-Clause OR GPL-2.0-only
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-04-16 09:59:44 -07:00
Rob Norris
61d88b0243 SPDX: license tags: LGPL-2.1-or-later
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-04-16 09:59:44 -07:00
Rob Norris
6b2c046d18 SPDX: license tags: GPL-2.0-or-later
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-04-16 09:59:44 -07:00
Rob Norris
9070f890e1 SPDX: license tags: Apache-2.0
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-04-16 09:59:44 -07:00
Rob Norris
091da72c66 SPDX: license tags: MIT
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-04-16 09:59:44 -07:00
Rob Norris
8cacac7ed4 SPDX: license tags: BSD-3-Clause
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-04-16 09:59:44 -07:00
Rob Norris
865ca576ab SPDX: license tags: BSD-2-Clause
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-04-16 09:59:44 -07:00
Rob Norris
9530eb64e0 SPDX: license tags: CDDL-1.0
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-04-16 09:59:44 -07:00
Rob Norris
059ab2ddc2 ZTS: ICP encryption tests
This commit adds tests that ensure that the ICP crypto_encrypt() and
crypto_decrypt() produce the correct results for all implementations
available on this platform.

The actual ZTS scripts are simple drivers for the crypto_test program in
it's "correctness" mode. This mode takes a file full of test vectors
(inputs and expected outputs), runs them, and checks that the results
are expected. It will run the tests for each implementation of the
algorithm provided by the ICP.

The test vectors are taken from Project Wycheproof, which provides a
huge number of tests, including exercising many edge cases and common
implementation mistakes. These tests are provided are JSON files, so a
program is included here to convert them into a simpler line-based
format for crypto_test to consume.

crypto_test also has a "performance" mode, which will run simple
benchmarks against all implementations provded by the ICP and output
them for comparison. This is not used by ZTS, but is available to assist
with development of new implementations of the underlying primitives.

Thanks-to: Joel Low <joel@joelsplace.sg>
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
2025-04-16 09:59:44 -07:00
Rob Norris
4541c95a9b ZTS: test clearing pool and vdev userprops
Confirming that clearing pool and vdev userprops produce the same
result: an empty value, with default source.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16887
2025-04-16 09:59:44 -07:00
Rob Norris
3062b3866c spa_sync_props: remove pool userprops by setting empty-string
People have noted there's no way to remove a pool userprop, only zero
it. Turns vdev userprops had a method, by setting empty-string. So this
makes pool userprops follow the same behaviour.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16887
2025-04-16 09:59:43 -07:00
shodanshok
52f3f92bbf Add receive:append permission for limited receive
Force receive (zfs receive -F) can rollback or destroy snapshots and
file systems that do not exist on the sending side (see zfs-receive man
page). This means an user having the receive permission can effectively
delete data on receiving side, even if such user does not have explicit
rollback or destroy permissions.

This patch adds the receive:append permission, which only permits
limited, non-forced receive. Behavior for users with full receive
permission is not changed in any way.

Fixes #16943
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #17015
2025-04-02 17:06:40 -07:00
Alan Somers
91656b4e2a Update FreeBSD CI images
* FreeBSD 12 is EoL.  Drop it.
* Use the latest FreeBSD 13 and 14 versions.
* Add FreeBSD 15.0-CURRENT.
* Use the current python version.

Sponsored by:	ConnectWise
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Closes #17139
2025-04-02 17:05:31 -07:00
Alexander Motin
53cbf06d68 Fix deduplication of overridden blocks
Implementation of DDT pruning introduced verification of DVAs in
a block pointer during ddt_lookup() to not by mistake free previous
pruned incarnation of the entry.  But when writing a new block in
zio_ddt_write() we might have the DVAs only from override pointer,
which may never have "D" flag to be confused with pruned DDT entry,
and we'll abandon those DVAs if we find a matching entry in DDT.

This fixes deduplication for blocks written via dmu_sync() for
purposes of indirect ZIL write records, that I have tested.  And
I suspect it might actually allow deduplication for Direct I/O,
even though in an odd way -- first write block directly and then
delete it later during TXG commit if found duplicate, which part
I haven't tested.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17120
2025-04-02 17:05:24 -07:00
Rob Norris
6503f8c6f0 Linux/vnops: implement STATX_DIOALIGN
This statx(2) mask returns the alignment restrictions for O_DIRECT
access on the given file.

We're expected to return both memory and IO alignment. For memory, it's
always PAGE_SIZE. For IO, we return the current block size for the file,
which is the required alignment for an arbitrary block, and for the
first block we'll fall back to the ARC when necessary, so it should
always work.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16972
2025-04-02 17:04:14 -07:00
Alan Somers
ad07b09cc3 Verify every block pointer is either embedded, hole, or has a valid DVA
Now instead of crashing when attempting to read the corrupt block
pointer, ZFS will return ECKSUM, in a stack that looks like this:

```
none:set-error
zfs.ko`arc_read+0x1d82
zfs.ko`dbuf_read+0xa8c
zfs.ko`dmu_buf_hold_array_by_dnode+0x292
zfs.ko`dmu_read_uio_dnode+0x47
zfs.ko`zfs_read+0x2d5
zfs.ko`zfs_freebsd_read+0x7b
kernel`VOP_READ_APV+0xd0
kernel`vn_read+0x20e
kernel`vn_io_fault_doio+0x45
kernel`vn_io_fault1+0x15e
kernel`vn_io_fault+0x150
kernel`dofileread+0x80
kernel`sys_read+0xb7
kernel`amd64_syscall+0x424
kernel`0xffffffff810633cb
```

This patch should hopefully also prevent such corrupt block pointers
from being written to disk in the first place.

And in zdb, don't crash when printing a block pointer with no valid
DVAs.  If a block pointer isn't embedded yet doesn't have any valid
DVAs, that's a data corruption bug.  zdb should be able to handle the
situation gracefully.

Finally, remove an extra check for gang blocks in SNPRINTF_BLKPTR.  This
check, which compares the asizes of two different DVAs within the same
BP, was added by illumos-gate commit b24ab67[^1], and I can't understand
why.  It doesn't appear to do anything useful, so remove it.

[^1]: b24ab67627

Fixes		#17077
Sponsored by:	ConnectWise
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Alek Pinchuk <pinchuk.alek@gmail.com>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Closes #17078
2025-04-02 17:03:01 -07:00
Rob Norris
49da67eb39 AUTHORS: refresh with recent new contributors
The reward for good work is more work.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closed #17141
2025-04-02 17:03:01 -07:00
Fabian-Gruenbichler
112dca3483 linux: zvols: correctly detect flush requests (#17131)
since 4.10, bio->bi_opf needs to be checked to determine all kinds of
flush requests. this was the case prior to the commit referenced below,
but the order of ifdefs was not the usual one (newest up top), which
might have caused this to slip through.

this fixes a regression when using zvols as Qemu block devices, but
might have broken other use cases as well. the symptoms are that all
sync writes from within a VM configured to use such a virtual block
devices are ignored and treated as async writes by the host ZFS layer.

this can be verified using fio in sync mode inside the VM, for example
with

 fio \
 --filename=/dev/sda --ioengine=libaio --loops=1 --size=10G \
 --time_based --runtime=60 --group_reporting --stonewall --name=cc1 \
 --description="CC1" --rw=write --bs=4k --direct=1 --iodepth=1 \
 --numjobs=1 --sync=1

which shows an IOPS number way above what the physical device underneath
supports, with "zpool iostat -r 1" on the hypervisor side showing no
sync IO occuring during the benchmark.

with the regression fixed, both fio inside the VM and the IO stats on
the host show the expected numbers.

Fixes: 846b598519
"config: remove HAVE_REQ_OP_* and HAVE_REQ_*"

Signed-off-by: Fabian-Gruenbichler <f.gruenbichler@proxmox.com>
Co-authored-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-04-02 17:03:01 -07:00
Tony Hutter
a2c5295d70 zed: Print return code on failed zpool_prepare_disk
We had a case where we were autoreplacing a disk and
zpool_prepare_disk failed for some reason, and ZED
didn't log the return code.  This commit logs the code.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17124
2025-04-02 17:03:01 -07:00
Alexander Motin
f145371660 Check portable objset MAC even if local is zeroed
PR #14161 made spa_do_crypt_objset_mac_abd() to ignore MAC errors
if local MAC can not be calculated at the time.  But it does not
mean we should also ignore portable MAC errors there.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17122
2025-04-02 17:03:01 -07:00
Paul Dagnelie
8100e476ea Add more DDT tests
The new Fast Dedup feature has a lot of moving parts, and only some of
them have tests. We have some tests for prefetch and quota, and a
generic ZAP shrinking test, but we don't have anything for the pruning
command or specific to DDT zap shrinking. Here we add a couple small new
tests for zpool ddtprune and DDT-specific ZAP shrinking.

Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17049
2025-04-02 17:03:01 -07:00
Rob Norris
95bd9d4ebc ZTS: replace uses of TMPDIR with mktemp
Most of these are trying to use TMPDIR to put their work files somewhere
sensible. Now that we've set up correctly, they can all just use mktemp
to do the job.

In a couple of places cleaning up temp files wasn't being done
correctly, which has been fixed.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
2025-04-02 17:03:01 -07:00
Rob Norris
6f8ff94016 ZTS: make uses of mktemp consistent
In all cases, rely on mktemp itself to make the best decision about
where to place the file or directory. In all cases, that decision will
be $TMPDIR, which we have set globally.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
2025-04-02 17:03:01 -07:00
Rob Norris
a1786ca93c ZTS: zfs-tests: set TMPDIR to FILEDIR
Many tests use mktemp to create temporary files and dirs, which will
usually put them in /tmp unless instructed otherwise. This had led to
many tests trying to give mktemp a useful temp path in ad-hoc ways, and
others just using it directly without knowing they're potentially
leaving stuff lying around.

So we set TMPDIR to FILEDIR, which makes the simplest uses of mktemp put
things in the wanted work dir.

Included here is a hack to get TMPDIR into the test. If a test has to be
run as a different user (most of them), it is run through sudo. ld.so
from glibc will not pass TMPDIR to a setuid program, so instead we
re-set TMPDIR after sudo before running the target command.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
2025-04-02 17:03:01 -07:00
Rob Norris
c2a1ea3a33 ZTS: test-runner: always apply timestamp to outputdir before updating
The default outputdir had a timestamp appended in TestRun.__init__, and
then the timestamp was unconditionally applied again after the runfile
had been loaded, assuming that an outputdir would be set in the runfile
too. If the runfile didn't have an outputdir, then the outputdir would
get a second timestamp appended.

Further, if test groups or individual tests themselves specificed an
outputdir, those would be set on their config, but would not get a
timestamp appended. It's not entirely clear if that's wrong or not, but
it is certainly not consistent with the rest.

To clean all this up, change things to append a timestamp to a received
outputdir (from arg or runfile) before setting it in any TestRun,
TestGroup or Test object.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
2025-04-02 17:03:01 -07:00
Rob Norris
c48d819d8d ZTS: runfiles: remove explicit outputdir
The config file value overrides any set by the operator, making it quite
difficult to put the test output elsewhere. The default is
/var/tmp/test_results (via BASEDIR in test-runner) so this shouldn't
change anything for the default case.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
2025-04-02 17:03:01 -07:00
Rob Norris
fcd0a2fb89 ZTS: zfs-tests: use configured FILEDIR for all temp paths
The default file vdevs, constrained binpath and temporary runfiles were
all explicitly places in /var/tmp. Instead, put them under FILEDIR,
which is set from -d and defaults to /var/tmp. TEST_BASE_DIR is also
initialised from FILEDIR, which means all data for the run will now end
up under the operator-specified data dir.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
2025-04-02 17:03:01 -07:00
Rob Norris
bb05632632 ZTS: replace all uses of /var/tmp with TEST_BASE_DIR
The operator can override TEST_BASE_DIR by setting its source var
FILEDIR through zfs-tests.sh -d. There were a handful of cases where
this was not honoured.

By default FILEDIR (and so TEST_BASE_DIR) is /var/tmp, so there should
be no functional change if the operator does nothing.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
2025-04-02 17:03:01 -07:00
Tony Hutter
a901229fe6 Linux 6.14 compat: META (#17098) (#17172)
Update the META file to reflect compatibility with the 6.14
kernel.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: @ImAwsumm
2025-04-02 16:56:33 -07:00
Rob Norris
5f7037067e
Revert "zinject: count matches and injections for each handler" (#17137)
Some checks failed
checkstyle / checkstyle (push) Has been cancelled
CodeQL / Analyze (cpp) (push) Has been cancelled
CodeQL / Analyze (python) (push) Has been cancelled
zloop / zloop (push) Has been cancelled
Adding fields to zinject_record_t unexpectedly extended zfs_cmd_t,
preventing some things working properly with 2.3.1 userspace tools
against 2.3.0 kernel module.

This reverts commit fabdd502f4.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-03-24 13:49:10 -07:00
Tony Hutter
f3e4043a36 Tag zfs-2.3.1
Some checks failed
checkstyle / checkstyle (push) Has been cancelled
CodeQL / Analyze (cpp) (push) Has been cancelled
CodeQL / Analyze (python) (push) Has been cancelled
zloop / zloop (push) Has been cancelled
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2025-03-05 09:36:39 -08:00
Tony Hutter
91bc62642e Add 'zfs-qemu-packages' workflow for RPM building
Add a new 'zfs-qemu-packages' GH workflow for manually building RPMs
and test installing ZFS RPMs from a yum repo. The workflow has a
dropdown menu in the Github runners tab with two options:

Build RPMs - Build release RPMs and tarballs and put them into an
             artifact ZIP file.  The directory structure used in
             the ZIP file mirrors the ZFS yum repo.

Test repo -  Test install the ZFS RPMs from the ZFS repo.  On
             Almalinux, this will do a DKMS and KMOD test install
             from both the regular and testing repos.  On Fedora,
             it will do a DKMS install from the regular repo.  All
             test install results will be displayed in the Github
             runner Summary page. Note that the workflow provides an
             optional text box where you can specify the full URL to
             an alternate repo.  If left blank, it will install from
             the default repo from the zfs-release RPM.

Most developers will never need to use this workflow.  It is intended
to be used by the ZFS admins for building and testing releases.

This commit also modularizes many of the runner scripts so they can
be used by both the zfs-qemu and zfs-qemu-packages workflows.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17005
2025-03-05 09:35:51 -08:00
Tony Hutter
acfd6511cf Linux 6.13 compat: META (#17098)
Some checks failed
checkstyle / checkstyle (push) Has been cancelled
CodeQL / Analyze (cpp) (push) Has been cancelled
CodeQL / Analyze (python) (push) Has been cancelled
zloop / zloop (push) Has been cancelled
Update the META file to reflect compatibility with the 6.13 kernel.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-02-28 00:42:29 +05:00
Tony Hutter
f741c841dd zpool: allow relative vdev paths
`zpool create` won't let you use relative paths to disks.  This is
annoying when you want to do:

	zpool create tank ./diskfile

But have to do..

	zpool create tank `pwd`/diskfile

This fixes it.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17042
2025-02-28 00:42:29 +05:00
Ameer Hamza
637f918211 arc: avoid possible deadlock in arc_read
In l2arc_evict(), the config lock may be acquired in reverse order
(e.g., first the config lock (writer), then a hash lock) unlike in
arc_read() during scenarios like L2ARC device removal. To avoid
deadlocks, if the attempt to acquire the config lock (reader) fails
in arc_read(), release the hash lock, wait for the config lock, and
retry from the beginning.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17071
2025-02-28 00:42:29 +05:00
Paul Dagnelie
7e72312eff Don't try to get mg of hole vdev in removal
Don't try to get mg of hole vdev in removal

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17080
2025-02-28 00:42:29 +05:00
aokblast
383256c329 spa: fix signature mismatch for spa_boot_init as eventhandler required
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: SHENGYI HONG <aokblast@FreeBSD.org>
Closes #17088
2025-02-28 00:42:29 +05:00
Alexander Motin
c2668b2d10 Better fill empty metaslabs
Before this change zfs_metaslab_switch_threshold tunable switched
metaslabs each time ones index reduced by two (which means biggest
contiguous chunk reduced to 1/4).  It is a good idea to balance
metaslabs fragmentation.  But for empty metaslabs (having power-
of-2 sizes) this means switching when they get just below the half
of their capacity.  Inspection with zdb after filling new pool to
half capacity shown most of its metaslabs filled to half capacity.
I consider this sub-optimal for pool fragmentation in a long run.

This change blocks the metaslabs switching if most of the metaslab
free space (15/16) is represented by a single contiguous range.
Such metaslab should not be considered fragmented until it actually
fail some big allocation.  More contiguous filling should improve
data locality and increase time before previously filled and
partially freed metaslab is touched again, giving it more time to
free more contiguous chunks for lower fragmentation.  It should
also slightly reduce spacemap traffic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17081
2025-02-28 00:42:29 +05:00
Rob Norris
b4ce059a76 suspend_resume_single: clear pool errors on fail
If the timing is unfortunate, the pool can suspend just as we're failing
because it didn't suspend. If we don't resume the pool, we hang trying
to destroy it.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17054
2025-02-28 00:42:29 +05:00
Rob Norris
92d1686a2a include: move zio_priority_t into zfs.h
It's included so it's effectively already part of it, but it's not
always installed as a userspace header, making zfs.h effectively
useless. Might as well just combine it.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Close #17066
2025-02-28 00:42:29 +05:00
Rob Norris
7ea899be04 vdev_file: make FLUSH and TRIM asynchronous
zfs_file_fsync() and zfs_file_deallocate() are both blocking ops, so the
zio_taskq thread is active and blocked both while waiting for the IO
call and then while calling zio_execute() for the next stage. This is a
particular issue for FLUSH, as the z_flush_iss queue typically only has
one thread; multiple flushes arriving at once can cause long delays if
the underlying fsync() response is particularly slow.

To fix this, we dispatch both FLUSH and TRIM to the z_vdev_file taskq,
just as we do for reads and writes. Further, we return all results
through zio_interrupt(), so neither the issue nor the file taskqs are
blocked.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17064
2025-02-28 00:42:29 +05:00
Chunwei Chen
e085d66f7a Fix wrong free function in arc_hdr_decrypt
Need to use arc_free_data_abd to free abd type buffer.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes #17079
2025-02-28 00:42:29 +05:00
Rob Norris
af062c480c vdev_file: unify FreeBSD and Linux implementations (#17046)
Kernel & userspace specifics are in zfs_file_os.c, so there's no
particular reason these have to be separate.

The one platform-specific part is in the Linux kernel part, to offload
flushes to a taskq if we're already inside a filesystem transaction.
This would be normally be an unsatisfying wart, but I'm intending to
remove this shortly, so I'm content to leave it gated for the moment.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
2025-02-28 00:42:29 +05:00
Alexander Motin
f2ab5b82da Fix metaslab group fragmentation math (#17037)
Since we are calculating a free space fragmentation, we should
weight metaslabs by the amount of their free space, not a full
size.  Fragmentation of full metaslabs may not matter in presence
empty ones.  The old algorithm did not differentiate metaslabs
having only one free 4KB block from metaslabs having 50% of space
free in 4KB blocks, reporting higher fragmentation.

While there, move metaslab_group_alloc_update() call after setting
mg_fragmentation, otherwise the effect may be delayed by one TXG.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-02-28 00:42:29 +05:00
Rob Norris
1bdce0410c range_tree: convert remaining range_* defs to zfs_range_*
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
2025-02-28 00:42:29 +05:00
Ivan Volosyuk
55b21552d3 Linux 6.12 compat: Rename range_tree_* to zfs_range_tree_*
Linux 6.12 has conflicting range_tree_{find,destroy,clear} symbols.

Signed-off-by: Ivan Volosyuk <Ivan.Volosyuk@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
2025-02-28 00:42:29 +05:00
vandanarungta
c4fa9c2962 Free memory in an error path in spl-kmem-cache.c
skc->skc_name also needs to be freed in an error path.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Vandana Rungta <vrungta@amazon.com>
Closes #17041
2025-02-28 00:42:29 +05:00
Umer Saleem
8487b6c9b4 Update the dataset name in handle after zfs_rename (#17040)
For zfs_rename, after the dataset name is successfully updated,
the dataset handle that was passed to zfs_rename, still contains
the old name, due to which, the dataset handle becomes invalid.
The following operations performed using this handle result in
error since the dataset with old name cannot be found anymore.

changelist_rename does update the names in dataset handles,
but those are temporary handles that were created during
changelist_gather. The original handle that was used to call
zfs_rename is not updated.

We should update the name in original ZFS handle after the IOCTL
for rename returns success for the operation.

Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-02-28 00:42:29 +05:00
Rob Norris
0be3b266ed zio: do no-op injections just before handing off to vdevs
The purpose of no-op is to simulate a failure between a device cache and
its permanent store. We still want it to go through the queue and
respond in the same way to everything else.

So, inject "success" as the very last thing, and then move on to
VDEV_IO_DONE to be dequeued and so any followup work can occur.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17029
2025-02-28 00:42:29 +05:00
Dr. Christian Kohlschütter
001ab5941d Fix "make install" with DESTDIR set (#16995)
"DESTDIR=/path/to/target/root/ make install" may fail when installing to
a root that contains an existing lib/modules structure. When run as root
we may even affect the wrong kernel (the build system's one, or, if
running a different version, some other directory in /lib/modules, but
not the desired one installed in DESTDIR).

Add a missing reference to the INSTALL_MOD_PATH root when calling
"depmod" during "make install"

Also add a switch "DONT_DELETE_MODULES_FILES=1" that skips the removal
of files named "modules.*" prior to running depmod.

Signed-off-by: Christian Kohlschütter <christian@kohlschutter.com>
Closes #16994

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-02-28 00:42:29 +05:00
George Amanakis
7784947923 optimize recv_fix_encryption_hierarchy()
recv_fix_encryption_hierarchy() in its present state goes through all
stream filesystems, and for each one traverses the snapshots in order to
find one that exists locally. This happens by calling guid_to_name() for
each snapshot, which iterates through all children of the filesystem.
This results in CPU utilization of 100% for several minutes (for ~1000
filesystems on a Ryzen 4350G) for 1 thread at the end of a raw receive
(-w, regardless whether encrypted or not, dryrun or not).

Fix this by following a different logic: using the top_fs name, call
gather_nvlist() to gather the nvlists for all local filesystems. For
each one filesystem, go through the snapshots to find the corresponding
stream's filesystem (since we know the snapshots guid and can search
with it in stream_avl for the stream's fs). Then go on to fix the
encryption roots and locations as in its present state.

Avoiding guid_to_name() iteratively makes
recv_fix_encryption_hierarchy() significantly faster (from several
minutes to seconds for ~1000 filesystems on a Ryzen 4350G).

Another problem is the following: in case we have promoted a clone of
the filesystem outside the top filesystem specified in zfs send, zfs
receive does not fail but returns an error:
recv_incremental_replication() fails to find its origin and errors out
with needagain=1. This results in recv_fix_hierarchy() not being called
which may render some children of the top fs not mountable since their
encryption root was not updated. To circumvent this make
recv_incremental_replication() silently ignore this error.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #16929
2025-02-28 00:42:29 +05:00
Paul Dagnelie
a0f8d3c584 Add kstats tracking gang allocations
Gang blocks have a significant impact on the long and short term
performance of a zpool, but there is not a lot of observability into
whether they're being used.  This change adds gang-specific kstats to
ZFS, to better allow users to see whether ganging is happening.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17003
2025-02-28 00:42:29 +05:00
Paul Dagnelie
2adca179b6 Expand fragmentation table to reflect larger possibile allocation sizes
When you are using large recordsizes in conjunction with raidz, with
incompressible data, you can pretty reliably be making 21 MB
allocations. Unfortunately, the fragmentation metric in ZFS considers
any metaslabs with 16 MB free chunks completely unfragmented, so you can
have a metaslab report 0% fragmented and be unable to satisfy an
allocation. When using the segment-based metaslab weight, this is
inconvenient; when using the space-based one, it can seriously degrade
performance.

We expand the fragmentation table to extend up to 512MB, and redefine
the table size based on the actual table, rather than having a static
define. We also tweak the one variable that depends on fragmentation
directly.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #16986
2025-02-28 00:42:29 +05:00
Mateusz Piotrowski
a77b6398ed Fix typos in zpool_do_scrub() error messages (#17028)
Sponsored-by: Wasabi Technology, Inc.
Sponsored-by: Klara, Inc.

Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-02-28 00:42:29 +05:00
mnrx
d1b0c4ec5a Clarify documentation of zfs destroy on snapshots (#17021)
The current documentation of `zfs destroy` in application to snapshots
is particularly difficult to understand. The following changes are made:

- Remove circular reference to `zfs destroy` in the documentation of
  that command.
- Remove use of "for example", which implies there are more,
  undocumented reasons that ZFS may fail to destroy a snapshot
  immediately.
- Mention properties `defer_destroy` and `userrefs`.
- Add `zfsprops(8)` to "SEE ALSO" list.
- Clarify meaning of `-d` option.



Requires-builders: none

Signed-off-by: mnrx <83848843+mnrx@users.noreply.github.com>
Co-authored-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-02-28 00:42:29 +05:00
Rob Norris
3266d4d655 Linux 6.14: BLK_MQ_F_SHOULD_MERGE was removed
According to the upstream change, all callers set it, and all block
devices either honoured it or ignored it, so removing it entirely allows
a bunch of handling for the "unset" case to be removed, and it becomes
effectively implied.

We follow suit, and keep setting it for older kernels.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-02-28 00:42:29 +05:00
Rob Norris
51bec16060 Linux 6.14: dops->d_revalidate now takes four args
This is a convenience for filesystems that need the inode of their
parent or their own name, as its often complicated to get that
information. We don't need those things, so this is just detecting which
prototype is expected and adjusting our callback to match.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-02-28 00:42:29 +05:00
Rob Norris
3b5c3f52d2 zio: lock parent zios when updating wait counts on reexecute
As zios are reexecuted after resume from suspension, their ready and
wait states need to be propagated to wait counts on all their parents.

It's possible for those parents to have active children passing through
READY or DONE, which then end up in zio_notify_parent(), take their
parent's lock, and decrement the wait count. Without also taking a lock
here, it's possible for an increment race to occur, which leads to
either there being no references left (tripping the assert in
zio_notify_parent()), or a parent waiting forever for a nonexistent
child to complete.

To protect against this, we simply take the appropriate zio locks in
zio_reexecute() before updating the wait counts.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17016
2025-02-28 00:42:29 +05:00
Jaydeep Kshirsagar
67f0469f70 Avoid ARC buffer transfrom operations in prefetch
This change will prevent prefetch to perform unnecessary ARC buffer
fill when reading from disk.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Jaydeep Kshirsagar <jkshirsagar@maxlinear.com>
Co-authored-by: Alexander Motin <mav@FreeBSD.org>
Closes #17013
2025-02-28 00:42:29 +05:00
Jerzy Kołosowski
1aa4351c1f Add recursive dataset mounting and unmounting support to pam_zfs_key (#16857)
Introduced functionality to recursively mount datasets with a new
config option `mount_recursively`. Adjusted existing functions to
handle the recursive behavior and added tests to validate the feature.
This enhances support for managing hierarchical ZFS datasets within
a PAM context.

Signed-off-by: Jerzy Kołosowski <jerzy@kolosowscy.pl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-02-28 00:42:29 +05:00
Brian Atkinson
0e21e473a7 Update pin_user_pages() calls for Direct I/O
Originally #16856 updated Linux Direct I/O requests to use the new
pin_user_pages API. However, it was an oversight that this PR only
handled iov_iter's of type ITER_IOVEC and ITER_UBUF. Other iov_iter
types may try and use the pin_user_pages API if it is available. This
can lead to panics as the iov_iter is not being iterated over correctly
in zfs_uio_pin_user_pages().

Unfortunately, generic iov_iter API's that call pin_user_page_fast() are
protected as GPL only. Rather than update zfs_uio_pin_user_pages() to
account for all iov_iter types, we can simply just call
zfs_uio_get_dio_page_iov_iter() if the iov_iter type is not ITER_IOVEC
or ITER_UBUF. zfs_uio_get_dio_page_iov_iter() calls the
iov_iter_get_pages() calls that can handle any iov_iter type.

In the future it might be worth using the exposed iov_iter iterator
functions that are included in the header iov_iter.h since v6.7. These
functions allow for any iov_iter type to be iterated over and advanced
while applying a step function during iteration. This could possibly be
leveraged in zfs_uio_pin_user_pages().

A new ZFS test case was added to test that a ITER_BVEC is handled
correctly using this new code path. This test case was provided though
issue #16956.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16956 
Closes #17006
2025-02-25 22:33:25 +05:00
Alan Somers
6e9911212e Make the vfs.zfs.vdev.raidz_impl sysctl cross-platform
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Sponsored by:	ConnectWise
Closes #16980
2025-02-25 22:32:11 +05:00
rmacklem
42bad93414 FreeBSD: Add setting of the VFCF_FILEREV flag
The flag VFCF_FILEREV was recently defined in FreeBSD
so that a file system could indicate that it increments
va_filerev by one for each change.

Since ZFS does do this, set the flag if defined for the
kernel being built.  This allows the NFSv4.2 server to
reply with the correct change_attr_type attribute value.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closed #16976
2025-02-25 22:29:39 +05:00
Rob Norris
a28f5a94f4 zinject: add "probe" device injection type
Injecting a device probe failure is not possible by matching IO types,
because probe IO goes to the label regions, which is explicitly excluded
from injection. Even if it were possible, it would be awkward to do,
because a probe is sequence of reads and writes.

This commit adds a new IO "type" to match for injection, which looks for
the ZIO_FLAG_PROBE flag instead. Any probe IO will be match the
injection record and recieve the wanted error.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16947
2025-02-25 22:29:33 +05:00
Rob Norris
0dfcfe023e zinject: make iotype extendable
I'm about to add a new "type", and I need somewhere to put it!

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16947
2025-02-25 22:29:02 +05:00
Rob Norris
cf55fdea24 ZTS: remove get_arcstat
It's now a simple wrapper, so lets just call kstat direct.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-02-25 22:28:46 +05:00
Rob Norris
6edbbe0646 ZTS: update existing kstat users to new helper
Removes other custom helpers and direct accesses to /proc.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-02-25 22:28:27 +05:00
Rob Norris
198621f910 ZTS: reimplement kstat helper function
The old kstat helper function was barely used, I suspect in part because
it was very limited in the kinds of kstats it could gather.

This adds new functions to replace it, for each kind of thing that can
have stats: global, pool and dataset. There's options in there to get a
single stat value, or all values within a group.

Most importantly, the interface is the same for both platforms.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-02-25 22:28:14 +05:00
Tim Smith
e6c98d11ec Fix several typos in the man pages
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tim Smith <tsmith84@gmail.com>
Closes #16965
2025-02-25 22:27:47 +05:00
Alexander Ziaee
083d322fa0 zfs-destroy.8: Fix minor formatting typo
The warning at the end of the second example in the description section
was actually inside the options table. Move the El macro to match what
is done in the first section for improved readability.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Ziaee <ziaee@FreeBSD.org>
Closes #16962
2025-02-25 22:27:23 +05:00
Tony Hutter
c36faf668b Update RELEASES.md LTS release to 2.2
2.3.0 is out now, so make 2.2.x the LTS release.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16945
Closes #16948
2025-02-25 22:27:10 +05:00
Peng Liu
404254bacb style: remove unnecessary spaces in sa.h
Removed three unnecessary spaces in the definition of the
sa_attr_reg_t structure to improve code style consistency
and adhere to OpenZFS coding standards.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Peng Liu <littlenewton6@gmail.com>
Closes #16955
2025-02-25 22:26:45 +05:00
Rob Norris
8eba6a5ba1 Makefile.in: pass ARCH for modules_install as well
To do a cross-build using only kbuild rather than a full source tree,
ARCH= needs to be passed for the kbuild Makefile to find the
archspecific Makefile.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16944
2025-02-25 22:25:41 +05:00
Rob Norris
fabdd502f4 zinject: count matches and injections for each handler
When building tests with zinject, it can be quite difficult to work out
if you're producing the right kind of IO to match the rules you've set
up.

So, here we extend injection records to count the number of times a
handler matched the operation, and how often an error was actually
injected (ie after frequency and other exclusions are applied).

Then, display those counts in the `zinject` output.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #16938
2025-02-25 22:25:24 +05:00
Alexander Motin
675b49d2a1 FreeBSD: Use ashift in vdev_check_boot_reserve()
We should not hardcode 512-byte read size when checking for loader
in the boot area before RAIDZ expansion.  Disk might be unable to
handle that I/O as is, and the code zio_vdev_io_start() handling
the padding asserts doing it only for top-level vdev.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16942
2025-02-25 22:24:59 +05:00
Rob Norris
54eec0fa59 ZTS: remove empty zpool_add--allow-ashift-mismatch test
Added in b1e46f869, but empty, so no point keeping it around.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16931
2025-02-25 22:23:44 +05:00
Brian Behlendorf
bc06d8164b Linux: Enable Direct IO by default
Some checks failed
checkstyle / checkstyle (push) Has been cancelled
CodeQL / Analyze (cpp) (push) Has been cancelled
CodeQL / Analyze (python) (push) Has been cancelled
zloop / zloop (push) Has been cancelled
Aligns the 2.3 release branch with the well tested default behavior
in the master branch.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-01-13 13:53:41 -08:00
Brian Behlendorf
76745cf5b8 Tag 2.3.0-1
Some checks are pending
checkstyle / checkstyle (push) Waiting to run
CodeQL / Analyze (cpp) (push) Waiting to run
CodeQL / Analyze (python) (push) Waiting to run
zloop / zloop (push) Waiting to run
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-01-13 10:03:37 -08:00
Brian Behlendorf
0c88ae6187 Tag 2.3.0-rc5
Some checks failed
checkstyle / checkstyle (push) Has been cancelled
CodeQL / Analyze (cpp) (push) Has been cancelled
CodeQL / Analyze (python) (push) Has been cancelled
zloop / zloop (push) Has been cancelled
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-01-05 17:31:26 -08:00
n0-1
307fd0da1f Support for cross-compiling kernel modules
In order to correctly cross-compile, one has to pass ARCH and
CROSS_COMPILE make flags to kernel module build calls. Facilitate this
in the same way as for custom CC flag by recognizing KERNEL_-prefixed
configure environment variables of same name.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Closes #16924
2025-01-05 17:31:26 -08:00
Robert Evans
9f1c5e0b10 Remove duplicate dedup_legacy_create in common.run
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16926
2025-01-05 17:31:26 -08:00
Richard Kojedzinszky
5ba50c8135 fix: make zfs_strerror really thread-safe and portable
#15793 wanted to make zfs_strerror threadsafe, unfortunately, it
turned out that strerror_l() usage was wrong, and also, some libc 
implementations dont have strerror_l().

zfs_strerror() now simply calls original strerror() and copies the 
result to a thread-local buffer, then returns that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Kojedzinszky <richard@kojedz.in>
Closes #15793
Closes #16640
Closes #16923
2025-01-04 11:58:15 -08:00
Don Brady
25565403aa Too many vdev probe errors should suspend pool
Similar to what we saw in #16569, we need to consider that a
replacing vdev should not be considered as fully contributing
to the redundancy of a raidz vdev even though current IO has
enough redundancy.

When a failed vdev_probe() is faulting a disk, it now checks
if that disk is required, and if so it suspends the pool until
the admin can return the missing disks.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #16864
2025-01-04 11:58:15 -08:00
Robert Evans
47b7dc976b Add Makefile dependencies for scripts/zfs-tests.sh -c
This updates the Makefile to be more correct for parallel make.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16030
Closes #16922
2025-01-04 11:58:15 -08:00
Toomas Soome
125731436d ZTS: checkpoint_discard_busy should use save_tunable/restore_tunable
Instead of using hardwired value for SPA_DISCARD_MEMORY_LIMIT,
use save_tunable and restore_tunable to restore the pre-test state.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16919
2025-01-03 15:23:49 -08:00
Rob Norris
4425a7bb85 vdev_open: clear async remove flag after reopen
It's possible for a vdev to be flagged for async remove after the pool
has suspended. If the removed device has been returned when the pool is
resumed, the ASYNC_REMOVE task will still run at the end of txg, and
remove the device from the pool again.

To fix, we clear the async remove flag at reopen, just as we did for the
async fault flag in 5de3ac223.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16921
2025-01-03 15:23:49 -08:00
Toomas Soome
e47b033eae ZTS: remove unused TESTDIRS from pam/cleanup.ksh
Remove TESTDIRS as it is not set for pam tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16920
2025-01-03 15:23:49 -08:00
pstef
cfec8f13a2 zfs_vnops_os.c: fallocate is valid but not supported on FreeBSD
This works around
/usr/lib/go-1.18/pkg/tool/linux_amd64/link:
mapping output file failed: invalid argument

It's happened to me under a Linux jail, but it's also happened to other
people, see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=270247#c4

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: pstef <pstef@users.noreply.github.com>
Closes #16918
2025-01-03 15:23:49 -08:00
Toomas Soome
997db7a7fc ZTS: checkpoint_discard_busy does not set 16M on cleanup
Originally hex value is used as decimal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16917
2025-01-02 17:04:10 -08:00
Toomas Soome
e411081aa0 ZTS: functional/mount scripts are not removing /var/tmp/testdir.X dirs
cleanup.ksh is assuming we have TESTDIRS set.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16915
2025-01-02 17:04:10 -08:00
Toomas Soome
a55b6fe94a ZTS: zfs_mount_all_fail leaves /var/tmp/testrootPIDNUM directory around
Before we can remove test files, we need to unmount datasets
used by test first.

See also: zfs_mount_all_mountpoints.ksh

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16914
2025-01-02 17:04:10 -08:00
James Reilly
939e9f0b6a ZTS: add centos stream10 (#16904)
Added centos as optional runners via workflow_dispatch

removed centos-stream9 from the FULL_OS runner list as CentOS is not
officially support by ZFS. This commit will add preliminary support for
EL10 and allow testing ZFS ahead of EL10 codebase solidifying in ~6
months

Signed-off-by: James Reilly <jreilly1821@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
2025-01-02 17:04:10 -08:00
Andrew Walker
679b164cd3 Add missing zfs_exit() when snapdir is disabled (#16912)
zfs_vget doesn't zfs_exit when erroring out due to snapdir
being disabled.

Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Reviewed-by: @bmeagherix
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-01-02 17:04:10 -08:00
shodanshok
c2d9494f99 set zfs_arc_shrinker_limit to 0 by default
zfs_arc_shrinker_limit was introduced to avoid ARC collapse due to
aggressive kernel reclaim. While useful, the current default (10000) is
too prone to OOM especially when MGLRU-enabled kernels with default
min_ttl_ms are used. Even when no OOM happens, it often causes too much
swap usage.

This patch sets zfs_arc_shrinker_limit=0 to not ignore kernel reclaim
requests. ARC now plays better with both kernel shrinker and pagecache
but, should ARC collapse happen again, MGLRU behavior can be tuned or
even disabled.

Anyway, zfs should not cause OOM when ARC can be released.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #16909
2024-12-29 11:53:45 -08:00
Ameer Hamza
b952e061df zvol: implement platform-independent part of block cloning
In Linux, block devices currently lack support for `copy_file_range`
API because the kernel does not provide the necessary functionality.
However, there is an ongoing upstream effort to address this
limitation: https://patchwork.kernel.org/project/dm-devel/cover/20240520102033.9361-1-nj.shetty@samsung.com/.
We have adopted this upstream kernel patch into the TrueNAS kernel and
made some additional modifications to enable block cloning specifically
for the zvol block device. This patch implements the platform-
independent portions of these changes for inclusion in OpenZFS.
This patch does not introduce any new functionality directly into
OpenZFS. The `TX_CLONE_RANGE` replay capability is only relevant when
zvols are migrated to non-TrueNAS systems that support Clone Range
replay in the ZIL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16901
2024-12-29 11:53:45 -08:00
Alexander Motin
0fea7fc109 ZTS: Reduce file size in redacted_panic to 1GB
This test takes 3 minutes on RELEASE FreeBSD bots, but on CURRENT,
probably due to debugging it has in kernel, it does not complete
within 10 minutes, ending up killed.  As I see all the redacting
here happens within the first ~128MB of the file, so I hope it
won't matter if there is 1GB of data instead of 2GB.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #11141
2024-12-29 11:53:45 -08:00
Alexander Motin
0f6d955a35 ZTS: Remove procfs use from zpool_import_status
procfs might be not mounted on FreeBSD.  Plus checking for specific
PID might be not exactly reliable.  Check for empty list of jobs
instead.

Premature loop exit can result in failed test and failed cleanup,
failing also some following tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #11141
2024-12-29 11:53:45 -08:00
Alexander Motin
c3d2412b05 ZTS: Remove non-standard awk hex numbers usage
FreeBSD recently removed non-standard hex numbers support from awk.
Neither it supports -n argument, enabling it in gawk.  Instead of
depending on those rewrite list_file_blocks() fuction to handle the
hex math in shell instead of awk.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #11141
2024-12-29 11:53:45 -08:00
Rob Norris
74064cb175 zpool_get_vdev_prop_value: show missing vdev userprops
If a vdev userprop is not found, present it as value '-', default
source, so it matches the output from pool userprops.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16887
2024-12-29 11:53:45 -08:00
Alexander Motin
30b97ce218 ZTS: Increase write sizes for RAIDZ/dRAID tests
Many RAIDZ/dRAID tests filled files doing millions of 100 or even
10 byte writes.  It makes very little sense since we are not
micro-benchmarking syscalls or VFS layer here, while before the
blocks reach the vdev layer absolute majority of the small writes
will be aggregated.  In some cases I see we spend almost as much
time creating the test files as actually running the tests.  And
sometimes the tests even time out after that.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16905
2024-12-29 11:53:45 -08:00
Rob Norris
9519e7ebcc microzap: set hard upper limit of 1M
The count of chunks in a microzap block is stored as an uint16_t
(mze_chunkid). Each chunk is 64 bytes, and the first is used to store a
header, so there are 32767 usable chunks, which is just under 2M. 1M is
the largest power-2-rounded block size under 2M, so we must set the
limit there.

If it goes higher, the loop in mzap_addent can overflow and fall into
the PANIC case.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16888
2024-12-29 11:53:45 -08:00
Alexander Motin
f9b02fe7e3 Fix readonly check for vdev user properties
VDEV_PROP_USERPROP is equal do VDEV_PROP_INVAL and so is not a real
property.  That's why vdev_prop_readonly() does not work right for
it.  In particular it may declare all vdev user properties readonly
on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16890
2024-12-29 11:53:45 -08:00
Umer Saleem
cb8da70329 Skip iterating over snapshots for share properties
Setting sharenfs and sharesmb properties on a dataset can become costly
if there are large number of snapshots, since setting the share
properties iterates over all snapshots present for a dataset. If it is
the root dataset for which we are trying to set the share property,
snapshots for all child datasets and their children will also be
iterated.

There is no need to iterate over snapshots for share properties
because we do not allow share properties or any other property,
to be set on a snapshot itself execpt for user properties.

This commit skips iterating over snapshots for share properties,
instead iterate over all child dataset and their children for share
properties.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16877
2024-12-29 11:53:45 -08:00
Rob Norris
c944c46a98 zfs_main: fix alignment on props usage output
I guess we've got some long property names since this was first set up!

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16883
2024-12-29 11:53:45 -08:00
Tino Reichardt
166a7bc602 CI: Fix FreeBSD 13.4 STABLE build
In #16869 we added FreeBSD 13.4 STABLE, but forget the special
thing, that the virtio nic within FreeBSD 13.x is buggy.

This fix adds the needed rtl8139 nic to the VM.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16885
2024-12-29 11:53:45 -08:00
Rob Norris
e90124a7c8 zprop: fix value help for ZPOOL_PROP_CAPACITY
It's a percentage and documented as such, but we were showing it as
<size>.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16881
2024-12-29 11:53:45 -08:00
Brian Behlendorf
18b3bea861 CI: Add FreeBSD 14.2 RELEASE+STABLE builds
Update the CI to include FreeBSD 14.2 as a regularly tested platform.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16869
2024-12-29 11:53:45 -08:00
Brian Atkinson
d67eb17e27 Use pin_user_pages API for Direct I/O requests
As of kernel v5.8, pin_user_pages* interfaced were introduced. These
interfaces use the FOLL_PIN flag. This is preferred interface now for
Direct I/O requests in the kernel. The reasoning for using this new
interface for Direct I/O requests is explained in the kernel
documenetation:
Documentation/core-api/pin_user_pages.rst

If pin_user_pages_unlocked is available, the all Direct I/O requests
will use this new API to stay uptodate with the kernel API requirements.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16856
2024-12-16 10:26:52 -08:00
Brian Atkinson
1862c1c0a8 Removing old code outside of 4.18 kernsls
There were checks still in place to verify we could completely use
iov_iter's on the Linux side. All interfaces are available as of kernel
4.18, so there is no reason to check whether we should use that
interface at this point. This PR completely removes the UIO_USERSPACE
type. It also removes the check for the direct_IO interface checks.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16856
2024-12-16 10:26:49 -08:00
Shengqi Chen
b57f53036d simd_stat: fix undefined CONFIG_KERNEL_MODE_NEON error on armel
CONFIG_KERNEL_MODE_NEON depends on CONFIG_NEON. Neither is defined
on armel. Add a guard to avoid compilation errors.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16871
2024-12-16 10:26:45 -08:00
Brian Behlendorf
4b8bf3c48a Fix stray "no" in configure output
This is purely a cosmetic fix which removes a stray "no" from
the configure output.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16867
2024-12-16 10:26:42 -08:00
Alexander Motin
696943533c Fix use-afer-free regression in RAIDZ expansion
We should not dereference rra after the last zio_nowait() is called.
It seems very unlikely, but ASAN in ztest managed to catch it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16868
2024-12-16 10:26:39 -08:00
kotauskas
2284a61129 Remount datasets on soft-reboot
The one-shot zfs-mount.service is incorrectly deemed active by 
Systemd after a systemctl soft-reboot. As such, soft-rebooting
prevents zfs mount -a from being ran automatically.

This commit makes it so that zfs-mount.service is marked as being 
undone by the time umount.target is reached, so that zfs.target then 
pulls it in again and gets it restarted after a soft reboot.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: kotauskas <v.toncharov@gmail.com>
Closes #16845
2024-12-16 10:26:35 -08:00
Rob Norris
e1833a72f9 flush: only detect lack of flush support in one place
It seems there's no good reason for vdev_disk & vdev_geom to explicitly
detect no support for flush and set vdev_nowritecache.  Instead, just
signal it by setting the error to ENOTSUP, and let zio_vdev_io_assess()
take care of it in one place.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16855
2024-12-16 10:26:30 -08:00
Rob Norris
5bb034f533 flush: don't report flush error when disabling flush support
The first time a device returns ENOTSUP in repsonse to a flush request,
we set vdev_nowritecache so we don't issue flushes in the future and
instead just pretend the succeeded. However, we still return an error
for the initial flush, even though we just decided such errors are
meaningless!

So, when setting vdev_nowritecache in response to a flush error, also
reset the error code to assume success.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16855
2024-12-16 10:26:27 -08:00
Poscat
1e08e49a28 build: use correct bashcompletiondir on arch
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: poscat <poscat@poscat.moe>
Closes #16861
2024-12-16 10:26:23 -08:00
Rob Norris
6604fe9a06 backtrace: fix off-by-one on string output
sizeof("foo") includes the trailing null byte, so all the output had
nulls through it. Most terminals quietly ignore it, but it makes some
tools misdetect file types and other annoyances.

Easy fix: subtract 1.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16862
2024-12-16 10:26:20 -08:00
Brian Behlendorf
7cbe7bbbd4 Tag 2.3.0-rc4
Some checks failed
checkstyle / checkstyle (push) Has been cancelled
CodeQL / Analyze (cpp) (push) Has been cancelled
CodeQL / Analyze (python) (push) Has been cancelled
zfs-qemu / Setup (push) Has been cancelled
zloop / zloop (push) Has been cancelled
zfs-qemu / qemu-x86 (push) Has been cancelled
zfs-qemu / Cleanup (push) Has been cancelled
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-12-12 16:20:30 -08:00
Chunwei Chen
2dcc8fe035 Fix DR_OVERRIDDEN use-after-free race in dbuf_sync_leaf
In dbuf_sync_leaf, we clone the arc_buf in dr if we share it with db
except for overridden case. However, this exception causes a race where
dbuf_new_size could free the arc_buf after the last dereference of
*datap and causes use-after-free. We fix this by cloning the buf
regardless if it's overridden.

The race:
--
P0                                     P1

                                       dbuf_hold_impl()
                                         // dbuf_hold_copy passed
                                         // because db_data_pending NULL

dbuf_sync_leaf()
  // doesn't clone *datap
  // *datap derefed to db_buf
  dbuf_write(*datap)

                                       dbuf_new_size()
                                         dmu_buf_will_dirty()
                                           dbuf_fix_old_data()
                                             // alloc new buf for P0 dr
                                             // but can't change *datap

                                         arc_alloc_buf()
                                         arc_buf_destroy()
                                           // alloc new buf for db_buf
                                           // and destroy old buf

  dbuf_write() // continue
    abd_get_from_buf(data->b_data,
    arc_buf_size(data))
      // use-after-free
--

Here's an example when it happens:

BUG: kernel NULL pointer dereference, address: 000000000000002e
RIP: 0010:arc_buf_size+0x1c/0x30 [zfs]
Call Trace:
 dbuf_write+0x3ff/0x580 [zfs]
 dbuf_sync_leaf+0x13c/0x530 [zfs]
 dbuf_sync_list+0xbf/0x120 [zfs]
 dnode_sync+0x3ea/0x7a0 [zfs]
 sync_dnodes_task+0x71/0xa0 [zfs]
 taskq_thread+0x2b8/0x4e0 [spl]
 kthread+0x112/0x130
 ret_from_fork+0x1f/0x30

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: Chunwei Chen <david.chen@nutanix.com>
Closes #16854
2024-12-12 16:20:30 -08:00
Alexander Motin
38875918d8 BRT: Check bv_mos_entries in brt_entry_lookup()
When vdev first sees some block cloning, there is a window when
brt_maybe_exists() might already return true since something was
cloned, but bv_mos_entries is still 0 since BRT ZAP was not yet
created.  In such case we should not try to look into the ZAP
and dereference NULL bv_mos_entries_dnode.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16851
2024-12-12 16:20:30 -08:00
Rob Norris
0d51852ec7 Remove unnecessary CSTYLED escapes on top-level macro invocations
cstyle can handle these cases now, so we don't need to disable it.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 09:05:02 -08:00
Rob Norris
73a73cba71 cstyle: ignore old non-POSIX types in macro invocations
In code generation macros, we often use names like `uint` when
constructing handler functions. These are not being used as types, so
exclude them from the admonishment to use POSIX type names.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 09:05:02 -08:00
Rob Norris
87947f2440 cstyle: understand macro params can be empty
It's not uncommon to have empty parameters in code generator macros,
usually when multiple parameters are concatenated or stringified into a
single token or literal. So, exclude the space-before-comma check, which
will allow construction like `MACRO_CALL(foo, , baz)`.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 09:05:02 -08:00
Rob Norris
0e87150b6c cstyle: understand basic top-level macro invocations
We quite often invoke macros outside of functions, usually to generate
functions or data. cstyle inteprets these as function headers, which at
least have opinions for indenting.

This introduces a separate state for top-level macro invocations, and
excludes it from matching functions. For the moment, most of the
existing rules will continue to apply, but this gives us a way to add or
removes rules targeting macros specifically.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 09:05:02 -08:00
Alexander Motin
7742e29387 Optimize RAIDZ expansion
- Instead of copying one ashift-sized block per ZIO, copy as much
as we have contiguous data up to 16MB per old vdev.  To avoid data
moves use gang ABDs, so that read ZIOs can directly fill buffers
for write ZIOs.  ABDs have much smaller overhead than ZIOs in both
memory usage and processing time, plus big I/Os do not depend on
I/O aggregation and scheduling to reach decent performance on HDDs.
 - Reduce raidz_expand_max_copy_bytes to 16MB on 32bit platforms.
 - Use 32bit range tree when possible (practically always now) to
slightly reduce memory usage.
 - Use ZIO_PRIORITY_REMOVAL for early stages of expansion, same as
for main ones.
 - Fix rate overflows in `zpool status` reporting.

With these changes expanding RAIDZ1 from 4 to 5 children I am able
to reach 6-12GB/s rate on SSDs and ~500MB/s on HDDs, both are
limited by devices instead of CPU.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15680
Closes #16819
2024-12-06 09:05:02 -08:00
Alexander Motin
f54052a122 Fix false assertion in dmu_tx_dirty_buf() on cloning
Same as writes block cloning can increase block size and number of
indirection levels.  That means it can dirty block 0 at level 0 or
at new top indirection level without explicitly holding them.

A block cloning test case for large offsets has been added.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16825
2024-12-05 11:49:06 -08:00
Rob Norris
d874f27776 zdb_il: use flex array member to access ZIL records
In 6f50f8e16 we added flex arrays to lr_XX_t structs to silence kernel
bounds check warnings. Userspace code was mostly not updated to use them
though.

It seems that in the right circumstances, compilers can get confused
about sizes in the same way, and throw warnings. This commits switch
those uses over to use the flex array fields also.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16832
2024-12-05 09:33:21 -08:00
Alexander Motin
84d7d53e91 Improve speculative prefetcher for block cloning
- Issue prescient prefetches for demand indirect blocks after the
first one.  It should be quite rare for reads/writes, but much more
useful for cloning due to much bigger (up to 1022 blocks) accesses.
It covers the gap during the first couple accesses when we can not
speculate yet, but we know what is needed right now.  It reduces
dbuf_hold() sync read delays in dmu_buf_hold_array_by_dnode().
 - Increase maximum prefetch distance for indirect blocks from 64
to 128MB.  It should cover the maximum 1022 blocks of block cloning
access size in case of default 128KB recordsize used.  In case of
bigger recordsize the above prescient prefetch should also help.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16814
2024-12-05 09:33:21 -08:00
Alexander Motin
d90042dedb Allow dsl_deadlist_open() return errors
In some cases like dsl_dataset_hold_obj() it is possible to handle
those errors, so failure to hold dataset should be better than
kernel panic.  Some other places where these errors are still not
handled but asserted should be less dangerous just as unreachable.

We have a user report about pool corruption leading to assertions
on these errors.  Hopefully this will make behavior a bit nicer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16836
2024-12-05 09:33:21 -08:00
Mark Johnston
0e46085ee6 FreeBSD: Remove an incorrect assertion in zfs_getpages()
The pages in the array may become valid after this initial unbusying,
so the assertion only holds during the first iteration of the outer
loop.

Later in zfs_getpages(), the dmu_read_pages() loop handles already-valid
pages.  Just drop the assertion, it's not terribly useful.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reported-by: Peter Holm <pho@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Sponsored-by: Klara, Inc.
Closes #16810
Closes #16834
2024-12-05 09:33:21 -08:00
Mariusz Zaborski
3b0c1131ef Add ability to scrub from last scrubbed txg
Some users might want to scrub only new data because they would like
to know if the new write wasn't corrupted.  This PR adds possibility
scrub only newly written data.

This introduces new `last_scrubbed_txg` property, indicating the
transaction group (TXG) up to which the most recent scrub operation
has checked and repaired the dataset, so users can run scrub only
from the last saved point. We use a scn_max_txg and scn_min_txg
which are already built into scrub, to accomplish that.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-By: Wasabi Technology, Inc.
Sponsored-By: Klara Inc.
Closes #16301
2024-12-05 09:33:21 -08:00
shodanshok
5988de77b0 Fix race in libzfs_run_process_impl
When replacing a disk, a child process is forked to run a script called
zfs_prepare_disk (which can be useful for disk firmware update or health
check). The parent than calls waitpid and checks the child error/status
code.

However, the _reap_children thread (created from zed_exec_process to
manage zedlets) also waits for all children with the same PGID and can
stole the signal, causing the replace operation to be aborted.

As waitpid returns -1, the parent incorrectly assume that the child
process had an error or was killed. This, in turn, leaves the newly
added disk in REMOVED or UNAVAIL status rather than completing the
replace process.

This patch changes the PGID of the child process execuing the
prepare script, shielding it from the _reap_children thread.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #16801
2024-12-05 09:33:21 -08:00
Alexander Motin
00debc1361 FreeBSD: Remove some illumos compat from vnode.h
Should make no difference, just some dead code cleanup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Martin Matuska <mm@FreeBSD.org>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16808
2024-12-05 09:33:21 -08:00
Alexander Motin
af10714e42 FreeBSD: Return ifndef IN_BASE back to fix the build
FreeBSD's libprocstat seems to build kernel code in user space,
which does not work here due to undefined vnode_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Martin Matuska <mm@FreeBSD.org>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16808
2024-12-05 09:33:21 -08:00
Rob Norris
747781a345 zinject(8): rename "ioctl" to "flush"
Doc bug missed in d7605ae77.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16827
2024-12-02 18:14:26 -08:00
Alexander Motin
b17ea73f9d Fix regression in dmu_buf_will_fill()
Direct I/O implementation added condition to call dbuf_undirty()
only in case of block cloning.  But the condition is not right if
the block is no longer dirty in this TXG, but still in DB_NOFILL
state.  It resulted in block not reverting to DB_UNCACHED and
following NULL de-reference on attempt to access absent db_data.

While there, add assertions for db_data to make debugging easier.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16829
2024-12-02 18:14:26 -08:00
Alexander Motin
17cdb7a2b1 Add missing parenthesis in VERIFYF()
Without them the order of operations might get unexpected.

Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16826
2024-12-02 18:14:26 -08:00
Pavel Snajdr
b673bcba4d Linux: fix zfs_uio_dio_check_for_zero_page
The intent here is to replace the zero page pointer in the array of
pointers to pages in the struct.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #16812 
Closes #16689
Closes #16642
2024-12-02 18:14:26 -08:00
Ivan Volosyuk
d8886275df Linux: Fix detection of register_sysctl_sz
Adjust the m4 function to mimic sentinel we use in spl-proc.c
This fixes the detection on kernels compiled with CONFIG_RANDSTRUCT=y

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ivan Volosyuk <Ivan.Volosyuk@gmail.com>
Closes: #16620
Closes: #16805
2024-12-02 18:14:26 -08:00
Rob Norris
1aee375947 zdb: show dedup table and log attributes
There's interesting info in there that is going to help with
understanding dedup behavior at any given moment.

Since this is a format change, tests that rely on that output have been
modified to match.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #16755
2024-12-02 18:14:26 -08:00
Pavel Snajdr
a1907b038a Assert if we're logging after final txg was set
This allowed to debug #16714, fixed in #16782.  Without assertions
added here it is difficult to figure out what logs cause the problem,
since the assertion happens in sync thread context.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Co-authored-by: Alexander Motin <mav@FreeBSD.org>
Closes #16795
2024-12-02 18:14:26 -08:00
Alexander Motin
d359f7f547 FreeBSD: Reduce copy_file_range() source lock to shared
Linux locks copy_file_range() source as shared.  FreeBSD was doing
it also, but then was changed to exclusive, partially because KPI
of that time was doing so, and partially seems out of caution.
Considering zfs_clone_range() uses range locks on both source and
destination, neither should require exclusive vnode locks. But one
step at a time, just sync it with Linux for now.

Reviewed-by: Alan Somers <asomers@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16789
Closes #16797
2024-12-02 18:14:26 -08:00
Alexander Motin
90603601b4 FreeBSD: Lock vnode in zfs_ioctl()
Previously vnode was not locked there, unlike Linux.  It required
locking it in vn_flush_cached_data(), which recursed on the lock
if called from zfs_clone_range(), having the vnode locked.

Reviewed-by: Alan Somers <asomers@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16789
Closes #16796
2024-12-02 18:14:26 -08:00
Pavel Snajdr
ecd0b1528e Linux: Fix zfs_prune panics
by protecting against sb->s_shrink eviction on umount with newer kernels

deactivate_locked_super calls shrinker_free and only then
sops->kill_sb cb, resulting in UAF on umount when trying
to reach for the shrinker functions in zpl_prune_sb of
in-umount dataset

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #16770
2024-12-02 18:14:26 -08:00
Brian Behlendorf
3ed1d608a8 Linux 6.12 compat: META
Update the META file to reflect compatibility with the 6.12 kernel.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16793
2024-11-21 08:24:37 -08:00
Alexander Motin
c165daa0b1 BRT: Clear bv_entcount_dirty on destroy
This fixes assertion in brt_sync_table() on debug builds when last
cloned block on the vdev is freed and bv_meta_dirty is cleared,
while bv_entcount_dirty is not.  Should not matter in production.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16791
2024-11-21 08:24:37 -08:00
Alexander Motin
1a5414ba2f BRT: More optimizations after per-vdev splitting
- With both pending and current AVL-trees being per-vdev and having
effectively identical comparison functions (pending tree compared
also birth time, but I don't believe it is possible for them to be
different for the same offset within one transaction group), it
makes no sense to move entries from one to another.  Instead inline
dramatically simplified brt_entry_addref() into brt_pending_apply().
It no longer requires bv_lock, since there is nothing concurrent
to it at the time.  And it does not need to search the tree for the
previous entries, since it is the same tree, we already have the
entry and we know it is unique.
 - Put brt_vdev_lookup() and brt_vdev_addref() into different tree
traversals to avoid false positives in the first due to the second
entcount modifications.  It saves dramatic amount of time when a
file cloned first time by not looking for non-existent ZAP entries.
 - Remove avl_is_empty(bv_tree) check from brt_maybe_exists().  I
don't think it is needed, since by the time all added entries are
already accounted in bv_entcount. The extra check must be producing
too many false positives for no reason.  Also we don't need bv_lock
there, since bv_entcount pointer must be table at this point, and
we don't care about false positive races here, while false negative
should be impossible, since all brt_vdev_addref() have already
completed by this point.  This dramatically reduces lock contention
on massive deletes of cloned blocks.  The only remaining one is
between multiple parallel free threads calling brt_entry_decref().
 - Do not update ZAP if net change for a block over the TXG was 0.
In combination with above it makes file move between datasets as
cheap operation as originally intended if it fits into one TXG.
 - Do not allocate vdevs on pool creation or import if it did not
have active block cloning. This allows to save a bit in few cases.
 - While here, add proper error handling in brt_load() on pool
import instead of assertions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16773
2024-11-21 08:24:37 -08:00
Alexander Motin
409aad3f33 BRT: Rework structures and locks to be per-vdev
While block cloning operation from the beginning was made per-vdev,
before this change most of its data were protected by two pool-
wide locks.  It created lots of lock contention in many workload.

This change makes most of block cloning data structures per-vdev,
which allows to lock them separately.  The only pool-wide lock now
it spa_brt_lock, protecting array of per-vdev pointers and in most
cases taken as reader.  Also this splits per-vdev locks into three
different ones: bv_pending_lock protects the AVL-tree of pending
operations in open context, bv_mos_entries_lock protects BRT ZAP
object from while being prefetched, and bv_lock protects the rest
of per-vdev context during TXG commit process.  There should be
no functional difference aside of some optimizations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
2024-11-21 08:24:37 -08:00
Alexander Motin
1917c26944 ZAP: Add by_dnode variants to lookup/prefetch_uint64
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
2024-11-21 08:24:37 -08:00
Alexander Motin
2b64d41be8 BRT: Don't call brt_pending_remove() on holes/embedded
We are doing exactly the same checks around all brt_pending_add().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
2024-11-21 08:24:37 -08:00
Alexander Motin
9753feaa63 ZTS: Avoid embedded blocks in bclone/bclone_prop_sync
If we write less than 113 bytes with enabled compression we get
embeded block, which then fails check for number of cloned blocks
in bclone_test.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
2024-11-21 08:24:37 -08:00
tleydxdy
3c0b8da206 fix: block incompatible kernel from being installed
The current "Requires" lines only ensure the old kernel is
available on the system but it does not prevent fedora from
updating to an incompatible and breaking user's system.

Set Conflicts to block incompatible kernels from being installed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: tleydxdy <shironeko.github@tesaguri.club>
Closes #16139
2024-11-21 08:24:37 -08:00
Mark Johnston
d7abeef621 zio: Avoid sleeping in the I/O path
zio_delay_interrupt(), apparently used for fault injection, is executed
in the I/O pipeline.  It can cause the calling thread to go to sleep,
which is not allowed on FreeBSD.  This happens only for small delays,
though, and there's no apparent reason to avoid deferring to a taskqueue
in that case, as it already does otherwise.

Simply go to sleep unconditionally.  This fixes an occasional panic I
see when running the ZTS on FreeBSD.  Also remove an unhelpful comment
referencing the non-existent timeout_generic().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #16785
2024-11-21 08:24:37 -08:00
Brian Behlendorf
7fb7eb9a63 ZTS: Fix zpool_status_008_pos false positive
Increase the injected delay to 1000ms and the ZIO_SLOW_IO_MS threshold
to 750ms to avoid false positives due to unrelated slow IOs which may
occur in the CI environment.  Additionally, clear the fault injection as
soon as it is no longer required for the test case.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16769
2024-11-21 08:24:37 -08:00
Alexander Motin
3f9af023f6 L2ARC: Stop rebuild before setting spa_final_txg
Without doing that there is a race window on export when history
log write by completed rebuild dirties transaction beyond final,
triggering assertion.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16714
Closes #16782
2024-11-21 08:24:37 -08:00
Alexander Motin
f7675ae30f Remove hash_elements_max accounting from DBUF and ARC
Those values require global atomics to get current hash_elements
values in few of the hottest code paths, while in all the years I
never cared about it.  If somebody wants, it should be easy to
get it by periodic sampling, since neither ARC header nor DBUF
counts change so fast that it would be difficult to catch.

For now I've left hash_elements_max kstat for ARC, since it was
used/reported by arc_summary and it would break older versions,
but now it just reports the current value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16759
2024-11-21 08:24:37 -08:00
Rob Norris
920603990a Move "no name changes" from compression to checksum table
Compression names actually aren't used in dedup table names, but
checksum names are.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16776
2024-11-21 08:24:37 -08:00
Steve Mokris
9a4b2f08d3 Expand zpool-remove.8 manpage with example results
Also fix comment cross-referencing to zpool.8.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Steve Mokris <smokris@softpixel.com>
Closes #16777
2024-11-21 08:24:37 -08:00
Alexander Motin
8023d9d4b5 Fix few __VA_ARGS typos in assertions
It should be __VA_ARGS__, not __VA_ARGS.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16780
2024-11-21 08:24:37 -08:00
Ameer Hamza
23f063d2e6 zed: prevent automatic replacement of offline vdevs
When an OFFLINE device is physically removed, a spare is automatically
activated. However, this behavior differs in FreeBSD, where we do not
transition from OFFLINE state to REMOVED.
Our support team has encountered cases where customers experienced
unexpected behavior during drive replacements, with multiple spares
activating for the same VDEV due to a single disk replacement. This
patch ensures that a drive in an OFFLINE state remains in that state,
preventing it from transitioning to REMOVED and being automatically
replaced by a spare.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16751
2024-11-21 08:24:37 -08:00
Rob Norris
18474efeec AUTHORS: refresh with recent new contributors
Welcome to the party 🎉

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16762
2024-11-15 15:08:59 -08:00
José Luis Salvador Rufo
260065099e tests: fix uClibc for getversion.c
This patch fixes compilation with uClibc by applying the same fallback
as commit e12d76176d to the `getversion.c`
file, which was previously overlooked.
 
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Closes #16735
Closes #16741
2024-11-14 16:56:34 -08:00
Ameer Hamza
4c9f2cec46 zvol_os.c: Increase optimal IO size
Since zvol read and write can process up to (DMU_MAX_ACCESS / 2) bytes
in a single operation, the current optimal I/O size is too low. SCST
directly reports this value as the optimal transfer length for the
target SCSI device. Increasing it from the previous volblocksize results
in performance improvement for large block parallel I/O workloads.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16750
2024-11-14 16:52:10 -08:00
Mark Johnston
ee3677d321 Fix some nits in zfs_getpages()
- If we don't want dmu_read_pages() to perform extra readahead/behind,
  pass a pointer to 0 instead of a null pointer, as dum_read_pages()
  expects rahead and rbehind to be non-null.
- Avoid unneeded iterations in a loop.

Sponsored-by: Klara, Inc.
Reported-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #16758
2024-11-14 16:52:06 -08:00
Rob Norris
0274a9a57d dsl_dataset: put IO-inducing frees on the pool deadlist
dsl_free() calls zio_free() to free the block. For most blocks, this
simply calls metaslab_free() without doing any IO or putting anything on
the IO pipeline.

Some blocks however require additional IO to free. This at least
includes gang, dedup and cloned blocks. For those, zio_free() will issue
a ZIO_TYPE_FREE IO and return.

If a huge number of blocks are being freed all at once, it's possible
for dsl_dataset_block_kill() to be called millions of time on a single
transaction (eg a 2T object of 128K blocks is 16M blocks). If those are
all IO-inducing frees, that then becomes 16M FREE IOs placed on the
pipeline. At time of writing, a zio_t is 1280 bytes, so for just one 2T
object that requires a 20G allocation of resident memory from the
zio_cache. If that can't be satisfied by the kernel, an out-of-memory
condition is raised.

This would be better handled by improving the cases that the
dmu_tx_assign() throttle will handle, or by reducing the overheads
required by the IO pipeline, or with a better central facility for
freeing blocks.

For now, we simply check for the cases that would cause zio_free() to
create a FREE IO, and instead put the block on the pool's freelist. This
is the same place that blocks from destroyed datasets go, and the async
destroy machinery will automatically see them and trickle them out as
normal.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #6783
Closes #16708
Closes #16722 
Closes #16697
2024-11-14 16:52:02 -08:00
Alexander Motin
025f8b2e74 L2ARC: Move different stats updates earlier
..., before we make the header or the log block visible to others.
It should fix assertion on allocated space going negative if the
header is freed once the lock is dropped, while the write is still
going.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16040
Closes #16743
2024-11-14 16:51:58 -08:00
Mark Johnston
37e8f3ae17 Grab the rangelock unconditionally in zfs_getpages()
As a deadlock avoidance measure, zfs_getpages() would only try to
acquire a rangelock, falling back to a single-page read if this was not
possible.  However, this is incompatible with direct I/O.

Instead, release the busy lock before trying to acquire the rangelock in
blocking mode.  This means that it's possible for the page to be
replaced, so we have to re-lookup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #16643
2024-11-14 16:51:20 -08:00
Mark Johnston
7313c6e382 Fix a potential page leak in mappedread_sf()
mappedread_sf() may allocate pages; if it fails to populate a page
can't free it, it needs to ensure that it's placed into a page queue,
otherwise it can't be reclaimed until the vnode is destroyed.

I think this is quite unlikely to happen in practice, it was noticed by
code inspection.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #16643
2024-11-14 16:51:17 -08:00
Umer Saleem
1c6b0302ef Fix user properties output for zpool list
In zpool_get_user_prop, when called from zpool_expand_proplist and
collect_pool, we often have zpool_props present in zpool_handle_t equal
to NULL. This mostly happens when only one user property is requested
using zpool list -o <user_property>. Checking for this case and
correctly initializing the zpool_props field in zpool_handle_t fixes
this issue.

Interestingly, this issue does not occur if we query any other property
like name or guid along with a user property with -o flag because while
accessing properties like guid, zpool_prop_get_int is called which
checks for this case specifically and calls zpool_get_all_props.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16734
2024-11-14 16:51:12 -08:00
Umer Saleem
7e3af4658b JSON: fix user properties output for zpool list
This commit fixes JSON output for zpool list when user properties are
requested with -o flag. This case needed to be handled specifically
since zpool_prop_to_name does not return property name for user
properties, instead it is stored in pl->pl_user_prop.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16734
2024-11-14 16:51:07 -08:00
Brian Behlendorf
1a54b13aaf Tag 2.3.0-rc3
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-11-07 11:33:51 -08:00
Umer Saleem
9061a4da0b JSON: fix user properties output for zfs list
This commit fixes JSON output for zfs list when user properties are
requested with -o flag. This case needed to be handled specifically
since zfs_prop_to_name does not return property name for user
properties, instead it is stored in pl->pl_user_prop.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16732
2024-11-07 11:33:51 -08:00
Sam James
e12d76176d Use <fcntl.h> instead of <sys/fcntl.h>
When building on musl, we get:

```
In file included from tests/zfs-tests/cmd/getversion.c:22:
/usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect
 #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>

In file included from module/os/linux/zfs/vdev_file.c:36:
/usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect
 #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>
```

Bug: https://bugs.gentoo.org/925235
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sam James <sam@gentoo.org>
Closes #15925
2024-11-07 11:33:51 -08:00
Brian Atkinson
8131793d6f Update ABD stats for linear page Linux
a10e552 updated abd_free_linear_page() to no longer call
abd_update_scatter_stat(). This meant that linear pages that were not
attached to Direct I/O requests were not doing waste accounting for the
ARC. This led to performance issues due to incorrect ARC accounting that
resulted in 100% of CPU time being spent in arc_evict() during prolonged
I/O workloads with the ARC.

The call to abd_update_scatter_stats() is now conditionally called in
abd_free_linear_page() when the ABD is not from a Direct I/O request.

Reviewed-by: Mark Maybee <mmaybee@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16729
2024-11-07 11:33:51 -08:00
Chunwei Chen
c82eb27b22 ZFS send should use spill block prefetched from send_reader_thread
Currently, even though send_reader_thread prefetches spill block,
do_dump() will not use it and issues its own blocking arc_read. This
causes significant performance degradation when sending datasets with
lots of spill blocks.

For unmodified spill blocks, we also create send_range struct for them
in send_reader_thread and issue prefetches for them. We piggyback them
on the dnode send_range instead of enqueueing them so we don't break
send_range_after check.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: david.chen <david.chen@nutanix.com>
Closes #16701
2024-11-06 11:54:32 -08:00
tstabrawa
661bb434e6 Use simple folio migration function
Avoids using fallback_migrate_folio, which starts unnecessary writeback
(leading to BUG in migrate_folio_extra).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: tstabrawa <59430211+tstabrawa@users.noreply.github.com>
Closes #16568
Closes #16723
2024-11-06 11:54:32 -08:00
tstabrawa
ae48c2f6a9 Revert "Avoid BUG in migrate_folio_extra"
This reverts commit b052035990.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: tstabrawa <59430211+tstabrawa@users.noreply.github.com>
Closes #16568
Closes #16723
2024-11-06 11:54:32 -08:00
Uglymotha
b96845b632 Verify parent_dev before calling udev_device_get_sysattr_value
Not all udev devices have parent devices.
Calling udev_device_get_ functions yield an assertion error
if called with a NULL pointer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Sietse <sietse@wizdom.nu>
Co-authored-by: Sietse <sietse@wizdom.nu>
Closes #16705 
Closes #16717
2024-11-04 16:46:39 -08:00
Alexander Motin
55cbd1f9bd Reduce dirty records memory usage
Small block workloads may use a very large number of dirty records.
During simple block cloning test due to BRT still using 4KB blocks
I can easily see up to 2.5M of those used.  Before this change
dbuf_dirty_record_t structures representing them were allocated via
kmem_zalloc(), that rounded their size up to 512 bytes.

Introduction of specialized kmem cache allows to reduce the size
from 512 to 408 bytes.  Additionally, since override and raw params
in dirty records are mutually exclusive, puting them into a union
allows to reduce structure size down to 368 bytes, increasing the
saving to 28%, that can be a 0.5GB or more of RAM.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16694
2024-11-04 16:46:39 -08:00
Rob Norris
880b73956b zfs(4): remove "experimental" from zfs_bclone_enabled
I think we've done enough experiments.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16189 
Closes #16712
2024-11-04 16:46:39 -08:00
Tony Hutter
d367ef2995 ZTS: Add Fedora 41, remove Fedora 39
Fedora 41 was released 10/29/24, and Fedora 39 will be EOL on 11/12/24.
Update Fedora runners in the test suite.  Some minor tweaks also needed
to support ksh 1.0.10.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16700
2024-11-01 10:03:05 -07:00
Rob Norris
7546fbd6e9 zdb: add extra -T flag to show histograms of BRT refcounts
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16692
2024-11-01 09:49:54 -07:00
Rich Ercolani
903d3f9187 Added output to zpool online and offline
I was surprised to discover today that `zpool online` and
`zpool offline` don't print any information about why they failed in
many cases, they just return 1 with no information about why.

Let's improve that where we can without changing the library function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #16244
2024-11-01 09:49:49 -07:00
Rob Norris
86b5853cfb vdev_disk: move abd return and free off the interrupt handler
Freeing an ABD can take sleeping locks to update various stats. We
aren't allowed to sleep on an interrupt handler. So, move the free off
to the io_done callback.

We should never have been freeing things in the interrupt handler, but
we got away with it because we were usually freeing a linear ABD, which
at most is returning two objects to a cache and never sleeping. Scatter
ABDs can be used now, and those have more complex locking.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16687
2024-11-01 09:49:23 -07:00
Rob Norris
19a8dd48e1 vdev_disk: try harder to ensure IO alignment rules
It seems out our notion of "properly" aligned IO was incomplete. In
particular, dm-crypt does its own splitting, and assumes that a logical
block will never cross an order-0 page boundary (ie, the physical page
size, not compound size). This effectively means that it needs to be
possible to split a BIO at any page or block size boundary and have it
work correctly.

This updates the alignment check function to enforce these rules (to the
extent possible).

Our response to misaligned data is to make some new allocation that is
properly aligned, and copy the data into it. It turns out that
linearising (via abd_borrow_buf()) is not enough, because we allocate eg
4K blocks from a general purpose slab, and so may receive (or already
have) a 4K block that crosses pages.

So instead, we allocate a new ABD, which is guaranteed to be aligned
properly to block sizes, and then copy everything into it, and back out
on the way back.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16687 #16631 #15646 #15533 #14533
2024-11-01 09:49:14 -07:00
Serapheim Dimitropoulos
8ac70aade7 Add warning for external consumers of dmu_tx_callback_register
While reading some code @grwilson came across the above function that
seemingly had no consumers besides a ztest callback that ensures that
the tx_callback infrastructure works correctly. It turns out that Lustre
is the main (and potentially the only) consumer of this. Refer to
`osd_trans_commit_cb` of `lustre/osd-zfs/osd_handler.c` in the Lustre
repo for more info. Let's add a comment highlighting this before someone
removes it by mistake.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Serapheim Dimitropoulos <serapheimd@gmail.com>
Closes #16698
2024-11-01 09:49:05 -07:00
Alexander Motin
bbc0d34bfd On the first vdev open ignore impossible ashift hints
If on the first open device's logical ashift is bigger than set
by pool's ashift property, ignore the last as unusable instead of
creating vdev that will fail most of I/Os due to misalignment.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #16690
2024-11-01 09:49:00 -07:00
Dimitry Andric
f3823a9ab2 Fix gcc uninitialized warning in FreeBSD zio_crypt.c
In FreeBSD's `zio_do_crypt_data()`, ensure that two `struct uio`
variables are cleared before copying data out of them. This avoids
accessing garbage data, and fixes gcc `-Wuninitialized` warnings.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Toomas Soome <tsoome@me.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #16688
2024-11-01 09:48:55 -07:00
Dimitry Andric
fd2cae969f Fix gcc unused value warning in FreeBSD simd.h
The macros `simd_stat_init()` and `simd_stat_fini()` in FreeBSD's
`simd.h` are defined as zero, but they are actually only used as
statements. Replace the definitions with `do {} while (0)` instead, to
avoid gcc `-Wunused-value` warnings.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Toomas Soome <tsoome@me.com>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #16693
2024-11-01 09:48:51 -07:00
Tony Hutter
f7b4bca66a ZTS: Add LUKS sanity test
Add a LUKS sanity test to trigger: #16631

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16681
2024-11-01 09:48:47 -07:00
Alexander Motin
7e3ce4efaa Pack dmu_buf_impl_t by 16 bytes
On 64bit FreeBSD this reduces one from 296 to 280 bytes.  On small
block workloads dbufs may consume gigabytes of ARC, and this saves
5% of it.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16684
2024-11-01 09:48:42 -07:00
Tino Reichardt
77d81974b6 Fix dependency install on Debian 11 (#16683)
Adding cryptsetup breaks some dialog things on Debian 11.
Apply some workaround for it.

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-11-01 09:48:37 -07:00
Brian Behlendorf
5237760b17 ZTS: Add additional exceptions
The following tests have been observed to occasionally fail when
running under the CI.  Updated our exceptions list to track them.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16670
2024-10-21 13:02:07 -07:00
Rob Norris
ede715d1e4 spl/thread: explicitly define thread_func_t as noreturn
All of our thread entry functions have this signature:

    void (*)(void*) __attribute__((noreturn))

The low-level `__thread_create()` function accepts a `thread_func_t` as
the entry point, which is defined more simply as:

    void (*)(void *)

And then the `thread_create()` and `thread_create_named()` macros cast
the passed-in function point down to `thread_func_t`, that is, casting
away the `noreturn` attribute.

Clang considers casting between these two types to be invalid because
both the caller and the callee may have elided parts of the stack frame
save and restore, knowing that they won't be needed.

Recent Linux appears to be setting `-Wcast-function-type-strict`, which
causes this invalid cast to emit a warning, which with `-Werror` is
converted to an error, breaking the build.

This commit fixes this in the simplest possible way: adding `noreturn`
to the `thread_func_t` attribute. Since all our thread entry functions
already have this attribute, it's arguably a just a consistency fix
anyway.

I considered removing the casts in the macros, which silences the
warnings, but it turns out that Clang has a bug that won't emit this
error for implicit conversions, only explicit casts. So leaving them
there seems like a reasonable belt-and-suspenders approach. Also,
frankly, this whole mechanism seems a little undercooked inside LLVM, so
I'm content go with my intuition about the smallest, least invaisve
change.

**NOTE**: `__thread_create` is exported by `spl.ko` and has a
`thread_func_t` arg, so this is an ABI break. Whether that matters in
practice, I have no idea.

Further reading:
- 1aad641c79
- https://github.com/llvm/llvm-project/issues/7325
- https://github.com/llvm/llvm-project/issues/41465

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16672 
Closes #16673
2024-10-21 13:02:07 -07:00
Rob Norris
e30c69365d config: fix dequeue_signal check for kernels <4.20
Before 4.20, kernel_siginfo_t was just called siginfo_t. This was
causing the kthread_dequeue_signal_3arg_task check, which uses
kernel_siginfo_t, to fail on older kernels.

In d6b8c17f1, we started checking for the "new" three-arg
dequeue_signal() by testing for the "old" version. Because that test is
explicitly using kernel_siginfo_t, it would fail, leading to the build
trying to use the new three-arg version, which would then not compile.

This commit fixes that by avoiding checking for the old 3-arg
dequeue_signal entirely. Instead, we check for the new one, as well as
the 4-arg form, and we use the old form as a fallback. This way, we
never have to test for it explicitly, and once we're building
HAVE_SIGINFO will make sure we get the right kernel_siginfo_t for it, so
everything works out nice.

Original-patch-by: Finix <yancw@info2soft.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16666
2024-10-21 13:02:07 -07:00
Rob Norris
78d39d91fa zdb: show bp in uberblock dump
Just another useful nugget of info in times of strife.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16667
2024-10-21 13:02:07 -07:00
Tomohiro Kusumi
2b359c7824 Fix compile-time warnings caused by duplicate struct typedefs
Some compiler/versions warn these typedefs according to #16660.

The platform specific header sys/abd_os.h shouldn't define or use abd_t,
as it's defined in its non-platform specific consumer sys/abd.h.
Do the same as what FreeBSD header does.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #16660 
Closes #16665
2024-10-21 13:02:07 -07:00
Alexander Motin
ace2e17a9b zfs_debug: Restore log size limit for userspace
For some reason it was dropped when split from kernel, that makes
raidz_test to accumulate in RAM up to 100GB of logs we don't need.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by:  Rob Norris <robn@despairlabs.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16492
Closes #16566
Closes #16664
2024-10-21 13:02:07 -07:00
Rob Norris
b4cd10ce5b libspl/backtrace: comment and harden libunwind backtracer
This is the sort of code that we get right once and never look at again.
Anyone reading this code is already likely in the middle of a debugging
nightmare, and then they have a wall of manual string construction and
an unfamiliar and idiosyncratic library to deal with. So, comment the
whole thing to try to make it clear what's going on.

In pursuit of the above, I've added return checks to some of the
libunwind calls, fixed the frame loop to not skip the "top" frame
(however unseful it may be), and fix a couple of calls to
spl_bt_u64_to_hex_str() which requested 18 digits instead of 16.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16653
2024-10-21 13:02:07 -07:00
Rob Norris
f52d7aaaac libspl/backtrace: rename and document hex conversion function
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16653
2024-10-21 13:02:07 -07:00
Rob Norris
d5db840260 libspl/backtrace: helper macros for output
My eyes are going blurry looking at all those write calls. This is much
nicer.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Close #16653
2024-10-21 13:02:07 -07:00
Rob Norris
bcd61d9579 libspl/backtrace: dump registers in libunwind backtraces
More useful stuff, especially when trying to follow a disassembly.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16653
2024-10-21 13:02:07 -07:00
Umer Saleem
36a67b50a2 Fix inconsistent mount options for ZFS root
While mounting ZFS root during boot on Linux distributions from initrd,
mount from busybox is effectively used which executes mount system call
directly. This skips the ZFS helper mount.zfs, which checks and enables
the mount options as specified in dataset properties. As a result,
datasets mounted during boot from initrd do not have correct mount
options as specified in ZFS dataset properties.

There has been an attempt to use mount.zfs in zfs initrd script,
responsible for mounting the ZFS root filesystem (PR#13305). This was
later reverted (PR#14908) after discovering that using mount.zfs breaks
mounting of snapshots on root (/) and other child datasets of root have
the same issue (Issue#9461).

This happens because switching from busybox mount to mount.zfs correctly
parses the mount options but also adds 'mntpoint=/root' to the mount
options, which is then prepended to the snapshot mountpoint in
'.zfs/snapshot'. '/root' is the directory on Debian with initramfs-tools
where root filesystem is mounted before pivot_root. When Linux runtime
is reached, trying to access the snapshots on root results in
automounting the snapshot on '/root/.zfs/*', which fails.

This commit attempts to fix the automounting of snapshots on root, while
using mount.zfs in initrd script. Since the mountpoint of dataset is
stored in vfs_mntpoint field, we can check if current mountpoint of
dataset and vfs_mntpoint are same or not. If they are not same, reset
the vfs_mntpoint field with current mountpoint. This fixes the
mountpoints of root dataset and children in respective vfs_mntpoint
fields when we try to access the snapshots of root dataset or its
children. With correct mountpoint for root dataset and children stored
in vfs_mntpoint, all snapshots of root dataset are mounted correctly
and become accessible.

This fix will come into play only if current process, that is trying to
access the snapshots is not in chroot context. The Linux kernel API
that is used to convert struct path into char format (d_path), returns
the complete path for given struct path. It works in chroot environment
as well and returns the correct path from original filesystem root.

However d_path fails to return the complete path if any directory from
original root filesystem is mounted using --bind flag or --rbind flag
in chroot environment. In this case, if we try to access the snapshot
from outside the chroot environment, d_path returns the path correctly,
i.e. it returns the correct path to the directory that is mounted with
--bind flag. However inside the chroot environment, it only returns the
path inside chroot.

For now, there is not a better way in my understanding that gives the
complete path in char format and handles the case where directories from
root filesystem are mounted with --bind or --rbind on another path which
user will later chroot into. So this fix gets enabled if current
process trying to access the snapshot is not in chroot context.

With the snapshots issue fixed for root filesystem, using mount.zfs in
ZFS initrd script, mounts the datasets with correct mount options.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16646
2024-10-21 13:02:07 -07:00
Warner Losh
3d9129a7b6 freebsd: Use compiler.h from FreeBSD's base's linuxkpi
The FreeBSD linux/compiler.h in OpenZFS was copied from a very old
version of FreeBSD's linuxkpi's linux/compiler.h. There's no need for
this duplication. Use FreeBSD's linuxkpi version instead, and provide
zfs_fallthrough to augment it (it's all that's needed). Use #pragma once
to avoid naming issues for guard variables. Since this is a complete
rewrite, use my copyright here (the original code in FreeBSD still
credits everybody). This works back at least to FreeBSD 12.4, which
is not out of support, and all newer releases.

Remove extra copies of macros that were defined elsewhere, but are now
properly defined in LinuxKPI so are redundant.

Sponsored-by: Netflix
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Warner Losh <imp@bsdimp.com>
Closes #16650
2024-10-21 13:02:07 -07:00
Brian Behlendorf
0409c47fe0 Tag 2.3.0-rc2
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-10-13 19:25:58 -07:00
Tino Reichardt
b5a3825244 ZTS: Make use of optimal CPU pinning
With CPU pinning, we should get some speedup because of better
cpu cache re-use.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16641
2024-10-13 19:25:58 -07:00
Tino Reichardt
77df762a1b ZTS: Optimize Kernel Same-page Merging (KSM)
Kernel same-page Merging (KSM) allows KVM guests to share identical
memory pages. These shared pages are usually common libraries or other
identical, high-use data.

The current configuration was a bit to lazy - so KSM didn't work very
well. With the new configuration I could run 3 Linux VMs in parralel.

FreeBSD can't benefit from it. But FreeBSD is not so memory hungry in
general, so there is no need for it ;)

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16641
2024-10-13 19:25:58 -07:00
Brian Behlendorf
56871e465a Fallback to strerror() when strerror_l() isn't available
Some C libraries, such as uClibc, do not provide strerror_l() in
which case we fallback to strerror().

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16636 
Closes #16640
2024-10-13 19:25:58 -07:00
Brian Behlendorf
c645b07eaa ZTS: Increase zpool_import_parallel_pos import margin
Increase the pool import time allowed by assuming a minimum reduction
to 1/2 instead of 1/3 when comparing sequential to parallel import
times.  This is sufficient to verify parallel imports are working as
intended and should address the occasional false positive failure
when the time is slightly exceeded.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16638
2024-10-11 16:21:25 -07:00
Brian Behlendorf
5bc27acf51 ZTS: Slightly increase dedup_quota limit
As described in the comment above this check the space used by
logged entries is not accounted for and some margin needs to be
added in.  While uncommon we have slightly exceeded the 600,000
threshold on some CI run so we increase the limit a bit more.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16637
2024-10-11 16:21:25 -07:00
Brian Behlendorf
7f830d783b CI: Stick with ubuntu-22.04 for CodeQL analysis
The ubuntu-latest alias now refers to ubuntu-24.04 instead of
ubuntu-22.04 which causes CodeQL's autobuild to fail with:

    cpp/autobuilder: deptrace not supported in ubuntu 24.04

Until deptrace is supported by ubuntu-24.04 hosted runners request
ubuntu-22.04 which is supported.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Closes #16639
2024-10-11 16:21:25 -07:00
Martin Matuška
58162960a1 zdb: fix printf format in dump_zap()
When compiling zdb.c on 32-bit platforms, a format conversion error 
is reported for a printf() in dump_zap().  Change %l to macro 
%" PRIu64 " to match the platform size of a 64-bit unsigned integer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16635
2024-10-11 16:21:25 -07:00
Rob Norris
666903610d zpool/zfs: allow --json wherever -j is allowed
Mostly so that with the JSON formatting options are also used, they all
look the same. To my eye, `-j --json-flat-vdevs` suggests that they are
different or unrelated, while `--json --json-flat-vdevs` invites no
further questions.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16632
2024-10-11 16:21:25 -07:00
Brian Atkinson
26ecd8b993 Always validate checksums for Direct I/O reads
This fixes an oversight in the Direct I/O PR. There is nothing that
stops a process from manipulating the contents of a buffer for a
Direct I/O read while the I/O is in flight. This can lead checksum
verify failures. However, the disk contents are still correct, and this
would lead to false reporting of checksum validation failures.

To remedy this, all Direct I/O reads that have a checksum verification
failure are treated as suspicious. In the event a checksum validation
failure occurs for a Direct I/O read, then the I/O request will be
reissued though the ARC. This allows for actual validation to happen and
removes any possibility of the buffer being manipulated after the I/O
has been issued.

Just as with Direct I/O write checksum validation failures, Direct I/O
read checksum validation failures are reported though zpool status -d in
the DIO column. Also the zevent has been updated to have both:
1. dio_verify_wr -> Checksum verification failure for writes
2. dio_verify_rd -> Checksum verification failure for reads.
This allows for determining what I/O operation was the culprit for the
checksum verification failure. All DIO errors are reported only on the
top-level VDEV.

Even though FreeBSD can write protect pages (stable pages) it still has
the same issue as Linux with Direct I/O reads.

This commit updates the following:
1. Propogates checksum failures for reads all the way up to the
   top-level VDEV.
2. Reports errors through zpool status -d as DIO.
3. Has two zevents for checksum verify errors with Direct I/O. One for
   read and one for write.
4. Updates FreeBSD ABD code to also check for ABD_FLAG_FROM_PAGES and
   handle ABD buffer contents validation the same as Linux.
5. Updated manipulate_user_buffer.c to also manipulate a buffer while a
   Direct I/O read is taking place.
6. Adds a new ZTS test case dio_read_verify that stress tests the new
   code.
7. Updated man pages.
8. Added an IMPLY statement to zio_checksum_verify() to make sure that
   Direct I/O reads are not issued as speculative.
9. Removed self healing through mirror, raidz, and dRAID VDEVs for
   Direct I/O reads.

This issue was first observed when installing a Windows 11 VM on a ZFS
dataset with the dataset property direct set to always. The zpool
devices would report checksum failures, but running a subsequent zpool
scrub would not repair any data and report no errors.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16598
2024-10-09 13:45:06 -07:00
Martin Matuška
774dcba86d FreeBSD: ignore some includes when not building kernel
The function abd_alloc_from_pages() is used only in kernel.
Excluding sys/vm.h, and vm/vm_page.h includes avoids dependency
problems.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16616
2024-10-09 13:45:02 -07:00
Brian Behlendorf
09f6b2ebe3 ztest: Fix scrub check in ztest_raidz_expand_check()
The scrub code may return EBUSY under several possible scenarios
causing ztest to incorrectly ASSERT when verifying the result of
a raidz expansion.  Update the test case to allow EBUSY since it
does not indicate pool damage.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16627
2024-10-09 13:44:58 -07:00
Matthew Heller
2609d93b65 vdev_id: multi-lun disks & slot num zero pad
Add ability to generate disk names that contain both a slot number
and a lun number in order to support multi-actuator SAS hard drives
with multiple luns. Also add the ability to zero pad slot numbers to
a desired digit length for easier sorting.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Heller <matthew.f.heller@accre.vanderbilt.edu>
Closes #16603
2024-10-09 13:44:55 -07:00
Brian Behlendorf
10f46d2aba ZTS: resilver_restart_001.ksh restore defaults
Update resilver_restart_001.ksh to restore the default
resilver_defer_percent when the test completes.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16618
2024-10-09 13:44:50 -07:00
Umer Saleem
0df10dc911 Only serialize native-deb* targets
.NOTPARALLEL target is being forced on userspace as well. This commit
removes .NOTPARALEL target and only serializes the execution of
native-deb* targets.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16622
2024-10-09 13:44:46 -07:00
Rob Norris
0fbe9d352c zpool/zfs: restore -V & --version options
The -j option added a round of getopt, which didn't know the magic
version flags. So just bypass the whole thing and go straight to the
human output function for the special case.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16615 
Closes #16617
2024-10-09 13:44:42 -07:00
Martin Matuška
84f44ec07f Return boolean_t in inline functions of lib/libspl/include/sys/uio.h
The inline functions zfs_dio_offset_aligned(), zfs_dio_size_aligned()
and zfs_dio_aligned() are declared as boolean_t but return the bool
type.

This fixes the build of FreeBSD.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16613
2024-10-09 13:44:36 -07:00
Shengqi Chen
fc9608e2e6 Bump SONAME of libzfs and libzpool
The ABI of libzfs and libzpool have breaking changes since last
SONAME bump in commit fe6babc:

* libzfs: `zpool_print_unsup_feat` removed (used by zpool cmd).
* libzpool: multiple `ddt_*` symbols removed (used by zdb cmd).

Bump them to avoid ABI breakage.

See: https://github.com/openzfs/zfs/pull/11817
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16609
2024-10-09 13:44:32 -07:00
Shengqi Chen
d32c05949a contrib/debian: add new manpages to installation list
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16609
2024-10-09 13:44:26 -07:00
JKDingwall
1ebb6b866f Fix generation of kernel uevents for snapshot rename on linux
`zvol_rename_minors()` needs to be given the full path not just the
snapshot name.  Use code removed in a0bd735ad as a guide
to providing the necessary values.

Add ZTS check for /dev changes after snapshot rename.  After
renaming a snapshot with 'snapdev=visible' ensure that the /dev
entries are updated to reflect the rename.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: James Dingwall <james@dingwall.me.uk>
Closes #14223 
Closes #16600
2024-10-09 13:44:22 -07:00
Tino Reichardt
f019b445f3 ZTS: Fix summary page creation again - second try
In PR #16599 I used 'return' like in C - which is wrong :/
This fix generates the summary as needed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16611
2024-10-09 13:44:18 -07:00
Tino Reichardt
03822a61be ZTS: Remove FreeBSD 13.4-STABLE
Current CI is failing on FreeBSD 13.4-STABLE, because samba4 can't be
installed there. Lets remove it for now.

Update also the FreeBSD version definitions a bit.

The naming is like this now:

FreeBSD variants:
- freebsd13-3r, freebsd13-4r, freebsd14-0r, freebsd14-1r (RELEASE)
- freebsd13-4s, freebsd14-1s (STABLE)
- freebsd15-0c (CURRENT)

RHL based distros:
- almalinux8, almalinux9, centos-stream9, fedora39, fedora40

Debian based:
- debian11, debian12, ubuntu20, ubuntu22, ubuntu24

Misc Linux distros:
- archlinux, tumbleweed

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16610
2024-10-09 13:43:46 -07:00
549 changed files with 7698 additions and 22969 deletions

View File

@ -2,6 +2,11 @@
<!--- Provide a general summary of your changes in the Title above --> <!--- Provide a general summary of your changes in the Title above -->
<!---
Documentation on ZFS Buildbot options can be found at
https://openzfs.github.io/openzfs-docs/Developer%20Resources/Buildbot%20Options.html
-->
### Motivation and Context ### Motivation and Context
<!--- Why is this change required? What problem does it solve? --> <!--- Why is this change required? What problem does it solve? -->
<!--- If it fixes an open issue, please link to the issue here. --> <!--- If it fixes an open issue, please link to the issue here. -->

View File

@ -65,7 +65,7 @@ if __name__ == '__main__':
# check last (HEAD) commit message # check last (HEAD) commit message
last_commit_message_raw = subprocess.run([ last_commit_message_raw = subprocess.run([
'git', 'show', '-s', '--format=%B', head 'git', 'show', '-s', '--format=%B', 'HEAD'
], check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE) ], check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
for line in last_commit_message_raw.stdout.decode().splitlines(): for line in last_commit_message_raw.stdout.decode().splitlines():

View File

@ -6,13 +6,6 @@
set -eu set -eu
# We've been seeing this script take over 15min to run. This may or
# may not be normal. Just to get a little more insight, print out
# a message to stdout with the top running process, and do this every
# 30 seconds. We can delete this watchdog later once we get a better
# handle on what the timeout value should be.
(while [ 1 ] ; do sleep 30 && echo "[watchdog: $(ps -eo cmd --sort=-pcpu | head -n 2 | tail -n 1)}')]"; done) &
# install needed packages # install needed packages
export DEBIAN_FRONTEND="noninteractive" export DEBIAN_FRONTEND="noninteractive"
sudo apt-get -y update sudo apt-get -y update
@ -72,6 +65,3 @@ sudo zpool create -f -o ashift=12 zpool $SSD1 $SSD2 -O relatime=off \
for i in /sys/block/s*/queue/scheduler; do for i in /sys/block/s*/queue/scheduler; do
echo "none" | sudo tee $i echo "none" | sudo tee $i
done done
# Kill off our watchdog
kill $(jobs -p)

View File

@ -121,7 +121,7 @@ case "$OS" in
KSRC="$FREEBSD_SNAP/../amd64/$FreeBSD/src.txz" KSRC="$FREEBSD_SNAP/../amd64/$FreeBSD/src.txz"
;; ;;
freebsd15-0c) freebsd15-0c)
FreeBSD="15.0-ALPHA3" FreeBSD="15.0-PRERELEASE"
OSNAME="FreeBSD $FreeBSD" OSNAME="FreeBSD $FreeBSD"
OSv="freebsd14.0" OSv="freebsd14.0"
URLxz="$FREEBSD_SNAP/$FreeBSD/amd64/Latest/FreeBSD-$FreeBSD-amd64-BASIC-CI-ufs.raw.xz" URLxz="$FREEBSD_SNAP/$FreeBSD/amd64/Latest/FreeBSD-$FreeBSD-amd64-BASIC-CI-ufs.raw.xz"

View File

@ -20,7 +20,7 @@ function archlinux() {
sudo pacman -Sy --noconfirm base-devel bc cpio cryptsetup dhclient dkms \ sudo pacman -Sy --noconfirm base-devel bc cpio cryptsetup dhclient dkms \
fakeroot fio gdb inetutils jq less linux linux-headers lsscsi nfs-utils \ fakeroot fio gdb inetutils jq less linux linux-headers lsscsi nfs-utils \
parted pax perf python-packaging python-setuptools qemu-guest-agent ksh \ parted pax perf python-packaging python-setuptools qemu-guest-agent ksh \
samba strace sysstat rng-tools rsync wget xxhash samba sysstat rng-tools rsync wget xxhash
echo "##[endgroup]" echo "##[endgroup]"
} }
@ -43,8 +43,7 @@ function debian() {
lsscsi nfs-kernel-server pamtester parted python3 python3-all-dev \ lsscsi nfs-kernel-server pamtester parted python3 python3-all-dev \
python3-cffi python3-dev python3-distlib python3-packaging libtirpc-dev \ python3-cffi python3-dev python3-distlib python3-packaging libtirpc-dev \
python3-setuptools python3-sphinx qemu-guest-agent rng-tools rpm2cpio \ python3-setuptools python3-sphinx qemu-guest-agent rng-tools rpm2cpio \
rsync samba strace sysstat uuid-dev watchdog wget xfslibs-dev xxhash \ rsync samba sysstat uuid-dev watchdog wget xfslibs-dev xxhash zlib1g-dev
zlib1g-dev
echo "##[endgroup]" echo "##[endgroup]"
} }
@ -88,8 +87,8 @@ function rhel() {
libuuid-devel lsscsi mdadm nfs-utils openssl-devel pam-devel pamtester \ libuuid-devel lsscsi mdadm nfs-utils openssl-devel pam-devel pamtester \
parted perf python3 python3-cffi python3-devel python3-packaging \ parted perf python3 python3-cffi python3-devel python3-packaging \
kernel-devel python3-setuptools qemu-guest-agent rng-tools rpcgen \ kernel-devel python3-setuptools qemu-guest-agent rng-tools rpcgen \
rpm-build rsync samba strace sysstat systemd watchdog wget xfsprogs-devel \ rpm-build rsync samba sysstat systemd watchdog wget xfsprogs-devel xxhash \
xxhash zlib-devel zlib-devel
echo "##[endgroup]" echo "##[endgroup]"
} }
@ -105,7 +104,7 @@ function install_fedora_experimental_kernel {
our_version="$1" our_version="$1"
sudo dnf -y copr enable @kernel-vanilla/stable sudo dnf -y copr enable @kernel-vanilla/stable
sudo dnf -y copr enable @kernel-vanilla/mainline sudo dnf -y copr enable @kernel-vanilla/mainline
all="$(sudo dnf list --showduplicates kernel-* python3-perf* perf* bpftool*)" all="$(sudo dnf list --showduplicates kernel-*)"
echo "Available versions:" echo "Available versions:"
echo "$all" echo "$all"

View File

@ -108,30 +108,19 @@ echo '*/5 * * * * /root/cronjob.sh' > crontab.txt
sudo crontab crontab.txt sudo crontab crontab.txt
rm crontab.txt rm crontab.txt
# Save the VM's serial output (ttyS0) to /var/tmp/console.txt
# - ttyS0 on the VM corresponds to a local /dev/pty/N entry
# - use 'virsh ttyconsole' to lookup the /dev/pty/N entry
for ((i=1; i<=VMs; i++)); do
mkdir -p $RESPATH/vm$i
read "pty" <<< $(sudo virsh ttyconsole vm$i)
# Create the file so we can tail it, even if there's no output.
touch $RESPATH/vm$i/console.txt
sudo nohup bash -c "cat $pty > $RESPATH/vm$i/console.txt" &
# Write all VM boot lines to the console to aid in debugging failed boots.
# The boot lines from all the VMs will be munged together, so prepend each
# line with the vm hostname (like 'vm1:').
(while IFS=$'\n' read -r line; do echo "vm$i: $line" ; done < <(sudo tail -f $RESPATH/vm$i/console.txt)) &
done
echo "Console logging for ${VMs}x $OS started."
# check if the machines are okay # check if the machines are okay
echo "Waiting for vm's to come up... (${VMs}x CPU=$CPU RAM=$RAM)" echo "Waiting for vm's to come up... (${VMs}x CPU=$CPU RAM=$RAM)"
for ((i=1; i<=VMs; i++)); do for ((i=1; i<=VMs; i++)); do
.github/workflows/scripts/qemu-wait-for-vm.sh vm$i .github/workflows/scripts/qemu-wait-for-vm.sh vm$i
done done
echo "All $VMs VMs are up now." echo "All $VMs VMs are up now."
# Save the VM's serial output (ttyS0) to /var/tmp/console.txt
# - ttyS0 on the VM corresponds to a local /dev/pty/N entry
# - use 'virsh ttyconsole' to lookup the /dev/pty/N entry
for ((i=1; i<=VMs; i++)); do
mkdir -p $RESPATH/vm$i
read "pty" <<< $(sudo virsh ttyconsole vm$i)
sudo nohup bash -c "cat $pty > $RESPATH/vm$i/console.txt" &
done
echo "Console logging for ${VMs}x $OS started."

View File

@ -111,7 +111,7 @@ fi
sudo dmesg -c > dmesg-prerun.txt sudo dmesg -c > dmesg-prerun.txt
mount > mount.txt mount > mount.txt
df -h > df-prerun.txt df -h > df-prerun.txt
$TDIR/zfs-tests.sh -vKO -s 3GB -T $TAGS $TDIR/zfs-tests.sh -vK -s 3GB -T $TAGS
RV=$? RV=$?
df -h > df-postrun.txt df -h > df-postrun.txt
echo $RV > tests-exitcode.txt echo $RV > tests-exitcode.txt

View File

@ -44,7 +44,7 @@ jobs:
os_selection="$FULL_OS" os_selection="$FULL_OS"
fi fi
if ${{ github.event.inputs.fedora_kernel_ver != '' }}; then if [ ${{ github.event.inputs.fedora_kernel_ver }} != "" ] ; then
# They specified a custom kernel version for Fedora. Use only # They specified a custom kernel version for Fedora. Use only
# Fedora runners. # Fedora runners.
os_json=$(echo ${os_selection} | jq -c '[.[] | select(startswith("fedora"))]') os_json=$(echo ${os_selection} | jq -c '[.[] | select(startswith("fedora"))]')
@ -53,8 +53,9 @@ jobs:
os_json=$(echo ${os_selection} | jq -c) os_json=$(echo ${os_selection} | jq -c)
fi fi
echo "os=$os_json" | tee -a $GITHUB_OUTPUT echo $os_json
echo "ci_type=$ci_type" | tee -a $GITHUB_OUTPUT echo "os=$os_json" >> $GITHUB_OUTPUT
echo "ci_type=$ci_type" >> $GITHUB_OUTPUT
qemu-vm: qemu-vm:
name: qemu-x86 name: qemu-x86
@ -77,12 +78,8 @@ jobs:
ref: ${{ github.event.pull_request.head.sha }} ref: ${{ github.event.pull_request.head.sha }}
- name: Setup QEMU - name: Setup QEMU
timeout-minutes: 20 timeout-minutes: 10
run: | run: .github/workflows/scripts/qemu-1-setup.sh
# Add a timestamp to each line to debug timeouts
while IFS=$'\n' read -r line; do
echo "$(date +'%H:%M:%S') $line"
done < <(.github/workflows/scripts/qemu-1-setup.sh)
- name: Start build machine - name: Start build machine
timeout-minutes: 10 timeout-minutes: 10

View File

@ -23,7 +23,6 @@
# These maps are making names consistent where they have varied but the email # These maps are making names consistent where they have varied but the email
# address has never changed. In most cases, the full name is in the # address has never changed. In most cases, the full name is in the
# Signed-off-by of a commit with a matching author. # Signed-off-by of a commit with a matching author.
Achill Gilgenast <achill@achill.org>
Ahelenia Ziemiańska <nabijaczleweli@gmail.com> Ahelenia Ziemiańska <nabijaczleweli@gmail.com>
Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Alex John <alex@stty.io> Alex John <alex@stty.io>
@ -38,7 +37,6 @@ Crag Wang <crag0715@gmail.com>
Damian Szuberski <szuberskidamian@gmail.com> Damian Szuberski <szuberskidamian@gmail.com>
Daniel Kolesa <daniel@octaforge.org> Daniel Kolesa <daniel@octaforge.org>
Debabrata Banerjee <dbavatar@gmail.com> Debabrata Banerjee <dbavatar@gmail.com>
Diwakar Kristappagari <diwakar-k@hpe.com>
Finix Yan <yanchongwen@hotmail.com> Finix Yan <yanchongwen@hotmail.com>
Gaurav Kumar <gauravk.18@gmail.com> Gaurav Kumar <gauravk.18@gmail.com>
Gionatan Danti <g.danti@assyoma.it> Gionatan Danti <g.danti@assyoma.it>
@ -147,7 +145,6 @@ Gaurav Kumar <gauravk.18@gmail.com> <gaurkuma@users.noreply.github.com>
George Gaydarov <git@gg7.io> <gg7@users.noreply.github.com> George Gaydarov <git@gg7.io> <gg7@users.noreply.github.com>
Georgy Yakovlev <gyakovlev@gentoo.org> <168902+gyakovlev@users.noreply.github.com> Georgy Yakovlev <gyakovlev@gentoo.org> <168902+gyakovlev@users.noreply.github.com>
Gerardwx <gerardw@alum.mit.edu> <Gerardwx@users.noreply.github.com> Gerardwx <gerardw@alum.mit.edu> <Gerardwx@users.noreply.github.com>
Germano Massullo <germano.massullo@gmail.com> <Germano0@users.noreply.github.com>
Gian-Carlo DeFazio <defazio1@llnl.gov> <defaziogiancarlo@users.noreply.github.com> Gian-Carlo DeFazio <defazio1@llnl.gov> <defaziogiancarlo@users.noreply.github.com>
Giuseppe Di Natale <dinatale2@llnl.gov> <dinatale2@users.noreply.github.com> Giuseppe Di Natale <dinatale2@llnl.gov> <dinatale2@users.noreply.github.com>
Hajo Möller <dasjoe@gmail.com> <dasjoe@users.noreply.github.com> Hajo Möller <dasjoe@gmail.com> <dasjoe@users.noreply.github.com>
@ -167,7 +164,6 @@ John Ramsden <johnramsden@riseup.net> <johnramsden@users.noreply.github.com>
Jonathon Fernyhough <jonathon@m2x.dev> <559369+jonathonf@users.noreply.github.com> Jonathon Fernyhough <jonathon@m2x.dev> <559369+jonathonf@users.noreply.github.com>
Jose Luis Duran <jlduran@gmail.com> <jlduran@users.noreply.github.com> Jose Luis Duran <jlduran@gmail.com> <jlduran@users.noreply.github.com>
Justin Hibbits <chmeeedalf@gmail.com> <chmeeedalf@users.noreply.github.com> Justin Hibbits <chmeeedalf@gmail.com> <chmeeedalf@users.noreply.github.com>
Kaitlin Hoang <kthoang@amazon.com> <khoang98@users.noreply.github.com>
Kevin Greene <kevin.greene@delphix.com> <104801862+kxgreene@users.noreply.github.com> Kevin Greene <kevin.greene@delphix.com> <104801862+kxgreene@users.noreply.github.com>
Kevin Jin <lostking2008@hotmail.com> <33590050+jxdking@users.noreply.github.com> Kevin Jin <lostking2008@hotmail.com> <33590050+jxdking@users.noreply.github.com>
Kevin P. Fleming <kevin@km6g.us> <kpfleming@users.noreply.github.com> Kevin P. Fleming <kevin@km6g.us> <kpfleming@users.noreply.github.com>

13
AUTHORS
View File

@ -10,7 +10,6 @@ PAST MAINTAINERS:
CONTRIBUTORS: CONTRIBUTORS:
Aaron Fineman <abyxcos@gmail.com> Aaron Fineman <abyxcos@gmail.com>
Achill Gilgenast <achill@achill.org>
Adam D. Moss <c@yotes.com> Adam D. Moss <c@yotes.com>
Adam Leventhal <ahl@delphix.com> Adam Leventhal <ahl@delphix.com>
Adam Stevko <adam.stevko@gmail.com> Adam Stevko <adam.stevko@gmail.com>
@ -60,7 +59,6 @@ CONTRIBUTORS:
Andreas Buschmann <andreas.buschmann@tech.net.de> Andreas Buschmann <andreas.buschmann@tech.net.de>
Andreas Dilger <adilger@intel.com> Andreas Dilger <adilger@intel.com>
Andreas Vögele <andreas@andreasvoegele.com> Andreas Vögele <andreas@andreasvoegele.com>
Andres <a-d-j-i@users.noreply.github.com>
Andrew Barnes <barnes333@gmail.com> Andrew Barnes <barnes333@gmail.com>
Andrew Hamilton <ahamilto@tjhsst.edu> Andrew Hamilton <ahamilto@tjhsst.edu>
Andrew Innes <andrew.c12@gmail.com> Andrew Innes <andrew.c12@gmail.com>
@ -74,7 +72,6 @@ CONTRIBUTORS:
Andrey Prokopenko <job@terem.fr> Andrey Prokopenko <job@terem.fr>
Andrey Vesnovaty <andrey.vesnovaty@gmail.com> Andrey Vesnovaty <andrey.vesnovaty@gmail.com>
Andriy Gapon <avg@freebsd.org> Andriy Gapon <avg@freebsd.org>
Andriy Tkachuk <andriy.tkachuk@seagate.com>
Andy Bakun <github@thwartedefforts.org> Andy Bakun <github@thwartedefforts.org>
Andy Fiddaman <omnios@citrus-it.co.uk> Andy Fiddaman <omnios@citrus-it.co.uk>
Aniruddha Shankar <k@191a.net> Aniruddha Shankar <k@191a.net>
@ -123,7 +120,6 @@ CONTRIBUTORS:
Caleb James DeLisle <calebdelisle@lavabit.com> Caleb James DeLisle <calebdelisle@lavabit.com>
Cameron Harr <harr1@llnl.gov> Cameron Harr <harr1@llnl.gov>
Cao Xuewen <cao.xuewen@zte.com.cn> Cao Xuewen <cao.xuewen@zte.com.cn>
Carl George <carlwgeorge@gmail.com>
Carlo Landmeter <clandmeter@gmail.com> Carlo Landmeter <clandmeter@gmail.com>
Carlos Alberto Lopez Perez <clopez@igalia.com> Carlos Alberto Lopez Perez <clopez@igalia.com>
Cedric Maunoury <cedric.maunoury@gmail.com> Cedric Maunoury <cedric.maunoury@gmail.com>
@ -204,7 +200,6 @@ CONTRIBUTORS:
Dimitri John Ledkov <xnox@ubuntu.com> Dimitri John Ledkov <xnox@ubuntu.com>
Dimitry Andric <dimitry@andric.com> Dimitry Andric <dimitry@andric.com>
Dirkjan Bussink <d.bussink@gmail.com> Dirkjan Bussink <d.bussink@gmail.com>
Diwakar Kristappagari <diwakar-k@hpe.com>
Dmitry Khasanov <pik4ez@gmail.com> Dmitry Khasanov <pik4ez@gmail.com>
Dominic Pearson <dsp@technoanimal.net> Dominic Pearson <dsp@technoanimal.net>
Dominik Hassler <hadfl@omniosce.org> Dominik Hassler <hadfl@omniosce.org>
@ -255,7 +250,6 @@ CONTRIBUTORS:
George Wilson <gwilson@delphix.com> George Wilson <gwilson@delphix.com>
Georgy Yakovlev <ya@sysdump.net> Georgy Yakovlev <ya@sysdump.net>
Gerardwx <gerardw@alum.mit.edu> Gerardwx <gerardw@alum.mit.edu>
Germano Massullo <germano.massullo@gmail.com>
Gian-Carlo DeFazio <defazio1@llnl.gov> Gian-Carlo DeFazio <defazio1@llnl.gov>
Gionatan Danti <g.danti@assyoma.it> Gionatan Danti <g.danti@assyoma.it>
Giuseppe Di Natale <guss80@gmail.com> Giuseppe Di Natale <guss80@gmail.com>
@ -293,7 +287,6 @@ CONTRIBUTORS:
Igor K <igor@dilos.org> Igor K <igor@dilos.org>
Igor Kozhukhov <ikozhukhov@gmail.com> Igor Kozhukhov <ikozhukhov@gmail.com>
Igor Lvovsky <ilvovsky@gmail.com> Igor Lvovsky <ilvovsky@gmail.com>
Igor Ostapenko <pm@igoro.pro>
ilbsmart <wgqimut@gmail.com> ilbsmart <wgqimut@gmail.com>
Ilkka Sovanto <github@ilkka.kapsi.fi> Ilkka Sovanto <github@ilkka.kapsi.fi>
illiliti <illiliti@protonmail.com> illiliti <illiliti@protonmail.com>
@ -333,7 +326,6 @@ CONTRIBUTORS:
Jinshan Xiong <jinshan.xiong@intel.com> Jinshan Xiong <jinshan.xiong@intel.com>
Jitendra Patidar <jitendra.patidar@nutanix.com> Jitendra Patidar <jitendra.patidar@nutanix.com>
JK Dingwall <james@dingwall.me.uk> JK Dingwall <james@dingwall.me.uk>
Joel Low <joel@joelsplace.sg>
Joe Stein <joe.stein@delphix.com> Joe Stein <joe.stein@delphix.com>
John-Mark Gurney <jmg@funkthat.com> John-Mark Gurney <jmg@funkthat.com>
John Albietz <inthecloud247@gmail.com> John Albietz <inthecloud247@gmail.com>
@ -382,7 +374,6 @@ CONTRIBUTORS:
Kevin Jin <lostking2008@hotmail.com> Kevin Jin <lostking2008@hotmail.com>
Kevin P. Fleming <kevin@km6g.us> Kevin P. Fleming <kevin@km6g.us>
Kevin Tanguy <kevin.tanguy@ovh.net> Kevin Tanguy <kevin.tanguy@ovh.net>
khoang98 <khoang98@users.noreply.github.com>
KireinaHoro <i@jsteward.moe> KireinaHoro <i@jsteward.moe>
Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Kleber Tarcísio <klebertarcisio@yahoo.com.br> Kleber Tarcísio <klebertarcisio@yahoo.com.br>
@ -456,7 +447,6 @@ CONTRIBUTORS:
Max Zettlmeißl <max@zettlmeissl.de> Max Zettlmeißl <max@zettlmeissl.de>
Md Islam <mdnahian@outlook.com> Md Islam <mdnahian@outlook.com>
megari <megari@iki.fi> megari <megari@iki.fi>
Meriel Luna Mittelbach <lunarlambda@gmail.com>
Michael D Labriola <michael.d.labriola@gmail.com> Michael D Labriola <michael.d.labriola@gmail.com>
Michael Franzl <michael@franzl.name> Michael Franzl <michael@franzl.name>
Michael Gebetsroither <michael@mgeb.org> Michael Gebetsroither <michael@mgeb.org>
@ -504,7 +494,6 @@ CONTRIBUTORS:
Orivej Desh <orivej@gmx.fr> Orivej Desh <orivej@gmx.fr>
Pablo Correa Gómez <ablocorrea@hotmail.com> Pablo Correa Gómez <ablocorrea@hotmail.com>
Palash Gandhi <pbg4930@rit.edu> Palash Gandhi <pbg4930@rit.edu>
Patrick Fasano <patrick@patrickfasano.com>
Patrick Mooney <pmooney@pfmooney.com> Patrick Mooney <pmooney@pfmooney.com>
Patrik Greco <sikevux@sikevux.se> Patrik Greco <sikevux@sikevux.se>
Paul B. Henson <henson@acm.org> Paul B. Henson <henson@acm.org>
@ -546,7 +535,6 @@ CONTRIBUTORS:
Remy Blank <remy.blank@pobox.com> Remy Blank <remy.blank@pobox.com>
renelson <bnelson@nelsonbe.com> renelson <bnelson@nelsonbe.com>
Reno Reckling <e-github@wthack.de> Reno Reckling <e-github@wthack.de>
René Wirnata <rene.wirnata@pandascience.net>
Ricardo M. Correia <ricardo.correia@oracle.com> Ricardo M. Correia <ricardo.correia@oracle.com>
Riccardo Schirone <rschirone91@gmail.com> Riccardo Schirone <rschirone91@gmail.com>
Richard Allen <belperite@gmail.com> Richard Allen <belperite@gmail.com>
@ -652,7 +640,6 @@ CONTRIBUTORS:
tleydxdy <shironeko.github@tesaguri.club> tleydxdy <shironeko.github@tesaguri.club>
Tobin Harding <me@tobin.cc> Tobin Harding <me@tobin.cc>
Todd Seidelmann <seidelma@users.noreply.github.com> Todd Seidelmann <seidelma@users.noreply.github.com>
Todd Zullinger <tmz@pobox.com>
Tom Caputi <tcaputi@datto.com> Tom Caputi <tcaputi@datto.com>
Tom Matthews <tom@axiom-partners.com> Tom Matthews <tom@axiom-partners.com>
Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Tomohiro Kusumi <kusumi.tomohiro@gmail.com>

4
META
View File

@ -1,10 +1,10 @@
Meta: 1 Meta: 1
Name: zfs Name: zfs
Branch: 1.0 Branch: 1.0
Version: 2.4.99 Version: 2.3.4
Release: 1 Release: 1
Release-Tags: relext Release-Tags: relext
License: CDDL License: CDDL
Author: OpenZFS Author: OpenZFS
Linux-Maximum: 6.17 Linux-Maximum: 6.16
Linux-Minimum: 4.18 Linux-Minimum: 4.18

View File

@ -1,7 +1,6 @@
CLEANFILES = CLEANFILES =
dist_noinst_DATA = dist_noinst_DATA =
INSTALL_DATA_HOOKS = INSTALL_DATA_HOOKS =
INSTALL_EXEC_HOOKS =
ALL_LOCAL = ALL_LOCAL =
CLEAN_LOCAL = CLEAN_LOCAL =
CHECKS = shellcheck checkbashisms CHECKS = shellcheck checkbashisms
@ -72,9 +71,6 @@ all: gitrev
PHONY += install-data-hook $(INSTALL_DATA_HOOKS) PHONY += install-data-hook $(INSTALL_DATA_HOOKS)
install-data-hook: $(INSTALL_DATA_HOOKS) install-data-hook: $(INSTALL_DATA_HOOKS)
PHONY += install-exec-hook $(INSTALL_EXEC_HOOKS)
install-exec-hook: $(INSTALL_EXEC_HOOKS)
PHONY += maintainer-clean-local PHONY += maintainer-clean-local
maintainer-clean-local: maintainer-clean-local:
-$(RM) $(GITREV) -$(RM) $(GITREV)

View File

@ -98,16 +98,17 @@ endif
if USING_PYTHON if USING_PYTHON
bin_SCRIPTS += zarcsummary zarcstat dbufstat zilstat bin_SCRIPTS += arc_summary arcstat dbufstat zilstat
CLEANFILES += zarcsummary zarcstat dbufstat zilstat CLEANFILES += arc_summary arcstat dbufstat zilstat
dist_noinst_DATA += %D%/zarcsummary %D%/zarcstat.in %D%/dbufstat.in %D%/zilstat.in dist_noinst_DATA += %D%/arc_summary %D%/arcstat.in %D%/dbufstat.in %D%/zilstat.in
$(call SUBST,zarcstat,%D%/) $(call SUBST,arcstat,%D%/)
$(call SUBST,dbufstat,%D%/) $(call SUBST,dbufstat,%D%/)
$(call SUBST,zilstat,%D%/) $(call SUBST,zilstat,%D%/)
zarcsummary: %D%/zarcsummary arc_summary: %D%/arc_summary
$(AM_V_at)cp $< $@ $(AM_V_at)cp $< $@
endif endif
PHONY += cmd PHONY += cmd
cmd: $(bin_SCRIPTS) $(bin_PROGRAMS) $(sbin_SCRIPTS) $(sbin_PROGRAMS) $(dist_bin_SCRIPTS) $(zfsexec_PROGRAMS) $(mounthelper_PROGRAMS) cmd: $(bin_SCRIPTS) $(bin_PROGRAMS) $(sbin_SCRIPTS) $(sbin_PROGRAMS) $(dist_bin_SCRIPTS) $(zfsexec_PROGRAMS) $(mounthelper_PROGRAMS)

View File

@ -34,7 +34,7 @@ Provides basic information on the ARC, its efficiency, the L2ARC (if present),
the Data Management Unit (DMU), Virtual Devices (VDEVs), and tunables. See the Data Management Unit (DMU), Virtual Devices (VDEVs), and tunables. See
the in-source documentation and code at the in-source documentation and code at
https://github.com/openzfs/zfs/blob/master/module/zfs/arc.c for details. https://github.com/openzfs/zfs/blob/master/module/zfs/arc.c for details.
The original introduction to zarcsummary can be found at The original introduction to arc_summary can be found at
http://cuddletech.com/?p=454 http://cuddletech.com/?p=454
""" """
@ -161,7 +161,7 @@ elif sys.platform.startswith('linux'):
return get_params(TUNABLES_PATH) return get_params(TUNABLES_PATH)
def get_version_impl(request): def get_version_impl(request):
# The original zarcsummary called /sbin/modinfo/{spl,zfs} to get # The original arc_summary called /sbin/modinfo/{spl,zfs} to get
# the version information. We switch to /sys/module/{spl,zfs}/version # the version information. We switch to /sys/module/{spl,zfs}/version
# to make sure we get what is really loaded in the kernel # to make sure we get what is really loaded in the kernel
try: try:
@ -439,7 +439,7 @@ def print_header():
""" """
# datetime is now recommended over time but we keep the exact formatting # datetime is now recommended over time but we keep the exact formatting
# from the older version of zarcsummary in case there are scripts # from the older version of arc_summary in case there are scripts
# that expect it in this way # that expect it in this way
daydate = time.strftime(DATE_FORMAT) daydate = time.strftime(DATE_FORMAT)
spc_date = LINE_LENGTH-len(daydate) spc_date = LINE_LENGTH-len(daydate)

View File

@ -2,7 +2,7 @@
# SPDX-License-Identifier: CDDL-1.0 # SPDX-License-Identifier: CDDL-1.0
# #
# Print out ZFS ARC Statistics exported via kstat(1) # Print out ZFS ARC Statistics exported via kstat(1)
# For a definition of fields, or usage, use zarcstat -v # For a definition of fields, or usage, use arcstat -v
# #
# This script was originally a fork of the original arcstat.pl (0.1) # This script was originally a fork of the original arcstat.pl (0.1)
# by Neelakanth Nadgir, originally published on his Sun blog on # by Neelakanth Nadgir, originally published on his Sun blog on
@ -56,7 +56,6 @@ import time
import getopt import getopt
import re import re
import copy import copy
import os
from signal import signal, SIGINT, SIGWINCH, SIG_DFL from signal import signal, SIGINT, SIGWINCH, SIG_DFL
@ -172,7 +171,7 @@ cols = {
"zactive": [7, 1000, "zfetch prefetches active per second"], "zactive": [7, 1000, "zfetch prefetches active per second"],
} }
# ARC structural breakdown from zarcsummary # ARC structural breakdown from arc_summary
structfields = { structfields = {
"cmp": ["compressed", "Compressed"], "cmp": ["compressed", "Compressed"],
"ovh": ["overhead", "Overhead"], "ovh": ["overhead", "Overhead"],
@ -188,7 +187,7 @@ structstats = { # size stats
"sz": ["_size", "size"], "sz": ["_size", "size"],
} }
# ARC types breakdown from zarcsummary # ARC types breakdown from arc_summary
typefields = { typefields = {
"data": ["data", "ARC data"], "data": ["data", "ARC data"],
"meta": ["metadata", "ARC metadata"], "meta": ["metadata", "ARC metadata"],
@ -199,7 +198,7 @@ typestats = { # size stats
"sz": ["_size", "size"], "sz": ["_size", "size"],
} }
# ARC states breakdown from zarcsummary # ARC states breakdown from arc_summary
statefields = { statefields = {
"ano": ["anon", "Anonymous"], "ano": ["anon", "Anonymous"],
"mfu": ["mfu", "MFU"], "mfu": ["mfu", "MFU"],
@ -262,7 +261,7 @@ hdr_intr = 20 # Print header every 20 lines of output
opfile = None opfile = None
sep = " " # Default separator is 2 spaces sep = " " # Default separator is 2 spaces
l2exist = False l2exist = False
cmd = ("Usage: zarcstat [-havxp] [-f fields] [-o file] [-s string] [interval " cmd = ("Usage: arcstat [-havxp] [-f fields] [-o file] [-s string] [interval "
"[count]]\n") "[count]]\n")
cur = {} cur = {}
d = {} d = {}
@ -349,10 +348,10 @@ def usage():
"character or string\n") "character or string\n")
sys.stderr.write("\t -p : Disable auto-scaling of numerical fields\n") sys.stderr.write("\t -p : Disable auto-scaling of numerical fields\n")
sys.stderr.write("\nExamples:\n") sys.stderr.write("\nExamples:\n")
sys.stderr.write("\tzarcstat -o /tmp/a.log 2 10\n") sys.stderr.write("\tarcstat -o /tmp/a.log 2 10\n")
sys.stderr.write("\tzarcstat -s \",\" -o /tmp/a.log 2 10\n") sys.stderr.write("\tarcstat -s \",\" -o /tmp/a.log 2 10\n")
sys.stderr.write("\tzarcstat -v\n") sys.stderr.write("\tarcstat -v\n")
sys.stderr.write("\tzarcstat -f time,hit%,dh%,ph%,mh% 1\n") sys.stderr.write("\tarcstat -f time,hit%,dh%,ph%,mh% 1\n")
sys.stderr.write("\n") sys.stderr.write("\n")
sys.exit(1) sys.exit(1)
@ -367,7 +366,7 @@ def snap_stats():
cur = kstat cur = kstat
# fill in additional values from zarcsummary # fill in additional values from arc_summary
cur["caches_size"] = caches_size = cur["anon_data"]+cur["anon_metadata"]+\ cur["caches_size"] = caches_size = cur["anon_data"]+cur["anon_metadata"]+\
cur["mfu_data"]+cur["mfu_metadata"]+cur["mru_data"]+cur["mru_metadata"]+\ cur["mfu_data"]+cur["mfu_metadata"]+cur["mru_data"]+cur["mru_metadata"]+\
cur["uncached_data"]+cur["uncached_metadata"] cur["uncached_data"]+cur["uncached_metadata"]
@ -767,7 +766,6 @@ def calculate():
def main(): def main():
global sint global sint
global count global count
global hdr_intr global hdr_intr

View File

@ -107,9 +107,7 @@ extern uint_t zfs_reconstruct_indirect_combinations_max;
extern uint_t zfs_btree_verify_intensity; extern uint_t zfs_btree_verify_intensity;
static const char cmdname[] = "zdb"; static const char cmdname[] = "zdb";
uint8_t dump_opt[512]; uint8_t dump_opt[256];
#define ALLOCATED_OPT 256
typedef void object_viewer_t(objset_t *, uint64_t, void *data, size_t size); typedef void object_viewer_t(objset_t *, uint64_t, void *data, size_t size);
@ -129,7 +127,6 @@ static zfs_range_tree_t *mos_refd_objs;
static spa_t *spa; static spa_t *spa;
static objset_t *os; static objset_t *os;
static boolean_t kernel_init_done; static boolean_t kernel_init_done;
static boolean_t corruption_found = B_FALSE;
static void snprintf_blkptr_compact(char *, size_t, const blkptr_t *, static void snprintf_blkptr_compact(char *, size_t, const blkptr_t *,
boolean_t); boolean_t);
@ -179,7 +176,7 @@ static int
sublivelist_verify_blkptr(void *arg, const blkptr_t *bp, boolean_t free, sublivelist_verify_blkptr(void *arg, const blkptr_t *bp, boolean_t free,
dmu_tx_t *tx) dmu_tx_t *tx)
{ {
ASSERT0P(tx); ASSERT3P(tx, ==, NULL);
struct sublivelist_verify *sv = arg; struct sublivelist_verify *sv = arg;
sublivelist_verify_block_refcnt_t current = { sublivelist_verify_block_refcnt_t current = {
.svbr_blk = *bp, .svbr_blk = *bp,
@ -211,7 +208,7 @@ sublivelist_verify_blkptr(void *arg, const blkptr_t *bp, boolean_t free,
sublivelist_verify_block_t svb = { sublivelist_verify_block_t svb = {
.svb_dva = bp->blk_dva[i], .svb_dva = bp->blk_dva[i],
.svb_allocated_txg = .svb_allocated_txg =
BP_GET_BIRTH(bp) BP_GET_LOGICAL_BIRTH(bp)
}; };
if (zfs_btree_find(&sv->sv_leftover, &svb, if (zfs_btree_find(&sv->sv_leftover, &svb,
@ -253,7 +250,6 @@ sublivelist_verify_func(void *args, dsl_deadlist_entry_t *dle)
&e->svbr_blk, B_TRUE); &e->svbr_blk, B_TRUE);
(void) printf("\tERROR: %d unmatched FREE(s): %s\n", (void) printf("\tERROR: %d unmatched FREE(s): %s\n",
e->svbr_refcnt, blkbuf); e->svbr_refcnt, blkbuf);
corruption_found = B_TRUE;
} }
zfs_btree_destroy(&sv->sv_pair); zfs_btree_destroy(&sv->sv_pair);
@ -385,7 +381,7 @@ verify_livelist_allocs(metaslab_verify_t *mv, uint64_t txg,
sublivelist_verify_block_t svb = {{{0}}}; sublivelist_verify_block_t svb = {{{0}}};
DVA_SET_VDEV(&svb.svb_dva, mv->mv_vdid); DVA_SET_VDEV(&svb.svb_dva, mv->mv_vdid);
DVA_SET_OFFSET(&svb.svb_dva, offset); DVA_SET_OFFSET(&svb.svb_dva, offset);
DVA_SET_ASIZE(&svb.svb_dva, 0); DVA_SET_ASIZE(&svb.svb_dva, size);
zfs_btree_index_t where; zfs_btree_index_t where;
uint64_t end_offset = offset + size; uint64_t end_offset = offset + size;
@ -409,7 +405,6 @@ verify_livelist_allocs(metaslab_verify_t *mv, uint64_t txg,
(u_longlong_t)DVA_GET_ASIZE(&found->svb_dva), (u_longlong_t)DVA_GET_ASIZE(&found->svb_dva),
(u_longlong_t)found->svb_allocated_txg, (u_longlong_t)found->svb_allocated_txg,
(u_longlong_t)txg); (u_longlong_t)txg);
corruption_found = B_TRUE;
} }
} }
} }
@ -431,7 +426,6 @@ metaslab_spacemap_validation_cb(space_map_entry_t *sme, void *arg)
(u_longlong_t)txg, (u_longlong_t)offset, (u_longlong_t)txg, (u_longlong_t)offset,
(u_longlong_t)size, (u_longlong_t)mv->mv_vdid, (u_longlong_t)size, (u_longlong_t)mv->mv_vdid,
(u_longlong_t)mv->mv_msid); (u_longlong_t)mv->mv_msid);
corruption_found = B_TRUE;
} else { } else {
zfs_range_tree_add(mv->mv_allocated, zfs_range_tree_add(mv->mv_allocated,
offset, size); offset, size);
@ -445,7 +439,6 @@ metaslab_spacemap_validation_cb(space_map_entry_t *sme, void *arg)
(u_longlong_t)txg, (u_longlong_t)offset, (u_longlong_t)txg, (u_longlong_t)offset,
(u_longlong_t)size, (u_longlong_t)mv->mv_vdid, (u_longlong_t)size, (u_longlong_t)mv->mv_vdid,
(u_longlong_t)mv->mv_msid); (u_longlong_t)mv->mv_msid);
corruption_found = B_TRUE;
} else { } else {
zfs_range_tree_remove(mv->mv_allocated, zfs_range_tree_remove(mv->mv_allocated,
offset, size); offset, size);
@ -533,7 +526,6 @@ mv_populate_livelist_allocs(metaslab_verify_t *mv, sublivelist_verify_t *sv)
(u_longlong_t)DVA_GET_VDEV(&svb->svb_dva), (u_longlong_t)DVA_GET_VDEV(&svb->svb_dva),
(u_longlong_t)DVA_GET_OFFSET(&svb->svb_dva), (u_longlong_t)DVA_GET_OFFSET(&svb->svb_dva),
(u_longlong_t)DVA_GET_ASIZE(&svb->svb_dva)); (u_longlong_t)DVA_GET_ASIZE(&svb->svb_dva));
corruption_found = B_TRUE;
continue; continue;
} }
@ -550,7 +542,6 @@ mv_populate_livelist_allocs(metaslab_verify_t *mv, sublivelist_verify_t *sv)
(u_longlong_t)DVA_GET_VDEV(&svb->svb_dva), (u_longlong_t)DVA_GET_VDEV(&svb->svb_dva),
(u_longlong_t)DVA_GET_OFFSET(&svb->svb_dva), (u_longlong_t)DVA_GET_OFFSET(&svb->svb_dva),
(u_longlong_t)DVA_GET_ASIZE(&svb->svb_dva)); (u_longlong_t)DVA_GET_ASIZE(&svb->svb_dva));
corruption_found = B_TRUE;
continue; continue;
} }
@ -664,7 +655,6 @@ livelist_metaslab_validate(spa_t *spa)
} }
(void) printf("ERROR: Found livelist blocks marked as allocated " (void) printf("ERROR: Found livelist blocks marked as allocated "
"for indirect vdevs:\n"); "for indirect vdevs:\n");
corruption_found = B_TRUE;
zfs_btree_index_t *where = NULL; zfs_btree_index_t *where = NULL;
sublivelist_verify_block_t *svb; sublivelist_verify_block_t *svb;
@ -808,8 +798,8 @@ usage(void)
"[default is 200]\n"); "[default is 200]\n");
(void) fprintf(stderr, " -K --key=KEY " (void) fprintf(stderr, " -K --key=KEY "
"decryption key for encrypted dataset\n"); "decryption key for encrypted dataset\n");
(void) fprintf(stderr, " -o --option=\"NAME=VALUE\" " (void) fprintf(stderr, " -o --option=\"OPTION=INTEGER\" "
"set the named tunable to the given value\n"); "set global variable to an unsigned 32-bit integer\n");
(void) fprintf(stderr, " -p --path==PATH " (void) fprintf(stderr, " -p --path==PATH "
"use one or more with -e to specify path to vdev dir\n"); "use one or more with -e to specify path to vdev dir\n");
(void) fprintf(stderr, " -P --parseable " (void) fprintf(stderr, " -P --parseable "
@ -837,7 +827,7 @@ usage(void)
(void) fprintf(stderr, "Specify an option more than once (e.g. -bb) " (void) fprintf(stderr, "Specify an option more than once (e.g. -bb) "
"to make only that option verbose\n"); "to make only that option verbose\n");
(void) fprintf(stderr, "Default is to dump everything non-verbosely\n"); (void) fprintf(stderr, "Default is to dump everything non-verbosely\n");
zdb_exit(2); zdb_exit(1);
} }
static void static void
@ -902,9 +892,9 @@ dump_packed_nvlist(objset_t *os, uint64_t object, void *data, size_t size)
size_t nvsize = *(uint64_t *)data; size_t nvsize = *(uint64_t *)data;
char *packed = umem_alloc(nvsize, UMEM_NOFAIL); char *packed = umem_alloc(nvsize, UMEM_NOFAIL);
VERIFY0(dmu_read(os, object, 0, nvsize, packed, DMU_READ_PREFETCH)); VERIFY(0 == dmu_read(os, object, 0, nvsize, packed, DMU_READ_PREFETCH));
VERIFY0(nvlist_unpack(packed, nvsize, &nv, 0)); VERIFY(nvlist_unpack(packed, nvsize, &nv, 0) == 0);
umem_free(packed, nvsize); umem_free(packed, nvsize);
@ -1465,8 +1455,8 @@ get_obsolete_refcount(vdev_t *vd)
refcount++; refcount++;
} }
} else { } else {
ASSERT0P(vd->vdev_obsolete_sm); ASSERT3P(vd->vdev_obsolete_sm, ==, NULL);
ASSERT0(obsolete_sm_object); ASSERT3U(obsolete_sm_object, ==, 0);
} }
for (unsigned c = 0; c < vd->vdev_children; c++) { for (unsigned c = 0; c < vd->vdev_children; c++) {
refcount += get_obsolete_refcount(vd->vdev_child[c]); refcount += get_obsolete_refcount(vd->vdev_child[c]);
@ -1588,8 +1578,9 @@ dump_spacemap(objset_t *os, space_map_t *sm)
continue; continue;
} }
uint8_t words;
char entry_type; char entry_type;
uint64_t entry_off, entry_run, entry_vdev; uint64_t entry_off, entry_run, entry_vdev = SM_NO_VDEVID;
if (sm_entry_is_single_word(word)) { if (sm_entry_is_single_word(word)) {
entry_type = (SM_TYPE_DECODE(word) == SM_ALLOC) ? entry_type = (SM_TYPE_DECODE(word) == SM_ALLOC) ?
@ -1597,43 +1588,35 @@ dump_spacemap(objset_t *os, space_map_t *sm)
entry_off = (SM_OFFSET_DECODE(word) << mapshift) + entry_off = (SM_OFFSET_DECODE(word) << mapshift) +
sm->sm_start; sm->sm_start;
entry_run = SM_RUN_DECODE(word) << mapshift; entry_run = SM_RUN_DECODE(word) << mapshift;
words = 1;
(void) printf("\t [%6llu] %c "
"range: %012llx-%012llx size: %08llx\n",
(u_longlong_t)entry_id, entry_type,
(u_longlong_t)entry_off,
(u_longlong_t)(entry_off + entry_run - 1),
(u_longlong_t)entry_run);
} else { } else {
/* it is a two-word entry so we read another word */ /* it is a two-word entry so we read another word */
ASSERT(sm_entry_is_double_word(word)); ASSERT(sm_entry_is_double_word(word));
uint64_t extra_word; uint64_t extra_word;
offset += sizeof (extra_word); offset += sizeof (extra_word);
ASSERT3U(offset, <, space_map_length(sm));
VERIFY0(dmu_read(os, space_map_object(sm), offset, VERIFY0(dmu_read(os, space_map_object(sm), offset,
sizeof (extra_word), &extra_word, sizeof (extra_word), &extra_word,
DMU_READ_PREFETCH)); DMU_READ_PREFETCH));
ASSERT3U(offset, <=, space_map_length(sm));
entry_run = SM2_RUN_DECODE(word) << mapshift; entry_run = SM2_RUN_DECODE(word) << mapshift;
entry_vdev = SM2_VDEV_DECODE(word); entry_vdev = SM2_VDEV_DECODE(word);
entry_type = (SM2_TYPE_DECODE(extra_word) == SM_ALLOC) ? entry_type = (SM2_TYPE_DECODE(extra_word) == SM_ALLOC) ?
'A' : 'F'; 'A' : 'F';
entry_off = (SM2_OFFSET_DECODE(extra_word) << entry_off = (SM2_OFFSET_DECODE(extra_word) <<
mapshift) + sm->sm_start; mapshift) + sm->sm_start;
words = 2;
}
if (zopt_metaslab_args == 0 || (void) printf("\t [%6llu] %c range:"
zopt_metaslab[0] == entry_vdev) { " %010llx-%010llx size: %06llx vdev: %06llu words: %u\n",
(void) printf("\t [%6llu] %c " (u_longlong_t)entry_id,
"range: %012llx-%012llx size: %08llx " entry_type, (u_longlong_t)entry_off,
"vdev: %llu\n", (u_longlong_t)(entry_off + entry_run),
(u_longlong_t)entry_id, entry_type,
(u_longlong_t)entry_off,
(u_longlong_t)(entry_off + entry_run - 1),
(u_longlong_t)entry_run, (u_longlong_t)entry_run,
(u_longlong_t)entry_vdev); (u_longlong_t)entry_vdev, words);
}
}
if (entry_type == 'A') if (entry_type == 'A')
alloc += entry_run; alloc += entry_run;
@ -1668,16 +1651,6 @@ dump_metaslab_stats(metaslab_t *msp)
dump_histogram(rt->rt_histogram, ZFS_RANGE_TREE_HISTOGRAM_SIZE, 0); dump_histogram(rt->rt_histogram, ZFS_RANGE_TREE_HISTOGRAM_SIZE, 0);
} }
static void
dump_allocated(void *arg, uint64_t start, uint64_t size)
{
uint64_t *off = arg;
if (*off != start)
(void) printf("ALLOC: %"PRIu64" %"PRIu64"\n", *off,
start - *off);
*off = start + size;
}
static void static void
dump_metaslab(metaslab_t *msp) dump_metaslab(metaslab_t *msp)
{ {
@ -1694,24 +1667,13 @@ dump_metaslab(metaslab_t *msp)
(u_longlong_t)msp->ms_id, (u_longlong_t)msp->ms_start, (u_longlong_t)msp->ms_id, (u_longlong_t)msp->ms_start,
(u_longlong_t)space_map_object(sm), freebuf); (u_longlong_t)space_map_object(sm), freebuf);
if (dump_opt[ALLOCATED_OPT] || if (dump_opt['m'] > 2 && !dump_opt['L']) {
(dump_opt['m'] > 2 && !dump_opt['L'])) {
mutex_enter(&msp->ms_lock); mutex_enter(&msp->ms_lock);
VERIFY0(metaslab_load(msp)); VERIFY0(metaslab_load(msp));
}
if (dump_opt['m'] > 2 && !dump_opt['L']) {
zfs_range_tree_stat_verify(msp->ms_allocatable); zfs_range_tree_stat_verify(msp->ms_allocatable);
dump_metaslab_stats(msp); dump_metaslab_stats(msp);
} metaslab_unload(msp);
mutex_exit(&msp->ms_lock);
if (dump_opt[ALLOCATED_OPT]) {
uint64_t off = msp->ms_start;
zfs_range_tree_walk(msp->ms_allocatable, dump_allocated,
&off);
if (off != msp->ms_start + msp->ms_size)
(void) printf("ALLOC: %"PRIu64" %"PRIu64"\n", off,
msp->ms_size - off);
} }
if (dump_opt['m'] > 1 && sm != NULL && if (dump_opt['m'] > 1 && sm != NULL &&
@ -1726,12 +1688,6 @@ dump_metaslab(metaslab_t *msp)
SPACE_MAP_HISTOGRAM_SIZE, sm->sm_shift); SPACE_MAP_HISTOGRAM_SIZE, sm->sm_shift);
} }
if (dump_opt[ALLOCATED_OPT] ||
(dump_opt['m'] > 2 && !dump_opt['L'])) {
metaslab_unload(msp);
mutex_exit(&msp->ms_lock);
}
if (vd->vdev_ops == &vdev_draid_ops) if (vd->vdev_ops == &vdev_draid_ops)
ASSERT3U(msp->ms_size, <=, 1ULL << vd->vdev_ms_shift); ASSERT3U(msp->ms_size, <=, 1ULL << vd->vdev_ms_shift);
else else
@ -1768,9 +1724,8 @@ print_vdev_metaslab_header(vdev_t *vd)
} }
} }
(void) printf("\tvdev %10llu\t%s metaslab shift %4llu", (void) printf("\tvdev %10llu %s",
(u_longlong_t)vd->vdev_id, bias_str, (u_longlong_t)vd->vdev_id, bias_str);
(u_longlong_t)vd->vdev_ms_shift);
if (ms_flush_data_obj != 0) { if (ms_flush_data_obj != 0) {
(void) printf(" ms_unflushed_phys object %llu", (void) printf(" ms_unflushed_phys object %llu",
@ -1837,7 +1792,7 @@ print_vdev_indirect(vdev_t *vd)
vdev_indirect_births_t *vib = vd->vdev_indirect_births; vdev_indirect_births_t *vib = vd->vdev_indirect_births;
if (vim == NULL) { if (vim == NULL) {
ASSERT0P(vib); ASSERT3P(vib, ==, NULL);
return; return;
} }
@ -1910,7 +1865,7 @@ dump_metaslabs(spa_t *spa)
(void) printf("\nMetaslabs:\n"); (void) printf("\nMetaslabs:\n");
if (zopt_metaslab_args > 0) { if (!dump_opt['d'] && zopt_metaslab_args > 0) {
c = zopt_metaslab[0]; c = zopt_metaslab[0];
if (c >= children) if (c >= children)
@ -2037,7 +1992,7 @@ dump_ddt_log(ddt_t *ddt)
c += strlcpy(&flagstr[c], " UNKNOWN", c += strlcpy(&flagstr[c], " UNKNOWN",
sizeof (flagstr) - c); sizeof (flagstr) - c);
flagstr[1] = '['; flagstr[1] = '[';
flagstr[c] = ']'; flagstr[c++] = ']';
} }
uint64_t count = avl_numnodes(&ddl->ddl_tree); uint64_t count = avl_numnodes(&ddl->ddl_tree);
@ -2088,10 +2043,10 @@ dump_ddt_object(ddt_t *ddt, ddt_type_t type, ddt_class_t class)
if (error == ENOENT) if (error == ENOENT)
return; return;
ASSERT0(error); ASSERT(error == 0);
error = ddt_object_count(ddt, type, class, &count); error = ddt_object_count(ddt, type, class, &count);
ASSERT0(error); ASSERT(error == 0);
if (count == 0) if (count == 0)
return; return;
@ -2614,7 +2569,7 @@ snprintf_blkptr_compact(char *blkbuf, size_t buflen, const blkptr_t *bp,
(u_longlong_t)BP_GET_PSIZE(bp), (u_longlong_t)BP_GET_PSIZE(bp),
(u_longlong_t)BP_GET_FILL(bp), (u_longlong_t)BP_GET_FILL(bp),
(u_longlong_t)BP_GET_LOGICAL_BIRTH(bp), (u_longlong_t)BP_GET_LOGICAL_BIRTH(bp),
(u_longlong_t)BP_GET_PHYSICAL_BIRTH(bp)); (u_longlong_t)BP_GET_BIRTH(bp));
if (bp_freed) if (bp_freed)
(void) snprintf(blkbuf + strlen(blkbuf), (void) snprintf(blkbuf + strlen(blkbuf),
buflen - strlen(blkbuf), " %s", "FREE"); buflen - strlen(blkbuf), " %s", "FREE");
@ -2628,17 +2583,19 @@ snprintf_blkptr_compact(char *blkbuf, size_t buflen, const blkptr_t *bp,
} }
} }
static u_longlong_t static void
print_indirect(spa_t *spa, blkptr_t *bp, const zbookmark_phys_t *zb, print_indirect(spa_t *spa, blkptr_t *bp, const zbookmark_phys_t *zb,
const dnode_phys_t *dnp) const dnode_phys_t *dnp)
{ {
char blkbuf[BP_SPRINTF_LEN]; char blkbuf[BP_SPRINTF_LEN];
u_longlong_t offset;
int l; int l;
offset = (u_longlong_t)blkid2offset(dnp, bp, zb); if (!BP_IS_EMBEDDED(bp)) {
ASSERT3U(BP_GET_TYPE(bp), ==, dnp->dn_type);
ASSERT3U(BP_GET_LEVEL(bp), ==, zb->zb_level);
}
(void) printf("%16llx ", offset); (void) printf("%16llx ", (u_longlong_t)blkid2offset(dnp, bp, zb));
ASSERT(zb->zb_level >= 0); ASSERT(zb->zb_level >= 0);
@ -2653,38 +2610,19 @@ print_indirect(spa_t *spa, blkptr_t *bp, const zbookmark_phys_t *zb,
snprintf_blkptr_compact(blkbuf, sizeof (blkbuf), bp, B_FALSE); snprintf_blkptr_compact(blkbuf, sizeof (blkbuf), bp, B_FALSE);
if (dump_opt['Z'] && BP_GET_COMPRESS(bp) == ZIO_COMPRESS_ZSTD) if (dump_opt['Z'] && BP_GET_COMPRESS(bp) == ZIO_COMPRESS_ZSTD)
snprintf_zstd_header(spa, blkbuf, sizeof (blkbuf), bp); snprintf_zstd_header(spa, blkbuf, sizeof (blkbuf), bp);
(void) printf("%s", blkbuf); (void) printf("%s\n", blkbuf);
if (!BP_IS_EMBEDDED(bp)) {
if (BP_GET_TYPE(bp) != dnp->dn_type) {
(void) printf(" (ERROR: Block pointer type "
"(%llu) does not match dnode type (%hhu))",
BP_GET_TYPE(bp), dnp->dn_type);
corruption_found = B_TRUE;
}
if (BP_GET_LEVEL(bp) != zb->zb_level) {
(void) printf(" (ERROR: Block pointer level "
"(%llu) does not match bookmark level (%lld))",
BP_GET_LEVEL(bp), (longlong_t)zb->zb_level);
corruption_found = B_TRUE;
}
}
(void) printf("\n");
return (offset);
} }
static int static int
visit_indirect(spa_t *spa, const dnode_phys_t *dnp, visit_indirect(spa_t *spa, const dnode_phys_t *dnp,
blkptr_t *bp, const zbookmark_phys_t *zb) blkptr_t *bp, const zbookmark_phys_t *zb)
{ {
u_longlong_t offset;
int err = 0; int err = 0;
if (BP_GET_BIRTH(bp) == 0) if (BP_GET_LOGICAL_BIRTH(bp) == 0)
return (0); return (0);
offset = print_indirect(spa, bp, zb, dnp); print_indirect(spa, bp, zb, dnp);
if (BP_GET_LEVEL(bp) > 0 && !BP_IS_HOLE(bp)) { if (BP_GET_LEVEL(bp) > 0 && !BP_IS_HOLE(bp)) {
arc_flags_t flags = ARC_FLAG_WAIT; arc_flags_t flags = ARC_FLAG_WAIT;
@ -2714,15 +2652,8 @@ visit_indirect(spa_t *spa, const dnode_phys_t *dnp,
break; break;
fill += BP_GET_FILL(cbp); fill += BP_GET_FILL(cbp);
} }
if (!err) { if (!err)
if (fill != BP_GET_FILL(bp)) { ASSERT3U(fill, ==, BP_GET_FILL(bp));
(void) printf("%16llx: Block pointer "
"fill (%llu) does not match calculated "
"value (%llu)\n", offset, BP_GET_FILL(bp),
(u_longlong_t)fill);
corruption_found = B_TRUE;
}
}
arc_buf_destroy(buf, &buf); arc_buf_destroy(buf, &buf);
} }
@ -2876,7 +2807,7 @@ dump_bptree_cb(void *arg, const blkptr_t *bp, dmu_tx_t *tx)
(void) arg, (void) tx; (void) arg, (void) tx;
char blkbuf[BP_SPRINTF_LEN]; char blkbuf[BP_SPRINTF_LEN];
if (BP_GET_BIRTH(bp) != 0) { if (BP_GET_LOGICAL_BIRTH(bp) != 0) {
snprintf_blkptr(blkbuf, sizeof (blkbuf), bp); snprintf_blkptr(blkbuf, sizeof (blkbuf), bp);
(void) printf("\t%s\n", blkbuf); (void) printf("\t%s\n", blkbuf);
} }
@ -2917,7 +2848,7 @@ dump_bpobj_cb(void *arg, const blkptr_t *bp, boolean_t bp_freed, dmu_tx_t *tx)
(void) arg, (void) tx; (void) arg, (void) tx;
char blkbuf[BP_SPRINTF_LEN]; char blkbuf[BP_SPRINTF_LEN];
ASSERT(BP_GET_BIRTH(bp) != 0); ASSERT(BP_GET_LOGICAL_BIRTH(bp) != 0);
snprintf_blkptr_compact(blkbuf, sizeof (blkbuf), bp, bp_freed); snprintf_blkptr_compact(blkbuf, sizeof (blkbuf), bp, bp_freed);
(void) printf("\t%s\n", blkbuf); (void) printf("\t%s\n", blkbuf);
return (0); return (0);
@ -2978,7 +2909,6 @@ dump_full_bpobj(bpobj_t *bpo, const char *name, int indent)
(void) printf("ERROR %u while trying to open " (void) printf("ERROR %u while trying to open "
"subobj id %llu\n", "subobj id %llu\n",
error, (u_longlong_t)subobj); error, (u_longlong_t)subobj);
corruption_found = B_TRUE;
continue; continue;
} }
dump_full_bpobj(&subbpo, "subobj", indent + 1); dump_full_bpobj(&subbpo, "subobj", indent + 1);
@ -3158,7 +3088,6 @@ bpobj_count_refd(bpobj_t *bpo)
(void) printf("ERROR %u while trying to open " (void) printf("ERROR %u while trying to open "
"subobj id %llu\n", "subobj id %llu\n",
error, (u_longlong_t)subobj); error, (u_longlong_t)subobj);
corruption_found = B_TRUE;
continue; continue;
} }
bpobj_count_refd(&subbpo); bpobj_count_refd(&subbpo);
@ -3180,7 +3109,7 @@ dsl_deadlist_entry_count_refd(void *arg, dsl_deadlist_entry_t *dle)
static int static int
dsl_deadlist_entry_dump(void *arg, dsl_deadlist_entry_t *dle) dsl_deadlist_entry_dump(void *arg, dsl_deadlist_entry_t *dle)
{ {
ASSERT0P(arg); ASSERT(arg == NULL);
if (dump_opt['d'] >= 5) { if (dump_opt['d'] >= 5) {
char buf[128]; char buf[128];
(void) snprintf(buf, sizeof (buf), (void) snprintf(buf, sizeof (buf),
@ -3301,7 +3230,6 @@ zdb_derive_key(dsl_dir_t *dd, uint8_t *key_out)
uint64_t keyformat, salt, iters; uint64_t keyformat, salt, iters;
int i; int i;
unsigned char c; unsigned char c;
FILE *f;
VERIFY0(zap_lookup(dd->dd_pool->dp_meta_objset, dd->dd_crypto_obj, VERIFY0(zap_lookup(dd->dd_pool->dp_meta_objset, dd->dd_crypto_obj,
zfs_prop_to_name(ZFS_PROP_KEYFORMAT), sizeof (uint64_t), zfs_prop_to_name(ZFS_PROP_KEYFORMAT), sizeof (uint64_t),
@ -3334,25 +3262,6 @@ zdb_derive_key(dsl_dir_t *dd, uint8_t *key_out)
break; break;
case ZFS_KEYFORMAT_RAW:
if ((f = fopen(key_material, "r")) == NULL)
return (B_FALSE);
if (fread(key_out, 1, WRAPPING_KEY_LEN, f) !=
WRAPPING_KEY_LEN) {
(void) fclose(f);
return (B_FALSE);
}
/* Check the key length */
if (fgetc(f) != EOF) {
(void) fclose(f);
return (B_FALSE);
}
(void) fclose(f);
break;
default: default:
fatal("no support for key format %u\n", fatal("no support for key format %u\n",
(unsigned int) keyformat); (unsigned int) keyformat);
@ -3438,7 +3347,7 @@ open_objset(const char *path, const void *tag, objset_t **osp)
uint64_t sa_attrs = 0; uint64_t sa_attrs = 0;
uint64_t version = 0; uint64_t version = 0;
VERIFY0P(sa_os); VERIFY3P(sa_os, ==, NULL);
/* /*
* We can't own an objset if it's redacted. Therefore, we do this * We can't own an objset if it's redacted. Therefore, we do this
@ -3611,8 +3520,8 @@ dump_uidgid(objset_t *os, uint64_t uid, uint64_t gid)
uint64_t fuid_obj; uint64_t fuid_obj;
/* first find the fuid object. It lives in the master node */ /* first find the fuid object. It lives in the master node */
VERIFY0(zap_lookup(os, MASTER_NODE_OBJ, ZFS_FUID_TABLES, VERIFY(zap_lookup(os, MASTER_NODE_OBJ, ZFS_FUID_TABLES,
8, 1, &fuid_obj)); 8, 1, &fuid_obj) == 0);
zfs_fuid_avl_tree_create(&idx_tree, &domain_tree); zfs_fuid_avl_tree_create(&idx_tree, &domain_tree);
(void) zfs_fuid_table_load(os, fuid_obj, (void) zfs_fuid_table_load(os, fuid_obj,
&idx_tree, &domain_tree); &idx_tree, &domain_tree);
@ -6013,11 +5922,11 @@ zdb_count_block(zdb_cb_t *zcb, zilog_t *zilog, const blkptr_t *bp,
* entry back to the block pointer before we claim it. * entry back to the block pointer before we claim it.
*/ */
if (v == DDT_PHYS_FLAT) { if (v == DDT_PHYS_FLAT) {
ASSERT3U(BP_GET_PHYSICAL_BIRTH(bp), ==, ASSERT3U(BP_GET_BIRTH(bp), ==,
ddt_phys_birth(dde->dde_phys, v)); ddt_phys_birth(dde->dde_phys, v));
tempbp = *bp; tempbp = *bp;
ddt_bp_fill(dde->dde_phys, v, &tempbp, ddt_bp_fill(dde->dde_phys, v, &tempbp,
BP_GET_PHYSICAL_BIRTH(bp)); BP_GET_BIRTH(bp));
bp = &tempbp; bp = &tempbp;
} }
@ -6243,7 +6152,7 @@ zdb_blkptr_cb(spa_t *spa, zilog_t *zilog, const blkptr_t *bp,
if (zb->zb_level == ZB_DNODE_LEVEL) if (zb->zb_level == ZB_DNODE_LEVEL)
return (0); return (0);
if (dump_opt['b'] >= 5 && BP_GET_BIRTH(bp) > 0) { if (dump_opt['b'] >= 5 && BP_GET_LOGICAL_BIRTH(bp) > 0) {
char blkbuf[BP_SPRINTF_LEN]; char blkbuf[BP_SPRINTF_LEN];
snprintf_blkptr(blkbuf, sizeof (blkbuf), bp); snprintf_blkptr(blkbuf, sizeof (blkbuf), bp);
(void) printf("objset %llu object %llu " (void) printf("objset %llu object %llu "
@ -6843,7 +6752,6 @@ zdb_leak_init(spa_t *spa, zdb_cb_t *zcb)
spa->spa_normal_class->mc_ops = &zdb_metaslab_ops; spa->spa_normal_class->mc_ops = &zdb_metaslab_ops;
spa->spa_log_class->mc_ops = &zdb_metaslab_ops; spa->spa_log_class->mc_ops = &zdb_metaslab_ops;
spa->spa_embedded_log_class->mc_ops = &zdb_metaslab_ops; spa->spa_embedded_log_class->mc_ops = &zdb_metaslab_ops;
spa->spa_special_embedded_log_class->mc_ops = &zdb_metaslab_ops;
zcb->zcb_vd_obsolete_counts = zcb->zcb_vd_obsolete_counts =
umem_zalloc(rvd->vdev_children * sizeof (uint32_t *), umem_zalloc(rvd->vdev_children * sizeof (uint32_t *),
@ -6981,9 +6889,7 @@ zdb_leak_fini(spa_t *spa, zdb_cb_t *zcb)
for (uint64_t m = 0; m < vd->vdev_ms_count; m++) { for (uint64_t m = 0; m < vd->vdev_ms_count; m++) {
metaslab_t *msp = vd->vdev_ms[m]; metaslab_t *msp = vd->vdev_ms[m];
ASSERT3P(msp->ms_group, ==, (msp->ms_group->mg_class == ASSERT3P(msp->ms_group, ==, (msp->ms_group->mg_class ==
spa_embedded_log_class(spa) || spa_embedded_log_class(spa)) ?
msp->ms_group->mg_class ==
spa_special_embedded_log_class(spa)) ?
vd->vdev_log_mg : vd->vdev_mg); vd->vdev_log_mg : vd->vdev_mg);
/* /*
@ -7107,7 +7013,7 @@ deleted_livelists_count_blocks(spa_t *spa, zdb_cb_t *zbc)
static void static void
dump_livelist_cb(dsl_deadlist_t *ll, void *arg) dump_livelist_cb(dsl_deadlist_t *ll, void *arg)
{ {
ASSERT0P(arg); ASSERT3P(arg, ==, NULL);
global_feature_count[SPA_FEATURE_LIVELIST]++; global_feature_count[SPA_FEATURE_LIVELIST]++;
dump_blkptr_list(ll, "Deleted Livelist"); dump_blkptr_list(ll, "Deleted Livelist");
dsl_deadlist_iterate(ll, sublivelist_verify_lightweight, NULL); dsl_deadlist_iterate(ll, sublivelist_verify_lightweight, NULL);
@ -7217,8 +7123,6 @@ dump_block_stats(spa_t *spa)
zcb->zcb_totalasize += metaslab_class_get_alloc(spa_dedup_class(spa)); zcb->zcb_totalasize += metaslab_class_get_alloc(spa_dedup_class(spa));
zcb->zcb_totalasize += zcb->zcb_totalasize +=
metaslab_class_get_alloc(spa_embedded_log_class(spa)); metaslab_class_get_alloc(spa_embedded_log_class(spa));
zcb->zcb_totalasize +=
metaslab_class_get_alloc(spa_special_embedded_log_class(spa));
zcb->zcb_start = zcb->zcb_lastprint = gethrtime(); zcb->zcb_start = zcb->zcb_lastprint = gethrtime();
err = traverse_pool(spa, 0, flags, zdb_blkptr_cb, zcb); err = traverse_pool(spa, 0, flags, zdb_blkptr_cb, zcb);
@ -7267,7 +7171,6 @@ dump_block_stats(spa_t *spa)
total_alloc = norm_alloc + total_alloc = norm_alloc +
metaslab_class_get_alloc(spa_log_class(spa)) + metaslab_class_get_alloc(spa_log_class(spa)) +
metaslab_class_get_alloc(spa_embedded_log_class(spa)) + metaslab_class_get_alloc(spa_embedded_log_class(spa)) +
metaslab_class_get_alloc(spa_special_embedded_log_class(spa)) +
metaslab_class_get_alloc(spa_special_class(spa)) + metaslab_class_get_alloc(spa_special_class(spa)) +
metaslab_class_get_alloc(spa_dedup_class(spa)) + metaslab_class_get_alloc(spa_dedup_class(spa)) +
get_unflushed_alloc_space(spa); get_unflushed_alloc_space(spa);
@ -7351,18 +7254,6 @@ dump_block_stats(spa_t *spa)
100.0 * alloc / space); 100.0 * alloc / space);
} }
if (spa_special_embedded_log_class(spa)->mc_allocator[0].mca_rotor
!= NULL) {
uint64_t alloc = metaslab_class_get_alloc(
spa_special_embedded_log_class(spa));
uint64_t space = metaslab_class_get_space(
spa_special_embedded_log_class(spa));
(void) printf("\t%-16s %14llu used: %5.2f%%\n",
"Special embedded log", (u_longlong_t)alloc,
100.0 * alloc / space);
}
for (i = 0; i < NUM_BP_EMBEDDED_TYPES; i++) { for (i = 0; i < NUM_BP_EMBEDDED_TYPES; i++) {
if (zcb->zcb_embedded_blocks[i] == 0) if (zcb->zcb_embedded_blocks[i] == 0)
continue; continue;
@ -8004,7 +7895,7 @@ verify_checkpoint_vdev_spacemaps(spa_t *checkpoint, spa_t *current)
for (uint64_t c = ckpoint_rvd->vdev_children; for (uint64_t c = ckpoint_rvd->vdev_children;
c < current_rvd->vdev_children; c++) { c < current_rvd->vdev_children; c++) {
vdev_t *current_vd = current_rvd->vdev_child[c]; vdev_t *current_vd = current_rvd->vdev_child[c];
VERIFY0P(current_vd->vdev_checkpoint_sm); VERIFY3P(current_vd->vdev_checkpoint_sm, ==, NULL);
} }
} }
@ -8702,9 +8593,9 @@ zdb_dump_indirect(blkptr_t *bp, int nbps, int flags)
} }
static void static void
zdb_dump_gbh(void *buf, uint64_t size, int flags) zdb_dump_gbh(void *buf, int flags)
{ {
zdb_dump_indirect((blkptr_t *)buf, gbh_nblkptrs(size), flags); zdb_dump_indirect((blkptr_t *)buf, SPA_GBH_NBLKPTRS, flags);
} }
static void static void
@ -8894,6 +8785,7 @@ zdb_decompress_block(abd_t *pabd, void *buf, void *lbuf, uint64_t lsize,
(void) buf; (void) buf;
uint64_t orig_lsize = lsize; uint64_t orig_lsize = lsize;
boolean_t tryzle = ((getenv("ZDB_NO_ZLE") == NULL)); boolean_t tryzle = ((getenv("ZDB_NO_ZLE") == NULL));
boolean_t found = B_FALSE;
/* /*
* We don't know how the data was compressed, so just try * We don't know how the data was compressed, so just try
* every decompress function at every inflated blocksize. * every decompress function at every inflated blocksize.
@ -8936,19 +8828,20 @@ zdb_decompress_block(abd_t *pabd, void *buf, void *lbuf, uint64_t lsize,
for (cfuncp = cfuncs; *cfuncp; cfuncp++) { for (cfuncp = cfuncs; *cfuncp; cfuncp++) {
if (try_decompress_block(pabd, lsize, psize, flags, if (try_decompress_block(pabd, lsize, psize, flags,
*cfuncp, lbuf, lbuf2)) { *cfuncp, lbuf, lbuf2)) {
tryzle = B_FALSE; found = B_TRUE;
break; break;
} }
} }
if (*cfuncp != 0) if (*cfuncp != 0)
break; break;
} }
if (tryzle) { if (!found && tryzle) {
for (lsize = orig_lsize; lsize <= maxlsize; for (lsize = orig_lsize; lsize <= maxlsize;
lsize += SPA_MINBLOCKSIZE) { lsize += SPA_MINBLOCKSIZE) {
if (try_decompress_block(pabd, lsize, psize, flags, if (try_decompress_block(pabd, lsize, psize, flags,
ZIO_COMPRESS_ZLE, lbuf, lbuf2)) { ZIO_COMPRESS_ZLE, lbuf, lbuf2)) {
*cfuncp = ZIO_COMPRESS_ZLE; *cfuncp = ZIO_COMPRESS_ZLE;
found = B_TRUE;
break; break;
} }
} }
@ -9185,7 +9078,7 @@ zdb_read_block(char *thing, spa_t *spa)
zdb_dump_indirect((blkptr_t *)buf, zdb_dump_indirect((blkptr_t *)buf,
orig_lsize / sizeof (blkptr_t), flags); orig_lsize / sizeof (blkptr_t), flags);
else if (flags & ZDB_FLAG_GBH) else if (flags & ZDB_FLAG_GBH)
zdb_dump_gbh(buf, lsize, flags); zdb_dump_gbh(buf, flags);
else else
zdb_dump_block(thing, buf, lsize, flags); zdb_dump_block(thing, buf, lsize, flags);
@ -9425,8 +9318,6 @@ main(int argc, char **argv)
{"all-reconstruction", no_argument, NULL, 'Y'}, {"all-reconstruction", no_argument, NULL, 'Y'},
{"livelist", no_argument, NULL, 'y'}, {"livelist", no_argument, NULL, 'y'},
{"zstd-headers", no_argument, NULL, 'Z'}, {"zstd-headers", no_argument, NULL, 'Z'},
{"allocated-map", no_argument, NULL,
ALLOCATED_OPT},
{0, 0, 0, 0} {0, 0, 0, 0}
}; };
@ -9457,7 +9348,6 @@ main(int argc, char **argv)
case 'u': case 'u':
case 'y': case 'y':
case 'Z': case 'Z':
case ALLOCATED_OPT:
dump_opt[c]++; dump_opt[c]++;
dump_all = 0; dump_all = 0;
break; break;
@ -9492,11 +9382,9 @@ main(int argc, char **argv)
while (*optarg != '\0') { *optarg++ = '*'; } while (*optarg != '\0') { *optarg++ = '*'; }
break; break;
case 'o': case 'o':
dump_opt[c]++; error = set_global_var(optarg);
dump_all = 0;
error = handle_tunable_option(optarg, B_FALSE);
if (error != 0) if (error != 0)
zdb_exit(1); usage();
break; break;
case 'p': case 'p':
if (searchdirs == NULL) { if (searchdirs == NULL) {
@ -9662,12 +9550,6 @@ main(int argc, char **argv)
error = 0; error = 0;
goto fini; goto fini;
} }
if (dump_opt['o'])
/*
* Avoid blasting tunable options off the top of the
* screen.
*/
zdb_exit(1);
usage(); usage();
} }
@ -9728,7 +9610,7 @@ main(int argc, char **argv)
} else if (objset_str && !zdb_numeric(objset_str + 1) && } else if (objset_str && !zdb_numeric(objset_str + 1) &&
dump_opt['N']) { dump_opt['N']) {
printf("Supply a numeric objset ID with -N\n"); printf("Supply a numeric objset ID with -N\n");
error = 2; error = 1;
goto fini; goto fini;
} }
} else { } else {
@ -9837,7 +9719,7 @@ main(int argc, char **argv)
if (error == 0) { if (error == 0) {
if (dump_opt['k'] && (target_is_spa || dump_opt['R'])) { if (dump_opt['k'] && (target_is_spa || dump_opt['R'])) {
ASSERT(checkpoint_pool != NULL); ASSERT(checkpoint_pool != NULL);
ASSERT0P(checkpoint_target); ASSERT(checkpoint_target == NULL);
error = spa_open(checkpoint_pool, &spa, FTAG); error = spa_open(checkpoint_pool, &spa, FTAG);
if (error != 0) { if (error != 0) {
@ -10030,8 +9912,5 @@ fini:
if (kernel_init_done) if (kernel_init_done)
kernel_fini(); kernel_fini();
if (corruption_found && error == 0)
error = 3;
return (error); return (error);
} }

View File

@ -29,6 +29,6 @@
#define _ZDB_H #define _ZDB_H
void dump_intent_log(zilog_t *); void dump_intent_log(zilog_t *);
extern uint8_t dump_opt[512]; extern uint8_t dump_opt[256];
#endif /* _ZDB_H */ #endif /* _ZDB_H */

View File

@ -48,6 +48,8 @@
#include "zdb.h" #include "zdb.h"
extern uint8_t dump_opt[256];
static char tab_prefix[4] = "\t\t\t"; static char tab_prefix[4] = "\t\t\t";
static void static void
@ -174,7 +176,7 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, const void *arg)
if (lr->lr_common.lrc_reclen == sizeof (lr_write_t)) { if (lr->lr_common.lrc_reclen == sizeof (lr_write_t)) {
(void) printf("%shas blkptr, %s\n", tab_prefix, (void) printf("%shas blkptr, %s\n", tab_prefix,
!BP_IS_HOLE(bp) && BP_GET_BIRTH(bp) >= !BP_IS_HOLE(bp) && BP_GET_LOGICAL_BIRTH(bp) >=
spa_min_claim_txg(zilog->zl_spa) ? spa_min_claim_txg(zilog->zl_spa) ?
"will claim" : "won't claim"); "will claim" : "won't claim");
print_log_bp(bp, tab_prefix); print_log_bp(bp, tab_prefix);
@ -187,7 +189,7 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, const void *arg)
(void) printf("%s<hole>\n", tab_prefix); (void) printf("%s<hole>\n", tab_prefix);
return; return;
} }
if (BP_GET_BIRTH(bp) < zilog->zl_header->zh_claim_txg) { if (BP_GET_LOGICAL_BIRTH(bp) < zilog->zl_header->zh_claim_txg) {
(void) printf("%s<block already committed>\n", (void) printf("%s<block already committed>\n",
tab_prefix); tab_prefix);
return; return;
@ -238,7 +240,7 @@ zil_prt_rec_write_enc(zilog_t *zilog, int txtype, const void *arg)
if (lr->lr_common.lrc_reclen == sizeof (lr_write_t)) { if (lr->lr_common.lrc_reclen == sizeof (lr_write_t)) {
(void) printf("%shas blkptr, %s\n", tab_prefix, (void) printf("%shas blkptr, %s\n", tab_prefix,
!BP_IS_HOLE(bp) && BP_GET_BIRTH(bp) >= !BP_IS_HOLE(bp) && BP_GET_LOGICAL_BIRTH(bp) >=
spa_min_claim_txg(zilog->zl_spa) ? spa_min_claim_txg(zilog->zl_spa) ?
"will claim" : "won't claim"); "will claim" : "won't claim");
print_log_bp(bp, tab_prefix); print_log_bp(bp, tab_prefix);
@ -474,7 +476,7 @@ print_log_block(zilog_t *zilog, const blkptr_t *bp, void *arg,
if (claim_txg != 0) if (claim_txg != 0)
claim = "already claimed"; claim = "already claimed";
else if (BP_GET_BIRTH(bp) >= spa_min_claim_txg(zilog->zl_spa)) else if (BP_GET_LOGICAL_BIRTH(bp) >= spa_min_claim_txg(zilog->zl_spa))
claim = "will claim"; claim = "will claim";
else else
claim = "won't claim"; claim = "won't claim";

View File

@ -9,18 +9,18 @@ dist_zedexec_SCRIPTS = \
%D%/all-debug.sh \ %D%/all-debug.sh \
%D%/all-syslog.sh \ %D%/all-syslog.sh \
%D%/data-notify.sh \ %D%/data-notify.sh \
%D%/deadman-sync-slot_off.sh \ %D%/deadman-slot_off.sh \
%D%/generic-notify.sh \ %D%/generic-notify.sh \
%D%/pool_import-sync-led.sh \ %D%/pool_import-led.sh \
%D%/resilver_finish-notify.sh \ %D%/resilver_finish-notify.sh \
%D%/resilver_finish-start-scrub.sh \ %D%/resilver_finish-start-scrub.sh \
%D%/scrub_finish-notify.sh \ %D%/scrub_finish-notify.sh \
%D%/statechange-sync-led.sh \ %D%/statechange-led.sh \
%D%/statechange-notify.sh \ %D%/statechange-notify.sh \
%D%/statechange-sync-slot_off.sh \ %D%/statechange-slot_off.sh \
%D%/trim_finish-notify.sh \ %D%/trim_finish-notify.sh \
%D%/vdev_attach-sync-led.sh \ %D%/vdev_attach-led.sh \
%D%/vdev_clear-sync-led.sh %D%/vdev_clear-led.sh
nodist_zedexec_SCRIPTS = \ nodist_zedexec_SCRIPTS = \
%D%/history_event-zfs-list-cacher.sh %D%/history_event-zfs-list-cacher.sh
@ -30,17 +30,17 @@ SUBSTFILES += $(nodist_zedexec_SCRIPTS)
zedconfdefaults = \ zedconfdefaults = \
all-syslog.sh \ all-syslog.sh \
data-notify.sh \ data-notify.sh \
deadman-sync-slot_off.sh \ deadman-slot_off.sh \
history_event-zfs-list-cacher.sh \ history_event-zfs-list-cacher.sh \
pool_import-sync-led.sh \ pool_import-led.sh \
resilver_finish-notify.sh \ resilver_finish-notify.sh \
resilver_finish-start-scrub.sh \ resilver_finish-start-scrub.sh \
scrub_finish-notify.sh \ scrub_finish-notify.sh \
statechange-sync-led.sh \ statechange-led.sh \
statechange-notify.sh \ statechange-notify.sh \
statechange-sync-slot_off.sh \ statechange-slot_off.sh \
vdev_attach-sync-led.sh \ vdev_attach-led.sh \
vdev_clear-sync-led.sh vdev_clear-led.sh
dist_noinst_DATA += %D%/README dist_noinst_DATA += %D%/README

View File

@ -0,0 +1 @@
statechange-led.sh

View File

@ -1 +0,0 @@
statechange-sync-led.sh

View File

@ -0,0 +1 @@
statechange-led.sh

View File

@ -1 +0,0 @@
statechange-sync-led.sh

View File

@ -0,0 +1 @@
statechange-led.sh

View File

@ -1 +0,0 @@
statechange-sync-led.sh

View File

@ -110,7 +110,7 @@ zed_event_fini(struct zed_conf *zcp)
static void static void
_bump_event_queue_length(void) _bump_event_queue_length(void)
{ {
int zzlm, wr; int zzlm = -1, wr;
char qlen_buf[12] = {0}; /* parameter is int => max "-2147483647\n" */ char qlen_buf[12] = {0}; /* parameter is int => max "-2147483647\n" */
long int qlen, orig_qlen; long int qlen, orig_qlen;

View File

@ -196,29 +196,37 @@ _nop(int sig)
(void) sig; (void) sig;
} }
static void static void *
wait_for_children(boolean_t do_pause, boolean_t wait) _reap_children(void *arg)
{ {
pid_t pid; (void) arg;
struct rusage usage;
int status;
struct launched_process_node node, *pnode; struct launched_process_node node, *pnode;
pid_t pid;
int status;
struct rusage usage;
struct sigaction sa = {};
(void) sigfillset(&sa.sa_mask);
(void) sigdelset(&sa.sa_mask, SIGCHLD);
(void) pthread_sigmask(SIG_SETMASK, &sa.sa_mask, NULL);
(void) sigemptyset(&sa.sa_mask);
sa.sa_handler = _nop;
sa.sa_flags = SA_NOCLDSTOP;
(void) sigaction(SIGCHLD, &sa, NULL);
for (_reap_children_stop = B_FALSE; !_reap_children_stop; ) { for (_reap_children_stop = B_FALSE; !_reap_children_stop; ) {
(void) pthread_mutex_lock(&_launched_processes_lock); (void) pthread_mutex_lock(&_launched_processes_lock);
pid = wait4(0, &status, wait ? 0 : WNOHANG, &usage); pid = wait4(0, &status, WNOHANG, &usage);
if (pid == 0 || pid == (pid_t)-1) { if (pid == 0 || pid == (pid_t)-1) {
(void) pthread_mutex_unlock(&_launched_processes_lock); (void) pthread_mutex_unlock(&_launched_processes_lock);
if ((pid == 0) || (errno == ECHILD)) { if (pid == 0 || errno == ECHILD)
if (do_pause)
pause(); pause();
} else if (errno != EINTR) else if (errno != EINTR)
zed_log_msg(LOG_WARNING, zed_log_msg(LOG_WARNING,
"Failed to wait for children: %s", "Failed to wait for children: %s",
strerror(errno)); strerror(errno));
if (!do_pause)
return;
} else { } else {
memset(&node, 0, sizeof (node)); memset(&node, 0, sizeof (node));
node.pid = pid; node.pid = pid;
@ -270,25 +278,6 @@ wait_for_children(boolean_t do_pause, boolean_t wait)
} }
} }
}
static void *
_reap_children(void *arg)
{
(void) arg;
struct sigaction sa = {};
(void) sigfillset(&sa.sa_mask);
(void) sigdelset(&sa.sa_mask, SIGCHLD);
(void) pthread_sigmask(SIG_SETMASK, &sa.sa_mask, NULL);
(void) sigemptyset(&sa.sa_mask);
sa.sa_handler = _nop;
sa.sa_flags = SA_NOCLDSTOP;
(void) sigaction(SIGCHLD, &sa, NULL);
wait_for_children(B_TRUE, B_FALSE);
return (NULL); return (NULL);
} }
@ -317,45 +306,6 @@ zed_exec_fini(void)
_reap_children_tid = (pthread_t)-1; _reap_children_tid = (pthread_t)-1;
} }
/*
* Check if the zedlet name indicates if it is a synchronous zedlet
*
* Synchronous zedlets have a "-sync-" immediately following the event name in
* their zedlet filename, like:
*
* EVENT_NAME-sync-ZEDLETNAME.sh
*
* For example, if you wanted a synchronous statechange script:
*
* statechange-sync-myzedlet.sh
*
* Synchronous zedlets are guaranteed to be the only zedlet running. No other
* zedlets may run in parallel with a synchronous zedlet. A synchronous
* zedlet will wait for all previously spawned zedlets to finish before running.
* Users should be careful to only use synchronous zedlets when needed, since
* they decrease parallelism.
*/
static boolean_t
zedlet_is_sync(const char *zedlet, const char *event)
{
const char *sync_str = "-sync-";
size_t sync_str_len;
size_t zedlet_len;
size_t event_len;
sync_str_len = strlen(sync_str);
zedlet_len = strlen(zedlet);
event_len = strlen(event);
if (event_len + sync_str_len >= zedlet_len)
return (B_FALSE);
if (strncmp(&zedlet[event_len], sync_str, sync_str_len) == 0)
return (B_TRUE);
return (B_FALSE);
}
/* /*
* Process the event [eid] by synchronously invoking all zedlets with a * Process the event [eid] by synchronously invoking all zedlets with a
* matching class prefix. * matching class prefix.
@ -418,28 +368,9 @@ zed_exec_process(uint64_t eid, const char *class, const char *subclass,
z = zed_strings_next(zcp->zedlets)) { z = zed_strings_next(zcp->zedlets)) {
for (csp = class_strings; *csp; csp++) { for (csp = class_strings; *csp; csp++) {
n = strlen(*csp); n = strlen(*csp);
if ((strncmp(z, *csp, n) == 0) && !isalpha(z[n])) { if ((strncmp(z, *csp, n) == 0) && !isalpha(z[n]))
boolean_t is_sync = zedlet_is_sync(z, *csp);
if (is_sync) {
/*
* Wait for previous zedlets to
* finish
*/
wait_for_children(B_FALSE, B_TRUE);
}
_zed_exec_fork_child(eid, zcp->zedlet_dir, _zed_exec_fork_child(eid, zcp->zedlet_dir,
z, e, zcp->zevent_fd, zcp->do_foreground); z, e, zcp->zevent_fd, zcp->do_foreground);
if (is_sync) {
/*
* Wait for sync zedlet we just launched
* to finish.
*/
wait_for_children(B_FALSE, B_TRUE);
}
}
} }
} }
free(e); free(e);

View File

@ -440,7 +440,7 @@ get_usage(zfs_help_t idx)
return (gettext("\tredact <snapshot> <bookmark> " return (gettext("\tredact <snapshot> <bookmark> "
"<redaction_snapshot> ...\n")); "<redaction_snapshot> ...\n"));
case HELP_REWRITE: case HELP_REWRITE:
return (gettext("\trewrite [-Prvx] [-o <offset>] [-l <length>] " return (gettext("\trewrite [-rvx] [-o <offset>] [-l <length>] "
"<directory|file ...>\n")); "<directory|file ...>\n"));
case HELP_JAIL: case HELP_JAIL:
return (gettext("\tjail <jailid|jailname> <filesystem>\n")); return (gettext("\tjail <jailid|jailname> <filesystem>\n"));
@ -923,22 +923,26 @@ zfs_do_clone(int argc, char **argv)
return (!!ret); return (!!ret);
usage: usage:
ASSERT0P(zhp); ASSERT3P(zhp, ==, NULL);
nvlist_free(props); nvlist_free(props);
usage(B_FALSE); usage(B_FALSE);
return (-1); return (-1);
} }
/* /*
* Calculate the minimum allocation size based on the top-level vdevs. * Return a default volblocksize for the pool which always uses more than
* half of the data sectors. This primarily applies to dRAID which always
* writes full stripe widths.
*/ */
static uint64_t static uint64_t
calculate_volblocksize(nvlist_t *config) default_volblocksize(zpool_handle_t *zhp, nvlist_t *props)
{ {
uint64_t asize = SPA_MINBLOCKSIZE; uint64_t volblocksize, asize = SPA_MINBLOCKSIZE;
nvlist_t *tree, **vdevs; nvlist_t *tree, **vdevs;
uint_t nvdevs; uint_t nvdevs;
nvlist_t *config = zpool_get_config(zhp, NULL);
if (nvlist_lookup_nvlist(config, ZPOOL_CONFIG_VDEV_TREE, &tree) != 0 || if (nvlist_lookup_nvlist(config, ZPOOL_CONFIG_VDEV_TREE, &tree) != 0 ||
nvlist_lookup_nvlist_array(tree, ZPOOL_CONFIG_CHILDREN, nvlist_lookup_nvlist_array(tree, ZPOOL_CONFIG_CHILDREN,
&vdevs, &nvdevs) != 0) { &vdevs, &nvdevs) != 0) {
@ -969,24 +973,6 @@ calculate_volblocksize(nvlist_t *config)
} }
} }
return (asize);
}
/*
* Return a default volblocksize for the pool which always uses more than
* half of the data sectors. This primarily applies to dRAID which always
* writes full stripe widths.
*/
static uint64_t
default_volblocksize(zpool_handle_t *zhp, nvlist_t *props)
{
uint64_t volblocksize, asize = SPA_MINBLOCKSIZE;
nvlist_t *config = zpool_get_config(zhp, NULL);
if (nvlist_lookup_uint64(config, ZPOOL_CONFIG_MAX_ALLOC, &asize) != 0)
asize = calculate_volblocksize(config);
/* /*
* Calculate the target volblocksize such that more than half * Calculate the target volblocksize such that more than half
* of the asize is used. The following table is for 4k sectors. * of the asize is used. The following table is for 4k sectors.
@ -1988,8 +1974,9 @@ fill_dataset_info(nvlist_t *list, zfs_handle_t *zhp, boolean_t as_int)
} }
if (type == ZFS_TYPE_SNAPSHOT) { if (type == ZFS_TYPE_SNAPSHOT) {
char *snap = strdup(zfs_get_name(zhp)); char *ds, *snap;
char *ds = strsep(&snap, "@"); ds = snap = strdup(zfs_get_name(zhp));
ds = strsep(&snap, "@");
fnvlist_add_string(list, "dataset", ds); fnvlist_add_string(list, "dataset", ds);
fnvlist_add_string(list, "snapshot_name", snap); fnvlist_add_string(list, "snapshot_name", snap);
free(ds); free(ds);
@ -2032,7 +2019,8 @@ get_callback(zfs_handle_t *zhp, void *data)
nvlist_t *user_props = zfs_get_user_props(zhp); nvlist_t *user_props = zfs_get_user_props(zhp);
zprop_list_t *pl = cbp->cb_proplist; zprop_list_t *pl = cbp->cb_proplist;
nvlist_t *propval; nvlist_t *propval;
nvlist_t *item, *d = NULL, *props = NULL; nvlist_t *item, *d, *props;
item = d = props = NULL;
const char *strval; const char *strval;
const char *sourceval; const char *sourceval;
boolean_t received = is_recvd_column(cbp); boolean_t received = is_recvd_column(cbp);
@ -3008,8 +2996,7 @@ us_type2str(unsigned field_type)
} }
static int static int
userspace_cb(void *arg, const char *domain, uid_t rid, uint64_t space, userspace_cb(void *arg, const char *domain, uid_t rid, uint64_t space)
uint64_t default_quota)
{ {
us_cbdata_t *cb = (us_cbdata_t *)arg; us_cbdata_t *cb = (us_cbdata_t *)arg;
zfs_userquota_prop_t prop = cb->cb_prop; zfs_userquota_prop_t prop = cb->cb_prop;
@ -3165,7 +3152,7 @@ userspace_cb(void *arg, const char *domain, uid_t rid, uint64_t space,
prop == ZFS_PROP_PROJECTUSED) { prop == ZFS_PROP_PROJECTUSED) {
propname = "used"; propname = "used";
if (!nvlist_exists(props, "quota")) if (!nvlist_exists(props, "quota"))
(void) nvlist_add_uint64(props, "quota", default_quota); (void) nvlist_add_uint64(props, "quota", 0);
} else if (prop == ZFS_PROP_USERQUOTA || prop == ZFS_PROP_GROUPQUOTA || } else if (prop == ZFS_PROP_USERQUOTA || prop == ZFS_PROP_GROUPQUOTA ||
prop == ZFS_PROP_PROJECTQUOTA) { prop == ZFS_PROP_PROJECTQUOTA) {
propname = "quota"; propname = "quota";
@ -3174,10 +3161,8 @@ userspace_cb(void *arg, const char *domain, uid_t rid, uint64_t space,
} else if (prop == ZFS_PROP_USEROBJUSED || } else if (prop == ZFS_PROP_USEROBJUSED ||
prop == ZFS_PROP_GROUPOBJUSED || prop == ZFS_PROP_PROJECTOBJUSED) { prop == ZFS_PROP_GROUPOBJUSED || prop == ZFS_PROP_PROJECTOBJUSED) {
propname = "objused"; propname = "objused";
if (!nvlist_exists(props, "objquota")) { if (!nvlist_exists(props, "objquota"))
(void) nvlist_add_uint64(props, "objquota", (void) nvlist_add_uint64(props, "objquota", 0);
default_quota);
}
} else if (prop == ZFS_PROP_USEROBJQUOTA || } else if (prop == ZFS_PROP_USEROBJQUOTA ||
prop == ZFS_PROP_GROUPOBJQUOTA || prop == ZFS_PROP_GROUPOBJQUOTA ||
prop == ZFS_PROP_PROJECTOBJQUOTA) { prop == ZFS_PROP_PROJECTOBJQUOTA) {
@ -5317,7 +5302,6 @@ zfs_do_receive(int argc, char **argv)
#define ZFS_DELEG_PERM_MOUNT "mount" #define ZFS_DELEG_PERM_MOUNT "mount"
#define ZFS_DELEG_PERM_SHARE "share" #define ZFS_DELEG_PERM_SHARE "share"
#define ZFS_DELEG_PERM_SEND "send" #define ZFS_DELEG_PERM_SEND "send"
#define ZFS_DELEG_PERM_SEND_RAW "send:raw"
#define ZFS_DELEG_PERM_RECEIVE "receive" #define ZFS_DELEG_PERM_RECEIVE "receive"
#define ZFS_DELEG_PERM_RECEIVE_APPEND "receive:append" #define ZFS_DELEG_PERM_RECEIVE_APPEND "receive:append"
#define ZFS_DELEG_PERM_ALLOW "allow" #define ZFS_DELEG_PERM_ALLOW "allow"
@ -5360,7 +5344,6 @@ static zfs_deleg_perm_tab_t zfs_deleg_perm_tbl[] = {
{ ZFS_DELEG_PERM_RENAME, ZFS_DELEG_NOTE_RENAME }, { ZFS_DELEG_PERM_RENAME, ZFS_DELEG_NOTE_RENAME },
{ ZFS_DELEG_PERM_ROLLBACK, ZFS_DELEG_NOTE_ROLLBACK }, { ZFS_DELEG_PERM_ROLLBACK, ZFS_DELEG_NOTE_ROLLBACK },
{ ZFS_DELEG_PERM_SEND, ZFS_DELEG_NOTE_SEND }, { ZFS_DELEG_PERM_SEND, ZFS_DELEG_NOTE_SEND },
{ ZFS_DELEG_PERM_SEND_RAW, ZFS_DELEG_NOTE_SEND_RAW },
{ ZFS_DELEG_PERM_SHARE, ZFS_DELEG_NOTE_SHARE }, { ZFS_DELEG_PERM_SHARE, ZFS_DELEG_NOTE_SHARE },
{ ZFS_DELEG_PERM_SNAPSHOT, ZFS_DELEG_NOTE_SNAPSHOT }, { ZFS_DELEG_PERM_SNAPSHOT, ZFS_DELEG_NOTE_SNAPSHOT },
{ ZFS_DELEG_PERM_BOOKMARK, ZFS_DELEG_NOTE_BOOKMARK }, { ZFS_DELEG_PERM_BOOKMARK, ZFS_DELEG_NOTE_BOOKMARK },
@ -5893,7 +5876,7 @@ parse_fs_perm_set(fs_perm_set_t *fspset, nvlist_t *nvl)
static inline const char * static inline const char *
deleg_perm_comment(zfs_deleg_note_t note) deleg_perm_comment(zfs_deleg_note_t note)
{ {
const char *str; const char *str = "";
/* subcommands */ /* subcommands */
switch (note) { switch (note) {
@ -5945,10 +5928,6 @@ deleg_perm_comment(zfs_deleg_note_t note)
case ZFS_DELEG_NOTE_SEND: case ZFS_DELEG_NOTE_SEND:
str = gettext(""); str = gettext("");
break; break;
case ZFS_DELEG_NOTE_SEND_RAW:
str = gettext("Allow sending ONLY encrypted (raw) replication"
"\n\t\t\t\tstreams");
break;
case ZFS_DELEG_NOTE_SHARE: case ZFS_DELEG_NOTE_SHARE:
str = gettext("Allows sharing file systems over NFS or SMB" str = gettext("Allows sharing file systems over NFS or SMB"
"\n\t\t\t\tprotocols"); "\n\t\t\t\tprotocols");
@ -6878,17 +6857,17 @@ print_holds(boolean_t scripted, int nwidth, int tagwidth, nvlist_t *nvl,
if (scripted) { if (scripted) {
if (parsable) { if (parsable) {
(void) printf("%s\t%s\t%lld\n", zname, (void) printf("%s\t%s\t%ld\n", zname,
tagname, (long long)time); tagname, time);
} else { } else {
(void) printf("%s\t%s\t%s\n", zname, (void) printf("%s\t%s\t%s\n", zname,
tagname, tsbuf); tagname, tsbuf);
} }
} else { } else {
if (parsable) { if (parsable) {
(void) printf("%-*s %-*s %lld\n", (void) printf("%-*s %-*s %ld\n",
nwidth, zname, tagwidth, nwidth, zname, tagwidth,
tagname, (long long)time); tagname, time);
} else { } else {
(void) printf("%-*s %-*s %s\n", (void) printf("%-*s %-*s %s\n",
nwidth, zname, tagwidth, nwidth, zname, tagwidth,
@ -9197,11 +9176,8 @@ zfs_do_rewrite(int argc, char **argv)
zfs_rewrite_args_t args; zfs_rewrite_args_t args;
memset(&args, 0, sizeof (args)); memset(&args, 0, sizeof (args));
while ((c = getopt(argc, argv, "Pl:o:rvx")) != -1) { while ((c = getopt(argc, argv, "l:o:rvx")) != -1) {
switch (c) { switch (c) {
case 'P':
args.flags |= ZFS_REWRITE_PHYSICAL;
break;
case 'l': case 'l':
args.len = strtoll(optarg, NULL, 0); args.len = strtoll(optarg, NULL, 0);
break; break;

View File

@ -54,7 +54,6 @@
#include <sys/dmu_tx.h> #include <sys/dmu_tx.h>
#include <zfeature_common.h> #include <zfeature_common.h>
#include <libzutil.h> #include <libzutil.h>
#include <sys/metaslab_impl.h>
static importargs_t g_importargs; static importargs_t g_importargs;
static char *g_pool; static char *g_pool;
@ -70,8 +69,7 @@ static __attribute__((noreturn)) void
usage(void) usage(void)
{ {
(void) fprintf(stderr, (void) fprintf(stderr,
"Usage: zhack [-o tunable] [-c cachefile] [-d dir] <subcommand> " "Usage: zhack [-c cachefile] [-d dir] <subcommand> <args> ...\n"
"<args> ...\n"
"where <subcommand> <args> is one of the following:\n" "where <subcommand> <args> is one of the following:\n"
"\n"); "\n");
@ -95,10 +93,7 @@ usage(void)
" -c repair corrupted label checksums\n" " -c repair corrupted label checksums\n"
" -u restore the label on a detached device\n" " -u restore the label on a detached device\n"
"\n" "\n"
" <device> : path to vdev\n" " <device> : path to vdev\n");
"\n"
" metaslab leak <pool>\n"
" apply allocation map from zdb to specified pool\n");
exit(1); exit(1);
} }
@ -167,9 +162,9 @@ zhack_import(char *target, boolean_t readonly)
props = NULL; props = NULL;
if (readonly) { if (readonly) {
VERIFY0(nvlist_alloc(&props, NV_UNIQUE_NAME, 0)); VERIFY(nvlist_alloc(&props, NV_UNIQUE_NAME, 0) == 0);
VERIFY0(nvlist_add_uint64(props, VERIFY(nvlist_add_uint64(props,
zpool_prop_to_name(ZPOOL_PROP_READONLY), 1)); zpool_prop_to_name(ZPOOL_PROP_READONLY), 1) == 0);
} }
zfeature_checks_disable = B_TRUE; zfeature_checks_disable = B_TRUE;
@ -223,8 +218,8 @@ dump_obj(objset_t *os, uint64_t obj, const char *name)
} else { } else {
ASSERT(za->za_integer_length == 1); ASSERT(za->za_integer_length == 1);
char val[1024]; char val[1024];
VERIFY0(zap_lookup(os, obj, za->za_name, VERIFY(zap_lookup(os, obj, za->za_name,
1, sizeof (val), val)); 1, sizeof (val), val) == 0);
(void) printf("\t%s = %s\n", za->za_name, val); (void) printf("\t%s = %s\n", za->za_name, val);
} }
} }
@ -368,12 +363,10 @@ feature_incr_sync(void *arg, dmu_tx_t *tx)
zfeature_info_t *feature = arg; zfeature_info_t *feature = arg;
uint64_t refcount; uint64_t refcount;
mutex_enter(&spa->spa_feat_stats_lock);
VERIFY0(feature_get_refcount_from_disk(spa, feature, &refcount)); VERIFY0(feature_get_refcount_from_disk(spa, feature, &refcount));
feature_sync(spa, feature, refcount + 1, tx); feature_sync(spa, feature, refcount + 1, tx);
spa_history_log_internal(spa, "zhack feature incr", tx, spa_history_log_internal(spa, "zhack feature incr", tx,
"name=%s", feature->fi_guid); "name=%s", feature->fi_guid);
mutex_exit(&spa->spa_feat_stats_lock);
} }
static void static void
@ -383,12 +376,10 @@ feature_decr_sync(void *arg, dmu_tx_t *tx)
zfeature_info_t *feature = arg; zfeature_info_t *feature = arg;
uint64_t refcount; uint64_t refcount;
mutex_enter(&spa->spa_feat_stats_lock);
VERIFY0(feature_get_refcount_from_disk(spa, feature, &refcount)); VERIFY0(feature_get_refcount_from_disk(spa, feature, &refcount));
feature_sync(spa, feature, refcount - 1, tx); feature_sync(spa, feature, refcount - 1, tx);
spa_history_log_internal(spa, "zhack feature decr", tx, spa_history_log_internal(spa, "zhack feature decr", tx,
"name=%s", feature->fi_guid); "name=%s", feature->fi_guid);
mutex_exit(&spa->spa_feat_stats_lock);
} }
static void static void
@ -505,186 +496,6 @@ zhack_do_feature(int argc, char **argv)
return (0); return (0);
} }
static boolean_t
strstarts(const char *a, const char *b)
{
return (strncmp(a, b, strlen(b)) == 0);
}
static void
metaslab_force_alloc(metaslab_t *msp, uint64_t start, uint64_t size,
dmu_tx_t *tx)
{
ASSERT(msp->ms_disabled);
ASSERT(MUTEX_HELD(&msp->ms_lock));
uint64_t txg = dmu_tx_get_txg(tx);
uint64_t off = start;
while (off < start + size) {
uint64_t ostart, osize;
boolean_t found = zfs_range_tree_find_in(msp->ms_allocatable,
off, start + size - off, &ostart, &osize);
if (!found)
break;
zfs_range_tree_remove(msp->ms_allocatable, ostart, osize);
if (zfs_range_tree_is_empty(msp->ms_allocating[txg & TXG_MASK]))
vdev_dirty(msp->ms_group->mg_vd, VDD_METASLAB, msp,
txg);
zfs_range_tree_add(msp->ms_allocating[txg & TXG_MASK], ostart,
osize);
msp->ms_allocating_total += osize;
off = ostart + osize;
}
}
static void
zhack_do_metaslab_leak(int argc, char **argv)
{
int c;
char *target;
spa_t *spa;
optind = 1;
boolean_t force = B_FALSE;
while ((c = getopt(argc, argv, "f")) != -1) {
switch (c) {
case 'f':
force = B_TRUE;
break;
default:
usage();
break;
}
}
argc -= optind;
argv += optind;
if (argc < 1) {
(void) fprintf(stderr, "error: missing pool name\n");
usage();
}
target = argv[0];
zhack_spa_open(target, B_FALSE, FTAG, &spa);
spa_config_enter(spa, SCL_VDEV | SCL_ALLOC, FTAG, RW_READER);
char *line = NULL;
size_t cap = 0;
vdev_t *vd = NULL;
metaslab_t *prev = NULL;
dmu_tx_t *tx = NULL;
while (getline(&line, &cap, stdin) > 0) {
if (strstarts(line, "\tvdev ")) {
uint64_t vdev_id, ms_shift;
if (sscanf(line,
"\tvdev %10"PRIu64"\t%*s metaslab shift %4"PRIu64,
&vdev_id, &ms_shift) == 1) {
VERIFY3U(sscanf(line, "\tvdev %"PRIu64
"\t metaslab shift %4"PRIu64,
&vdev_id, &ms_shift), ==, 2);
}
vd = vdev_lookup_top(spa, vdev_id);
if (vd == NULL) {
fprintf(stderr, "error: no such vdev with "
"id %"PRIu64"\n", vdev_id);
break;
}
if (tx) {
dmu_tx_commit(tx);
mutex_exit(&prev->ms_lock);
metaslab_enable(prev, B_FALSE, B_FALSE);
tx = NULL;
prev = NULL;
}
if (vd->vdev_ms_shift != ms_shift) {
fprintf(stderr, "error: ms_shift mismatch: %"
PRIu64" != %"PRIu64"\n", vd->vdev_ms_shift,
ms_shift);
break;
}
} else if (strstarts(line, "\tmetaslabs ")) {
uint64_t ms_count;
VERIFY3U(sscanf(line, "\tmetaslabs %"PRIu64, &ms_count),
==, 1);
ASSERT(vd);
if (!force && vd->vdev_ms_count != ms_count) {
fprintf(stderr, "error: ms_count mismatch: %"
PRIu64" != %"PRIu64"\n", vd->vdev_ms_count,
ms_count);
break;
}
} else if (strstarts(line, "ALLOC:")) {
uint64_t start, size;
VERIFY3U(sscanf(line, "ALLOC: %"PRIu64" %"PRIu64"\n",
&start, &size), ==, 2);
ASSERT(vd);
metaslab_t *cur =
vd->vdev_ms[start >> vd->vdev_ms_shift];
if (prev != cur) {
if (prev) {
dmu_tx_commit(tx);
mutex_exit(&prev->ms_lock);
metaslab_enable(prev, B_FALSE, B_FALSE);
}
ASSERT(cur);
metaslab_disable(cur);
mutex_enter(&cur->ms_lock);
metaslab_load(cur);
prev = cur;
tx = dmu_tx_create_dd(
spa_get_dsl(vd->vdev_spa)->dp_root_dir);
dmu_tx_assign(tx, DMU_TX_WAIT);
}
metaslab_force_alloc(cur, start, size, tx);
} else {
continue;
}
}
if (tx) {
dmu_tx_commit(tx);
mutex_exit(&prev->ms_lock);
metaslab_enable(prev, B_FALSE, B_FALSE);
tx = NULL;
prev = NULL;
}
if (line)
free(line);
spa_config_exit(spa, SCL_VDEV | SCL_ALLOC, FTAG);
spa_close(spa, FTAG);
}
static int
zhack_do_metaslab(int argc, char **argv)
{
char *subcommand;
argc--;
argv++;
if (argc == 0) {
(void) fprintf(stderr,
"error: no metaslab operation specified\n");
usage();
}
subcommand = argv[0];
if (strcmp(subcommand, "leak") == 0) {
zhack_do_metaslab_leak(argc, argv);
} else {
(void) fprintf(stderr, "error: unknown subcommand: %s\n",
subcommand);
usage();
}
return (0);
}
#define ASHIFT_UBERBLOCK_SHIFT(ashift) \ #define ASHIFT_UBERBLOCK_SHIFT(ashift) \
MIN(MAX(ashift, UBERBLOCK_SHIFT), \ MIN(MAX(ashift, UBERBLOCK_SHIFT), \
MAX_UBERBLOCK_SHIFT) MAX_UBERBLOCK_SHIFT)
@ -714,23 +525,6 @@ zhack_repair_read_label(const int fd, vdev_label_t *vl,
return (0); return (0);
} }
static int
zhack_repair_get_byteswap(const zio_eck_t *vdev_eck, const int l, int *byteswap)
{
if (vdev_eck->zec_magic == ZEC_MAGIC) {
*byteswap = B_FALSE;
} else if (vdev_eck->zec_magic == BSWAP_64((uint64_t)ZEC_MAGIC)) {
*byteswap = B_TRUE;
} else {
(void) fprintf(stderr, "error: label %d: "
"Expected the nvlist checksum magic number but instead got "
"0x%" PRIx64 "\n",
l, vdev_eck->zec_magic);
return (1);
}
return (0);
}
static void static void
zhack_repair_calc_cksum(const int byteswap, void *data, const uint64_t offset, zhack_repair_calc_cksum(const int byteswap, void *data, const uint64_t offset,
const uint64_t abdsize, zio_eck_t *eck, zio_cksum_t *cksum) const uint64_t abdsize, zio_eck_t *eck, zio_cksum_t *cksum)
@ -757,10 +551,33 @@ zhack_repair_calc_cksum(const int byteswap, void *data, const uint64_t offset,
} }
static int static int
zhack_repair_get_ashift(nvlist_t *cfg, const int l, uint64_t *ashift) zhack_repair_check_label(uberblock_t *ub, const int l, const char **cfg_keys,
const size_t cfg_keys_len, nvlist_t *cfg, nvlist_t *vdev_tree_cfg,
uint64_t *ashift)
{ {
int err; int err;
nvlist_t *vdev_tree_cfg;
if (ub->ub_txg != 0) {
(void) fprintf(stderr,
"error: label %d: UB TXG of 0 expected, but got %"
PRIu64 "\n",
l, ub->ub_txg);
(void) fprintf(stderr, "It would appear the device was not "
"properly removed.\n");
return (1);
}
for (int i = 0; i < cfg_keys_len; i++) {
uint64_t val;
err = nvlist_lookup_uint64(cfg, cfg_keys[i], &val);
if (err) {
(void) fprintf(stderr,
"error: label %d, %d: "
"cannot find nvlist key %s\n",
l, i, cfg_keys[i]);
return (err);
}
}
err = nvlist_lookup_nvlist(cfg, err = nvlist_lookup_nvlist(cfg,
ZPOOL_CONFIG_VDEV_TREE, &vdev_tree_cfg); ZPOOL_CONFIG_VDEV_TREE, &vdev_tree_cfg);
@ -784,7 +601,7 @@ zhack_repair_get_ashift(nvlist_t *cfg, const int l, uint64_t *ashift)
(void) fprintf(stderr, (void) fprintf(stderr,
"error: label %d: nvlist key %s is zero\n", "error: label %d: nvlist key %s is zero\n",
l, ZPOOL_CONFIG_ASHIFT); l, ZPOOL_CONFIG_ASHIFT);
return (1); return (err);
} }
return (0); return (0);
@ -799,35 +616,30 @@ zhack_repair_undetach(uberblock_t *ub, nvlist_t *cfg, const int l)
*/ */
if (BP_GET_LOGICAL_BIRTH(&ub->ub_rootbp) != 0) { if (BP_GET_LOGICAL_BIRTH(&ub->ub_rootbp) != 0) {
const uint64_t txg = BP_GET_LOGICAL_BIRTH(&ub->ub_rootbp); const uint64_t txg = BP_GET_LOGICAL_BIRTH(&ub->ub_rootbp);
int err;
ub->ub_txg = txg; ub->ub_txg = txg;
err = nvlist_remove_all(cfg, ZPOOL_CONFIG_CREATE_TXG); if (nvlist_remove_all(cfg, ZPOOL_CONFIG_CREATE_TXG) != 0) {
if (err) {
(void) fprintf(stderr, (void) fprintf(stderr,
"error: label %d: " "error: label %d: "
"Failed to remove pool creation TXG\n", "Failed to remove pool creation TXG\n",
l); l);
return (err); return (1);
} }
err = nvlist_remove_all(cfg, ZPOOL_CONFIG_POOL_TXG); if (nvlist_remove_all(cfg, ZPOOL_CONFIG_POOL_TXG) != 0) {
if (err) {
(void) fprintf(stderr, (void) fprintf(stderr,
"error: label %d: Failed to remove pool TXG to " "error: label %d: Failed to remove pool TXG to "
"be replaced.\n", "be replaced.\n",
l); l);
return (err); return (1);
} }
err = nvlist_add_uint64(cfg, ZPOOL_CONFIG_POOL_TXG, txg); if (nvlist_add_uint64(cfg, ZPOOL_CONFIG_POOL_TXG, txg) != 0) {
if (err) {
(void) fprintf(stderr, (void) fprintf(stderr,
"error: label %d: " "error: label %d: "
"Failed to add pool TXG of %" PRIu64 "\n", "Failed to add pool TXG of %" PRIu64 "\n",
l, txg); l, txg);
return (err); return (1);
} }
} }
@ -921,7 +733,6 @@ zhack_repair_test_cksum(const int byteswap, void *vdev_data,
BSWAP_64(ZEC_MAGIC) : ZEC_MAGIC; BSWAP_64(ZEC_MAGIC) : ZEC_MAGIC;
const uint64_t actual_magic = vdev_eck->zec_magic; const uint64_t actual_magic = vdev_eck->zec_magic;
int err = 0; int err = 0;
if (actual_magic != expected_magic) { if (actual_magic != expected_magic) {
(void) fprintf(stderr, "error: label %d: " (void) fprintf(stderr, "error: label %d: "
"Expected " "Expected "
@ -943,36 +754,6 @@ zhack_repair_test_cksum(const int byteswap, void *vdev_data,
return (err); return (err);
} }
static int
zhack_repair_unpack_cfg(vdev_label_t *vl, const int l, nvlist_t **cfg)
{
const char *cfg_keys[] = { ZPOOL_CONFIG_VERSION,
ZPOOL_CONFIG_POOL_STATE, ZPOOL_CONFIG_GUID };
int err;
err = nvlist_unpack(vl->vl_vdev_phys.vp_nvlist,
VDEV_PHYS_SIZE - sizeof (zio_eck_t), cfg, 0);
if (err) {
(void) fprintf(stderr,
"error: cannot unpack nvlist label %d\n", l);
return (err);
}
for (int i = 0; i < ARRAY_SIZE(cfg_keys); i++) {
uint64_t val;
err = nvlist_lookup_uint64(*cfg, cfg_keys[i], &val);
if (err) {
(void) fprintf(stderr,
"error: label %d, %d: "
"cannot find nvlist key %s\n",
l, i, cfg_keys[i]);
return (err);
}
}
return (0);
}
static void static void
zhack_repair_one_label(const zhack_repair_op_t op, const int fd, zhack_repair_one_label(const zhack_repair_op_t op, const int fd,
vdev_label_t *vl, const uint64_t label_offset, const int l, vdev_label_t *vl, const uint64_t label_offset, const int l,
@ -986,7 +767,10 @@ zhack_repair_one_label(const zhack_repair_op_t op, const int fd,
(zio_eck_t *)((char *)(vdev_data) + VDEV_PHYS_SIZE) - 1; (zio_eck_t *)((char *)(vdev_data) + VDEV_PHYS_SIZE) - 1;
const uint64_t vdev_phys_offset = const uint64_t vdev_phys_offset =
label_offset + offsetof(vdev_label_t, vl_vdev_phys); label_offset + offsetof(vdev_label_t, vl_vdev_phys);
const char *cfg_keys[] = { ZPOOL_CONFIG_VERSION,
ZPOOL_CONFIG_POOL_STATE, ZPOOL_CONFIG_GUID };
nvlist_t *cfg; nvlist_t *cfg;
nvlist_t *vdev_tree_cfg = NULL;
uint64_t ashift; uint64_t ashift;
int byteswap; int byteswap;
@ -994,9 +778,18 @@ zhack_repair_one_label(const zhack_repair_op_t op, const int fd,
if (err) if (err)
return; return;
err = zhack_repair_get_byteswap(vdev_eck, l, &byteswap); if (vdev_eck->zec_magic == 0) {
if (err) (void) fprintf(stderr, "error: label %d: "
"Expected the nvlist checksum magic number to not be zero"
"\n",
l);
(void) fprintf(stderr, "There should already be a checksum "
"for the label.\n");
return; return;
}
byteswap =
(vdev_eck->zec_magic == BSWAP_64((uint64_t)ZEC_MAGIC));
if (byteswap) { if (byteswap) {
byteswap_uint64_array(&vdev_eck->zec_cksum, byteswap_uint64_array(&vdev_eck->zec_cksum,
@ -1012,7 +805,16 @@ zhack_repair_one_label(const zhack_repair_op_t op, const int fd,
return; return;
} }
err = zhack_repair_unpack_cfg(vl, l, &cfg); err = nvlist_unpack(vl->vl_vdev_phys.vp_nvlist,
VDEV_PHYS_SIZE - sizeof (zio_eck_t), &cfg, 0);
if (err) {
(void) fprintf(stderr,
"error: cannot unpack nvlist label %d\n", l);
return;
}
err = zhack_repair_check_label(ub,
l, cfg_keys, ARRAY_SIZE(cfg_keys), cfg, vdev_tree_cfg, &ashift);
if (err) if (err)
return; return;
@ -1020,19 +822,6 @@ zhack_repair_one_label(const zhack_repair_op_t op, const int fd,
char *buf; char *buf;
size_t buflen; size_t buflen;
if (ub->ub_txg != 0) {
(void) fprintf(stderr,
"error: label %d: UB TXG of 0 expected, but got %"
PRIu64 "\n", l, ub->ub_txg);
(void) fprintf(stderr, "It would appear the device was "
"not properly detached.\n");
return;
}
err = zhack_repair_get_ashift(cfg, l, &ashift);
if (err)
return;
err = zhack_repair_undetach(ub, cfg, l); err = zhack_repair_undetach(ub, cfg, l);
if (err) if (err)
return; return;
@ -1192,7 +981,7 @@ main(int argc, char **argv)
dprintf_setup(&argc, argv); dprintf_setup(&argc, argv);
zfs_prop_init(); zfs_prop_init();
while ((c = getopt(argc, argv, "+c:d:o:")) != -1) { while ((c = getopt(argc, argv, "+c:d:")) != -1) {
switch (c) { switch (c) {
case 'c': case 'c':
g_importargs.cachefile = optarg; g_importargs.cachefile = optarg;
@ -1201,10 +990,6 @@ main(int argc, char **argv)
assert(g_importargs.paths < MAX_NUM_PATHS); assert(g_importargs.paths < MAX_NUM_PATHS);
g_importargs.path[g_importargs.paths++] = optarg; g_importargs.path[g_importargs.paths++] = optarg;
break; break;
case 'o':
if (handle_tunable_option(optarg, B_FALSE) != 0)
exit(1);
break;
default: default:
usage(); usage();
break; break;
@ -1226,8 +1011,6 @@ main(int argc, char **argv)
rv = zhack_do_feature(argc, argv); rv = zhack_do_feature(argc, argv);
} else if (strcmp(subcommand, "label") == 0) { } else if (strcmp(subcommand, "label") == 0) {
return (zhack_do_label(argc, argv)); return (zhack_do_label(argc, argv));
} else if (strcmp(subcommand, "metaslab") == 0) {
rv = zhack_do_metaslab(argc, argv);
} else { } else {
(void) fprintf(stderr, "error: unknown subcommand: %s\n", (void) fprintf(stderr, "error: unknown subcommand: %s\n",
subcommand); subcommand);

View File

@ -47,7 +47,6 @@ cols = {
"cec": [5, 1000, "zil_commit_error_count"], "cec": [5, 1000, "zil_commit_error_count"],
"csc": [5, 1000, "zil_commit_stall_count"], "csc": [5, 1000, "zil_commit_stall_count"],
"cSc": [5, 1000, "zil_commit_suspend_count"], "cSc": [5, 1000, "zil_commit_suspend_count"],
"cCc": [5, 1000, "zil_commit_crash_count"],
"ic": [5, 1000, "zil_itx_count"], "ic": [5, 1000, "zil_itx_count"],
"iic": [5, 1000, "zil_itx_indirect_count"], "iic": [5, 1000, "zil_itx_indirect_count"],
"iib": [5, 1024, "zil_itx_indirect_bytes"], "iib": [5, 1024, "zil_itx_indirect_bytes"],

View File

@ -435,30 +435,26 @@ print_data_handler(int id, const char *pool, zinject_record_t *record,
if (*count == 0) { if (*count == 0) {
(void) printf("%3s %-15s %-6s %-6s %-8s %3s %-4s " (void) printf("%3s %-15s %-6s %-6s %-8s %3s %-4s "
"%-15s %-6s %-15s\n", "ID", "POOL", "OBJSET", "OBJECT", "%-15s\n", "ID", "POOL", "OBJSET", "OBJECT", "TYPE",
"TYPE", "LVL", "DVAs", "RANGE", "MATCH", "INJECT"); "LVL", "DVAs", "RANGE");
(void) printf("--- --------------- ------ " (void) printf("--- --------------- ------ "
"------ -------- --- ---- --------------- " "------ -------- --- ---- ---------------\n");
"------ ------\n");
} }
*count += 1; *count += 1;
char rangebuf[32]; (void) printf("%3d %-15s %-6llu %-6llu %-8s %-3d 0x%02x ",
if (record->zi_start == 0 && record->zi_end == -1ULL) id, pool, (u_longlong_t)record->zi_objset,
snprintf(rangebuf, sizeof (rangebuf), "all");
else
snprintf(rangebuf, sizeof (rangebuf), "[%llu, %llu]",
(u_longlong_t)record->zi_start,
(u_longlong_t)record->zi_end);
(void) printf("%3d %-15s %-6llu %-6llu %-8s %-3d 0x%02x %-15s "
"%6" PRIu64 " %6" PRIu64 "\n", id, pool,
(u_longlong_t)record->zi_objset,
(u_longlong_t)record->zi_object, type_to_name(record->zi_type), (u_longlong_t)record->zi_object, type_to_name(record->zi_type),
record->zi_level, record->zi_dvas, rangebuf, record->zi_level, record->zi_dvas);
record->zi_match_count, record->zi_inject_count);
if (record->zi_start == 0 &&
record->zi_end == -1ULL)
(void) printf("all\n");
else
(void) printf("[%llu, %llu]\n", (u_longlong_t)record->zi_start,
(u_longlong_t)record->zi_end);
return (0); return (0);
} }
@ -476,14 +472,11 @@ print_device_handler(int id, const char *pool, zinject_record_t *record,
return (0); return (0);
if (*count == 0) { if (*count == 0) {
(void) printf("%3s %-15s %-16s %-5s %-10s %-9s " (void) printf("%3s %-15s %-16s %-5s %-10s %-9s\n",
"%-6s %-6s\n", "ID", "POOL", "GUID", "TYPE", "ERROR", "FREQ");
"ID", "POOL", "GUID", "TYPE", "ERROR", "FREQ",
"MATCH", "INJECT");
(void) printf( (void) printf(
"--- --------------- ---------------- " "--- --------------- ---------------- "
"----- ---------- --------- " "----- ---------- ---------\n");
"------ ------\n");
} }
*count += 1; *count += 1;
@ -491,11 +484,9 @@ print_device_handler(int id, const char *pool, zinject_record_t *record,
double freq = record->zi_freq == 0 ? 100.0f : double freq = record->zi_freq == 0 ? 100.0f :
(((double)record->zi_freq) / ZI_PERCENTAGE_MAX) * 100.0f; (((double)record->zi_freq) / ZI_PERCENTAGE_MAX) * 100.0f;
(void) printf("%3d %-15s %llx %-5s %-10s %8.4f%% " (void) printf("%3d %-15s %llx %-5s %-10s %8.4f%%\n", id, pool,
"%6" PRIu64 " %6" PRIu64 "\n", id, pool, (u_longlong_t)record->zi_guid, iotype_to_str(record->zi_iotype),
(u_longlong_t)record->zi_guid, err_to_str(record->zi_error), freq);
iotype_to_str(record->zi_iotype), err_to_str(record->zi_error),
freq, record->zi_match_count, record->zi_inject_count);
return (0); return (0);
} }
@ -513,26 +504,18 @@ print_delay_handler(int id, const char *pool, zinject_record_t *record,
return (0); return (0);
if (*count == 0) { if (*count == 0) {
(void) printf("%3s %-15s %-16s %-10s %-5s %-9s " (void) printf("%3s %-15s %-15s %-15s %s\n",
"%-6s %-6s\n", "ID", "POOL", "DELAY (ms)", "LANES", "GUID");
"ID", "POOL", "GUID", "DELAY (ms)", "LANES", "FREQ", (void) printf("--- --------------- --------------- "
"MATCH", "INJECT"); "--------------- ----------------\n");
(void) printf("--- --------------- ---------------- "
"---------- ----- --------- "
"------ ------\n");
} }
*count += 1; *count += 1;
double freq = record->zi_freq == 0 ? 100.0f : (void) printf("%3d %-15s %-15llu %-15llu %llx\n", id, pool,
(((double)record->zi_freq) / ZI_PERCENTAGE_MAX) * 100.0f;
(void) printf("%3d %-15s %llx %10llu %5llu %8.4f%% "
"%6" PRIu64 " %6" PRIu64 "\n", id, pool,
(u_longlong_t)record->zi_guid,
(u_longlong_t)NSEC2MSEC(record->zi_timer), (u_longlong_t)NSEC2MSEC(record->zi_timer),
(u_longlong_t)record->zi_nlanes, (u_longlong_t)record->zi_nlanes,
freq, record->zi_match_count, record->zi_inject_count); (u_longlong_t)record->zi_guid);
return (0); return (0);
} }

View File

@ -148,7 +148,6 @@ dist_zpoolcompat_DATA = \
%D%/compatibility.d/openzfs-2.1-linux \ %D%/compatibility.d/openzfs-2.1-linux \
%D%/compatibility.d/openzfs-2.2 \ %D%/compatibility.d/openzfs-2.2 \
%D%/compatibility.d/openzfs-2.3 \ %D%/compatibility.d/openzfs-2.3 \
%D%/compatibility.d/openzfs-2.4 \
%D%/compatibility.d/openzfsonosx-1.7.0 \ %D%/compatibility.d/openzfsonosx-1.7.0 \
%D%/compatibility.d/openzfsonosx-1.8.1 \ %D%/compatibility.d/openzfsonosx-1.8.1 \
%D%/compatibility.d/openzfsonosx-1.9.3 \ %D%/compatibility.d/openzfsonosx-1.9.3 \
@ -188,9 +187,7 @@ zpoolcompatlinks = \
"openzfs-2.2 openzfs-2.2-linux" \ "openzfs-2.2 openzfs-2.2-linux" \
"openzfs-2.2 openzfs-2.2-freebsd" \ "openzfs-2.2 openzfs-2.2-freebsd" \
"openzfs-2.3 openzfs-2.3-linux" \ "openzfs-2.3 openzfs-2.3-linux" \
"openzfs-2.3 openzfs-2.3-freebsd" \ "openzfs-2.3 openzfs-2.3-freebsd"
"openzfs-2.4 openzfs-2.4-linux" \
"openzfs-2.4 openzfs-2.4-freebsd"
zpoolconfdir = $(sysconfdir)/zfs/zpool.d zpoolconfdir = $(sysconfdir)/zfs/zpool.d
INSTALL_DATA_HOOKS += zpool-install-data-hook INSTALL_DATA_HOOKS += zpool-install-data-hook

View File

@ -1,48 +0,0 @@
# Features supported by OpenZFS 2.4 on Linux and FreeBSD
allocation_classes
async_destroy
blake3
block_cloning
block_cloning_endian
bookmark_v2
bookmark_written
bookmarks
device_rebuild
device_removal
draid
dynamic_gang_header
edonr
embedded_data
empty_bpobj
enabled_txg
encryption
extensible_dataset
fast_dedup
filesystem_limits
head_errlog
hole_birth
large_blocks
large_dnode
large_microzap
livelist
log_spacemap
longname
lz4_compress
multi_vdev_crash_dump
obsolete_counts
physical_rewrite
project_quota
raidz_expansion
redacted_datasets
redaction_bookmarks
redaction_list_spill
resilver_defer
sha512
skein
spacemap_histogram
spacemap_v2
userobj_accounting
vdev_zaps_v2
zilsaxattr
zpool_checkpoint
zstd_compress

View File

@ -26,7 +26,6 @@
/* /*
* Copyright 2016 Igor Kozhukhov <ikozhukhov@gmail.com>. * Copyright 2016 Igor Kozhukhov <ikozhukhov@gmail.com>.
* Copyright (c) 2025, Klara, Inc.
*/ */
#include <libintl.h> #include <libintl.h>
@ -53,7 +52,7 @@
typedef struct zpool_node { typedef struct zpool_node {
zpool_handle_t *zn_handle; zpool_handle_t *zn_handle;
uu_avl_node_t zn_avlnode; uu_avl_node_t zn_avlnode;
hrtime_t zn_last_refresh; int zn_mark;
} zpool_node_t; } zpool_node_t;
struct zpool_list { struct zpool_list {
@ -63,7 +62,6 @@ struct zpool_list {
uu_avl_pool_t *zl_pool; uu_avl_pool_t *zl_pool;
zprop_list_t **zl_proplist; zprop_list_t **zl_proplist;
zfs_type_t zl_type; zfs_type_t zl_type;
hrtime_t zl_last_refresh;
}; };
static int static int
@ -83,47 +81,32 @@ zpool_compare(const void *larg, const void *rarg, void *unused)
* of known pools. * of known pools.
*/ */
static int static int
add_pool(zpool_handle_t *zhp, zpool_list_t *zlp) add_pool(zpool_handle_t *zhp, void *data)
{ {
zpool_node_t *node, *new = safe_malloc(sizeof (zpool_node_t)); zpool_list_t *zlp = data;
zpool_node_t *node = safe_malloc(sizeof (zpool_node_t));
uu_avl_index_t idx; uu_avl_index_t idx;
new->zn_handle = zhp; node->zn_handle = zhp;
uu_avl_node_init(new, &new->zn_avlnode, zlp->zl_pool); uu_avl_node_init(node, &node->zn_avlnode, zlp->zl_pool);
if (uu_avl_find(zlp->zl_avl, node, NULL, &idx) == NULL) {
node = uu_avl_find(zlp->zl_avl, new, NULL, &idx);
if (node == NULL) {
if (zlp->zl_proplist && if (zlp->zl_proplist &&
zpool_expand_proplist(zhp, zlp->zl_proplist, zpool_expand_proplist(zhp, zlp->zl_proplist,
zlp->zl_type, zlp->zl_literal) != 0) { zlp->zl_type, zlp->zl_literal) != 0) {
zpool_close(zhp); zpool_close(zhp);
free(new); free(node);
return (-1); return (-1);
} }
new->zn_last_refresh = zlp->zl_last_refresh; uu_avl_insert(zlp->zl_avl, node, idx);
uu_avl_insert(zlp->zl_avl, new, idx);
} else { } else {
node->zn_last_refresh = zlp->zl_last_refresh;
zpool_close(zhp); zpool_close(zhp);
free(new); free(node);
return (-1); return (-1);
} }
return (0); return (0);
} }
/*
* add_pool(), but always returns 0. This allows zpool_iter() to continue
* even if a pool exists in the tree, or we fail to get the properties for
* a new one.
*/
static int
add_pool_cb(zpool_handle_t *zhp, void *data)
{
(void) add_pool(zhp, data);
return (0);
}
/* /*
* Create a list of pools based on the given arguments. If we're given no * Create a list of pools based on the given arguments. If we're given no
* arguments, then iterate over all pools in the system and add them to the AVL * arguments, then iterate over all pools in the system and add them to the AVL
@ -152,10 +135,9 @@ pool_list_get(int argc, char **argv, zprop_list_t **proplist, zfs_type_t type,
zlp->zl_type = type; zlp->zl_type = type;
zlp->zl_literal = literal; zlp->zl_literal = literal;
zlp->zl_last_refresh = gethrtime();
if (argc == 0) { if (argc == 0) {
(void) zpool_iter(g_zfs, add_pool_cb, zlp); (void) zpool_iter(g_zfs, add_pool, zlp);
zlp->zl_findall = B_TRUE; zlp->zl_findall = B_TRUE;
} else { } else {
int i; int i;
@ -177,69 +159,15 @@ pool_list_get(int argc, char **argv, zprop_list_t **proplist, zfs_type_t type,
} }
/* /*
* Refresh the state of all pools on the list. Additionally, if no options were * Search for any new pools, adding them to the list. We only add pools when no
* given on the command line, add any new pools and remove any that are no * options were given on the command line. Otherwise, we keep the list fixed as
* longer available. * those that were explicitly specified.
*/ */
int void
pool_list_refresh(zpool_list_t *zlp) pool_list_update(zpool_list_t *zlp)
{ {
zlp->zl_last_refresh = gethrtime(); if (zlp->zl_findall)
(void) zpool_iter(g_zfs, add_pool, zlp);
if (!zlp->zl_findall) {
/*
* This list is a fixed list of pools, so we must not add
* or remove any. Just walk over them and refresh their
* state.
*/
int navail = 0;
for (zpool_node_t *node = uu_avl_first(zlp->zl_avl);
node != NULL; node = uu_avl_next(zlp->zl_avl, node)) {
boolean_t missing;
zpool_refresh_stats(node->zn_handle, &missing);
navail += !missing;
node->zn_last_refresh = zlp->zl_last_refresh;
}
return (navail);
}
/*
* Search for any new pools and add them to the list. zpool_iter()
* will call zpool_refresh_stats() as part of its work, so this has
* the side effect of updating all active handles.
*/
(void) zpool_iter(g_zfs, add_pool_cb, zlp);
/*
* Walk the list for any that weren't refreshed, and update and remove
* them. It's not enough to just skip available ones, as zpool_iter()
* won't update them, so they'll still appear active in our list.
*/
zpool_node_t *node, *next;
for (node = uu_avl_first(zlp->zl_avl); node != NULL; node = next) {
next = uu_avl_next(zlp->zl_avl, node);
/*
* Skip any that were refreshed and are online; they're already
* handled.
*/
if (node->zn_last_refresh == zlp->zl_last_refresh &&
zpool_get_state(node->zn_handle) != POOL_STATE_UNAVAIL)
continue;
/* Do the refresh ourselves, just in case. */
boolean_t missing;
zpool_refresh_stats(node->zn_handle, &missing);
if (missing) {
uu_avl_remove(zlp->zl_avl, node);
zpool_close(node->zn_handle);
free(node);
} else {
node->zn_last_refresh = zlp->zl_last_refresh;
}
}
return (uu_avl_numnodes(zlp->zl_avl));
} }
/* /*
@ -262,6 +190,23 @@ pool_list_iter(zpool_list_t *zlp, int unavail, zpool_iter_f func,
return (ret); return (ret);
} }
/*
* Remove the given pool from the list. When running iostat, we want to remove
* those pools that no longer exist.
*/
void
pool_list_remove(zpool_list_t *zlp, zpool_handle_t *zhp)
{
zpool_node_t search, *node;
search.zn_handle = zhp;
if ((node = uu_avl_find(zlp->zl_avl, &search, NULL, NULL)) != NULL) {
uu_avl_remove(zlp->zl_avl, node);
zpool_close(node->zn_handle);
free(node);
}
}
/* /*
* Free all the handles associated with this list. * Free all the handles associated with this list.
*/ */
@ -434,8 +379,8 @@ process_unique_cmd_columns(vdev_cmd_data_list_t *vcdl)
static int static int
vdev_process_cmd_output(vdev_cmd_data_t *data, char *line) vdev_process_cmd_output(vdev_cmd_data_t *data, char *line)
{ {
char *col; char *col = NULL;
char *val; char *val = line;
char *equals; char *equals;
char **tmp; char **tmp;
@ -452,7 +397,6 @@ vdev_process_cmd_output(vdev_cmd_data_t *data, char *line)
col = line; col = line;
val = equals + 1; val = equals + 1;
} else { } else {
col = NULL;
val = line; val = line;
} }

View File

@ -33,8 +33,8 @@
* Copyright (c) 2017, Intel Corporation. * Copyright (c) 2017, Intel Corporation.
* Copyright (c) 2019, loli10K <ezomori.nozomu@gmail.com> * Copyright (c) 2019, loli10K <ezomori.nozomu@gmail.com>
* Copyright (c) 2021, Colm Buckley <colm@tuatha.org> * Copyright (c) 2021, Colm Buckley <colm@tuatha.org>
* Copyright (c) 2021, 2023, 2025, Klara, Inc. * Copyright (c) 2021, 2023, Klara Inc.
* Copyright (c) 2021, 2025 Hewlett Packard Enterprise Development LP. * Copyright [2021] Hewlett Packard Enterprise Development LP
*/ */
#include <assert.h> #include <assert.h>
@ -456,7 +456,7 @@ get_usage(zpool_help_t idx)
"<pool> <vdev> ...\n")); "<pool> <vdev> ...\n"));
case HELP_ATTACH: case HELP_ATTACH:
return (gettext("\tattach [-fsw] [-o property=value] " return (gettext("\tattach [-fsw] [-o property=value] "
"<pool> <vdev> <new-device>\n")); "<pool> <device> <new-device>\n"));
case HELP_CLEAR: case HELP_CLEAR:
return (gettext("\tclear [[--power]|[-nF]] <pool> [device]\n")); return (gettext("\tclear [[--power]|[-nF]] <pool> [device]\n"));
case HELP_CREATE: case HELP_CREATE:
@ -510,16 +510,16 @@ get_usage(zpool_help_t idx)
case HELP_REOPEN: case HELP_REOPEN:
return (gettext("\treopen [-n] <pool>\n")); return (gettext("\treopen [-n] <pool>\n"));
case HELP_INITIALIZE: case HELP_INITIALIZE:
return (gettext("\tinitialize [-c | -s | -u] [-w] <-a | <pool> " return (gettext("\tinitialize [-c | -s | -u] [-w] <pool> "
"[<device> ...]>\n")); "[<device> ...]\n"));
case HELP_SCRUB: case HELP_SCRUB:
return (gettext("\tscrub [-e | -s | -p | -C | -E | -S] [-w] " return (gettext("\tscrub [-e | -s | -p | -C] [-w] "
"<-a | <pool> [<pool> ...]>\n")); "<pool> ...\n"));
case HELP_RESILVER: case HELP_RESILVER:
return (gettext("\tresilver <pool> ...\n")); return (gettext("\tresilver <pool> ...\n"));
case HELP_TRIM: case HELP_TRIM:
return (gettext("\ttrim [-dw] [-r <rate>] [-c | -s] " return (gettext("\ttrim [-dw] [-r <rate>] [-c | -s] <pool> "
"<-a | <pool> [<device> ...]>\n")); "[<device> ...]\n"));
case HELP_STATUS: case HELP_STATUS:
return (gettext("\tstatus [-DdegiLPpstvx] " return (gettext("\tstatus [-DdegiLPpstvx] "
"[-c script1[,script2,...]] ...\n" "[-c script1[,script2,...]] ...\n"
@ -560,6 +560,33 @@ get_usage(zpool_help_t idx)
} }
} }
static void
zpool_collect_leaves(zpool_handle_t *zhp, nvlist_t *nvroot, nvlist_t *res)
{
uint_t children = 0;
nvlist_t **child;
uint_t i;
(void) nvlist_lookup_nvlist_array(nvroot, ZPOOL_CONFIG_CHILDREN,
&child, &children);
if (children == 0) {
char *path = zpool_vdev_name(g_zfs, zhp, nvroot,
VDEV_NAME_PATH);
if (strcmp(path, VDEV_TYPE_INDIRECT) != 0 &&
strcmp(path, VDEV_TYPE_HOLE) != 0)
fnvlist_add_boolean(res, path);
free(path);
return;
}
for (i = 0; i < children; i++) {
zpool_collect_leaves(zhp, child[i], res);
}
}
/* /*
* Callback routine that will print out a pool property value. * Callback routine that will print out a pool property value.
*/ */
@ -752,11 +779,10 @@ usage(boolean_t requested)
} }
/* /*
* zpool initialize [-c | -s | -u] [-w] <-a | pool> [<vdev> ...] * zpool initialize [-c | -s | -u] [-w] <pool> [<vdev> ...]
* Initialize all unused blocks in the specified vdevs, or all vdevs in the pool * Initialize all unused blocks in the specified vdevs, or all vdevs in the pool
* if none specified. * if none specified.
* *
* -a Use all pools.
* -c Cancel. Ends active initializing. * -c Cancel. Ends active initializing.
* -s Suspend. Initializing can then be restarted with no flags. * -s Suspend. Initializing can then be restarted with no flags.
* -u Uninitialize. Clears initialization state. * -u Uninitialize. Clears initialization state.
@ -768,26 +794,22 @@ zpool_do_initialize(int argc, char **argv)
int c; int c;
char *poolname; char *poolname;
zpool_handle_t *zhp; zpool_handle_t *zhp;
nvlist_t *vdevs;
int err = 0; int err = 0;
boolean_t wait = B_FALSE; boolean_t wait = B_FALSE;
boolean_t initialize_all = B_FALSE;
struct option long_options[] = { struct option long_options[] = {
{"cancel", no_argument, NULL, 'c'}, {"cancel", no_argument, NULL, 'c'},
{"suspend", no_argument, NULL, 's'}, {"suspend", no_argument, NULL, 's'},
{"uninit", no_argument, NULL, 'u'}, {"uninit", no_argument, NULL, 'u'},
{"wait", no_argument, NULL, 'w'}, {"wait", no_argument, NULL, 'w'},
{"all", no_argument, NULL, 'a'},
{0, 0, 0, 0} {0, 0, 0, 0}
}; };
pool_initialize_func_t cmd_type = POOL_INITIALIZE_START; pool_initialize_func_t cmd_type = POOL_INITIALIZE_START;
while ((c = getopt_long(argc, argv, "acsuw", long_options, while ((c = getopt_long(argc, argv, "csuw", long_options,
NULL)) != -1) { NULL)) != -1) {
switch (c) { switch (c) {
case 'a':
initialize_all = B_TRUE;
break;
case 'c': case 'c':
if (cmd_type != POOL_INITIALIZE_START && if (cmd_type != POOL_INITIALIZE_START &&
cmd_type != POOL_INITIALIZE_CANCEL) { cmd_type != POOL_INITIALIZE_CANCEL) {
@ -834,18 +856,7 @@ zpool_do_initialize(int argc, char **argv)
argc -= optind; argc -= optind;
argv += optind; argv += optind;
initialize_cbdata_t cbdata = { if (argc < 1) {
.wait = wait,
.cmd_type = cmd_type
};
if (initialize_all && argc > 0) {
(void) fprintf(stderr, gettext("-a cannot be combined with "
"individual pools or vdevs\n"));
usage(B_FALSE);
}
if (argc < 1 && !initialize_all) {
(void) fprintf(stderr, gettext("missing pool name argument\n")); (void) fprintf(stderr, gettext("missing pool name argument\n"));
usage(B_FALSE); usage(B_FALSE);
return (-1); return (-1);
@ -857,35 +868,30 @@ zpool_do_initialize(int argc, char **argv)
usage(B_FALSE); usage(B_FALSE);
} }
if (argc == 0 && initialize_all) {
/* Initilize each pool recursively */
err = for_each_pool(argc, argv, B_TRUE, NULL, ZFS_TYPE_POOL,
B_FALSE, zpool_initialize_one, &cbdata);
return (err);
} else if (argc == 1) {
/* no individual leaf vdevs specified, initialize the pool */
poolname = argv[0]; poolname = argv[0];
zhp = zpool_open(g_zfs, poolname); zhp = zpool_open(g_zfs, poolname);
if (zhp == NULL) if (zhp == NULL)
return (-1); return (-1);
err = zpool_initialize_one(zhp, &cbdata);
vdevs = fnvlist_alloc();
if (argc == 1) {
/* no individual leaf vdevs specified, so add them all */
nvlist_t *config = zpool_get_config(zhp, NULL);
nvlist_t *nvroot = fnvlist_lookup_nvlist(config,
ZPOOL_CONFIG_VDEV_TREE);
zpool_collect_leaves(zhp, nvroot, vdevs);
} else { } else {
/* individual leaf vdevs specified, initialize them */
poolname = argv[0];
zhp = zpool_open(g_zfs, poolname);
if (zhp == NULL)
return (-1);
nvlist_t *vdevs = fnvlist_alloc();
for (int i = 1; i < argc; i++) { for (int i = 1; i < argc; i++) {
fnvlist_add_boolean(vdevs, argv[i]); fnvlist_add_boolean(vdevs, argv[i]);
} }
}
if (wait) if (wait)
err = zpool_initialize_wait(zhp, cmd_type, vdevs); err = zpool_initialize_wait(zhp, cmd_type, vdevs);
else else
err = zpool_initialize(zhp, cmd_type, vdevs); err = zpool_initialize(zhp, cmd_type, vdevs);
fnvlist_free(vdevs);
}
fnvlist_free(vdevs);
zpool_close(zhp); zpool_close(zhp);
return (err); return (err);
@ -1782,7 +1788,7 @@ zpool_do_labelclear(int argc, char **argv)
{ {
char vdev[MAXPATHLEN]; char vdev[MAXPATHLEN];
char *name = NULL; char *name = NULL;
int c, fd, ret = 0; int c, fd = -1, ret = 0;
nvlist_t *config; nvlist_t *config;
pool_state_t state; pool_state_t state;
boolean_t inuse = B_FALSE; boolean_t inuse = B_FALSE;
@ -5761,6 +5767,24 @@ children:
return (ret); return (ret);
} }
static int
refresh_iostat(zpool_handle_t *zhp, void *data)
{
iostat_cbdata_t *cb = data;
boolean_t missing;
/*
* If the pool has disappeared, remove it from the list and continue.
*/
if (zpool_refresh_stats(zhp, &missing) != 0)
return (-1);
if (missing)
pool_list_remove(cb->cb_list, zhp);
return (0);
}
/* /*
* Callback to print out the iostats for the given pool. * Callback to print out the iostats for the given pool.
*/ */
@ -6133,6 +6157,7 @@ static void
get_interval_count_filter_guids(int *argc, char **argv, float *interval, get_interval_count_filter_guids(int *argc, char **argv, float *interval,
unsigned long *count, iostat_cbdata_t *cb) unsigned long *count, iostat_cbdata_t *cb)
{ {
char **tmpargv = argv;
int argc_for_interval = 0; int argc_for_interval = 0;
/* Is the last arg an interval value? Or a guid? */ /* Is the last arg an interval value? Or a guid? */
@ -6156,7 +6181,7 @@ get_interval_count_filter_guids(int *argc, char **argv, float *interval,
} }
/* Point to our list of possible intervals */ /* Point to our list of possible intervals */
char **tmpargv = &argv[*argc - argc_for_interval]; tmpargv = &argv[*argc - argc_for_interval];
*argc = *argc - argc_for_interval; *argc = *argc - argc_for_interval;
get_interval_count(&argc_for_interval, tmpargv, get_interval_count(&argc_for_interval, tmpargv,
@ -6341,16 +6366,18 @@ get_namewidth_iostat(zpool_handle_t *zhp, void *data)
* This command can be tricky because we want to be able to deal with pool * This command can be tricky because we want to be able to deal with pool
* creation/destruction as well as vdev configuration changes. The bulk of this * creation/destruction as well as vdev configuration changes. The bulk of this
* processing is handled by the pool_list_* routines in zpool_iter.c. We rely * processing is handled by the pool_list_* routines in zpool_iter.c. We rely
* on pool_list_refresh() to detect the addition and removal of pools. * on pool_list_update() to detect the addition of new pools. Configuration
* Configuration changes are all handled within libzfs. * changes are all handled within libzfs.
*/ */
int int
zpool_do_iostat(int argc, char **argv) zpool_do_iostat(int argc, char **argv)
{ {
int c; int c;
int ret; int ret;
int npools;
float interval = 0; float interval = 0;
unsigned long count = 0; unsigned long count = 0;
int winheight = 24;
zpool_list_t *list; zpool_list_t *list;
boolean_t verbose = B_FALSE; boolean_t verbose = B_FALSE;
boolean_t latency = B_FALSE, l_histo = B_FALSE, rq_histo = B_FALSE; boolean_t latency = B_FALSE, l_histo = B_FALSE, rq_histo = B_FALSE;
@ -6599,24 +6626,10 @@ zpool_do_iostat(int argc, char **argv)
return (1); return (1);
} }
int last_npools = 0;
for (;;) { for (;;) {
/* if ((npools = pool_list_count(list)) == 0)
* Refresh all pools in list, adding or removing pools as
* necessary.
*/
int npools = pool_list_refresh(list);
if (npools == 0) {
(void) fprintf(stderr, gettext("no pools available\n")); (void) fprintf(stderr, gettext("no pools available\n"));
} else { else {
/*
* If the list of pools has changed since last time
* around, reset the iteration count to force the
* header to be redisplayed.
*/
if (last_npools != npools)
cb.cb_iteration = 0;
/* /*
* If this is the first iteration and -y was supplied * If this is the first iteration and -y was supplied
* we skip any printing. * we skip any printing.
@ -6624,6 +6637,15 @@ zpool_do_iostat(int argc, char **argv)
boolean_t skip = (omit_since_boot && boolean_t skip = (omit_since_boot &&
cb.cb_iteration == 0); cb.cb_iteration == 0);
/*
* Refresh all statistics. This is done as an
* explicit step before calculating the maximum name
* width, so that any * configuration changes are
* properly accounted for.
*/
(void) pool_list_iter(list, B_FALSE, refresh_iostat,
&cb);
/* /*
* Iterate over all pools to determine the maximum width * Iterate over all pools to determine the maximum width
* for the pool / device name column across all pools. * for the pool / device name column across all pools.
@ -6651,7 +6673,7 @@ zpool_do_iostat(int argc, char **argv)
* even when terminal window has its height * even when terminal window has its height
* changed. * changed.
*/ */
int winheight = terminal_height(); winheight = terminal_height();
/* /*
* Are we connected to TTY? If not, headers_once * Are we connected to TTY? If not, headers_once
* should be true, to avoid breaking scripts. * should be true, to avoid breaking scripts.
@ -6714,8 +6736,6 @@ zpool_do_iostat(int argc, char **argv)
(void) fflush(stdout); (void) fflush(stdout);
(void) fsleep(interval); (void) fsleep(interval);
last_npools = npools;
} }
pool_list_free(list); pool_list_free(list);
@ -7632,7 +7652,7 @@ zpool_do_replace(int argc, char **argv)
} }
/* /*
* zpool attach [-fsw] [-o property=value] <pool> <vdev> <new_device> * zpool attach [-fsw] [-o property=value] <pool> <device>|<vdev> <new_device>
* *
* -f Force attach, even if <new_device> appears to be in use. * -f Force attach, even if <new_device> appears to be in use.
* -s Use sequential instead of healing reconstruction for resilver. * -s Use sequential instead of healing reconstruction for resilver.
@ -7640,9 +7660,9 @@ zpool_do_replace(int argc, char **argv)
* -w Wait for resilvering (mirror) or expansion (raidz) to complete * -w Wait for resilvering (mirror) or expansion (raidz) to complete
* before returning. * before returning.
* *
* Attach <new_device> to a <vdev>, where the vdev can be of type * Attach <new_device> to a <device> or <vdev>, where the vdev can be of type
* device, mirror or raidz. If <vdev> is not part of a mirror, then <vdev> will * mirror or raidz. If <device> is not part of a mirror, then <device> will
* be transformed into a mirror of <vdev> and <new_device>. When a mirror * be transformed into a mirror of <device> and <new_device>. When a mirror
* is involved, <new_device> will begin life with a DTL of [0, now], and will * is involved, <new_device> will begin life with a DTL of [0, now], and will
* immediately begin to resilver itself. For the raidz case, a expansion will * immediately begin to resilver itself. For the raidz case, a expansion will
* commence and reflow the raidz data across all the disks including the * commence and reflow the raidz data across all the disks including the
@ -8348,8 +8368,6 @@ zpool_do_reopen(int argc, char **argv)
typedef struct scrub_cbdata { typedef struct scrub_cbdata {
int cb_type; int cb_type;
pool_scrub_cmd_t cb_scrub_cmd; pool_scrub_cmd_t cb_scrub_cmd;
time_t cb_date_start;
time_t cb_date_end;
} scrub_cbdata_t; } scrub_cbdata_t;
static boolean_t static boolean_t
@ -8393,8 +8411,8 @@ scrub_callback(zpool_handle_t *zhp, void *data)
return (1); return (1);
} }
err = zpool_scan_range(zhp, cb->cb_type, cb->cb_scrub_cmd, err = zpool_scan(zhp, cb->cb_type, cb->cb_scrub_cmd);
cb->cb_date_start, cb->cb_date_end);
if (err == 0 && zpool_has_checkpoint(zhp) && if (err == 0 && zpool_has_checkpoint(zhp) &&
cb->cb_type == POOL_SCAN_SCRUB) { cb->cb_type == POOL_SCAN_SCRUB) {
(void) printf(gettext("warning: will not scrub state that " (void) printf(gettext("warning: will not scrub state that "
@ -8412,35 +8430,10 @@ wait_callback(zpool_handle_t *zhp, void *data)
return (zpool_wait(zhp, *act)); return (zpool_wait(zhp, *act));
} }
static time_t
date_string_to_sec(const char *timestr, boolean_t rounding)
{
struct tm tm = {0};
int adjustment = rounding ? 1 : 0;
/* Allow mktime to determine timezone. */
tm.tm_isdst = -1;
if (strptime(timestr, "%Y-%m-%d %H:%M", &tm) == NULL) {
if (strptime(timestr, "%Y-%m-%d", &tm) == NULL) {
fprintf(stderr, gettext("Failed to parse the date.\n"));
usage(B_FALSE);
}
adjustment *= 24 * 60 * 60;
} else {
adjustment *= 60;
}
return (mktime(&tm) + adjustment);
}
/* /*
* zpool scrub [-e | -s | -p | -C | -E | -S] [-w] [-a | <pool> ...] * zpool scrub [-e | -s | -p | -C] [-w] <pool> ...
* *
* -a Scrub all pools.
* -e Only scrub blocks in the error log. * -e Only scrub blocks in the error log.
* -E End date of scrub.
* -S Start date of scrub.
* -s Stop. Stops any in-progress scrub. * -s Stop. Stops any in-progress scrub.
* -p Pause. Pause in-progress scrub. * -p Pause. Pause in-progress scrub.
* -w Wait. Blocks until scrub has completed. * -w Wait. Blocks until scrub has completed.
@ -8456,36 +8449,21 @@ zpool_do_scrub(int argc, char **argv)
cb.cb_type = POOL_SCAN_SCRUB; cb.cb_type = POOL_SCAN_SCRUB;
cb.cb_scrub_cmd = POOL_SCRUB_NORMAL; cb.cb_scrub_cmd = POOL_SCRUB_NORMAL;
cb.cb_date_start = cb.cb_date_end = 0;
boolean_t is_error_scrub = B_FALSE; boolean_t is_error_scrub = B_FALSE;
boolean_t is_pause = B_FALSE; boolean_t is_pause = B_FALSE;
boolean_t is_stop = B_FALSE; boolean_t is_stop = B_FALSE;
boolean_t is_txg_continue = B_FALSE; boolean_t is_txg_continue = B_FALSE;
boolean_t scrub_all = B_FALSE;
/* check options */ /* check options */
while ((c = getopt(argc, argv, "aspweCE:S:")) != -1) { while ((c = getopt(argc, argv, "spweC")) != -1) {
switch (c) { switch (c) {
case 'a':
scrub_all = B_TRUE;
break;
case 'e': case 'e':
is_error_scrub = B_TRUE; is_error_scrub = B_TRUE;
break; break;
case 'E':
/*
* Round the date. It's better to scrub more data than
* less. This also makes the date inclusive.
*/
cb.cb_date_end = date_string_to_sec(optarg, B_TRUE);
break;
case 's': case 's':
is_stop = B_TRUE; is_stop = B_TRUE;
break; break;
case 'S':
cb.cb_date_start = date_string_to_sec(optarg, B_FALSE);
break;
case 'p': case 'p':
is_pause = B_TRUE; is_pause = B_TRUE;
break; break;
@ -8533,19 +8511,6 @@ zpool_do_scrub(int argc, char **argv)
} }
} }
if ((cb.cb_date_start != 0 || cb.cb_date_end != 0) &&
cb.cb_scrub_cmd != POOL_SCRUB_NORMAL) {
(void) fprintf(stderr, gettext("invalid option combination: "
"start/end date is available only with normal scrub\n"));
usage(B_FALSE);
}
if (cb.cb_date_start != 0 && cb.cb_date_end != 0 &&
cb.cb_date_start > cb.cb_date_end) {
(void) fprintf(stderr, gettext("invalid arguments: "
"end date has to be later than start date\n"));
usage(B_FALSE);
}
if (wait && (cb.cb_type == POOL_SCAN_NONE || if (wait && (cb.cb_type == POOL_SCAN_NONE ||
cb.cb_scrub_cmd == POOL_SCRUB_PAUSE)) { cb.cb_scrub_cmd == POOL_SCRUB_PAUSE)) {
(void) fprintf(stderr, gettext("invalid option combination: " (void) fprintf(stderr, gettext("invalid option combination: "
@ -8556,7 +8521,7 @@ zpool_do_scrub(int argc, char **argv)
argc -= optind; argc -= optind;
argv += optind; argv += optind;
if (argc < 1 && !scrub_all) { if (argc < 1) {
(void) fprintf(stderr, gettext("missing pool name argument\n")); (void) fprintf(stderr, gettext("missing pool name argument\n"));
usage(B_FALSE); usage(B_FALSE);
} }
@ -8586,7 +8551,6 @@ zpool_do_resilver(int argc, char **argv)
cb.cb_type = POOL_SCAN_RESILVER; cb.cb_type = POOL_SCAN_RESILVER;
cb.cb_scrub_cmd = POOL_SCRUB_NORMAL; cb.cb_scrub_cmd = POOL_SCRUB_NORMAL;
cb.cb_date_start = cb.cb_date_end = 0;
/* check options */ /* check options */
while ((c = getopt(argc, argv, "")) != -1) { while ((c = getopt(argc, argv, "")) != -1) {
@ -8611,9 +8575,8 @@ zpool_do_resilver(int argc, char **argv)
} }
/* /*
* zpool trim [-d] [-r <rate>] [-c | -s] <-a | pool> [<device> ...] * zpool trim [-d] [-r <rate>] [-c | -s] <pool> [<device> ...]
* *
* -a Trim all pools.
* -c Cancel. Ends any in-progress trim. * -c Cancel. Ends any in-progress trim.
* -d Secure trim. Requires kernel and device support. * -d Secure trim. Requires kernel and device support.
* -r <rate> Sets the TRIM rate in bytes (per second). Supports * -r <rate> Sets the TRIM rate in bytes (per second). Supports
@ -8630,7 +8593,6 @@ zpool_do_trim(int argc, char **argv)
{"rate", required_argument, NULL, 'r'}, {"rate", required_argument, NULL, 'r'},
{"suspend", no_argument, NULL, 's'}, {"suspend", no_argument, NULL, 's'},
{"wait", no_argument, NULL, 'w'}, {"wait", no_argument, NULL, 'w'},
{"all", no_argument, NULL, 'a'},
{0, 0, 0, 0} {0, 0, 0, 0}
}; };
@ -8638,16 +8600,11 @@ zpool_do_trim(int argc, char **argv)
uint64_t rate = 0; uint64_t rate = 0;
boolean_t secure = B_FALSE; boolean_t secure = B_FALSE;
boolean_t wait = B_FALSE; boolean_t wait = B_FALSE;
boolean_t trimall = B_FALSE;
int error;
int c; int c;
while ((c = getopt_long(argc, argv, "acdr:sw", long_options, NULL)) while ((c = getopt_long(argc, argv, "cdr:sw", long_options, NULL))
!= -1) { != -1) {
switch (c) { switch (c) {
case 'a':
trimall = B_TRUE;
break;
case 'c': case 'c':
if (cmd_type != POOL_TRIM_START && if (cmd_type != POOL_TRIM_START &&
cmd_type != POOL_TRIM_CANCEL) { cmd_type != POOL_TRIM_CANCEL) {
@ -8706,18 +8663,7 @@ zpool_do_trim(int argc, char **argv)
argc -= optind; argc -= optind;
argv += optind; argv += optind;
trimflags_t trim_flags = { if (argc < 1) {
.secure = secure,
.rate = rate,
.wait = wait,
};
trim_cbdata_t cbdata = {
.trim_flags = trim_flags,
.cmd_type = cmd_type
};
if (argc < 1 && !trimall) {
(void) fprintf(stderr, gettext("missing pool name argument\n")); (void) fprintf(stderr, gettext("missing pool name argument\n"));
usage(B_FALSE); usage(B_FALSE);
return (-1); return (-1);
@ -8725,45 +8671,40 @@ zpool_do_trim(int argc, char **argv)
if (wait && (cmd_type != POOL_TRIM_START)) { if (wait && (cmd_type != POOL_TRIM_START)) {
(void) fprintf(stderr, gettext("-w cannot be used with -c or " (void) fprintf(stderr, gettext("-w cannot be used with -c or "
"-s options\n")); "-s\n"));
usage(B_FALSE); usage(B_FALSE);
} }
if (trimall && argc > 0) {
(void) fprintf(stderr, gettext("-a cannot be combined with "
"individual zpools or vdevs\n"));
usage(B_FALSE);
}
if (argc == 0 && trimall) {
cbdata.trim_flags.fullpool = B_TRUE;
/* Trim each pool recursively */
error = for_each_pool(argc, argv, B_TRUE, NULL, ZFS_TYPE_POOL,
B_FALSE, zpool_trim_one, &cbdata);
} else if (argc == 1) {
char *poolname = argv[0]; char *poolname = argv[0];
zpool_handle_t *zhp = zpool_open(g_zfs, poolname); zpool_handle_t *zhp = zpool_open(g_zfs, poolname);
if (zhp == NULL) if (zhp == NULL)
return (-1); return (-1);
/* no individual leaf vdevs specified, so add them all */
error = zpool_trim_one(zhp, &cbdata); trimflags_t trim_flags = {
zpool_close(zhp); .secure = secure,
} else { .rate = rate,
char *poolname = argv[0]; .wait = wait,
zpool_handle_t *zhp = zpool_open(g_zfs, poolname); };
if (zhp == NULL)
return (-1);
/* leaf vdevs specified, trim only those */
cbdata.trim_flags.fullpool = B_FALSE;
nvlist_t *vdevs = fnvlist_alloc(); nvlist_t *vdevs = fnvlist_alloc();
if (argc == 1) {
/* no individual leaf vdevs specified, so add them all */
nvlist_t *config = zpool_get_config(zhp, NULL);
nvlist_t *nvroot = fnvlist_lookup_nvlist(config,
ZPOOL_CONFIG_VDEV_TREE);
zpool_collect_leaves(zhp, nvroot, vdevs);
trim_flags.fullpool = B_TRUE;
} else {
trim_flags.fullpool = B_FALSE;
for (int i = 1; i < argc; i++) { for (int i = 1; i < argc; i++) {
fnvlist_add_boolean(vdevs, argv[i]); fnvlist_add_boolean(vdevs, argv[i]);
} }
error = zpool_trim(zhp, cbdata.cmd_type, vdevs, }
&cbdata.trim_flags);
int error = zpool_trim(zhp, cmd_type, vdevs, &trim_flags);
fnvlist_free(vdevs); fnvlist_free(vdevs);
zpool_close(zhp); zpool_close(zhp);
}
return (error); return (error);
} }
@ -10765,6 +10706,7 @@ status_callback_json(zpool_handle_t *zhp, void *data)
uint_t c; uint_t c;
vdev_stat_t *vs; vdev_stat_t *vs;
nvlist_t *item, *d, *load_info, *vds; nvlist_t *item, *d, *load_info, *vds;
item = d = NULL;
/* If dedup stats were requested, also fetch dedupcached. */ /* If dedup stats were requested, also fetch dedupcached. */
if (cbp->cb_dedup_stats > 1) if (cbp->cb_dedup_stats > 1)
@ -11388,8 +11330,7 @@ upgrade_enable_all(zpool_handle_t *zhp, int *countp)
const char *fname = spa_feature_table[i].fi_uname; const char *fname = spa_feature_table[i].fi_uname;
const char *fguid = spa_feature_table[i].fi_guid; const char *fguid = spa_feature_table[i].fi_guid;
if (!spa_feature_table[i].fi_zfs_mod_supported || if (!spa_feature_table[i].fi_zfs_mod_supported)
(spa_feature_table[i].fi_flags & ZFEATURE_FLAG_NO_UPGRADE))
continue; continue;
if (!nvlist_exists(enabled, fguid) && requested_features[i]) { if (!nvlist_exists(enabled, fguid) && requested_features[i]) {
@ -11544,11 +11485,7 @@ upgrade_list_disabled_cb(zpool_handle_t *zhp, void *arg)
"Note that the pool " "Note that the pool "
"'compatibility' feature can be " "'compatibility' feature can be "
"used to inhibit\nfeature " "used to inhibit\nfeature "
"upgrades.\n\n" "upgrades.\n\n"));
"Features marked with (*) are not "
"applied automatically on upgrade, "
"and\nmust be applied explicitly "
"with zpool-set(7).\n\n"));
(void) printf(gettext("POOL " (void) printf(gettext("POOL "
"FEATURE\n")); "FEATURE\n"));
(void) printf(gettext("------" (void) printf(gettext("------"
@ -11562,9 +11499,7 @@ upgrade_list_disabled_cb(zpool_handle_t *zhp, void *arg)
poolfirst = B_FALSE; poolfirst = B_FALSE;
} }
(void) printf(gettext(" %s%s\n"), fname, (void) printf(gettext(" %s\n"), fname);
spa_feature_table[i].fi_flags &
ZFEATURE_FLAG_NO_UPGRADE ? "(*)" : "");
} }
/* /*
* If they did "zpool upgrade -a", then we could * If they did "zpool upgrade -a", then we could
@ -12134,11 +12069,6 @@ zpool_do_events_nvprint(nvlist_t *nvl, int depth)
zfs_valstr_zio_stage(i32, flagstr, zfs_valstr_zio_stage(i32, flagstr,
sizeof (flagstr)); sizeof (flagstr));
printf(gettext("0x%x [%s]"), i32, flagstr); printf(gettext("0x%x [%s]"), i32, flagstr);
} else if (strcmp(name,
FM_EREPORT_PAYLOAD_ZFS_ZIO_TYPE) == 0) {
zfs_valstr_zio_type(i32, flagstr,
sizeof (flagstr));
printf(gettext("0x%x [%s]"), i32, flagstr);
} else if (strcmp(name, } else if (strcmp(name,
FM_EREPORT_PAYLOAD_ZFS_ZIO_PRIORITY) == 0) { FM_EREPORT_PAYLOAD_ZFS_ZIO_PRIORITY) == 0) {
zfs_valstr_zio_priority(i32, flagstr, zfs_valstr_zio_priority(i32, flagstr,
@ -12365,7 +12295,7 @@ zpool_do_events_next(ev_opts_t *opts)
nvlist_free(nvl); nvlist_free(nvl);
} }
VERIFY0(close(zevent_fd)); VERIFY(0 == close(zevent_fd));
return (ret); return (ret);
} }

View File

@ -76,10 +76,11 @@ typedef struct zpool_list zpool_list_t;
zpool_list_t *pool_list_get(int, char **, zprop_list_t **, zfs_type_t, zpool_list_t *pool_list_get(int, char **, zprop_list_t **, zfs_type_t,
boolean_t, int *); boolean_t, int *);
int pool_list_refresh(zpool_list_t *); void pool_list_update(zpool_list_t *);
int pool_list_iter(zpool_list_t *, int unavail, zpool_iter_f, void *); int pool_list_iter(zpool_list_t *, int unavail, zpool_iter_f, void *);
void pool_list_free(zpool_list_t *); void pool_list_free(zpool_list_t *);
int pool_list_count(zpool_list_t *); int pool_list_count(zpool_list_t *);
void pool_list_remove(zpool_list_t *, zpool_handle_t *);
extern libzfs_handle_t *g_zfs; extern libzfs_handle_t *g_zfs;

View File

@ -574,6 +574,7 @@ get_replication(nvlist_t *nvroot, boolean_t fatal)
nvlist_t *cnv = child[c]; nvlist_t *cnv = child[c];
const char *path; const char *path;
struct stat64 statbuf; struct stat64 statbuf;
int64_t size = -1LL;
const char *childtype; const char *childtype;
int fd, err; int fd, err;
@ -608,29 +609,23 @@ get_replication(nvlist_t *nvroot, boolean_t fatal)
verify(nvlist_lookup_string(cnv, verify(nvlist_lookup_string(cnv,
ZPOOL_CONFIG_PATH, &path) == 0); ZPOOL_CONFIG_PATH, &path) == 0);
/*
* Skip active spares they should never cause
* the pool to be evaluated as inconsistent.
*/
if (is_spare(NULL, path))
continue;
/* /*
* If we have a raidz/mirror that combines disks * If we have a raidz/mirror that combines disks
* with files, only report it as an error when * with files, report it as an error.
* fatal is set to ensure all the replication
* checks aren't skipped in check_replication().
*/ */
if (fatal && !dontreport && type != NULL && if (!dontreport && type != NULL &&
strcmp(type, childtype) != 0) { strcmp(type, childtype) != 0) {
if (ret != NULL) if (ret != NULL)
free(ret); free(ret);
ret = NULL; ret = NULL;
if (fatal)
vdev_error(gettext( vdev_error(gettext(
"mismatched replication " "mismatched replication "
"level: %s contains both " "level: %s contains both "
"files and devices\n"), "files and devices\n"),
rep.zprl_type); rep.zprl_type);
else
return (NULL);
dontreport = B_TRUE; dontreport = B_TRUE;
} }
@ -661,7 +656,7 @@ get_replication(nvlist_t *nvroot, boolean_t fatal)
statbuf.st_size == MAXOFFSET_T) statbuf.st_size == MAXOFFSET_T)
continue; continue;
int64_t size = statbuf.st_size; size = statbuf.st_size;
/* /*
* Also make sure that devices and * Also make sure that devices and
@ -881,18 +876,6 @@ check_replication(nvlist_t *config, nvlist_t *newroot)
(u_longlong_t)mirror->zprl_children); (u_longlong_t)mirror->zprl_children);
ret = -1; ret = -1;
} }
} else if (is_raidz_draid(current, new)) {
if (current->zprl_parity != new->zprl_parity) {
vdev_error(gettext(
"mismatched replication level: pool and "
"new vdev with different redundancy, %s "
"and %s vdevs, %llu vs. %llu\n"),
current->zprl_type,
new->zprl_type,
(u_longlong_t)current->zprl_parity,
(u_longlong_t)new->zprl_parity);
ret = -1;
}
} else if (strcmp(current->zprl_type, new->zprl_type) != 0) { } else if (strcmp(current->zprl_type, new->zprl_type) != 0) {
vdev_error(gettext( vdev_error(gettext(
"mismatched replication level: pool uses %s " "mismatched replication level: pool uses %s "
@ -1370,7 +1353,7 @@ is_grouping(const char *type, int *mindev, int *maxdev)
static int static int
draid_config_by_type(nvlist_t *nv, const char *type, uint64_t children) draid_config_by_type(nvlist_t *nv, const char *type, uint64_t children)
{ {
uint64_t nparity; uint64_t nparity = 1;
uint64_t nspares = 0; uint64_t nspares = 0;
uint64_t ndata = UINT64_MAX; uint64_t ndata = UINT64_MAX;
uint64_t ngroups = 1; uint64_t ngroups = 1;
@ -1598,12 +1581,13 @@ construct_spec(nvlist_t *props, int argc, char **argv)
is_dedup = is_spare = B_FALSE; is_dedup = is_spare = B_FALSE;
} }
if (is_log) { if (is_log || is_special || is_dedup) {
if (strcmp(type, VDEV_TYPE_MIRROR) != 0) { if (strcmp(type, VDEV_TYPE_MIRROR) != 0) {
(void) fprintf(stderr, (void) fprintf(stderr,
gettext("invalid vdev " gettext("invalid vdev "
"specification: unsupported 'log' " "specification: unsupported '%s' "
"device: %s\n"), type); "device: %s\n"), is_log ? "log" :
"special", type);
goto spec_out; goto spec_out;
} }
nlogs++; nlogs++;

View File

@ -18,7 +18,6 @@ zstream_LDADD = \
libzpool.la \ libzpool.la \
libnvpair.la libnvpair.la
cmd-zstream-install-exec-hook: PHONY += install-exec-hook
install-exec-hook:
cd $(DESTDIR)$(sbindir) && $(LN_S) -f zstream zstreamdump cd $(DESTDIR)$(sbindir) && $(LN_S) -f zstream zstreamdump
INSTALL_EXEC_HOOKS += cmd-zstream-install-exec-hook

View File

@ -273,6 +273,7 @@ extern int zfs_compressed_arc_enabled;
extern int zfs_abd_scatter_enabled; extern int zfs_abd_scatter_enabled;
extern uint_t dmu_object_alloc_chunk_shift; extern uint_t dmu_object_alloc_chunk_shift;
extern boolean_t zfs_force_some_double_word_sm_entries; extern boolean_t zfs_force_some_double_word_sm_entries;
extern unsigned long zio_decompress_fail_fraction;
extern unsigned long zfs_reconstruct_indirect_damage_fraction; extern unsigned long zfs_reconstruct_indirect_damage_fraction;
extern uint64_t raidz_expand_max_reflow_bytes; extern uint64_t raidz_expand_max_reflow_bytes;
extern uint_t raidz_expand_pause_point; extern uint_t raidz_expand_pause_point;
@ -808,8 +809,8 @@ static ztest_option_t option_table[] = {
{ 'X', "raidz-expansion", NULL, { 'X', "raidz-expansion", NULL,
"Perform a dedicated raidz expansion test", "Perform a dedicated raidz expansion test",
NO_DEFAULT, NULL}, NO_DEFAULT, NULL},
{ 'o', "option", "\"NAME=VALUE\"", { 'o', "option", "\"OPTION=INTEGER\"",
"Set the named tunable to the given value", "Set global variable to an unsigned 32-bit integer value",
NO_DEFAULT, NULL}, NO_DEFAULT, NULL},
{ 'G', "dump-debug-msg", NULL, { 'G', "dump-debug-msg", NULL,
"Dump zfs_dbgmsg buffer before exiting due to an error", "Dump zfs_dbgmsg buffer before exiting due to an error",
@ -828,8 +829,8 @@ static char *short_opts = NULL;
static void static void
init_options(void) init_options(void)
{ {
ASSERT0P(long_opts); ASSERT3P(long_opts, ==, NULL);
ASSERT0P(short_opts); ASSERT3P(short_opts, ==, NULL);
int count = sizeof (option_table) / sizeof (option_table[0]); int count = sizeof (option_table) / sizeof (option_table[0]);
long_opts = umem_alloc(sizeof (struct option) * count, UMEM_NOFAIL); long_opts = umem_alloc(sizeof (struct option) * count, UMEM_NOFAIL);
@ -918,7 +919,7 @@ ztest_parse_name_value(const char *input, ztest_shared_opts_t *zo)
{ {
char name[32]; char name[32];
char *value; char *value;
int state; int state = ZTEST_VDEV_CLASS_RND;
(void) strlcpy(name, input, sizeof (name)); (void) strlcpy(name, input, sizeof (name));
@ -1685,7 +1686,7 @@ ztest_rll_init(rll_t *rll)
static void static void
ztest_rll_destroy(rll_t *rll) ztest_rll_destroy(rll_t *rll)
{ {
ASSERT0P(rll->rll_writer); ASSERT3P(rll->rll_writer, ==, NULL);
ASSERT0(rll->rll_readers); ASSERT0(rll->rll_readers);
mutex_destroy(&rll->rll_lock); mutex_destroy(&rll->rll_lock);
cv_destroy(&rll->rll_cv); cv_destroy(&rll->rll_cv);
@ -1719,7 +1720,7 @@ ztest_rll_unlock(rll_t *rll)
rll->rll_writer = NULL; rll->rll_writer = NULL;
} else { } else {
ASSERT3S(rll->rll_readers, >, 0); ASSERT3S(rll->rll_readers, >, 0);
ASSERT0P(rll->rll_writer); ASSERT3P(rll->rll_writer, ==, NULL);
rll->rll_readers--; rll->rll_readers--;
} }
@ -1815,7 +1816,7 @@ ztest_zd_fini(ztest_ds_t *zd)
(ztest_random(10) == 0 ? DMU_TX_NOWAIT : DMU_TX_WAIT) (ztest_random(10) == 0 ? DMU_TX_NOWAIT : DMU_TX_WAIT)
static uint64_t static uint64_t
ztest_tx_assign(dmu_tx_t *tx, dmu_tx_flag_t txg_how, const char *tag) ztest_tx_assign(dmu_tx_t *tx, uint64_t txg_how, const char *tag)
{ {
uint64_t txg; uint64_t txg;
int error; int error;
@ -1828,10 +1829,9 @@ ztest_tx_assign(dmu_tx_t *tx, dmu_tx_flag_t txg_how, const char *tag)
if (error == ERESTART) { if (error == ERESTART) {
ASSERT3U(txg_how, ==, DMU_TX_NOWAIT); ASSERT3U(txg_how, ==, DMU_TX_NOWAIT);
dmu_tx_wait(tx); dmu_tx_wait(tx);
} else if (error == ENOSPC) {
ztest_record_enospc(tag);
} else { } else {
ASSERT(error == EDQUOT || error == EIO); ASSERT3U(error, ==, ENOSPC);
ztest_record_enospc(tag);
} }
dmu_tx_abort(tx); dmu_tx_abort(tx);
return (0); return (0);
@ -1993,9 +1993,8 @@ ztest_log_write(ztest_ds_t *zd, dmu_tx_t *tx, lr_write_t *lr)
if (write_state == WR_COPIED && if (write_state == WR_COPIED &&
dmu_read(zd->zd_os, lr->lr_foid, lr->lr_offset, lr->lr_length, dmu_read(zd->zd_os, lr->lr_foid, lr->lr_offset, lr->lr_length,
((lr_write_t *)&itx->itx_lr) + 1, DMU_READ_NO_PREFETCH | ((lr_write_t *)&itx->itx_lr) + 1, DMU_READ_NO_PREFETCH) != 0) {
DMU_KEEP_CACHING) != 0) { zil_itx_destroy(itx);
zil_itx_destroy(itx, 0);
itx = zil_itx_create(TX_WRITE, sizeof (*lr)); itx = zil_itx_create(TX_WRITE, sizeof (*lr));
write_state = WR_NEED_COPY; write_state = WR_NEED_COPY;
} }
@ -2266,19 +2265,19 @@ ztest_replay_write(void *arg1, void *arg2, boolean_t byteswap)
ASSERT(doi.doi_data_block_size); ASSERT(doi.doi_data_block_size);
ASSERT0(offset % doi.doi_data_block_size); ASSERT0(offset % doi.doi_data_block_size);
if (ztest_random(4) != 0) { if (ztest_random(4) != 0) {
dmu_flags_t flags = ztest_random(2) ? int prefetch = ztest_random(2) ?
DMU_READ_PREFETCH : DMU_READ_NO_PREFETCH; DMU_READ_PREFETCH : DMU_READ_NO_PREFETCH;
/* /*
* We will randomly set when to do O_DIRECT on a read. * We will randomly set when to do O_DIRECT on a read.
*/ */
if (ztest_random(4) == 0) if (ztest_random(4) == 0)
flags |= DMU_DIRECTIO; prefetch |= DMU_DIRECTIO;
ztest_block_tag_t rbt; ztest_block_tag_t rbt;
VERIFY0(dmu_read(os, lr->lr_foid, offset, VERIFY(dmu_read(os, lr->lr_foid, offset,
sizeof (rbt), &rbt, flags)); sizeof (rbt), &rbt, prefetch) == 0);
if (rbt.bt_magic == BT_MAGIC) { if (rbt.bt_magic == BT_MAGIC) {
ztest_bt_verify(&rbt, os, lr->lr_foid, 0, ztest_bt_verify(&rbt, os, lr->lr_foid, 0,
offset, gen, txg, crtxg); offset, gen, txg, crtxg);
@ -2309,7 +2308,7 @@ ztest_replay_write(void *arg1, void *arg2, boolean_t byteswap)
dmu_write(os, lr->lr_foid, offset, length, data, tx); dmu_write(os, lr->lr_foid, offset, length, data, tx);
} else { } else {
memcpy(abuf->b_data, data, length); memcpy(abuf->b_data, data, length);
VERIFY0(dmu_assign_arcbuf_by_dbuf(db, offset, abuf, tx, 0)); VERIFY0(dmu_assign_arcbuf_by_dbuf(db, offset, abuf, tx));
} }
(void) ztest_log_write(zd, tx, lr); (void) ztest_log_write(zd, tx, lr);
@ -2534,7 +2533,7 @@ ztest_get_data(void *arg, uint64_t arg2, lr_write_t *lr, char *buf,
object, offset, size, ZTRL_READER); object, offset, size, ZTRL_READER);
error = dmu_read(os, object, offset, size, buf, error = dmu_read(os, object, offset, size, buf,
DMU_READ_NO_PREFETCH | DMU_KEEP_CACHING); DMU_READ_NO_PREFETCH);
ASSERT0(error); ASSERT0(error);
} else { } else {
ASSERT3P(zio, !=, NULL); ASSERT3P(zio, !=, NULL);
@ -2550,6 +2549,7 @@ ztest_get_data(void *arg, uint64_t arg2, lr_write_t *lr, char *buf,
object, offset, size, ZTRL_READER); object, offset, size, ZTRL_READER);
error = dmu_buf_hold_noread(os, object, offset, zgd, &db); error = dmu_buf_hold_noread(os, object, offset, zgd, &db);
if (error == 0) { if (error == 0) {
blkptr_t *bp = &lr->lr_blkptr; blkptr_t *bp = &lr->lr_blkptr;
@ -2826,7 +2826,7 @@ ztest_io(ztest_ds_t *zd, uint64_t object, uint64_t offset)
enum ztest_io_type io_type; enum ztest_io_type io_type;
uint64_t blocksize; uint64_t blocksize;
void *data; void *data;
dmu_flags_t dmu_read_flags = DMU_READ_NO_PREFETCH; uint32_t dmu_read_flags = DMU_READ_NO_PREFETCH;
/* /*
* We will randomly set when to do O_DIRECT on a read. * We will randomly set when to do O_DIRECT on a read.
@ -2965,7 +2965,7 @@ ztest_zil_commit(ztest_ds_t *zd, uint64_t id)
(void) pthread_rwlock_rdlock(&zd->zd_zilog_lock); (void) pthread_rwlock_rdlock(&zd->zd_zilog_lock);
VERIFY0(zil_commit(zilog, ztest_random(ZTEST_OBJECTS))); zil_commit(zilog, ztest_random(ZTEST_OBJECTS));
/* /*
* Remember the committed values in zd, which is in parent/child * Remember the committed values in zd, which is in parent/child
@ -4006,7 +4006,7 @@ raidz_scratch_verify(void)
* requested by user, but scratch object was not created. * requested by user, but scratch object was not created.
*/ */
case RRSS_SCRATCH_NOT_IN_USE: case RRSS_SCRATCH_NOT_IN_USE:
ASSERT0(offset); ASSERT3U(offset, ==, 0);
break; break;
/* /*
@ -4916,7 +4916,7 @@ ztest_dsl_dataset_promote_busy(ztest_ds_t *zd, uint64_t id)
fatal(B_FALSE, "dmu_take_snapshot(%s) = %d", snap1name, error); fatal(B_FALSE, "dmu_take_snapshot(%s) = %d", snap1name, error);
} }
error = dsl_dataset_clone(clone1name, snap1name); error = dmu_objset_clone(clone1name, snap1name);
if (error) { if (error) {
if (error == ENOSPC) { if (error == ENOSPC) {
ztest_record_enospc(FTAG); ztest_record_enospc(FTAG);
@ -4943,7 +4943,7 @@ ztest_dsl_dataset_promote_busy(ztest_ds_t *zd, uint64_t id)
fatal(B_FALSE, "dmu_open_snapshot(%s) = %d", snap3name, error); fatal(B_FALSE, "dmu_open_snapshot(%s) = %d", snap3name, error);
} }
error = dsl_dataset_clone(clone2name, snap3name); error = dmu_objset_clone(clone2name, snap3name);
if (error) { if (error) {
if (error == ENOSPC) { if (error == ENOSPC) {
ztest_record_enospc(FTAG); ztest_record_enospc(FTAG);
@ -5065,7 +5065,7 @@ ztest_dmu_read_write(ztest_ds_t *zd, uint64_t id)
uint64_t stride = 123456789ULL; uint64_t stride = 123456789ULL;
uint64_t width = 40; uint64_t width = 40;
int free_percent = 5; int free_percent = 5;
dmu_flags_t dmu_read_flags = DMU_READ_PREFETCH; uint32_t dmu_read_flags = DMU_READ_PREFETCH;
/* /*
* We will randomly set when to do O_DIRECT on a read. * We will randomly set when to do O_DIRECT on a read.
@ -5536,18 +5536,18 @@ ztest_dmu_read_write_zcopy(ztest_ds_t *zd, uint64_t id)
} }
if (i == 1) { if (i == 1) {
VERIFY0(dmu_buf_hold(os, bigobj, off, VERIFY(dmu_buf_hold(os, bigobj, off,
FTAG, &dbt, DMU_READ_NO_PREFETCH)); FTAG, &dbt, DMU_READ_NO_PREFETCH) == 0);
} }
if (i != 5 || chunksize < (SPA_MINBLOCKSIZE * 2)) { if (i != 5 || chunksize < (SPA_MINBLOCKSIZE * 2)) {
VERIFY0(dmu_assign_arcbuf_by_dbuf(bonus_db, VERIFY0(dmu_assign_arcbuf_by_dbuf(bonus_db,
off, bigbuf_arcbufs[j], tx, 0)); off, bigbuf_arcbufs[j], tx));
} else { } else {
VERIFY0(dmu_assign_arcbuf_by_dbuf(bonus_db, VERIFY0(dmu_assign_arcbuf_by_dbuf(bonus_db,
off, bigbuf_arcbufs[2 * j], tx, 0)); off, bigbuf_arcbufs[2 * j], tx));
VERIFY0(dmu_assign_arcbuf_by_dbuf(bonus_db, VERIFY0(dmu_assign_arcbuf_by_dbuf(bonus_db,
off + chunksize / 2, off + chunksize / 2,
bigbuf_arcbufs[2 * j + 1], tx, 0)); bigbuf_arcbufs[2 * j + 1], tx));
} }
if (i == 1) { if (i == 1) {
dmu_buf_rele(dbt, FTAG); dmu_buf_rele(dbt, FTAG);
@ -6334,13 +6334,13 @@ ztest_dmu_snapshot_hold(ztest_ds_t *zd, uint64_t id)
fatal(B_FALSE, "dmu_objset_snapshot(%s) = %d", fullname, error); fatal(B_FALSE, "dmu_objset_snapshot(%s) = %d", fullname, error);
} }
error = dsl_dataset_clone(clonename, fullname); error = dmu_objset_clone(clonename, fullname);
if (error) { if (error) {
if (error == ENOSPC) { if (error == ENOSPC) {
ztest_record_enospc("dsl_dataset_clone"); ztest_record_enospc("dmu_objset_clone");
goto out; goto out;
} }
fatal(B_FALSE, "dsl_dataset_clone(%s) = %d", clonename, error); fatal(B_FALSE, "dmu_objset_clone(%s) = %d", clonename, error);
} }
error = dsl_destroy_snapshot(fullname, B_TRUE); error = dsl_destroy_snapshot(fullname, B_TRUE);
@ -7068,7 +7068,7 @@ ztest_set_global_vars(void)
char *kv = ztest_opts.zo_gvars[i]; char *kv = ztest_opts.zo_gvars[i];
VERIFY3U(strlen(kv), <=, ZO_GVARS_MAX_ARGLEN); VERIFY3U(strlen(kv), <=, ZO_GVARS_MAX_ARGLEN);
VERIFY3U(strlen(kv), >, 0); VERIFY3U(strlen(kv), >, 0);
int err = handle_tunable_option(kv, B_TRUE); int err = set_global_var(kv);
if (ztest_opts.zo_verbose > 0) { if (ztest_opts.zo_verbose > 0) {
(void) printf("setting global var %s ... %s\n", kv, (void) printf("setting global var %s ... %s\n", kv,
err ? "failed" : "ok"); err ? "failed" : "ok");
@ -7936,7 +7936,7 @@ ztest_freeze(void)
*/ */
while (BP_IS_HOLE(&zd->zd_zilog->zl_header->zh_log)) { while (BP_IS_HOLE(&zd->zd_zilog->zl_header->zh_log)) {
ztest_dmu_object_alloc_free(zd, 0); ztest_dmu_object_alloc_free(zd, 0);
VERIFY0(zil_commit(zd->zd_zilog, 0)); zil_commit(zd->zd_zilog, 0);
} }
txg_wait_synced(spa_get_dsl(spa), 0); txg_wait_synced(spa_get_dsl(spa), 0);
@ -7978,7 +7978,7 @@ ztest_freeze(void)
/* /*
* Commit all of the changes we just generated. * Commit all of the changes we just generated.
*/ */
VERIFY0(zil_commit(zd->zd_zilog, 0)); zil_commit(zd->zd_zilog, 0);
txg_wait_synced(spa_get_dsl(spa), 0); txg_wait_synced(spa_get_dsl(spa), 0);
/* /*
@ -8978,7 +8978,7 @@ main(int argc, char **argv)
exit(EXIT_FAILURE); exit(EXIT_FAILURE);
} else { } else {
/* children should not be spawned if setting gvars fails */ /* children should not be spawned if setting gvars fails */
VERIFY0(err); VERIFY3S(err, ==, 0);
} }
/* Override location of zpool.cache */ /* Override location of zpool.cache */

View File

@ -16,14 +16,10 @@ SHELLCHECK_OPTS = $(call JUST_SHELLCHECK_OPTS,$(1)) $(call JUST_CHECKBAS
PHONY += shellcheck PHONY += shellcheck
shellcheck_verbose = $(shellcheck_verbose_@AM_V@)
shellcheck_verbose_ = $(shellcheck_verbose_@AM_DEFAULT_V@)
shellcheck_verbose_0 = @echo SHELLCHECK $(_STGT);
_STGT = $(subst ^,/,$(subst shellcheck-here-,,$@)) _STGT = $(subst ^,/,$(subst shellcheck-here-,,$@))
shellcheck-here-%: shellcheck-here-%:
if HAVE_SHELLCHECK if HAVE_SHELLCHECK
$(shellcheck_verbose)shellcheck --format=gcc --enable=all --exclude=SC1090,SC1091,SC2039,SC2250,SC2312,SC2317,SC3043 $$([ -n "$(SHELLCHECK_SHELL)" ] && echo "--shell=$(SHELLCHECK_SHELL)") "$$([ -e "$(_STGT)" ] || echo "$(srcdir)/")$(_STGT)" shellcheck --format=gcc --enable=all --exclude=SC1090,SC1091,SC2039,SC2250,SC2312,SC2317,SC3043 $$([ -n "$(SHELLCHECK_SHELL)" ] && echo "--shell=$(SHELLCHECK_SHELL)") "$$([ -e "$(_STGT)" ] || echo "$(srcdir)/")$(_STGT)"
else else
@echo "skipping shellcheck of" $(_STGT) "because shellcheck is not installed" @echo "skipping shellcheck of" $(_STGT) "because shellcheck is not installed"
endif endif
@ -33,15 +29,11 @@ shellcheck: $(SHELLCHECKSCRIPTS) $(call JUST_SHELLCHECK_OPTS,$(SHELLCHECKSCRIPTS
PHONY += checkbashisms PHONY += checkbashisms
checkbashisms_verbose = $(checkbashisms_verbose_@AM_V@)
checkbashisms_verbose_ = $(checkbashisms_verbose_@AM_DEFAULT_V@)
checkbashisms_verbose_0 = @echo CHECKBASHISMS $(_BTGT);
# command -v *is* specified by POSIX and every shell in existence supports it # command -v *is* specified by POSIX and every shell in existence supports it
_BTGT = $(subst ^,/,$(subst checkbashisms-here-,,$@)) _BTGT = $(subst ^,/,$(subst checkbashisms-here-,,$@))
checkbashisms-here-%: checkbashisms-here-%:
if HAVE_CHECKBASHISMS if HAVE_CHECKBASHISMS
$(checkbashisms_verbose)! { [ -n "$(SHELLCHECK_SHELL)" ] && echo '#!/bin/$(SHELLCHECK_SHELL)'; cat "$$([ -e "$(_BTGT)" ] || echo "$(srcdir)/")$(_BTGT)"; } | \ ! { [ -n "$(SHELLCHECK_SHELL)" ] && echo '#!/bin/$(SHELLCHECK_SHELL)'; cat "$$([ -e "$(_BTGT)" ] || echo "$(srcdir)/")$(_BTGT)"; } | \
checkbashisms -npx 2>&1 | grep -vFe "'command' with option other than -p" -e 'command -v' -e 'any possible bashisms' $(CHECKBASHISMS_IGNORE) >&2 checkbashisms -npx 2>&1 | grep -vFe "'command' with option other than -p" -e 'command -v' -e 'any possible bashisms' $(CHECKBASHISMS_IGNORE) >&2
else else
@echo "skipping checkbashisms of" $(_BTGT) "because checkbashisms is not installed" @echo "skipping checkbashisms of" $(_BTGT) "because checkbashisms is not installed"

View File

@ -34,26 +34,8 @@ AC_DEFUN([ZFS_AC_CONFIG_ALWAYS_ARCH], [
esac esac
AM_CONDITIONAL([TARGET_CPU_AARCH64], test $TARGET_CPU = aarch64) AM_CONDITIONAL([TARGET_CPU_AARCH64], test $TARGET_CPU = aarch64)
AM_CONDITIONAL([TARGET_CPU_I386], test $TARGET_CPU = i386)
AM_CONDITIONAL([TARGET_CPU_X86_64], test $TARGET_CPU = x86_64) AM_CONDITIONAL([TARGET_CPU_X86_64], test $TARGET_CPU = x86_64)
AM_CONDITIONAL([TARGET_CPU_POWERPC], test $TARGET_CPU = powerpc) AM_CONDITIONAL([TARGET_CPU_POWERPC], test $TARGET_CPU = powerpc)
AM_CONDITIONAL([TARGET_CPU_SPARC64], test $TARGET_CPU = sparc64) AM_CONDITIONAL([TARGET_CPU_SPARC64], test $TARGET_CPU = sparc64)
AM_CONDITIONAL([TARGET_CPU_ARM], test $TARGET_CPU = arm) AM_CONDITIONAL([TARGET_CPU_ARM], test $TARGET_CPU = arm)
]) ])
dnl #
dnl # Check for conflicting environment variables
dnl #
dnl # If ARCH env variable is set up, then kernel Makefile in the /usr/src/kernel
dnl # can misbehave during the zfs ./configure test of the module compilation.
AC_DEFUN([ZFS_AC_CONFIG_CHECK_ARCH_VAR], [
AC_MSG_CHECKING([for conflicting environment variables])
if test -n "$ARCH"; then
AC_MSG_RESULT([warning])
AC_MSG_WARN(m4_normalize([ARCH environment variable is set to "$ARCH".
This can cause build kernel modules support check failure.
Please unset it.]))
else
AC_MSG_RESULT([done])
fi
])

View File

@ -155,34 +155,6 @@ AC_DEFUN([ZFS_AC_CONFIG_ALWAYS_CC_NO_FORMAT_ZERO_LENGTH], [
AC_SUBST([NO_FORMAT_ZERO_LENGTH]) AC_SUBST([NO_FORMAT_ZERO_LENGTH])
]) ])
dnl #
dnl # Check if kernel cc supports -Wno-format-zero-length option.
dnl #
AC_DEFUN([ZFS_AC_CONFIG_ALWAYS_KERNEL_CC_NO_FORMAT_ZERO_LENGTH], [
saved_cc="$CC"
AS_IF(
[ test -n "$KERNEL_CC" ], [ CC="$KERNEL_CC" ],
[ test -n "$KERNEL_LLVM" ], [ CC="clang" ],
[ CC="gcc" ]
)
AC_MSG_CHECKING([whether $CC supports -Wno-format-zero-length])
saved_flags="$CFLAGS"
CFLAGS="$CFLAGS -Werror -Wno-format-zero-length"
AC_COMPILE_IFELSE([AC_LANG_PROGRAM([], [])], [
KERNEL_NO_FORMAT_ZERO_LENGTH=-Wno-format-zero-length
AC_MSG_RESULT([yes])
], [
KERNEL_NO_FORMAT_ZERO_LENGTH=
AC_MSG_RESULT([no])
])
CC="$saved_cc"
CFLAGS="$saved_flags"
AC_SUBST([KERNEL_NO_FORMAT_ZERO_LENGTH])
])
dnl # dnl #
dnl # Check if cc supports -Wno-clobbered option. dnl # Check if cc supports -Wno-clobbered option.
dnl # dnl #
@ -209,27 +181,6 @@ AC_DEFUN([ZFS_AC_CONFIG_ALWAYS_CC_NO_CLOBBERED], [
AC_SUBST([NO_CLOBBERED]) AC_SUBST([NO_CLOBBERED])
]) ])
dnl #
dnl # Check if cc supports -Wno-atomic-alignment option.
dnl #
AC_DEFUN([ZFS_AC_CONFIG_ALWAYS_CC_NO_ATOMIC_ALIGNMENT], [
AC_MSG_CHECKING([whether $CC supports -Wno-atomic-alignment])
saved_flags="$CFLAGS"
CFLAGS="$CFLAGS -Werror -Wno-atomic-alignment"
AC_COMPILE_IFELSE([AC_LANG_PROGRAM([], [])], [
NO_ATOMIC_ALIGNMENT=-Wno-atomic-alignment
AC_MSG_RESULT([yes])
], [
NO_ATOMIC_ALIGNMENT=
AC_MSG_RESULT([no])
])
CFLAGS="$saved_flags"
AC_SUBST([NO_ATOMIC_ALIGNMENT])
])
dnl # dnl #
dnl # Check if cc supports -Wimplicit-fallthrough option. dnl # Check if cc supports -Wimplicit-fallthrough option.
dnl # dnl #
@ -280,17 +231,20 @@ dnl #
dnl # Check if kernel cc supports -Winfinite-recursion option. dnl # Check if kernel cc supports -Winfinite-recursion option.
dnl # dnl #
AC_DEFUN([ZFS_AC_CONFIG_ALWAYS_KERNEL_CC_INFINITE_RECURSION], [ AC_DEFUN([ZFS_AC_CONFIG_ALWAYS_KERNEL_CC_INFINITE_RECURSION], [
saved_cc="$CC" AC_MSG_CHECKING([whether $KERNEL_CC supports -Winfinite-recursion])
AS_IF(
[ test -n "$KERNEL_CC" ], [ CC="$KERNEL_CC" ],
[ test -n "$KERNEL_LLVM" ], [ CC="clang" ],
[ CC="gcc" ]
)
AC_MSG_CHECKING([whether $CC supports -Winfinite-recursion])
saved_cc="$CC"
saved_flags="$CFLAGS" saved_flags="$CFLAGS"
CC="gcc"
CFLAGS="$CFLAGS -Werror -Winfinite-recursion" CFLAGS="$CFLAGS -Werror -Winfinite-recursion"
AS_IF([ test -n "$KERNEL_CC" ], [
CC="$KERNEL_CC"
])
AS_IF([ test -n "$KERNEL_LLVM" ], [
CC="clang"
])
AC_COMPILE_IFELSE([AC_LANG_PROGRAM([], [])], [ AC_COMPILE_IFELSE([AC_LANG_PROGRAM([], [])], [
KERNEL_INFINITE_RECURSION=-Winfinite-recursion KERNEL_INFINITE_RECURSION=-Winfinite-recursion
AC_DEFINE([HAVE_KERNEL_INFINITE_RECURSION], 1, AC_DEFINE([HAVE_KERNEL_INFINITE_RECURSION], 1,
@ -375,17 +329,20 @@ dnl #
dnl # Check if kernel cc supports -fno-ipa-sra option. dnl # Check if kernel cc supports -fno-ipa-sra option.
dnl # dnl #
AC_DEFUN([ZFS_AC_CONFIG_ALWAYS_KERNEL_CC_NO_IPA_SRA], [ AC_DEFUN([ZFS_AC_CONFIG_ALWAYS_KERNEL_CC_NO_IPA_SRA], [
saved_cc="$CC" AC_MSG_CHECKING([whether $KERNEL_CC supports -fno-ipa-sra])
AS_IF(
[ test -n "$KERNEL_CC" ], [ CC="$KERNEL_CC" ],
[ test -n "$KERNEL_LLVM" ], [ CC="clang" ],
[ CC="gcc" ]
)
AC_MSG_CHECKING([whether $CC supports -fno-ipa-sra])
saved_cc="$CC"
saved_flags="$CFLAGS" saved_flags="$CFLAGS"
CC="gcc"
CFLAGS="$CFLAGS -Werror -fno-ipa-sra" CFLAGS="$CFLAGS -Werror -fno-ipa-sra"
AS_IF([ test -n "$KERNEL_CC" ], [
CC="$KERNEL_CC"
])
AS_IF([ test -n "$KERNEL_LLVM" ], [
CC="clang"
])
AC_COMPILE_IFELSE([AC_LANG_PROGRAM([], [])], [ AC_COMPILE_IFELSE([AC_LANG_PROGRAM([], [])], [
KERNEL_NO_IPA_SRA=-fno-ipa-sra KERNEL_NO_IPA_SRA=-fno-ipa-sra
AC_MSG_RESULT([yes]) AC_MSG_RESULT([yes])

View File

@ -29,8 +29,9 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_GET_BY_PATH_4ARG], [
const char *path = "path"; const char *path = "path";
fmode_t mode = 0; fmode_t mode = 0;
void *holder = NULL; void *holder = NULL;
struct blk_holder_ops h;
bdev = blkdev_get_by_path(path, mode, holder, NULL); bdev = blkdev_get_by_path(path, mode, holder, &h);
]) ])
]) ])
@ -47,8 +48,9 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_BDEV_OPEN_BY_PATH], [
const char *path = "path"; const char *path = "path";
fmode_t mode = 0; fmode_t mode = 0;
void *holder = NULL; void *holder = NULL;
struct blk_holder_ops h;
bdh = bdev_open_by_path(path, mode, holder, NULL); bdh = bdev_open_by_path(path, mode, holder, &h);
]) ])
]) ])
@ -66,8 +68,9 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BDEV_FILE_OPEN_BY_PATH], [
const char *path = "path"; const char *path = "path";
fmode_t mode = 0; fmode_t mode = 0;
void *holder = NULL; void *holder = NULL;
struct blk_holder_ops h;
file = bdev_file_open_by_path(path, mode, holder, NULL); file = bdev_file_open_by_path(path, mode, holder, &h);
]) ])
]) ])

View File

@ -24,9 +24,6 @@ dnl #
dnl # 2.6.38 API change dnl # 2.6.38 API change
dnl # Added d_set_d_op() helper function. dnl # Added d_set_d_op() helper function.
dnl # dnl #
dnl # 6.17 API change
dnl # d_set_d_op() removed. No direct replacement.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_D_SET_D_OP], [ AC_DEFUN([ZFS_AC_KERNEL_SRC_D_SET_D_OP], [
ZFS_LINUX_TEST_SRC([d_set_d_op], [ ZFS_LINUX_TEST_SRC([d_set_d_op], [
#include <linux/dcache.h> #include <linux/dcache.h>
@ -37,46 +34,22 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_D_SET_D_OP], [
AC_DEFUN([ZFS_AC_KERNEL_D_SET_D_OP], [ AC_DEFUN([ZFS_AC_KERNEL_D_SET_D_OP], [
AC_MSG_CHECKING([whether d_set_d_op() is available]) AC_MSG_CHECKING([whether d_set_d_op() is available])
ZFS_LINUX_TEST_RESULT([d_set_d_op], [ ZFS_LINUX_TEST_RESULT_SYMBOL([d_set_d_op],
[d_set_d_op], [fs/dcache.c], [
AC_MSG_RESULT(yes) AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_D_SET_D_OP, 1,
[Define if d_set_d_op() is available])
], [ ], [
AC_MSG_RESULT(no) ZFS_LINUX_TEST_ERROR([d_set_d_op])
])
])
dnl #
dnl # 6.17 API change
dnl # sb->s_d_op removed; set_default_d_op(sb, dop) added
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_SET_DEFAULT_D_OP], [
ZFS_LINUX_TEST_SRC([set_default_d_op], [
#include <linux/dcache.h>
], [
set_default_d_op(NULL, NULL);
])
])
AC_DEFUN([ZFS_AC_KERNEL_SET_DEFAULT_D_OP], [
AC_MSG_CHECKING([whether set_default_d_op() is available])
ZFS_LINUX_TEST_RESULT([set_default_d_op], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_SET_DEFAULT_D_OP, 1,
[Define if set_default_d_op() is available])
], [
AC_MSG_RESULT(no)
]) ])
]) ])
AC_DEFUN([ZFS_AC_KERNEL_SRC_DENTRY], [ AC_DEFUN([ZFS_AC_KERNEL_SRC_DENTRY], [
ZFS_AC_KERNEL_SRC_D_OBTAIN_ALIAS ZFS_AC_KERNEL_SRC_D_OBTAIN_ALIAS
ZFS_AC_KERNEL_SRC_D_SET_D_OP ZFS_AC_KERNEL_SRC_D_SET_D_OP
ZFS_AC_KERNEL_SRC_SET_DEFAULT_D_OP ZFS_AC_KERNEL_SRC_S_D_OP
]) ])
AC_DEFUN([ZFS_AC_KERNEL_DENTRY], [ AC_DEFUN([ZFS_AC_KERNEL_DENTRY], [
ZFS_AC_KERNEL_D_OBTAIN_ALIAS ZFS_AC_KERNEL_D_OBTAIN_ALIAS
ZFS_AC_KERNEL_D_SET_D_OP ZFS_AC_KERNEL_D_SET_D_OP
ZFS_AC_KERNEL_SET_DEFAULT_D_OP ZFS_AC_KERNEL_S_D_OP
]) ])

View File

@ -84,8 +84,6 @@ AC_DEFUN([ZFS_AC_KERNEL_MKDIR], [
AC_DEFINE(HAVE_IOPS_MKDIR_DENTRY, 1, AC_DEFINE(HAVE_IOPS_MKDIR_DENTRY, 1,
[iops->mkdir() returns struct dentry*]) [iops->mkdir() returns struct dentry*])
],[ ],[
AC_MSG_RESULT(no)
dnl # dnl #
dnl # 6.3 API change dnl # 6.3 API change
dnl # mkdir() takes struct mnt_idmap * as the first arg dnl # mkdir() takes struct mnt_idmap * as the first arg

View File

@ -70,7 +70,6 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC_COMMIT_METADATA ZFS_AC_KERNEL_SRC_COMMIT_METADATA
ZFS_AC_KERNEL_SRC_SETATTR_PREPARE ZFS_AC_KERNEL_SRC_SETATTR_PREPARE
ZFS_AC_KERNEL_SRC_INSERT_INODE_LOCKED ZFS_AC_KERNEL_SRC_INSERT_INODE_LOCKED
ZFS_AC_KERNEL_SRC_DENTRY
ZFS_AC_KERNEL_SRC_TRUNCATE_SETSIZE ZFS_AC_KERNEL_SRC_TRUNCATE_SETSIZE
ZFS_AC_KERNEL_SRC_SECURITY_INODE ZFS_AC_KERNEL_SRC_SECURITY_INODE
ZFS_AC_KERNEL_SRC_FST_MOUNT ZFS_AC_KERNEL_SRC_FST_MOUNT
@ -189,7 +188,6 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL_COMMIT_METADATA ZFS_AC_KERNEL_COMMIT_METADATA
ZFS_AC_KERNEL_SETATTR_PREPARE ZFS_AC_KERNEL_SETATTR_PREPARE
ZFS_AC_KERNEL_INSERT_INODE_LOCKED ZFS_AC_KERNEL_INSERT_INODE_LOCKED
ZFS_AC_KERNEL_DENTRY
ZFS_AC_KERNEL_TRUNCATE_SETSIZE ZFS_AC_KERNEL_TRUNCATE_SETSIZE
ZFS_AC_KERNEL_SECURITY_INODE ZFS_AC_KERNEL_SECURITY_INODE
ZFS_AC_KERNEL_FST_MOUNT ZFS_AC_KERNEL_FST_MOUNT

View File

@ -24,8 +24,6 @@ AC_DEFUN([ZFS_AC_CONFIG_ALWAYS_TOOLCHAIN_SIMD], [
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AES ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AES
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_PCLMULQDQ ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_PCLMULQDQ
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_MOVBE ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_MOVBE
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_VAES
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_VPCLMULQDQ
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_XSAVE ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_XSAVE
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_XSAVEOPT ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_XSAVEOPT
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_XSAVES ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_XSAVES
@ -448,48 +446,6 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_MOVBE], [
]) ])
]) ])
dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_VAES
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_VAES], [
AC_MSG_CHECKING([whether host toolchain supports VAES])
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
int main()
{
__asm__ __volatile__("vaesenc %ymm0, %ymm1, %ymm0");
return (0);
}
]])], [
AC_MSG_RESULT([yes])
AC_DEFINE([HAVE_VAES], 1, [Define if host toolchain supports VAES])
], [
AC_MSG_RESULT([no])
])
])
dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_VPCLMULQDQ
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_VPCLMULQDQ], [
AC_MSG_CHECKING([whether host toolchain supports VPCLMULQDQ])
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
int main()
{
__asm__ __volatile__("vpclmulqdq %0, %%ymm4, %%ymm3, %%ymm5" :: "i"(0));
return (0);
}
]])], [
AC_MSG_RESULT([yes])
AC_DEFINE([HAVE_VPCLMULQDQ], 1, [Define if host toolchain supports VPCLMULQDQ])
], [
AC_MSG_RESULT([no])
])
])
dnl # dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_XSAVE dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_XSAVE
dnl # dnl #

View File

@ -2,7 +2,7 @@ dnl #
dnl # Check for statx() function and STATX_MNT_ID availability dnl # Check for statx() function and STATX_MNT_ID availability
dnl # dnl #
AC_DEFUN([ZFS_AC_CONFIG_USER_STATX], [ AC_DEFUN([ZFS_AC_CONFIG_USER_STATX], [
AC_CHECK_HEADERS([sys/stat.h], AC_CHECK_HEADERS([linux/stat.h],
[have_stat_headers=yes], [have_stat_headers=yes],
[have_stat_headers=no]) [have_stat_headers=no])
@ -14,7 +14,7 @@ AC_DEFUN([ZFS_AC_CONFIG_USER_STATX], [
AC_MSG_CHECKING([for STATX_MNT_ID]) AC_MSG_CHECKING([for STATX_MNT_ID])
AC_COMPILE_IFELSE([ AC_COMPILE_IFELSE([
AC_LANG_PROGRAM([[ AC_LANG_PROGRAM([[
#include <sys/stat.h> #include <linux/stat.h>
]], [[ ]], [[
struct statx stx; struct statx stx;
int mask = STATX_MNT_ID; int mask = STATX_MNT_ID;
@ -29,6 +29,6 @@ AC_DEFUN([ZFS_AC_CONFIG_USER_STATX], [
]) ])
]) ])
], [ ], [
AC_MSG_WARN([sys/stat.h not found; skipping statx support]) AC_MSG_WARN([linux/stat.h not found; skipping statx support])
]) ])
]) dnl end AC_DEFUN ]) dnl end AC_DEFUN

View File

@ -252,12 +252,10 @@ AC_DEFUN([ZFS_AC_CONFIG_ALWAYS], [
ZFS_AC_CONFIG_ALWAYS_CC_NO_CLOBBERED ZFS_AC_CONFIG_ALWAYS_CC_NO_CLOBBERED
ZFS_AC_CONFIG_ALWAYS_CC_INFINITE_RECURSION ZFS_AC_CONFIG_ALWAYS_CC_INFINITE_RECURSION
ZFS_AC_CONFIG_ALWAYS_KERNEL_CC_INFINITE_RECURSION ZFS_AC_CONFIG_ALWAYS_KERNEL_CC_INFINITE_RECURSION
ZFS_AC_CONFIG_ALWAYS_CC_NO_ATOMIC_ALIGNMENT
ZFS_AC_CONFIG_ALWAYS_CC_IMPLICIT_FALLTHROUGH ZFS_AC_CONFIG_ALWAYS_CC_IMPLICIT_FALLTHROUGH
ZFS_AC_CONFIG_ALWAYS_CC_FRAME_LARGER_THAN ZFS_AC_CONFIG_ALWAYS_CC_FRAME_LARGER_THAN
ZFS_AC_CONFIG_ALWAYS_CC_NO_FORMAT_TRUNCATION ZFS_AC_CONFIG_ALWAYS_CC_NO_FORMAT_TRUNCATION
ZFS_AC_CONFIG_ALWAYS_CC_NO_FORMAT_ZERO_LENGTH ZFS_AC_CONFIG_ALWAYS_CC_NO_FORMAT_ZERO_LENGTH
ZFS_AC_CONFIG_ALWAYS_KERNEL_CC_NO_FORMAT_ZERO_LENGTH
ZFS_AC_CONFIG_ALWAYS_CC_FORMAT_OVERFLOW ZFS_AC_CONFIG_ALWAYS_CC_FORMAT_OVERFLOW
ZFS_AC_CONFIG_ALWAYS_CC_NO_OMIT_FRAME_POINTER ZFS_AC_CONFIG_ALWAYS_CC_NO_OMIT_FRAME_POINTER
ZFS_AC_CONFIG_ALWAYS_CC_NO_IPA_SRA ZFS_AC_CONFIG_ALWAYS_CC_NO_IPA_SRA
@ -267,7 +265,6 @@ AC_DEFUN([ZFS_AC_CONFIG_ALWAYS], [
ZFS_AC_CONFIG_ALWAYS_TOOLCHAIN_SIMD ZFS_AC_CONFIG_ALWAYS_TOOLCHAIN_SIMD
ZFS_AC_CONFIG_ALWAYS_SYSTEM ZFS_AC_CONFIG_ALWAYS_SYSTEM
ZFS_AC_CONFIG_ALWAYS_ARCH ZFS_AC_CONFIG_ALWAYS_ARCH
ZFS_AC_CONFIG_CHECK_ARCH_VAR
ZFS_AC_CONFIG_ALWAYS_PYTHON ZFS_AC_CONFIG_ALWAYS_PYTHON
ZFS_AC_CONFIG_ALWAYS_PYZFS ZFS_AC_CONFIG_ALWAYS_PYZFS
ZFS_AC_CONFIG_ALWAYS_SED ZFS_AC_CONFIG_ALWAYS_SED

View File

@ -4,7 +4,7 @@ The detailed contributor information can be found in [2][3].
Files: contrib/debian/* Files: contrib/debian/*
Copyright: Copyright:
2013-2025, Aron Xu <aron@debian.org> 2013-2016, Aron Xu <aron@debian.org>
2016, Petter Reinholdtsen <pere@hungry.com> 2016, Petter Reinholdtsen <pere@hungry.com>
2013, Carlos Alberto Lopez Perez <clopez@igalia.com> 2013, Carlos Alberto Lopez Perez <clopez@igalia.com>
2013, Turbo Fredriksson <turbo@bayour.com> 2013, Turbo Fredriksson <turbo@bayour.com>
@ -12,8 +12,6 @@ Copyright:
2011-2013, Darik Horn <dajhorn@vanadac.com> 2011-2013, Darik Horn <dajhorn@vanadac.com>
2018-2019, Mo Zhou <cdluminate@gmail.com> 2018-2019, Mo Zhou <cdluminate@gmail.com>
2018-2020, Mo Zhou <lumin@debian.org> 2018-2020, Mo Zhou <lumin@debian.org>
2023-2024, Shengqi Chen <harry-chen@outlook.com>
2024-2025, Shengqi Chen <harry@debian.org>
License: GPL-2+ License: GPL-2+
[1] https://tracker.debian.org/pkg/zfs-linux [1] https://tracker.debian.org/pkg/zfs-linux

View File

@ -1,4 +1,4 @@
usr/bin/zarcsummary.py usr/bin/arc_summary.py
usr/share/zfs/zfs-helpers.sh usr/share/zfs/zfs-helpers.sh
etc/default/zfs etc/default/zfs
etc/init.d etc/init.d
@ -9,4 +9,4 @@ etc/zfs/vdev_id.conf.sas_direct.example
etc/zfs/vdev_id.conf.sas_switch.example etc/zfs/vdev_id.conf.sas_switch.example
etc/zfs/vdev_id.conf.scsi.example etc/zfs/vdev_id.conf.scsi.example
etc/zfs/zfs-functions etc/zfs/zfs-functions
usr/lib/systemd/system/zfs-import.service lib/systemd/system/zfs-import.service

View File

@ -1 +1 @@
usr/lib/@DEB_HOST_MULTIARCH@/libnvpair.so.* lib/@DEB_HOST_MULTIARCH@/libnvpair.so.*

View File

@ -1,2 +1,2 @@
usr/lib/*/security/pam_zfs_key.so lib/*/security/pam_zfs_key.so
usr/share/pam-configs/zfs_key usr/share/pam-configs/zfs_key

View File

@ -1,7 +1,7 @@
#!/bin/sh #!/bin/sh
set -e set -e
if ! $(ldd "/usr/lib/$(dpkg-architecture -qDEB_HOST_MULTIARCH)/security/pam_zfs_key.so" | grep -q "libasan") ; then if ! $(ldd "/lib/$(dpkg-architecture -qDEB_HOST_MULTIARCH)/security/pam_zfs_key.so" | grep -q "libasan") ; then
pam-auth-update --package pam-auth-update --package
fi fi

View File

@ -1 +1 @@
usr/lib/@DEB_HOST_MULTIARCH@/libuutil.so.* lib/@DEB_HOST_MULTIARCH@/libuutil.so.*

View File

@ -1,5 +1,3 @@
usr/lib/@DEB_HOST_MULTIARCH@/*.a lib/@DEB_HOST_MULTIARCH@/*.a usr/lib/@DEB_HOST_MULTIARCH@
usr/lib/@DEB_HOST_MULTIARCH@/*.so
usr/lib/@DEB_HOST_MULTIARCH@/pkgconfig
usr/include usr/include
usr/lib/@DEB_HOST_MULTIARCH@

View File

@ -1,2 +1,2 @@
usr/lib/@DEB_HOST_MULTIARCH@/libzfs.so.* lib/@DEB_HOST_MULTIARCH@/libzfs.so.*
usr/lib/@DEB_HOST_MULTIARCH@/libzfs_core.so.* lib/@DEB_HOST_MULTIARCH@/libzfs_core.so.*

View File

@ -1 +1 @@
usr/lib/@DEB_HOST_MULTIARCH@/libzfsbootenv.so.* lib/@DEB_HOST_MULTIARCH@/libzfsbootenv.so.*

View File

@ -1 +1 @@
usr/lib/@DEB_HOST_MULTIARCH@/libzpool.so.* lib/@DEB_HOST_MULTIARCH@/libzpool.so.*

View File

@ -1,4 +1,4 @@
usr/sbin/ztest sbin/ztest
usr/bin/raidz_test usr/bin/raidz_test
usr/share/man/man1/raidz_test.1 usr/share/man/man1/raidz_test.1
usr/share/man/man1/test-runner.1 usr/share/man/man1/test-runner.1

View File

@ -1,5 +1,5 @@
etc/zfs/zed.d/* etc/zfs/zed.d/*
usr/lib/systemd/system/zfs-zed.service lib/systemd/system/zfs-zed.service
usr/lib/zfs-linux/zed.d/* usr/lib/zfs-linux/zed.d/*
usr/sbin/zed usr/sbin/zed
usr/share/man/man8/zed.8 usr/share/man/man8/zed.8

View File

@ -1,48 +1,48 @@
etc/default/zfs etc/default/zfs
etc/zfs/zfs-functions etc/zfs/zfs-functions
etc/zfs/zpool.d/ etc/zfs/zpool.d/
usr/lib/systemd/system-generators/ lib/systemd/system-generators/
usr/lib/systemd/system-preset/ lib/systemd/system-preset/
usr/lib/systemd/system/zfs-import-cache.service lib/systemd/system/zfs-import-cache.service
usr/lib/systemd/system/zfs-import-scan.service lib/systemd/system/zfs-import-scan.service
usr/lib/systemd/system/zfs-import.target lib/systemd/system/zfs-import.target
usr/lib/systemd/system/zfs-load-key.service lib/systemd/system/zfs-load-key.service
usr/lib/systemd/system/zfs-mount.service lib/systemd/system/zfs-mount.service
usr/lib/systemd/system/zfs-mount@.service lib/systemd/system/zfs-mount@.service
usr/lib/systemd/system/zfs-scrub-monthly@.timer lib/systemd/system/zfs-scrub-monthly@.timer
usr/lib/systemd/system/zfs-scrub-weekly@.timer lib/systemd/system/zfs-scrub-weekly@.timer
usr/lib/systemd/system/zfs-scrub@.service lib/systemd/system/zfs-scrub@.service
usr/lib/systemd/system/zfs-trim-monthly@.timer lib/systemd/system/zfs-trim-monthly@.timer
usr/lib/systemd/system/zfs-trim-weekly@.timer lib/systemd/system/zfs-trim-weekly@.timer
usr/lib/systemd/system/zfs-trim@.service lib/systemd/system/zfs-trim@.service
usr/lib/systemd/system/zfs-share.service lib/systemd/system/zfs-share.service
usr/lib/systemd/system/zfs-volume-wait.service lib/systemd/system/zfs-volume-wait.service
usr/lib/systemd/system/zfs-volumes.target lib/systemd/system/zfs-volumes.target
usr/lib/systemd/system/zfs.target lib/systemd/system/zfs.target
usr/lib/udev/ lib/udev/
usr/sbin/fsck.zfs sbin/fsck.zfs
usr/sbin/mount.zfs sbin/mount.zfs
usr/sbin/zdb sbin/zdb
usr/sbin/zfs sbin/zfs
usr/sbin/zfs_ids_to_path sbin/zfs_ids_to_path
usr/sbin/zgenhostid sbin/zgenhostid
usr/sbin/zhack sbin/zhack
usr/sbin/zinject sbin/zinject
usr/sbin/zpool sbin/zpool
usr/sbin/zstream sbin/zstream
usr/sbin/zstreamdump sbin/zstreamdump
usr/bin/zvol_wait usr/bin/zvol_wait
usr/lib/modules-load.d/ usr/lib/modules-load.d/ lib/
usr/lib/zfs-linux/zpool.d/ usr/lib/zfs-linux/zpool.d/
usr/lib/zfs-linux/zpool_influxdb usr/lib/zfs-linux/zpool_influxdb
usr/lib/zfs-linux/zfs_prepare_disk usr/lib/zfs-linux/zfs_prepare_disk
usr/bin/zarcsummary usr/sbin/arc_summary
usr/bin/zarcstat usr/sbin/arcstat
usr/bin/dbufstat usr/sbin usr/sbin/dbufstat
usr/bin/zilstat usr/sbin/zilstat
usr/share/zfs/compatibility.d/ usr/share/zfs/compatibility.d/
usr/share/bash-completion/completions usr/share/bash-completion/completions
usr/share/man/man1/zarcstat.1 usr/share/man/man1/arcstat.1
usr/share/man/man1/zhack.1 usr/share/man/man1/zhack.1
usr/share/man/man1/zvol_wait.1 usr/share/man/man1/zvol_wait.1
usr/share/man/man5/ usr/share/man/man5/

View File

@ -1,3 +0,0 @@
usr/sbin/zfs usr/bin/zfs
usr/sbin/zpool usr/bin/zpool
usr/lib/zfs-linux/zpool_influxdb usr/bin/zpool_influxdb

View File

@ -37,19 +37,18 @@ override_dh_auto_configure:
@# Build the userland, but don't build the kernel modules. @# Build the userland, but don't build the kernel modules.
dh_auto_configure -- @CFGOPTS@ \ dh_auto_configure -- @CFGOPTS@ \
--bindir=/usr/bin \ --bindir=/usr/bin \
--sbindir=/usr/sbin \ --sbindir=/sbin \
--with-mounthelperdir=/usr/sbin \ --libdir=/lib/"$(DEB_HOST_MULTIARCH)" \
--libdir=/usr/lib/"$(DEB_HOST_MULTIARCH)" \ --with-udevdir=/lib/udev \
--with-udevdir=/usr/lib/udev \
--with-zfsexecdir=/usr/lib/zfs-linux \ --with-zfsexecdir=/usr/lib/zfs-linux \
--enable-systemd \ --enable-systemd \
--enable-pyzfs \ --enable-pyzfs \
--with-python=python3 \ --with-python=python3 \
--with-pammoduledir='/usr/lib/$(DEB_HOST_MULTIARCH)/security' \ --with-pammoduledir='/lib/$(DEB_HOST_MULTIARCH)/security' \
--with-pkgconfigdir='/usr/lib/$(DEB_HOST_MULTIARCH)/pkgconfig' \ --with-pkgconfigdir='/usr/lib/$(DEB_HOST_MULTIARCH)/pkgconfig' \
--with-systemdunitdir=/usr/lib/systemd/system \ --with-systemdunitdir=/lib/systemd/system \
--with-systemdpresetdir=/usr/lib/systemd/system-preset \ --with-systemdpresetdir=/lib/systemd/system-preset \
--with-systemdgeneratordir=/usr/lib/systemd/system-generators \ --with-systemdgeneratordir=/lib/systemd/system-generators \
--with-config=user --with-config=user
for i in $(wildcard $(CURDIR)/debian/*.install.in) ; do \ for i in $(wildcard $(CURDIR)/debian/*.install.in) ; do \
@ -78,6 +77,19 @@ override_dh_auto_install:
@# Install the utilities. @# Install the utilities.
$(MAKE) install DESTDIR='$(CURDIR)/debian/tmp' $(MAKE) install DESTDIR='$(CURDIR)/debian/tmp'
# Move from bin_dir to /usr/sbin
# Remove suffix (.py) as per policy 10.4 - Scripts
# https://www.debian.org/doc/debian-policy/ch-files.html#s-scripts
mkdir -p '$(CURDIR)/debian/tmp/usr/sbin/'
mv '$(CURDIR)/debian/tmp/usr/bin/arc_summary' '$(CURDIR)/debian/tmp/usr/sbin/arc_summary'
mv '$(CURDIR)/debian/tmp/usr/bin/arcstat' '$(CURDIR)/debian/tmp/usr/sbin/arcstat'
mv '$(CURDIR)/debian/tmp/usr/bin/dbufstat' '$(CURDIR)/debian/tmp/usr/sbin/dbufstat'
mv '$(CURDIR)/debian/tmp/usr/bin/zilstat' '$(CURDIR)/debian/tmp/usr/sbin/zilstat'
@# Zed has dependencies outside of the system root.
mv '$(CURDIR)/debian/tmp/sbin/zed' '$(CURDIR)/debian/tmp/usr/sbin/zed'
sed -i 's|ExecStart=/sbin/|ExecStart=/usr/sbin/|g' '$(CURDIR)/debian/tmp/lib/systemd/system/zfs-zed.service'
@# Install the DKMS source. @# Install the DKMS source.
@# We only want the files needed to build the modules @# We only want the files needed to build the modules
install -D -t '$(CURDIR)/debian/tmp/usr/src/$(NAME)-$(DEB_VERSION_UPSTREAM)/scripts' \ install -D -t '$(CURDIR)/debian/tmp/usr/src/$(NAME)-$(DEB_VERSION_UPSTREAM)/scripts' \
@ -119,6 +131,11 @@ override_dh_auto_install:
cd '$(CURDIR)/debian/tmp/usr/src/$(NAME)-$(DEB_VERSION_UPSTREAM)'; ./autogen.sh cd '$(CURDIR)/debian/tmp/usr/src/$(NAME)-$(DEB_VERSION_UPSTREAM)'; ./autogen.sh
rm -fr '$(CURDIR)/debian/tmp/usr/src/$(NAME)-$(DEB_VERSION_UPSTREAM)/autom4te.cache' rm -fr '$(CURDIR)/debian/tmp/usr/src/$(NAME)-$(DEB_VERSION_UPSTREAM)/autom4te.cache'
for i in `ls $(CURDIR)/debian/tmp/lib/$(DEB_HOST_MULTIARCH)/*.so`; do \
ln -s '/lib/$(DEB_HOST_MULTIARCH)/'`readlink $${i}` '$(CURDIR)/debian/tmp/usr/lib/$(DEB_HOST_MULTIARCH)/'`basename $${i}`; \
rm $${i}; \
done
chmod a-x '$(CURDIR)/debian/tmp/etc/zfs/zfs-functions' chmod a-x '$(CURDIR)/debian/tmp/etc/zfs/zfs-functions'
chmod a-x '$(CURDIR)/debian/tmp/etc/default/zfs' chmod a-x '$(CURDIR)/debian/tmp/etc/default/zfs'
@ -142,7 +159,7 @@ override_dh_auto_clean:
@if test -e META.orig; then mv META.orig META; fi @if test -e META.orig; then mv META.orig META; fi
override_dh_install: override_dh_install:
find debian/tmp/usr/lib -name '*.la' -delete find debian/tmp/lib -name '*.la' -delete
dh_install dh_install
override_dh_missing: override_dh_missing:
@ -156,8 +173,8 @@ override_dh_installinit:
dh_installinit -R --name zfs-zed dh_installinit -R --name zfs-zed
override_dh_installsystemd: override_dh_installsystemd:
mkdir -p debian/openzfs-zfsutils/usr/lib/systemd/system mkdir -p debian/openzfs-zfsutils/lib/systemd/system
ln -sr /dev/null debian/openzfs-zfsutils/usr/lib/systemd/system/zfs-import.service ln -sr /dev/null debian/openzfs-zfsutils/lib/systemd/system/zfs-import.service
dh_installsystemd --no-stop-on-upgrade -X zfs-zed.service dh_installsystemd --no-stop-on-upgrade -X zfs-zed.service
dh_installsystemd --name zfs-zed dh_installsystemd --name zfs-zed

View File

@ -5,7 +5,7 @@
PREREQ="udev" PREREQ="udev"
PREREQ_UDEV_RULES="60-zvol.rules 69-vdev.rules" PREREQ_UDEV_RULES="60-zvol.rules 69-vdev.rules"
COPY_EXEC_LIST="/usr/lib/udev/zvol_id /usr/lib/udev/vdev_id" COPY_EXEC_LIST="/lib/udev/zvol_id /lib/udev/vdev_id"
# Generic result code. # Generic result code.
RC=0 RC=0

View File

@ -16,8 +16,7 @@ depends() {
} }
installkernel() { installkernel() {
hostonly='' instmods -c zfs instmods -c zfs
instmods mpt3sas virtio_blk
} }
install() { install() {

View File

@ -1,253 +0,0 @@
BoringSSL is a fork of OpenSSL. As such, large parts of it fall under OpenSSL
licensing. Files that are completely new have a Google copyright and an ISC
license. This license is reproduced at the bottom of this file.
Contributors to BoringSSL are required to follow the CLA rules for Chromium:
https://cla.developers.google.com/clas
Files in third_party/ have their own licenses, as described therein. The MIT
license, for third_party/fiat, which, unlike other third_party directories, is
compiled into non-test libraries, is included below.
The OpenSSL toolkit stays under a dual license, i.e. both the conditions of the
OpenSSL License and the original SSLeay license apply to the toolkit. See below
for the actual license texts. Actually both licenses are BSD-style Open Source
licenses. In case of any license issues related to OpenSSL please contact
openssl-core@openssl.org.
The following are Google-internal bug numbers where explicit permission from
some authors is recorded for use of their work. (This is purely for our own
record keeping.)
27287199
27287880
27287883
263291445
OpenSSL License
---------------
/* ====================================================================
* Copyright (c) 1998-2011 The OpenSSL Project. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. All advertising materials mentioning features or use of this
* software must display the following acknowledgment:
* "This product includes software developed by the OpenSSL Project
* for use in the OpenSSL Toolkit. (http://www.openssl.org/)"
*
* 4. The names "OpenSSL Toolkit" and "OpenSSL Project" must not be used to
* endorse or promote products derived from this software without
* prior written permission. For written permission, please contact
* openssl-core@openssl.org.
*
* 5. Products derived from this software may not be called "OpenSSL"
* nor may "OpenSSL" appear in their names without prior written
* permission of the OpenSSL Project.
*
* 6. Redistributions of any form whatsoever must retain the following
* acknowledgment:
* "This product includes software developed by the OpenSSL Project
* for use in the OpenSSL Toolkit (http://www.openssl.org/)"
*
* THIS SOFTWARE IS PROVIDED BY THE OpenSSL PROJECT ``AS IS'' AND ANY
* EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE OpenSSL PROJECT OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
* NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
* STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
* ====================================================================
*
* This product includes cryptographic software written by Eric Young
* (eay@cryptsoft.com). This product includes software written by Tim
* Hudson (tjh@cryptsoft.com).
*
*/
Original SSLeay License
-----------------------
/* Copyright (C) 1995-1998 Eric Young (eay@cryptsoft.com)
* All rights reserved.
*
* This package is an SSL implementation written
* by Eric Young (eay@cryptsoft.com).
* The implementation was written so as to conform with Netscapes SSL.
*
* This library is free for commercial and non-commercial use as long as
* the following conditions are aheared to. The following conditions
* apply to all code found in this distribution, be it the RC4, RSA,
* lhash, DES, etc., code; not just the SSL code. The SSL documentation
* included with this distribution is covered by the same copyright terms
* except that the holder is Tim Hudson (tjh@cryptsoft.com).
*
* Copyright remains Eric Young's, and as such any Copyright notices in
* the code are not to be removed.
* If this package is used in a product, Eric Young should be given attribution
* as the author of the parts of the library used.
* This can be in the form of a textual message at program startup or
* in documentation (online or textual) provided with the package.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. All advertising materials mentioning features or use of this software
* must display the following acknowledgement:
* "This product includes cryptographic software written by
* Eric Young (eay@cryptsoft.com)"
* The word 'cryptographic' can be left out if the rouines from the library
* being used are not cryptographic related :-).
* 4. If you include any Windows specific code (or a derivative thereof) from
* the apps directory (application code) you must include an acknowledgement:
* "This product includes software written by Tim Hudson (tjh@cryptsoft.com)"
*
* THIS SOFTWARE IS PROVIDED BY ERIC YOUNG ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* The licence and distribution terms for any publically available version or
* derivative of this code cannot be changed. i.e. this code cannot simply be
* copied and put under another distribution licence
* [including the GNU Public Licence.]
*/
ISC license used for completely new code in BoringSSL:
/* Copyright 2015 The BoringSSL Authors
*
* Permission to use, copy, modify, and/or distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
* SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION
* OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
* CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */
The code in third_party/fiat carries the MIT license:
Copyright (c) 2015-2016 the fiat-crypto authors (see
https://github.com/mit-plv/fiat-crypto/blob/master/AUTHORS).
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Licenses for support code
-------------------------
Parts of the TLS test suite are under the Go license. This code is not included
in BoringSSL (i.e. libcrypto and libssl) when compiled, however, so
distributing code linked against BoringSSL does not trigger this license:
Copyright (c) 2009 The Go Authors. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.
* Neither the name of Google Inc. nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
BoringSSL uses the Chromium test infrastructure to run a continuous build,
trybots etc. The scripts which manage this, and the script for generating build
metadata, are under the Chromium license. Distributing code linked against
BoringSSL does not trigger this license.
Copyright 2015 The Chromium Authors. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.
* Neither the name of Google Inc. nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

View File

@ -1,11 +0,0 @@
This directory contains the original BoringSSL [1] GCM x86-64 assembly
files [2].
The assembler files where then further modified to fit the ICP conventions.
The main purpose to include these files (and the original ones) here, is to
serve as a reference if upstream changes need to be applied to the files
included and modified in the ICP.
[1] https://github.com/google/boringssl
[2] https://github.com/google/boringssl/blob/d5440dd2c2c500ac2d3bba4afec47a054b4d99ae/gen/bcm/aes-gcm-avx2-x86_64-linux.S

File diff suppressed because it is too large Load Diff

View File

@ -8,12 +8,3 @@ fi
. /usr/share/initramfs-tools/hook-functions . /usr/share/initramfs-tools/hook-functions
copy_exec /usr/share/initramfs-tools/zfsunlock /usr/bin/zfsunlock copy_exec /usr/share/initramfs-tools/zfsunlock /usr/bin/zfsunlock
if [ -f /etc/initramfs-tools/etc/motd ]; then
copy_file text /etc/initramfs-tools/etc/motd /etc/motd
else
tmpf=$(mktemp)
echo "If you use zfs encrypted root filesystems, you can use \`zfsunlock\` to manually unlock it" > "$tmpf"
copy_file text "$tmpf" /etc/motd
rm -f "$tmpf"
fi

View File

@ -979,8 +979,7 @@ mountroot()
touch /run/zfs_unlock_complete touch /run/zfs_unlock_complete
if [ -e /run/zfs_unlock_complete_notify ]; then if [ -e /run/zfs_unlock_complete_notify ]; then
# shellcheck disable=SC2034 read -r < /run/zfs_unlock_complete_notify
read -r zfs_unlock_complete_notify < /run/zfs_unlock_complete_notify
fi fi
# ------------ # ------------

View File

@ -391,11 +391,7 @@ static int
zfs_key_config_load(pam_handle_t *pamh, zfs_key_config_t *config, zfs_key_config_load(pam_handle_t *pamh, zfs_key_config_t *config,
int argc, const char **argv) int argc, const char **argv)
{ {
#if defined(__FreeBSD__)
config->homes_prefix = strdup("zroot/home");
#else
config->homes_prefix = strdup("rpool/home"); config->homes_prefix = strdup("rpool/home");
#endif
if (config->homes_prefix == NULL) { if (config->homes_prefix == NULL) {
pam_syslog(pamh, LOG_ERR, "strdup failure"); pam_syslog(pamh, LOG_ERR, "strdup failure");
return (PAM_SERVICE_ERR); return (PAM_SERVICE_ERR);

View File

@ -604,4 +604,5 @@ class RaidzExpansionRunning(ZFSError):
errno = ZFS_ERR_RAIDZ_EXPAND_IN_PROGRESS errno = ZFS_ERR_RAIDZ_EXPAND_IN_PROGRESS
message = "A raidz device is currently expanding" message = "A raidz device is currently expanding"
# vim: softtabstop=4 tabstop=4 expandtab shiftwidth=4 # vim: softtabstop=4 tabstop=4 expandtab shiftwidth=4

View File

@ -4223,7 +4223,7 @@ class _TempPool(object):
self.getRoot().reset() self.getRoot().reset()
return return
# On the CI builders this may fail with "pool is busy" # On the Buildbot builders this may fail with "pool is busy"
# Retry 5 times before raising an error # Retry 5 times before raising an error
retry = 0 retry = 0
while True: while True:

View File

@ -10,7 +10,6 @@ COMMON_H = \
cityhash.h \ cityhash.h \
zfeature_common.h \ zfeature_common.h \
zfs_comutil.h \ zfs_comutil.h \
zfs_crrd.h \
zfs_deleg.h \ zfs_deleg.h \
zfs_fletcher.h \ zfs_fletcher.h \
zfs_namecheck.h \ zfs_namecheck.h \
@ -70,6 +69,7 @@ COMMON_H = \
sys/metaslab_impl.h \ sys/metaslab_impl.h \
sys/mmp.h \ sys/mmp.h \
sys/mntent.h \ sys/mntent.h \
sys/mod.h \
sys/multilist.h \ sys/multilist.h \
sys/nvpair.h \ sys/nvpair.h \
sys/nvpair_impl.h \ sys/nvpair_impl.h \

View File

@ -30,7 +30,6 @@
* Copyright (c) 2017 Open-E, Inc. All Rights Reserved. * Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
* Copyright (c) 2019 Datto Inc. * Copyright (c) 2019 Datto Inc.
* Copyright (c) 2021, Colm Buckley <colm@tuatha.org> * Copyright (c) 2021, Colm Buckley <colm@tuatha.org>
* Copyright (c) 2025 Hewlett Packard Enterprise Development LP.
*/ */
#ifndef _LIBZFS_H #ifndef _LIBZFS_H
@ -289,22 +288,10 @@ typedef struct trimflags {
uint64_t rate; uint64_t rate;
} trimflags_t; } trimflags_t;
typedef struct trim_cbdata {
trimflags_t trim_flags;
pool_trim_func_t cmd_type;
} trim_cbdata_t;
typedef struct initialize_cbdata {
boolean_t wait;
pool_initialize_func_t cmd_type;
} initialize_cbdata_t;
/* /*
* Functions to manipulate pool and vdev state * Functions to manipulate pool and vdev state
*/ */
_LIBZFS_H int zpool_scan(zpool_handle_t *, pool_scan_func_t, pool_scrub_cmd_t); _LIBZFS_H int zpool_scan(zpool_handle_t *, pool_scan_func_t, pool_scrub_cmd_t);
_LIBZFS_H int zpool_scan_range(zpool_handle_t *, pool_scan_func_t,
pool_scrub_cmd_t, time_t, time_t);
_LIBZFS_H int zpool_initialize_one(zpool_handle_t *, void *);
_LIBZFS_H int zpool_initialize(zpool_handle_t *, pool_initialize_func_t, _LIBZFS_H int zpool_initialize(zpool_handle_t *, pool_initialize_func_t,
nvlist_t *); nvlist_t *);
_LIBZFS_H int zpool_initialize_wait(zpool_handle_t *, pool_initialize_func_t, _LIBZFS_H int zpool_initialize_wait(zpool_handle_t *, pool_initialize_func_t,
@ -317,9 +304,7 @@ _LIBZFS_H int zpool_reguid(zpool_handle_t *);
_LIBZFS_H int zpool_set_guid(zpool_handle_t *, const uint64_t *); _LIBZFS_H int zpool_set_guid(zpool_handle_t *, const uint64_t *);
_LIBZFS_H int zpool_reopen_one(zpool_handle_t *, void *); _LIBZFS_H int zpool_reopen_one(zpool_handle_t *, void *);
_LIBZFS_H void zpool_collect_leaves(zpool_handle_t *, nvlist_t *, nvlist_t *);
_LIBZFS_H int zpool_sync_one(zpool_handle_t *, void *); _LIBZFS_H int zpool_sync_one(zpool_handle_t *, void *);
_LIBZFS_H int zpool_trim_one(zpool_handle_t *, void *);
_LIBZFS_H int zpool_ddt_prune(zpool_handle_t *, zpool_ddt_prune_unit_t, _LIBZFS_H int zpool_ddt_prune(zpool_handle_t *, zpool_ddt_prune_unit_t,
uint64_t); uint64_t);
@ -869,7 +854,7 @@ _LIBZFS_H uint64_t zvol_volsize_to_reservation(zpool_handle_t *, uint64_t,
nvlist_t *); nvlist_t *);
typedef int (*zfs_userspace_cb_t)(void *arg, const char *domain, typedef int (*zfs_userspace_cb_t)(void *arg, const char *domain,
uid_t rid, uint64_t space, uint64_t default_quota); uid_t rid, uint64_t space);
_LIBZFS_H int zfs_userspace(zfs_handle_t *, zfs_userquota_prop_t, _LIBZFS_H int zfs_userspace(zfs_handle_t *, zfs_userquota_prop_t,
zfs_userspace_cb_t, void *); zfs_userspace_cb_t, void *);

View File

@ -33,7 +33,7 @@ noinst_HEADERS = \
%D%/spl/sys/list_impl.h \ %D%/spl/sys/list_impl.h \
%D%/spl/sys/lock.h \ %D%/spl/sys/lock.h \
%D%/spl/sys/misc.h \ %D%/spl/sys/misc.h \
%D%/spl/sys/mod.h \ %D%/spl/sys/mod_os.h \
%D%/spl/sys/mode.h \ %D%/spl/sys/mode.h \
%D%/spl/sys/mount.h \ %D%/spl/sys/mount.h \
%D%/spl/sys/mutex.h \ %D%/spl/sys/mutex.h \

View File

@ -69,10 +69,6 @@
#define __maybe_unused __attribute__((unused)) #define __maybe_unused __attribute__((unused))
#endif #endif
#ifndef __must_check
#define __must_check __attribute__((__warn_unused_result__))
#endif
/* /*
* Without this, we see warnings from objtool during normal Linux builds when * Without this, we see warnings from objtool during normal Linux builds when
* the kernel is built with CONFIG_STACK_VALIDATION=y: * the kernel is built with CONFIG_STACK_VALIDATION=y:

View File

@ -77,8 +77,8 @@ do_thread_create(caddr_t stk, size_t stksize, void (*proc)(void *), void *arg,
/* /*
* Be sure there are no surprises. * Be sure there are no surprises.
*/ */
ASSERT0P(stk); ASSERT(stk == NULL);
ASSERT0(len); ASSERT(len == 0);
ASSERT(state == TS_RUN); ASSERT(state == TS_RUN);
if (pp == &p0) if (pp == &p0)

View File

@ -62,17 +62,6 @@ typedef longlong_t hrtime_t;
#define SEC_TO_TICK(sec) ((sec) * hz) #define SEC_TO_TICK(sec) ((sec) * hz)
#define NSEC_TO_TICK(nsec) ((nsec) / (NANOSEC / hz)) #define NSEC_TO_TICK(nsec) ((nsec) / (NANOSEC / hz))
static __inline hrtime_t
getlrtime(void)
{
struct timespec ts;
hrtime_t nsec;
getnanouptime(&ts);
nsec = ((hrtime_t)ts.tv_sec * NANOSEC) + ts.tv_nsec;
return (nsec);
}
static __inline hrtime_t static __inline hrtime_t
gethrtime(void) gethrtime(void)
{ {

View File

@ -227,7 +227,6 @@ struct taskq;
#define LOOKUP_XATTR 0x02 /* lookup up extended attr dir */ #define LOOKUP_XATTR 0x02 /* lookup up extended attr dir */
#define CREATE_XATTR_DIR 0x04 /* Create extended attr dir */ #define CREATE_XATTR_DIR 0x04 /* Create extended attr dir */
#define LOOKUP_HAVE_SYSATTR_DIR 0x08 /* Already created virtual GFS dir */ #define LOOKUP_HAVE_SYSATTR_DIR 0x08 /* Already created virtual GFS dir */
#define LOOKUP_NAMED_ATTR 0x10 /* Lookup a named attribute */
/* /*
* Public vnode manipulation functions. * Public vnode manipulation functions.

View File

@ -96,12 +96,6 @@ struct zfsvfs {
uint64_t z_groupobjquota_obj; uint64_t z_groupobjquota_obj;
uint64_t z_projectquota_obj; uint64_t z_projectquota_obj;
uint64_t z_projectobjquota_obj; uint64_t z_projectobjquota_obj;
uint64_t z_defaultuserquota;
uint64_t z_defaultgroupquota;
uint64_t z_defaultprojectquota;
uint64_t z_defaultuserobjquota;
uint64_t z_defaultgroupobjquota;
uint64_t z_defaultprojectobjquota;
uint64_t z_replay_eof; /* New end of file - replay only */ uint64_t z_replay_eof; /* New end of file - replay only */
sa_attr_type_t *z_attr_table; /* SA attr mapping->id */ sa_attr_type_t *z_attr_table; /* SA attr mapping->id */
#define ZFS_OBJ_MTX_SZ 64 #define ZFS_OBJ_MTX_SZ 64
@ -232,8 +226,6 @@ extern boolean_t zfs_is_readonly(zfsvfs_t *zfsvfs);
extern int zfs_get_temporary_prop(struct dsl_dataset *ds, zfs_prop_t zfs_prop, extern int zfs_get_temporary_prop(struct dsl_dataset *ds, zfs_prop_t zfs_prop,
uint64_t *val, char *setpoint); uint64_t *val, char *setpoint);
extern int zfs_busy(void); extern int zfs_busy(void);
extern int zfs_set_default_quota(zfsvfs_t *zfsvfs, zfs_prop_t zfs_prop,
uint64_t quota);
#ifdef __cplusplus #ifdef __cplusplus
} }

View File

@ -75,7 +75,7 @@ kernel_spl_sys_HEADERS = \
%D%/spl/sys/kstat.h \ %D%/spl/sys/kstat.h \
%D%/spl/sys/list.h \ %D%/spl/sys/list.h \
%D%/spl/sys/misc.h \ %D%/spl/sys/misc.h \
%D%/spl/sys/mod.h \ %D%/spl/sys/mod_os.h \
%D%/spl/sys/mutex.h \ %D%/spl/sys/mutex.h \
%D%/spl/sys/param.h \ %D%/spl/sys/param.h \
%D%/spl/sys/proc.h \ %D%/spl/sys/proc.h \

View File

@ -542,6 +542,24 @@ blk_generic_alloc_queue(make_request_fn make_request, int node_id)
} }
#endif /* !HAVE_SUBMIT_BIO_IN_BLOCK_DEVICE_OPERATIONS */ #endif /* !HAVE_SUBMIT_BIO_IN_BLOCK_DEVICE_OPERATIONS */
/*
* All the io_*() helper functions below can operate on a bio, or a rq, but
* not both. The older submit_bio() codepath will pass a bio, and the
* newer blk-mq codepath will pass a rq.
*/
static inline int
io_data_dir(struct bio *bio, struct request *rq)
{
if (rq != NULL) {
if (op_is_write(req_op(rq))) {
return (WRITE);
} else {
return (READ);
}
}
return (bio_data_dir(bio));
}
static inline int static inline int
io_is_flush(struct bio *bio, struct request *rq) io_is_flush(struct bio *bio, struct request *rq)
{ {

View File

@ -60,6 +60,32 @@
} while (0) } while (0)
#endif #endif
/*
* 2.6.30 API change,
* The const keyword was added to the 'struct dentry_operations' in
* the dentry structure. To handle this we define an appropriate
* dentry_operations_t typedef which can be used.
*/
typedef const struct dentry_operations dentry_operations_t;
/*
* 2.6.38 API addition,
* Added d_clear_d_op() helper function which clears some flags and the
* registered dentry->d_op table. This is required because d_set_d_op()
* issues a warning when the dentry operations table is already set.
* For the .zfs control directory to work properly we must be able to
* override the default operations table and register custom .d_automount
* and .d_revalidate callbacks.
*/
static inline void
d_clear_d_op(struct dentry *dentry)
{
dentry->d_op = NULL;
dentry->d_flags &= ~(
DCACHE_OP_HASH | DCACHE_OP_COMPARE |
DCACHE_OP_REVALIDATE | DCACHE_OP_DELETE);
}
/* /*
* Walk and invalidate all dentry aliases of an inode * Walk and invalidate all dentry aliases of an inode
* unless it's a mountpoint * unless it's a mountpoint

View File

@ -63,7 +63,6 @@ enum scope_prefix_types {
zfs_vdev_disk, zfs_vdev_disk,
zfs_vdev_file, zfs_vdev_file,
zfs_vdev_mirror, zfs_vdev_mirror,
zfs_vol,
zfs_vnops, zfs_vnops,
zfs_zevent, zfs_zevent,
zfs_zio, zfs_zio,

View File

@ -597,32 +597,6 @@ zfs_movbe_available(void)
#endif #endif
} }
/*
* Check if VAES instruction set is available
*/
static inline boolean_t
zfs_vaes_available(void)
{
#if defined(X86_FEATURE_VAES)
return (!!boot_cpu_has(X86_FEATURE_VAES));
#else
return (B_FALSE);
#endif
}
/*
* Check if VPCLMULQDQ instruction set is available
*/
static inline boolean_t
zfs_vpclmulqdq_available(void)
{
#if defined(X86_FEATURE_VPCLMULQDQ)
return (!!boot_cpu_has(X86_FEATURE_VPCLMULQDQ));
#else
return (B_FALSE);
#endif
}
/* /*
* Check if SHA_NI instruction set is available * Check if SHA_NI instruction set is available
*/ */

View File

@ -71,22 +71,6 @@ atomic_cas_ptr(volatile void *target, void *cmp, void *newval)
return ((void *)atomic_cas_64((volatile uint64_t *)target, return ((void *)atomic_cas_64((volatile uint64_t *)target,
(uint64_t)cmp, (uint64_t)newval)); (uint64_t)cmp, (uint64_t)newval));
} }
static __inline__ void *
atomic_swap_ptr(volatile void *target, void *newval)
{
return ((void *)atomic_swap_64((volatile uint64_t *)target,
(uint64_t)newval));
}
static __inline__ void *
atomic_load_ptr(volatile void *target)
{
return ((void *)atomic_load_64((volatile uint64_t *)target));
}
static __inline__ void
atomic_store_ptr(volatile void *target, void *newval)
{
atomic_store_64((volatile uint64_t *)target, (uint64_t)newval);
}
#else /* _LP64 */ #else /* _LP64 */
static __inline__ void * static __inline__ void *
atomic_cas_ptr(volatile void *target, void *cmp, void *newval) atomic_cas_ptr(volatile void *target, void *cmp, void *newval)
@ -94,22 +78,6 @@ atomic_cas_ptr(volatile void *target, void *cmp, void *newval)
return ((void *)atomic_cas_32((volatile uint32_t *)target, return ((void *)atomic_cas_32((volatile uint32_t *)target,
(uint32_t)cmp, (uint32_t)newval)); (uint32_t)cmp, (uint32_t)newval));
} }
static __inline__ void *
atomic_swap_ptr(volatile void *target, void *newval)
{
return ((void *)atomic_swap_32((volatile uint32_t *)target,
(uint32_t)newval));
}
static __inline__ void *
atomic_load_ptr(volatile void *target)
{
return ((void *)atomic_load_32((volatile uint32_t *)target));
}
static __inline__ void
atomic_store_ptr(volatile void *target, void *newval)
{
atomic_store_32((volatile uint32_t *)target, (uint32_t)newval);
}
#endif /* _LP64 */ #endif /* _LP64 */
#endif /* _SPL_ATOMIC_H */ #endif /* _SPL_ATOMIC_H */

View File

@ -69,10 +69,6 @@
#define __maybe_unused __attribute__((unused)) #define __maybe_unused __attribute__((unused))
#endif #endif
#ifndef __must_check
#define __must_check __attribute__((__warn_unused_result__))
#endif
/* /*
* Without this, we see warnings from objtool during normal Linux builds when * Without this, we see warnings from objtool during normal Linux builds when
* the kernel is built with CONFIG_STACK_VALIDATION=y: * the kernel is built with CONFIG_STACK_VALIDATION=y:

View File

@ -61,7 +61,7 @@ void *spl_kvmalloc(size_t size, gfp_t flags);
/* /*
* Convert a KM_* flags mask to its Linux GFP_* counterpart. The conversion * Convert a KM_* flags mask to its Linux GFP_* counterpart. The conversion
* function is context aware which means that KM_SLEEP allocations can be * function is context aware which means that KM_SLEEP allocations can be
* safely used in syncing contexts which have set SPL_FSTRANS. * safely used in syncing contexts which have set PF_FSTRANS.
*/ */
static inline gfp_t static inline gfp_t
kmem_flags_convert(int flags) kmem_flags_convert(int flags)
@ -91,11 +91,25 @@ typedef struct {
} fstrans_cookie_t; } fstrans_cookie_t;
/* /*
* SPL_FSTRANS is the set of flags that indicate that the task is in a * Introduced in Linux 3.9, however this cannot be solely relied on before
* filesystem or IO codepath, and so any allocation must not call back into * Linux 3.18 as it doesn't turn off __GFP_FS as it should.
* those codepaths (eg to swap).
*/ */
#define SPL_FSTRANS (PF_MEMALLOC_NOIO) #ifdef PF_MEMALLOC_NOIO
#define __SPL_PF_MEMALLOC_NOIO (PF_MEMALLOC_NOIO)
#else
#define __SPL_PF_MEMALLOC_NOIO (0)
#endif
/*
* PF_FSTRANS is removed from Linux 4.12
*/
#ifdef PF_FSTRANS
#define __SPL_PF_FSTRANS (PF_FSTRANS)
#else
#define __SPL_PF_FSTRANS (0)
#endif
#define SPL_FSTRANS (__SPL_PF_FSTRANS|__SPL_PF_MEMALLOC_NOIO)
static inline fstrans_cookie_t static inline fstrans_cookie_t
spl_fstrans_mark(void) spl_fstrans_mark(void)
@ -127,8 +141,43 @@ spl_fstrans_check(void)
return (current->flags & SPL_FSTRANS); return (current->flags & SPL_FSTRANS);
} }
/*
* specifically used to check PF_FSTRANS flag, cannot be relied on for
* checking spl_fstrans_mark().
*/
static inline int
__spl_pf_fstrans_check(void)
{
return (current->flags & __SPL_PF_FSTRANS);
}
/*
* Kernel compatibility for GFP flags
*/
/* < 4.13 */
#ifndef __GFP_RETRY_MAYFAIL
#define __GFP_RETRY_MAYFAIL __GFP_REPEAT
#endif
/* < 4.4 */
#ifndef __GFP_RECLAIM
#define __GFP_RECLAIM __GFP_WAIT
#endif
#ifdef HAVE_ATOMIC64_T
#define kmem_alloc_used_add(size) atomic64_add(size, &kmem_alloc_used)
#define kmem_alloc_used_sub(size) atomic64_sub(size, &kmem_alloc_used)
#define kmem_alloc_used_read() atomic64_read(&kmem_alloc_used)
#define kmem_alloc_used_set(size) atomic64_set(&kmem_alloc_used, size)
extern atomic64_t kmem_alloc_used; extern atomic64_t kmem_alloc_used;
extern uint64_t kmem_alloc_max; extern unsigned long long kmem_alloc_max;
#else /* HAVE_ATOMIC64_T */
#define kmem_alloc_used_add(size) atomic_add(size, &kmem_alloc_used)
#define kmem_alloc_used_sub(size) atomic_sub(size, &kmem_alloc_used)
#define kmem_alloc_used_read() atomic_read(&kmem_alloc_used)
#define kmem_alloc_used_set(size) atomic_set(&kmem_alloc_used, size)
extern atomic_t kmem_alloc_used;
extern unsigned long long kmem_alloc_max;
#endif /* HAVE_ATOMIC64_T */
extern unsigned int spl_kmem_alloc_warn; extern unsigned int spl_kmem_alloc_warn;
extern unsigned int spl_kmem_alloc_max; extern unsigned int spl_kmem_alloc_max;

View File

@ -21,10 +21,6 @@
* You should have received a copy of the GNU General Public License along * You should have received a copy of the GNU General Public License along
* with the SPL. If not, see <http://www.gnu.org/licenses/>. * with the SPL. If not, see <http://www.gnu.org/licenses/>.
*/ */
/*
* Copyright (c) 2024-2025, Klara, Inc.
* Copyright (c) 2024-2025, Syneto
*/
#ifndef _SPL_KSTAT_H #ifndef _SPL_KSTAT_H
#define _SPL_KSTAT_H #define _SPL_KSTAT_H
@ -94,8 +90,6 @@ typedef struct kstat_module {
struct list_head ksm_module_list; /* module linkage */ struct list_head ksm_module_list; /* module linkage */
struct list_head ksm_kstat_list; /* list of kstat entries */ struct list_head ksm_kstat_list; /* list of kstat entries */
struct proc_dir_entry *ksm_proc; /* proc entry */ struct proc_dir_entry *ksm_proc; /* proc entry */
struct kstat_module *ksm_parent; /* parent module in hierarchy */
uint_t ksm_nchildren; /* number of child modules */
} kstat_module_t; } kstat_module_t;
typedef struct kstat_raw_ops { typedef struct kstat_raw_ops {

View File

@ -111,7 +111,7 @@ spl_mutex_lockdep_on_maybe(kmutex_t *mp) \
#undef mutex_destroy #undef mutex_destroy
#define mutex_destroy(mp) \ #define mutex_destroy(mp) \
{ \ { \
VERIFY0P(mutex_owner(mp)); \ VERIFY3P(mutex_owner(mp), ==, NULL); \
} }
#define mutex_tryenter(mp) \ #define mutex_tryenter(mp) \

View File

@ -130,7 +130,7 @@ RW_READ_HELD(krwlock_t *rwp)
/* /*
* The Linux rwsem implementation does not require a matching destroy. * The Linux rwsem implementation does not require a matching destroy.
*/ */
#define rw_destroy(rwp) ASSERT(!(RW_LOCK_HELD(rwp))) #define rw_destroy(rwp) ((void) 0)
/* /*
* Upgrading a rwsem from a reader to a writer is not supported by the * Upgrading a rwsem from a reader to a writer is not supported by the

View File

@ -25,6 +25,6 @@
#ifndef _SPL_STAT_H #ifndef _SPL_STAT_H
#define _SPL_STAT_H #define _SPL_STAT_H
#include <sys/stat.h> #include <linux/stat.h>
#endif /* SPL_STAT_H */ #endif /* SPL_STAT_H */

View File

@ -79,14 +79,6 @@ gethrestime_sec(void)
return (ts.tv_sec); return (ts.tv_sec);
} }
static inline hrtime_t
getlrtime(void)
{
inode_timespec_t ts;
ktime_get_coarse_ts64(&ts);
return (((hrtime_t)ts.tv_sec * NSEC_PER_SEC) + ts.tv_nsec);
}
static inline hrtime_t static inline hrtime_t
gethrtime(void) gethrtime(void)
{ {

Some files were not shown because too many files have changed in this diff Show More