Commit Graph

10226 Commits

Author SHA1 Message Date
Rob Norris
b2c792778c zvol: generalise zvol_remove_minors_impl() for single zvol case
zvol_remove_minor_impl() and zvol_remove_minors_impl() should be
identical except for how they select zvols to remove, so let's just use
the same function with a flag to indicate if we should include children
and snapshots or not.

Sponsored-by: Klara, Inc.
Sponsored-by: Railway Corporation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Fedor Uporov <fuporov.vstack@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17625
2025-08-19 10:06:11 -07:00
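
The selection rule described in b2c792778c can be sketched as a single predicate with a flag. This is a minimal illustration, not the OpenZFS code: the helper name `zvol_name_matches` and the `include_children` parameter are hypothetical, and the real function walks kernel zvol state rather than comparing bare strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: decide whether the zvol called `candidate` should be
 * removed when the caller asked for `name`.
 */
static bool
zvol_name_matches(const char *name, const char *candidate,
    bool include_children)
{
	assert(name != NULL && candidate != NULL);
	size_t len = strlen(name);

	if (strncmp(candidate, name, len) != 0)
		return (false);
	/* Exact match: always selected. */
	if (candidate[len] == '\0')
		return (true);
	/* Children ("name/...") and snapshots ("name@...") only on request. */
	return (include_children &&
	    (candidate[len] == '/' || candidate[len] == '@'));
}
```

With the flag clear only the named zvol is selected; with it set, `tank/vol@snap` and `tank/vol/child` are selected too, while an unrelated sibling such as `tank/vol2` never is.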
Rob Norris
6bb8fe5528 ZTS: stress test concurrent zvol create/destroy
Sponsored-by: Klara, Inc.
Sponsored-by: Railway Corporation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Fedor Uporov <fuporov.vstack@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17625
2025-08-19 10:05:34 -07:00
r-ricci
30a915efed
zfs-send.8: mention combination of -c/-e flags and zstd_compress feature
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Roberto Ricci <io@r-ricci.it>
Closes #17647
2025-08-19 10:56:58 -04:00
Ameer Hamza
7b54567c1f
trace_zil.h: rename zcw_zio_error to zcw_error
Rename `zcw_zio_error` to `zcw_error` in `trace_zil.h` that was missed
in commit f562e0f69. This fixes compilation errors exposed when building
with `--with-linux=`.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17654
2025-08-19 10:54:50 -04:00
Brian Behlendorf
f65321e30c
Add missing AC_MSG_RESULT
Output the result of the "iops->mkdir() returns struct dentry*"
check to cleanup the configure output.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17641
2025-08-15 13:18:37 -07:00
Tony Hutter
9b0a9b410e
CI: Add optional patch level, fix hostname on F42
In the past there have been times when we need to generate new RPMs
for an existing ZFS release.  Typically this happens when a new RHEL
version comes out and the kernel symbols no longer match.  To get
users to auto-update we just bump the patch number.  For example, we
had to create zfs-2.1.13-1 for EL8.8 and zfs-2.1.13-2 for EL8.9.

This commit adds an optional patch level text box to the github
package builder runner.

In addition, this commit also uses `hostnamectl` instead of `hostname`
for F42+ compatibility, if available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17638
2025-08-15 09:21:23 -07:00
Brian Behlendorf
5061f959d1
Retire zfs_autoimport_disable kmod option
Back in 2014 the zfs_autoimport_disable module option was added to
control whether the kmods should load the pool configs from the cache
file on module load.  The default value since that time has been for
the kernel to not process the cache file.

Detecting and importing pools during boot is now controlled outside
of the kmod on both Linux and FreeBSD.  By all accounts this has been
working well and we can remove this dormant code on the kernel side.

The spa_config_load() function has been moved to userspace; it is
now only used by libzpool.  Additionally, the spa_boot_init() hook
which was used by FreeBSD now looks to be unused and was removed.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17618
2025-08-14 14:58:58 -07:00
Alexander Motin
d151432073
ZIL: Make allocations more flexible
When the ZIL allocates space for new LWBs without knowing how much it
will require, it can use the new metaslab_alloc_range() function to
allocate slightly more or less than it predicted.  This improves
space efficiency by allocating bigger LWBs on RAIDZ/dRAID instead of
padding, and possibly packing more ZIL records there.  It may also
reduce ganging in some cases by allowing smaller LWBs to be
allocated when we are not sure we'll need bigger ones.

Conversely, when we allocate space for already closed LWBs and know
precisely how much space we need, we can allocate exactly that
instead of relying on writing less than allocated, which does not
work for RAIDZ.

Space for LWBs in open state (still being filled) is allocated
same as before.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17613
2025-08-14 08:50:17 -07:00
Rob Norris
8d35a022e4
AUTHORS/mailmap: update with new contributors
Welcome to the house of fun! 🥳

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17634
2025-08-13 15:56:49 -07:00
Rob Norris
28433c4547 simd_stat: expose availability of VAES and VPCLMULQDQ
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Joel Low <joel@joelsplace.sg>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17058
2025-08-13 14:53:24 -07:00
Rob Norris
930f9cc66c crypto_test: include AVX2 GCM implementation in tests
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Joel Low <joel@joelsplace.sg>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17058
2025-08-13 14:52:46 -07:00
Joel Low
bb9225ea86 Backport AVX2 AES-GCM implementation from BoringSSL
This uses the AVX2 versions of the AESENC and PCLMULQDQ instructions; on
Zen 3 this provides an up to 80% performance improvement.

Original source:
d5440dd2c2/gen/bcm/aes-gcm-avx2-x86_64-linux.S

See the original BoringSSL commit at
3b6e1be439.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Joel Low <joel@joelsplace.sg>
Closes #17058
2025-08-13 14:51:20 -07:00
Alexander Motin
885d929cf8
Fix missed assertion update in physical rewrite patch
The physical rewrite patch changed the meaning of BP_GET_BIRTH(), but
I missed updating one of its occurrences, ending up asserting equal
logical birth times instead of equal physical birth times.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Fixes #17565
Closes #17631
2025-08-13 15:56:25 -04:00
Patrick Fasano
3cfd670e74
Add conflict/replacement with older SONAME libzfs and libzpool packages
In e8f0aa143e, the SONAMEs and package
names for libzfs and libzpool were bumped. The `contrib/debian/control`
file did not declare a conflict/replacement with the old package name.
This can cause dpkg to leave a system in an inconsistent state if the
old package is not manually uninstalled first.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Shengqi Chen <harry-chen@outlook.com>
Signed-off-by: Patrick Fasano <patrick@patrickfasano.com>
Closes #17586
2025-08-13 09:53:24 -07:00
Jitendra Patidar
077269bfed
Fix Assert in dbuf_undirty, which triggers during usage zap shrink
Usage ZAPs (DMU_*USED_OBJECT) are updated in syncing context via
do_userquota_cacheflush(). A zap shrink there triggers this assertion:
ASSERT(db->db_objset == dmu_objset_pool(db->db_objset)->dp_meta_objset
    || txg != spa_syncing_txg(dmu_objset_spa(db->db_objset)));

DMU_*USED_OBJECT are special objects (DMU_OBJECT_IS_SPECIAL) that are
updated in syncing context only, so relax the assert for them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com>
Closes #17602
2025-08-12 14:19:05 -07:00
Alan Somers
d3c1d27afd
zdb: better handling for corrupt block pointers
When dumping indirect blocks, attempt to print corrupt block pointers
rather than abort the program.  When corruption is detected zdb will
exit with an error code of 3.

Sponsored by: ConnectWise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Alek Pinchuk <alek.pinchuk@connectwise.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #17166
2025-08-12 14:16:37 -07:00
Rob Norris
f8bc01c79f
contrib/initramfs/scripts/zfs: shellcheck fixup
I got a newer shellcheck, and it pointed out that read without a target
variable is not POSIXly. The var was removed in c3ef9f7528, so I put it
back, and now shellcheck complains about an unused var. That's actually
correct, but necessary, so I've added a suppression for that, probably
better.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17626
2025-08-12 14:09:59 -07:00
Colin Percival
22671c4da4
FreeBSD 15.0 is now "PRERELEASE"
Chase URL change from the FreeBSD project.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colin Percival <cperciva@tarsnap.com>
Closes #17617
2025-08-12 13:38:55 -07:00
Brian Behlendorf
152e34822b
Silence zstd large allocation warning
Allow zstd_mempool_init() to allocate using vmem_alloc() instead
of kmem_alloc() to silence the large allocation warning on Linux
during module load when the system has a large number of CPUs.

It's not at all clear to me that scaling the allocation size with
the number of CPUs is beneficial and that should be evaluated.
But for the moment this should resolve the warning without
introducing any unexpected side effects.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17620
Closes #11557
2025-08-12 13:38:08 -07:00
Brian Behlendorf
1ccae433e9
Allow vmem_alloc backed multilists
Systems with a large number of CPU cores (192+) may trigger the large
allocation warning in multilist_create() on Linux.  Silence the warning
by converting the allocation to vmem_alloc().

On Linux this results in a call to kvmalloc() which will alloc vmem
for large allocations and kmem for small allocations.

On FreeBSD both vmem_alloc and kmem_alloc internally use the same
allocator so there is no functional change.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17616
2025-08-12 13:36:03 -07:00
Alexander Motin
e0e60d319c
Better pack struct zio_prop
By using precisely sized fields it is possible to reduce the size
of this structure and respectively struct zio it is included into
by 40 bytes (from 92 to 52).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17619
2025-08-12 13:28:46 -07:00
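
The effect of "precisely sized fields" in e0e60d319c can be illustrated with a toy structure. This is a sketch only: the struct and field names below are hypothetical and do not reproduce struct zio_prop or its 92 and 52 byte figures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enums standing in for properties such as checksum/compress. */
enum demo_checksum { DEMO_CKSUM_OFF, DEMO_CKSUM_ON };
enum demo_compress { DEMO_COMP_OFF, DEMO_COMP_ON };

/* Loose layout: every field takes the full width of an enum or int. */
struct demo_prop_loose {
	enum demo_checksum dp_checksum;
	enum demo_compress dp_compress;
	int dp_level;
	int dp_copies;
};

/* Tight layout: each field is only as wide as its value range needs. */
struct demo_prop_tight {
	uint8_t dp_checksum;
	uint8_t dp_compress;
	int8_t dp_level;
	uint8_t dp_copies;
};

static_assert(sizeof (struct demo_prop_tight) <
    sizeof (struct demo_prop_loose), "tight layout should be smaller");

/* Bytes saved per structure, and per every structure that embeds it. */
static size_t
demo_bytes_saved(void)
{
	return (sizeof (struct demo_prop_loose) -
	    sizeof (struct demo_prop_tight));
}
```

The saving multiplies when the structure is embedded in another one that is allocated very frequently, which is the point the commit message makes about struct zio.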
Rob Norris
531568f438 zil_suspend: fix cookie leak if ZIL crashes during wait
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17622
2025-08-12 13:24:32 -07:00
Rob Norris
7c9adc6858 zil_process_commit_list: fail better if the pool suspends in stall
Make sure we properly inform the nolwb waiters of the error, and don't
keep trying.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17622
2025-08-12 13:24:27 -07:00
Rob Norris
f562e0f691 ZIL: single zil_commit_waiter_done() function to complete a waiter
Just making it easier to not get the locking and broadcast wrong.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17622
2025-08-12 13:24:22 -07:00
Rob Norris
92da3e18c8 ZIL: flag crashed LWBs so we know not to process them
If the ZIL crashed, any outstanding LWBs are no longer interesting, so
if they return, we need to just clean them up and return, not try to do
any work on them. This is true even if they return success, as that may
be long after the pool suspended and resumed, depending on when/if the
kernel decides to return the IO to us. In particular, we must not try to
get the "next" LWB from zl_lwb_list, since they're no longer on that
list.

So, we put a flag on in-flight LWBs in zil_crash() when we move them
from zl_lwb_list to zl_lwb_crash_list, so we know what's going on when
they return.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17622
2025-08-12 13:24:16 -07:00
Rob Norris
508c546975 ZIL: use a bitfield for LWB "slog" and "slim" state flags
I'm soon about to need another LWB flag, and boolean_t is just so big
for only storing a single bit. Changing to a bitfield is far less
wasteful.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17622
2025-08-12 13:23:59 -07:00
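
The trade described in 508c546975 looks roughly like this. It is a minimal sketch under stated assumptions: `demo_boolean_t` stands in for an enum-based boolean type, and the struct and field names are hypothetical rather than the real LWB layout.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for an enum-based boolean; typically as wide as an int. */
typedef enum { DEMO_FALSE, DEMO_TRUE } demo_boolean_t;

/* Before: one full enum per flag. */
struct demo_lwb_bools {
	demo_boolean_t slog;
	demo_boolean_t slim;
};

/* After: each flag occupies a single bit of one unsigned int. */
struct demo_lwb_bits {
	unsigned int slog : 1;
	unsigned int slim : 1;
	unsigned int crashed : 1;	/* a third flag costs no extra space */
};

static void
demo_lwb_mark_slim(struct demo_lwb_bits *lwb)
{
	assert(lwb != NULL);
	lwb->slim = 1;	/* touches one bit; neighbouring flags are unchanged */
}
```

On common ABIs the bitfield version holds three flags in the space the enum version needs for one.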
achill
6b3333de2d
Linux 6.16 compat: META
Update the META file to reflect compatibility with the 6.16
kernel.

Tested with 6.16.0-0-stable of Alpine Linux edge, see
<https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/87929>.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Achill Gilgenast <achill@achill.org>
Closes #17578
2025-08-11 16:30:09 -07:00
René Wirnata
1d0b94c4e7
zed: prettify slack notification message
This converts the body of a ZED slack notification from
plain text to code block style to help with readability.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: René Wirnata <rene.wirnata@pandascience.net>
Closes #17610
2025-08-11 09:44:51 -07:00
Rob Norris
2fd145b578
zvol: cleanup error handling and passthrough
This is trying to get all the uses and non-uses of SET_ERROR correct
(being: only call it if we're the originator of an error _within ZFS_),
and correctly negating errors going to/from the kernel. And/or both.

Sponsored-by: Klara, Inc.
Sponsored-by: Railway Corporation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17605
2025-08-08 17:04:01 -07:00
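
The two rules in 2fd145b578 (tag an error only where it originates, and flip the sign at the kernel boundary) can be sketched as follows. `DEMO_SET_ERROR` is a simplified stand-in that just counts originations; the real SET_ERROR() macro and the layer functions here are not reproduced from OpenZFS.

```c
#include <assert.h>
#include <errno.h>

/* Counts error originations; stands in for the tracing a real macro does. */
static int demo_error_origins = 0;

/* Simplified stand-in for SET_ERROR(): note the origin, yield the code. */
#define	DEMO_SET_ERROR(err)	(demo_error_origins++, (err))

/* Hypothetical lower layer that originates an error. */
static int
demo_lower_layer(int fail)
{
	if (fail)
		return (DEMO_SET_ERROR(EIO));	/* originator: tag it */
	return (0);
}

/* Hypothetical middle layer: passes the error through untouched. */
static int
demo_middle_layer(int fail)
{
	int error = demo_lower_layer(fail);
	return (error);		/* not the originator: no tagging */
}

/* Boundary with a kernel that expects negative errno values. */
static int
demo_kernel_entry(int fail)
{
	int error = demo_middle_layer(fail);
	assert(error >= 0);	/* errors are positive inside this sketch */
	return (-error);
}
```

Tagging in the middle layer as well would record the same failure twice, which is the kind of noise the commit is cleaning up.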
Rob Norris
90a1e13df2 Linux: zfs_sync: remove explicit suspend check
Since zil_commit_flags(NOW) will always return error if the pool is
suspended, there's no need for a separate suspend check here.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17398
2025-08-08 16:43:50 -07:00
Rob Norris
ef4058fcdc FreeBSD: zfs_putpage: handle page writeback errors
Page writeback is considered completed when the associated itx callback
completes. A syncing writeback will receive the error in its callback
directly, but an in-flight async writeback that was promoted to sync by
the ZIL may also receive an error.

Writeback errors, even syncing writeback errors, are not especially
serious on their own, because the error will ultimately be returned to
the zil_commit() caller, either zfs_fsync() for an explicit sync op (eg
msync()) or to zfs_putpage() itself for a syncing (VM_PAGER_PUT_SYNC)
writeback.

The only thing we need to do when a page writeback fails is to skip
marking the page clean ("undirty"), since we don't know if it made it to
disk yet. This will ensure that it gets written out again in the future,
either some scheduled async writeback or another explicit syncing call.

On the other side, we need to make sure that if a syncing op arrives,
any changes on dirty pages are written back to the DMU and/or the ZIL
first. We do this by starting an async writeback on the vnode cache
first, so any dirty data has been recorded in the ZIL, ready for the
followup zfs_sync()->zil_commit() to find.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17398
2025-08-08 16:43:44 -07:00
Rob Norris
3d6ee9a68c Linux: zfs_putpage: handle page writeback errors
Page writeback is considered completed when the associated itx callback
completes. A syncing writeback will receive the error in its callback
directly, but an in-flight async writeback that was promoted to sync by
the ZIL may also receive an error.

Writeback errors, even syncing writeback errors, are not especially
serious on their own, because the error will ultimately be returned to
the zil_commit() caller, either zfs_fsync() for an explicit sync op (eg
msync()) or to zfs_putpage() itself for a syncing (WB_SYNC_ALL) writeback
(kernel housekeeping or sync_file_range(SYNC_FILE_RANGE_WAIT_AFTER)).

The only thing we need to do when a page writeback fails is to re-mark
the page dirty, since we don't know if it made it to disk yet. This will
ensure that it gets written out again in the future, either some
scheduled async writeback or another explicit syncing call.

On the other side, we need to make sure that if a syncing op arrives,
any changes on dirty pages are written back to the DMU and/or the ZIL
first. We do this by starting an _async_ (WB_SYNC_NONE) writeback on the
file mapping at the start of the sync op (fsync(), msync(), etc). An
async op will get an async itx created and logged, ready for the
followup zfs_fsync()->zil_commit() to find, while avoiding a zil_commit()
call for every page in the range.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17398
2025-08-08 16:43:38 -07:00
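
The failure handling described in the two zfs_putpage commits above reduces to one decision in the completion callback. A minimal sketch, assuming a hypothetical `struct demo_page` and callback name; the kernel's page structures and the real itx callback signature are not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page stand-in; real kernel page state is far richer. */
struct demo_page {
	bool dirty;
	bool under_writeback;
};

/*
 * Completion callback in the style described above: invoked once the itx
 * for this page is done, with 0 on success or an error code on failure.
 */
static void
demo_putpage_done(struct demo_page *pg, int error)
{
	assert(pg != NULL);
	if (error != 0)
		pg->dirty = true;	/* may not be on disk: write again later */
	pg->under_writeback = false;	/* this writeback is over either way */
}
```

Leaving the page dirty is what lets a later scheduled writeback or explicit sync retry it.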
Rob Norris
391e85f519 ZIL: add zil_commit_flags() to make honouring failmode= optional
The vast majority of calls to zil_commit() follow VFS ops, and should
honour the failmode= setting - either wait for sync, or return error.
Some calls however are part of a larger syncing op, and shouldn't ever
block if something goes wrong.

To allow this, we introduce zil_commit_flags(), with a flag
ZIL_COMMIT_FAILMODE to indicate whether or not the pool failmode should
be honoured. zil_commit() is now a wrapper that always sets this flag,
but any caller wanting a different behaviour can request ZIL_COMMIT_NOW
instead to have the call return failure if the pool suspends, regardless
of the failmode= setting.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17398
2025-08-08 16:43:33 -07:00
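
The wrapper relationship described in 391e85f519 can be sketched like this. The flag names mirror the commit message with a `DEMO_` prefix, but their values, the `struct demo_pool` fields, and the returned error code are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Names follow the commit message; the values are assumptions. */
typedef enum {
	DEMO_ZIL_COMMIT_NOW = 0,
	DEMO_ZIL_COMMIT_FAILMODE = 1 << 0,
} demo_zil_commit_flag_t;

/* Hypothetical pool state for the sketch. */
struct demo_pool {
	bool suspended;
	bool failmode_wait;	/* pool is configured to wait, not to fail */
	int waits;		/* how many callers would have blocked */
};

static int
demo_zil_commit_flags(struct demo_pool *dp, demo_zil_commit_flag_t flags)
{
	assert(dp != NULL);
	if (!dp->suspended)
		return (0);
	if ((flags & DEMO_ZIL_COMMIT_FAILMODE) && dp->failmode_wait) {
		/* Real code would block until resume; the sketch notes it. */
		dp->waits++;
		return (0);
	}
	return (EIO);	/* error code chosen for illustration only */
}

/* The plain call is a wrapper that always honours failmode. */
static int
demo_zil_commit(struct demo_pool *dp)
{
	return (demo_zil_commit_flags(dp, DEMO_ZIL_COMMIT_FAILMODE));
}
```

A caller inside a larger syncing operation would pass the "now" flag so that a suspended pool produces an error instead of a wait.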
Rob Norris
72602f6ad9 ZIL: "crash" the ZIL if the pool suspends during fallback
If the ZIL runs into trouble, it calls txg_wait_synced(), which blocks
on suspend. We want it to not block on suspend, instead returning an
error. On the surface, this is simple: change all calls to
txg_wait_synced_flags(TXG_WAIT_SUSPEND), and then thread the error
return back to the zil_commit() caller.

Handling suspension means returning an error to all commit waiters. This
is relatively straightforward, as zil_commit_waiter_t already has
zcw_zio_error to hold the write IO error, which signals a fallback to
txg_wait_synced_flags(TXG_WAIT_SUSPEND), which will fail, and so the
waiter can now return an error from zil_commit().

However, commit waiters are normally signalled when their associated
write (LWB) completes. If the pool has suspended, those IOs may not
return for some time, or maybe not at all. We still want to signal those
waiters so they can return from zil_commit(). We have a list of those
in-flight LWBs on zl_lwb_list, so we can run through those, detach them
and signal them. The LWB itself is still in-flight, but no longer has
attached waiters, so when it returns there will be nothing to do.

(As an aside, ITXs can also supply completion callbacks, which are
called when they are destroyed. These are directly connected to LWBs
though, so are passed the error code and destroyed there too).

At this point, all ZIL waiters have been ejected, so we only have to
consider the internal state. We potentially still have ITXs that have
not been committed, LWBs still open, and LWBs in-flight. The on-disk ZIL
is in an unknown state; some writes may have been written but not
returned to us. We really can't rely on any of it; the best thing to do
is abandon it entirely and start over when the pool returns to service.
But, since we may have IO out that won't return until the pool resumes,
we need something for it to return to.

The simplest solution I could find, implemented here, is to "crash" the
ZIL: accept no new ITXs, make no further updates, and let it empty out
on its normal schedule, that is, as txgs complete and zil_sync() and
zil_clean() are called. We set a "restart txg" to three txgs in the
future (syncing + TXG_CONCURRENT_STATES), at which point all the
internal state will have been cleared out, and the ZIL can resume
operation (handled at the top of zil_clean()).

This commit adds zil_crash(), which handles all of the above:
 - sets the restart txg
 - capture and signal all waiters
 - zero the header

zil_crash() is called when txg_wait_synced_flags(TXG_WAIT_SUSPEND)
returns because the pool suspended (ESHUTDOWN).

The rest of the commit is just threading the errors through, and related
housekeeping.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17398
2025-08-08 16:43:26 -07:00
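
The "restart txg" arithmetic in 72602f6ad9 is small enough to sketch. The structure, function names, and the exact resume condition below are assumptions; only the "syncing + TXG_CONCURRENT_STATES, three txgs in the future" rule comes from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Three, matching "three txgs in the future" in the message above. */
#define	DEMO_TXG_CONCURRENT_STATES	3

struct demo_zil {
	uint64_t restart_txg;	/* 0 means "not crashed" */
};

/* Crash: refuse new work until the restart txg is reached. */
static void
demo_zil_crash(struct demo_zil *zl, uint64_t syncing_txg)
{
	assert(zl != NULL && zl->restart_txg == 0);
	zl->restart_txg = syncing_txg + DEMO_TXG_CONCURRENT_STATES;
}

/* Called as each txg is cleaned: may the ZIL accept work again? */
static bool
demo_zil_accepting(struct demo_zil *zl, uint64_t txg)
{
	if (zl->restart_txg != 0 && txg >= zl->restart_txg)
		zl->restart_txg = 0;	/* old state has drained; resume */
	return (zl->restart_txg == 0);
}
```

Crashing while txg 100 is syncing therefore keeps the sketch closed through txgs 101 and 102 and reopens it at 103.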
Rob Norris
99a5f5d1ba ZIL: pass commit errors back to ITX callbacks
ITX callbacks are used to signal that something can be cleaned up after
an itx is committed. Presently that's only used when syncing out mapped
pages (msync()) to mark dirty pages clean.

This extends the callback interface so it can be passed an error, and
take a different cleanup action if necessary.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17398
2025-08-08 16:43:20 -07:00
Rob Norris
967b15b888 ZIL: allow zil_commit() to fail with error
This changes zil_commit() to have an int return, and updates all callers
to check it. There are no corresponding internal changes yet; it will
always return 0.

Since zil_commit() is an indication that the caller _really_ wants the
associated data to be durably stored, I've annotated it with the
__warn_unused_result__ compiler attribute (via __must_check), to emit a
warning if it's ever used without doing something with the return code.
I hope this will mean we never misuse it in the future.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17398
2025-08-08 16:43:09 -07:00
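
The annotation described in 967b15b888 works roughly as below. `DEMO_MUST_CHECK` is an assumed local definition for this sketch (GCC and Clang both accept the attribute), and the function names are hypothetical.

```c
#include <assert.h>

/* Assumed definition for the sketch; GCC and Clang support the attribute. */
#define	DEMO_MUST_CHECK	__attribute__((__warn_unused_result__))

/* Hypothetical commit function that, as in this first step, returns 0. */
static DEMO_MUST_CHECK int
demo_commit(void)
{
	return (0);
}

static int
demo_fsync(void)
{
	/* Writing just `demo_commit();` here would draw a compiler warning. */
	int error = demo_commit();
	assert(error >= 0);
	return (error);
}
```

Because the warning fires at every call site, adding the attribute and updating all callers naturally belong in the same change.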
Rob Norris
1f8c39ddb2 ZTS: test response of various sync methods under different failmodes
These are all the same shape: set up the pool to suspend on first write,
then perform some write+sync operation. The pool should suspend, and the
sync operation should respond according to the failmode= property.

We test fsync(), msync() and two forms of write() (open with O_SYNC, and
async with sync=always), which all take slightly different paths to
zil_commit() and back.

A helper function is included to do the write+sync sequence with mmap()
and msync(), since I didn't find a convenient tool to do that.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17398
2025-08-08 16:42:35 -07:00
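
A helper of the kind mentioned in 1f8c39ddb2 might look like the following. This is a sketch using standard POSIX calls, not the helper shipped with the test suite; the function name and its return convention are assumptions.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes to the start of `path` through a
 * shared mapping, then msync(MS_SYNC). Returns 0 on success, -1 on failure.
 */
static int
demo_mmap_write_sync(const char *path, const void *buf, size_t len)
{
	assert(path != NULL && buf != NULL && len > 0);

	int fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd < 0)
		return (-1);
	/* Size the file to exactly `len` bytes so the mapping is backed. */
	if (ftruncate(fd, (off_t)len) != 0) {
		(void) close(fd);
		return (-1);
	}
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		(void) close(fd);
		return (-1);
	}
	memcpy(p, buf, len);
	/* The sync step: blocks until the dirty pages are written back. */
	int rc = msync(p, len, MS_SYNC);
	(void) munmap(p, len);
	(void) close(fd);
	return (rc == 0 ? 0 : -1);
}
```

A test would call this against a file on the pool under test and compare the return value with what the failmode= setting predicts.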
Rob Norris
b270663e8a
linux/zvol_os: fix crash with blk-mq on Linux 4.19
03987f71e3 (#16069) added a workaround to get the blk-mq hardware
context for older kernels that don't cache it in the struct request.
However, this workaround appears to be incomplete.

In 4.19, the rq data context is optional. If it's not initialised, then
the cached rq->cpu will be -1, and so using it to index into mq_map
causes a crash.

Given that the upstream 4.19 is now in extended LTS and rarely seen,
RHEL8 4.18+ has long carried "modern" blk-mq support, and the cached
hardware context has been available since 5.1, I'm not going to huge
lengths to get queue selection correct for the very few people that are
likely to feel it. To that end, we simply call raw_smp_processor_id() to
get a valid CPU id and use that instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #17597
2025-08-08 09:39:14 -07:00
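
The failure mode and the fix in b270663e8a can be sketched in userspace. Everything here is a stand-in: `demo_mq_map`, `struct demo_request`, and `demo_current_cpu()` are hypothetical, with the last one playing the role that raw_smp_processor_id() plays in the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define	DEMO_NR_CPUS	4

/* Hypothetical request; cpu stays at -1 when the context was never set. */
struct demo_request {
	int cpu;
};

/* Hypothetical CPU-to-hardware-queue map. */
static const unsigned int demo_mq_map[DEMO_NR_CPUS] = { 0, 0, 1, 1 };

/* Stand-in for asking the kernel which CPU we are running on. */
static int
demo_current_cpu(void)
{
	return (2);
}

static unsigned int
demo_pick_hw_queue(const struct demo_request *rq)
{
	assert(rq != NULL);
	/* rq->cpu may be -1, so it is not used as an index at all. */
	int cpu = demo_current_cpu();
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	return (demo_mq_map[cpu]);
}
```

The queue chosen may not be the one the request was originally assigned to, which is the accuracy the commit message knowingly gives up in exchange for never indexing with -1.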
Rob Norris
82d6f7b047 Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17591
2025-08-07 11:41:42 -07:00
Rob Norris
f7bdd84328 Prefer VERIFY0P(n) over VERIFY(n == NULL)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17591
2025-08-07 11:41:37 -07:00
Rob Norris
611b95da18 Prefer VERIFY0(n) over VERIFY3S(n, ==, 0)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17591
2025-08-07 11:41:32 -07:00
Rob Norris
5c7df3bcac Prefer VERIFY0(n) over VERIFY3U(n, ==, 0)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17591
2025-08-07 11:41:25 -07:00
Rob Norris
c39e076f23 Prefer VERIFY0(n) over VERIFY(n == 0)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17591
2025-08-07 11:40:59 -07:00
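
The five "Prefer VERIFY0..." commits above all swap a long form for a short one. The macros below are simplified stand-ins written for this sketch, not the OpenZFS definitions, which report richer diagnostics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in: compare two pointers, abort with a message if false. */
#define	DEMO_VERIFY3P(l, op, r) do { \
	const void *l_ = (l), *r_ = (r); \
	if (!(l_ op r_)) { \
		fprintf(stderr, "VERIFY3P(%s %s %s) failed\n", #l, #op, #r); \
		abort(); \
	} \
} while (0)

/* Short forms for the two most common checks. */
#define	DEMO_VERIFY0P(p)	DEMO_VERIFY3P(p, ==, NULL)
#define	DEMO_VERIFY0(n) do { \
	if ((n) != 0) { \
		fprintf(stderr, "VERIFY0(%s) failed\n", #n); \
		abort(); \
	} \
} while (0)

static int
demo_checks_pass(void)
{
	const char *unset = NULL;
	int rc = 0;

	DEMO_VERIFY0P(unset);	/* instead of DEMO_VERIFY3P(unset, ==, NULL) */
	DEMO_VERIFY0(rc);	/* instead of a three-argument compare with 0 */
	return (1);
}
```

The short forms say "this must be zero" or "this must be NULL" without making the reader parse an operator and a constant.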
Todd Zullinger
2564308cb2
rpm: don't list /sbin/zgenhostid twice in %files
The location of zgenhostid was changed in 0ae733c7a (Install zgenhostid
to sbindir, 2021-01-21).  We include all files within sbindir two lines
earlier, which causes rpmbuild to report:

    File listed twice: /sbin/zgenhostid

Drop the redundant entry from the %files section.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Closes #17601
2025-08-07 11:39:56 -07:00
Rob Norris
e44e51f28d zvol_task_report_status: gate behind ZFS_DEBUG
dprintf() is a no-op in production builds, giving a compile warning. So,
refactor it a little to keep all the strings inside the function, and
then make the function a no-op when ZFS_DEBUG is not set.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Fedor Uporov <fuporov.vstack@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Railway Corporation
Closes #17596
2025-08-07 11:36:15 -07:00
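
The gating described in e44e51f28d follows a common pattern, sketched here with a hypothetical `DEMO_DEBUG` symbol and function name in place of ZFS_DEBUG and the real reporting function.

```c
#include <assert.h>
#include <stdio.h>

/* Counts reports so the effect is observable; purely for the sketch. */
static int demo_reports = 0;

#ifdef DEMO_DEBUG
/* Debug build: the strings live inside the function, and it logs. */
static void
demo_task_report_status(int error)
{
	const char *msg = (error == 0) ? "task ok" : "task failed";
	fprintf(stderr, "%s (error=%d)\n", msg, error);
	demo_reports++;
}
#else
/* Production build: same signature, empty body, no unused strings. */
static void
demo_task_report_status(int error)
{
	(void) error;
}
#endif
```

Callers stay identical in both builds; only the function body changes, so there is nothing left over for the compiler to warn about.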
Rob Norris
e6eb03a991 zvol_check_volblocksize: fix spa ref leak
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Fedor Uporov <fuporov.vstack@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Railway Corporation
Closes #17596
2025-08-07 11:36:09 -07:00
Rob Norris
3e671f2353 zvol: remove void return casts on void-returning functions
Casting unused returns to (void) is already of dubious value, but it's
entirely meaningless on functions that are defined as void return.
Remove the clutter.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Fedor Uporov <fuporov.vstack@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Railway Corporation
Closes #17596
2025-08-07 11:34:20 -07:00
Mariusz Zaborski
0c376d0f59
Document the new '-a' zpool option
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Mariusz Zaborski <oshogbo@FreeBSD.org>
Closes #17585
2025-08-06 17:11:47 -07:00
Alek P
3e004369f7
Removed unused zio_decompress_fail_fraction variable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Alek Pinchuk <alek.pinchuk@connectwise.com>
Closes #17599
2025-08-06 17:10:03 -07:00
Attila Fülöp
25930cb8a1 config: Avoid void main() in toolchain-simd.m4
Be standard-compliant by using `int main()`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13303
Closes #17590
2025-08-06 14:35:37 -07:00