Commit Graph

1427 Commits

Author SHA1 Message Date
George Amanakis
7784947923 optimize recv_fix_encryption_hierarchy()
recv_fix_encryption_hierarchy() in its present state goes through all
stream filesystems, and for each one traverses the snapshots in order to
find one that exists locally. This happens by calling guid_to_name() for
each snapshot, which iterates through all children of the filesystem.
This results in CPU utilization of 100% for several minutes (for ~1000
filesystems on a Ryzen 4350G) for 1 thread at the end of a raw receive
(-w, regardless whether encrypted or not, dryrun or not).

Fix this by following a different logic: using the top_fs name, call
gather_nvlist() to gather the nvlists for all local filesystems. For
each one filesystem, go through the snapshots to find the corresponding
stream's filesystem (since we know the snapshots guid and can search
with it in stream_avl for the stream's fs). Then go on to fix the
encryption roots and locations as in its present state.

Avoiding guid_to_name() iteratively makes
recv_fix_encryption_hierarchy() significantly faster (from several
minutes to seconds for ~1000 filesystems on a Ryzen 4350G).

Another problem is the following: in case we have promoted a clone of
the filesystem outside the top filesystem specified in zfs send, zfs
receive does not fail but returns an error:
recv_incremental_replication() fails to find its origin and errors out
with needagain=1. This results in recv_fix_hierarchy() not being called
which may render some children of the top fs not mountable since their
encryption root was not updated. To circumvent this make
recv_incremental_replication() silently ignore this error.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #16929
2025-02-28 00:42:29 +05:00
Jerzy Kołosowski
1aa4351c1f Add recursive dataset mounting and unmounting support to pam_zfs_key (#16857)
Introduced functionality to recursively mount datasets with a new
config option `mount_recursively`. Adjusted existing functions to
handle the recursive behavior and added tests to validate the feature.
This enhances support for managing hierarchical ZFS datasets within
a PAM context.

Signed-off-by: Jerzy Kołosowski <jerzy@kolosowscy.pl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-02-28 00:42:29 +05:00
Brian Atkinson
0e21e473a7 Update pin_user_pages() calls for Direct I/O
Originally #16856 updated Linux Direct I/O requests to use the new
pin_user_pages API. However, it was an oversight that this PR only
handled iov_iter's of type ITER_IOVEC and ITER_UBUF. Other iov_iter
types may try and use the pin_user_pages API if it is available. This
can lead to panics as the iov_iter is not being iterated over correctly
in zfs_uio_pin_user_pages().

Unfortunately, generic iov_iter API's that call pin_user_page_fast() are
protected as GPL only. Rather than update zfs_uio_pin_user_pages() to
account for all iov_iter types, we can simply just call
zfs_uio_get_dio_page_iov_iter() if the iov_iter type is not ITER_IOVEC
or ITER_UBUF. zfs_uio_get_dio_page_iov_iter() calls the
iov_iter_get_pages() calls that can handle any iov_iter type.

In the future it might be worth using the exposed iov_iter iterator
functions that are included in the header iov_iter.h since v6.7. These
functions allow for any iov_iter type to be iterated over and advanced
while applying a step function during iteration. This could possibly be
leveraged in zfs_uio_pin_user_pages().

A new ZFS test case was added to test that a ITER_BVEC is handled
correctly using this new code path. This test case was provided though
issue #16956.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16956 
Closes #17006
2025-02-25 22:33:25 +05:00
Rob Norris
a28f5a94f4 zinject: add "probe" device injection type
Injecting a device probe failure is not possible by matching IO types,
because probe IO goes to the label regions, which is explicitly excluded
from injection. Even if it were possible, it would be awkward to do,
because a probe is sequence of reads and writes.

This commit adds a new IO "type" to match for injection, which looks for
the ZIO_FLAG_PROBE flag instead. Any probe IO will be match the
injection record and recieve the wanted error.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16947
2025-02-25 22:29:33 +05:00
Rob Norris
cf55fdea24 ZTS: remove get_arcstat
It's now a simple wrapper, so lets just call kstat direct.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-02-25 22:28:46 +05:00
Rob Norris
6edbbe0646 ZTS: update existing kstat users to new helper
Removes other custom helpers and direct accesses to /proc.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-02-25 22:28:27 +05:00
Rob Norris
198621f910 ZTS: reimplement kstat helper function
The old kstat helper function was barely used, I suspect in part because
it was very limited in the kinds of kstats it could gather.

This adds new functions to replace it, for each kind of thing that can
have stats: global, pool and dataset. There's options in there to get a
single stat value, or all values within a group.

Most importantly, the interface is the same for both platforms.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-02-25 22:28:14 +05:00
Rob Norris
fabdd502f4 zinject: count matches and injections for each handler
When building tests with zinject, it can be quite difficult to work out
if you're producing the right kind of IO to match the rules you've set
up.

So, here we extend injection records to count the number of times a
handler matched the operation, and how often an error was actually
injected (ie after frequency and other exclusions are applied).

Then, display those counts in the `zinject` output.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #16938
2025-02-25 22:25:24 +05:00
Rob Norris
54eec0fa59 ZTS: remove empty zpool_add--allow-ashift-mismatch test
Added in b1e46f869, but empty, so no point keeping it around.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16931
2025-02-25 22:23:44 +05:00
Robert Evans
9f1c5e0b10 Remove duplicate dedup_legacy_create in common.run
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16926
2025-01-05 17:31:26 -08:00
Don Brady
25565403aa Too many vdev probe errors should suspend pool
Similar to what we saw in #16569, we need to consider that a
replacing vdev should not be considered as fully contributing
to the redundancy of a raidz vdev even though current IO has
enough redundancy.

When a failed vdev_probe() is faulting a disk, it now checks
if that disk is required, and if so it suspends the pool until
the admin can return the missing disks.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #16864
2025-01-04 11:58:15 -08:00
Toomas Soome
125731436d ZTS: checkpoint_discard_busy should use save_tunable/restore_tunable
Instead of using hardwired value for SPA_DISCARD_MEMORY_LIMIT,
use save_tunable and restore_tunable to restore the pre-test state.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16919
2025-01-03 15:23:49 -08:00
Toomas Soome
e47b033eae ZTS: remove unused TESTDIRS from pam/cleanup.ksh
Remove TESTDIRS as it is not set for pam tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16920
2025-01-03 15:23:49 -08:00
Toomas Soome
997db7a7fc ZTS: checkpoint_discard_busy does not set 16M on cleanup
Originally hex value is used as decimal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16917
2025-01-02 17:04:10 -08:00
Toomas Soome
e411081aa0 ZTS: functional/mount scripts are not removing /var/tmp/testdir.X dirs
cleanup.ksh is assuming we have TESTDIRS set.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16915
2025-01-02 17:04:10 -08:00
Toomas Soome
a55b6fe94a ZTS: zfs_mount_all_fail leaves /var/tmp/testrootPIDNUM directory around
Before we can remove test files, we need to unmount datasets
used by test first.

See also: zfs_mount_all_mountpoints.ksh

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16914
2025-01-02 17:04:10 -08:00
Alexander Motin
0fea7fc109 ZTS: Reduce file size in redacted_panic to 1GB
This test takes 3 minutes on RELEASE FreeBSD bots, but on CURRENT,
probably due to debugging it has in kernel, it does not complete
within 10 minutes, ending up killed.  As I see all the redacting
here happens within the first ~128MB of the file, so I hope it
won't matter if there is 1GB of data instead of 2GB.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #11141
2024-12-29 11:53:45 -08:00
Alexander Motin
0f6d955a35 ZTS: Remove procfs use from zpool_import_status
procfs might be not mounted on FreeBSD.  Plus checking for specific
PID might be not exactly reliable.  Check for empty list of jobs
instead.

Premature loop exit can result in failed test and failed cleanup,
failing also some following tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #11141
2024-12-29 11:53:45 -08:00
Alexander Motin
c3d2412b05 ZTS: Remove non-standard awk hex numbers usage
FreeBSD recently removed non-standard hex numbers support from awk.
Neither it supports -n argument, enabling it in gawk.  Instead of
depending on those rewrite list_file_blocks() fuction to handle the
hex math in shell instead of awk.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #11141
2024-12-29 11:53:45 -08:00
Alexander Motin
30b97ce218 ZTS: Increase write sizes for RAIDZ/dRAID tests
Many RAIDZ/dRAID tests filled files doing millions of 100 or even
10 byte writes.  It makes very little sense since we are not
micro-benchmarking syscalls or VFS layer here, while before the
blocks reach the vdev layer absolute majority of the small writes
will be aggregated.  In some cases I see we spend almost as much
time creating the test files as actually running the tests.  And
sometimes the tests even time out after that.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16905
2024-12-29 11:53:45 -08:00
Alexander Motin
f54052a122 Fix false assertion in dmu_tx_dirty_buf() on cloning
Same as writes block cloning can increase block size and number of
indirection levels.  That means it can dirty block 0 at level 0 or
at new top indirection level without explicitly holding them.

A block cloning test case for large offsets has been added.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16825
2024-12-05 11:49:06 -08:00
Mariusz Zaborski
3b0c1131ef Add ability to scrub from last scrubbed txg
Some users might want to scrub only new data because they would like
to know if the new write wasn't corrupted.  This PR adds possibility
scrub only newly written data.

This introduces new `last_scrubbed_txg` property, indicating the
transaction group (TXG) up to which the most recent scrub operation
has checked and repaired the dataset, so users can run scrub only
from the last saved point. We use a scn_max_txg and scn_min_txg
which are already built into scrub, to accomplish that.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-By: Wasabi Technology, Inc.
Sponsored-By: Klara Inc.
Closes #16301
2024-12-05 09:33:21 -08:00
Rob Norris
1aee375947 zdb: show dedup table and log attributes
There's interesting info in there that is going to help with
understanding dedup behavior at any given moment.

Since this is a format change, tests that rely on that output have been
modified to match.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #16755
2024-12-02 18:14:26 -08:00
Alexander Motin
9753feaa63 ZTS: Avoid embedded blocks in bclone/bclone_prop_sync
If we write less than 113 bytes with enabled compression we get
embeded block, which then fails check for number of cloned blocks
in bclone_test.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
2024-11-21 08:24:37 -08:00
Brian Behlendorf
7fb7eb9a63 ZTS: Fix zpool_status_008_pos false positive
Increase the injected delay to 1000ms and the ZIO_SLOW_IO_MS threshold
to 750ms to avoid false positives due to unrelated slow IOs which may
occur in the CI environment.  Additionally, clear the fault injection as
soon as it is no longer required for the test case.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16769
2024-11-21 08:24:37 -08:00
José Luis Salvador Rufo
260065099e tests: fix uClibc for getversion.c
This patch fixes compilation with uClibc by applying the same fallback
as commit e12d76176d to the `getversion.c`
file, which was previously overlooked.
 
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Closes #16735
Closes #16741
2024-11-14 16:56:34 -08:00
Sam James
e12d76176d Use <fcntl.h> instead of <sys/fcntl.h>
When building on musl, we get:

```
In file included from tests/zfs-tests/cmd/getversion.c:22:
/usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect
 #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>

In file included from module/os/linux/zfs/vdev_file.c:36:
/usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect
 #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>
```

Bug: https://bugs.gentoo.org/925235
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sam James <sam@gentoo.org>
Closes #15925
2024-11-07 11:33:51 -08:00
Tony Hutter
d367ef2995 ZTS: Add Fedora 41, remove Fedora 39
Fedora 41 was released 10/29/24, and Fedora 39 will be EOL on 11/12/24.
Update Fedora runners in the test suite.  Some minor tweaks also needed
to support ksh 1.0.10.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16700
2024-11-01 10:03:05 -07:00
Rob Norris
19a8dd48e1 vdev_disk: try harder to ensure IO alignment rules
It seems out our notion of "properly" aligned IO was incomplete. In
particular, dm-crypt does its own splitting, and assumes that a logical
block will never cross an order-0 page boundary (ie, the physical page
size, not compound size). This effectively means that it needs to be
possible to split a BIO at any page or block size boundary and have it
work correctly.

This updates the alignment check function to enforce these rules (to the
extent possible).

Our response to misaligned data is to make some new allocation that is
properly aligned, and copy the data into it. It turns out that
linearising (via abd_borrow_buf()) is not enough, because we allocate eg
4K blocks from a general purpose slab, and so may receive (or already
have) a 4K block that crosses pages.

So instead, we allocate a new ABD, which is guaranteed to be aligned
properly to block sizes, and then copy everything into it, and back out
on the way back.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16687 #16631 #15646 #15533 #14533
2024-11-01 09:49:14 -07:00
Tony Hutter
f7b4bca66a ZTS: Add LUKS sanity test
Add a LUKS sanity test to trigger: #16631

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16681
2024-11-01 09:48:47 -07:00
Brian Behlendorf
5237760b17 ZTS: Add additional exceptions
The following tests have been observed to occasionally fail when
running under the CI.  Updated our exceptions list to track them.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16670
2024-10-21 13:02:07 -07:00
Brian Behlendorf
c645b07eaa ZTS: Increase zpool_import_parallel_pos import margin
Increase the pool import time allowed by assuming a minimum reduction
to 1/2 instead of 1/3 when comparing sequential to parallel import
times.  This is sufficient to verify parallel imports are working as
intended and should address the occasional false positive failure
when the time is slightly exceeded.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16638
2024-10-11 16:21:25 -07:00
Brian Behlendorf
5bc27acf51 ZTS: Slightly increase dedup_quota limit
As described in the comment above this check the space used by
logged entries is not accounted for and some margin needs to be
added in.  While uncommon we have slightly exceeded the 600,000
threshold on some CI run so we increase the limit a bit more.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16637
2024-10-11 16:21:25 -07:00
Rob Norris
666903610d zpool/zfs: allow --json wherever -j is allowed
Mostly so that with the JSON formatting options are also used, they all
look the same. To my eye, `-j --json-flat-vdevs` suggests that they are
different or unrelated, while `--json --json-flat-vdevs` invites no
further questions.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16632
2024-10-11 16:21:25 -07:00
Brian Atkinson
26ecd8b993 Always validate checksums for Direct I/O reads
This fixes an oversight in the Direct I/O PR. There is nothing that
stops a process from manipulating the contents of a buffer for a
Direct I/O read while the I/O is in flight. This can lead checksum
verify failures. However, the disk contents are still correct, and this
would lead to false reporting of checksum validation failures.

To remedy this, all Direct I/O reads that have a checksum verification
failure are treated as suspicious. In the event a checksum validation
failure occurs for a Direct I/O read, then the I/O request will be
reissued though the ARC. This allows for actual validation to happen and
removes any possibility of the buffer being manipulated after the I/O
has been issued.

Just as with Direct I/O write checksum validation failures, Direct I/O
read checksum validation failures are reported though zpool status -d in
the DIO column. Also the zevent has been updated to have both:
1. dio_verify_wr -> Checksum verification failure for writes
2. dio_verify_rd -> Checksum verification failure for reads.
This allows for determining what I/O operation was the culprit for the
checksum verification failure. All DIO errors are reported only on the
top-level VDEV.

Even though FreeBSD can write protect pages (stable pages) it still has
the same issue as Linux with Direct I/O reads.

This commit updates the following:
1. Propogates checksum failures for reads all the way up to the
   top-level VDEV.
2. Reports errors through zpool status -d as DIO.
3. Has two zevents for checksum verify errors with Direct I/O. One for
   read and one for write.
4. Updates FreeBSD ABD code to also check for ABD_FLAG_FROM_PAGES and
   handle ABD buffer contents validation the same as Linux.
5. Updated manipulate_user_buffer.c to also manipulate a buffer while a
   Direct I/O read is taking place.
6. Adds a new ZTS test case dio_read_verify that stress tests the new
   code.
7. Updated man pages.
8. Added an IMPLY statement to zio_checksum_verify() to make sure that
   Direct I/O reads are not issued as speculative.
9. Removed self healing through mirror, raidz, and dRAID VDEVs for
   Direct I/O reads.

This issue was first observed when installing a Windows 11 VM on a ZFS
dataset with the dataset property direct set to always. The zpool
devices would report checksum failures, but running a subsequent zpool
scrub would not repair any data and report no errors.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16598
2024-10-09 13:45:06 -07:00
Brian Behlendorf
10f46d2aba ZTS: resilver_restart_001.ksh restore defaults
Update resilver_restart_001.ksh to restore the default
resilver_defer_percent when the test completes.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16618
2024-10-09 13:44:50 -07:00
JKDingwall
1ebb6b866f Fix generation of kernel uevents for snapshot rename on linux
`zvol_rename_minors()` needs to be given the full path not just the
snapshot name.  Use code removed in a0bd735ad as a guide
to providing the necessary values.

Add ZTS check for /dev changes after snapshot rename.  After
renaming a snapshot with 'snapdev=visible' ensure that the /dev
entries are updated to reflect the rename.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: James Dingwall <james@dingwall.me.uk>
Closes #14223 
Closes #16600
2024-10-09 13:44:22 -07:00
Pavel Snajdr
0d77e738e6
Defer resilver only when progress is above a threshold
Restart a resilver from scratch, if the current one in progress is
below a new tunable, zfs_resilver_defer_percent (defaulting to 10%).

The original rationale for deferring additional resilvers, when there is
already one in progress, was to help achieving data redundancy sooner
for the data that gets scanned at the end of the resilver.

But in case the admin wants to attach multiple disks to a single vdev,
it wasn't immediately obvious the admin is supposed to run
`zpool resilver` afterwards to reset the deferred resilvers and start
a new one from scratch.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #15810
2024-10-04 10:41:17 -07:00
Rob Norris
224393a321
feature: large_microzap
In a4b21eadec we added the zap_micro_max_size tuneable to raise the size
at which "micro" (single-block) ZAPs are upgraded to "fat" (multi-block)
ZAPs. Before this, a microZAP was limited to 128KiB, which was the old
largest block size. The side effect of raising the max size past 128KiB
is that it be stored in a large block, requiring the large_blocks
feature.

Unfortunately, this means that a backup stream created without the
--large-block (-L) flag to zfs send would split the microZAP block into
smaller blocks and send those, as is normal behaviour for large blocks.
This would be received correctly, but since microZAPs are limited to the
first block in the object by definition, the entries in the later blocks
would be inaccessible. For directory ZAPs, this gives the appearance of
files being lost.

This commit adds a feature flag, large_microzap, that must be enabled
for microZAPs to grow beyond 128KiB, and which will be activated the
first time that occurs. This feature is later checked when generating
the stream and if active, the send operation will abort unless
--large-block has also been requested.

Changing the limit still requires zap_micro_max_size to be changed. The
state of this flag effectively sets the upper value for this tuneable,
that is, if the feature is disabled, the tuneable will be clamped to
128KiB.

A stream flag is also added to ensure that the receiver also activates
its own feature flag upon receiving the stream. This is not strictly
necessary to _use_ the received microZAP, since it doesn't care how
large its block is, but it is required to send the microZAP object on,
otherwise the original problem occurs again.

Because it's difficult to reliably distinguish a microZAP from a fatZAP
from outside the ZAP code, and because it seems unlikely that most
users are affected (a fairly niche tuneable combined with what should be
an uncommon use of send), and for the sake of expediency, this change
activates the feature the first time a microZAP grows to use a large
block, and is never deactivated after that. This can be improved in the
future.

This commit changes nothing for existing pools that already have large
microZAPs. The feature will not be retroactively applied, but will be
activated the next time a microZAP grows past the limit.

Don't use large_blocks feature for enable/disable tests.  The
large_microzap depends on large_blocks, so it gets enabled as a
dependency, breaking the test. Instead use feature "longname", which has
the exact same feature characteristics.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16593
2024-10-02 20:47:11 -07:00
Brian Behlendorf
d34d4f97a8
snapdir: add 'disabled' value to make .zfs inaccessible
In some environments, just making the .zfs control dir hidden from sight
might not be enough. In particular, the following scenarios might
warrant not allowing access at all:
- old snapshots with wrong permissions/ownership
- old snapshots with exploitable setuid/setgid binaries
- old snapshots with sensitive contents

Introducing a new 'disabled' value that not only hides the control dir,
but prevents access to its contents by returning ENOENT solves all of
the above.

The new property value takes advantage of 'iuv' semantics ("ignore
unknown value") to automatically fall back to the old default value when
a pool is accessed by an older version of ZFS that doesn't yet know
about 'disabled' semantics.

I think that technically the zfs_dirlook change is enough to prevent
access, but preventing lookups and dir entries in an already opened .zfs
handle might also be a good idea to prevent races when modifying the
property at runtime.

Add zfs_snapshot_no_setuid parameter to control whether automatically
mounted snapshots have the setuid mount option set or not.

this could be considered a partial fix for one of the scenarios
mentioned in desired.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Co-authored-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Closes #3963
Closes #16587
2024-10-02 09:12:02 -07:00
Sanjeev Bagewadi
20232ecfaa Support for longnames for files/directories (Linux part)
This patch adds the ability for zfs to support file/dir name up to 1023
bytes. This number is chosen so we can support up to 255 4-byte
characters. This new feature is represented by the new feature flag
feature@longname.

A new dataset property "longname" is also introduced to toggle longname
support for each dataset individually. This property can be disabled,
even if it contains longname files. In such case, new file cannot be
created with longname but existing longname files can still be looked
up.

Note that, to my knowledge native Linux filesystems don't support name
longer than 255 bytes. So there might be programs not able to work with
longname.

Note that NFS server may needs to use exportfs_get_name to reconnect
dentries, and the buffer being passed is limit to NAME_MAX+1 (256). So
NFS may not work when longname is enabled.

Note, FreeBSD vfs layer imposes a limit of 255 name lengh, so even
though we add code to support it here, it won't actually work.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #15921
2024-10-01 13:40:27 -07:00
Don Brady
141368a4b6
Restrict raidz faulted vdev count
Specifically, a child in a replacing vdev won't count when assessing
the dtl during a vdev_fault()

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #16569
2024-10-01 09:12:11 -07:00
Tino Reichardt
b1958b531b
ZTS: Replace MD5 and SHA256 wit XXH128
For data integrity checks as done in ZTS, the verification for
unintended data corruption with xxhash128 should be a lot faster
and perfectly usable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16577
2024-09-28 09:24:05 -07:00
Brian Behlendorf
834b90fb81
ZTS: Fix zpool_import_hostid_changed_unclean_export
Update the test case to freeze the pool then export it to better
simulate a hard failure.  This is preferable to copying the vdev
while the pool's imported since with a copy we're not guaranteed
the on-disk state will be consistent.  That can in turn result
in a pool import failure and a spurious test failure.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16578
2024-09-28 09:18:21 -07:00
Brian Behlendorf
ac8837d4d6
ZTS: Update deadman_sync threshold
Lower the minimum number of expected deadman events from 4 to 3.  All
that is strictly required is a single event to consider the test a
pass.  However, since I've never seen a count of less than 3 reported
by the CI that should be sufficient.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16575
2024-09-27 09:14:13 -07:00
Brian Behlendorf
ab1b87e747
ZTS: Fix zpool_import_hostid_changed_cachefile_unclean_export
Update the test case to freeze the pool then export it to better
simulate a hard failure.  This is preferable to copying the vdev
while the pool's imported since with a copy we're not guaranteed
the on-disk state will be consistent.  That can in turn result
in a pool import failure and a spurious test failure.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16570
2024-09-26 09:58:11 -07:00
Brian Behlendorf
b2ca5105b7
ZTS: Fix zpool_reguid Makefile.am and tests
The zpool_reguid tests were not being included the dist tarball
resulting in them not running.  This is reported as a "failed
verification" warning by the CI.  Add the tests to the correct
Makefile.am.

Additionally, remove the usage of 'bc -e <expr>' from the tests.
This option is only supported by the FreeBSD version of bc.

Update the test case to reflect the 0 is not a valid GUID.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16559
2024-09-25 07:50:04 -07:00
George Melikov
e419a63bf4
xattr dataset prop: change defaults to sa
It's the main recommendation to set xattr=sa
even in man pages, so let's set it by default.

xattr=sa don't use feature flag, so in the worst
case we'll have non-readable xattrs by other
non-openzfs platforms.

Non-overridden default `xattr` prop of existing pools
will automatically use `sa` after this commit too.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #15147
2024-09-23 09:50:48 -07:00
Brian Behlendorf
d565835c47 ZTS: Add additional exceptions
The following tests have been observed to occasionally fail when
running under the CI.  Updated our exceptions list to track them.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16553
2024-09-22 13:04:17 -07:00
Brian Behlendorf
5520fdde29 ZTS: Retire "tmpfile_reason" exception
All supported Linux kernels, 4.18 and newer, provide O_TMPFILE.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16553
2024-09-22 13:04:06 -07:00