Compare commits

56 Commits

Author SHA1 Message Date
Brian Behlendorf
1a54b13aaf Tag 2.3.0-rc3
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-11-07 11:33:51 -08:00
Umer Saleem
9061a4da0b JSON: fix user properties output for zfs list
This commit fixes JSON output for zfs list when user properties are
requested with -o flag. This case needed to be handled specifically
since zfs_prop_to_name does not return property name for user
properties, instead it is stored in pl->pl_user_prop.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16732
2024-11-07 11:33:51 -08:00
Sam James
e12d76176d Use <fcntl.h> instead of <sys/fcntl.h>
When building on musl, we get:

```
In file included from tests/zfs-tests/cmd/getversion.c:22:
/usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect
 #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>

In file included from module/os/linux/zfs/vdev_file.c:36:
/usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect
 #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>
```

Bug: https://bugs.gentoo.org/925235
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sam James <sam@gentoo.org>
Closes #15925
2024-11-07 11:33:51 -08:00
Brian Atkinson
8131793d6f Update ABD stats for linear page Linux
a10e552 updated abd_free_linear_page() to no longer call
abd_update_scatter_stat(). This meant that linear pages that were not
attached to Direct I/O requests were not doing waste accounting for the
ARC. This led to performance issues due to incorrect ARC accounting that
resulted in 100% of CPU time being spent in arc_evict() during prolonged
I/O workloads with the ARC.

The call to abd_update_scatter_stats() is now conditionally called in
abd_free_linear_page() when the ABD is not from a Direct I/O request.

Reviewed-by: Mark Maybee <mmaybee@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16729
2024-11-07 11:33:51 -08:00
Chunwei Chen
c82eb27b22 ZFS send should use spill block prefetched from send_reader_thread
Currently, even though send_reader_thread prefetches spill block,
do_dump() will not use it and issues its own blocking arc_read. This
causes significant performance degradation when sending datasets with
lots of spill blocks.

For unmodified spill blocks, we also create send_range struct for them
in send_reader_thread and issue prefetches for them. We piggyback them
on the dnode send_range instead of enqueueing them so we don't break
send_range_after check.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: david.chen <david.chen@nutanix.com>
Closes #16701
2024-11-06 11:54:32 -08:00
tstabrawa
661bb434e6 Use simple folio migration function
Avoids using fallback_migrate_folio, which starts unnecessary writeback
(leading to BUG in migrate_folio_extra).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: tstabrawa <59430211+tstabrawa@users.noreply.github.com>
Closes #16568
Closes #16723
2024-11-06 11:54:32 -08:00
tstabrawa
ae48c2f6a9 Revert "Avoid BUG in migrate_folio_extra"
This reverts commit b052035990.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: tstabrawa <59430211+tstabrawa@users.noreply.github.com>
Closes #16568
Closes #16723
2024-11-06 11:54:32 -08:00
Uglymotha
b96845b632 Verify parent_dev before calling udev_device_get_sysattr_value
Not all udev devices have parent devices.
Calling udev_device_get_ functions yields an assertion error
if called with a NULL pointer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Sietse <sietse@wizdom.nu>
Co-authored-by: Sietse <sietse@wizdom.nu>
Closes #16705 
Closes #16717
2024-11-04 16:46:39 -08:00
Alexander Motin
55cbd1f9bd Reduce dirty records memory usage
Small block workloads may use a very large number of dirty records.
During a simple block cloning test, with BRT still using 4KB blocks,
I can easily see up to 2.5M of those in use.  Before this change, the
dbuf_dirty_record_t structures representing them were allocated via
kmem_zalloc(), which rounded their size up to 512 bytes.

Introducing a specialized kmem cache reduces the size from 512 to
408 bytes.  Additionally, since the override and raw params in dirty
records are mutually exclusive, putting them into a union reduces the
structure size further to 368 bytes, increasing the saving to 28%,
which can be 0.5GB or more of RAM.
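
As a rough illustration of the union trick (not the actual dbuf_dirty_record_t layout; every field name and size below is made up for the sketch), two parameter groups that are never live at the same time can share storage:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: both parameter groups always occupy space. */
struct record_separate {
    uint64_t txg;
    struct { uint64_t bp[4]; int state; } override;
    struct { uint8_t salt[8]; uint8_t iv[12]; uint8_t mac[16]; } raw;
};

/* Same fields, but the groups overlap because only one is live at a time. */
struct record_union {
    uint64_t txg;
    union {
        struct { uint64_t bp[4]; int state; } override;
        struct { uint8_t salt[8]; uint8_t iv[12]; uint8_t mac[16]; } raw;
    } u;
};

/* Bytes saved per record by overlapping the mutually exclusive groups. */
static size_t
record_union_saving(void)
{
    return (sizeof (struct record_separate) - sizeof (struct record_union));
}
```

The per-record saving is small, but it is multiplied by the number of live records, which is why it matters at millions of dirty records.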

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16694
2024-11-04 16:46:39 -08:00
Rob Norris
880b73956b zfs(4): remove "experimental" from zfs_bclone_enabled
I think we've done enough experiments.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16189 
Closes #16712
2024-11-04 16:46:39 -08:00
Tony Hutter
d367ef2995 ZTS: Add Fedora 41, remove Fedora 39
Fedora 41 was released 10/29/24, and Fedora 39 will be EOL on 11/12/24.
Update Fedora runners in the test suite.  Some minor tweaks also needed
to support ksh 1.0.10.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16700
2024-11-01 10:03:05 -07:00
Rob Norris
7546fbd6e9 zdb: add extra -T flag to show histograms of BRT refcounts
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16692
2024-11-01 09:49:54 -07:00
Rich Ercolani
903d3f9187 Added output to zpool online and offline
I was surprised to discover today that `zpool online` and
`zpool offline` don't print any information about why they failed in
many cases; they just return 1 with no explanation.

Let's improve that where we can without changing the library function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #16244
2024-11-01 09:49:49 -07:00
Rob Norris
86b5853cfb vdev_disk: move abd return and free off the interrupt handler
Freeing an ABD can take sleeping locks to update various stats. We
aren't allowed to sleep on an interrupt handler. So, move the free off
to the io_done callback.

We should never have been freeing things in the interrupt handler, but
we got away with it because we were usually freeing a linear ABD, which
at most is returning two objects to a cache and never sleeping. Scatter
ABDs can be used now, and those have more complex locking.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16687
2024-11-01 09:49:23 -07:00
Rob Norris
19a8dd48e1 vdev_disk: try harder to ensure IO alignment rules
It turns out our notion of "properly" aligned IO was incomplete. In
particular, dm-crypt does its own splitting, and assumes that a logical
block will never cross an order-0 page boundary (ie, the physical page
size, not compound size). This effectively means that it needs to be
possible to split a BIO at any page or block size boundary and have it
work correctly.

This updates the alignment check function to enforce these rules (to the
extent possible).

Our response to misaligned data is to make some new allocation that is
properly aligned, and copy the data into it. It turns out that
linearising (via abd_borrow_buf()) is not enough, because we allocate eg
4K blocks from a general purpose slab, and so may receive (or already
have) a 4K block that crosses pages.

So instead, we allocate a new ABD, which is guaranteed to be aligned
properly to block sizes, and then copy everything into it, and back out
on the way back.
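
The page-crossing condition described above can be sketched as a small check. This is only an illustration under assumed names (`span_crosses_page` is hypothetical and is not the alignment function this commit changes):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true if the byte range [addr, addr + len) touches more than one
 * page of size page_size. Hypothetical helper for illustration only.
 */
static bool
span_crosses_page(uintptr_t addr, size_t len, size_t page_size)
{
    if (len == 0)
        return (false);
    /* Compare the page index of the first and the last byte. */
    return ((addr / page_size) != ((addr + len - 1) / page_size));
}
```

A 4K block that starts halfway into a 4K page fails this check even though it is contiguous in memory, which is the slab allocation case mentioned above.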

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16687 #16631 #15646 #15533 #14533
2024-11-01 09:49:14 -07:00
Serapheim Dimitropoulos
8ac70aade7 Add warning for external consumers of dmu_tx_callback_register
While reading some code @grwilson came across the above function that
seemingly had no consumers besides a ztest callback that ensures that
the tx_callback infrastructure works correctly. It turns out that Lustre
is the main (and potentially the only) consumer of this. Refer to
`osd_trans_commit_cb` of `lustre/osd-zfs/osd_handler.c` in the Lustre
repo for more info. Let's add a comment highlighting this before someone
removes it by mistake.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Serapheim Dimitropoulos <serapheimd@gmail.com>
Closes #16698
2024-11-01 09:49:05 -07:00
Alexander Motin
bbc0d34bfd On the first vdev open ignore impossible ashift hints
If, on the first open, the device's logical ashift is bigger than the
one set by the pool's ashift property, ignore the latter as unusable
instead of creating a vdev that will fail most I/Os due to misalignment.
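
A simplified sketch of the decision, assuming a hint value of 0 means "no hint"; `effective_ashift` is a hypothetical name and the real vdev open path handles more cases than this:

```c
#include <stdint.h>

/*
 * Picks the ashift to use on first open. A hint smaller than the device's
 * logical ashift cannot work, so it is ignored. Assumption: 0 == no hint.
 */
static uint64_t
effective_ashift(uint64_t logical_ashift, uint64_t pool_ashift_hint)
{
    if (pool_ashift_hint == 0 || pool_ashift_hint < logical_ashift)
        return (logical_ashift);
    return (pool_ashift_hint);
}
```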

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #16690
2024-11-01 09:49:00 -07:00
Dimitry Andric
f3823a9ab2 Fix gcc uninitialized warning in FreeBSD zio_crypt.c
In FreeBSD's `zio_do_crypt_data()`, ensure that two `struct uio`
variables are cleared before copying data out of them. This avoids
accessing garbage data, and fixes gcc `-Wuninitialized` warnings.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Toomas Soome <tsoome@me.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #16688
2024-11-01 09:48:55 -07:00
Dimitry Andric
fd2cae969f Fix gcc unused value warning in FreeBSD simd.h
The macros `simd_stat_init()` and `simd_stat_fini()` in FreeBSD's
`simd.h` are defined as zero, but they are actually only used as
statements. Replace the definitions with `do {} while (0)` instead, to
avoid gcc `-Wunused-value` warnings.
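
For illustration, a no-op macro written this way behaves as a single statement wherever it is used. The macro and function names here are placeholders, not the ones in `simd.h`:

```c
/* Placeholder no-op macros; a bare `0` here would expand to the statement
 * `0;`, which is what triggers -Wunused-value. */
#define stat_init_sketch()  do {} while (0)
#define stat_fini_sketch()  do {} while (0)

static int
run_with_stats(int value)
{
    stat_init_sketch();     /* expands to an empty, well-formed statement */
    value *= 2;
    stat_fini_sketch();
    return (value);
}
```

The `do {} while (0)` form also stays safe inside an unbraced `if`/`else`, since it consumes exactly one trailing semicolon.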

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Toomas Soome <tsoome@me.com>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #16693
2024-11-01 09:48:51 -07:00
Tony Hutter
f7b4bca66a ZTS: Add LUKS sanity test
Add a LUKS sanity test to trigger: #16631

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16681
2024-11-01 09:48:47 -07:00
Alexander Motin
7e3ce4efaa Pack dmu_buf_impl_t by 16 bytes
On 64bit FreeBSD this reduces one from 296 to 280 bytes.  On small
block workloads dbufs may consume gigabytes of ARC, and this saves
5% of it.
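
The message does not say which fields were moved. One common way to pack a structure is to reorder members so that alignment padding disappears, sketched here with made-up fields:

```c
#include <stddef.h>
#include <stdint.h>

/* Small members on both sides of a 64-bit member force padding twice. */
struct loose_sketch {
    uint8_t flag_a;
    uint64_t offset;
    uint8_t flag_b;
};

/* Grouping the small members lets them share one padded slot. */
struct packed_sketch {
    uint64_t offset;
    uint8_t flag_a;
    uint8_t flag_b;
};

/* Bytes saved per object by the reordering (ABI dependent). */
static size_t
packing_saving(void)
{
    return (sizeof (struct loose_sketch) - sizeof (struct packed_sketch));
}
```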

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16684
2024-11-01 09:48:42 -07:00
Tino Reichardt
77d81974b6 Fix dependency install on Debian 11 (#16683)
Adding cryptsetup breaks some dialog things on Debian 11.
Apply some workaround for it.

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-11-01 09:48:37 -07:00
Brian Behlendorf
5237760b17 ZTS: Add additional exceptions
The following tests have been observed to occasionally fail when
running under the CI.  Updated our exceptions list to track them.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16670
2024-10-21 13:02:07 -07:00
Rob Norris
ede715d1e4 spl/thread: explicitly define thread_func_t as noreturn
All of our thread entry functions have this signature:

    void (*)(void*) __attribute__((noreturn))

The low-level `__thread_create()` function accepts a `thread_func_t` as
the entry point, which is defined more simply as:

    void (*)(void *)

And then the `thread_create()` and `thread_create_named()` macros cast
the passed-in function pointer down to `thread_func_t`, that is, casting
away the `noreturn` attribute.

Clang considers casting between these two types to be invalid because
both the caller and the callee may have elided parts of the stack frame
save and restore, knowing that they won't be needed.

Recent Linux appears to be setting `-Wcast-function-type-strict`, which
causes this invalid cast to emit a warning, which with `-Werror` is
converted to an error, breaking the build.

This commit fixes this in the simplest possible way: adding `noreturn`
to the `thread_func_t` attribute. Since all our thread entry functions
already have this attribute, it's arguably just a consistency fix
anyway.
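
A minimal userspace sketch of the idea, using the GCC/Clang `noreturn` attribute on a function pointer typedef. All names are hypothetical, and `longjmp` stands in for a thread exit so the example can be exercised without real threads:

```c
#include <setjmp.h>

/* Entry-point type that carries the noreturn attribute. */
typedef void (*thread_func_sketch_t)(void *) __attribute__((noreturn));

static jmp_buf sketch_exit;     /* stands in for "the thread has exited" */
static int sketch_seen;         /* value observed by the entry function */

static void sketch_entry(void *arg) __attribute__((noreturn));

static void
sketch_entry(void *arg)
{
    sketch_seen = *(int *)arg;
    longjmp(sketch_exit, 1);    /* never returns to the caller */
}

/* The entry function already matches the typedef, so no cast is needed. */
static int
sketch_run(thread_func_sketch_t fn, int value)
{
    if (setjmp(sketch_exit) == 0)
        fn(&value);
    return (sketch_seen);
}
```

Because the typedef and the function agree on `noreturn`, passing `sketch_entry` involves no type change.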

I considered removing the casts in the macros, which silences the
warnings, but it turns out that Clang has a bug that won't emit this
error for implicit conversions, only explicit casts. So leaving them
there seems like a reasonable belt-and-suspenders approach. Also,
frankly, this whole mechanism seems a little undercooked inside LLVM, so
I'm content to go with my intuition about the smallest, least invasive
change.

**NOTE**: `__thread_create` is exported by `spl.ko` and has a
`thread_func_t` arg, so this is an ABI break. Whether that matters in
practice, I have no idea.

Further reading:
- 1aad641c79
- https://github.com/llvm/llvm-project/issues/7325
- https://github.com/llvm/llvm-project/issues/41465

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16672 
Closes #16673
2024-10-21 13:02:07 -07:00
Rob Norris
e30c69365d config: fix dequeue_signal check for kernels <4.20
Before 4.20, kernel_siginfo_t was just called siginfo_t. This was
causing the kthread_dequeue_signal_3arg_task check, which uses
kernel_siginfo_t, to fail on older kernels.

In d6b8c17f1, we started checking for the "new" three-arg
dequeue_signal() by testing for the "old" version. Because that test is
explicitly using kernel_siginfo_t, it would fail, leading to the build
trying to use the new three-arg version, which would then not compile.

This commit fixes that by avoiding checking for the old 3-arg
dequeue_signal entirely. Instead, we check for the new one, as well as
the 4-arg form, and we use the old form as a fallback. This way, we
never have to test for it explicitly, and once we're building
HAVE_SIGINFO will make sure we get the right kernel_siginfo_t for it, so
everything works out nice.

Original-patch-by: Finix <yancw@info2soft.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16666
2024-10-21 13:02:07 -07:00
Rob Norris
78d39d91fa zdb: show bp in uberblock dump
Just another useful nugget of info in times of strife.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16667
2024-10-21 13:02:07 -07:00
Tomohiro Kusumi
2b359c7824 Fix compile-time warnings caused by duplicate struct typedefs
Some compilers/versions warn about these typedefs, according to #16660.

The platform specific header sys/abd_os.h shouldn't define or use abd_t,
as it's defined in its non-platform specific consumer sys/abd.h.
Do the same as what FreeBSD header does.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #16660 
Closes #16665
2024-10-21 13:02:07 -07:00
Alexander Motin
ace2e17a9b zfs_debug: Restore log size limit for userspace
For some reason it was dropped when split from the kernel, which makes
raidz_test accumulate up to 100GB of logs in RAM that we don't need.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by:  Rob Norris <robn@despairlabs.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16492
Closes #16566
Closes #16664
2024-10-21 13:02:07 -07:00
Rob Norris
b4cd10ce5b libspl/backtrace: comment and harden libunwind backtracer
This is the sort of code that we get right once and never look at again.
Anyone reading this code is already likely in the middle of a debugging
nightmare, and then they have a wall of manual string construction and
an unfamiliar and idiosyncratic library to deal with. So, comment the
whole thing to try to make it clear what's going on.

In pursuit of the above, I've added return checks to some of the
libunwind calls, fixed the frame loop to not skip the "top" frame
(however useless it may be), and fixed a couple of calls to
spl_bt_u64_to_hex_str() which requested 18 digits instead of 16.
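
A 64-bit value has at most 16 hex digits (64 / 4), so a request for 18 cannot be satisfied. The sketch below shows a conversion that avoids printf-family calls and rejects such requests. Its name and signature are assumptions for illustration, not those of spl_bt_u64_to_hex_str():

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Writes the low `digits` hex digits of v into buf, zero padded and NUL
 * terminated. Returns the number of digits written, or 0 if `digits` is
 * out of range or buf is too small. Hypothetical helper.
 */
static size_t
u64_to_hex_sketch(uint64_t v, size_t digits, char *buf, size_t buflen)
{
    static const char hex[] = "0123456789abcdef";

    if (digits == 0 || digits > 16 || buflen < digits + 1)
        return (0);
    for (size_t i = 0; i < digits; i++) {
        /* Emit the most significant requested nibble first. */
        unsigned shift = (unsigned)(4 * (digits - 1 - i));
        buf[i] = hex[(v >> shift) & 0xf];
    }
    buf[digits] = '\0';
    return (digits);
}
```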

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16653
2024-10-21 13:02:07 -07:00
Rob Norris
f52d7aaaac libspl/backtrace: rename and document hex conversion function
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16653
2024-10-21 13:02:07 -07:00
Rob Norris
d5db840260 libspl/backtrace: helper macros for output
My eyes are going blurry looking at all those write calls. This is much
nicer.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16653
2024-10-21 13:02:07 -07:00
Rob Norris
bcd61d9579 libspl/backtrace: dump registers in libunwind backtraces
More useful stuff, especially when trying to follow a disassembly.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16653
2024-10-21 13:02:07 -07:00
Umer Saleem
36a67b50a2 Fix inconsistent mount options for ZFS root
While mounting ZFS root during boot on Linux distributions from initrd,
mount from busybox is effectively used which executes mount system call
directly. This skips the ZFS helper mount.zfs, which checks and enables
the mount options as specified in dataset properties. As a result,
datasets mounted during boot from initrd do not have correct mount
options as specified in ZFS dataset properties.

There has been an attempt to use mount.zfs in zfs initrd script,
responsible for mounting the ZFS root filesystem (PR#13305). This was
later reverted (PR#14908) after discovering that using mount.zfs breaks
mounting of snapshots on root (/) and other child datasets of root have
the same issue (Issue#9461).

This happens because switching from busybox mount to mount.zfs correctly
parses the mount options but also adds 'mntpoint=/root' to the mount
options, which is then prepended to the snapshot mountpoint in
'.zfs/snapshot'. '/root' is the directory on Debian with initramfs-tools
where root filesystem is mounted before pivot_root. When Linux runtime
is reached, trying to access the snapshots on root results in
automounting the snapshot on '/root/.zfs/*', which fails.

This commit attempts to fix the automounting of snapshots on root, while
using mount.zfs in initrd script. Since the mountpoint of dataset is
stored in vfs_mntpoint field, we can check if current mountpoint of
dataset and vfs_mntpoint are same or not. If they are not same, reset
the vfs_mntpoint field with current mountpoint. This fixes the
mountpoints of root dataset and children in respective vfs_mntpoint
fields when we try to access the snapshots of root dataset or its
children. With correct mountpoint for root dataset and children stored
in vfs_mntpoint, all snapshots of root dataset are mounted correctly
and become accessible.

This fix will come into play only if current process, that is trying to
access the snapshots is not in chroot context. The Linux kernel API
that is used to convert struct path into char format (d_path), returns
the complete path for given struct path. It works in chroot environment
as well and returns the correct path from original filesystem root.

However d_path fails to return the complete path if any directory from
original root filesystem is mounted using --bind flag or --rbind flag
in chroot environment. In this case, if we try to access the snapshot
from outside the chroot environment, d_path returns the path correctly,
i.e. it returns the correct path to the directory that is mounted with
--bind flag. However inside the chroot environment, it only returns the
path inside chroot.

For now, there is not a better way in my understanding that gives the
complete path in char format and handles the case where directories from
root filesystem are mounted with --bind or --rbind on another path which
user will later chroot into. So this fix gets enabled if current
process trying to access the snapshot is not in chroot context.

With the snapshots issue fixed for root filesystem, using mount.zfs in
ZFS initrd script, mounts the datasets with correct mount options.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16646
2024-10-21 13:02:07 -07:00
Warner Losh
3d9129a7b6 freebsd: Use compiler.h from FreeBSD's base's linuxkpi
The FreeBSD linux/compiler.h in OpenZFS was copied from a very old
version of FreeBSD's linuxkpi's linux/compiler.h. There's no need for
this duplication. Use FreeBSD's linuxkpi version instead, and provide
zfs_fallthrough to augment it (it's all that's needed). Use #pragma once
to avoid naming issues for guard variables. Since this is a complete
rewrite, use my copyright here (the original code in FreeBSD still
credits everybody). This works back at least to FreeBSD 12.4, which
is not out of support, and all newer releases.

Remove extra copies of macros that were defined elsewhere, but are now
properly defined in LinuxKPI so are redundant.

Sponsored-by: Netflix
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Warner Losh <imp@bsdimp.com>
Closes #16650
2024-10-21 13:02:07 -07:00
Brian Behlendorf
0409c47fe0 Tag 2.3.0-rc2
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-10-13 19:25:58 -07:00
Tino Reichardt
b5a3825244 ZTS: Make use of optimal CPU pinning
With CPU pinning, we should get some speedup because of better
cpu cache re-use.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16641
2024-10-13 19:25:58 -07:00
Tino Reichardt
77df762a1b ZTS: Optimize Kernel Same-page Merging (KSM)
Kernel same-page Merging (KSM) allows KVM guests to share identical
memory pages. These shared pages are usually common libraries or other
identical, high-use data.

The current configuration was a bit too lazy - so KSM didn't work very
well. With the new configuration I could run 3 Linux VMs in parallel.

FreeBSD can't benefit from it. But FreeBSD is not so memory hungry in
general, so there is no need for it ;)

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16641
2024-10-13 19:25:58 -07:00
Brian Behlendorf
56871e465a Fallback to strerror() when strerror_l() isn't available
Some C libraries, such as uClibc, do not provide strerror_l() in
which case we fall back to strerror().
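
A sketch of what such a fallback can look like, assuming the build system defines a `HAVE_STRERROR_L` macro when the function is available. The macro name and the wrapper are assumptions, not taken from this commit:

```c
#include <string.h>
#ifdef HAVE_STRERROR_L
#include <locale.h>
#endif

/* Returns a message for errnum, preferring strerror_l() when present. */
static const char *
strerror_sketch(int errnum)
{
#ifdef HAVE_STRERROR_L
    /* uselocale((locale_t)0) queries the current locale without changing it. */
    return (strerror_l(errnum, uselocale((locale_t)0)));
#else
    return (strerror(errnum));
#endif
}
```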

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16636 
Closes #16640
2024-10-13 19:25:58 -07:00
Brian Behlendorf
c645b07eaa ZTS: Increase zpool_import_parallel_pos import margin
Increase the pool import time allowed by assuming a minimum reduction
to 1/2 instead of 1/3 when comparing sequential to parallel import
times.  This is sufficient to verify parallel imports are working as
intended and should address the occasional false positive failure
when the time is slightly exceeded.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16638
2024-10-11 16:21:25 -07:00
Brian Behlendorf
5bc27acf51 ZTS: Slightly increase dedup_quota limit
As described in the comment above this check the space used by
logged entries is not accounted for and some margin needs to be
added in.  While uncommon, we have slightly exceeded the 600,000
threshold on some CI runs, so we increase the limit a bit more.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16637
2024-10-11 16:21:25 -07:00
Brian Behlendorf
7f830d783b CI: Stick with ubuntu-22.04 for CodeQL analysis
The ubuntu-latest alias now refers to ubuntu-24.04 instead of
ubuntu-22.04 which causes CodeQL's autobuild to fail with:

    cpp/autobuilder: deptrace not supported in ubuntu 24.04

Until deptrace is supported by ubuntu-24.04 hosted runners request
ubuntu-22.04 which is supported.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Closes #16639
2024-10-11 16:21:25 -07:00
Martin Matuška
58162960a1 zdb: fix printf format in dump_zap()
When compiling zdb.c on 32-bit platforms, a format conversion error 
is reported for a printf() in dump_zap().  Change %l to macro 
%" PRIu64 " to match the platform size of a 64-bit unsigned integer.
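
For reference, this is the general pattern for printing a `uint64_t` portably with the `<inttypes.h>` macros. The helper name and format text are illustrative only:

```c
#include <inttypes.h>
#include <stdio.h>

/* Formats a 64-bit unsigned value the same way on 32- and 64-bit platforms. */
static int
format_u64_sketch(char *buf, size_t buflen, uint64_t value)
{
    /* PRIu64 expands to the correct length modifier for uint64_t. */
    return (snprintf(buf, buflen, "value = %" PRIu64, value));
}
```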

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16635
2024-10-11 16:21:25 -07:00
Rob Norris
666903610d zpool/zfs: allow --json wherever -j is allowed
Mostly so that when the JSON formatting options are also used, they all
look the same. To my eye, `-j --json-flat-vdevs` suggests that they are
different or unrelated, while `--json --json-flat-vdevs` invites no
further questions.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16632
2024-10-11 16:21:25 -07:00
Brian Atkinson
26ecd8b993 Always validate checksums for Direct I/O reads
This fixes an oversight in the Direct I/O PR. There is nothing that
stops a process from manipulating the contents of a buffer for a
Direct I/O read while the I/O is in flight. This can lead to checksum
verify failures. However, the disk contents are still correct, and this
would lead to false reporting of checksum validation failures.

To remedy this, all Direct I/O reads that have a checksum verification
failure are treated as suspicious. In the event a checksum validation
failure occurs for a Direct I/O read, then the I/O request will be
reissued through the ARC. This allows for actual validation to happen and
removes any possibility of the buffer being manipulated after the I/O
has been issued.
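
The retry logic can be pictured with a toy model. Everything below is a simplified stand-in: the checksum, the block size, the `tamper` flag that imitates a process modifying its buffer mid-I/O, and the private buffer that plays the role of the ARC are all hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLKSZ 8

/* Toy checksum standing in for the real block checksum. */
static uint32_t
sketch_cksum(const uint8_t *buf, size_t len)
{
    uint32_t a = 0, b = 0;

    for (size_t i = 0; i < len; i++) {
        a += buf[i];
        b += a;
    }
    return ((b << 16) | (a & 0xffff));
}

/*
 * Reads `disk` into `user` and verifies it against `expected`. On a
 * mismatch the read is counted as suspicious and retried through a
 * private buffer that the caller cannot modify. Returns false only if
 * the on-disk data itself fails verification.
 */
static bool
sketch_dio_read(const uint8_t *disk, uint32_t expected, uint8_t *user,
    bool tamper, unsigned *dio_verify_rd)
{
    uint8_t cached[SKETCH_BLKSZ];

    /* Direct path: data lands straight in the caller's buffer. */
    memcpy(user, disk, SKETCH_BLKSZ);
    if (tamper)
        user[0] ^= 0xff;
    if (sketch_cksum(user, SKETCH_BLKSZ) == expected)
        return (true);

    /* Suspicious failure: record it and reissue via the private buffer. */
    (*dio_verify_rd)++;
    memcpy(cached, disk, SKETCH_BLKSZ);
    if (sketch_cksum(cached, SKETCH_BLKSZ) != expected)
        return (false);
    memcpy(user, cached, SKETCH_BLKSZ);
    return (true);
}
```

In this model a tampered buffer increments the counter yet the read still succeeds, while genuinely bad on-disk data fails on both paths.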

Just as with Direct I/O write checksum validation failures, Direct I/O
read checksum validation failures are reported through zpool status -d in
the DIO column. Also the zevent has been updated to have both:
1. dio_verify_wr -> Checksum verification failure for writes
2. dio_verify_rd -> Checksum verification failure for reads.
This allows for determining what I/O operation was the culprit for the
checksum verification failure. All DIO errors are reported only on the
top-level VDEV.

Even though FreeBSD can write protect pages (stable pages) it still has
the same issue as Linux with Direct I/O reads.

This commit updates the following:
1. Propagates checksum failures for reads all the way up to the
   top-level VDEV.
2. Reports errors through zpool status -d as DIO.
3. Has two zevents for checksum verify errors with Direct I/O. One for
   read and one for write.
4. Updates FreeBSD ABD code to also check for ABD_FLAG_FROM_PAGES and
   handle ABD buffer contents validation the same as Linux.
5. Updated manipulate_user_buffer.c to also manipulate a buffer while a
   Direct I/O read is taking place.
6. Adds a new ZTS test case dio_read_verify that stress tests the new
   code.
7. Updated man pages.
8. Added an IMPLY statement to zio_checksum_verify() to make sure that
   Direct I/O reads are not issued as speculative.
9. Removed self healing through mirror, raidz, and dRAID VDEVs for
   Direct I/O reads.

This issue was first observed when installing a Windows 11 VM on a ZFS
dataset with the dataset property direct set to always. The zpool
devices would report checksum failures, but running a subsequent zpool
scrub would not repair any data and report no errors.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16598
2024-10-09 13:45:06 -07:00
Martin Matuška
774dcba86d FreeBSD: ignore some includes when not building kernel
The function abd_alloc_from_pages() is used only in kernel.
Excluding sys/vm.h, and vm/vm_page.h includes avoids dependency
problems.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16616
2024-10-09 13:45:02 -07:00
Brian Behlendorf
09f6b2ebe3 ztest: Fix scrub check in ztest_raidz_expand_check()
The scrub code may return EBUSY under several possible scenarios
causing ztest to incorrectly ASSERT when verifying the result of
a raidz expansion.  Update the test case to allow EBUSY since it
does not indicate pool damage.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16627
2024-10-09 13:44:58 -07:00
Matthew Heller
2609d93b65 vdev_id: multi-lun disks & slot num zero pad
Add ability to generate disk names that contain both a slot number
and a lun number in order to support multi-actuator SAS hard drives
with multiple luns. Also add the ability to zero pad slot numbers to
a desired digit length for easier sorting.
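
The naming idea can be sketched in C, although vdev_id itself is a shell
script. The name layout used here ("<prefix><slot>" with an optional
"-lun<N>" suffix), the function name, and its parameters are all
hypothetical; the real output format of vdev_id is not shown in this commit
message.

```c
#include <stdio.h>

/*
 * Hypothetical formatter: writes "<prefix><slot>" with the slot number
 * zero padded to pad_width digits, followed by "-lun<lun>" when lun is
 * not negative. Returns the snprintf() result.
 */
static int
format_disk_name(char *buf, size_t len, const char *prefix,
    unsigned slot, int pad_width, int lun)
{
	if (lun >= 0)
		return (snprintf(buf, len, "%s%0*u-lun%d", prefix,
		    pad_width, slot, lun));
	return (snprintf(buf, len, "%s%0*u", prefix, pad_width, slot));
}
```

With every slot number padded to the same width, a plain lexicographic sort
of the names matches the numeric slot order.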

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Heller <matthew.f.heller@accre.vanderbilt.edu>
Closes #16603
2024-10-09 13:44:55 -07:00
Brian Behlendorf
10f46d2aba ZTS: resilver_restart_001.ksh restore defaults
Update resilver_restart_001.ksh to restore the default
resilver_defer_percent when the test completes.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16618
2024-10-09 13:44:50 -07:00
Umer Saleem
0df10dc911 Only serialize native-deb* targets
The .NOTPARALLEL target is being forced on userspace as well. This commit
removes the .NOTPARALLEL target and only serializes the execution of
native-deb* targets.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16622
2024-10-09 13:44:46 -07:00
Rob Norris
0fbe9d352c zpool/zfs: restore -V & --version options
The -j option added a round of getopt, which didn't know the magic
version flags. So just bypass the whole thing and go straight to the
human output function for the special case.
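
The bypass is a string comparison on the first argument, done before any
getopt processing. A minimal sketch, assuming a hypothetical helper named
is_version_flag(); in the tree the comparison is written inline in main():

```c
#include <string.h>

/*
 * Returns 1 when the first argument is one of the version flags, so the
 * caller can print the version and exit before getopt ever runs.
 */
static int
is_version_flag(const char *cmdname)
{
	return (strcmp(cmdname, "-V") == 0 ||
	    strcmp(cmdname, "--version") == 0);
}
```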

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16615 
Closes #16617
2024-10-09 13:44:42 -07:00
Martin Matuška
84f44ec07f Return boolean_t in inline functions of lib/libspl/include/sys/uio.h
The inline functions zfs_dio_offset_aligned(), zfs_dio_size_aligned()
and zfs_dio_aligned() are declared as boolean_t but return the bool
type.
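
A minimal sketch of the pattern, assuming a stand-in boolean_t enum and a
hypothetical offset_is_aligned() helper (neither is the definition from
lib/libspl/include/sys/uio.h): the function returns a boolean_t value
explicitly instead of the result of a bool expression.

```c
#include <stdint.h>

/* Stand-in for the boolean_t enum; not the system definition. */
typedef enum { B_FALSE = 0, B_TRUE = 1 } boolean_t;

/*
 * Hypothetical alignment check. blksz is assumed to be a nonzero power
 * of two. The result is returned as boolean_t rather than bool.
 */
static inline boolean_t
offset_is_aligned(uint64_t offset, uint64_t blksz)
{
	return ((offset & (blksz - 1)) == 0 ? B_TRUE : B_FALSE);
}
```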

This fixes the build of FreeBSD.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16613
2024-10-09 13:44:36 -07:00
Shengqi Chen
fc9608e2e6 Bump SONAME of libzfs and libzpool
The ABIs of libzfs and libzpool have had breaking changes since the last
SONAME bump in commit fe6babc:

* libzfs: `zpool_print_unsup_feat` removed (used by zpool cmd).
* libzpool: multiple `ddt_*` symbols removed (used by zdb cmd).

Bump them to avoid ABI breakage.

See: https://github.com/openzfs/zfs/pull/11817
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16609
2024-10-09 13:44:32 -07:00
Shengqi Chen
d32c05949a contrib/debian: add new manpages to installation list
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16609
2024-10-09 13:44:26 -07:00
JKDingwall
1ebb6b866f Fix generation of kernel uevents for snapshot rename on linux
`zvol_rename_minors()` needs to be given the full path, not just the
snapshot name.  Use the code removed in a0bd735ad as a guide
for providing the necessary values.

Add ZTS check for /dev changes after snapshot rename.  After
renaming a snapshot with 'snapdev=visible' ensure that the /dev
entries are updated to reflect the rename.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: James Dingwall <james@dingwall.me.uk>
Closes #14223 
Closes #16600
2024-10-09 13:44:22 -07:00
Tino Reichardt
f019b445f3 ZTS: Fix summary page creation again - second try
In PR #16599 I used 'return' like in C - which is wrong :/
This fix generates the summary as needed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16611
2024-10-09 13:44:18 -07:00
Tino Reichardt
03822a61be ZTS: Remove FreeBSD 13.4-STABLE
Current CI is failing on FreeBSD 13.4-STABLE because samba4 can't be
installed there. Let's remove it for now.

Also update the FreeBSD version definitions a bit.

The naming is like this now:

FreeBSD variants:
- freebsd13-3r, freebsd13-4r, freebsd14-0r, freebsd14-1r (RELEASE)
- freebsd13-4s, freebsd14-1s (STABLE)
- freebsd15-0c (CURRENT)

RHL based distros:
- almalinux8, almalinux9, centos-stream9, fedora39, fedora40

Debian based:
- debian11, debian12, ubuntu20, ubuntu22, ubuntu24

Misc Linux distros:
- archlinux, tumbleweed

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16610
2024-10-09 13:43:46 -07:00
111 changed files with 1793 additions and 932 deletions

View File

@ -11,7 +11,7 @@ concurrency:
jobs: jobs:
analyze: analyze:
name: Analyze name: Analyze
runs-on: ubuntu-latest runs-on: ubuntu-22.04
permissions: permissions:
actions: read actions: read
contents: read contents: read

View File

@ -18,19 +18,21 @@ ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -q -N ""
# we expect RAM shortage # we expect RAM shortage
cat << EOF | sudo tee /etc/ksmtuned.conf > /dev/null cat << EOF | sudo tee /etc/ksmtuned.conf > /dev/null
# /etc/ksmtuned.conf - Configuration file for ksmtuned
# https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/chap-ksm # https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/chap-ksm
KSM_MONITOR_INTERVAL=60 KSM_MONITOR_INTERVAL=60
# Millisecond sleep between ksm scans for 16Gb server. # Millisecond sleep between ksm scans for 16Gb server.
# Smaller servers sleep more, bigger sleep less. # Smaller servers sleep more, bigger sleep less.
KSM_SLEEP_MSEC=10 KSM_SLEEP_MSEC=30
KSM_NPAGES_BOOST=300
KSM_NPAGES_DECAY=-50
KSM_NPAGES_MIN=64
KSM_NPAGES_MAX=2048
KSM_THRES_COEF=25 KSM_NPAGES_BOOST=0
KSM_THRES_CONST=2048 KSM_NPAGES_DECAY=0
KSM_NPAGES_MIN=1000
KSM_NPAGES_MAX=25000
KSM_THRES_COEF=80
KSM_THRES_CONST=8192
LOGFILE=/var/log/ksmtuned.log LOGFILE=/var/log/ksmtuned.log
DEBUG=1 DEBUG=1

View File

@ -14,7 +14,7 @@ OSv=$OS
# compressed with .zst extension # compressed with .zst extension
REPO="https://github.com/mcmilk/openzfs-freebsd-images" REPO="https://github.com/mcmilk/openzfs-freebsd-images"
FREEBSD="$REPO/releases/download/v2024-09-16" FREEBSD="$REPO/releases/download/v2024-10-05"
URLzs="" URLzs=""
# Ubuntu mirrors # Ubuntu mirrors
@ -52,43 +52,55 @@ case "$OS" in
OSNAME="Debian 12" OSNAME="Debian 12"
URL="https://cloud.debian.org/images/cloud/bookworm/latest/debian-12-generic-amd64.qcow2" URL="https://cloud.debian.org/images/cloud/bookworm/latest/debian-12-generic-amd64.qcow2"
;; ;;
fedora39)
OSNAME="Fedora 39"
OSv="fedora39"
URL="https://download.fedoraproject.org/pub/fedora/linux/releases/39/Cloud/x86_64/images/Fedora-Cloud-Base-39-1.5.x86_64.qcow2"
;;
fedora40) fedora40)
OSNAME="Fedora 40" OSNAME="Fedora 40"
OSv="fedora39" OSv="fedora-unknown"
URL="https://download.fedoraproject.org/pub/fedora/linux/releases/40/Cloud/x86_64/images/Fedora-Cloud-Base-Generic.x86_64-40-1.14.qcow2" URL="https://download.fedoraproject.org/pub/fedora/linux/releases/40/Cloud/x86_64/images/Fedora-Cloud-Base-Generic.x86_64-40-1.14.qcow2"
;; ;;
freebsd13r) fedora41)
OSNAME="Fedora 41"
OSv="fedora-unknown"
URL="https://download.fedoraproject.org/pub/fedora/linux/releases/41/Cloud/x86_64/images/Fedora-Cloud-Base-Generic-41-1.4.x86_64.qcow2"
;;
freebsd13-3r)
OSNAME="FreeBSD 13.3-RELEASE"
OSv="freebsd13.0"
URLzs="$FREEBSD/amd64-freebsd-13.3-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash"
NIC="rtl8139"
;;
freebsd13-4r)
OSNAME="FreeBSD 13.4-RELEASE" OSNAME="FreeBSD 13.4-RELEASE"
OSv="freebsd13.0" OSv="freebsd13.0"
URLzs="$FREEBSD/amd64-freebsd-13.4-RELEASE.qcow2.zst" URLzs="$FREEBSD/amd64-freebsd-13.4-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash" BASH="/usr/local/bin/bash"
NIC="rtl8139" NIC="rtl8139"
;; ;;
freebsd13) freebsd14-0r)
OSNAME="FreeBSD 13.4-STABLE" OSNAME="FreeBSD 14.0-RELEASE"
OSv="freebsd13.0" OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-13.4-STABLE.qcow2.zst" URLzs="$FREEBSD/amd64-freebsd-14.0-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash" BASH="/usr/local/bin/bash"
NIC="rtl8139"
;; ;;
freebsd14r) freebsd14-1r)
OSNAME="FreeBSD 14.1-RELEASE" OSNAME="FreeBSD 14.1-RELEASE"
OSv="freebsd14.0" OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-14.1-RELEASE.qcow2.zst" URLzs="$FREEBSD/amd64-freebsd-14.1-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash" BASH="/usr/local/bin/bash"
;; ;;
freebsd14) freebsd13-4s)
OSNAME="FreeBSD 13.4-STABLE"
OSv="freebsd13.0"
URLzs="$FREEBSD/amd64-freebsd-13.4-STABLE.qcow2.zst"
BASH="/usr/local/bin/bash"
;;
freebsd14-1s)
OSNAME="FreeBSD 14.1-STABLE" OSNAME="FreeBSD 14.1-STABLE"
OSv="freebsd14.0" OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-14.1-STABLE.qcow2.zst" URLzs="$FREEBSD/amd64-freebsd-14.1-STABLE.qcow2.zst"
BASH="/usr/local/bin/bash" BASH="/usr/local/bin/bash"
;; ;;
freebsd15) freebsd15-0c)
OSNAME="FreeBSD 15.0-CURRENT" OSNAME="FreeBSD 15.0-CURRENT"
OSv="freebsd14.0" OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-15.0-CURRENT.qcow2.zst" URLzs="$FREEBSD/amd64-freebsd-15.0-CURRENT.qcow2.zst"

View File

@ -13,10 +13,10 @@ function archlinux() {
echo "##[endgroup]" echo "##[endgroup]"
echo "##[group]Install Development Tools" echo "##[group]Install Development Tools"
sudo pacman -Sy --noconfirm base-devel bc cpio dhclient dkms fakeroot \ sudo pacman -Sy --noconfirm base-devel bc cpio cryptsetup dhclient dkms \
fio gdb inetutils jq less linux linux-headers lsscsi nfs-utils parted \ fakeroot fio gdb inetutils jq less linux linux-headers lsscsi nfs-utils \
pax perf python-packaging python-setuptools qemu-guest-agent ksh samba \ parted pax perf python-packaging python-setuptools qemu-guest-agent ksh \
sysstat rng-tools rsync wget xxhash samba sysstat rng-tools rsync wget xxhash
echo "##[endgroup]" echo "##[endgroup]"
} }
@ -30,11 +30,11 @@ function debian() {
echo "##[group]Install Development Tools" echo "##[group]Install Development Tools"
sudo apt-get install -y \ sudo apt-get install -y \
acl alien attr autoconf bc cpio curl dbench dh-python dkms fakeroot \ acl alien attr autoconf bc cpio cryptsetup curl dbench dh-python dkms \
fio gdb gdebi git ksh lcov isc-dhcp-client jq libacl1-dev libaio-dev \ fakeroot fio gdb gdebi git ksh lcov isc-dhcp-client jq libacl1-dev \
libattr1-dev libblkid-dev libcurl4-openssl-dev libdevmapper-dev libelf-dev \ libaio-dev libattr1-dev libblkid-dev libcurl4-openssl-dev libdevmapper-dev \
libffi-dev libmount-dev libpam0g-dev libselinux-dev libssl-dev libtool \ libelf-dev libffi-dev libmount-dev libpam0g-dev libselinux-dev libssl-dev \
libtool-bin libudev-dev libunwind-dev linux-headers-$(uname -r) \ libtool libtool-bin libudev-dev libunwind-dev linux-headers-$(uname -r) \
lsscsi nfs-kernel-server pamtester parted python3 python3-all-dev \ lsscsi nfs-kernel-server pamtester parted python3 python3-all-dev \
python3-cffi python3-dev python3-distlib python3-packaging \ python3-cffi python3-dev python3-distlib python3-packaging \
python3-setuptools python3-sphinx qemu-guest-agent rng-tools rpm2cpio \ python3-setuptools python3-sphinx qemu-guest-agent rng-tools rpm2cpio \
@ -66,16 +66,23 @@ function rhel() {
echo "##[endgroup]" echo "##[endgroup]"
echo "##[group]Install Development Tools" echo "##[group]Install Development Tools"
sudo dnf group install -y "Development Tools"
# Alma wants "Development Tools", Fedora 41 wants "development-tools"
if ! sudo dnf group install -y "Development Tools" ; then
echo "Trying 'development-tools' instead of 'Development Tools'"
sudo dnf group install -y development-tools
fi
sudo dnf install -y \ sudo dnf install -y \
acl attr bc bzip2 curl dbench dkms elfutils-libelf-devel fio gdb git \ acl attr bc bzip2 cryptsetup curl dbench dkms elfutils-libelf-devel fio \
jq kernel-rpm-macros ksh libacl-devel libaio-devel libargon2-devel \ gdb git jq kernel-rpm-macros ksh libacl-devel libaio-devel \
libattr-devel libblkid-devel libcurl-devel libffi-devel ncompress \ libargon2-devel libattr-devel libblkid-devel libcurl-devel libffi-devel \
libselinux-devel libtirpc-devel libtool libudev-devel libuuid-devel \ ncompress libselinux-devel libtirpc-devel libtool libudev-devel \
lsscsi mdadm nfs-utils openssl-devel pam-devel pamtester parted perf \ libuuid-devel lsscsi mdadm nfs-utils openssl-devel pam-devel pamtester \
python3 python3-cffi python3-devel python3-packaging kernel-devel \ parted perf python3 python3-cffi python3-devel python3-packaging \
python3-setuptools qemu-guest-agent rng-tools rpcgen rpm-build rsync \ kernel-devel python3-setuptools qemu-guest-agent rng-tools rpcgen \
samba sysstat systemd watchdog wget xfsprogs-devel xxhash zlib-devel rpm-build rsync samba sysstat systemd watchdog wget xfsprogs-devel xxhash \
zlib-devel
echo "##[endgroup]" echo "##[endgroup]"
} }
@ -111,6 +118,7 @@ case "$1" in
archlinux archlinux
;; ;;
debian*) debian*)
echo 'debconf debconf/frontend select Noninteractive' | sudo debconf-set-selections
debian debian
echo "##[group]Install Debian specific" echo "##[group]Install Debian specific"
sudo apt-get install -yq linux-perf dh-sequence-dkms sudo apt-get install -yq linux-perf dh-sequence-dkms

View File

@ -83,7 +83,7 @@ function rpm_build_and_install() {
echo "##[endgroup]" echo "##[endgroup]"
echo "##[group]Install" echo "##[group]Install"
run sudo dnf -y --skip-broken localinstall $(ls *.rpm | grep -v src.rpm) run sudo dnf -y --nobest install $(ls *.rpm | grep -v src.rpm)
echo "##[endgroup]" echo "##[endgroup]"
} }

View File

@ -14,17 +14,21 @@ PID=$(pidof /usr/bin/qemu-system-x86_64)
tail --pid=$PID -f /dev/null tail --pid=$PID -f /dev/null
sudo virsh undefine openzfs sudo virsh undefine openzfs
# definitions of per operating system # default values per test vm:
VMs=2
CPU=2
# cpu pinning
CPUSET=("0,1" "2,3")
case "$OS" in case "$OS" in
freebsd*) freebsd*)
VMs=2 # FreeBSD can't be optimized via ksmtuned
CPU=3
RAM=6 RAM=6
;; ;;
*) *)
VMs=2 # Linux can be optimized via ksmtuned
CPU=3 RAM=8
RAM=7
;; ;;
esac esac
@ -73,6 +77,7 @@ EOF
--cpu host-passthrough \ --cpu host-passthrough \
--virt-type=kvm --hvm \ --virt-type=kvm --hvm \
--vcpus=$CPU,sockets=1 \ --vcpus=$CPU,sockets=1 \
--cpuset=${CPUSET[$((i-1))]} \
--memory $((1024*RAM)) \ --memory $((1024*RAM)) \
--memballoon model=virtio \ --memballoon model=virtio \
--graphics none \ --graphics none \

View File

@ -11,12 +11,10 @@ function output() {
} }
function outfile() { function outfile() {
test -s "$1" || return
cat "$1" >> "out-$logfile.md" cat "$1" >> "out-$logfile.md"
} }
function outfile_plain() { function outfile_plain() {
test -s "$1" || return
output "<pre>" output "<pre>"
cat "$1" >> "out-$logfile.md" cat "$1" >> "out-$logfile.md"
output "</pre>" output "</pre>"
@ -45,6 +43,8 @@ if [ ! -f out-1.md ]; then
tar xf "$tarfile" tar xf "$tarfile"
test -s env.txt || continue test -s env.txt || continue
source env.txt source env.txt
# when uname.txt is there, the other files are also ok
test -s uname.txt || continue
output "\n## Functional Tests: $OSNAME\n" output "\n## Functional Tests: $OSNAME\n"
outfile_plain uname.txt outfile_plain uname.txt
outfile_plain summary.txt outfile_plain summary.txt

View File

@ -22,8 +22,8 @@ jobs:
- name: Generate OS config and CI type - name: Generate OS config and CI type
id: os id: os
run: | run: |
FULL_OS='["almalinux8", "almalinux9", "centos-stream9", "debian11", "debian12", "fedora39", "fedora40", "freebsd13", "freebsd13r", "freebsd14", "freebsd14r", "ubuntu20", "ubuntu22", "ubuntu24"]' FULL_OS='["almalinux8", "almalinux9", "centos-stream9", "debian11", "debian12", "fedora40", "fedora41", "freebsd13-4r", "freebsd14-0r", "freebsd14-1s", "ubuntu20", "ubuntu22", "ubuntu24"]'
QUICK_OS='["almalinux8", "almalinux9", "debian12", "fedora40", "freebsd13", "freebsd14", "ubuntu24"]' QUICK_OS='["almalinux8", "almalinux9", "debian12", "fedora41", "freebsd13-3r", "freebsd14-1r", "ubuntu24"]'
# determine CI type when running on PR # determine CI type when running on PR
ci_type="full" ci_type="full"
if ${{ github.event_name == 'pull_request' }}; then if ${{ github.event_name == 'pull_request' }}; then
@ -46,10 +46,12 @@ jobs:
strategy: strategy:
fail-fast: false fail-fast: false
matrix: matrix:
# all: # rhl: almalinux8, almalinux9, centos-stream9, fedora40, fedora41
# os: [almalinux8, almalinux9, archlinux, centos-stream9, fedora39, fedora40, debian11, debian12, freebsd13, freebsd13r, freebsd14, freebsd14r, freebsd15, ubuntu20, ubuntu22, ubuntu24] # debian: debian11, debian12, ubuntu20, ubuntu22, ubuntu24
# openzfs: # misc: archlinux, tumbleweed
# os: [almalinux8, almalinux9, centos-stream9, debian11, debian12, fedora39, fedora40, freebsd13, freebsd13r, freebsd14, freebsd14r, ubuntu20, ubuntu22, ubuntu24] # FreeBSD Release: freebsd13-3r, freebsd13-4r, freebsd14-0r, freebsd14-1r
# FreeBSD Stable: freebsd13-4s, freebsd14-1s
# FreeBSD Current: freebsd15-0c
os: ${{ fromJson(needs.test-config.outputs.test_os) }} os: ${{ fromJson(needs.test-config.outputs.test_os) }}
runs-on: ubuntu-24.04 runs-on: ubuntu-24.04
steps: steps:

META
View File

@ -2,7 +2,7 @@ Meta: 1
Name: zfs Name: zfs
Branch: 1.0 Branch: 1.0
Version: 2.3.0 Version: 2.3.0
Release: rc1 Release: rc3
Release-Tags: relext Release-Tags: relext
License: CDDL License: CDDL
Author: OpenZFS Author: OpenZFS

View File

@ -1131,7 +1131,7 @@ dump_zap(objset_t *os, uint64_t object, void *data, size_t size)
!!(zap_getflags(zc.zc_zap) & ZAP_FLAG_UINT64_KEY); !!(zap_getflags(zc.zc_zap) & ZAP_FLAG_UINT64_KEY);
if (key64) if (key64)
(void) printf("\t\t0x%010lx = ", (void) printf("\t\t0x%010" PRIu64 "x = ",
*(uint64_t *)attrp->za_name); *(uint64_t *)attrp->za_name);
else else
(void) printf("\t\t%s = ", attrp->za_name); (void) printf("\t\t%s = ", attrp->za_name);
@ -2152,14 +2152,21 @@ dump_brt(spa_t *spa)
if (dump_opt['T'] < 3) if (dump_opt['T'] < 3)
return; return;
/* -TTT shows per-vdev histograms; -TTTT shows all entries */
boolean_t do_histo = dump_opt['T'] == 3;
char dva[64]; char dva[64];
printf("\n%-16s %-10s\n", "DVA", "REFCNT");
if (!do_histo)
printf("\n%-16s %-10s\n", "DVA", "REFCNT");
for (uint64_t vdevid = 0; vdevid < brt->brt_nvdevs; vdevid++) { for (uint64_t vdevid = 0; vdevid < brt->brt_nvdevs; vdevid++) {
brt_vdev_t *brtvd = &brt->brt_vdevs[vdevid]; brt_vdev_t *brtvd = &brt->brt_vdevs[vdevid];
if (brtvd == NULL || !brtvd->bv_initiated) if (brtvd == NULL || !brtvd->bv_initiated)
continue; continue;
uint64_t counts[64] = {};
zap_cursor_t zc; zap_cursor_t zc;
zap_attribute_t *za = zap_attribute_alloc(); zap_attribute_t *za = zap_attribute_alloc();
for (zap_cursor_init(&zc, brt->brt_mos, brtvd->bv_mos_entries); for (zap_cursor_init(&zc, brt->brt_mos, brtvd->bv_mos_entries);
@ -2172,14 +2179,26 @@ dump_brt(spa_t *spa)
za->za_integer_length, za->za_num_integers, za->za_integer_length, za->za_num_integers,
&refcnt)); &refcnt));
uint64_t offset = *(const uint64_t *)za->za_name; if (do_histo)
counts[highbit64(refcnt)]++;
else {
uint64_t offset =
*(const uint64_t *)za->za_name;
snprintf(dva, sizeof (dva), "%" PRIu64 ":%llx", vdevid, snprintf(dva, sizeof (dva), "%" PRIu64 ":%llx",
(u_longlong_t)offset); vdevid, (u_longlong_t)offset);
printf("%-16s %-10llu\n", dva, (u_longlong_t)refcnt); printf("%-16s %-10llu\n", dva,
(u_longlong_t)refcnt);
}
} }
zap_cursor_fini(&zc); zap_cursor_fini(&zc);
zap_attribute_free(za); zap_attribute_free(za);
if (do_histo) {
printf("\nBRT: vdev %" PRIu64
": DVAs with 2^n refcnts:\n", vdevid);
dump_histogram(counts, 64, 0);
}
} }
} }
@ -4266,6 +4285,10 @@ dump_uberblock(uberblock_t *ub, const char *header, const char *footer)
(void) printf("\ttimestamp = %llu UTC = %s", (void) printf("\ttimestamp = %llu UTC = %s",
(u_longlong_t)ub->ub_timestamp, ctime(&timestamp)); (u_longlong_t)ub->ub_timestamp, ctime(&timestamp));
char blkbuf[BP_SPRINTF_LEN];
snprintf_blkptr(blkbuf, sizeof (blkbuf), &ub->ub_rootbp);
(void) printf("\tbp = %s\n", blkbuf);
(void) printf("\tmmp_magic = %016llx\n", (void) printf("\tmmp_magic = %016llx\n",
(u_longlong_t)ub->ub_mmp_magic); (u_longlong_t)ub->ub_mmp_magic);
if (MMP_VALID(ub)) { if (MMP_VALID(ub)) {

View File

@ -139,7 +139,8 @@ dev_event_nvlist(struct udev_device *dev)
* is /dev/sda. * is /dev/sda.
*/ */
struct udev_device *parent_dev = udev_device_get_parent(dev); struct udev_device *parent_dev = udev_device_get_parent(dev);
if ((value = udev_device_get_sysattr_value(parent_dev, "size")) if (parent_dev != NULL &&
(value = udev_device_get_sysattr_value(parent_dev, "size"))
!= NULL) { != NULL) {
uint64_t numval = DEV_BSIZE; uint64_t numval = DEV_BSIZE;

View File

@ -2162,6 +2162,7 @@ zfs_do_get(int argc, char **argv)
cb.cb_type = ZFS_TYPE_DATASET; cb.cb_type = ZFS_TYPE_DATASET;
struct option long_options[] = { struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
{"json-int", no_argument, NULL, ZFS_OPTION_JSON_NUMS_AS_INT}, {"json-int", no_argument, NULL, ZFS_OPTION_JSON_NUMS_AS_INT},
{0, 0, 0, 0} {0, 0, 0, 0}
}; };
@ -3760,8 +3761,13 @@ collect_dataset(zfs_handle_t *zhp, list_cbdata_t *cb)
if (cb->cb_json) { if (cb->cb_json) {
if (pl->pl_prop == ZFS_PROP_NAME) if (pl->pl_prop == ZFS_PROP_NAME)
continue; continue;
const char *prop_name;
if (pl->pl_prop != ZPROP_USERPROP)
prop_name = zfs_prop_to_name(pl->pl_prop);
else
prop_name = pl->pl_user_prop;
if (zprop_nvlist_one_property( if (zprop_nvlist_one_property(
zfs_prop_to_name(pl->pl_prop), propstr, prop_name, propstr,
sourcetype, source, NULL, props, sourcetype, source, NULL, props,
cb->cb_json_as_int) != 0) cb->cb_json_as_int) != 0)
nomem(); nomem();
@ -3852,6 +3858,7 @@ zfs_do_list(int argc, char **argv)
nvlist_t *data = NULL; nvlist_t *data = NULL;
struct option long_options[] = { struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
{"json-int", no_argument, NULL, ZFS_OPTION_JSON_NUMS_AS_INT}, {"json-int", no_argument, NULL, ZFS_OPTION_JSON_NUMS_AS_INT},
{0, 0, 0, 0} {0, 0, 0, 0}
}; };
@ -7436,9 +7443,15 @@ share_mount(int op, int argc, char **argv)
uint_t nthr; uint_t nthr;
jsobj = data = item = NULL; jsobj = data = item = NULL;
struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
{0, 0, 0, 0}
};
/* check options */ /* check options */
while ((c = getopt(argc, argv, op == OP_MOUNT ? ":ajRlvo:Of" : "al")) while ((c = getopt_long(argc, argv,
!= -1) { op == OP_MOUNT ? ":ajRlvo:Of" : "al",
op == OP_MOUNT ? long_options : NULL, NULL)) != -1) {
switch (c) { switch (c) {
case 'a': case 'a':
do_all = 1; do_all = 1;
@ -8374,8 +8387,14 @@ zfs_do_channel_program(int argc, char **argv)
boolean_t sync_flag = B_TRUE, json_output = B_FALSE; boolean_t sync_flag = B_TRUE, json_output = B_FALSE;
zpool_handle_t *zhp; zpool_handle_t *zhp;
struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
{0, 0, 0, 0}
};
/* check options */ /* check options */
while ((c = getopt(argc, argv, "nt:m:j")) != -1) { while ((c = getopt_long(argc, argv, "nt:m:j", long_options,
NULL)) != -1) {
switch (c) { switch (c) {
case 't': case 't':
case 'm': { case 'm': {
@ -9083,7 +9102,13 @@ zfs_do_version(int argc, char **argv)
int c; int c;
nvlist_t *jsobj = NULL, *zfs_ver = NULL; nvlist_t *jsobj = NULL, *zfs_ver = NULL;
boolean_t json = B_FALSE; boolean_t json = B_FALSE;
while ((c = getopt(argc, argv, "j")) != -1) {
struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
{0, 0, 0, 0}
};
while ((c = getopt_long(argc, argv, "j", long_options, NULL)) != -1) {
switch (c) { switch (c) {
case 'j': case 'j':
json = B_TRUE; json = B_TRUE;
@ -9187,7 +9212,7 @@ main(int argc, char **argv)
* Special case '-V|--version' * Special case '-V|--version'
*/ */
if ((strcmp(cmdname, "-V") == 0) || (strcmp(cmdname, "--version") == 0)) if ((strcmp(cmdname, "-V") == 0) || (strcmp(cmdname, "--version") == 0))
return (zfs_do_version(argc, argv)); return (zfs_version_print() != 0);
/* /*
* Special case 'help' * Special case 'help'

View File

@ -7340,6 +7340,7 @@ zpool_do_list(int argc, char **argv)
current_prop_type = ZFS_TYPE_POOL; current_prop_type = ZFS_TYPE_POOL;
struct option long_options[] = { struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
{"json-int", no_argument, NULL, ZPOOL_OPTION_JSON_NUMS_AS_INT}, {"json-int", no_argument, NULL, ZPOOL_OPTION_JSON_NUMS_AS_INT},
{"json-pool-key-guid", no_argument, NULL, {"json-pool-key-guid", no_argument, NULL,
ZPOOL_OPTION_POOL_KEY_GUID}, ZPOOL_OPTION_POOL_KEY_GUID},
@ -7965,8 +7966,11 @@ zpool_do_online(int argc, char **argv)
poolname = argv[0]; poolname = argv[0];
if ((zhp = zpool_open(g_zfs, poolname)) == NULL) if ((zhp = zpool_open(g_zfs, poolname)) == NULL) {
(void) fprintf(stderr, gettext("failed to open pool "
"\"%s\""), poolname);
return (1); return (1);
}
for (i = 1; i < argc; i++) { for (i = 1; i < argc; i++) {
vdev_state_t oldstate; vdev_state_t oldstate;
@ -7987,12 +7991,15 @@ zpool_do_online(int argc, char **argv)
&l2cache, NULL); &l2cache, NULL);
if (tgt == NULL) { if (tgt == NULL) {
ret = 1; ret = 1;
(void) fprintf(stderr, gettext("couldn't find device "
"\"%s\" in pool \"%s\"\n"), argv[i], poolname);
continue; continue;
} }
uint_t vsc; uint_t vsc;
oldstate = ((vdev_stat_t *)fnvlist_lookup_uint64_array(tgt, oldstate = ((vdev_stat_t *)fnvlist_lookup_uint64_array(tgt,
ZPOOL_CONFIG_VDEV_STATS, &vsc))->vs_state; ZPOOL_CONFIG_VDEV_STATS, &vsc))->vs_state;
if (zpool_vdev_online(zhp, argv[i], flags, &newstate) == 0) { if ((rc = zpool_vdev_online(zhp, argv[i], flags,
&newstate)) == 0) {
if (newstate != VDEV_STATE_HEALTHY) { if (newstate != VDEV_STATE_HEALTHY) {
(void) printf(gettext("warning: device '%s' " (void) printf(gettext("warning: device '%s' "
"onlined, but remains in faulted state\n"), "onlined, but remains in faulted state\n"),
@ -8018,6 +8025,9 @@ zpool_do_online(int argc, char **argv)
} }
} }
} else { } else {
(void) fprintf(stderr, gettext("Failed to online "
"\"%s\" in pool \"%s\": %d\n"),
argv[i], poolname, rc);
ret = 1; ret = 1;
} }
} }
@ -8102,8 +8112,11 @@ zpool_do_offline(int argc, char **argv)
poolname = argv[0]; poolname = argv[0];
if ((zhp = zpool_open(g_zfs, poolname)) == NULL) if ((zhp = zpool_open(g_zfs, poolname)) == NULL) {
(void) fprintf(stderr, gettext("failed to open pool "
"\"%s\""), poolname);
return (1); return (1);
}
for (i = 1; i < argc; i++) { for (i = 1; i < argc; i++) {
uint64_t guid = zpool_vdev_path_to_guid(zhp, argv[i]); uint64_t guid = zpool_vdev_path_to_guid(zhp, argv[i]);
@ -9224,6 +9237,12 @@ vdev_stats_nvlist(zpool_handle_t *zhp, status_cbdata_t *cb, nvlist_t *nv,
} }
} }
if (cb->cb_print_dio_verify) {
nice_num_str_nvlist(vds, "dio_verify_errors",
vs->vs_dio_verify_errors, cb->cb_literal,
cb->cb_json_as_int, ZFS_NICENUM_1024);
}
if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NOT_PRESENT, if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NOT_PRESENT,
&notpresent) == 0) { &notpresent) == 0) {
nice_num_str_nvlist(vds, ZPOOL_CONFIG_NOT_PRESENT, nice_num_str_nvlist(vds, ZPOOL_CONFIG_NOT_PRESENT,
@ -10975,6 +10994,7 @@ zpool_do_status(int argc, char **argv)
struct option long_options[] = { struct option long_options[] = {
{"power", no_argument, NULL, ZPOOL_OPTION_POWER}, {"power", no_argument, NULL, ZPOOL_OPTION_POWER},
{"json", no_argument, NULL, 'j'},
{"json-int", no_argument, NULL, ZPOOL_OPTION_JSON_NUMS_AS_INT}, {"json-int", no_argument, NULL, ZPOOL_OPTION_JSON_NUMS_AS_INT},
{"json-flat-vdevs", no_argument, NULL, {"json-flat-vdevs", no_argument, NULL,
ZPOOL_OPTION_JSON_FLAT_VDEVS}, ZPOOL_OPTION_JSON_FLAT_VDEVS},
@ -12583,6 +12603,7 @@ zpool_do_get(int argc, char **argv)
current_prop_type = cb.cb_type; current_prop_type = cb.cb_type;
struct option long_options[] = { struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
{"json-int", no_argument, NULL, ZPOOL_OPTION_JSON_NUMS_AS_INT}, {"json-int", no_argument, NULL, ZPOOL_OPTION_JSON_NUMS_AS_INT},
{"json-pool-key-guid", no_argument, NULL, {"json-pool-key-guid", no_argument, NULL,
ZPOOL_OPTION_POOL_KEY_GUID}, ZPOOL_OPTION_POOL_KEY_GUID},
@ -13497,7 +13518,12 @@ zpool_do_version(int argc, char **argv)
int c; int c;
nvlist_t *jsobj = NULL, *zfs_ver = NULL; nvlist_t *jsobj = NULL, *zfs_ver = NULL;
boolean_t json = B_FALSE; boolean_t json = B_FALSE;
while ((c = getopt(argc, argv, "j")) != -1) {
struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
};
while ((c = getopt_long(argc, argv, "j", long_options, NULL)) != -1) {
switch (c) { switch (c) {
case 'j': case 'j':
json = B_TRUE; json = B_TRUE;
@ -13613,7 +13639,7 @@ main(int argc, char **argv)
* Special case '-V|--version' * Special case '-V|--version'
*/ */
if ((strcmp(cmdname, "-V") == 0) || (strcmp(cmdname, "--version") == 0)) if ((strcmp(cmdname, "-V") == 0) || (strcmp(cmdname, "--version") == 0))
return (zpool_do_version(argc, argv)); return (zfs_version_print() != 0);
/* /*
* Special case 'help' * Special case 'help'

View File

@ -6717,6 +6717,17 @@ out:
* *
* Only after a full scrub has been completed is it safe to start injecting * Only after a full scrub has been completed is it safe to start injecting
* data corruption. See the comment in zfs_fault_inject(). * data corruption. See the comment in zfs_fault_inject().
*
* EBUSY may be returned for the following six cases. It's the caller's
* responsibility to handle them accordingly.
*
* Current state Requested
* 1. Normal Scrub Running Normal Scrub or Error Scrub
* 2. Normal Scrub Paused Error Scrub
* 3. Normal Scrub Paused Pause Normal Scrub
* 4. Error Scrub Running Normal Scrub or Error Scrub
* 5. Error Scrub Paused Pause Error Scrub
* 6. Resilvering Anything else
*/ */
static int static int
ztest_scrub_impl(spa_t *spa) ztest_scrub_impl(spa_t *spa)
@ -8082,8 +8093,14 @@ ztest_raidz_expand_check(spa_t *spa)
(void) printf("verifying an interrupted raidz " (void) printf("verifying an interrupted raidz "
"expansion using a pool scrub ...\n"); "expansion using a pool scrub ...\n");
} }
/* Will fail here if there is non-recoverable corruption detected */ /* Will fail here if there is non-recoverable corruption detected */
VERIFY0(ztest_scrub_impl(spa)); int error = ztest_scrub_impl(spa);
if (error == EBUSY)
error = 0;
VERIFY0(error);
if (ztest_opts.zo_verbose >= 1) { if (ztest_opts.zo_verbose >= 1) {
(void) printf("raidz expansion scrub check complete\n"); (void) printf("raidz expansion scrub check complete\n");
} }

View File

@ -58,9 +58,9 @@ deb-utils: deb-local rpm-utils-initramfs
pkg1=$${name}-$${version}.$${arch}.rpm; \ pkg1=$${name}-$${version}.$${arch}.rpm; \
pkg2=libnvpair3-$${version}.$${arch}.rpm; \ pkg2=libnvpair3-$${version}.$${arch}.rpm; \
pkg3=libuutil3-$${version}.$${arch}.rpm; \ pkg3=libuutil3-$${version}.$${arch}.rpm; \
pkg4=libzfs5-$${version}.$${arch}.rpm; \ pkg4=libzfs6-$${version}.$${arch}.rpm; \
pkg5=libzpool5-$${version}.$${arch}.rpm; \ pkg5=libzpool6-$${version}.$${arch}.rpm; \
pkg6=libzfs5-devel-$${version}.$${arch}.rpm; \ pkg6=libzfs6-devel-$${version}.$${arch}.rpm; \
pkg7=$${name}-test-$${version}.$${arch}.rpm; \ pkg7=$${name}-test-$${version}.$${arch}.rpm; \
pkg8=$${name}-dracut-$${version}.noarch.rpm; \ pkg8=$${name}-dracut-$${version}.noarch.rpm; \
pkg9=$${name}-initramfs-$${version}.$${arch}.rpm; \ pkg9=$${name}-initramfs-$${version}.$${arch}.rpm; \
@ -72,7 +72,7 @@ deb-utils: deb-local rpm-utils-initramfs
path_prepend=`mktemp -d /tmp/intercept.XXXXXX`; \ path_prepend=`mktemp -d /tmp/intercept.XXXXXX`; \
echo "#!$(SHELL)" > $${path_prepend}/dh_shlibdeps; \ echo "#!$(SHELL)" > $${path_prepend}/dh_shlibdeps; \
echo "`which dh_shlibdeps` -- \ echo "`which dh_shlibdeps` -- \
-xlibuutil3linux -xlibnvpair3linux -xlibzfs5linux -xlibzpool5linux" \ -xlibuutil3linux -xlibnvpair3linux -xlibzfs6linux -xlibzpool6linux" \
>> $${path_prepend}/dh_shlibdeps; \ >> $${path_prepend}/dh_shlibdeps; \
## These -x arguments are passed to dpkg-shlibdeps, which exclude the ## These -x arguments are passed to dpkg-shlibdeps, which exclude the
## Debianized packages from the auto-generated dependencies of the new debs, ## Debianized packages from the auto-generated dependencies of the new debs,
@ -93,13 +93,17 @@ debian:
cp -r contrib/debian debian; chmod +x debian/rules; cp -r contrib/debian debian; chmod +x debian/rules;
native-deb-utils: native-deb-local debian native-deb-utils: native-deb-local debian
while [ -f debian/deb-build.lock ]; do sleep 1; done; \
echo "native-deb-utils" > debian/deb-build.lock; \
cp contrib/debian/control debian/control; \ cp contrib/debian/control debian/control; \
$(DPKGBUILD) -b -rfakeroot -us -uc; $(DPKGBUILD) -b -rfakeroot -us -uc; \
$(RM) -f debian/deb-build.lock
native-deb-kmod: native-deb-local debian native-deb-kmod: native-deb-local debian
while [ -f debian/deb-build.lock ]; do sleep 1; done; \
echo "native-deb-kmod" > debian/deb-build.lock; \
sh scripts/make_gitrev.sh; \ sh scripts/make_gitrev.sh; \
fakeroot debian/rules override_dh_binary-modules; fakeroot debian/rules override_dh_binary-modules; \
$(RM) -f debian/deb-build.lock
native-deb: native-deb-utils native-deb-kmod native-deb: native-deb-utils native-deb-kmod
.NOTPARALLEL: native-deb native-deb-utils native-deb-kmod

@ -17,14 +17,21 @@ AC_DEFUN([ZFS_AC_KERNEL_KTHREAD_COMPLETE_AND_EXIT], [
AC_DEFUN([ZFS_AC_KERNEL_KTHREAD_DEQUEUE_SIGNAL], [ AC_DEFUN([ZFS_AC_KERNEL_KTHREAD_DEQUEUE_SIGNAL], [
dnl # dnl #
dnl # 5.17 API: enum pid_type * as new 4th dequeue_signal() argument, dnl # prehistory:
dnl # 5768d8906bc23d512b1a736c1e198aa833a6daa4 ("signal: Requeue signals in the appropriate queue") dnl # int dequeue_signal(struct task_struct *task, sigset_t *mask,
dnl # siginfo_t *info)
dnl # dnl #
dnl # int dequeue_signal(struct task_struct *task, sigset_t *mask, kernel_siginfo_t *info); dnl # 4.20: kernel_siginfo_t introduced, replaces siginfo_t
dnl # int dequeue_signal(struct task_struct *task, sigset_t *mask, kernel_siginfo_t *info, enum pid_type *type); dnl # int dequeue_signal(struct task_struct *task, sigset_t *mask,
dnl kernel_siginfo_t *info)
dnl # dnl #
dnl # 6.12 API: first arg struct_task* removed dnl # 5.17: enum pid_type introduced as 4th arg
dnl # int dequeue_signal(sigset_t *mask, kernel_siginfo_t *info, enum pid_type *type); dnl # int dequeue_signal(struct task_struct *task, sigset_t *mask,
dnl # kernel_siginfo_t *info, enum pid_type *type)
dnl #
dnl # 6.12: first arg struct_task* removed
dnl # int dequeue_signal(sigset_t *mask, kernel_siginfo_t *info,
dnl # enum pid_type *type)
dnl # dnl #
AC_MSG_CHECKING([whether dequeue_signal() takes 4 arguments]) AC_MSG_CHECKING([whether dequeue_signal() takes 4 arguments])
ZFS_LINUX_TEST_RESULT([kthread_dequeue_signal_4arg], [ ZFS_LINUX_TEST_RESULT([kthread_dequeue_signal_4arg], [
@ -33,11 +40,11 @@ AC_DEFUN([ZFS_AC_KERNEL_KTHREAD_DEQUEUE_SIGNAL], [
[dequeue_signal() takes 4 arguments]) [dequeue_signal() takes 4 arguments])
], [ ], [
AC_MSG_RESULT(no) AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether dequeue_signal() a task argument]) AC_MSG_CHECKING([whether 3-arg dequeue_signal() takes a type argument])
ZFS_LINUX_TEST_RESULT([kthread_dequeue_signal_3arg_task], [ ZFS_LINUX_TEST_RESULT([kthread_dequeue_signal_3arg_type], [
AC_MSG_RESULT(yes) AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_DEQUEUE_SIGNAL_3ARG_TASK, 1, AC_DEFINE(HAVE_DEQUEUE_SIGNAL_3ARG_TYPE, 1,
[dequeue_signal() takes a task argument]) [3-arg dequeue_signal() takes a type argument])
], [ ], [
AC_MSG_RESULT(no) AC_MSG_RESULT(no)
]) ])
@ -56,17 +63,6 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_KTHREAD_COMPLETE_AND_EXIT], [
]) ])
AC_DEFUN([ZFS_AC_KERNEL_SRC_KTHREAD_DEQUEUE_SIGNAL], [ AC_DEFUN([ZFS_AC_KERNEL_SRC_KTHREAD_DEQUEUE_SIGNAL], [
ZFS_LINUX_TEST_SRC([kthread_dequeue_signal_3arg_task], [
#include <linux/sched/signal.h>
], [
struct task_struct *task = NULL;
sigset_t *mask = NULL;
kernel_siginfo_t *info = NULL;
int error __attribute__ ((unused));
error = dequeue_signal(task, mask, info);
])
ZFS_LINUX_TEST_SRC([kthread_dequeue_signal_4arg], [ ZFS_LINUX_TEST_SRC([kthread_dequeue_signal_4arg], [
#include <linux/sched/signal.h> #include <linux/sched/signal.h>
], [ ], [
@ -78,6 +74,17 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_KTHREAD_DEQUEUE_SIGNAL], [
error = dequeue_signal(task, mask, info, type); error = dequeue_signal(task, mask, info, type);
]) ])
ZFS_LINUX_TEST_SRC([kthread_dequeue_signal_3arg_type], [
#include <linux/sched/signal.h>
], [
sigset_t *mask = NULL;
kernel_siginfo_t *info = NULL;
enum pid_type *type = NULL;
int error __attribute__ ((unused));
error = dequeue_signal(mask, info, type);
])
]) ])
AC_DEFUN([ZFS_AC_KERNEL_KTHREAD], [ AC_DEFUN([ZFS_AC_KERNEL_KTHREAD], [

@ -1,33 +0,0 @@
dnl #
dnl # Linux 5.18 uses invalidate_folio in lieu of invalidate_page
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_INVALIDATE_FOLIO], [
ZFS_LINUX_TEST_SRC([vfs_has_invalidate_folio], [
#include <linux/fs.h>
static void
test_invalidate_folio(struct folio *folio, size_t offset,
size_t len) {
(void) folio; (void) offset; (void) len;
return;
}
static const struct address_space_operations
aops __attribute__ ((unused)) = {
.invalidate_folio = test_invalidate_folio,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_INVALIDATE_FOLIO], [
dnl #
dnl # Linux 5.18 uses invalidate_folio in lieu of invalidate_page
dnl #
AC_MSG_CHECKING([whether invalidate_folio exists])
ZFS_LINUX_TEST_RESULT([vfs_has_invalidate_folio], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_VFS_INVALIDATE_FOLIO, 1, [invalidate_folio exists])
],[
AC_MSG_RESULT([no])
])
])

@ -0,0 +1,27 @@
dnl #
dnl # Linux 6.0 uses migrate_folio in lieu of migrate_page
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_MIGRATE_FOLIO], [
ZFS_LINUX_TEST_SRC([vfs_has_migrate_folio], [
#include <linux/fs.h>
#include <linux/migrate.h>
static const struct address_space_operations
aops __attribute__ ((unused)) = {
.migrate_folio = migrate_folio,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_MIGRATE_FOLIO], [
dnl #
dnl # Linux 6.0 uses migrate_folio in lieu of migrate_page
dnl #
AC_MSG_CHECKING([whether migrate_folio exists])
ZFS_LINUX_TEST_RESULT([vfs_has_migrate_folio], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_VFS_MIGRATE_FOLIO, 1, [migrate_folio exists])
],[
AC_MSG_RESULT([no])
])
])

@ -1,32 +0,0 @@
dnl #
dnl # Linux 5.19 uses release_folio in lieu of releasepage
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_RELEASE_FOLIO], [
ZFS_LINUX_TEST_SRC([vfs_has_release_folio], [
#include <linux/fs.h>
static bool
test_release_folio(struct folio *folio, gfp_t gfp) {
(void) folio; (void) gfp;
return (0);
}
static const struct address_space_operations
aops __attribute__ ((unused)) = {
.release_folio = test_release_folio,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_RELEASE_FOLIO], [
dnl #
dnl # Linux 5.19 uses release_folio in lieu of releasepage
dnl #
AC_MSG_CHECKING([whether release_folio exists])
ZFS_LINUX_TEST_RESULT([vfs_has_release_folio], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_VFS_RELEASE_FOLIO, 1, [release_folio exists])
],[
AC_MSG_RESULT([no])
])
])

@ -77,8 +77,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC_SGET ZFS_AC_KERNEL_SRC_SGET
ZFS_AC_KERNEL_SRC_VFS_FILEMAP_DIRTY_FOLIO ZFS_AC_KERNEL_SRC_VFS_FILEMAP_DIRTY_FOLIO
ZFS_AC_KERNEL_SRC_VFS_READ_FOLIO ZFS_AC_KERNEL_SRC_VFS_READ_FOLIO
ZFS_AC_KERNEL_SRC_VFS_RELEASE_FOLIO ZFS_AC_KERNEL_SRC_VFS_MIGRATE_FOLIO
ZFS_AC_KERNEL_SRC_VFS_INVALIDATE_FOLIO
ZFS_AC_KERNEL_SRC_VFS_FSYNC_2ARGS ZFS_AC_KERNEL_SRC_VFS_FSYNC_2ARGS
ZFS_AC_KERNEL_SRC_VFS_DIRECT_IO ZFS_AC_KERNEL_SRC_VFS_DIRECT_IO
ZFS_AC_KERNEL_SRC_VFS_READPAGES ZFS_AC_KERNEL_SRC_VFS_READPAGES
@ -189,8 +188,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL_SGET ZFS_AC_KERNEL_SGET
ZFS_AC_KERNEL_VFS_FILEMAP_DIRTY_FOLIO ZFS_AC_KERNEL_VFS_FILEMAP_DIRTY_FOLIO
ZFS_AC_KERNEL_VFS_READ_FOLIO ZFS_AC_KERNEL_VFS_READ_FOLIO
ZFS_AC_KERNEL_VFS_RELEASE_FOLIO ZFS_AC_KERNEL_VFS_MIGRATE_FOLIO
ZFS_AC_KERNEL_VFS_INVALIDATE_FOLIO
ZFS_AC_KERNEL_VFS_FSYNC_2ARGS ZFS_AC_KERNEL_VFS_FSYNC_2ARGS
ZFS_AC_KERNEL_VFS_DIRECT_IO ZFS_AC_KERNEL_VFS_DIRECT_IO
ZFS_AC_KERNEL_VFS_READPAGES ZFS_AC_KERNEL_VFS_READPAGES

@ -33,7 +33,7 @@ AC_DEFUN([ZFS_AC_CONFIG_USER], [
ZFS_AC_CONFIG_USER_MAKEDEV_IN_MKDEV ZFS_AC_CONFIG_USER_MAKEDEV_IN_MKDEV
ZFS_AC_CONFIG_USER_ZFSEXEC ZFS_AC_CONFIG_USER_ZFSEXEC
AC_CHECK_FUNCS([execvpe issetugid mlockall strlcat strlcpy gettid]) AC_CHECK_FUNCS([execvpe issetugid mlockall strerror_l strlcat strlcpy gettid])
AC_SUBST(RM) AC_SUBST(RM)
]) ])

@ -12,14 +12,14 @@ dist_noinst_DATA += %D%/openzfs-libpam-zfs.postinst
dist_noinst_DATA += %D%/openzfs-libpam-zfs.prerm dist_noinst_DATA += %D%/openzfs-libpam-zfs.prerm
dist_noinst_DATA += %D%/openzfs-libuutil3.docs dist_noinst_DATA += %D%/openzfs-libuutil3.docs
dist_noinst_DATA += %D%/openzfs-libuutil3.install.in dist_noinst_DATA += %D%/openzfs-libuutil3.install.in
dist_noinst_DATA += %D%/openzfs-libzfs4.docs dist_noinst_DATA += %D%/openzfs-libzfs6.docs
dist_noinst_DATA += %D%/openzfs-libzfs4.install.in dist_noinst_DATA += %D%/openzfs-libzfs6.install.in
dist_noinst_DATA += %D%/openzfs-libzfsbootenv1.docs dist_noinst_DATA += %D%/openzfs-libzfsbootenv1.docs
dist_noinst_DATA += %D%/openzfs-libzfsbootenv1.install.in dist_noinst_DATA += %D%/openzfs-libzfsbootenv1.install.in
dist_noinst_DATA += %D%/openzfs-libzfs-dev.docs dist_noinst_DATA += %D%/openzfs-libzfs-dev.docs
dist_noinst_DATA += %D%/openzfs-libzfs-dev.install.in dist_noinst_DATA += %D%/openzfs-libzfs-dev.install.in
dist_noinst_DATA += %D%/openzfs-libzpool5.docs dist_noinst_DATA += %D%/openzfs-libzpool6.docs
dist_noinst_DATA += %D%/openzfs-libzpool5.install.in dist_noinst_DATA += %D%/openzfs-libzpool6.install.in
dist_noinst_DATA += %D%/openzfs-python3-pyzfs.install dist_noinst_DATA += %D%/openzfs-python3-pyzfs.install
dist_noinst_DATA += %D%/openzfs-zfs-dkms.config dist_noinst_DATA += %D%/openzfs-zfs-dkms.config
dist_noinst_DATA += %D%/openzfs-zfs-dkms.dkms dist_noinst_DATA += %D%/openzfs-zfs-dkms.dkms

@ -6,6 +6,6 @@ contrib/pyzfs/libzfs_core/bindings/__pycache__/
contrib/pyzfs/pyzfs.egg-info/ contrib/pyzfs/pyzfs.egg-info/
debian/openzfs-libnvpair3.install debian/openzfs-libnvpair3.install
debian/openzfs-libuutil3.install debian/openzfs-libuutil3.install
debian/openzfs-libzfs4.install debian/openzfs-libzfs6.install
debian/openzfs-libzfs-dev.install debian/openzfs-libzfs-dev.install
debian/openzfs-libzpool5.install debian/openzfs-libzpool6.install

@ -78,9 +78,9 @@ Architecture: linux-any
Depends: libssl-dev | libssl1.0-dev, Depends: libssl-dev | libssl1.0-dev,
openzfs-libnvpair3 (= ${binary:Version}), openzfs-libnvpair3 (= ${binary:Version}),
openzfs-libuutil3 (= ${binary:Version}), openzfs-libuutil3 (= ${binary:Version}),
openzfs-libzfs4 (= ${binary:Version}), openzfs-libzfs6 (= ${binary:Version}),
openzfs-libzfsbootenv1 (= ${binary:Version}), openzfs-libzfsbootenv1 (= ${binary:Version}),
openzfs-libzpool5 (= ${binary:Version}), openzfs-libzpool6 (= ${binary:Version}),
${misc:Depends} ${misc:Depends}
Replaces: libzfslinux-dev Replaces: libzfslinux-dev
Conflicts: libzfslinux-dev Conflicts: libzfslinux-dev
@ -90,18 +90,18 @@ Description: OpenZFS filesystem development files for Linux
libraries of OpenZFS filesystem. libraries of OpenZFS filesystem.
. .
This package includes the development files of libnvpair3, libuutil3, This package includes the development files of libnvpair3, libuutil3,
libzpool5 and libzfs4. libzpool6 and libzfs6.
Package: openzfs-libzfs4 Package: openzfs-libzfs6
Section: contrib/libs Section: contrib/libs
Architecture: linux-any Architecture: linux-any
Depends: ${misc:Depends}, ${shlibs:Depends} Depends: ${misc:Depends}, ${shlibs:Depends}
# The libcurl4 is loaded through dlopen("libcurl.so.4"). # The libcurl4 is loaded through dlopen("libcurl.so.4").
# https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=988521 # https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=988521
Recommends: libcurl4 Recommends: libcurl4
Breaks: libzfs2, libzfs4 Breaks: libzfs2, libzfs4, libzfs4linux, libzfs6linux
Replaces: libzfs2, libzfs4, libzfs4linux Replaces: libzfs2, libzfs4, libzfs4linux, libzfs6linux
Conflicts: libzfs4linux Conflicts: libzfs6linux
Description: OpenZFS filesystem library for Linux - general support Description: OpenZFS filesystem library for Linux - general support
OpenZFS is a storage platform that encompasses the functionality of OpenZFS is a storage platform that encompasses the functionality of
traditional filesystems and volume managers. It supports data checksums, traditional filesystems and volume managers. It supports data checksums,
@ -123,13 +123,13 @@ Description: OpenZFS filesystem library for Linux - label info support
. .
The zfsbootenv library provides support for modifying ZFS label information. The zfsbootenv library provides support for modifying ZFS label information.
Package: openzfs-libzpool5 Package: openzfs-libzpool6
Section: contrib/libs Section: contrib/libs
Architecture: linux-any Architecture: linux-any
Depends: ${misc:Depends}, ${shlibs:Depends} Depends: ${misc:Depends}, ${shlibs:Depends}
Breaks: libzpool2, libzpool5 Breaks: libzpool2, libzpool5, libzpool5linux, libzpool6linux
Replaces: libzpool2, libzpool5, libzpool5linux Replaces: libzpool2, libzpool5, libzpool5linux, libzpool6linux
Conflicts: libzpool5linux Conflicts: libzpool6linux
Description: OpenZFS pool library for Linux Description: OpenZFS pool library for Linux
OpenZFS is a storage platform that encompasses the functionality of OpenZFS is a storage platform that encompasses the functionality of
traditional filesystems and volume managers. It supports data checksums, traditional filesystems and volume managers. It supports data checksums,
@ -246,8 +246,8 @@ Architecture: linux-any
Pre-Depends: ${misc:Pre-Depends} Pre-Depends: ${misc:Pre-Depends}
Depends: openzfs-libnvpair3 (= ${binary:Version}), Depends: openzfs-libnvpair3 (= ${binary:Version}),
openzfs-libuutil3 (= ${binary:Version}), openzfs-libuutil3 (= ${binary:Version}),
openzfs-libzfs4 (= ${binary:Version}), openzfs-libzfs6 (= ${binary:Version}),
openzfs-libzpool5 (= ${binary:Version}), openzfs-libzpool6 (= ${binary:Version}),
python3, python3,
${misc:Depends}, ${misc:Depends},
${shlibs:Depends} ${shlibs:Depends}

@ -98,6 +98,7 @@ usr/share/man/man8/zpool-attach.8
usr/share/man/man8/zpool-checkpoint.8 usr/share/man/man8/zpool-checkpoint.8
usr/share/man/man8/zpool-clear.8 usr/share/man/man8/zpool-clear.8
usr/share/man/man8/zpool-create.8 usr/share/man/man8/zpool-create.8
usr/share/man/man8/zpool-ddtprune.8
usr/share/man/man8/zpool-destroy.8 usr/share/man/man8/zpool-destroy.8
usr/share/man/man8/zpool-detach.8 usr/share/man/man8/zpool-detach.8
usr/share/man/man8/zpool-ddtprune.8 usr/share/man/man8/zpool-ddtprune.8
@ -113,6 +114,7 @@ usr/share/man/man8/zpool-list.8
usr/share/man/man8/zpool-offline.8 usr/share/man/man8/zpool-offline.8
usr/share/man/man8/zpool-online.8 usr/share/man/man8/zpool-online.8
usr/share/man/man8/zpool-prefetch.8 usr/share/man/man8/zpool-prefetch.8
usr/share/man/man8/zpool-prefetch.8
usr/share/man/man8/zpool-reguid.8 usr/share/man/man8/zpool-reguid.8
usr/share/man/man8/zpool-remove.8 usr/share/man/man8/zpool-remove.8
usr/share/man/man8/zpool-reopen.8 usr/share/man/man8/zpool-reopen.8

@ -344,7 +344,7 @@ mount_fs()
# Need the _original_ datasets mountpoint! # Need the _original_ datasets mountpoint!
mountpoint=$(get_fs_value "$fs" mountpoint) mountpoint=$(get_fs_value "$fs" mountpoint)
ZFS_CMD="mount -o zfsutil -t zfs" ZFS_CMD="mount.zfs -o zfsutil"
if [ "$mountpoint" = "legacy" ] || [ "$mountpoint" = "none" ]; then if [ "$mountpoint" = "legacy" ] || [ "$mountpoint" = "none" ]; then
# Can't use the mountpoint property. Might be one of our # Can't use the mountpoint property. Might be one of our
# clones. Check the 'org.zol:mountpoint' property set in # clones. Check the 'org.zol:mountpoint' property set in
@ -359,9 +359,8 @@ mount_fs()
# isn't the root fs. # isn't the root fs.
return 0 return 0
fi fi
# Don't use mount.zfs -o zfsutils for legacy mountpoint
if [ "$mountpoint" = "legacy" ]; then if [ "$mountpoint" = "legacy" ]; then
ZFS_CMD="mount -t zfs" ZFS_CMD="mount.zfs"
fi fi
# Last hail-mary: Hope 'rootmnt' is set! # Last hail-mary: Hope 'rootmnt' is set!
mountpoint="" mountpoint=""

@ -276,7 +276,11 @@ _LIBZUTIL_H void update_vdev_config_dev_sysfs_path(nvlist_t *nv,
* Thread-safe strerror() for use in ZFS libraries * Thread-safe strerror() for use in ZFS libraries
*/ */
static inline char *zfs_strerror(int errnum) { static inline char *zfs_strerror(int errnum) {
#ifdef HAVE_STRERROR_L
return (strerror_l(errnum, uselocale(0))); return (strerror_l(errnum, uselocale(0)));
#else
return (strerror(errnum));
#endif
} }
#ifdef __cplusplus #ifdef __cplusplus
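
The hunk above makes zfs_strerror() fall back to plain strerror() when the build does not detect strerror_l(). A minimal standalone sketch of the same pattern follows. It assumes HAVE_STRERROR_L is supplied by a configure-generated header (as the AC_CHECK_FUNCS change earlier in this diff suggests); the name example_strerror is hypothetical and is not part of libzutil.

```c
#include <errno.h>
#include <locale.h>
#include <string.h>

/*
 * Hypothetical stand-in for zfs_strerror(). HAVE_STRERROR_L is assumed to
 * come from the build system; it is not defined in this file, so the
 * strerror() branch is compiled unless the build defines it.
 */
static inline const char *
example_strerror(int errnum)
{
#ifdef HAVE_STRERROR_L
	/* uselocale((locale_t)0) queries the calling thread's current locale. */
	return (strerror_l(errnum, uselocale((locale_t)0)));
#else
	/* Standard C fallback; not guaranteed to be thread-safe. */
	return (strerror(errnum));
#endif
}
```

The fallback trades thread safety for portability, which is presumably acceptable on C libraries that lack strerror_l().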

@ -1,10 +1,5 @@
/* /*
* Copyright (c) 2010 Isilon Systems, Inc. * Copyright (c) 2024 Warner Losh.
* Copyright (c) 2010 iXsystems, Inc.
* Copyright (c) 2010 Panasas, Inc.
* Copyright (c) 2013-2016 Mellanox Technologies, Ltd.
* Copyright (c) 2015 François Tigeot
* All rights reserved.
* *
* Redistribution and use in source and binary forms, with or without * Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions * modification, are permitted provided that the following conditions
@ -26,76 +21,14 @@
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
* THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
* $FreeBSD$
*/ */
#ifndef _LINUX_COMPILER_H_
#define _LINUX_COMPILER_H_
#include <sys/cdefs.h> /*
* FreeBSD's LinuxKPI compiler.h as far back as FreeBSD 12 has what we need,
* except zfs_fallthrough.
*/
#pragma once
#include <compat/linuxkpi/common/include/linux/compiler.h>
#define __user
#define __kernel
#define __safe
#define __force
#define __nocast
#define __iomem
#define __chk_user_ptr(x) ((void)0)
#define __chk_io_ptr(x) ((void)0)
#define __builtin_warning(x, y...) (1)
#define __acquires(x)
#define __releases(x)
#define __acquire(x) do { } while (0)
#define __release(x) do { } while (0)
#define __cond_lock(x, c) (c)
#define __bitwise
#define __devinitdata
#define __deprecated
#define __init
#define __initconst
#define __devinit
#define __devexit
#define __exit
#define __rcu
#define __percpu
#define __weak __weak_symbol
#define __malloc
#define ___stringify(...) #__VA_ARGS__
#define __stringify(...) ___stringify(__VA_ARGS__)
#define __attribute_const__ __attribute__((__const__))
#undef __always_inline
#define __always_inline inline
#define noinline __noinline
#define ____cacheline_aligned __aligned(CACHE_LINE_SIZE)
#define zfs_fallthrough __attribute__((__fallthrough__)) #define zfs_fallthrough __attribute__((__fallthrough__))
#if !defined(_KERNEL) && !defined(_STANDALONE)
#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#endif
#define typeof(x) __typeof(x)
#define uninitialized_var(x) x = x
#define __maybe_unused __unused
#define __always_unused __unused
#define __must_check __result_use_check
#define __printf(a, b) __printflike(a, b)
#define barrier() __asm__ __volatile__("": : :"memory")
#define ___PASTE(a, b) a##b
#define __PASTE(a, b) ___PASTE(a, b)
#define ACCESS_ONCE(x) (*(volatile __typeof(x) *)&(x))
#define WRITE_ONCE(x, v) do { \
barrier(); \
ACCESS_ONCE(x) = (v); \
barrier(); \
} while (0)
#define lockless_dereference(p) READ_ONCE(p)
#define _AT(T, X) ((T)(X))
#endif /* _LINUX_COMPILER_H_ */

@ -70,15 +70,6 @@ hlist_del(struct hlist_node *n)
n->next->pprev = n->pprev; n->next->pprev = n->pprev;
} }
/* BEGIN CSTYLED */ /* BEGIN CSTYLED */
#define READ_ONCE(x) ({ \
__typeof(x) __var = ({ \
barrier(); \
ACCESS_ONCE(x); \
}); \
barrier(); \
__var; \
})
#define HLIST_HEAD_INIT { } #define HLIST_HEAD_INIT { }
#define HLIST_HEAD(name) struct hlist_head name = HLIST_HEAD_INIT #define HLIST_HEAD(name) struct hlist_head name = HLIST_HEAD_INIT
#define INIT_HLIST_HEAD(head) (head)->first = NULL #define INIT_HLIST_HEAD(head) (head)->first = NULL

@ -95,10 +95,6 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
#ifndef expect #ifndef expect
#define expect(expr, value) (__builtin_expect((expr), (value))) #define expect(expr, value) (__builtin_expect((expr), (value)))
#endif #endif
#ifndef __linux__
#define likely(expr) expect((expr) != 0, 1)
#define unlikely(expr) expect((expr) != 0, 0)
#endif
#define PANIC(fmt, a...) \ #define PANIC(fmt, a...) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, fmt, ## a) spl_panic(__FILE__, __FUNCTION__, __LINE__, fmt, ## a)

@ -50,7 +50,7 @@
#define kfpu_fini() do {} while (0) #define kfpu_fini() do {} while (0)
#endif #endif
#define simd_stat_init() 0 #define simd_stat_init() do {} while (0)
#define simd_stat_fini() 0 #define simd_stat_fini() do {} while (0)
#endif #endif

@ -26,8 +26,10 @@
#ifndef _ABD_OS_H #ifndef _ABD_OS_H
#define _ABD_OS_H #define _ABD_OS_H
#ifdef _KERNEL
#include <sys/vm.h> #include <sys/vm.h>
#include <vm/vm_page.h> #include <vm/vm_page.h>
#endif
#ifdef __cplusplus #ifdef __cplusplus
extern "C" { extern "C" {
@ -47,8 +49,10 @@ struct abd_linear {
#endif #endif
}; };
#ifdef _KERNEL
__attribute__((malloc)) __attribute__((malloc))
struct abd *abd_alloc_from_pages(vm_page_t *, unsigned long, uint64_t); struct abd *abd_alloc_from_pages(vm_page_t *, unsigned long, uint64_t);
#endif
#ifdef __cplusplus #ifdef __cplusplus
} }

@ -38,8 +38,7 @@
#include <sys/rwlock.h> #include <sys/rwlock.h>
#include <sys/wait.h> #include <sys/wait.h>
#include <sys/wmsum.h> #include <sys/wmsum.h>
#include <sys/kstat.h>
typedef struct kstat_s kstat_t;
#define TASKQ_NAMELEN 31 #define TASKQ_NAMELEN 31

@ -42,7 +42,7 @@
#define TS_ZOMB EXIT_ZOMBIE #define TS_ZOMB EXIT_ZOMBIE
#define TS_STOPPED TASK_STOPPED #define TS_STOPPED TASK_STOPPED
typedef void (*thread_func_t)(void *); typedef void (*thread_func_t)(void *) __attribute__((noreturn));
#define thread_create_named(name, stk, stksize, func, arg, len, \ #define thread_create_named(name, stk, stksize, func, arg, len, \
pp, state, pri) \ pp, state, pri) \

@ -30,6 +30,8 @@
extern "C" { extern "C" {
#endif #endif
struct abd;
struct abd_scatter { struct abd_scatter {
uint_t abd_offset; uint_t abd_offset;
uint_t abd_nents; uint_t abd_nents;
@ -41,10 +43,8 @@ struct abd_linear {
struct scatterlist *abd_sgl; /* for LINEAR_PAGE */ struct scatterlist *abd_sgl; /* for LINEAR_PAGE */
}; };
typedef struct abd abd_t;
typedef int abd_iter_page_func_t(struct page *, size_t, size_t, void *); typedef int abd_iter_page_func_t(struct page *, size_t, size_t, void *);
int abd_iterate_page_func(abd_t *, size_t, size_t, abd_iter_page_func_t *, int abd_iterate_page_func(struct abd *, size_t, size_t, abd_iter_page_func_t *,
void *); void *);
/* /*
@ -52,11 +52,11 @@ int abd_iterate_page_func(abd_t *, size_t, size_t, abd_iter_page_func_t *,
* Note: these are only needed to support vdev_classic. See comment in * Note: these are only needed to support vdev_classic. See comment in
* vdev_disk.c. * vdev_disk.c.
*/ */
unsigned int abd_bio_map_off(struct bio *, abd_t *, unsigned int, size_t); unsigned int abd_bio_map_off(struct bio *, struct abd *, unsigned int, size_t);
unsigned long abd_nr_pages_off(abd_t *, unsigned int, size_t); unsigned long abd_nr_pages_off(struct abd *, unsigned int, size_t);
__attribute__((malloc)) __attribute__((malloc))
abd_t *abd_alloc_from_pages(struct page **, unsigned long, uint64_t); struct abd *abd_alloc_from_pages(struct page **, unsigned long, uint64_t);
#ifdef __cplusplus #ifdef __cplusplus
} }

@ -69,6 +69,7 @@ typedef struct vfs {
boolean_t vfs_do_relatime; boolean_t vfs_do_relatime;
boolean_t vfs_nbmand; boolean_t vfs_nbmand;
boolean_t vfs_do_nbmand; boolean_t vfs_do_nbmand;
kmutex_t vfs_mntpt_lock;
} vfs_t; } vfs_t;
typedef struct zfs_mnt { typedef struct zfs_mnt {

@ -171,7 +171,6 @@ typedef struct dbuf_dirty_record {
* gets COW'd in a subsequent transaction group. * gets COW'd in a subsequent transaction group.
*/ */
arc_buf_t *dr_data; arc_buf_t *dr_data;
blkptr_t dr_overridden_by;
override_states_t dr_override_state; override_states_t dr_override_state;
uint8_t dr_copies; uint8_t dr_copies;
boolean_t dr_nopwrite; boolean_t dr_nopwrite;
@ -179,14 +178,21 @@ typedef struct dbuf_dirty_record {
boolean_t dr_diowrite; boolean_t dr_diowrite;
boolean_t dr_has_raw_params; boolean_t dr_has_raw_params;
/* /* Override and raw params are mutually exclusive. */
* If dr_has_raw_params is set, the following crypt union {
* params will be set on the BP that's written. blkptr_t dr_overridden_by;
*/ struct {
boolean_t dr_byteorder; /*
uint8_t dr_salt[ZIO_DATA_SALT_LEN]; * If dr_has_raw_params is set, the
uint8_t dr_iv[ZIO_DATA_IV_LEN]; * following crypt params will be set
uint8_t dr_mac[ZIO_DATA_MAC_LEN]; * on the BP that's written.
*/
boolean_t dr_byteorder;
uint8_t dr_salt[ZIO_DATA_SALT_LEN];
uint8_t dr_iv[ZIO_DATA_IV_LEN];
uint8_t dr_mac[ZIO_DATA_MAC_LEN];
};
};
} dl; } dl;
struct dirty_lightweight_leaf { struct dirty_lightweight_leaf {
/* /*
@ -264,6 +270,27 @@ typedef struct dmu_buf_impl {
*/ */
uint8_t db_level; uint8_t db_level;
/* This block was freed while a read or write was active. */
uint8_t db_freed_in_flight;
/*
* Evict user data as soon as the dirty and reference counts are equal.
*/
uint8_t db_user_immediate_evict;
/*
* dnode_evict_dbufs() or dnode_evict_bonus() tried to evict this dbuf,
* but couldn't due to outstanding references. Evict once the refcount
* drops to 0.
*/
uint8_t db_pending_evict;
/* Number of TXGs in which this buffer is dirty. */
uint8_t db_dirtycnt;
/* The buffer was partially read. More reads may follow. */
uint8_t db_partial_read;
/* /*
* Protects db_buf's contents if they contain an indirect block or data * Protects db_buf's contents if they contain an indirect block or data
* block of the meta-dnode. We use this lock to protect the structure of * block of the meta-dnode. We use this lock to protect the structure of
@ -288,6 +315,9 @@ typedef struct dmu_buf_impl {
*/ */
dbuf_states_t db_state; dbuf_states_t db_state;
/* In which dbuf cache this dbuf is, if any. */
dbuf_cached_state_t db_caching_status;
/* /*
* Refcount accessed by dmu_buf_{hold,rele}. * Refcount accessed by dmu_buf_{hold,rele}.
* If nonzero, the buffer can't be destroyed. * If nonzero, the buffer can't be destroyed.
@ -304,39 +334,10 @@ typedef struct dmu_buf_impl {
/* Link in dbuf_cache or dbuf_metadata_cache */ /* Link in dbuf_cache or dbuf_metadata_cache */
multilist_node_t db_cache_link; multilist_node_t db_cache_link;
/* Tells us which dbuf cache this dbuf is in, if any */
dbuf_cached_state_t db_caching_status;
uint64_t db_hash; uint64_t db_hash;
/* Data which is unique to data (leaf) blocks: */
/* User callback information. */ /* User callback information. */
dmu_buf_user_t *db_user; dmu_buf_user_t *db_user;
/*
* Evict user data as soon as the dirty and reference
* counts are equal.
*/
uint8_t db_user_immediate_evict;
/*
* This block was freed while a read or write was
* active.
*/
uint8_t db_freed_in_flight;
/*
* dnode_evict_dbufs() or dnode_evict_bonus() tried to
* evict this dbuf, but couldn't due to outstanding
* references. Evict once the refcount drops to 0.
*/
uint8_t db_pending_evict;
uint8_t db_dirtycnt;
/* The buffer was partially read. More reads may follow. */
uint8_t db_partial_read;
} dmu_buf_impl_t; } dmu_buf_impl_t;
#define DBUF_HASH_MUTEX(h, idx) \ #define DBUF_HASH_MUTEX(h, idx) \
@ -351,6 +352,8 @@ typedef struct dbuf_hash_table {
typedef void (*dbuf_prefetch_fn)(void *, uint64_t, uint64_t, boolean_t); typedef void (*dbuf_prefetch_fn)(void *, uint64_t, uint64_t, boolean_t);
extern kmem_cache_t *dbuf_dirty_kmem_cache;
uint64_t dbuf_whichblock(const struct dnode *di, const int64_t level, uint64_t dbuf_whichblock(const struct dnode *di, const int64_t level,
const uint64_t offset); const uint64_t offset);
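
The dbuf_dirty_record hunk above moves dr_overridden_by and the raw crypt parameters into a union because, per the new comment, the two are mutually exclusive. The sketch below illustrates the space effect of that layout change. All type names and sizes are hypothetical stand-ins chosen for illustration; they are not the real blkptr_t or ZIO_DATA_*_LEN values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; sizes are illustrative, not the real ZFS ones. */
typedef struct { uint64_t words[16]; } ex_blkptr_t;
#define	EX_SALT_LEN	8
#define	EX_IV_LEN	12
#define	EX_MAC_LEN	16

/* Flat layout: both groups of fields always occupy space. */
struct ex_flat {
	ex_blkptr_t overridden_by;
	int byteorder;
	uint8_t salt[EX_SALT_LEN];
	uint8_t iv[EX_IV_LEN];
	uint8_t mac[EX_MAC_LEN];
};

/*
 * Overlaid layout: a C11 anonymous union and struct let the two groups
 * share storage while keeping the same member names for callers.
 */
struct ex_overlaid {
	union {
		ex_blkptr_t overridden_by;
		struct {
			int byteorder;
			uint8_t salt[EX_SALT_LEN];
			uint8_t iv[EX_IV_LEN];
			uint8_t mac[EX_MAC_LEN];
		};
	};
};

/* Bytes saved per record by overlaying the two groups. */
static inline size_t
ex_bytes_saved(void)
{
	return (sizeof (struct ex_flat) - sizeof (struct ex_overlaid));
}
```

Because the members are anonymous, existing accesses such as `rec->byteorder` keep compiling unchanged; the cost is that code must never rely on both groups being valid at once.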

@ -42,7 +42,8 @@ extern "C" {
#define FM_EREPORT_ZFS_DATA "data" #define FM_EREPORT_ZFS_DATA "data"
#define FM_EREPORT_ZFS_DELAY "delay" #define FM_EREPORT_ZFS_DELAY "delay"
#define FM_EREPORT_ZFS_DEADMAN "deadman" #define FM_EREPORT_ZFS_DEADMAN "deadman"
#define FM_EREPORT_ZFS_DIO_VERIFY "dio_verify" #define FM_EREPORT_ZFS_DIO_VERIFY_WR "dio_verify_wr"
#define FM_EREPORT_ZFS_DIO_VERIFY_RD "dio_verify_rd"
#define FM_EREPORT_ZFS_POOL "zpool" #define FM_EREPORT_ZFS_POOL "zpool"
#define FM_EREPORT_ZFS_DEVICE_UNKNOWN "vdev.unknown" #define FM_EREPORT_ZFS_DEVICE_UNKNOWN "vdev.unknown"
#define FM_EREPORT_ZFS_DEVICE_OPEN_FAILED "vdev.open_failed" #define FM_EREPORT_ZFS_DEVICE_OPEN_FAILED "vdev.open_failed"

@ -57,7 +57,7 @@ void vdev_raidz_reconstruct(struct raidz_map *, const int *, int);
void vdev_raidz_child_done(zio_t *); void vdev_raidz_child_done(zio_t *);
void vdev_raidz_io_done(zio_t *); void vdev_raidz_io_done(zio_t *);
void vdev_raidz_checksum_error(zio_t *, struct raidz_col *, abd_t *); void vdev_raidz_checksum_error(zio_t *, struct raidz_col *, abd_t *);
struct raidz_row *vdev_raidz_row_alloc(int); struct raidz_row *vdev_raidz_row_alloc(int, zio_t *);
void vdev_raidz_reflow_copy_scratch(spa_t *); void vdev_raidz_reflow_copy_scratch(spa_t *);
void raidz_dtl_reassessed(vdev_t *); void raidz_dtl_reassessed(vdev_t *);

@ -208,25 +208,25 @@ typedef uint64_t zio_flag_t;
#define ZIO_FLAG_PROBE (1ULL << 16) #define ZIO_FLAG_PROBE (1ULL << 16)
#define ZIO_FLAG_TRYHARD (1ULL << 17) #define ZIO_FLAG_TRYHARD (1ULL << 17)
#define ZIO_FLAG_OPTIONAL (1ULL << 18) #define ZIO_FLAG_OPTIONAL (1ULL << 18)
#define ZIO_FLAG_DIO_READ (1ULL << 19)
#define ZIO_FLAG_VDEV_INHERIT (ZIO_FLAG_DONT_QUEUE - 1) #define ZIO_FLAG_VDEV_INHERIT (ZIO_FLAG_DONT_QUEUE - 1)
/* /*
* Flags not inherited by any children. * Flags not inherited by any children.
*/ */
#define ZIO_FLAG_DONT_QUEUE (1ULL << 19) /* must be first for INHERIT */ #define ZIO_FLAG_DONT_QUEUE (1ULL << 20) /* must be first for INHERIT */
#define ZIO_FLAG_DONT_PROPAGATE (1ULL << 20) #define ZIO_FLAG_DONT_PROPAGATE (1ULL << 21)
#define ZIO_FLAG_IO_BYPASS (1ULL << 21) #define ZIO_FLAG_IO_BYPASS (1ULL << 22)
#define ZIO_FLAG_IO_REWRITE (1ULL << 22) #define ZIO_FLAG_IO_REWRITE (1ULL << 23)
#define ZIO_FLAG_RAW_COMPRESS (1ULL << 23) #define ZIO_FLAG_RAW_COMPRESS (1ULL << 24)
#define ZIO_FLAG_RAW_ENCRYPT (1ULL << 24) #define ZIO_FLAG_RAW_ENCRYPT (1ULL << 25)
#define ZIO_FLAG_GANG_CHILD (1ULL << 25) #define ZIO_FLAG_GANG_CHILD (1ULL << 26)
#define ZIO_FLAG_DDT_CHILD (1ULL << 26) #define ZIO_FLAG_DDT_CHILD (1ULL << 27)
#define ZIO_FLAG_GODFATHER (1ULL << 27) #define ZIO_FLAG_GODFATHER (1ULL << 28)
#define ZIO_FLAG_NOPWRITE (1ULL << 28) #define ZIO_FLAG_NOPWRITE (1ULL << 29)
#define ZIO_FLAG_REEXECUTED (1ULL << 29) #define ZIO_FLAG_REEXECUTED (1ULL << 30)
#define ZIO_FLAG_DELEGATED (1ULL << 30) #define ZIO_FLAG_DELEGATED (1ULL << 31)
#define ZIO_FLAG_DIO_CHKSUM_ERR (1ULL << 31) #define ZIO_FLAG_DIO_CHKSUM_ERR (1ULL << 32)
#define ZIO_ALLOCATOR_NONE (-1) #define ZIO_ALLOCATOR_NONE (-1)
#define ZIO_HAS_ALLOCATOR(zio) ((zio)->io_allocator != ZIO_ALLOCATOR_NONE) #define ZIO_HAS_ALLOCATOR(zio) ((zio)->io_allocator != ZIO_ALLOCATOR_NONE)
@ -647,6 +647,7 @@ extern void zio_vdev_io_redone(zio_t *zio);
extern void zio_change_priority(zio_t *pio, zio_priority_t priority); extern void zio_change_priority(zio_t *pio, zio_priority_t priority);
extern void zio_checksum_verified(zio_t *zio); extern void zio_checksum_verified(zio_t *zio);
extern void zio_dio_chksum_verify_error_report(zio_t *zio);
extern int zio_worst_error(int e1, int e2); extern int zio_worst_error(int e1, int e2);
extern enum zio_checksum zio_checksum_select(enum zio_checksum child, extern enum zio_checksum zio_checksum_select(enum zio_checksum child,
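
In the flag hunk above, ZIO_FLAG_VDEV_INHERIT is defined as ZIO_FLAG_DONT_QUEUE - 1, that is, a mask of every bit below ZIO_FLAG_DONT_QUEUE. Adding ZIO_FLAG_DIO_READ at bit 19 therefore pushes ZIO_FLAG_DONT_QUEUE and every later flag up by one bit, and ZIO_FLAG_DIO_CHKSUM_ERR lands on bit 32. The sketch below reproduces that arithmetic with hypothetical EX_ names; the helper ex_vdev_inherit is an illustration of applying the mask and is not a function from the ZFS source.

```c
#include <stdint.h>

typedef uint64_t ex_flag_t;

#define	EX_FLAG_OPTIONAL	(1ULL << 18)
#define	EX_FLAG_DIO_READ	(1ULL << 19)	/* last inheritable bit */

/* Mask of all bits below DONT_QUEUE; expanded at the point of use. */
#define	EX_FLAG_VDEV_INHERIT	(EX_FLAG_DONT_QUEUE - 1)

#define	EX_FLAG_DONT_QUEUE	(1ULL << 20)	/* first non-inherited bit */
#define	EX_FLAG_DIO_CHKSUM_ERR	(1ULL << 32)	/* requires a 64-bit type */

/* Keep only the flags a child I/O would inherit under this mask. */
static inline ex_flag_t
ex_vdev_inherit(ex_flag_t parent_flags)
{
	return (parent_flags & EX_FLAG_VDEV_INHERIT);
}
```

Deriving the mask from the first non-inherited flag means a new inheritable flag only needs the boundary to move; the mask itself never has to be edited by hand.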

@ -25,19 +25,32 @@
#include <sys/backtrace.h> #include <sys/backtrace.h>
#include <sys/types.h> #include <sys/types.h>
#include <sys/debug.h>
#include <unistd.h> #include <unistd.h>
/* /*
* libspl_backtrace() must be safe to call from inside a signal hander. This * Output helpers. libspl_backtrace() must not block, must be thread-safe and
* mostly means it must not allocate, and so we can't use things like printf. * must be safe to call from a signal handler. At least, that means not having
* printf, so we end up having to call write() directly on the fd. That's
* awkward, as we always have to pass through a length, and some systems will
* complain if we don't consume the return. So we have some macros to make
* things a little more palatable.
*/ */
#define spl_bt_write_n(fd, s, n) \
do { ssize_t r __maybe_unused = write(fd, s, n); } while (0)
#define spl_bt_write(fd, s) spl_bt_write_n(fd, s, sizeof (s)-1)
#if defined(HAVE_LIBUNWIND) #if defined(HAVE_LIBUNWIND)
#define UNW_LOCAL_ONLY #define UNW_LOCAL_ONLY
#include <libunwind.h> #include <libunwind.h>
/*
* Convert `v` to ASCII hex characters. The bottom `n` nybbles (4-bits ie one
* hex digit) will be written, up to `buflen`. The buffer will not be
* null-terminated. Returns the number of digits written.
*/
static size_t static size_t
libspl_u64_to_hex_str(uint64_t v, size_t digits, char *buf, size_t buflen) spl_bt_u64_to_hex_str(uint64_t v, size_t n, char *buf, size_t buflen)
{ {
static const char hexdigits[] = { static const char hexdigits[] = {
'0', '1', '2', '3', '4', '5', '6', '7', '0', '1', '2', '3', '4', '5', '6', '7',
@ -45,10 +58,10 @@ libspl_u64_to_hex_str(uint64_t v, size_t digits, char *buf, size_t buflen)
}; };
size_t pos = 0; size_t pos = 0;
boolean_t want = (digits == 0); boolean_t want = (n == 0);
for (int i = 15; i >= 0; i--) { for (int i = 15; i >= 0; i--) {
const uint64_t d = v >> (i * 4) & 0xf; const uint64_t d = v >> (i * 4) & 0xf;
if (!want && (d != 0 || digits > i)) if (!want && (d != 0 || n > i))
want = B_TRUE; want = B_TRUE;
if (want) { if (want) {
buf[pos++] = hexdigits[d]; buf[pos++] = hexdigits[d];
@ -62,40 +75,181 @@ libspl_u64_to_hex_str(uint64_t v, size_t digits, char *buf, size_t buflen)
void void
libspl_backtrace(int fd) libspl_backtrace(int fd)
{ {
ssize_t ret __attribute__((unused));
unw_context_t uc; unw_context_t uc;
unw_cursor_t cp; unw_cursor_t cp;
unw_word_t loc; unw_word_t v;
char buf[128]; char buf[128];
size_t n; size_t n;
int err;
ret = write(fd, "Call trace:\n", 12); /* Snapshot the current frame and state. */
unw_getcontext(&uc); unw_getcontext(&uc);
/*
* TODO: walk back to the frame that tripped the assertion / the place
* where the signal was received.
*/
/*
* Register dump. We're going to loop over all the registers in the
* top frame, and show them, with names, in a nice three-column
* layout, which keeps us within 80 columns.
*/
spl_bt_write(fd, "Registers:\n");
/* Initialise a frame cursor, starting at the current frame */
unw_init_local(&cp, &uc); unw_init_local(&cp, &uc);
while (unw_step(&cp) > 0) {
unw_get_reg(&cp, UNW_REG_IP, &loc); /*
ret = write(fd, " [0x", 5); * libunwind's list of possible registers for this architecture is an
n = libspl_u64_to_hex_str(loc, 10, buf, sizeof (buf)); * enum, unw_regnum_t. UNW_TDEP_LAST_REG is the highest-numbered
ret = write(fd, buf, n); * register in that list, however, not all register numbers in this
ret = write(fd, "] ", 2); * range are defined by the architecture, and not all defined registers
unw_get_proc_name(&cp, buf, sizeof (buf), &loc); * will be present on every implementation of that architecture.
for (n = 0; n < sizeof (buf) && buf[n] != '\0'; n++) {} * Moreover, libunwind provides nice names for most, but not all
ret = write(fd, buf, n); * registers, but these are hardcoded; a name being available does not
ret = write(fd, "+0x", 3); * mean that register is available.
n = libspl_u64_to_hex_str(loc, 2, buf, sizeof (buf)); *
ret = write(fd, buf, n); * So, we have to pull this all together here. We try to get the value
#ifdef HAVE_LIBUNWIND_ELF * of every possible register. If we get a value for it, then the
ret = write(fd, " (in ", 5); * register must exist, and so we get its name. If libunwind has no
unw_get_elf_filename(&cp, buf, sizeof (buf), &loc); * name for it, we synthesize something. These cases should be rare,
for (n = 0; n < sizeof (buf) && buf[n] != '\0'; n++) {} * and they're usually for uninteresting or niche registers, so it
ret = write(fd, buf, n); * shouldn't really matter. We can see the value, and that's the main
ret = write(fd, " +0x", 4); * thing.
n = libspl_u64_to_hex_str(loc, 2, buf, sizeof (buf)); */
ret = write(fd, buf, n); uint_t cols = 0;
ret = write(fd, ")", 1); for (uint_t regnum = 0; regnum <= UNW_TDEP_LAST_REG; regnum++) {
#endif /*
ret = write(fd, "\n", 1); * Get the value. Any error probably means the register
* doesn't exist, and we skip it.
*/
if (unw_get_reg(&cp, regnum, &v) < 0)
continue;
/*
* Register name. If libunwind doesn't have a name for it,
* it will return "???". As a shortcut, we just treat '?'
* as an alternate end-of-string character.
*/
const char *name = unw_regname(regnum);
for (n = 0; name[n] != '\0' && name[n] != '?'; n++) {}
if (n == 0) {
/*
* No valid name, so make one of the form "?xx", where
* "xx" is the two-char hex of libunwind's register
* number.
*/
buf[0] = '?';
n = spl_bt_u64_to_hex_str(regnum, 2,
&buf[1], sizeof (buf)-1) + 1;
name = buf;
}
/*
* Two spaces of padding before each column, plus extra
* spaces to align register names shorter than three chars.
*/
spl_bt_write_n(fd, "     ", 5-MIN(n, 3));
/* Register name and column punctuation */
spl_bt_write_n(fd, name, n);
spl_bt_write(fd, ": 0x");
/*
* Convert register value (from unw_get_reg()) to hex. We're
* assuming that all registers are 64-bits wide, which is
* probably fine for any general-purpose registers on any
* machine currently in use. A more generic way would be to
* look at the width of unw_word_t, but that would also
* complicate the column code a bit. This is fine.
*/
n = spl_bt_u64_to_hex_str(v, 16, buf, sizeof (buf));
spl_bt_write_n(fd, buf, n);
/* Every third column, emit a newline */
if (!(++cols % 3))
spl_bt_write(fd, "\n");
} }
/* If we finished before the third column, emit a newline. */
if (cols % 3)
spl_bt_write(fd, "\n");
/* Now the main event, the backtrace. */
spl_bt_write(fd, "Call trace:\n");
/* Reset the cursor to the top again. */
unw_init_local(&cp, &uc);
do {
/*
* Getting the IP should never fail; libunwind handles it
* specially, because it's used a lot internally. Still, no
* point being silly about it, as the last thing we want is
* our crash handler to crash. So if it ever does fail, we'll
* show an error line, but keep going to the next frame.
*/
if (unw_get_reg(&cp, UNW_REG_IP, &v) < 0) {
spl_bt_write(fd, " [couldn't get IP register; "
"corrupt frame?]");
continue;
}
/* IP & punctuation */
n = spl_bt_u64_to_hex_str(v, 16, buf, sizeof (buf));
spl_bt_write(fd, " [0x");
spl_bt_write_n(fd, buf, n);
spl_bt_write(fd, "] ");
/*
* Function ("procedure") name for the current frame. `v`
* receives the offset from the named function to the IP, which
* we show as a "+offset" suffix.
*
* If libunwind can't determine the name, we just show "???"
* instead. We've already displayed the IP above; that will
* have to do.
*
* unw_get_proc_name() will return ENOMEM if the buffer is too
* small, truncating the name instead. So we treat that as a
* success and use whatever is in the buffer.
*/
err = unw_get_proc_name(&cp, buf, sizeof (buf), &v);
if (err == 0 || err == -UNW_ENOMEM) {
for (n = 0; n < sizeof (buf) && buf[n] != '\0'; n++) {}
spl_bt_write_n(fd, buf, n);
/* Offset from proc name */
spl_bt_write(fd, "+0x");
n = spl_bt_u64_to_hex_str(v, 2, buf, sizeof (buf));
spl_bt_write_n(fd, buf, n);
} else
spl_bt_write(fd, "???");
#ifdef HAVE_LIBUNWIND_ELF
/*
* Newer libunwind has unw_get_elf_filename(), which gets
* the name of the ELF object that the frame was executing in.
* Like `unw_get_proc_name()`, `v` receives the offset within
* the file, and UNW_ENOMEM indicates that a truncated filename
* was left in the buffer.
*/
err = unw_get_elf_filename(&cp, buf, sizeof (buf), &v);
if (err == 0 || err == -UNW_ENOMEM) {
for (n = 0; n < sizeof (buf) && buf[n] != '\0'; n++) {}
spl_bt_write(fd, " (in ");
spl_bt_write_n(fd, buf, n);
/* Offset within file */
spl_bt_write(fd, " +0x");
n = spl_bt_u64_to_hex_str(v, 2, buf, sizeof (buf));
spl_bt_write_n(fd, buf, n);
spl_bt_write(fd, ")");
}
#endif
spl_bt_write(fd, "\n");
} while (unw_step(&cp) > 0);
} }
#elif defined(HAVE_BACKTRACE) #elif defined(HAVE_BACKTRACE)
#include <execinfo.h> #include <execinfo.h>
@ -103,15 +257,12 @@ libspl_backtrace(int fd)
void void
libspl_backtrace(int fd) libspl_backtrace(int fd)
{ {
ssize_t ret __attribute__((unused));
void *btptrs[64]; void *btptrs[64];
size_t nptrs = backtrace(btptrs, 64); size_t nptrs = backtrace(btptrs, 64);
ret = write(fd, "Call trace:\n", 12); spl_bt_write(fd, "Call trace:\n");
backtrace_symbols_fd(btptrs, nptrs, fd); backtrace_symbols_fd(btptrs, nptrs, fd);
} }
#else #else
#include <sys/debug.h>
void void
libspl_backtrace(int fd __maybe_unused) libspl_backtrace(int fd __maybe_unused)
{ {
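
Reviewer note: the output helpers in this file rely on formatting numbers without printf and on calling write() directly. A minimal userspace sketch of that pattern follows, assuming nothing beyond POSIX write(). The names bt_write_n, bt_write, u64_to_hex and bt_emit_addr are hypothetical and are not the functions in the patch; in particular, this u64_to_hex treats a minimum width of 0 as 1, which differs from the patch's handling of n == 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Write n bytes and deliberately discard the result: a crash handler
 * has no useful way to recover from a failed write.
 */
#define	bt_write_n(fd, s, n) \
	do { ssize_t r_ = write((fd), (s), (n)); (void) r_; } while (0)

/* For string literals only. sizeof counts the terminating NUL, so drop it. */
#define	bt_write(fd, s)	bt_write_n((fd), (s), sizeof (s) - 1)

/*
 * Format v as lowercase hex without allocating. At least min_digits
 * digits are produced (zero-padded, at most 16). The buffer is not
 * NUL-terminated. Returns the number of characters written.
 */
static size_t
u64_to_hex(uint64_t v, size_t min_digits, char *buf, size_t buflen)
{
	static const char hexdigits[] = "0123456789abcdef";
	size_t pos = 0;
	int started = 0;

	if (min_digits == 0)
		min_digits = 1;	/* always show at least one digit */

	for (int i = 15; i >= 0 && pos < buflen; i--) {
		unsigned int d = (unsigned int)((v >> (i * 4)) & 0xf);
		if (!started && (d != 0 || (size_t)i < min_digits))
			started = 1;
		if (started)
			buf[pos++] = hexdigits[d];
	}
	return (pos);
}

/* Emit one "[0x<16 hex digits>]" line using only write(). */
static void
bt_emit_addr(int fd, uint64_t addr)
{
	char buf[16];
	size_t n = u64_to_hex(addr, 16, buf, sizeof (buf));

	bt_write(fd, "[0x");
	bt_write_n(fd, buf, n);
	bt_write(fd, "]\n");
}
```

Calling bt_emit_addr(STDERR_FILENO, addr) from a handler touches only stack memory and write(), which is the property the comment at the top of the file asks for.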

View File

@ -92,20 +92,20 @@ zfs_dio_page_aligned(void *buf)
static inline boolean_t static inline boolean_t
zfs_dio_offset_aligned(uint64_t offset, uint64_t blksz) zfs_dio_offset_aligned(uint64_t offset, uint64_t blksz)
{ {
return (IS_P2ALIGNED(offset, blksz)); return ((IS_P2ALIGNED(offset, blksz)) ? B_TRUE : B_FALSE);
} }
static inline boolean_t static inline boolean_t
zfs_dio_size_aligned(uint64_t size, uint64_t blksz) zfs_dio_size_aligned(uint64_t size, uint64_t blksz)
{ {
return ((size % blksz) == 0); return (((size % blksz) == 0) ? B_TRUE : B_FALSE);
} }
static inline boolean_t static inline boolean_t
zfs_dio_aligned(uint64_t offset, uint64_t size, uint64_t blksz) zfs_dio_aligned(uint64_t offset, uint64_t size, uint64_t blksz)
{ {
return (zfs_dio_offset_aligned(offset, blksz) && return ((zfs_dio_offset_aligned(offset, blksz) &&
zfs_dio_size_aligned(size, blksz)); zfs_dio_size_aligned(size, blksz)) ? B_TRUE : B_FALSE);
} }
static inline void static inline void
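
Reviewer note: the three helpers above now return an explicit B_TRUE or B_FALSE instead of the raw result of a comparison. The sketch below shows the same checks in isolation. It assumes local stand-ins (sk_boolean_t, SK_IS_P2ALIGNED) rather than the real ZFS definitions, and the reason given in the comment for the explicit ternary is a guess, not something stated in the diff.

```c
#include <stdint.h>

/* Local stand-in for an enum-based boolean type (hypothetical name). */
typedef enum { SK_FALSE = 0, SK_TRUE = 1 } sk_boolean_t;

/* True when v is a multiple of a. Only valid when a is a power of two. */
#define	SK_IS_P2ALIGNED(v, a) \
	((((uint64_t)(v)) & ((uint64_t)(a) - 1)) == 0)

static inline sk_boolean_t
sk_offset_aligned(uint64_t offset, uint64_t blksz)
{
	/*
	 * The comparison yields an int. Mapping it through ?: keeps the
	 * return value within the enum's named constants (assumed motive).
	 */
	return (SK_IS_P2ALIGNED(offset, blksz) ? SK_TRUE : SK_FALSE);
}

static inline sk_boolean_t
sk_size_aligned(uint64_t size, uint64_t blksz)
{
	/* Modulo also works for block sizes that are not powers of two. */
	return (((size % blksz) == 0) ? SK_TRUE : SK_FALSE);
}

static inline sk_boolean_t
sk_dio_aligned(uint64_t offset, uint64_t size, uint64_t blksz)
{
	return ((sk_offset_aligned(offset, blksz) &&
	    sk_size_aligned(size, blksz)) ? SK_TRUE : SK_FALSE);
}
```

With a 4096-byte block size, an offset of 8192 with size 4096 passes, while an offset of 512 or a size of 1000 does not.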

View File

@ -70,7 +70,7 @@ if BUILD_FREEBSD
libzfs_la_LIBADD += -lutil -lgeom libzfs_la_LIBADD += -lutil -lgeom
endif endif
libzfs_la_LDFLAGS += -version-info 5:0:1 libzfs_la_LDFLAGS += -version-info 6:0:0
pkgconfig_DATA += %D%/libzfs.pc pkgconfig_DATA += %D%/libzfs.pc
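
Reviewer note: the change from 5:0:1 to 6:0:0 lines up with the soname change in the ABI file (libzfs.so.4 to libzfs.so.6). On Linux/ELF targets, libtool maps -version-info current:revision:age to a file suffix of (current - age).age.revision, and the soname carries only the first number. The helper below is a hypothetical illustration of that arithmetic, not part of the build system, and other platforms use different mappings.

```c
#include <stdio.h>

/*
 * Format the Linux/ELF library suffix that libtool derives from
 * -version-info cur:rev:age. Returns the snprintf() result, or -1 if
 * age is larger than cur (which libtool rejects).
 */
static int
sk_libtool_linux_suffix(unsigned int cur, unsigned int rev,
    unsigned int age, char *buf, size_t buflen)
{
	if (age > cur)
		return (-1);
	return (snprintf(buf, buflen, "so.%u.%u.%u", cur - age, age, rev));
}
```

Under this mapping 5:0:1 gives so.4.1.0 and 6:0:0 gives so.6.0.0, so the major number jumps by two even though current only moved by one.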

View File

@ -1,4 +1,4 @@
<abi-corpus version='2.0' architecture='elf-amd-x86_64' soname='libzfs.so.4'> <abi-corpus version='2.0' architecture='elf-amd-x86_64' soname='libzfs.so.6'>
<elf-needed> <elf-needed>
<dependency name='libzfs_core.so.3'/> <dependency name='libzfs_core.so.3'/>
<dependency name='libnvpair.so.3'/> <dependency name='libnvpair.so.3'/>

View File

@ -2796,7 +2796,7 @@ zpool_scan(zpool_handle_t *zhp, pool_scan_func_t func, pool_scrub_cmd_t cmd)
} }
/* /*
* With EBUSY, five cases are possible: * With EBUSY, six cases are possible:
* *
* Current state Requested * Current state Requested
* 1. Normal Scrub Running Normal Scrub or Error Scrub * 1. Normal Scrub Running Normal Scrub or Error Scrub

View File

@ -212,7 +212,7 @@ if BUILD_FREEBSD
libzpool_la_LIBADD += -lgeom libzpool_la_LIBADD += -lgeom
endif endif
libzpool_la_LDFLAGS += -version-info 5:0:0 libzpool_la_LDFLAGS += -version-info 6:0:0
if TARGET_CPU_POWERPC if TARGET_CPU_POWERPC
module/zfs/libzpool_la-vdev_raidz_math_powerpc_altivec.$(OBJEXT) : CFLAGS += -maltivec module/zfs/libzpool_la-vdev_raidz_math_powerpc_altivec.$(OBJEXT) : CFLAGS += -maltivec

View File

@ -35,9 +35,25 @@ typedef struct zfs_dbgmsg {
static list_t zfs_dbgmsgs; static list_t zfs_dbgmsgs;
static kmutex_t zfs_dbgmsgs_lock; static kmutex_t zfs_dbgmsgs_lock;
static uint_t zfs_dbgmsg_size = 0;
static uint_t zfs_dbgmsg_maxsize = 4<<20; /* 4MB */
int zfs_dbgmsg_enable = B_TRUE; int zfs_dbgmsg_enable = B_TRUE;
static void
zfs_dbgmsg_purge(uint_t max_size)
{
while (zfs_dbgmsg_size > max_size) {
zfs_dbgmsg_t *zdm = list_remove_head(&zfs_dbgmsgs);
if (zdm == NULL)
return;
uint_t size = zdm->zdm_size;
kmem_free(zdm, size);
zfs_dbgmsg_size -= size;
}
}
void void
zfs_dbgmsg_init(void) zfs_dbgmsg_init(void)
{ {
@ -74,6 +90,8 @@ __zfs_dbgmsg(char *buf)
mutex_enter(&zfs_dbgmsgs_lock); mutex_enter(&zfs_dbgmsgs_lock);
list_insert_tail(&zfs_dbgmsgs, zdm); list_insert_tail(&zfs_dbgmsgs, zdm);
zfs_dbgmsg_size += size;
zfs_dbgmsg_purge(zfs_dbgmsg_maxsize);
mutex_exit(&zfs_dbgmsgs_lock); mutex_exit(&zfs_dbgmsgs_lock);
} }
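
Reviewer note: the purge logic above caps the debug message list by total bytes, dropping the oldest entries first. A minimal userspace sketch of the same idea follows, using malloc/free and a hand-rolled singly linked queue instead of list_t and kmem. All names (sk_msg_t, sk_msglog_t, sk_msglog_add, sk_msglog_purge) are hypothetical, and there is no locking here, whereas the patch runs under zfs_dbgmsgs_lock.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct sk_msg {
	struct sk_msg *next;
	size_t size;		/* whole allocation, header included */
	char text[];		/* NUL-terminated message */
} sk_msg_t;

typedef struct {
	sk_msg_t *head;		/* oldest message */
	sk_msg_t *tail;		/* newest message */
	size_t total;		/* sum of size over queued messages */
	size_t max;		/* byte budget enforced on every add */
} sk_msglog_t;

/* Drop the oldest messages until the queue fits within max bytes. */
static void
sk_msglog_purge(sk_msglog_t *l, size_t max)
{
	while (l->total > max) {
		sk_msg_t *m = l->head;
		if (m == NULL)
			return;
		l->head = m->next;
		if (l->head == NULL)
			l->tail = NULL;
		l->total -= m->size;
		free(m);
	}
}

/* Append a copy of s, then enforce the budget. Returns 0 or -1. */
static int
sk_msglog_add(sk_msglog_t *l, const char *s)
{
	size_t len = strlen(s) + 1;
	size_t size = sizeof (sk_msg_t) + len;
	sk_msg_t *m = malloc(size);

	if (m == NULL)
		return (-1);
	m->next = NULL;
	m->size = size;
	memcpy(m->text, s, len);

	if (l->tail != NULL)
		l->tail->next = m;
	else
		l->head = m;
	l->tail = m;
	l->total += size;

	sk_msglog_purge(l, l->max);
	return (0);
}
```

Because the purge runs after the insert, a single message larger than the budget is dropped immediately. Calling sk_msglog_purge(&log, 0) empties the queue.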

View File

@ -18,7 +18,7 @@
.\" .\"
.\" Copyright (c) 2024, Klara, Inc. .\" Copyright (c) 2024, Klara, Inc.
.\" .\"
.Dd October 2, 2024 .Dd November 1, 2024
.Dt ZFS 4 .Dt ZFS 4
.Os .Os
. .
@ -436,7 +436,7 @@ write.
It can also help to identify if reported checksum errors are tied to Direct I/O It can also help to identify if reported checksum errors are tied to Direct I/O
writes. writes.
Each verify error causes a Each verify error causes a
.Sy dio_verify .Sy dio_verify_wr
zevent. zevent.
Direct Write I/O checksum verify errors can be seen with Direct Write I/O checksum verify errors can be seen with
.Nm zpool Cm status Fl d . .Nm zpool Cm status Fl d .
@ -1333,9 +1333,10 @@ results in vector instructions
from the respective CPU instruction set being used. from the respective CPU instruction set being used.
. .
.It Sy zfs_bclone_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int .It Sy zfs_bclone_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Enable the experimental block cloning feature. Enables access to the block cloning feature.
If this setting is 0, then even if feature@block_cloning is enabled, If this setting is 0, then even if feature@block_cloning is enabled,
attempts to clone blocks will act as though the feature is disabled. using functions and system calls that attempt to clone blocks will act as
though the feature is disabled.
. .
.It Sy zfs_bclone_wait_dirty Ns = Ns Sy 0 Ns | Ns 1 Pq int .It Sy zfs_bclone_wait_dirty Ns = Ns Sy 0 Ns | Ns 1 Pq int
When set to 1 the FICLONE and FICLONERANGE ioctls wait for dirty data to be When set to 1 the FICLONE and FICLONERANGE ioctls wait for dirty data to be

View File

@ -92,6 +92,11 @@ before a generic mapping for the same slot.
In this way a custom mapping may be applied to a particular channel In this way a custom mapping may be applied to a particular channel
and a default mapping applied to the others. and a default mapping applied to the others.
. .
.It Sy zpad_slot Ar digits
Pad slot numbers with zeros to make them
.Ar digits
long, which can help to make disk names a consistent length and easier to sort.
.
.It Sy multipath Sy yes Ns | Ns Sy no .It Sy multipath Sy yes Ns | Ns Sy no
Specifies whether Specifies whether
.Xr vdev_id 8 .Xr vdev_id 8
@ -122,7 +127,7 @@ device is connected to.
The default is The default is
.Sy 4 . .Sy 4 .
. .
.It Sy slot Sy bay Ns | Ns Sy phy Ns | Ns Sy port Ns | Ns Sy id Ns | Ns Sy lun Ns | Ns Sy ses .It Sy slot Sy bay Ns | Ns Sy phy Ns | Ns Sy port Ns | Ns Sy id Ns | Ns Sy lun Ns | Ns Sy bay_lun Ns | Ns Sy ses
Specifies from which element of a SAS identifier the slot number is Specifies from which element of a SAS identifier the slot number is
taken. taken.
The default is The default is
@ -138,6 +143,9 @@ use the SAS port as the slot number.
use the scsi id as the slot number. use the scsi id as the slot number.
.It Sy lun .It Sy lun
use the scsi lun as the slot number. use the scsi lun as the slot number.
.It Sy bay_lun
read the slot number from the bay identifier and append the lun number.
Useful for multi-lun multi-actuator hard drives.
.It Sy ses .It Sy ses
use the SCSI Enclosure Services (SES) enclosure device slot number, use the SCSI Enclosure Services (SES) enclosure device slot number,
as reported by as reported by

View File

@ -14,7 +14,7 @@
.\" Copyright (c) 2017 Lawrence Livermore National Security, LLC. .\" Copyright (c) 2017 Lawrence Livermore National Security, LLC.
.\" Copyright (c) 2017 Intel Corporation. .\" Copyright (c) 2017 Intel Corporation.
.\" .\"
.Dd November 18, 2023 .Dd October 27, 2024
.Dt ZDB 8 .Dt ZDB 8
.Os .Os
. .
@ -408,6 +408,8 @@ blocks cloned, the space saving as a result of cloning, and the saving ratio.
.It Fl TT .It Fl TT
Display the per-vdev BRT statistics, including total references. Display the per-vdev BRT statistics, including total references.
.It Fl TTT .It Fl TTT
Display histograms of per-vdev BRT refcounts.
.It Fl TTTT
Dump the contents of the block reference tables. Dump the contents of the block reference tables.
.It Fl u , -uberblock .It Fl u , -uberblock
Display the current uberblock. Display the current uberblock.

View File

@ -71,7 +71,7 @@ The following fields are displayed:
Used for scripting mode. Used for scripting mode.
Do not print headers and separate fields by a single tab instead of arbitrary Do not print headers and separate fields by a single tab instead of arbitrary
white space. white space.
.It Fl j Op Ar --json-int .It Fl j , -json Op Ar --json-int
Print the output in JSON format. Print the output in JSON format.
Specify Specify
.Sy --json-int .Sy --json-int

View File

@ -59,7 +59,7 @@
.Xc .Xc
Displays all ZFS file systems currently mounted. Displays all ZFS file systems currently mounted.
.Bl -tag -width "-j" .Bl -tag -width "-j"
.It Fl j .It Fl j , -json
Displays all mounted file systems in JSON format. Displays all mounted file systems in JSON format.
.El .El
.It Xo .It Xo

View File

@ -50,7 +50,7 @@ and any attempts to access or modify other pools will cause an error.
. .
.Sh OPTIONS .Sh OPTIONS
.Bl -tag -width "-t" .Bl -tag -width "-t"
.It Fl j .It Fl j , -json
Display channel program output in JSON format. Display channel program output in JSON format.
When this flag is specified and standard output is empty - When this flag is specified and standard output is empty -
channel program encountered an error. channel program encountered an error.

View File

@ -130,7 +130,7 @@ The value
can be used to display all properties that apply to the given dataset's type can be used to display all properties that apply to the given dataset's type
.Pq Sy filesystem , volume , snapshot , No or Sy bookmark . .Pq Sy filesystem , volume , snapshot , No or Sy bookmark .
.Bl -tag -width "-s source" .Bl -tag -width "-s source"
.It Fl j Op Ar --json-int .It Fl j , -json Op Ar --json-int
Display the output in JSON format. Display the output in JSON format.
Specify Specify
.Sy --json-int .Sy --json-int

View File

@ -98,7 +98,10 @@ This can be an indicator of problems with the underlying storage device.
The number of delay events is ratelimited by the The number of delay events is ratelimited by the
.Sy zfs_slow_io_events_per_second .Sy zfs_slow_io_events_per_second
module parameter. module parameter.
.It Sy dio_verify .It Sy dio_verify_rd
Issued when there was a checksum verify error after a Direct I/O read has been
issued.
.It Sy dio_verify_wr
Issued when there was a checksum verify error after a Direct I/O write has been Issued when there was a checksum verify error after a Direct I/O write has been
issued. issued.
This event can only take place if the module parameter This event can only take place if the module parameter

View File

@ -98,7 +98,7 @@ See the
.Xr zpoolprops 7 .Xr zpoolprops 7
manual page for more information on the available pool properties. manual page for more information on the available pool properties.
.Bl -tag -compact -offset Ds -width "-o field" .Bl -tag -compact -offset Ds -width "-o field"
.It Fl j Op Ar --json-int, --json-pool-key-guid .It Fl j , -json Op Ar --json-int, --json-pool-key-guid
Display the list of properties in JSON format. Display the list of properties in JSON format.
Specify Specify
.Sy --json-int .Sy --json-int
@ -157,7 +157,7 @@ See the
.Xr vdevprops 7 .Xr vdevprops 7
manual page for more information on the available pool properties. manual page for more information on the available pool properties.
.Bl -tag -compact -offset Ds -width "-o field" .Bl -tag -compact -offset Ds -width "-o field"
.It Fl j Op Ar --json-int .It Fl j , -json Op Ar --json-int
Display the list of properties in JSON format. Display the list of properties in JSON format.
Specify Specify
.Sy --json-int .Sy --json-int

View File

@ -59,7 +59,7 @@ is specified, the command exits after
.Ar count .Ar count
reports are printed. reports are printed.
.Bl -tag -width Ds .Bl -tag -width Ds
.It Fl j Op Ar --json-int, --json-pool-key-guid .It Fl j , -json Op Ar --json-int, --json-pool-key-guid
Display the list of pools in JSON format. Display the list of pools in JSON format.
Specify Specify
.Sy --json-int .Sy --json-int

View File

@ -70,7 +70,7 @@ See the
option of option of
.Nm zpool Cm iostat .Nm zpool Cm iostat
for complete details. for complete details.
.It Fl j Op Ar --json-int, --json-flat-vdevs, --json-pool-key-guid .It Fl j , -json Op Ar --json-int, --json-flat-vdevs, --json-pool-key-guid
Display the status for ZFS pools in JSON format. Display the status for ZFS pools in JSON format.
Specify Specify
.Sy --json-int .Sy --json-int
@ -82,14 +82,18 @@ Specify
.Sy --json-pool-key-guid .Sy --json-pool-key-guid
to set pool GUID as key for pool objects instead of pool names. to set pool GUID as key for pool objects instead of pool names.
.It Fl d .It Fl d
Display the number of Direct I/O write checksum verify errors that have occured Display the number of Direct I/O read/write checksum verify errors that have
on a top-level VDEV. occurred on a top-level VDEV.
See See
.Sx zfs_vdev_direct_write_verify .Sx zfs_vdev_direct_write_verify
in in
.Xr zfs 4 .Xr zfs 4
for details about the conditions that can cause Direct I/O write checksum for details about the conditions that can cause Direct I/O write checksum
verify failures to occur. verify failures to occur.
Direct I/O read checksum verify errors can also occur if the contents of the
buffer are being manipulated after the I/O has been issued and is in flight.
In the case of Direct I/O read checksum verify errors, the I/O will be reissued
through the ARC.
.It Fl D .It Fl D
Display a histogram of deduplication statistics, showing the allocated Display a histogram of deduplication statistics, showing the allocated
.Pq physically present on disk .Pq physically present on disk

View File

@ -620,9 +620,16 @@ abd_borrow_buf_copy(abd_t *abd, size_t n)
/* /*
* Return a borrowed raw buffer to an ABD. If the ABD is scattered, this will * Return a borrowed raw buffer to an ABD. If the ABD is scattered, this will
* no change the contents of the ABD and will ASSERT that you didn't modify * not change the contents of the ABD. If you want any changes you made to
* the buffer since it was borrowed. If you want any changes you made to buf to * buf to be copied back to abd, use abd_return_buf_copy() instead. If the
* be copied back to abd, use abd_return_buf_copy() instead. * ABD is not constructed from user pages from Direct I/O then an ASSERT
* checks to make sure the contents of the buffer have not changed since it was
* borrowed. We can not ASSERT the contents of the buffer have not changed if
* it is composed of user pages. While Direct I/O write pages are placed under
* write protection and can not be changed, this is not the case for Direct I/O
* reads. The pages of a Direct I/O read could be manipulated at any time.
* Checksum verifications in the ZIO pipeline check for this issue and handle
* it by returning an error on checksum verification failure.
*/ */
void void
abd_return_buf(abd_t *abd, void *buf, size_t n) abd_return_buf(abd_t *abd, void *buf, size_t n)
@ -632,8 +639,34 @@ abd_return_buf(abd_t *abd, void *buf, size_t n)
#ifdef ZFS_DEBUG #ifdef ZFS_DEBUG
(void) zfs_refcount_remove_many(&abd->abd_children, n, buf); (void) zfs_refcount_remove_many(&abd->abd_children, n, buf);
#endif #endif
if (abd_is_linear(abd)) { if (abd_is_from_pages(abd)) {
if (!abd_is_linear_page(abd))
zio_buf_free(buf, n);
} else if (abd_is_linear(abd)) {
ASSERT3P(buf, ==, abd_to_buf(abd)); ASSERT3P(buf, ==, abd_to_buf(abd));
} else if (abd_is_gang(abd)) {
#ifdef ZFS_DEBUG
/*
* We have to be careful with gang ABD's that we do not ASSERT
* for any ABD's that contain user pages from Direct I/O. See
* the comment above about Direct I/O read buffers possibly
* being manipulated. In order to handle this, we just iterate
* through the gang ABD and only verify ABD's that are not from
* user pages.
*/
void *cmp_buf = buf;
for (abd_t *cabd = list_head(&ABD_GANG(abd).abd_gang_chain);
cabd != NULL;
cabd = list_next(&ABD_GANG(abd).abd_gang_chain, cabd)) {
if (!abd_is_from_pages(cabd)) {
ASSERT0(abd_cmp_buf(cabd, cmp_buf,
cabd->abd_size));
}
cmp_buf = (char *)cmp_buf + cabd->abd_size;
}
#endif
zio_buf_free(buf, n);
} else { } else {
ASSERT0(abd_cmp_buf(abd, buf, n)); ASSERT0(abd_cmp_buf(abd, buf, n));
zio_buf_free(buf, n); zio_buf_free(buf, n);

View File

@ -1686,11 +1686,10 @@ zio_do_crypt_data(boolean_t encrypt, zio_crypt_key_t *key,
freebsd_crypt_session_t *tmpl = NULL; freebsd_crypt_session_t *tmpl = NULL;
uint8_t *authbuf = NULL; uint8_t *authbuf = NULL;
memset(&puio_s, 0, sizeof (puio_s));
memset(&cuio_s, 0, sizeof (cuio_s));
zfs_uio_init(&puio, &puio_s); zfs_uio_init(&puio, &puio_s);
zfs_uio_init(&cuio, &cuio_s); zfs_uio_init(&cuio, &cuio_s);
memset(GET_UIO_STRUCT(&puio), 0, sizeof (struct uio));
memset(GET_UIO_STRUCT(&cuio), 0, sizeof (struct uio));
#ifdef FCRYPTO_DEBUG #ifdef FCRYPTO_DEBUG
printf("%s(%s, %p, %p, %d, %p, %p, %u, %s, %p, %p, %p)\n", printf("%s(%s, %p, %p, %d, %p, %p, %u, %s, %p, %p, %p)\n",

View File

@ -171,11 +171,11 @@ issig(void)
#if defined(HAVE_DEQUEUE_SIGNAL_4ARG) #if defined(HAVE_DEQUEUE_SIGNAL_4ARG)
enum pid_type __type; enum pid_type __type;
if (dequeue_signal(current, &set, &__info, &__type) != 0) { if (dequeue_signal(current, &set, &__info, &__type) != 0) {
#elif defined(HAVE_DEQUEUE_SIGNAL_3ARG_TASK) #elif defined(HAVE_DEQUEUE_SIGNAL_3ARG_TYPE)
if (dequeue_signal(current, &set, &__info) != 0) {
#else
enum pid_type __type; enum pid_type __type;
if (dequeue_signal(&set, &__info, &__type) != 0) { if (dequeue_signal(&set, &__info, &__type) != 0) {
#else
if (dequeue_signal(current, &set, &__info) != 0) {
#endif #endif
spin_unlock_irq(&current->sighand->siglock); spin_unlock_irq(&current->sighand->siglock);
kernel_signal_stop(); kernel_signal_stop();

View File

@ -701,6 +701,8 @@ abd_free_linear_page(abd_t *abd)
/* When backed by user page unmap it */ /* When backed by user page unmap it */
if (abd_is_from_pages(abd)) if (abd_is_from_pages(abd))
zfs_kunmap(sg_page(sg)); zfs_kunmap(sg_page(sg));
else
abd_update_scatter_stats(abd, ABDSTAT_DECR);
abd->abd_flags &= ~ABD_FLAG_LINEAR; abd->abd_flags &= ~ABD_FLAG_LINEAR;
abd->abd_flags &= ~ABD_FLAG_LINEAR_PAGE; abd->abd_flags &= ~ABD_FLAG_LINEAR_PAGE;
@ -1008,7 +1010,9 @@ abd_borrow_buf_copy(abd_t *abd, size_t n)
* borrowed. We can not ASSERT that the contents of the buffer have not changed * borrowed. We can not ASSERT that the contents of the buffer have not changed
* if it is composed of user pages because the pages can not be placed under * if it is composed of user pages because the pages can not be placed under
* write protection and the user could have possibly changed the contents in * write protection and the user could have possibly changed the contents in
* the pages at any time. * the pages at any time. This is also an issue for Direct I/O reads. Checksum
* verifications in the ZIO pipeline check for this issue and handle it by
* returning an error on checksum verification failure.
*/ */
void void
abd_return_buf(abd_t *abd, void *buf, size_t n) abd_return_buf(abd_t *abd, void *buf, size_t n)

View File

@ -801,24 +801,13 @@ vbio_completion(struct bio *bio)
bio_put(bio); bio_put(bio);
/* /*
* If we copied the ABD before issuing it, clean up and return the copy * We're likely in an interrupt context so we can't do ABD/memory work
* to the ADB, with changes if appropriate. * here; instead we stash vbio on the zio and take care of it in the
* done callback.
*/ */
if (vbio->vbio_abd != NULL) { ASSERT3P(zio->io_bio, ==, NULL);
void *buf = abd_to_buf(vbio->vbio_abd); zio->io_bio = vbio;
abd_free(vbio->vbio_abd);
vbio->vbio_abd = NULL;
if (zio->io_type == ZIO_TYPE_READ)
abd_return_buf_copy(zio->io_abd, buf, zio->io_size);
else
abd_return_buf(zio->io_abd, buf, zio->io_size);
}
/* Final cleanup */
kmem_free(vbio, sizeof (vbio_t));
/* All done, submit for processing */
zio_delay_interrupt(zio); zio_delay_interrupt(zio);
} }
@ -834,38 +823,61 @@ vbio_completion(struct bio *bio)
* NOTE: if you change this function, change the copy in * NOTE: if you change this function, change the copy in
* tests/zfs-tests/tests/functional/vdev_disk/page_alignment.c, and add test * tests/zfs-tests/tests/functional/vdev_disk/page_alignment.c, and add test
* data there to validate the change you're making. * data there to validate the change you're making.
*
*/ */
typedef struct { typedef struct {
uint_t bmask; size_t blocksize;
uint_t npages; int seen_first;
uint_t end; int seen_last;
} vdev_disk_check_pages_t; } vdev_disk_check_alignment_t;
static int static int
vdev_disk_check_pages_cb(struct page *page, size_t off, size_t len, void *priv) vdev_disk_check_alignment_cb(struct page *page, size_t off, size_t len,
void *priv)
{ {
(void) page; (void) page;
vdev_disk_check_pages_t *s = priv; vdev_disk_check_alignment_t *s = priv;
/* /*
* If we didn't finish on a block size boundary last time, then there * The cardinal rule: a single on-disk block must never cross a
* would be a gap if we tried to use this ABD as-is, so abort. * physical (order-0) page boundary, as the kernel expects to be able
* to split at both LBS and page boundaries.
*
* This implies various alignment rules for the blocks in this
* (possibly compound) page, which we can check for.
*/ */
if (s->end != 0)
/*
* If the previous page did not end on a page boundary, then we
* can't proceed without creating a hole.
*/
if (s->seen_last)
return (1);
/* This page must contain only whole LBS-sized blocks. */
if (!IS_P2ALIGNED(len, s->blocksize))
return (1); return (1);
/* /*
* Note if we're taking less than a full block, so we can check it * If this is not the first page in the ABD, then the data must start
* above on the next call. * on a page-aligned boundary (so the kernel can split on page
* boundaries without having to deal with a hole). If it is, then
* it can start on LBS-alignment.
*/ */
s->end = (off+len) & s->bmask; if (s->seen_first) {
if (!IS_P2ALIGNED(off, PAGESIZE))
return (1);
} else {
if (!IS_P2ALIGNED(off, s->blocksize))
return (1);
s->seen_first = 1;
}
/* All blocks after the first must start on a block size boundary. */ /*
if (s->npages != 0 && (off & s->bmask) != 0) * If this data does not end on a page-aligned boundary, then this
return (1); * must be the last page in the ABD, for the same reason.
*/
s->seen_last = !IS_P2ALIGNED(off+len, PAGESIZE);
s->npages++;
return (0); return (0);
} }
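
Reviewer note: the new callback is a small state machine over the (off, len) pieces of the ABD. A userspace sketch of the same rules, run over a plain array instead of abd_iterate_page_func(), may make them easier to test by hand. It assumes a fixed 4096-byte page size and a power-of-two block size. The names sk_seg_t and sk_segs_aligned are hypothetical and this is not the copy kept in page_alignment.c.

```c
#include <stddef.h>

#define	SK_PAGESIZE	4096u	/* assumed page size for the sketch */
#define	SK_ALIGNED(v, a)	((((size_t)(v)) & ((size_t)(a) - 1)) == 0)

/* One piece of data: offset within its page, and length in bytes. */
typedef struct {
	size_t off;
	size_t len;
} sk_seg_t;

/*
 * Returns 1 if the pieces could be submitted as-is, 0 if a copy into
 * an aligned buffer would be needed. blocksize must be a power of two.
 */
static int
sk_segs_aligned(const sk_seg_t *segs, size_t nsegs, size_t blocksize)
{
	int seen_first = 0;
	int seen_last = 0;

	for (size_t i = 0; i < nsegs; i++) {
		size_t off = segs[i].off;
		size_t len = segs[i].len;

		/* An earlier piece stopped short of a page boundary. */
		if (seen_last)
			return (0);

		/* Each piece must hold only whole blocks. */
		if (!SK_ALIGNED(len, blocksize))
			return (0);

		/*
		 * The first piece may start on any block boundary; later
		 * pieces must start on a page boundary.
		 */
		if (seen_first) {
			if (!SK_ALIGNED(off, SK_PAGESIZE))
				return (0);
		} else {
			if (!SK_ALIGNED(off, blocksize))
				return (0);
			seen_first = 1;
		}

		/* Only the final piece may end short of a page boundary. */
		seen_last = !SK_ALIGNED(off + len, SK_PAGESIZE);
	}
	return (1);
}
```

For a 512-byte block size, the sequence (512, 3584), (0, 4096), (0, 512) passes: it starts block-aligned, every later piece starts page-aligned, and only the last piece ends mid-page. Swapping the short piece to the front fails, because a piece would follow one that ended mid-page.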
@ -874,15 +886,14 @@ vdev_disk_check_pages_cb(struct page *page, size_t off, size_t len, void *priv)
* the number of pages, or 0 if it can't be submitted like this. * the number of pages, or 0 if it can't be submitted like this.
*/ */
static boolean_t static boolean_t
vdev_disk_check_pages(abd_t *abd, uint64_t size, struct block_device *bdev) vdev_disk_check_alignment(abd_t *abd, uint64_t size, struct block_device *bdev)
{ {
vdev_disk_check_pages_t s = { vdev_disk_check_alignment_t s = {
.bmask = bdev_logical_block_size(bdev)-1, .blocksize = bdev_logical_block_size(bdev),
.npages = 0,
.end = 0,
}; };
if (abd_iterate_page_func(abd, 0, size, vdev_disk_check_pages_cb, &s)) if (abd_iterate_page_func(abd, 0, size,
vdev_disk_check_alignment_cb, &s))
return (B_FALSE); return (B_FALSE);
return (B_TRUE); return (B_TRUE);
@ -916,37 +927,32 @@ vdev_disk_io_rw(zio_t *zio)
/* /*
* Check alignment of the incoming ABD. If any part of it would require * Check alignment of the incoming ABD. If any part of it would require
* submitting a page that is not aligned to the logical block size, * submitting a page that is not aligned to both the logical block size
* then we take a copy into a linear buffer and submit that instead. * and the page size, then we take a copy into a new memory region with
* This should be impossible on a 512b LBS, and fairly rare on 4K, * correct alignment. This should be impossible on a 512b LBS. On
* usually requiring abnormally-small data blocks (eg gang blocks) * larger blocks, this can happen at least when a small number of
* mixed into the same ABD as larger ones (eg aggregated). * blocks (usually 1) are allocated from a shared slab, or when
* abnormally-small data regions (eg gang headers) are mixed into the
* same ABD as larger allocations (eg aggregations).
*/ */
abd_t *abd = zio->io_abd; abd_t *abd = zio->io_abd;
if (!vdev_disk_check_pages(abd, zio->io_size, bdev)) { if (!vdev_disk_check_alignment(abd, zio->io_size, bdev)) {
void *buf; /* Allocate a new memory region with guaranteed alignment */
if (zio->io_type == ZIO_TYPE_READ) abd = abd_alloc_for_io(zio->io_size,
buf = abd_borrow_buf(zio->io_abd, zio->io_size); zio->io_abd->abd_flags & ABD_FLAG_META);
else
buf = abd_borrow_buf_copy(zio->io_abd, zio->io_size); /* If we're writing copy our data into it */
if (zio->io_type == ZIO_TYPE_WRITE)
abd_copy(abd, zio->io_abd, zio->io_size);
/* /*
* Wrap the copy in an abd_t, so we can use the same iterators * False here would mean the new allocation has an invalid
* to count and fill the vbio later. * alignment too, which would mean that abd_alloc() is not
*/ * guaranteeing this, or our logic in
abd = abd_get_from_buf(buf, zio->io_size); * vdev_disk_check_alignment() is wrong. In either case,
/*
* False here would mean the borrowed copy has an invalid
* alignment too, which would mean we've somehow been passed a
* linear ABD with an interior page that has a non-zero offset
* or a size not a multiple of PAGE_SIZE. This is not possible.
* It would mean either zio_buf_alloc() or its underlying
* allocators have done something extremely strange, or our
* math in vdev_disk_check_pages() is wrong. In either case,
* something is seriously wrong and it's not safe to continue. * something is seriously wrong and it's not safe to continue.
*/ */
VERIFY(vdev_disk_check_pages(abd, zio->io_size, bdev)); VERIFY(vdev_disk_check_alignment(abd, zio->io_size, bdev));
} }
/* Allocate vbio, with a pointer to the borrowed ABD if necessary */ /* Allocate vbio, with a pointer to the borrowed ABD if necessary */
@ -1437,6 +1443,28 @@ vdev_disk_io_start(zio_t *zio)
static void static void
vdev_disk_io_done(zio_t *zio) vdev_disk_io_done(zio_t *zio)
{ {
/* If this was a read or write, we need to clean up the vbio */
if (zio->io_bio != NULL) {
vbio_t *vbio = zio->io_bio;
zio->io_bio = NULL;
/*
* If we copied the ABD before issuing it, clean up and return
* the copy to the ABD, with changes if appropriate.
*/
if (vbio->vbio_abd != NULL) {
if (zio->io_type == ZIO_TYPE_READ)
abd_copy(zio->io_abd, vbio->vbio_abd,
zio->io_size);
abd_free(vbio->vbio_abd);
vbio->vbio_abd = NULL;
}
/* Final cleanup */
kmem_free(vbio, sizeof (vbio_t));
}
/* /*
* If the device returned EIO, we revalidate the media. If it is * If the device returned EIO, we revalidate the media. If it is
* determined the media has changed this triggers the asynchronous * determined the media has changed this triggers the asynchronous
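
The alignment rules applied by the new callback can be sketched in userland as follows. This is a minimal illustration, not the OpenZFS code: every `sketch_` name, the fixed 4096-byte page size, and the flat segment array are assumptions made for the example. The early return on `seen_last` and the `seen_first` branch condition are also assumed, since only part of the callback is visible in the hunk above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel definitions; not the OpenZFS ones. */
#define	SKETCH_PAGESIZE	4096u
#define	SKETCH_IS_P2ALIGNED(v, a) \
	((((uintptr_t)(v)) & ((uintptr_t)(a) - 1)) == 0)

typedef struct {
	size_t off;	/* offset of the data within its page */
	size_t len;	/* bytes of data in this segment */
} sketch_seg_t;

typedef struct {
	size_t blocksize;	/* logical block size of the device */
	int seen_first;		/* first segment has been checked */
	int seen_last;		/* a segment ended off a page boundary */
} sketch_align_state_t;

/* Returns nonzero if this segment breaks the alignment rules. */
static int
sketch_check_cb(const sketch_seg_t *seg, sketch_align_state_t *s)
{
	/* Assumed: nothing may follow a segment that ended mid-page. */
	if (s->seen_last)
		return (1);

	if (s->seen_first) {
		/* Later segments must start on a page boundary. */
		if (!SKETCH_IS_P2ALIGNED(seg->off, SKETCH_PAGESIZE))
			return (1);
	} else {
		/* The first segment only needs block-size alignment. */
		if (!SKETCH_IS_P2ALIGNED(seg->off, s->blocksize))
			return (1);
		s->seen_first = 1;
	}

	/* Data ending off a page boundary must be the last segment. */
	s->seen_last =
	    !SKETCH_IS_P2ALIGNED(seg->off + seg->len, SKETCH_PAGESIZE);
	return (0);
}

/* Returns 1 if every segment passes, 0 if a copy would be needed. */
static int
sketch_check_alignment(const sketch_seg_t *segs, size_t nsegs,
    size_t blocksize)
{
	sketch_align_state_t s = { .blocksize = blocksize };

	for (size_t i = 0; i < nsegs; i++) {
		if (sketch_check_cb(&segs[i], &s))
			return (0);
	}
	return (1);
}
```

When the check fails, the caller in the diff allocates a fresh ABD with `abd_alloc_for_io()`, copies the data in for writes, and copies it back in `vdev_disk_io_done()` for reads.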


@ -33,11 +33,13 @@
#include <sys/fs/zfs.h> #include <sys/fs/zfs.h>
#include <sys/fm/fs/zfs.h> #include <sys/fm/fs/zfs.h>
#include <sys/abd.h> #include <sys/abd.h>
#include <sys/fcntl.h>
#include <sys/vnode.h> #include <sys/vnode.h>
#include <sys/zfs_file.h> #include <sys/zfs_file.h>
#ifdef _KERNEL #ifdef _KERNEL
#include <linux/falloc.h> #include <linux/falloc.h>
#include <sys/fcntl.h>
#else
#include <fcntl.h>
#endif #endif
/* /*
* Virtual device vector for files. * Virtual device vector for files.


@ -767,9 +767,6 @@ zfsctl_snapshot_path_objset(zfsvfs_t *zfsvfs, uint64_t objsetid,
uint64_t id, pos = 0; uint64_t id, pos = 0;
int error = 0; int error = 0;
if (zfsvfs->z_vfs->vfs_mntpoint == NULL)
return (SET_ERROR(ENOENT));
cookie = spl_fstrans_mark(); cookie = spl_fstrans_mark();
snapname = kmem_alloc(ZFS_MAX_DATASET_NAME_LEN, KM_SLEEP); snapname = kmem_alloc(ZFS_MAX_DATASET_NAME_LEN, KM_SLEEP);
@ -786,8 +783,14 @@ zfsctl_snapshot_path_objset(zfsvfs_t *zfsvfs, uint64_t objsetid,
break; break;
} }
snprintf(full_path, path_len, "%s/.zfs/snapshot/%s", mutex_enter(&zfsvfs->z_vfs->vfs_mntpt_lock);
zfsvfs->z_vfs->vfs_mntpoint, snapname); if (zfsvfs->z_vfs->vfs_mntpoint != NULL) {
snprintf(full_path, path_len, "%s/.zfs/snapshot/%s",
zfsvfs->z_vfs->vfs_mntpoint, snapname);
} else
error = SET_ERROR(ENOENT);
mutex_exit(&zfsvfs->z_vfs->vfs_mntpt_lock);
out: out:
kmem_free(snapname, ZFS_MAX_DATASET_NAME_LEN); kmem_free(snapname, ZFS_MAX_DATASET_NAME_LEN);
spl_fstrans_unmark(cookie); spl_fstrans_unmark(cookie);
@ -1049,6 +1052,66 @@ exportfs_flush(void)
(void) call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC); (void) call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
} }
/*
* Returns the path in char format for given struct path. Uses
* d_path exported by kernel to convert struct path to char
* format. Returns the correct path for mountpoints and chroot
* environments.
*
* If chroot environment has directories that are mounted with
* --bind or --rbind flag, d_path returns the complete path inside
* chroot environment but does not return the absolute path, i.e.
* the path to chroot environment is missing.
*/
static int
get_root_path(struct path *path, char *buff, int len)
{
char *path_buffer, *path_ptr;
int error = 0;
path_get(path);
path_buffer = kmem_zalloc(len, KM_SLEEP);
path_ptr = d_path(path, path_buffer, len);
if (IS_ERR(path_ptr))
error = SET_ERROR(-PTR_ERR(path_ptr));
else
strcpy(buff, path_ptr);
kmem_free(path_buffer, len);
path_put(path);
return (error);
}
/*
* Returns if the current process root is chrooted or not. Linux
* kernel exposes the task_struct for current process and init.
* Since init process root points to actual root filesystem when
* Linux runtime is reached, we can compare the current process
* root with init process root to determine if root of the current
* process is different from init, which can reliably determine if
* current process is in chroot context or not.
*/
static int
is_current_chrooted(void)
{
struct task_struct *curr = current, *global = &init_task;
struct path cr_root, gl_root;
task_lock(curr);
get_fs_root(curr->fs, &cr_root);
task_unlock(curr);
task_lock(global);
get_fs_root(global->fs, &gl_root);
task_unlock(global);
int chrooted = !path_equal(&cr_root, &gl_root);
path_put(&gl_root);
path_put(&cr_root);
return (chrooted);
}
/* /*
* Attempt to unmount a snapshot by making a call to user space. * Attempt to unmount a snapshot by making a call to user space.
* There is no assurance that this can or will succeed, is just a * There is no assurance that this can or will succeed, is just a
@ -1123,14 +1186,50 @@ zfsctl_snapshot_mount(struct path *path, int flags)
if (error) if (error)
goto error; goto error;
if (is_current_chrooted() == 0) {
/*
* Current process is not in chroot context
*/
char *m = kmem_zalloc(MAXPATHLEN, KM_SLEEP);
struct path mnt_path;
mnt_path.mnt = path->mnt;
mnt_path.dentry = path->mnt->mnt_root;
/*
* Get path to current mountpoint
*/
error = get_root_path(&mnt_path, m, MAXPATHLEN);
if (error != 0) {
kmem_free(m, MAXPATHLEN);
goto error;
}
mutex_enter(&zfsvfs->z_vfs->vfs_mntpt_lock);
if (zfsvfs->z_vfs->vfs_mntpoint != NULL) {
/*
* If current mountpoint and vfs_mntpoint are not the same,
* store current mountpoint in vfs_mntpoint.
*/
if (strcmp(zfsvfs->z_vfs->vfs_mntpoint, m) != 0) {
kmem_strfree(zfsvfs->z_vfs->vfs_mntpoint);
zfsvfs->z_vfs->vfs_mntpoint = kmem_strdup(m);
}
} else
zfsvfs->z_vfs->vfs_mntpoint = kmem_strdup(m);
mutex_exit(&zfsvfs->z_vfs->vfs_mntpt_lock);
kmem_free(m, MAXPATHLEN);
}
/* /*
* Construct a mount point path from sb of the ctldir inode and dirent * Construct a mount point path from sb of the ctldir inode and dirent
* name, instead of from d_path(), so that chroot'd process doesn't fail * name, instead of from d_path(), so that chroot'd process doesn't fail
* on mount.zfs(8). * on mount.zfs(8).
*/ */
mutex_enter(&zfsvfs->z_vfs->vfs_mntpt_lock);
snprintf(full_path, MAXPATHLEN, "%s/.zfs/snapshot/%s", snprintf(full_path, MAXPATHLEN, "%s/.zfs/snapshot/%s",
zfsvfs->z_vfs->vfs_mntpoint ? zfsvfs->z_vfs->vfs_mntpoint : "", zfsvfs->z_vfs->vfs_mntpoint ? zfsvfs->z_vfs->vfs_mntpoint : "",
dname(dentry)); dname(dentry));
mutex_exit(&zfsvfs->z_vfs->vfs_mntpt_lock);
snprintf(options, 7, "%s", snprintf(options, 7, "%s",
zfs_snapshot_no_setuid ? "nosuid" : "suid"); zfs_snapshot_no_setuid ? "nosuid" : "suid");
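
The locking pattern introduced around `vfs_mntpoint` can be sketched in userland like this. It is an illustration under stated assumptions: `sketch_vfs_t` and `sketch_snapshot_path()` are hypothetical names, a pthread mutex stands in for the kernel `vfs_mntpt_lock`, and a plain `ENOENT` stands in for `SET_ERROR(ENOENT)`.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userland stand-in for the vfs_t fields touched above. */
typedef struct {
	pthread_mutex_t mntpt_lock;	/* plays the role of vfs_mntpt_lock */
	const char *mntpoint;		/* may be NULL or swapped by a writer */
} sketch_vfs_t;

/*
 * Build "<mntpoint>/.zfs/snapshot/<snapname>" while holding the lock, so
 * the NULL check and the read of the string see the same pointer.
 */
static int
sketch_snapshot_path(sketch_vfs_t *vfs, const char *snapname,
    char *full_path, size_t path_len)
{
	int error = 0;

	pthread_mutex_lock(&vfs->mntpt_lock);
	if (vfs->mntpoint != NULL) {
		(void) snprintf(full_path, path_len, "%s/.zfs/snapshot/%s",
		    vfs->mntpoint, snapname);
	} else {
		error = ENOENT;
	}
	pthread_mutex_unlock(&vfs->mntpt_lock);

	return (error);
}
```

Doing the NULL check inside the critical section matters because `zfsctl_snapshot_mount()` may now free and replace the string; the old code checked the pointer once at function entry, outside any lock.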


@ -115,7 +115,7 @@ zfsvfs_vfs_free(vfs_t *vfsp)
if (vfsp != NULL) { if (vfsp != NULL) {
if (vfsp->vfs_mntpoint != NULL) if (vfsp->vfs_mntpoint != NULL)
kmem_strfree(vfsp->vfs_mntpoint); kmem_strfree(vfsp->vfs_mntpoint);
mutex_destroy(&vfsp->vfs_mntpt_lock);
kmem_free(vfsp, sizeof (vfs_t)); kmem_free(vfsp, sizeof (vfs_t));
} }
} }
@ -197,10 +197,11 @@ zfsvfs_parse_option(char *option, int token, substring_t *args, vfs_t *vfsp)
vfsp->vfs_do_nbmand = B_TRUE; vfsp->vfs_do_nbmand = B_TRUE;
break; break;
case TOKEN_MNTPOINT: case TOKEN_MNTPOINT:
if (vfsp->vfs_mntpoint != NULL)
kmem_strfree(vfsp->vfs_mntpoint);
vfsp->vfs_mntpoint = match_strdup(&args[0]); vfsp->vfs_mntpoint = match_strdup(&args[0]);
if (vfsp->vfs_mntpoint == NULL) if (vfsp->vfs_mntpoint == NULL)
return (SET_ERROR(ENOMEM)); return (SET_ERROR(ENOMEM));
break; break;
default: default:
break; break;
@ -219,6 +220,7 @@ zfsvfs_parse_options(char *mntopts, vfs_t **vfsp)
int error; int error;
tmp_vfsp = kmem_zalloc(sizeof (vfs_t), KM_SLEEP); tmp_vfsp = kmem_zalloc(sizeof (vfs_t), KM_SLEEP);
mutex_init(&tmp_vfsp->vfs_mntpt_lock, NULL, MUTEX_DEFAULT, NULL);
if (mntopts != NULL) { if (mntopts != NULL) {
substring_t args[MAX_OPT_ARGS]; substring_t args[MAX_OPT_ARGS];


@ -260,15 +260,6 @@ update_pages(znode_t *zp, int64_t start, int len, objset_t *os)
} else { } else {
ClearPageError(pp); ClearPageError(pp);
SetPageUptodate(pp); SetPageUptodate(pp);
if (!PagePrivate(pp)) {
/*
* Set private bit so page migration
* will wait for us to finish writeback
* before calling migrate_folio().
*/
SetPagePrivate(pp);
get_page(pp);
}
if (mapping_writably_mapped(mp)) if (mapping_writably_mapped(mp))
flush_dcache_page(pp); flush_dcache_page(pp);
@ -4090,14 +4081,6 @@ zfs_fillpage(struct inode *ip, struct page *pp)
} else { } else {
ClearPageError(pp); ClearPageError(pp);
SetPageUptodate(pp); SetPageUptodate(pp);
if (!PagePrivate(pp)) {
/*
* Set private bit so page migration will wait for us to
* finish writeback before calling migrate_folio().
*/
SetPagePrivate(pp);
get_page(pp);
}
} }
return (error); return (error);


@ -1577,14 +1577,6 @@ zfs_zero_partial_page(znode_t *zp, uint64_t start, uint64_t len)
mark_page_accessed(pp); mark_page_accessed(pp);
SetPageUptodate(pp); SetPageUptodate(pp);
ClearPageError(pp); ClearPageError(pp);
if (!PagePrivate(pp)) {
/*
* Set private bit so page migration will wait for us to
* finish writeback before calling migrate_folio().
*/
SetPagePrivate(pp);
get_page(pp);
}
unlock_page(pp); unlock_page(pp);
put_page(pp); put_page(pp);
} }


@ -28,6 +28,7 @@
#include <linux/compat.h> #include <linux/compat.h>
#endif #endif
#include <linux/fs.h> #include <linux/fs.h>
#include <linux/migrate.h>
#include <sys/file.h> #include <sys/file.h>
#include <sys/dmu_objset.h> #include <sys/dmu_objset.h>
#include <sys/zfs_znode.h> #include <sys/zfs_znode.h>
@ -607,42 +608,6 @@ zpl_writepage(struct page *pp, struct writeback_control *wbc)
return (zpl_putpage(pp, wbc, &for_sync)); return (zpl_putpage(pp, wbc, &for_sync));
} }
static int
zpl_releasepage(struct page *pp, gfp_t gfp)
{
if (PagePrivate(pp)) {
ClearPagePrivate(pp);
put_page(pp);
}
return (1);
}
#ifdef HAVE_VFS_RELEASE_FOLIO
static bool
zpl_release_folio(struct folio *folio, gfp_t gfp)
{
return (zpl_releasepage(&folio->page, gfp));
}
#endif
#ifdef HAVE_VFS_INVALIDATE_FOLIO
static void
zpl_invalidate_folio(struct folio *folio, size_t offset, size_t len)
{
if ((offset == 0) && (len == PAGE_SIZE)) {
zpl_releasepage(&folio->page, 0);
}
}
#else
static void
zpl_invalidatepage(struct page *pp, unsigned int offset, unsigned int len)
{
if ((offset == 0) && (len == PAGE_SIZE)) {
zpl_releasepage(pp, 0);
}
}
#endif
/* /*
* The flag combination which matches the behavior of zfs_space() is * The flag combination which matches the behavior of zfs_space() is
* FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE. The FALLOC_FL_PUNCH_HOLE * FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE. The FALLOC_FL_PUNCH_HOLE
@ -1126,15 +1091,10 @@ const struct address_space_operations zpl_address_space_operations = {
#ifdef HAVE_VFS_FILEMAP_DIRTY_FOLIO #ifdef HAVE_VFS_FILEMAP_DIRTY_FOLIO
.dirty_folio = filemap_dirty_folio, .dirty_folio = filemap_dirty_folio,
#endif #endif
#ifdef HAVE_VFS_RELEASE_FOLIO #ifdef HAVE_VFS_MIGRATE_FOLIO
.release_folio = zpl_release_folio, .migrate_folio = migrate_folio,
#else #else
.releasepage = zpl_releasepage, .migratepage = migrate_page,
#endif
#ifdef HAVE_VFS_INVALIDATE_FOLIO
.invalidate_folio = zpl_invalidate_folio,
#else
.invalidatepage = zpl_invalidatepage,
#endif #endif
}; };


@ -206,6 +206,7 @@ _VALSTR_BITFIELD_IMPL(zio_flag,
{ '.', "PR", "PROBE" }, { '.', "PR", "PROBE" },
{ '.', "TH", "TRYHARD" }, { '.', "TH", "TRYHARD" },
{ '.', "OP", "OPTIONAL" }, { '.', "OP", "OPTIONAL" },
{ '.', "RD", "DIO_READ" },
{ '.', "DQ", "DONT_QUEUE" }, { '.', "DQ", "DONT_QUEUE" },
{ '.', "DP", "DONT_PROPAGATE" }, { '.', "DP", "DONT_PROPAGATE" },
{ '.', "BY", "IO_BYPASS" }, { '.', "BY", "IO_BYPASS" },


@ -182,6 +182,7 @@ static void dbuf_sync_leaf_verify_bonus_dnode(dbuf_dirty_record_t *dr);
* Global data structures and functions for the dbuf cache. * Global data structures and functions for the dbuf cache.
*/ */
static kmem_cache_t *dbuf_kmem_cache; static kmem_cache_t *dbuf_kmem_cache;
kmem_cache_t *dbuf_dirty_kmem_cache;
static taskq_t *dbu_evict_taskq; static taskq_t *dbu_evict_taskq;
static kthread_t *dbuf_cache_evict_thread; static kthread_t *dbuf_cache_evict_thread;
@ -966,6 +967,8 @@ dbuf_init(void)
dbuf_kmem_cache = kmem_cache_create("dmu_buf_impl_t", dbuf_kmem_cache = kmem_cache_create("dmu_buf_impl_t",
sizeof (dmu_buf_impl_t), sizeof (dmu_buf_impl_t),
0, dbuf_cons, dbuf_dest, NULL, NULL, NULL, 0); 0, dbuf_cons, dbuf_dest, NULL, NULL, NULL, 0);
dbuf_dirty_kmem_cache = kmem_cache_create("dbuf_dirty_record_t",
sizeof (dbuf_dirty_record_t), 0, NULL, NULL, NULL, NULL, NULL, 0);
for (int i = 0; i < hmsize; i++) for (int i = 0; i < hmsize; i++)
mutex_init(&h->hash_mutexes[i], NULL, MUTEX_NOLOCKDEP, NULL); mutex_init(&h->hash_mutexes[i], NULL, MUTEX_NOLOCKDEP, NULL);
@ -1041,6 +1044,7 @@ dbuf_fini(void)
sizeof (kmutex_t)); sizeof (kmutex_t));
kmem_cache_destroy(dbuf_kmem_cache); kmem_cache_destroy(dbuf_kmem_cache);
kmem_cache_destroy(dbuf_dirty_kmem_cache);
taskq_destroy(dbu_evict_taskq); taskq_destroy(dbu_evict_taskq);
mutex_enter(&dbuf_evict_lock); mutex_enter(&dbuf_evict_lock);
@ -2343,7 +2347,8 @@ dbuf_dirty(dmu_buf_impl_t *db, dmu_tx_t *tx)
* to make a copy of it so that the changes we make in this * to make a copy of it so that the changes we make in this
* transaction group won't leak out when we sync the older txg. * transaction group won't leak out when we sync the older txg.
*/ */
dr = kmem_zalloc(sizeof (dbuf_dirty_record_t), KM_SLEEP); dr = kmem_cache_alloc(dbuf_dirty_kmem_cache, KM_SLEEP);
memset(dr, 0, sizeof (*dr));
list_link_init(&dr->dr_dirty_node); list_link_init(&dr->dr_dirty_node);
list_link_init(&dr->dr_dbuf_node); list_link_init(&dr->dr_dbuf_node);
dr->dr_dnode = dn; dr->dr_dnode = dn;
@ -2526,7 +2531,7 @@ dbuf_undirty_bonus(dbuf_dirty_record_t *dr)
mutex_destroy(&dr->dt.di.dr_mtx); mutex_destroy(&dr->dt.di.dr_mtx);
list_destroy(&dr->dt.di.dr_children); list_destroy(&dr->dt.di.dr_children);
} }
kmem_free(dr, sizeof (dbuf_dirty_record_t)); kmem_cache_free(dbuf_dirty_kmem_cache, dr);
ASSERT3U(db->db_dirtycnt, >, 0); ASSERT3U(db->db_dirtycnt, >, 0);
db->db_dirtycnt -= 1; db->db_dirtycnt -= 1;
} }
@ -2616,7 +2621,7 @@ dbuf_undirty(dmu_buf_impl_t *db, dmu_tx_t *tx)
} }
} }
kmem_free(dr, sizeof (dbuf_dirty_record_t)); kmem_cache_free(dbuf_dirty_kmem_cache, dr);
ASSERT(db->db_dirtycnt > 0); ASSERT(db->db_dirtycnt > 0);
db->db_dirtycnt -= 1; db->db_dirtycnt -= 1;
@ -2941,7 +2946,7 @@ dmu_buf_set_crypt_params(dmu_buf_t *db_fake, boolean_t byteorder,
* (see dbuf_sync_dnode_leaf_crypt()). * (see dbuf_sync_dnode_leaf_crypt()).
*/ */
ASSERT3U(db->db.db_object, ==, DMU_META_DNODE_OBJECT); ASSERT3U(db->db.db_object, ==, DMU_META_DNODE_OBJECT);
ASSERT3U(db->db_level, ==, 0); ASSERT0(db->db_level);
ASSERT(db->db_objset->os_raw_receive); ASSERT(db->db_objset->os_raw_receive);
dmu_buf_will_dirty_impl(db_fake, dmu_buf_will_dirty_impl(db_fake,
@ -2950,6 +2955,7 @@ dmu_buf_set_crypt_params(dmu_buf_t *db_fake, boolean_t byteorder,
dr = dbuf_find_dirty_eq(db, tx->tx_txg); dr = dbuf_find_dirty_eq(db, tx->tx_txg);
ASSERT3P(dr, !=, NULL); ASSERT3P(dr, !=, NULL);
ASSERT3U(dr->dt.dl.dr_override_state, ==, DR_NOT_OVERRIDDEN);
dr->dt.dl.dr_has_raw_params = B_TRUE; dr->dt.dl.dr_has_raw_params = B_TRUE;
dr->dt.dl.dr_byteorder = byteorder; dr->dt.dl.dr_byteorder = byteorder;
@ -2964,10 +2970,14 @@ dbuf_override_impl(dmu_buf_impl_t *db, const blkptr_t *bp, dmu_tx_t *tx)
struct dirty_leaf *dl; struct dirty_leaf *dl;
dbuf_dirty_record_t *dr; dbuf_dirty_record_t *dr;
ASSERT3U(db->db.db_object, !=, DMU_META_DNODE_OBJECT);
ASSERT0(db->db_level);
dr = list_head(&db->db_dirty_records); dr = list_head(&db->db_dirty_records);
ASSERT3P(dr, !=, NULL); ASSERT3P(dr, !=, NULL);
ASSERT3U(dr->dr_txg, ==, tx->tx_txg); ASSERT3U(dr->dr_txg, ==, tx->tx_txg);
dl = &dr->dt.dl; dl = &dr->dt.dl;
ASSERT0(dl->dr_has_raw_params);
dl->dr_overridden_by = *bp; dl->dr_overridden_by = *bp;
dl->dr_override_state = DR_OVERRIDDEN; dl->dr_override_state = DR_OVERRIDDEN;
BP_SET_LOGICAL_BIRTH(&dl->dr_overridden_by, dr->dr_txg); BP_SET_LOGICAL_BIRTH(&dl->dr_overridden_by, dr->dr_txg);
@ -3040,6 +3050,7 @@ dmu_buf_write_embedded(dmu_buf_t *dbuf, void *data,
ASSERT3P(dr, !=, NULL); ASSERT3P(dr, !=, NULL);
ASSERT3U(dr->dr_txg, ==, tx->tx_txg); ASSERT3U(dr->dr_txg, ==, tx->tx_txg);
dl = &dr->dt.dl; dl = &dr->dt.dl;
ASSERT0(dl->dr_has_raw_params);
encode_embedded_bp_compressed(&dl->dr_overridden_by, encode_embedded_bp_compressed(&dl->dr_overridden_by,
data, comp, uncompressed_size, compressed_size); data, comp, uncompressed_size, compressed_size);
BPE_SET_ETYPE(&dl->dr_overridden_by, etype); BPE_SET_ETYPE(&dl->dr_overridden_by, etype);
@ -5083,7 +5094,7 @@ dbuf_write_done(zio_t *zio, arc_buf_t *buf, void *vdb)
dsl_pool_undirty_space(dmu_objset_pool(os), dr->dr_accounted, dsl_pool_undirty_space(dmu_objset_pool(os), dr->dr_accounted,
zio->io_txg); zio->io_txg);
kmem_free(dr, sizeof (dbuf_dirty_record_t)); kmem_cache_free(dbuf_dirty_kmem_cache, dr);
} }
static void static void
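
The switch from `kmem_zalloc()` to `kmem_cache_alloc()` followed by `memset()` can be illustrated with a toy userland cache. This is only a sketch: the `sketch_` types and functions are hypothetical, the free list is single-threaded, and it assumes (as the added `memset()` in the diff suggests) that a cache created without a constructor hands back recycled objects with their old contents.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type standing in for dbuf_dirty_record_t. */
typedef struct sketch_record {
	unsigned long txg;
	int has_raw_params;
	struct sketch_record *next_free;	/* used only on the free list */
} sketch_record_t;

/* A toy single-threaded object cache: freed objects are recycled as-is. */
typedef struct {
	sketch_record_t *free_list;
} sketch_cache_t;

static sketch_record_t *
sketch_cache_alloc(sketch_cache_t *c)
{
	sketch_record_t *r = c->free_list;

	if (r != NULL) {
		c->free_list = r->next_free;
		return (r);	/* contents are stale, not zeroed */
	}
	return (malloc(sizeof (sketch_record_t)));
}

static void
sketch_cache_free(sketch_cache_t *c, sketch_record_t *r)
{
	r->next_free = c->free_list;
	c->free_list = r;
}

/* Mirrors the pattern in the diff: allocate from the cache, then zero. */
static sketch_record_t *
sketch_record_new(sketch_cache_t *c)
{
	sketch_record_t *r = sketch_cache_alloc(c);

	if (r != NULL)
		memset(r, 0, sizeof (*r));
	return (r);
}
```

Every free path has to return the object to the same cache, which is why the diff converts each `kmem_free(dr, sizeof (dbuf_dirty_record_t))` call to `kmem_cache_free(dbuf_dirty_kmem_cache, dr)`.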


@ -1895,6 +1895,7 @@ dmu_sync_done(zio_t *zio, arc_buf_t *buf, void *varg)
mutex_enter(&db->db_mtx); mutex_enter(&db->db_mtx);
ASSERT(dr->dt.dl.dr_override_state == DR_IN_DMU_SYNC); ASSERT(dr->dt.dl.dr_override_state == DR_IN_DMU_SYNC);
if (zio->io_error == 0) { if (zio->io_error == 0) {
ASSERT0(dr->dt.dl.dr_has_raw_params);
dr->dt.dl.dr_nopwrite = !!(zio->io_flags & ZIO_FLAG_NOPWRITE); dr->dt.dl.dr_nopwrite = !!(zio->io_flags & ZIO_FLAG_NOPWRITE);
if (dr->dt.dl.dr_nopwrite) { if (dr->dt.dl.dr_nopwrite) {
blkptr_t *bp = zio->io_bp; blkptr_t *bp = zio->io_bp;
@ -2190,6 +2191,7 @@ dmu_sync(zio_t *pio, uint64_t txg, dmu_sync_cb_t *done, zgd_t *zgd)
return (SET_ERROR(EALREADY)); return (SET_ERROR(EALREADY));
} }
ASSERT0(dr->dt.dl.dr_has_raw_params);
ASSERT(dr->dt.dl.dr_override_state == DR_NOT_OVERRIDDEN); ASSERT(dr->dt.dl.dr_override_state == DR_NOT_OVERRIDDEN);
dr->dt.dl.dr_override_state = DR_IN_DMU_SYNC; dr->dt.dl.dr_override_state = DR_IN_DMU_SYNC;
mutex_exit(&db->db_mtx); mutex_exit(&db->db_mtx);
@ -2657,6 +2659,7 @@ dmu_brt_clone(objset_t *os, uint64_t object, uint64_t offset, uint64_t length,
db = (dmu_buf_impl_t *)dbuf; db = (dmu_buf_impl_t *)dbuf;
bp = &bps[i]; bp = &bps[i];
ASSERT3U(db->db.db_object, !=, DMU_META_DNODE_OBJECT);
ASSERT0(db->db_level); ASSERT0(db->db_level);
ASSERT(db->db_blkid != DMU_BONUS_BLKID); ASSERT(db->db_blkid != DMU_BONUS_BLKID);
ASSERT(db->db_blkid != DMU_SPILL_BLKID); ASSERT(db->db_blkid != DMU_SPILL_BLKID);
@ -2672,11 +2675,6 @@ dmu_brt_clone(objset_t *os, uint64_t object, uint64_t offset, uint64_t length,
db = (dmu_buf_impl_t *)dbuf; db = (dmu_buf_impl_t *)dbuf;
bp = &bps[i]; bp = &bps[i];
ASSERT0(db->db_level);
ASSERT(db->db_blkid != DMU_BONUS_BLKID);
ASSERT(db->db_blkid != DMU_SPILL_BLKID);
ASSERT(BP_IS_HOLE(bp) || dbuf->db_size == BP_GET_LSIZE(bp));
dmu_buf_will_clone_or_dio(dbuf, tx); dmu_buf_will_clone_or_dio(dbuf, tx);
mutex_enter(&db->db_mtx); mutex_enter(&db->db_mtx);
@ -2685,6 +2683,7 @@ dmu_brt_clone(objset_t *os, uint64_t object, uint64_t offset, uint64_t length,
VERIFY(dr != NULL); VERIFY(dr != NULL);
ASSERT3U(dr->dr_txg, ==, tx->tx_txg); ASSERT3U(dr->dr_txg, ==, tx->tx_txg);
dl = &dr->dt.dl; dl = &dr->dt.dl;
ASSERT0(dl->dr_has_raw_params);
dl->dr_overridden_by = *bp; dl->dr_overridden_by = *bp;
if (!BP_IS_HOLE(bp) || BP_GET_LOGICAL_BIRTH(bp) != 0) { if (!BP_IS_HOLE(bp) || BP_GET_LOGICAL_BIRTH(bp) != 0) {
if (!BP_IS_EMBEDDED(bp)) { if (!BP_IS_EMBEDDED(bp)) {


@ -180,6 +180,7 @@ dmu_write_direct(zio_t *pio, dmu_buf_impl_t *db, abd_t *data, dmu_tx_t *tx)
if (list_next(&db->db_dirty_records, dr_head) != NULL) if (list_next(&db->db_dirty_records, dr_head) != NULL)
zp.zp_nopwrite = B_FALSE; zp.zp_nopwrite = B_FALSE;
ASSERT0(dr_head->dt.dl.dr_has_raw_params);
ASSERT3S(dr_head->dt.dl.dr_override_state, ==, DR_NOT_OVERRIDDEN); ASSERT3S(dr_head->dt.dl.dr_override_state, ==, DR_NOT_OVERRIDDEN);
dr_head->dt.dl.dr_override_state = DR_IN_DMU_SYNC; dr_head->dt.dl.dr_override_state = DR_IN_DMU_SYNC;
@ -330,7 +331,7 @@ dmu_read_abd(dnode_t *dn, uint64_t offset, uint64_t size,
*/ */
zio_t *cio = zio_read(rio, spa, bp, mbuf, db->db.db_size, zio_t *cio = zio_read(rio, spa, bp, mbuf, db->db.db_size,
dmu_read_abd_done, NULL, ZIO_PRIORITY_SYNC_READ, dmu_read_abd_done, NULL, ZIO_PRIORITY_SYNC_READ,
ZIO_FLAG_CANFAIL, &zb); ZIO_FLAG_CANFAIL | ZIO_FLAG_DIO_READ, &zb);
mutex_exit(&db->db_mtx); mutex_exit(&db->db_mtx);
zfs_racct_read(spa, db->db.db_size, 1, flags); zfs_racct_read(spa, db->db.db_size, 1, flags);


@ -180,6 +180,8 @@ struct send_range {
*/ */
dnode_phys_t *dnp; dnode_phys_t *dnp;
blkptr_t bp; blkptr_t bp;
/* Piggyback unmodified spill block */
struct send_range *spill_range;
} object; } object;
struct srr { struct srr {
uint32_t datablksz; uint32_t datablksz;
@ -231,6 +233,8 @@ range_free(struct send_range *range)
size_t size = sizeof (dnode_phys_t) * size_t size = sizeof (dnode_phys_t) *
(range->sru.object.dnp->dn_extra_slots + 1); (range->sru.object.dnp->dn_extra_slots + 1);
kmem_free(range->sru.object.dnp, size); kmem_free(range->sru.object.dnp, size);
if (range->sru.object.spill_range)
range_free(range->sru.object.spill_range);
} else if (range->type == DATA) { } else if (range->type == DATA) {
mutex_enter(&range->sru.data.lock); mutex_enter(&range->sru.data.lock);
while (range->sru.data.io_outstanding) while (range->sru.data.io_outstanding)
@ -617,7 +621,7 @@ dump_spill(dmu_send_cookie_t *dscp, const blkptr_t *bp, uint64_t object,
drrs->drr_length = blksz; drrs->drr_length = blksz;
drrs->drr_toguid = dscp->dsc_toguid; drrs->drr_toguid = dscp->dsc_toguid;
/* See comment in dump_dnode() for full details */ /* See comment in piggyback_unmodified_spill() for full details */
if (zfs_send_unmodified_spill_blocks && if (zfs_send_unmodified_spill_blocks &&
(BP_GET_LOGICAL_BIRTH(bp) <= dscp->dsc_fromtxg)) { (BP_GET_LOGICAL_BIRTH(bp) <= dscp->dsc_fromtxg)) {
drrs->drr_flags |= DRR_SPILL_UNMODIFIED; drrs->drr_flags |= DRR_SPILL_UNMODIFIED;
@ -793,35 +797,6 @@ dump_dnode(dmu_send_cookie_t *dscp, const blkptr_t *bp, uint64_t object,
(dnp->dn_datablkszsec << SPA_MINBLOCKSHIFT), DMU_OBJECT_END) != 0) (dnp->dn_datablkszsec << SPA_MINBLOCKSHIFT), DMU_OBJECT_END) != 0)
return (SET_ERROR(EINTR)); return (SET_ERROR(EINTR));
/*
* Send DRR_SPILL records for unmodified spill blocks. This is useful
* because changing certain attributes of the object (e.g. blocksize)
* can cause old versions of ZFS to incorrectly remove a spill block.
* Including these records in the stream forces an up to date version
* to always be written ensuring they're never lost. Current versions
* of the code which understand the DRR_FLAG_SPILL_BLOCK feature can
* ignore these unmodified spill blocks.
*/
if (zfs_send_unmodified_spill_blocks &&
(dnp->dn_flags & DNODE_FLAG_SPILL_BLKPTR) &&
(BP_GET_LOGICAL_BIRTH(DN_SPILL_BLKPTR(dnp)) <= dscp->dsc_fromtxg)) {
struct send_range record;
blkptr_t *bp = DN_SPILL_BLKPTR(dnp);
memset(&record, 0, sizeof (struct send_range));
record.type = DATA;
record.object = object;
record.eos_marker = B_FALSE;
record.start_blkid = DMU_SPILL_BLKID;
record.end_blkid = record.start_blkid + 1;
record.sru.data.bp = *bp;
record.sru.data.obj_type = dnp->dn_type;
record.sru.data.datablksz = BP_GET_LSIZE(bp);
if (do_dump(dscp, &record) != 0)
return (SET_ERROR(EINTR));
}
if (dscp->dsc_err != 0) if (dscp->dsc_err != 0)
return (SET_ERROR(EINTR)); return (SET_ERROR(EINTR));
@ -911,6 +886,9 @@ do_dump(dmu_send_cookie_t *dscp, struct send_range *range)
case OBJECT: case OBJECT:
err = dump_dnode(dscp, &range->sru.object.bp, range->object, err = dump_dnode(dscp, &range->sru.object.bp, range->object,
range->sru.object.dnp); range->sru.object.dnp);
/* Dump piggybacked unmodified spill block */
if (!err && range->sru.object.spill_range)
err = do_dump(dscp, range->sru.object.spill_range);
return (err); return (err);
case OBJECT_RANGE: { case OBJECT_RANGE: {
ASSERT3U(range->start_blkid + 1, ==, range->end_blkid); ASSERT3U(range->start_blkid + 1, ==, range->end_blkid);
@ -939,34 +917,7 @@ do_dump(dmu_send_cookie_t *dscp, struct send_range *range)
ASSERT3U(srdp->datablksz, ==, BP_GET_LSIZE(bp)); ASSERT3U(srdp->datablksz, ==, BP_GET_LSIZE(bp));
ASSERT3U(range->start_blkid + 1, ==, range->end_blkid); ASSERT3U(range->start_blkid + 1, ==, range->end_blkid);
if (BP_GET_TYPE(bp) == DMU_OT_SA) {
arc_flags_t aflags = ARC_FLAG_WAIT;
zio_flag_t zioflags = ZIO_FLAG_CANFAIL;
if (dscp->dsc_featureflags & DMU_BACKUP_FEATURE_RAW) {
ASSERT(BP_IS_PROTECTED(bp));
zioflags |= ZIO_FLAG_RAW;
}
zbookmark_phys_t zb;
ASSERT3U(range->start_blkid, ==, DMU_SPILL_BLKID);
zb.zb_objset = dmu_objset_id(dscp->dsc_os);
zb.zb_object = range->object;
zb.zb_level = 0;
zb.zb_blkid = range->start_blkid;
arc_buf_t *abuf = NULL;
if (!dscp->dsc_dso->dso_dryrun && arc_read(NULL, spa,
bp, arc_getbuf_func, &abuf, ZIO_PRIORITY_ASYNC_READ,
zioflags, &aflags, &zb) != 0)
return (SET_ERROR(EIO));
err = dump_spill(dscp, bp, zb.zb_object,
(abuf == NULL ? NULL : abuf->b_data));
if (abuf != NULL)
arc_buf_destroy(abuf, &abuf);
return (err);
}
if (send_do_embed(bp, dscp->dsc_featureflags)) { if (send_do_embed(bp, dscp->dsc_featureflags)) {
err = dump_write_embedded(dscp, range->object, err = dump_write_embedded(dscp, range->object,
range->start_blkid * srdp->datablksz, range->start_blkid * srdp->datablksz,
@ -975,8 +926,9 @@ do_dump(dmu_send_cookie_t *dscp, struct send_range *range)
} }
ASSERT(range->object > dscp->dsc_resume_object || ASSERT(range->object > dscp->dsc_resume_object ||
(range->object == dscp->dsc_resume_object && (range->object == dscp->dsc_resume_object &&
(range->start_blkid == DMU_SPILL_BLKID ||
range->start_blkid * srdp->datablksz >= range->start_blkid * srdp->datablksz >=
dscp->dsc_resume_offset)); dscp->dsc_resume_offset)));
/* it's a level-0 block of a regular object */ /* it's a level-0 block of a regular object */
mutex_enter(&srdp->lock); mutex_enter(&srdp->lock);
@ -1006,8 +958,6 @@ do_dump(dmu_send_cookie_t *dscp, struct send_range *range)
ASSERT(dscp->dsc_dso->dso_dryrun || ASSERT(dscp->dsc_dso->dso_dryrun ||
srdp->abuf != NULL || srdp->abd != NULL); srdp->abuf != NULL || srdp->abd != NULL);
uint64_t offset = range->start_blkid * srdp->datablksz;
char *data = NULL; char *data = NULL;
if (srdp->abd != NULL) { if (srdp->abd != NULL) {
data = abd_to_buf(srdp->abd); data = abd_to_buf(srdp->abd);
@ -1016,6 +966,14 @@ do_dump(dmu_send_cookie_t *dscp, struct send_range *range)
data = srdp->abuf->b_data; data = srdp->abuf->b_data;
} }
if (BP_GET_TYPE(bp) == DMU_OT_SA) {
ASSERT3U(range->start_blkid, ==, DMU_SPILL_BLKID);
err = dump_spill(dscp, bp, range->object, data);
return (err);
}
uint64_t offset = range->start_blkid * srdp->datablksz;
/* /*
* If we have large blocks stored on disk but the send flags * If we have large blocks stored on disk but the send flags
* don't allow us to send large blocks, we split the data from * don't allow us to send large blocks, we split the data from
@ -1098,6 +1056,8 @@ range_alloc(enum type type, uint64_t object, uint64_t start_blkid,
range->sru.data.io_outstanding = 0; range->sru.data.io_outstanding = 0;
range->sru.data.io_err = 0; range->sru.data.io_err = 0;
range->sru.data.io_compressed = B_FALSE; range->sru.data.io_compressed = B_FALSE;
} else if (type == OBJECT) {
range->sru.object.spill_range = NULL;
} }
return (range); return (range);
} }
@ -1742,6 +1702,45 @@ enqueue_range(struct send_reader_thread_arg *srta, bqueue_t *q, dnode_t *dn,
bqueue_enqueue(q, range, datablksz); bqueue_enqueue(q, range, datablksz);
} }
/*
* Send DRR_SPILL records for unmodified spill blocks. This is useful
* because changing certain attributes of the object (e.g. blocksize)
* can cause old versions of ZFS to incorrectly remove a spill block.
* Including these records in the stream forces an up to date version
* to always be written ensuring they're never lost. Current versions
* of the code which understand the DRR_FLAG_SPILL_BLOCK feature can
* ignore these unmodified spill blocks.
*
* We piggyback the spill_range to dnode range instead of enqueueing it
* so send_range_after won't complain.
*/
static uint64_t
piggyback_unmodified_spill(struct send_reader_thread_arg *srta,
struct send_range *range)
{
ASSERT3U(range->type, ==, OBJECT);
dnode_phys_t *dnp = range->sru.object.dnp;
uint64_t fromtxg = srta->smta->to_arg->fromtxg;
if (!zfs_send_unmodified_spill_blocks ||
!(dnp->dn_flags & DNODE_FLAG_SPILL_BLKPTR) ||
!(BP_GET_LOGICAL_BIRTH(DN_SPILL_BLKPTR(dnp)) <= fromtxg))
return (0);
blkptr_t *bp = DN_SPILL_BLKPTR(dnp);
struct send_range *spill_range = range_alloc(DATA, range->object,
DMU_SPILL_BLKID, DMU_SPILL_BLKID+1, B_FALSE);
spill_range->sru.data.bp = *bp;
spill_range->sru.data.obj_type = dnp->dn_type;
spill_range->sru.data.datablksz = BP_GET_LSIZE(bp);
issue_data_read(srta, spill_range);
range->sru.object.spill_range = spill_range;
return (BP_GET_LSIZE(bp));
}
/* /*
* This thread is responsible for two things: First, it retrieves the correct * This thread is responsible for two things: First, it retrieves the correct
* blkptr in the to ds if we need to send the data because of something from * blkptr in the to ds if we need to send the data because of something from
@ -1773,17 +1772,20 @@ send_reader_thread(void *arg)
uint64_t last_obj_exists = B_TRUE; uint64_t last_obj_exists = B_TRUE;
while (!range->eos_marker && !srta->cancel && smta->error == 0 && while (!range->eos_marker && !srta->cancel && smta->error == 0 &&
err == 0) { err == 0) {
uint64_t spill = 0;
switch (range->type) { switch (range->type) {
case DATA: case DATA:
issue_data_read(srta, range); issue_data_read(srta, range);
bqueue_enqueue(outq, range, range->sru.data.datablksz); bqueue_enqueue(outq, range, range->sru.data.datablksz);
range = get_next_range_nofree(inq, range); range = get_next_range_nofree(inq, range);
break; break;
case HOLE:
case OBJECT: case OBJECT:
spill = piggyback_unmodified_spill(srta, range);
zfs_fallthrough;
case HOLE:
case OBJECT_RANGE: case OBJECT_RANGE:
case REDACT: // Redacted blocks must exist case REDACT: // Redacted blocks must exist
bqueue_enqueue(outq, range, sizeof (*range)); bqueue_enqueue(outq, range, sizeof (*range) + spill);
range = get_next_range_nofree(inq, range); range = get_next_range_nofree(inq, range);
break; break;
case PREVIOUSLY_REDACTED: { case PREVIOUSLY_REDACTED: {
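
The ownership model behind the piggybacked spill range can be sketched in userland as below. All names are hypothetical and the record "dump" is reduced to appending a type to a log; the point is only that the OBJECT range owns an optional DATA range, emits it immediately after itself, and frees it along with itself.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { SKETCH_DATA, SKETCH_OBJECT } sketch_type_t;

/* Hypothetical, much-reduced stand-in for struct send_range. */
typedef struct sketch_range {
	sketch_type_t type;
	unsigned long object;
	struct sketch_range *spill_range;	/* only used for OBJECT */
} sketch_range_t;

static sketch_range_t *
sketch_range_alloc(sketch_type_t type, unsigned long object)
{
	/* calloc leaves spill_range NULL, like the range_alloc() change. */
	sketch_range_t *r = calloc(1, sizeof (*r));

	if (r != NULL) {
		r->type = type;
		r->object = object;
	}
	return (r);
}

/* Freeing an OBJECT range also frees the range attached to it. */
static void
sketch_range_free(sketch_range_t *r)
{
	if (r == NULL)
		return;
	if (r->type == SKETCH_OBJECT && r->spill_range != NULL)
		sketch_range_free(r->spill_range);
	free(r);
}

/*
 * "Dump" a range by appending its type to a log, then dump the attached
 * spill range right after it. Returns 0 on success, -1 if the log is full.
 */
static int
sketch_do_dump(const sketch_range_t *r, sketch_type_t *log, int max, int *n)
{
	if (*n >= max)
		return (-1);
	log[(*n)++] = r->type;
	if (r->type == SKETCH_OBJECT && r->spill_range != NULL)
		return (sketch_do_dump(r->spill_range, log, max, n));
	return (0);
}
```

Because the spill range never enters the queue on its own, the ordering check mentioned in the comment (`send_range_after`) only ever sees the OBJECT range, while the reader thread still accounts for the extra bytes by enqueueing with `sizeof (*range) + spill`.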


@ -1377,6 +1377,13 @@ dmu_tx_pool(dmu_tx_t *tx)
return (tx->tx_pool); return (tx->tx_pool);
} }
/*
* Register a callback to be executed at the end of a TXG.
*
* Note: This currently exists for outside consumers, specifically the ZFS OSD
* for Lustre. Please do not remove before checking that project. For examples
* on how to use this see `ztest_commit_callback`.
*/
void void
dmu_tx_callback_register(dmu_tx_t *tx, dmu_tx_callback_func_t *func, void *data) dmu_tx_callback_register(dmu_tx_t *tx, dmu_tx_callback_func_t *func, void *data)
{ {
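The comment above describes registering a callback that runs at the end of a TXG. As a rough userspace sketch of that pattern (every name here, `mock_tx_t`, `mock_tx_callback_register`, `mock_tx_commit`, and `count_commit_cb`, is hypothetical and only loosely modeled on the signature shown in the diff; this is not the DMU implementation):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical callback type: opaque data plus a commit status. */
typedef void mock_tx_callback_func_t(void *data, int error);

typedef struct mock_tx_callback {
	mock_tx_callback_func_t *func;
	void *data;
	struct mock_tx_callback *next;
} mock_tx_callback_t;

/* Hypothetical transaction holding a list of pending callbacks. */
typedef struct mock_tx {
	mock_tx_callback_t *callbacks;
} mock_tx_t;

/* Queue a callback on the transaction; returns 0 on success, -1 on OOM. */
static int
mock_tx_callback_register(mock_tx_t *tx, mock_tx_callback_func_t *func,
    void *data)
{
	mock_tx_callback_t *cb = malloc(sizeof (*cb));
	if (cb == NULL)
		return (-1);
	cb->func = func;
	cb->data = data;
	cb->next = tx->callbacks;
	tx->callbacks = cb;
	return (0);
}

/* Run and free every queued callback once the "TXG" has finished. */
static void
mock_tx_commit(mock_tx_t *tx, int error)
{
	mock_tx_callback_t *cb = tx->callbacks;
	tx->callbacks = NULL;
	while (cb != NULL) {
		mock_tx_callback_t *next = cb->next;
		cb->func(cb->data, error);
		free(cb);
		cb = next;
	}
}

/* Example callback: counts how many times it was invoked. */
static void
count_commit_cb(void *data, int error)
{
	(void) error;
	(*(int *)data)++;
}
```

A caller would register `count_commit_cb` with a pointer to a counter and observe the counter change only after `mock_tx_commit()` runs, which mirrors the "executed at the end of a TXG" contract described in the comment.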


@ -566,7 +566,7 @@ dnode_undirty_dbufs(list_t *list)
mutex_destroy(&dr->dt.di.dr_mtx); mutex_destroy(&dr->dt.di.dr_mtx);
list_destroy(&dr->dt.di.dr_children); list_destroy(&dr->dt.di.dr_children);
} }
kmem_free(dr, sizeof (dbuf_dirty_record_t)); kmem_cache_free(dbuf_dirty_kmem_cache, dr);
dbuf_rele_and_unlock(db, (void *)(uintptr_t)txg, B_FALSE); dbuf_rele_and_unlock(db, (void *)(uintptr_t)txg, B_FALSE);
} }
} }


@ -2987,6 +2987,7 @@ dsl_dataset_rename_snapshot_sync_impl(dsl_pool_t *dp,
dsl_dataset_t *ds; dsl_dataset_t *ds;
uint64_t val; uint64_t val;
dmu_tx_t *tx = ddrsa->ddrsa_tx; dmu_tx_t *tx = ddrsa->ddrsa_tx;
char *oldname, *newname;
int error; int error;
error = dsl_dataset_snap_lookup(hds, ddrsa->ddrsa_oldsnapname, &val); error = dsl_dataset_snap_lookup(hds, ddrsa->ddrsa_oldsnapname, &val);
@ -3011,8 +3012,14 @@ dsl_dataset_rename_snapshot_sync_impl(dsl_pool_t *dp,
VERIFY0(zap_add(dp->dp_meta_objset, VERIFY0(zap_add(dp->dp_meta_objset,
dsl_dataset_phys(hds)->ds_snapnames_zapobj, dsl_dataset_phys(hds)->ds_snapnames_zapobj,
ds->ds_snapname, 8, 1, &ds->ds_object, tx)); ds->ds_snapname, 8, 1, &ds->ds_object, tx));
zvol_rename_minors(dp->dp_spa, ddrsa->ddrsa_oldsnapname,
ddrsa->ddrsa_newsnapname, B_TRUE); oldname = kmem_asprintf("%s@%s", ddrsa->ddrsa_fsname,
ddrsa->ddrsa_oldsnapname);
newname = kmem_asprintf("%s@%s", ddrsa->ddrsa_fsname,
ddrsa->ddrsa_newsnapname);
zvol_rename_minors(dp->dp_spa, oldname, newname, B_TRUE);
kmem_strfree(oldname);
kmem_strfree(newname);
dsl_dataset_rele(ds, FTAG); dsl_dataset_rele(ds, FTAG);
return (0); return (0);
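The hunk above switches from passing bare snapshot names to passing full `filesystem@snapshot` names. A minimal userspace sketch of that name construction follows, assuming plain `snprintf`/`malloc` in place of the kernel's `kmem_asprintf`/`kmem_strfree`; the helper name `snapshot_full_name` is hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "<fsname>@<snapname>" in a freshly allocated buffer.
 * Returns NULL on failure; the caller frees the result with free().
 */
static char *
snapshot_full_name(const char *fsname, const char *snapname)
{
	/* First pass measures the formatted length without writing. */
	int len = snprintf(NULL, 0, "%s@%s", fsname, snapname);
	if (len < 0)
		return (NULL);

	char *name = malloc((size_t)len + 1);
	if (name == NULL)
		return (NULL);

	(void) snprintf(name, (size_t)len + 1, "%s@%s", fsname, snapname);
	return (name);
}
```

Both the old and the new name would be built this way, handed to the rename routine, and then freed, matching the allocate, use, free sequence visible in the diff.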


@ -2205,10 +2205,11 @@ vdev_open(vdev_t *vd)
vd->vdev_max_asize = max_asize; vd->vdev_max_asize = max_asize;
/* /*
* If the vdev_ashift was not overridden at creation time, * If the vdev_ashift was not overridden at creation time
* (0) or the override value is impossible for the device,
* then set it to the logical ashift and optimize the ashift. * then set it to the logical ashift and optimize the ashift.
*/ */
if (vd->vdev_ashift == 0) { if (vd->vdev_ashift < vd->vdev_logical_ashift) {
vd->vdev_ashift = vd->vdev_logical_ashift; vd->vdev_ashift = vd->vdev_logical_ashift;
if (vd->vdev_logical_ashift > ASHIFT_MAX) { if (vd->vdev_logical_ashift > ASHIFT_MAX) {
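The changed condition means an ashift override smaller than the device's logical ashift is now treated like "no override". A small sketch of just that selection rule, assuming a hypothetical helper `effective_ashift` and ignoring the later `ASHIFT_MAX` check and ashift optimization that the real function performs:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick the ashift to use: an override of 0 (unset) or any value below
 * the device's logical ashift falls back to the logical ashift.
 */
static uint64_t
effective_ashift(uint64_t requested_ashift, uint64_t logical_ashift)
{
	if (requested_ashift < logical_ashift)
		return (logical_ashift);
	return (requested_ashift);
}
```

With a 4K-sector device (logical ashift 12), a requested ashift of 9 would be raised to 12, while a requested ashift of 13 would be kept.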


@ -1026,7 +1026,7 @@ vdev_draid_map_alloc_row(zio_t *zio, raidz_row_t **rrp, uint64_t io_offset,
ASSERT3U(vdc->vdc_nparity, >, 0); ASSERT3U(vdc->vdc_nparity, >, 0);
raidz_row_t *rr = vdev_raidz_row_alloc(groupwidth); raidz_row_t *rr = vdev_raidz_row_alloc(groupwidth, zio);
rr->rr_bigcols = bc; rr->rr_bigcols = bc;
rr->rr_firstdatacol = vdc->vdc_nparity; rr->rr_firstdatacol = vdc->vdc_nparity;
#ifdef ZFS_DEBUG #ifdef ZFS_DEBUG


@ -34,6 +34,7 @@
#include <sys/zap.h> #include <sys/zap.h>
#include <sys/abd.h> #include <sys/abd.h>
#include <sys/zthr.h> #include <sys/zthr.h>
#include <sys/fm/fs/zfs.h>
/* /*
* An indirect vdev corresponds to a vdev that has been removed. Since * An indirect vdev corresponds to a vdev that has been removed. Since
@ -1832,6 +1833,19 @@ vdev_indirect_io_done(zio_t *zio)
zio_bad_cksum_t zbc; zio_bad_cksum_t zbc;
int ret = zio_checksum_error(zio, &zbc); int ret = zio_checksum_error(zio, &zbc);
/*
* Any Direct I/O read that has a checksum error must be treated as
* suspicious as the contents of the buffer could be getting
* manipulated while the I/O is taking place. The checksum verify error
* will be reported to the top-level VDEV.
*/
if (zio->io_flags & ZIO_FLAG_DIO_READ && ret == ECKSUM) {
zio->io_error = ret;
zio->io_flags |= ZIO_FLAG_DIO_CHKSUM_ERR;
zio_dio_chksum_verify_error_report(zio);
ret = 0;
}
if (ret == 0) { if (ret == 0) {
zio_checksum_verified(zio); zio_checksum_verified(zio);
return; return;


@ -764,6 +764,27 @@ vdev_mirror_io_done(zio_t *zio)
ASSERT(zio->io_type == ZIO_TYPE_READ); ASSERT(zio->io_type == ZIO_TYPE_READ);
/*
* Any Direct I/O read that has a checksum error must be treated as
* suspicious as the contents of the buffer could be getting
* manipulated while the I/O is taking place. The checksum verify error
* will be reported to the top-level Mirror VDEV.
*
* There will be no attempt at reading any additional data copies. If
* the buffer is still being manipulated while attempting to read from
* another child, there exists a possibility that the checksum could be
* verified as valid. However, the buffer contents could again get
* manipulated after verifying the checksum. This would lead to bad data
* being written out during self healing.
*/
if ((zio->io_flags & ZIO_FLAG_DIO_READ) &&
(zio->io_flags & ZIO_FLAG_DIO_CHKSUM_ERR)) {
zio_dio_chksum_verify_error_report(zio);
zio->io_error = vdev_mirror_worst_error(mm);
ASSERT3U(zio->io_error, ==, ECKSUM);
return;
}
/* /*
* If we don't have a good copy yet, keep trying other children. * If we don't have a good copy yet, keep trying other children.
*/ */


@ -433,7 +433,7 @@ const zio_vsd_ops_t vdev_raidz_vsd_ops = {
}; };
raidz_row_t * raidz_row_t *
vdev_raidz_row_alloc(int cols) vdev_raidz_row_alloc(int cols, zio_t *zio)
{ {
raidz_row_t *rr = raidz_row_t *rr =
kmem_zalloc(offsetof(raidz_row_t, rr_col[cols]), KM_SLEEP); kmem_zalloc(offsetof(raidz_row_t, rr_col[cols]), KM_SLEEP);
@ -445,7 +445,17 @@ vdev_raidz_row_alloc(int cols)
raidz_col_t *rc = &rr->rr_col[c]; raidz_col_t *rc = &rr->rr_col[c];
rc->rc_shadow_devidx = INT_MAX; rc->rc_shadow_devidx = INT_MAX;
rc->rc_shadow_offset = UINT64_MAX; rc->rc_shadow_offset = UINT64_MAX;
rc->rc_allow_repair = 1; /*
* We can not allow self healing to take place for Direct I/O
* reads. There is nothing that stops the buffer contents from
* being manipulated while the I/O is in flight. It is possible
* that the checksum could be verified on the buffer and then
* the contents of that buffer are manipulated afterwards. This
* could lead to bad data being written out during self
* healing.
*/
if (!(zio->io_flags & ZIO_FLAG_DIO_READ))
rc->rc_allow_repair = 1;
} }
return (rr); return (rr);
} }
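The new comment explains that columns allocated for a Direct I/O read must not be eligible for self healing. A reduced sketch of that initialization rule, using hypothetical stand-ins (`MOCK_FLAG_DIO_READ`, `mock_col_t`, `mock_row_init`) rather than the real `raidz_col_t` and `zio_t` types:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit standing in for ZIO_FLAG_DIO_READ. */
#define	MOCK_FLAG_DIO_READ	(1u << 0)

/* Hypothetical per-column state; only the repair permission is kept. */
typedef struct mock_col {
	bool allow_repair;
} mock_col_t;

/*
 * Initialize every column of a row. Repair is permitted only when the
 * I/O is not a Direct I/O read, because the user buffer backing such a
 * read may change after its checksum has been verified.
 */
static void
mock_row_init(mock_col_t *cols, int ncols, unsigned int io_flags)
{
	for (int c = 0; c < ncols; c++)
		cols[c].allow_repair = !(io_flags & MOCK_FLAG_DIO_READ);
}
```

Deciding this once at allocation time lets later code simply assert that no repair is attempted for Direct I/O reads, which is what the `ASSERT0` added further down in the same file appears to rely on.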
@ -619,7 +629,7 @@ vdev_raidz_map_alloc(zio_t *zio, uint64_t ashift, uint64_t dcols,
} }
ASSERT3U(acols, <=, scols); ASSERT3U(acols, <=, scols);
rr = vdev_raidz_row_alloc(scols); rr = vdev_raidz_row_alloc(scols, zio);
rm->rm_row[0] = rr; rm->rm_row[0] = rr;
rr->rr_cols = acols; rr->rr_cols = acols;
rr->rr_bigcols = bc; rr->rr_bigcols = bc;
@ -765,7 +775,7 @@ vdev_raidz_map_alloc_expanded(zio_t *zio,
for (uint64_t row = 0; row < rows; row++) { for (uint64_t row = 0; row < rows; row++) {
boolean_t row_use_scratch = B_FALSE; boolean_t row_use_scratch = B_FALSE;
raidz_row_t *rr = vdev_raidz_row_alloc(cols); raidz_row_t *rr = vdev_raidz_row_alloc(cols, zio);
rm->rm_row[row] = rr; rm->rm_row[row] = rr;
/* The starting RAIDZ (parent) vdev sector of the row. */ /* The starting RAIDZ (parent) vdev sector of the row. */
@ -2633,6 +2643,20 @@ raidz_checksum_verify(zio_t *zio)
raidz_map_t *rm = zio->io_vsd; raidz_map_t *rm = zio->io_vsd;
int ret = zio_checksum_error(zio, &zbc); int ret = zio_checksum_error(zio, &zbc);
/*
* Any Direct I/O read that has a checksum error must be treated as
* suspicious as the contents of the buffer could be getting
* manipulated while the I/O is taking place. The checksum verify error
* will be reported to the top-level RAIDZ VDEV.
*/
if (zio->io_flags & ZIO_FLAG_DIO_READ && ret == ECKSUM) {
zio->io_error = ret;
zio->io_flags |= ZIO_FLAG_DIO_CHKSUM_ERR;
zio_dio_chksum_verify_error_report(zio);
zio_checksum_verified(zio);
return (0);
}
if (ret != 0 && zbc.zbc_injected != 0) if (ret != 0 && zbc.zbc_injected != 0)
rm->rm_ecksuminjected = 1; rm->rm_ecksuminjected = 1;
@ -2776,6 +2800,11 @@ vdev_raidz_io_done_verified(zio_t *zio, raidz_row_t *rr)
(rc->rc_error == 0 || rc->rc_size == 0)) { (rc->rc_error == 0 || rc->rc_size == 0)) {
continue; continue;
} }
/*
* We do not allow self healing for Direct I/O reads.
* See comment in vdev_raidz_row_alloc().
*/
ASSERT0(zio->io_flags & ZIO_FLAG_DIO_READ);
zfs_dbgmsg("zio=%px repairing c=%u devidx=%u " zfs_dbgmsg("zio=%px repairing c=%u devidx=%u "
"offset=%llx", "offset=%llx",
@ -2979,6 +3008,8 @@ raidz_reconstruct(zio_t *zio, int *ltgts, int ntgts, int nparity)
/* Check for success */ /* Check for success */
if (raidz_checksum_verify(zio) == 0) { if (raidz_checksum_verify(zio) == 0) {
if (zio->io_flags & ZIO_FLAG_DIO_CHKSUM_ERR)
return (0);
/* Reconstruction succeeded - report errors */ /* Reconstruction succeeded - report errors */
for (int i = 0; i < rm->rm_nrows; i++) { for (int i = 0; i < rm->rm_nrows; i++) {
@ -3379,7 +3410,6 @@ vdev_raidz_io_done_unrecoverable(zio_t *zio)
zio_bad_cksum_t zbc; zio_bad_cksum_t zbc;
zbc.zbc_has_cksum = 0; zbc.zbc_has_cksum = 0;
zbc.zbc_injected = rm->rm_ecksuminjected; zbc.zbc_injected = rm->rm_ecksuminjected;
mutex_enter(&cvd->vdev_stat_lock); mutex_enter(&cvd->vdev_stat_lock);
cvd->vdev_stat.vs_checksum_errors++; cvd->vdev_stat.vs_checksum_errors++;
mutex_exit(&cvd->vdev_stat_lock); mutex_exit(&cvd->vdev_stat_lock);
@ -3444,6 +3474,9 @@ vdev_raidz_io_done(zio_t *zio)
} }
if (raidz_checksum_verify(zio) == 0) { if (raidz_checksum_verify(zio) == 0) {
if (zio->io_flags & ZIO_FLAG_DIO_CHKSUM_ERR)
goto done;
for (int i = 0; i < rm->rm_nrows; i++) { for (int i = 0; i < rm->rm_nrows; i++) {
raidz_row_t *rr = rm->rm_row[i]; raidz_row_t *rr = rm->rm_row[i];
vdev_raidz_io_done_verified(zio, rr); vdev_raidz_io_done_verified(zio, rr);
@ -3538,6 +3571,7 @@ vdev_raidz_io_done(zio_t *zio)
} }
} }
} }
done:
if (rm->rm_lr != NULL) { if (rm->rm_lr != NULL) {
zfs_rangelock_exit(rm->rm_lr); zfs_rangelock_exit(rm->rm_lr);
rm->rm_lr = NULL; rm->rm_lr = NULL;


@ -58,9 +58,9 @@
#include <sys/zfs_znode.h> #include <sys/zfs_znode.h>
/* /*
* Enable the experimental block cloning feature. If this setting is 0, then * Enables access to the block cloning feature. If this setting is 0, then even
* even if feature@block_cloning is enabled, attempts to clone blocks will act * if feature@block_cloning is enabled, using functions and system calls that
* as though the feature is disabled. * attempt to clone blocks will act as though the feature is disabled.
*/ */
int zfs_bclone_enabled = 1; int zfs_bclone_enabled = 1;
@ -303,6 +303,7 @@ zfs_read(struct znode *zp, zfs_uio_t *uio, int ioflag, cred_t *cr)
(void) cr; (void) cr;
int error = 0; int error = 0;
boolean_t frsync = B_FALSE; boolean_t frsync = B_FALSE;
boolean_t dio_checksum_failure = B_FALSE;
zfsvfs_t *zfsvfs = ZTOZSB(zp); zfsvfs_t *zfsvfs = ZTOZSB(zp);
if ((error = zfs_enter_verify_zp(zfsvfs, zp, FTAG)) != 0) if ((error = zfs_enter_verify_zp(zfsvfs, zp, FTAG)) != 0)
@ -424,8 +425,26 @@ zfs_read(struct znode *zp, zfs_uio_t *uio, int ioflag, cred_t *cr)
if (error) { if (error) {
/* convert checksum errors into IO errors */ /* convert checksum errors into IO errors */
if (error == ECKSUM) if (error == ECKSUM) {
error = SET_ERROR(EIO); /*
* If a Direct I/O read returned a checksum
* verify error, then it must be treated as
* suspicious. The contents of the buffer could
* have been manipulated while the I/O was in
* flight. In this case, the remainder of I/O
* request will just be reissued through the
* ARC.
*/
if (uio->uio_extflg & UIO_DIRECT) {
dio_checksum_failure = B_TRUE;
uio->uio_extflg &= ~UIO_DIRECT;
n += dio_remaining_resid;
dio_remaining_resid = 0;
continue;
} else {
error = SET_ERROR(EIO);
}
}
#if defined(__linux__) #if defined(__linux__)
/* /*
@ -472,6 +491,9 @@ zfs_read(struct znode *zp, zfs_uio_t *uio, int ioflag, cred_t *cr)
out: out:
zfs_rangelock_exit(lr); zfs_rangelock_exit(lr);
if (dio_checksum_failure == B_TRUE)
uio->uio_extflg |= UIO_DIRECT;
/* /*
* Cleanup for Direct I/O if requested. * Cleanup for Direct I/O if requested.
*/ */
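The `zfs_read()` change retries a failed Direct I/O read through the buffered (ARC) path and restores the direct flag before returning. A compact sketch of that control flow, with loudly hypothetical pieces: `MOCK_UIO_DIRECT`, `MOCK_ECKSUM` (an arbitrary stand-in value, not the platform's real `ECKSUM`), the `mock_read_fn` callback type, and the sample reader `mock_direct_fails_read`:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define	MOCK_UIO_DIRECT	0x1u	/* hypothetical stand-in for UIO_DIRECT */
#define	MOCK_ECKSUM	1000	/* hypothetical stand-in for ECKSUM */

/* Hypothetical reader: told whether to use the direct path. */
typedef int (*mock_read_fn)(bool direct, void *ctx);

/*
 * Issue a read. If the direct path reports a checksum error, clear the
 * direct flag and reissue through the buffered path. A checksum error
 * on the buffered path is converted to EIO. The caller's direct flag
 * is restored before returning.
 */
static int
mock_read_with_fallback(unsigned int *extflg, mock_read_fn do_read, void *ctx)
{
	bool dio_checksum_failure = false;
	int error;

	for (;;) {
		bool direct = (*extflg & MOCK_UIO_DIRECT) != 0;
		error = do_read(direct, ctx);
		if (error == MOCK_ECKSUM && direct) {
			dio_checksum_failure = true;
			*extflg &= ~MOCK_UIO_DIRECT;
			continue;	/* retry once via the buffered path */
		}
		if (error == MOCK_ECKSUM)
			error = EIO;
		break;
	}

	if (dio_checksum_failure)
		*extflg |= MOCK_UIO_DIRECT;
	return (error);
}

/* Sample reader: fails with a checksum error only on the direct path. */
static int
mock_direct_fails_read(bool direct, void *ctx)
{
	(*(int *)ctx)++;	/* count calls */
	return (direct ? MOCK_ECKSUM : 0);
}
```

The loop cannot spin: once the direct flag is cleared, a second checksum error takes the `EIO` branch and exits.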


@ -804,11 +804,11 @@ zio_notify_parent(zio_t *pio, zio_t *zio, enum zio_wait_type wait,
pio->io_reexecute |= zio->io_reexecute; pio->io_reexecute |= zio->io_reexecute;
ASSERT3U(*countp, >, 0); ASSERT3U(*countp, >, 0);
if (zio->io_flags & ZIO_FLAG_DIO_CHKSUM_ERR) { /*
ASSERT3U(*errorp, ==, EIO); * Propagate the Direct I/O checksum verify failure to the parent.
ASSERT3U(pio->io_child_type, ==, ZIO_CHILD_LOGICAL); */
if (zio->io_flags & ZIO_FLAG_DIO_CHKSUM_ERR)
pio->io_flags |= ZIO_FLAG_DIO_CHKSUM_ERR; pio->io_flags |= ZIO_FLAG_DIO_CHKSUM_ERR;
}
(*countp)--; (*countp)--;
@ -1573,6 +1573,14 @@ zio_vdev_child_io(zio_t *pio, blkptr_t *bp, vdev_t *vd, uint64_t offset,
*/ */
pipeline |= ZIO_STAGE_CHECKSUM_VERIFY; pipeline |= ZIO_STAGE_CHECKSUM_VERIFY;
pio->io_pipeline &= ~ZIO_STAGE_CHECKSUM_VERIFY; pio->io_pipeline &= ~ZIO_STAGE_CHECKSUM_VERIFY;
/*
* We never allow the mirror VDEV to attempt reading from any
* additional data copies after the first Direct I/O checksum
* verify failure. This is to avoid bad data being written out
* through the mirror during self healing. See comment in
* vdev_mirror_io_done() for more details.
*/
ASSERT0(pio->io_flags & ZIO_FLAG_DIO_CHKSUM_ERR);
} else if (type == ZIO_TYPE_WRITE && } else if (type == ZIO_TYPE_WRITE &&
pio->io_prop.zp_direct_write == B_TRUE) { pio->io_prop.zp_direct_write == B_TRUE) {
/* /*
@ -4555,18 +4563,18 @@ zio_vdev_io_assess(zio_t *zio)
} }
/* /*
* If a Direct I/O write checksum verify error has occurred then this * If a Direct I/O operation has a checksum verify error then this I/O
* I/O should not attempt to be issued again. Instead the EIO will * should not attempt to be issued again.
* be returned.
*/ */
if (zio->io_flags & ZIO_FLAG_DIO_CHKSUM_ERR) { if (zio->io_flags & ZIO_FLAG_DIO_CHKSUM_ERR) {
ASSERT3U(zio->io_child_type, ==, ZIO_CHILD_LOGICAL); if (zio->io_type == ZIO_TYPE_WRITE) {
ASSERT3U(zio->io_error, ==, EIO); ASSERT3U(zio->io_child_type, ==, ZIO_CHILD_LOGICAL);
ASSERT3U(zio->io_error, ==, EIO);
}
zio->io_pipeline = ZIO_INTERLOCK_PIPELINE; zio->io_pipeline = ZIO_INTERLOCK_PIPELINE;
return (zio); return (zio);
} }
if (zio_injection_enabled && zio->io_error == 0) if (zio_injection_enabled && zio->io_error == 0)
zio->io_error = zio_handle_fault_injection(zio, EIO); zio->io_error = zio_handle_fault_injection(zio, EIO);
@ -4864,16 +4872,40 @@ zio_checksum_verify(zio_t *zio)
ASSERT3U(zio->io_prop.zp_checksum, ==, ZIO_CHECKSUM_LABEL); ASSERT3U(zio->io_prop.zp_checksum, ==, ZIO_CHECKSUM_LABEL);
} }
ASSERT0(zio->io_flags & ZIO_FLAG_DIO_CHKSUM_ERR);
IMPLY(zio->io_flags & ZIO_FLAG_DIO_READ,
!(zio->io_flags & ZIO_FLAG_SPECULATIVE));
if ((error = zio_checksum_error(zio, &info)) != 0) { if ((error = zio_checksum_error(zio, &info)) != 0) {
zio->io_error = error; zio->io_error = error;
if (error == ECKSUM && if (error == ECKSUM &&
!(zio->io_flags & ZIO_FLAG_SPECULATIVE)) { !(zio->io_flags & ZIO_FLAG_SPECULATIVE)) {
mutex_enter(&zio->io_vd->vdev_stat_lock); if (zio->io_flags & ZIO_FLAG_DIO_READ) {
zio->io_vd->vdev_stat.vs_checksum_errors++; zio->io_flags |= ZIO_FLAG_DIO_CHKSUM_ERR;
mutex_exit(&zio->io_vd->vdev_stat_lock); zio_t *pio = zio_unique_parent(zio);
(void) zfs_ereport_start_checksum(zio->io_spa, /*
zio->io_vd, &zio->io_bookmark, zio, * Any Direct I/O read that has a checksum
zio->io_offset, zio->io_size, &info); * error must be treated as suspicous as the
* contents of the buffer could be getting
* manipulated while the I/O is taking place.
*
* The checksum verify error will only be
* reported here for disk and file VDEV's and
* will be reported on those that the failure
* occurred on. Other types of VDEV's report the
* verify failure in their own code paths.
*/
if (pio->io_child_type == ZIO_CHILD_LOGICAL) {
zio_dio_chksum_verify_error_report(zio);
}
} else {
mutex_enter(&zio->io_vd->vdev_stat_lock);
zio->io_vd->vdev_stat.vs_checksum_errors++;
mutex_exit(&zio->io_vd->vdev_stat_lock);
(void) zfs_ereport_start_checksum(zio->io_spa,
zio->io_vd, &zio->io_bookmark, zio,
zio->io_offset, zio->io_size, &info);
}
} }
} }
@ -4899,22 +4931,8 @@ zio_dio_checksum_verify(zio_t *zio)
if ((error = zio_checksum_error(zio, NULL)) != 0) { if ((error = zio_checksum_error(zio, NULL)) != 0) {
zio->io_error = error; zio->io_error = error;
if (error == ECKSUM) { if (error == ECKSUM) {
mutex_enter(&zio->io_vd->vdev_stat_lock);
zio->io_vd->vdev_stat.vs_dio_verify_errors++;
mutex_exit(&zio->io_vd->vdev_stat_lock);
zio->io_error = SET_ERROR(EIO);
zio->io_flags |= ZIO_FLAG_DIO_CHKSUM_ERR; zio->io_flags |= ZIO_FLAG_DIO_CHKSUM_ERR;
zio_dio_chksum_verify_error_report(zio);
/*
* The EIO error must be propagated up to the logical
* parent ZIO in zio_notify_parent() so it can be
* returned to dmu_write_abd().
*/
zio->io_flags &= ~ZIO_FLAG_DONT_PROPAGATE;
(void) zfs_ereport_post(FM_EREPORT_ZFS_DIO_VERIFY,
zio->io_spa, zio->io_vd, &zio->io_bookmark,
zio, 0);
} }
} }
@ -4932,6 +4950,39 @@ zio_checksum_verified(zio_t *zio)
zio->io_pipeline &= ~ZIO_STAGE_CHECKSUM_VERIFY; zio->io_pipeline &= ~ZIO_STAGE_CHECKSUM_VERIFY;
} }
/*
* Report Direct I/O checksum verify error and create ZED event.
*/
void
zio_dio_chksum_verify_error_report(zio_t *zio)
{
ASSERT(zio->io_flags & ZIO_FLAG_DIO_CHKSUM_ERR);
if (zio->io_child_type == ZIO_CHILD_LOGICAL)
return;
mutex_enter(&zio->io_vd->vdev_stat_lock);
zio->io_vd->vdev_stat.vs_dio_verify_errors++;
mutex_exit(&zio->io_vd->vdev_stat_lock);
if (zio->io_type == ZIO_TYPE_WRITE) {
/*
* Convert checksum error for writes into EIO.
*/
zio->io_error = SET_ERROR(EIO);
/*
* Report dio_verify_wr ZED event.
*/
(void) zfs_ereport_post(FM_EREPORT_ZFS_DIO_VERIFY_WR,
zio->io_spa, zio->io_vd, &zio->io_bookmark, zio, 0);
} else {
/*
* Report dio_verify_rd ZED event.
*/
(void) zfs_ereport_post(FM_EREPORT_ZFS_DIO_VERIFY_RD,
zio->io_spa, zio->io_vd, &zio->io_bookmark, zio, 0);
}
}
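The new reporting helper above skips logical I/Os, bumps a per-vdev counter, and then chooses between a write event and a read event. A simplified sketch of that decision logic, assuming hypothetical types (`mock_zio_t`, `mock_vdev_stat_t`) and returning the event name as a string instead of posting an ereport; the names `dio_verify_wr` and `dio_verify_rd` are taken from the comments in the diff:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { MOCK_IO_READ, MOCK_IO_WRITE } mock_io_type_t;

/* Hypothetical per-vdev statistics; only the DIO verify counter. */
typedef struct mock_vdev_stat {
	unsigned long dio_verify_errors;
} mock_vdev_stat_t;

/* Hypothetical, heavily reduced I/O descriptor. */
typedef struct mock_zio {
	mock_io_type_t type;
	bool is_logical;	/* logical I/Os are not reported */
	int error;
	mock_vdev_stat_t *stat;
} mock_zio_t;

/*
 * Account for a Direct I/O checksum verify failure and return the name
 * of the event that would be posted, or NULL when nothing is reported.
 * Writes additionally have their error converted to EIO.
 */
static const char *
mock_dio_verify_report(mock_zio_t *zio)
{
	if (zio->is_logical)
		return (NULL);

	zio->stat->dio_verify_errors++;

	if (zio->type == MOCK_IO_WRITE) {
		zio->error = EIO;
		return ("dio_verify_wr");
	}
	return ("dio_verify_rd");
}
```

Keeping the counter update and the event selection in one helper is what lets the mirror, RAIDZ, indirect, and leaf vdev paths in this diff share a single reporting call.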
/* /*
* ========================================================================== * ==========================================================================
* Error rank. Errors are ranked in the order 0, ENXIO, ECKSUM, EIO, other. * Error rank. Errors are ranked in the order 0, ENXIO, ECKSUM, EIO, other.
@ -5343,10 +5394,9 @@ zio_done(zio_t *zio)
if (zio->io_reexecute) { if (zio->io_reexecute) {
/* /*
* A Direct I/O write that has a checksum verify error should * A Direct I/O operation that has a checksum verify error
* not attempt to reexecute. Instead, EAGAIN should just be * should not attempt to reexecute. Instead, the error should
* propagated back up so the write can be attempt to be issued * just be propagated back.
* through the ARC.
*/ */
ASSERT(!(zio->io_flags & ZIO_FLAG_DIO_CHKSUM_ERR)); ASSERT(!(zio->io_flags & ZIO_FLAG_DIO_CHKSUM_ERR));


@ -99,10 +99,10 @@ License: @ZFS_META_LICENSE@
URL: https://github.com/openzfs/zfs URL: https://github.com/openzfs/zfs
Source0: %{name}-%{version}.tar.gz Source0: %{name}-%{version}.tar.gz
BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n) BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n)
Requires: libzpool5%{?_isa} = %{version}-%{release} Requires: libzpool6%{?_isa} = %{version}-%{release}
Requires: libnvpair3%{?_isa} = %{version}-%{release} Requires: libnvpair3%{?_isa} = %{version}-%{release}
Requires: libuutil3%{?_isa} = %{version}-%{release} Requires: libuutil3%{?_isa} = %{version}-%{release}
Requires: libzfs5%{?_isa} = %{version}-%{release} Requires: libzfs6%{?_isa} = %{version}-%{release}
Requires: %{name}-kmod = %{version} Requires: %{name}-kmod = %{version}
Provides: %{name}-kmod-common = %{version}-%{release} Provides: %{name}-kmod-common = %{version}-%{release}
Obsoletes: spl <= %{version} Obsoletes: spl <= %{version}
@ -150,21 +150,22 @@ Requires: sysstat
%description %description
This package contains the core ZFS command line utilities. This package contains the core ZFS command line utilities.
%package -n libzpool5 %package -n libzpool6
Summary: Native ZFS pool library for Linux Summary: Native ZFS pool library for Linux
Group: System Environment/Kernel Group: System Environment/Kernel
Obsoletes: libzpool2 <= %{version} Obsoletes: libzpool2 <= %{version}
Obsoletes: libzpool4 <= %{version} Obsoletes: libzpool4 <= %{version}
Obsoletes: libzpool5 <= %{version}
%description -n libzpool5 %description -n libzpool6
This package contains the zpool library, which provides support This package contains the zpool library, which provides support
for managing zpools for managing zpools
%if %{defined ldconfig_scriptlets} %if %{defined ldconfig_scriptlets}
%ldconfig_scriptlets -n libzpool5 %ldconfig_scriptlets -n libzpool6
%else %else
%post -n libzpool5 -p /sbin/ldconfig %post -n libzpool6 -p /sbin/ldconfig
%postun -n libzpool5 -p /sbin/ldconfig %postun -n libzpool6 -p /sbin/ldconfig
%endif %endif
%package -n libnvpair3 %package -n libnvpair3
@ -211,37 +212,39 @@ This library provides a variety of compatibility functions for OpenZFS:
# The library version is encoded in the package name. When updating the # The library version is encoded in the package name. When updating the
# version information it is important to add an obsoletes line below for # version information it is important to add an obsoletes line below for
# the previous version of the package. # the previous version of the package.
%package -n libzfs5 %package -n libzfs6
Summary: Native ZFS filesystem library for Linux Summary: Native ZFS filesystem library for Linux
Group: System Environment/Kernel Group: System Environment/Kernel
Obsoletes: libzfs2 <= %{version} Obsoletes: libzfs2 <= %{version}
Obsoletes: libzfs4 <= %{version} Obsoletes: libzfs4 <= %{version}
Obsoletes: libzfs5 <= %{version}
%description -n libzfs5 %description -n libzfs6
This package provides support for managing ZFS filesystems This package provides support for managing ZFS filesystems
%if %{defined ldconfig_scriptlets} %if %{defined ldconfig_scriptlets}
%ldconfig_scriptlets -n libzfs5 %ldconfig_scriptlets -n libzfs6
%else %else
%post -n libzfs5 -p /sbin/ldconfig %post -n libzfs6 -p /sbin/ldconfig
%postun -n libzfs5 -p /sbin/ldconfig %postun -n libzfs6 -p /sbin/ldconfig
%endif %endif
%package -n libzfs5-devel %package -n libzfs6-devel
Summary: Development headers Summary: Development headers
Group: System Environment/Kernel Group: System Environment/Kernel
Requires: libzfs5%{?_isa} = %{version}-%{release} Requires: libzfs6%{?_isa} = %{version}-%{release}
Requires: libzpool5%{?_isa} = %{version}-%{release} Requires: libzpool6%{?_isa} = %{version}-%{release}
Requires: libnvpair3%{?_isa} = %{version}-%{release} Requires: libnvpair3%{?_isa} = %{version}-%{release}
Requires: libuutil3%{?_isa} = %{version}-%{release} Requires: libuutil3%{?_isa} = %{version}-%{release}
Provides: libzpool5-devel = %{version}-%{release} Provides: libzpool6-devel = %{version}-%{release}
Provides: libnvpair3-devel = %{version}-%{release} Provides: libnvpair3-devel = %{version}-%{release}
Provides: libuutil3-devel = %{version}-%{release} Provides: libuutil3-devel = %{version}-%{release}
Obsoletes: zfs-devel <= %{version} Obsoletes: zfs-devel <= %{version}
Obsoletes: libzfs2-devel <= %{version} Obsoletes: libzfs2-devel <= %{version}
Obsoletes: libzfs4-devel <= %{version} Obsoletes: libzfs4-devel <= %{version}
Obsoletes: libzfs5-devel <= %{version}
%description -n libzfs5-devel %description -n libzfs6-devel
This package contains the header files needed for building additional This package contains the header files needed for building additional
applications against the ZFS libraries. applications against the ZFS libraries.
@ -290,7 +293,7 @@ Summary: Python %{python_version} wrapper for libzfs_core
Group: Development/Languages/Python Group: Development/Languages/Python
License: Apache-2.0 License: Apache-2.0
BuildArch: noarch BuildArch: noarch
Requires: libzfs5 = %{version}-%{release} Requires: libzfs6 = %{version}-%{release}
Requires: libnvpair3 = %{version}-%{release} Requires: libnvpair3 = %{version}-%{release}
Requires: libffi Requires: libffi
Requires: python%{__python_pkg_version} Requires: python%{__python_pkg_version}
@ -534,7 +537,7 @@ systemctl --system daemon-reload >/dev/null || true
%config(noreplace) %{_bashcompletiondir}/zfs %config(noreplace) %{_bashcompletiondir}/zfs
%config(noreplace) %{_bashcompletiondir}/zpool %config(noreplace) %{_bashcompletiondir}/zpool
%files -n libzpool5 %files -n libzpool6
%{_libdir}/libzpool.so.* %{_libdir}/libzpool.so.*
%files -n libnvpair3 %files -n libnvpair3
@ -543,10 +546,10 @@ systemctl --system daemon-reload >/dev/null || true
%files -n libuutil3 %files -n libuutil3
%{_libdir}/libuutil.so.* %{_libdir}/libuutil.so.*
%files -n libzfs5 %files -n libzfs6
%{_libdir}/libzfs*.so.* %{_libdir}/libzfs*.so.*
%files -n libzfs5-devel %files -n libzfs6-devel
%{_pkgconfigdir}/libzfs.pc %{_pkgconfigdir}/libzfs.pc
%{_pkgconfigdir}/libzfsbootenv.pc %{_pkgconfigdir}/libzfsbootenv.pc
%{_pkgconfigdir}/libzfs_core.pc %{_pkgconfigdir}/libzfs_core.pc


@ -697,8 +697,8 @@ tags = ['functional', 'delegate']
tests = ['dio_aligned_block', 'dio_async_always', 'dio_async_fio_ioengines', tests = ['dio_aligned_block', 'dio_async_always', 'dio_async_fio_ioengines',
'dio_compression', 'dio_dedup', 'dio_encryption', 'dio_grow_block', 'dio_compression', 'dio_dedup', 'dio_encryption', 'dio_grow_block',
'dio_max_recordsize', 'dio_mixed', 'dio_mmap', 'dio_overwrites', 'dio_max_recordsize', 'dio_mixed', 'dio_mmap', 'dio_overwrites',
'dio_property', 'dio_random', 'dio_recordsize', 'dio_unaligned_block', 'dio_property', 'dio_random', 'dio_read_verify', 'dio_recordsize',
'dio_unaligned_filesize'] 'dio_unaligned_block', 'dio_unaligned_filesize']
tags = ['functional', 'direct'] tags = ['functional', 'direct']
[tests/functional/exec] [tests/functional/exec]


@ -147,6 +147,12 @@ tags = ['functional', 'largest_pool']
tests = ['longname_001_pos', 'longname_002_pos', 'longname_003_pos'] tests = ['longname_001_pos', 'longname_002_pos', 'longname_003_pos']
tags = ['functional', 'longname'] tags = ['functional', 'longname']
[tests/functional/luks:Linux]
pre =
post =
tests = ['luks_sanity']
tags = ['functional', 'luks']
[tests/functional/mmap:Linux] [tests/functional/mmap:Linux]
tests = ['mmap_libaio_001_pos', 'mmap_sync_001_pos'] tests = ['mmap_libaio_001_pos', 'mmap_sync_001_pos']
tags = ['functional', 'mmap'] tags = ['functional', 'mmap']


@ -213,6 +213,7 @@ maybe = {
'cli_root/zfs_unshare/zfs_unshare_006_pos': ['SKIP', na_reason], 'cli_root/zfs_unshare/zfs_unshare_006_pos': ['SKIP', na_reason],
'cli_root/zpool_add/zpool_add_004_pos': ['FAIL', known_reason], 'cli_root/zpool_add/zpool_add_004_pos': ['FAIL', known_reason],
'cli_root/zpool_destroy/zpool_destroy_001_pos': ['SKIP', 6145], 'cli_root/zpool_destroy/zpool_destroy_001_pos': ['SKIP', 6145],
'cli_root/zpool_import/import_devices_missing': ['FAIL', 16669],
'cli_root/zpool_import/zpool_import_missing_003_pos': ['SKIP', 6839], 'cli_root/zpool_import/zpool_import_missing_003_pos': ['SKIP', 6839],
'cli_root/zpool_initialize/zpool_initialize_import_export': 'cli_root/zpool_initialize/zpool_initialize_import_export':
['FAIL', 11948], ['FAIL', 11948],
@ -275,7 +276,8 @@ if sys.platform.startswith('freebsd'):
'pool_checkpoint/checkpoint_big_rewind': ['FAIL', 12622], 'pool_checkpoint/checkpoint_big_rewind': ['FAIL', 12622],
'pool_checkpoint/checkpoint_indirect': ['FAIL', 12623], 'pool_checkpoint/checkpoint_indirect': ['FAIL', 12623],
'resilver/resilver_restart_001': ['FAIL', known_reason], 'resilver/resilver_restart_001': ['FAIL', known_reason],
'snapshot/snapshot_002_pos': ['FAIL', '14831'], 'snapshot/snapshot_002_pos': ['FAIL', 14831],
'zvol/zvol_misc/zvol_misc_volmode': ['FAIL', 16668],
'bclone/bclone_crossfs_corner_cases': ['SKIP', cfr_cross_reason], 'bclone/bclone_crossfs_corner_cases': ['SKIP', cfr_cross_reason],
'bclone/bclone_crossfs_corner_cases_limited': 'bclone/bclone_crossfs_corner_cases_limited':
['SKIP', cfr_cross_reason], ['SKIP', cfr_cross_reason],


@ -19,9 +19,9 @@
*/ */
#include <sys/ioctl.h> #include <sys/ioctl.h>
#include <sys/fcntl.h>
#include <linux/fs.h> #include <linux/fs.h>
#include <err.h> #include <err.h>
#include <fcntl.h>
#include <stdio.h> #include <stdio.h>
#include <stdlib.h> #include <stdlib.h>
#include <unistd.h> #include <unistd.h>


@ -20,7 +20,7 @@
*/ */
/* /*
* Copyright (c) 2022 by Triad National Security, LLC. * Copyright (c) 2024 by Triad National Security, LLC.
*/ */
#include <sys/types.h> #include <sys/types.h>
@ -39,51 +39,59 @@
#define MIN(a, b) ((a) < (b)) ? (a) : (b) #define MIN(a, b) ((a) < (b)) ? (a) : (b)
#endif #endif
static char *outputfile = NULL; static char *filename = NULL;
static int blocksize = 131072; /* 128K */ static int blocksize = 131072; /* 128K */
static int wr_err_expected = 0; static int err_expected = 0;
static int read_op = 0;
static int write_op = 0;
static int numblocks = 100; static int numblocks = 100;
static char *execname = NULL; static char *execname = NULL;
static int print_usage = 0; static int print_usage = 0;
static int randompattern = 0; static int randompattern = 0;
static int ofd; static int fd;
char *buf = NULL; char *buf = NULL;
typedef struct { typedef struct {
int entire_file_written; int entire_file_completed;
} pthread_args_t; } pthread_args_t;
static void static void
usage(void) usage(void)
{ {
(void) fprintf(stderr, (void) fprintf(stderr,
"usage %s -o outputfile [-b blocksize] [-e wr_error_expected]\n" "usage %s -f filename [-b blocksize] [-e wr_error_expected]\n"
" [-n numblocks] [-p randpattern] [-h help]\n" " [-n numblocks] [-p randompattern] -r read_op \n"
" -w write_op [-h help]\n"
"\n" "\n"
"Testing whether checksum verify works correctly for O_DIRECT\n" "Testing whether checksum verify works correctly for O_DIRECT\n"
"when manipulating the contents of a userspace buffer.\n" "when manipulating the contents of a userspace buffer.\n"
"\n" "\n"
" outputfile: File to write to.\n" " filename: File to read or write to.\n"
" blocksize: Size of each block to write (must be at \n" " blocksize: Size of each block to write (must be at \n"
" least >= 512).\n" " least >= 512).\n"
" wr_err_expected: Whether pwrite() is expected to return EIO\n" " err_expected: Whether write() is expected to return EIO\n"
" while manipulating the contents of the\n" " while manipulating the contents of the\n"
" buffer.\n" " buffer.\n"
" numblocks: Total number of blocksized blocks to\n" " numblocks: Total number of blocksized blocks to\n"
" write.\n" " write.\n"
" randpattern: Fill data buffer with random data. Default\n" " read_op: Perform reads to the filename file while\n"
" behavior is to fill the buffer with the \n" " manipulating the buffer contents\n"
" known data pattern (0xdeadbeef).\n" " write_op: Perform writes to the filename file while\n"
" manipulating the buffer contents\n"
" randompattern: Fill data buffer with random data for \n"
" writes. Default behavior is to fill the \n"
" buffer with known data pattern (0xdeadbeef)\n"
" help: Print usage information and exit.\n" " help: Print usage information and exit.\n"
"\n" "\n"
" Required parameters:\n" " Required parameters:\n"
" outputfile\n" " filename\n"
" read_op or write_op\n"
"\n" "\n"
" Default Values:\n" " Default Values:\n"
" blocksize -> 131072\n" " blocksize -> 131072\n"
" wr_err_expected -> false\n" " wr_err_expected -> false\n"
" numblocks -> 100\n" " numblocks -> 100\n"
" randpattern -> false\n", " randompattern -> false\n",
execname); execname);
(void) exit(1); (void) exit(1);
} }
@ -97,16 +105,21 @@ parse_options(int argc, char *argv[])
extern int optind, optopt; extern int optind, optopt;
execname = argv[0]; execname = argv[0];
while ((c = getopt(argc, argv, "b:ehn:o:p")) != -1) { while ((c = getopt(argc, argv, "b:ef:hn:rw")) != -1) {
switch (c) { switch (c) {
case 'b': case 'b':
blocksize = atoi(optarg); blocksize = atoi(optarg);
break; break;
case 'e': case 'e':
wr_err_expected = 1; err_expected = 1;
break; break;
case 'f':
filename = optarg;
break;
case 'h': case 'h':
print_usage = 1; print_usage = 1;
break; break;
@ -115,12 +128,12 @@ parse_options(int argc, char *argv[])
numblocks = atoi(optarg); numblocks = atoi(optarg);
break; break;
case 'o': case 'r':
outputfile = optarg; read_op = 1;
break; break;
case 'p': case 'w':
randompattern = 1; write_op = 1;
break; break;
case ':': case ':':
@ -141,7 +154,8 @@ parse_options(int argc, char *argv[])
if (errflag || print_usage == 1) if (errflag || print_usage == 1)
(void) usage(); (void) usage();
if (blocksize < 512 || outputfile == NULL || numblocks <= 0) { if (blocksize < 512 || filename == NULL || numblocks <= 0 ||
(read_op == 0 && write_op == 0)) {
(void) fprintf(stderr, (void) fprintf(stderr,
"Required parameter(s) missing or invalid.\n"); "Required parameter(s) missing or invalid.\n");
(void) usage(); (void) usage();
@ -160,10 +174,10 @@ write_thread(void *arg)
ssize_t wrote = 0; ssize_t wrote = 0;
pthread_args_t *args = (pthread_args_t *)arg; pthread_args_t *args = (pthread_args_t *)arg;
while (!args->entire_file_written) { while (!args->entire_file_completed) {
wrote = pwrite(ofd, buf, blocksize, offset); wrote = pwrite(fd, buf, blocksize, offset);
if (wrote != blocksize) { if (wrote != blocksize) {
if (wr_err_expected) if (err_expected)
assert(errno == EIO); assert(errno == EIO);
else else
exit(2); exit(2);
@ -173,7 +187,35 @@ write_thread(void *arg)
left -= blocksize; left -= blocksize;
if (left == 0) if (left == 0)
args->entire_file_written = 1; args->entire_file_completed = 1;
}
pthread_exit(NULL);
}
/*
* Read blocksize * numblocks bytes from the file using O_DIRECT.
*/
static void *
read_thread(void *arg)
{
size_t offset = 0;
int total_data = blocksize * numblocks;
int left = total_data;
ssize_t read = 0;
pthread_args_t *args = (pthread_args_t *)arg;
while (!args->entire_file_completed) {
read = pread(fd, buf, blocksize, offset);
if (read != blocksize) {
exit(2);
}
offset = ((offset + blocksize) % total_data);
left -= blocksize;
if (left == 0)
args->entire_file_completed = 1;
} }
pthread_exit(NULL); pthread_exit(NULL);
@ -189,7 +231,7 @@ manipulate_buf_thread(void *arg)
char rand_char; char rand_char;
pthread_args_t *args = (pthread_args_t *)arg; pthread_args_t *args = (pthread_args_t *)arg;
while (!args->entire_file_written) { while (!args->entire_file_completed) {
rand_offset = (rand() % blocksize); rand_offset = (rand() % blocksize);
rand_char = (rand() % (126 - 33) + 33); rand_char = (rand() % (126 - 33) + 33);
buf[rand_offset] = rand_char; buf[rand_offset] = rand_char;
@ -202,9 +244,9 @@ int
main(int argc, char *argv[]) main(int argc, char *argv[])
{ {
const char *datapattern = "0xdeadbeef"; const char *datapattern = "0xdeadbeef";
int ofd_flags = O_WRONLY | O_CREAT | O_DIRECT; int fd_flags = O_DIRECT;
mode_t mode = S_IRUSR | S_IWUSR; mode_t mode = S_IRUSR | S_IWUSR;
pthread_t write_thr; pthread_t io_thr;
pthread_t manipul_thr; pthread_t manipul_thr;
int left = blocksize; int left = blocksize;
int offset = 0; int offset = 0;
@ -213,9 +255,15 @@ main(int argc, char *argv[])
parse_options(argc, argv); parse_options(argc, argv);
ofd = open(outputfile, ofd_flags, mode); if (write_op) {
if (ofd == -1) { fd_flags |= (O_WRONLY | O_CREAT);
(void) fprintf(stderr, "%s, %s\n", execname, outputfile); } else {
fd_flags |= O_RDONLY;
}
fd = open(filename, fd_flags, mode);
if (fd == -1) {
(void) fprintf(stderr, "%s, %s\n", execname, filename);
perror("open"); perror("open");
exit(2); exit(2);
} }
@ -228,24 +276,22 @@ main(int argc, char *argv[])
exit(2); exit(2);
} }
if (!randompattern) { if (write_op) {
/* Putting known data pattern in buffer */ if (!randompattern) {
while (left) { /* Putting known data pattern in buffer */
size_t amt = MIN(strlen(datapattern), left); while (left) {
memcpy(&buf[offset], datapattern, amt); size_t amt = MIN(strlen(datapattern), left);
offset += amt; memcpy(&buf[offset], datapattern, amt);
left -= amt; offset += amt;
left -= amt;
}
} else {
/* Putting random data in buffer */
for (int i = 0; i < blocksize; i++)
buf[i] = rand();
} }
} else {
/* Putting random data in buffer */
for (int i = 0; i < blocksize; i++)
buf[i] = rand();
} }
/*
* Writing using O_DIRECT while manipulating the buffer contents until
* the entire file is written.
*/
if ((rc = pthread_create(&manipul_thr, NULL, manipulate_buf_thread, if ((rc = pthread_create(&manipul_thr, NULL, manipulate_buf_thread,
&args))) { &args))) {
fprintf(stderr, "error: pthreads_create, manipul_thr, " fprintf(stderr, "error: pthreads_create, manipul_thr, "
@ -253,18 +299,34 @@ main(int argc, char *argv[])
exit(2); exit(2);
} }
if ((rc = pthread_create(&write_thr, NULL, write_thread, &args))) { if (write_op) {
fprintf(stderr, "error: pthreads_create, write_thr, " /*
"rc: %d\n", rc); * Writing using O_DIRECT while manipulating the buffer contents
exit(2); * until the entire file is written.
*/
if ((rc = pthread_create(&io_thr, NULL, write_thread, &args))) {
fprintf(stderr, "error: pthreads_create, io_thr, "
"rc: %d\n", rc);
exit(2);
}
} else {
/*
* Reading using O_DIRECT while manipulating the buffer contents
* until the entire file is read.
*/
if ((rc = pthread_create(&io_thr, NULL, read_thread, &args))) {
fprintf(stderr, "error: pthreads_create, io_thr, "
"rc: %d\n", rc);
exit(2);
}
} }
pthread_join(write_thr, NULL); pthread_join(io_thr, NULL);
pthread_join(manipul_thr, NULL); pthread_join(manipul_thr, NULL);
assert(args.entire_file_written == 1); assert(args.entire_file_completed == 1);
(void) close(ofd); (void) close(fd);
free(buf); free(buf);

@ -129,6 +129,7 @@ export SYSTEM_FILES_LINUX='attr
blkdiscard blkdiscard
blockdev blockdev
chattr chattr
cryptsetup
exportfs exportfs
fallocate fallocate
flock flock

@ -80,7 +80,8 @@ if BUILD_LINUX
nobase_dist_datadir_zfs_tests_tests_SCRIPTS += \ nobase_dist_datadir_zfs_tests_tests_SCRIPTS += \
functional/simd/simd_supported.ksh \ functional/simd/simd_supported.ksh \
functional/tmpfile/cleanup.ksh \ functional/tmpfile/cleanup.ksh \
functional/tmpfile/setup.ksh functional/tmpfile/setup.ksh \
functional/luks/luks_sanity.ksh
endif endif
nobase_dist_datadir_zfs_tests_tests_DATA += \ nobase_dist_datadir_zfs_tests_tests_DATA += \
@ -1477,6 +1478,7 @@ nobase_dist_datadir_zfs_tests_tests_SCRIPTS += \
functional/direct/dio_overwrites.ksh \ functional/direct/dio_overwrites.ksh \
functional/direct/dio_property.ksh \ functional/direct/dio_property.ksh \
functional/direct/dio_random.ksh \ functional/direct/dio_random.ksh \
functional/direct/dio_read_verify.ksh \
functional/direct/dio_recordsize.ksh \ functional/direct/dio_recordsize.ksh \
functional/direct/dio_unaligned_block.ksh \ functional/direct/dio_unaligned_block.ksh \
functional/direct/dio_unaligned_filesize.ksh \ functional/direct/dio_unaligned_filesize.ksh \

@ -30,28 +30,39 @@
# STRATEGY: # STRATEGY:
# 1. Run different zfs/zpool -j commands and check for valid JSON # 1. Run different zfs/zpool -j commands and check for valid JSON
#
# -j and --json mean the same thing. Each command will be run twice, replacing
# JSONFLAG with the flag under test.
list=( list=(
"zpool status -j -g --json-int --json-flat-vdevs --json-pool-key-guid" "zpool status JSONFLAG -g --json-int --json-flat-vdevs --json-pool-key-guid"
"zpool status -p -j -g --json-int --json-flat-vdevs --json-pool-key-guid" "zpool status -p JSONFLAG -g --json-int --json-flat-vdevs --json-pool-key-guid"
"zpool status -j -c upath" "zpool status JSONFLAG -c upath"
"zpool status -j" "zpool status JSONFLAG"
"zpool status -j testpool1" "zpool status JSONFLAG testpool1"
"zpool list -j" "zpool list JSONFLAG"
"zpool list -j -g" "zpool list JSONFLAG -g"
"zpool list -j -o fragmentation" "zpool list JSONFLAG -o fragmentation"
"zpool get -j size" "zpool get JSONFLAG size"
"zpool get -j all" "zpool get JSONFLAG all"
"zpool version -j" "zpool version JSONFLAG"
"zfs list -j" "zfs list JSONFLAG"
"zfs list -j testpool1" "zfs list JSONFLAG testpool1"
"zfs get -j all" "zfs get JSONFLAG all"
"zfs get -j available" "zfs get JSONFLAG available"
"zfs mount -j" "zfs mount JSONFLAG"
"zfs version -j" "zfs version JSONFLAG"
) )
for cmd in "${list[@]}" ; do function run_json_tests
log_must eval "$cmd | jq > /dev/null" {
done typeset flag=$1
for cmd in "${list[@]}" ; do
cmd=${cmd//JSONFLAG/$flag}
log_must eval "$cmd | jq > /dev/null"
done
}
log_must run_json_tests -j
log_must run_json_tests --json
log_pass "zpool and zfs commands outputted valid JSON" log_pass "zpool and zfs commands outputted valid JSON"
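
The placeholder substitution used by run_json_tests can be tried in isolation. The sketch below assumes a bash- or ksh-compatible shell, and uses `echo` stand-ins instead of real zfs/zpool commands so it runs anywhere; the function name `expand_cmds` is hypothetical.

```shell
#!/usr/bin/env bash
# Stand-in command templates; the real test pipes zfs/zpool output to jq.
list=(
	"echo status JSONFLAG -g"
	"echo list JSONFLAG"
)

expand_cmds() {
	local flag=$1 cmd
	for cmd in "${list[@]}"; do
		# Replace every occurrence of the placeholder with the flag under test.
		printf '%s\n' "${cmd//JSONFLAG/$flag}"
	done
}

expand_cmds -j
expand_cmds --json
```

Each template is expanded once per flag, so both spellings exercise the same command list without duplicating it.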

@ -148,9 +148,9 @@ done
# Foreach test create pool, add -n devices and check output. # Foreach test create pool, add -n devices and check output.
for (( i=0; i < ${#tests[@]}; i+=1 )); do for (( i=0; i < ${#tests[@]}; i+=1 )); do
typeset tree="${tests[$i].tree}" tree="${tests[$i].tree}"
typeset add="${tests[$i].add}" add="${tests[$i].add}"
typeset want="${tests[$i].want}" want="${tests[$i].want}"
log_must eval zpool create "$TESTPOOL" $tree log_must eval zpool create "$TESTPOOL" $tree
log_must poolexists "$TESTPOOL" log_must poolexists "$TESTPOOL"

@ -124,8 +124,8 @@ done
# Foreach test create pool, add -n devices and check output. # Foreach test create pool, add -n devices and check output.
for (( i=0; i < ${#tests[@]}; i+=1 )); do for (( i=0; i < ${#tests[@]}; i+=1 )); do
typeset tree="${tests[$i].tree}" tree="${tests[$i].tree}"
typeset want="${tests[$i].want}" want="${tests[$i].want}"
typeset out="$(log_must eval "zpool create -n '$TESTPOOL' $tree" | \ typeset out="$(log_must eval "zpool create -n '$TESTPOOL' $tree" | \
sed /^SUCCESS/d)" sed /^SUCCESS/d)"

View File

@ -113,7 +113,7 @@ wait
parallel_time=$SECONDS parallel_time=$SECONDS
log_note "asynchronously imported 4 pools in $parallel_time seconds" log_note "asynchronously imported 4 pools in $parallel_time seconds"
log_must test $parallel_time -lt $(($sequential_time / 3)) log_must test $parallel_time -lt $(($sequential_time / 2))
# #
# export pools with import delay injectors # export pools with import delay injectors
@ -132,6 +132,6 @@ log_must zpool import -a -d $DEVICE_DIR -f
parallel_time=$SECONDS parallel_time=$SECONDS
log_note "asynchronously imported 4 pools in $parallel_time seconds" log_note "asynchronously imported 4 pools in $parallel_time seconds"
log_must test $parallel_time -lt $(($sequential_time / 3)) log_must test $parallel_time -lt $(($sequential_time / 2))
log_pass "Pool imports occur in parallel" log_pass "Pool imports occur in parallel"
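
The effect of changing the divisor from 3 to 2 can be seen with a small sketch. The timings below are made-up values, and `is_parallel` is a hypothetical helper that only restates the `test ... -lt $((... / N))` comparison from the script.

```shell
# Hypothetical timings in whole seconds.
sequential_time=20
parallel_time=9

is_parallel() {
	# $1 = parallel seconds, $2 = sequential seconds, $3 = divisor
	test "$1" -lt $(($2 / $3))
}

for divisor in 3 2; do
	if is_parallel "$parallel_time" "$sequential_time" "$divisor"; then
		echo "divisor $divisor: pass"
	else
		echo "divisor $divisor: fail"
	fi
done
```

With these numbers the old bound is 20 / 3 = 6 (integer division truncates), so 9 seconds fails, while the new bound is 20 / 2 = 10, so the same run passes.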
